Presence of cancer-associated mutations in exhaled breath condensates of healthy individuals by next generation sequencing

Exhaled breath condensate (EBC) is a non-invasive source that can be used for studying different genetic alterations occurring in lung tissue. However, the low yield of DNA available from EBC has hampered more detailed mutation analysis by conventional methods. We applied the more sensitive amplicon-based next generation sequencing (NGS) to identify cancer-related mutations in DNA isolated from EBC. In order to apply any method for the purpose of mutation screening in cancer patients, it is important to clarify the incidence of these mutations in healthy individuals. Therefore, we studied mutations in hotspot regions of 22 cancer genes in 20 healthy, mainly non-smoker individuals, using the AmpliSeq colon and lung cancer panel with sequencing on the Ion PGM. In 15 individuals, we detected 35 missense mutations in TP53, KRAS, NRAS, SMAD4, MET, CTNNB1, PTEN, BRAF, DDR2, EGFR, PIK3CA, NOTCH1, FBXW7, FGFR3, and ERBB2; these have been reported earlier in different tumor tissues. Additionally, 106 novel mutations not reported previously were detected. One healthy non-smoker subject had a KRAS G12V mutation in EBC DNA. Our results demonstrate that DNA from EBC of healthy subjects can reveal mutations that could represent very early neoplastic changes or, alternatively, a normal process of apoptosis eliminating damaged cells with mutations or altered genetic material. Further assessment is needed to determine whether NGS analysis of EBC could be a screening method for high-risk individuals such as smokers, where it could be applied in the early diagnosis of lung cancer and in monitoring treatment efficacy.


INTRODUCTION
In spite of recent improvements in the treatment of many cancers, the prognosis of lung cancer has remained unchanged for 20 years and lung cancer is still the leading global cause of cancer-related deaths [1]. This is mainly due to the lack of early screening and suitable diagnostic markers, resulting in diagnosis of the disease only at a late stage. When a tumor is at an advanced stage, molecular pathogenesis has progressed to a level where there are numerous genetic and epigenetic changes, allowing cancer cells to be naturally resistant or to rapidly develop resistance to treatments, even to the new targeted tyrosine kinase inhibitors such as those directed against EGFR [2].
Developments in microarray-based techniques, next generation sequencing (NGS) and bioinformatics tools have made it possible to identify genome-wide gene alterations from an extremely small amount (1-10 ng) of DNA or RNA [3][4][5]. In turn, this has made it possible to use exhaled breath condensate (EBC) as a source of testing material, since this is a patient-friendly, non-invasive approach. We are one of the first groups to successfully use the NGS approach for EBC analysis, as illustrated in our recent review article [6]. Genetic changes in EBC DNA are thought to reflect alterations present in lung tissue; the sampling process is convenient for the patient, and the specimens can be collected repeatedly throughout the follow-up [6].
Numerous recurrent somatic mutations have been well characterized in lung cancer and their predictive value and prognostic significance are widely acknowledged [7][8][9]. However, very little, if anything, is known about the presence of these mutations in cells or cell-free DNA of non-malignant, seemingly healthy individuals. As EBC may open a promising route for early diagnosis and follow-up of lung cancer, it is extremely important to determine whether mutations thought to be tumor-associated may also be present in healthy subjects. This kind of basic information is needed before any firm conclusion can be drawn on the significance of mutations in EBC of lung cancer patients. In this study, we describe the presence of hotspot mutations in healthy subjects, even in non-smoking individuals.

RESULTS
The sample volume collected after 15 minutes of breathing ranged from 1.5 ml to 4.0 ml (mean 3.1 ml). The average DNA yield obtained from the EBC specimens was 75.5 ng. NGS was successful in all but one subject (success rate 95.5%). The average mean depth was 901, while the average percentage of reads on target was 83.7%. All sequencing data are shown in Supplementary Table 1.
Two subjects (9.5%) did not display any evidence of mutations in their specimens, three others (14.4%) showed only novel mutations and no hotspot mutations, while the remaining fifteen (76.1%) carried one or more hotspot mutations.

Hotspot mutations
Hotspot mutations refer to those somatic mutations that have been reported earlier and are recorded in the COSMIC database. The number of hotspot mutations in the different genes is shown in Figure 1. In all, 35 hotspot mutations were detected in the EBC from our 20 healthy individuals. TP53 was the most frequently mutated gene, with 11 mutations detected in eight subjects (40%) (five females and three males). Three individuals had two TP53 mutations at different positions, while the remaining five subjects showed only one mutation. TP53 mutations were also the ones that most frequently co-occurred with mutations in other genes such as PTEN, MET, EGFR, SMAD4, CTNNB1, BRAF, and KRAS. KRAS mutations were the second most frequently encountered (seen in three subjects: one male and two females). Importantly, one subject carried a codon 12 mutation (G12V). SMAD4 mutations were found in three individuals (14.4%), and NRAS mutations were detected in two individuals (9.5%), with one subject harboring two different NRAS mutations, while two other subjects displayed CTNNB1 gene mutations.

Novel variants
A total of 106 novel mutations were found that led to an amino acid change (including missense, nonsense and indels) and which had not been reported previously in either the COSMIC or dbSNP databases. The most frequent novel mutations were in DDR2, SMAD4 and MET, followed by ERBB4, ALK, EGFR and FGFR3, then PIK3CA, PTEN, AKT1, ERBB2, KRAS, STK11, NRAS, FGFR1, CTNNB1, FBXW7, BRAF, FGFR2, and MAP2K1, while no novel mutations were found in TP53, as shown in Figure 2. All COSMIC hotspots and non-synonymous novel mutations seen in each sample are shown in Table 1.

DISCUSSION
As far as we are aware, this is the first study to use NGS to analyze mutations in EBC of healthy individuals. A total of 35 hotspot mutations and 106 novel mutations were detected. The genes carrying hotspot mutations, in descending order of frequency, were: TP53, KRAS, NRAS, SMAD4, MET, CTNNB1, PTEN, BRAF, DDR2, EGFR, PIK3CA, NOTCH1, FBXW7, FGFR3, and ERBB2.
In the present study, 11 different TP53 mutations were seen, of which only three (Q104*, Y163*, and M169I) have been reported previously in lung tissues and the upper aerodigestive tract according to the COSMIC database. In addition, another TP53 mutation (P72R) has been reported in pleural tissue from a mesothelioma patient. The remaining seven TP53 somatic mutations have been reported in the COSMIC database in other types of malignancies such as colon cancer, breast cancer and hematological malignancies. There is one similar finding of TP53 mutations in cell-free circulating DNA in 11% out of 205 non-cancerous control subjects, and in 35.7% of early-stage and 54.1% of late-stage small cell lung carcinoma (SCLC) patients [10]. A prospective study demonstrated the presence of both TP53 (3.2%) and KRAS (1%) mutations in the plasma of healthy individuals. The authors reported that these individuals remained clinically cancer-free after five years of follow-up [11]. Another approach, exploiting an ultra-deep sequencing technique, was able to detect a low frequency of TP53 mutations in peritoneal fluid of all non-cancerous controls [12].
Four KRAS hotspot mutations were seen in three individuals, with one subject harboring a clinically important codon 12 mutation. The previous study by Kordiak et al., using a mutant-enriched PCR technique on EBC specimens [13], detected codon 12 KRAS mutations in 26 normal individuals (out of 52 control subjects) and in 11 patients with benign pulmonary lesions. Moreover, they detected mutated KRAS in the normal pulmonary parenchyma excised from patients with lung cancer. The authors considered that this was attributable to the release of DNA from pulmonary cells into the airway through apoptosis, necrosis or spontaneous active release processes [14,15]. By using Ion Torrent NGS technology, KRAS mutations have been reported in plasma of 3.7% of healthy controls and 4.3% of patients with chronic pancreatitis [16]. These investigators noted that the mutant allele fraction was significantly lower (0.2% to 1%) than the mutant KRAS allele fraction in patients with pancreatic cancer (1% to 50%). The authors speculated that somatic mutations occur at negligible frequencies in the normal cell population. Similarly, another study reported the finding of KRAS mutations in tissue specimens from patients with colitis, hyperplastic polyps, and normal colonic mucosa that did not have any kind of neoplasia [17].
In our study, one specimen exhibited the clinically relevant codon 12 KRAS mutation (G12V) with a mutant allele fraction of 6.8% (Figure 3). This codon mutation was found to be the most frequent mutation in tumor tissue in our previous study of Finnish NSCLC and has also been often described in tissues of lung cancer in other studies [18]. This is in concordance with a recently published study that reported the detection of the KRAS G12V mutation in the plasma of three out of six controls, at low concentration (1.25 to 1.87 copies/mL), by using droplet digital PCR [19].
NRAS mutations were detected in two of our subjects, with one subject harboring two different mutations. The other subject had an NRAS mutation in association with hotspot alterations in other genes: NOTCH1, PTEN and SMAD4.
The EGFR mutation (D761N) was seen in one of our EBC samples from a female never-smoker, while BRAF (K601E) was present in a normal ex-smoker subject. Our small sample size, with only one current smoker, does not allow mutations to be analyzed in relation to smoking status. Two mutations, p.D32Y and p.T41A, in the beta-catenin gene (CTNNB1) found in our healthy subjects have been reported in tumors of the large intestine and in hepatic and endometrial cancers, according to the COSMIC database. Additionally, a NOTCH1 mutation (p.V1578delV) was found in one EBC sample. This mutation is frequently seen in cancers of lymphoid origin but it has also been reported in non-malignant periprosthetic soft tissue masses (pseudotumors) from patients with metal-on-metal hip replacement [20]. Two MET mutations in two individuals occurred along with TP53 mutations, and SMAD4 mutations were seen in three individuals. From one of our subjects (EBC 2), we sampled EBC twice, with a gap of one month, to compare the sequencing results. Although the sequencing depth from one of the replicates was not very good, most of the germline SNPs (except those where the amplicon did not amplify in the sample with inadequate sequencing libraries) were detected in both samples. The somatic mutations, however, were not shared between the two samples.

Thus, our results clearly demonstrate the presence of hotspot mutations in EBC from healthy individuals. For calling a mutation positive, we set our threshold for mutant allele frequency to a minimum of 3%, based on our previous comparison of EGFR and KRAS mutations as detected by NGS and clinically approved PCR methods from FFPE samples [21]. Of the total 35 hotspot mutations detected in our healthy subjects, there were 26 mutations that had a mutant fraction of 5% or more, of which 16 had more than 10%. Importantly, the clinically relevant KRAS codon 12 mutation seen in one subject had a mutant fraction of 6.8% (coverage, 1398). From a methodological point of view, before any firm conclusions can be drawn regarding the clinical significance of these mutations, it will be necessary to conduct a comparison of the mutant allele fraction in larger series of lung cancer patients and normal healthy individuals. A prospective study has reported an association between the presence of the codon 12 KRAS mutation in plasma of apparently healthy individuals and the development of bladder cancer after a follow-up period [11]. Therefore, RAS pathway activation may cause early changes that could contribute to tumor development [22]. However, the significance of these hotspot mutations in normal subjects needs to be clarified. The highly sensitive NGS technique used in this study could partially explain the detection of these hotspot mutations in healthy individuals. The presence of mutations, despite the relatively younger age of the normal subjects in our study compared to lung cancer patients, could indicate that they may be a part of an apoptotic process occurring in normal lung parenchyma. It might thus reflect the mutagenic load that normal cells are exposed to as a result of environmental factors such as air pollution, asbestos exposure, and active and/or passive smoking [23]. For tissues to maintain cellular homeostasis, cells with unrepaired DNA damage are eliminated and can be detected by sensitive methods [24]. This would agree with earlier reports describing the presence of both TP53 and KRAS mutations in normal subjects who did not develop a malignancy during their follow-up [11]. On the other hand, they may reflect molecular changes occurring in lung tissues of healthy subjects that might represent very early markers of an ongoing carcinogenesis process. Indeed, these findings might serve as indicators for disease, but their application in clinical diagnostic procedures will require more investigation.
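The 3% minimum mutant allele fraction used as the positivity threshold in this section can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of any software used in the study. The mutant read count of 95 is an assumption, back-calculated from the reported 6.8% fraction at a coverage of 1398; it is not a value reported here.

```python
MIN_FRACTION = 0.03  # 3% positivity threshold stated in the text


def mutant_allele_fraction(mutant_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a position that carry the mutant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mutant_reads / total_reads


def is_positive(mutant_reads: int, total_reads: int,
                min_fraction: float = MIN_FRACTION) -> bool:
    """True when the mutant allele fraction reaches the threshold."""
    return mutant_allele_fraction(mutant_reads, total_reads) >= min_fraction


if __name__ == "__main__":
    coverage = 1398            # coverage reported for the KRAS codon 12 call
    assumed_mutant_reads = 95  # ASSUMPTION: back-calculated, not reported
    fraction = mutant_allele_fraction(assumed_mutant_reads, coverage)
    print(f"mutant allele fraction: {fraction:.1%}")
    print("positive call:", is_positive(assumed_mutant_reads, coverage))
```

Under these assumptions the KRAS call sits well above the threshold, whereas a variant supported by, for example, 41 of 1398 reads would fall just below 3% and would not be called.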
To conclude, in the current study, hotspot mutations were detected in EBC of 75% of healthy individuals. This could represent either a normal process of cell death and cellular renewal, or early carcinogenic changes. High-throughput NGS technology now makes it possible to detect genetic mutations with high sensitivity, even at low allele frequencies. These observations will require further investigation to confirm whether it is possible to exploit NGS analysis of EBC DNA as a non-invasive screening method for high-risk individuals such as smokers, for example in the early diagnosis of lung cancer. Our results highlight the importance of knowing the prevalence of cancer-related mutations in healthy individuals, in any tissue under study, before such analysis can be applied to cancer diagnostics.

EBC specimen collection procedure
EBC samples were collected from twenty healthy adult subjects with a mean age of 34.9 years. From one individual, two different specimens were taken at an interval of one month (EBC 2a and EBC 2b). Detailed information about our subjects is given in Table 1. Smoking history was noted and individuals were classified into three categories: never-smoker, ex-smoker, and current smoker. The subjects were mainly never-smokers (n = 15); there was only one current smoker and four ex-smokers.
EBC was collected after 15 min of breathing into the EcoScreen instrument (Jaeger, Germany). Breathing frequency and mean breath volume were checked every 5 minutes until the end of the collection procedure. Collected EBC samples were transported on ice immediately to the laboratory. The samples were then transferred to 2 ml tubes, and the sample volume was measured before storage at -70°C.
The study was approved by the HUS review board (Ethical permission number 253/13/03/01/2015). Written informed consent was obtained from all subjects.

EBC DNA extraction
DNA was extracted from the whole EBC sample (ranging from 1.5 to 4 ml) using the QIAamp circulating nucleic acid kit (Qiagen, Cat. No./ID 55114) according to the manufacturer's instructions and using a vacuum pump. Extracted DNA was eluted in 35 µl of elution buffer, and then DNA was quantified by a Qubit® 2.0 Fluorometer (Life Technologies) using the Qubit® dsDNA HS Assay kit. The extracted DNA was stored at -20°C.
The libraries were clonally amplified on Ion Sphere™ particles after dilution of the libraries to 100 pM. Template preparation was performed with the Ion OneTouch™ 2 System (Thermo Fisher Scientific), an automated system for emulsion PCR, recovery of Ion Sphere™ particles, and enrichment of template-positive particles.
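As an illustration of the dilution step mentioned above, the following Python sketch applies the standard relation C1 x V1 = C2 x V2 to bring a library to 100 pM. The function name, the stock concentration of 2000 pM and the final volume of 25 µl are hypothetical values chosen only for the example; the actual library concentrations and volumes are not stated in this text.

```python
def dilution_volumes(stock_pm: float, final_volume_ul: float,
                     target_pm: float = 100.0) -> tuple[float, float]:
    """Return (stock_ul, diluent_ul) needed to reach target_pm.

    Uses C1 * V1 = C2 * V2, with concentrations in pM and volumes in µl.
    """
    if stock_pm <= 0 or final_volume_ul <= 0 or target_pm <= 0:
        raise ValueError("concentrations and volume must be positive")
    if stock_pm < target_pm:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_pm * final_volume_ul / stock_pm
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # ASSUMPTION: a 2000 pM library diluted into a 25 µl final volume.
    stock_ul, diluent_ul = dilution_volumes(stock_pm=2000.0, final_volume_ul=25.0)
    print(f"library: {stock_ul:.2f} µl, diluent: {diluent_ul:.2f} µl")
```

With these assumed inputs, 1.25 µl of library would be combined with 23.75 µl of diluent.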
The Ion Sphere™ particles coated with template were applied to the semiconductor chip. A short centrifugation step was conducted to allow the spherical particles to be deposited into the chip wells. Finally, sequencing was carried out using Ion 316™ chips on the Ion Personal Genome Machine System (PGM™, Thermo Fisher Scientific) using the Ion PGM™ Sequencing Hi-Q kit v2.

Data analysis
The Torrent Suite Software v.4.0.2 (Life Technologies) was used to assess run performance and for data analysis. Integrative Genomics Viewer (IGV v 2.2, Broad Institute) was used for visual inspection of the aligned reads. Sequencing data were further filtered and analyzed through quality checking. We selected all SNVs in the studied genes resulting in a non-synonymous amino acid change or a premature stop codon, and all short indels resulting in either a frameshift or an insertion/deletion of amino acids. All selected variants were then classified either as previously reported hotspot mutations (somatic mutations reported in the COSMIC database) or as novel variations, i.e. new mutations detected by NGS but not reported in either the COSMIC or dbSNP databases.
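The selection and classification rules described above can be summarized in a short Python sketch. This is only an illustration of the logic, not the pipeline used in the study: the `Variant` record, the consequence labels and the boolean database flags are hypothetical, and in practice the COSMIC and dbSNP status would come from annotation against those databases. Variants found only in dbSNP are treated here as known polymorphisms and excluded, which is an assumption consistent with the definition of novel variants given in the text.

```python
from dataclasses import dataclass
from typing import Optional

# Consequence types retained by the filter (hypothetical label names).
KEPT_CONSEQUENCES = {"missense", "nonsense", "frameshift_indel", "inframe_indel"}


@dataclass(frozen=True)
class Variant:
    gene: str
    protein_change: str
    consequence: str  # e.g. "missense", "synonymous"
    in_cosmic: bool   # reported as a somatic mutation in COSMIC
    in_dbsnp: bool    # reported in dbSNP


def classify(variant: Variant) -> Optional[str]:
    """Return 'hotspot', 'novel', or None when the variant is excluded."""
    if variant.consequence not in KEPT_CONSEQUENCES:
        return None          # e.g. synonymous changes are not selected
    if variant.in_cosmic:
        return "hotspot"     # previously reported somatic mutation
    if not variant.in_dbsnp:
        return "novel"       # absent from both COSMIC and dbSNP
    return None              # ASSUMPTION: dbSNP-only variants are excluded


if __name__ == "__main__":
    examples = [
        Variant("KRAS", "G12V", "missense", in_cosmic=True, in_dbsnp=True),
        # The two records below are invented placeholders, not study data.
        Variant("GENE_A", "p.X1Y", "missense", in_cosmic=False, in_dbsnp=False),
        Variant("GENE_B", "p.Z2Z", "synonymous", in_cosmic=False, in_dbsnp=False),
    ]
    for v in examples:
        print(v.gene, v.protein_change, classify(v))
```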

Figure 1: Number of hotspot mutations in different genes detected in exhaled breath condensates of 20 healthy subjects.

Figure 2: Number of all novel non-synonymous mutations detected in exhaled breath condensates of 20 healthy subjects.

Figure 3: KRAS: G12V mutation detected in the EBC sample from a healthy nonsmoking subject.