Detection of cancer through exhaled breath: a systematic review.

BACKGROUND
Timely diagnosis of cancer represents a challenging task; in particular, there is a need for reliable non-invasive screening tools that could achieve high levels of adherence at virtually no risk in population-based screening. In this review, we summarize the current evidence on exhaled breath analysis for cancer detection using standard analysis techniques and electronic noses.


METHODS
Relevant studies were identified by searching the PubMed and Web of Science databases until April 30, 2015. Information on breath test performance, such as sensitivity and specificity, was extracted together with the volatile compounds that were used to discriminate cancer patients from controls. Performance of different breath analysis techniques is provided for various cancers together with information on methodological issues, such as breath sampling protocol and validation of the results.


RESULTS
Overall, 73 studies were included, two-thirds of which were conducted on lung cancer. Good discrimination usually required a combination of multiple biomarkers, and area under the receiver operating characteristic curve or accuracy reached levels of 0.9 or higher in multiple studies. In 25% of the reported studies, classification models were built and validated on the same datasets. The studies varied widely, particularly in breath analysis techniques and data analysis methods.


CONCLUSIONS
Analyses of exhaled breath yielded promising results, although standardization of breath collection, sample storage and data handling remains a critical issue. In order to foster implementation of breath analysis in practice, larger studies should be conducted in true screening settings, paying particular attention to standardization in breath collection, consideration of covariates, and validation in independent population samples.


INTRODUCTION
Cancer is a leading cause of death worldwide [1]. In 2012, cancer accounted for 8.2 million deaths, and the number of deaths is projected to increase to over 13 million in 2030 [2]. Early detection is essential to improve treatment success and reduce cancer mortality, and cancer screening in the asymptomatic general population might be a particularly promising approach to achieve this goal. However, only a few cancer screening programs are widely used. For most deadly cancers, such as pancreatic or gastric cancer, no reliable population-based screening exists, and for other common malignancies, like breast or colorectal cancer, there is large potential for improving currently used screening methods. In particular, there is a need for reliable non-invasive screening tools that could achieve high levels of adherence at virtually no risk in population-based screening.
Breath tests might be a particularly promising approach for non-invasive cancer screening. The analysis of volatile organic compounds (VOCs) in exhaled breath can provide information on metabolic processes in the body which are modified by underlying diseases [3][4][5], although the picture of the origin of VOCs is still incomplete.
In this systematic review, we summarize the current evidence on exhaled breath analysis for cancer detection. Performance of different breath analysis techniques is provided for various cancers together with information on methodological issues, such as breath sampling protocol, validation of the results, and VOCs proposed as cancer-related compounds.
Figure 1 shows the process of study selection. In total, 1277 papers were identified, of which 262 were duplicates, 24 were non-English papers and 1 was a book chapter. The remaining titles and abstracts were checked and studies not relevant to the topic were excluded. For 17 studies, no full paper could be accessed. In addition, 15 full-text papers were excluded because some of the required information was missing (see Additional Figure S1).

Literature search
In total, 73 studies met our inclusion criteria and are described in this review. The majority of them focused on malignancies in organs of the respiratory system (lung, n = 46; head and neck, n = 4), as these cancers have the greatest potential to be diagnosed by exhaled breath. Other malignancies investigated with breath analysis included breast cancer (n = 11), gastric cancer (n = 5), mesothelioma and colorectal cancer (each n = 3), cancers of the liver, thyroid, prostate and ovaries, and hematological and gynecological cancers (each n = 1).

Design and methods
Study designs and methods of analysis are described in Table 1. To analyze exhaled breath, most of the studies used gas chromatography-mass spectrometry (GC-MS, n = 42) and/or an electronic nose (e-nose, n = 24). The most commonly used electronic noses were the gold nanoparticle sensor-based e-nose [7] from the TECHNION group (8 studies) and the commercially available Cyranose 320 [8] (6 studies). Breath samples were stored in different containers, with Tedlar bags being the most popular.
The vast majority of studies used a case-control design, in which clinically diagnosed patients were compared with controls without cancer. Eight studies enrolled patients who presented to the hospital with complaints requiring further investigation, and breath samples were collected before the final diagnosis. In addition, a few studies investigated differences in exhaled breath composition before and after tumor resection [9][10][11] and VOCs released by cancer cells or tissues [12,13]. Differences in volatiles between Caucasians and Chinese were investigated by Amal et al. [14]. Two more studies compared the performance of exhaled breath analysis with that of canine detection [15] or DNA hypermethylation in sputum [16]. Despite the differences in study designs, we extracted only the information related to the breath analysis part of each study.

Study population
An overview of the studies and their population characteristics is shown in Additional Table S1. Studies were conducted in all parts of the world except South America and Africa. Numbers of people included in the analyses varied from 14 (
Further information on critical study design and data collection issues is presented in Additional Table S2. History of smoking is the main risk factor for lung cancer development; therefore, adjustment for smoking status between cases and controls is crucial. However, 8 studies on lung cancer did not provide information on smoking status at all. The majority of the studies collected alveolar breath, 12 studies focused on collecting the maximum amount of exhaled breath (vital capacity) and 7 studies on collecting tidal breath. In around 25% of the studies, lung washout was not performed or ambient air was not analyzed, which might lead to exogenous (inhaled) compounds being included in classification models. Time between breath collection and analysis was very short (analysis done immediately or within a few hours) in most studies but extended up to six months in one study [18]. Although most of the studies included newly diagnosed untreated cancer patients, a few studies recruited patients under different treatment regimens, and treatment might have had an influence on exhaled volatiles.
Table 1 (excerpt). Columns: study [reference]; analysis method; sample container; data analysis method.
[93]; SPME/GC-TOF-MS; Tedlar bag; Discriminant factor analysis
Ulanowska, 2011 [94]; SPME/GC-MS; Tedlar bag; DA
Buszewski, 2012 [15]; SPME/GC-MS; Tedlar bag; -
Mangler, 2012 [95]; TD-GC-MS; Tenax test tube; -
Wang Y, 2012 [13]; SPME/GC-MS; Tedlar bag; Linear DA
Amal, 2013 [14]; TD-GC-MS; ORBO 420 Tenax TA sorption tubes; -
Altomare, 2013 [96]; TD-GC-MS; Tedlar bag; Probabilistic neural networks
Filipiak, 2014 [12]; TD

Performance of classification models
Six studies reported the diagnostic performance of breath tests for distinguishing breast cancer patients from healthy controls. The best discriminatory performance was achieved by Phillips et al. in 2006 [23], who reported an AUC of 0.9. On the other hand, the same model validated in women with abnormal mammography findings showed specificity as low as 32%. Other studies by the same authors also showed better performance of the classification models when comparing cancer cases to healthy women rather than to women with abnormal mammography findings [24,38].
Good diagnostic performance was also reported in most of the studies focusing on cancers of organs other than the lung and breast, and AUC or accuracy of 0.9 or higher was reported in studies on head and neck cancer

Performance of individual VOCs
The performance of classification using individual VOCs as cancer biomarkers in exhaled breath was reported in 8 studies and is presented in Table 3. Several volatiles, e.g., 3-hydroxybutan-2-one, showed promising results for different cancer sites. One study validated its results in a different population sample and showed superb performance of hexadecanal (AUC = 1.00) [41].
Volatile organic compounds which were used to build a classification model or whose concentrations were significantly different between cancer cases and controls in at least three independent studies are presented in Additional Table S3. Ethenylbenzene (styrene), heptanal and nonanal were the most commonly described compounds (each in 9 independent studies). Interestingly, these studies were performed on different cancer types. By contrast, 1-propanol was described only in studies on lung cancer: 3 studies showed significantly different concentrations in exhaled breath and 4 others included this compound in classification models.
Table notes: Cn, cases; Cs, controls; N, number of cases/controls; Sens, sensitivity; Spec, specificity; AUC, area under the receiver operating characteristic curve; RSS, random sample split (training set size: testing set size: validation set size); LOOCV, leave-one-out cross-validation; VOCs, volatile organic compounds. Numbers of cases and controls are the total study population size, and performance of the breath test corresponds to the testing (validation) set. a, NO indicates studies which used the same study population for model building and testing; b, abnormal X-rays, no cancer; c, chronic obstructive pulmonary disease; d, lung diseases; e, non-small cell lung cancer; f, small cell lung cancer; g, abnormal mammography; h, hepatocirrhosis; i, exposed to asbestos; j, benign head and neck conditions; k, ovarian benign conditions; l, healthy + ovarian benign conditions; m, Operative link on gastric intestinal metaplasia assessment stage 0-IV; n, gastric ulcer.

DISCUSSION
In this paper, we present a comprehensive, up-to-date overview of studies on the diagnostic performance of VOCs in cancer detection. Our review identified 73 studies which used breath analysis for classification of cancer cases and controls or analyzed specific VOCs in exhaled breath of cancer patients and healthy individuals. The majority of the studies focused on lung cancer; however, recent reports addressed other common malignancies including breast, gastric and other types of cancer. Very good diagnostic performance of breath tests was achieved, but one out of four studies lacked appropriate correction for overoptimism. It is worth pointing out that the studies differed significantly with respect to breath analysis techniques and data analysis methods. Based on current evidence, VOCs seem to hold great potential in cancer diagnostics; nevertheless, the ultimate role of these markers for cancer screening needs to be established in large-scale studies conducted in true screening settings.
Breath analysis is a young field of research and the majority of the studies were performed in recent years. It has been known for decades that hundreds of VOCs are present in human breath [42]. In the 1980s, the first studies reported higher levels of some volatiles in the breath of lung cancer patients [28,43,44], and these studies fostered substantial interest in research on cancer-specific biomarkers in breath. The first studies focused on identifying specific volatile organic compounds for diseases of interest using methods such as GC-MS, which is expensive and time-consuming and requires well-trained personnel for sample collection and analysis. Furthermore, identification of detected compounds is not straightforward, and reference libraries have to be checked and validated using the mass and retention time of known standards. These limitations, among other reasons, led researchers to look for different methods to analyze exhaled breath. One such technology is nanomaterial-based sensor arrays, which could help to solve the problems mentioned above [45][46][47]. The main difference from standard analytic techniques is that the electronic nose mimics mammalian olfaction [48]: it cannot distinguish specific VOCs but is based on pattern recognition. First, the e-nose needs to be trained to build a database for recognition, and then it can be applied to classify unknown samples. The crucial factors for meaningful pattern recognition are the size of the training set and how well these samples represent the tested populations. One way to improve the performance of the e-nose is to combine it with other techniques; for example, specific VOCs can be identified by GC-MS and used to select the sensors most sensitive to the target compounds [49].
Independent of analysis techniques, breath sample collection and storage are major challenges in breath research studies. The stability of compounds in different bags has been investigated [50][51][52]; these studies showed that some polar compounds, including water, diffuse rather quickly through Tedlar bag walls, while other compounds are quite stable. Aldehydes were shown to be rather stable in the Bio-VOC sampler in the first 10 hours after collection, while analysis was done in less than 2 hours [53]. Sample storage time recorded in this review was generally very short, and the five studies in which storage exceeded a few months used thermal sorption tubes, which are suitable for long-term storage [54]. Apart from loss of compounds due to diffusion through the bag walls, some compounds might be released by the bag material and accumulate in the collected air sample [55]. Reusing the same bag might represent another challenge, as flushing and heating failed to remove some of the compounds from Tedlar bags [56]. Finally, the concentration of VOCs also strongly depends on the breath collection method. Alveolar breath has higher levels of exhaled components than whole breath without separation [57,58] and also the lowest concentrations of contaminants [59]. Standardization of the breath collection process appears crucial for further advances in breath-based biomarker research. In addition to ambient air analysis or lung washout before breath sampling [60,61], other standardization measures were recently suggested as well, including recommendations for sample storage in thermal desorption tubes and ways to avoid some confounders by recruiting hospital personnel rather than healthy individuals from outside the hospital [54].
A key issue in the analysis of high-dimensional data such as those obtained from breath analysis is rigorous control for overoptimism by internal or external validation. External validation is particularly valuable when the performance of a classification model can be demonstrated in different populations or under different recruitment conditions, as the purpose of marker discovery studies is their potential application in future screening strategies. Replication of the results might not be easily achieved, as different methods and analysis techniques are used by different research groups. Furthermore, different results were achieved even within the same study when different computational approaches were applied for data analysis [25,62]. Still, internal validation, for example by random sample split or leave-one-out cross-validation, can help to obtain a more realistic performance estimate but does not guarantee good performance in different study populations.
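The gap between performance on the model-building data and an internally validated estimate can be illustrated with a minimal sketch. It assumes a 1-nearest-neighbour classifier and a small invented dataset of two hypothetical VOC signals per participant; neither the classifier nor the data come from any of the reviewed studies, and all function names are illustrative.

```python
import math

# Hypothetical data: two VOC signal intensities per participant (invented values).
x = [(1.0, 2.0), (1.2, 1.9), (3.0, 3.1), (2.9, 3.3), (1.1, 3.0), (3.1, 2.0)]
y = ["case", "case", "control", "control", "control", "case"]


def nearest_label(sample, train_x, train_y):
    """Return the label of the training sample closest to `sample` (1-NN)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(sample, train_x[i]))
    return train_y[best]


def resubstitution_accuracy(x, y):
    """Accuracy when the model is built and tested on the same samples."""
    hits = sum(nearest_label(x[i], x, y) == y[i] for i in range(len(x)))
    return hits / len(x)


def loocv_accuracy(x, y):
    """Leave-one-out cross-validation: each sample is classified by a model
    built from all remaining samples."""
    hits = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_label(x[i], train_x, train_y) == y[i]
    return hits / len(x)


resub = resubstitution_accuracy(x, y)
loocv = loocv_accuracy(x, y)
```

With this toy dataset the resubstitution accuracy is perfect because every sample is its own nearest neighbour, whereas the leave-one-out estimate is lower. Neither figure says anything about performance in an independent population.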
Adjusting for covariates when building a classification model for breath analysis is another challenge, as it remains unclear which covariates should be taken into account. Conflicting results were reported for the impact of age, gender and smoking status on VOCs [31][63][64][65][66] for standard analysis techniques. On the other hand, "breathprint" pattern analysis with the e-nose was shown to be insensitive to various covariates, including the ones mentioned above [34,49], but which factors may confound study results obtained with the electronic nose remains unclear and requires further research. While matching or adjusting for covariates is crucial for evaluating the discriminatory potential of VOCs per se, combined use of VOCs and covariates may provide the most powerful discriminatory algorithms for screening practice.
To date, there is no "universal" tumor marker that can detect any type of cancer; however, development of the VOCs field could potentially provide a tool for unified technological approaches in cancer screening. So far, the set of identified VOC patterns varies considerably among the studies. Even though promising results have been reported for certain single markers in individual studies (e.g., hexadecanal), enhanced accuracy for classification of cancer cases and controls is likely to be achieved by the combination of several compounds. Furthermore, the same compound may not be specific for a certain cancer but might be characteristic of several types of cancer. For example, formaldehyde (methanal) was suggested to be a potential biomarker for breast [67], prostate and bladder cancers [68]. At the same time, there is emerging evidence for cancer-specific markers. A review on potential cancer-specific compounds was published recently [5], in which metabolic pathways for volatiles arising from bodily fluids were explained and the potential of these compounds to be biomarkers for cancer was discussed.
Breath analysis as a cancer detection method and potential VOC biomarkers for cancer were previously discussed and summarized [5,69]. Queralto et al. covered the existing evidence on exhaled breath analysis and cancer detection [70], but provided only a brief description of the results and focused mainly on differences between array-based sensors. Recently, increasing interest has been devoted to novel instruments for breath analysis. Reviews were published on different electronic noses used until then for biomedical and other applications [71], advances in breath analysis using e-nose for detection of various diseases [72] and nanosensor technologies used for VOC detection [45,73].
In contrast to previous reviews, we extensively discuss key methodological shortcomings, such as correction for overoptimism, performance of the validation studies and influence of potential confounders. Furthermore, we did not restrict this systematic review to a specific cancer site or analysis method, as we wanted to understand the current potential of breath analysis for cancer detection. Nevertheless, our review has certain limitations that need to be acknowledged. Despite a comprehensive search in two independent databases, we cannot exclude the possibility of having missed relevant studies. Standardized summarization and presentation of results was hampered by heterogeneity in the reporting of the original studies. We did not include in vitro studies because performance of these markers may not always translate into direct clinical applications in screening and diagnostic settings. We also did not include studies which used sniffer dogs, as potential implementation of canine-based diagnosis in health care settings might face logistic limitations.
In conclusion, breath analysis is a young field of research with great potential in cancer screening. To establish an accurate test in a point-of-care screening setting, a high-throughput sampling protocol is required, i.e., collection and analysis time should be short, and the method itself should be cheap, non-invasive and of minimal health risk. In order to foster implementation in practice, larger studies should be conducted in true screening settings, paying particular attention to standardization in breath collection, consideration of covariates, adjustment for overoptimism, and validation in independent population samples. With further advancements in the area, breath tests may have the potential to become a useful supplement to, and improve on, existing screening tools for a variety of cancers.

MATERIALS AND METHODS
A systematic search of the literature published until April 30, 2015 was performed in the PubMed and Web of Science databases using the following combination of keywords: (cancer OR carcinoma OR adenocarcinoma OR tumor OR malignancy OR malignant disease) AND ((volatile AND (compound OR compounds OR marker OR markers OR biomarker OR biomarkers)) OR VOC OR VOCs OR breathprint OR breath-print OR breath print) AND (breath OR exhaled OR air). Full-text original studies in English which reported statistics on discrimination between cancer cases and controls, or studies which investigated specific VOCs, were included in this systematic review. In addition, reference lists were checked for relevant published studies for inclusion. Studies exclusively conducted in vitro or with sniffer dogs were not considered in this review.
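The nested structure of the keyword combination is easier to check when it is assembled from its term groups. The following is a minimal sketch of such an assembly; the helper names are hypothetical, and the snippet only builds the query text quoted above without contacting either database.

```python
# Term groups taken from the keyword combination described in the text.
CANCER_TERMS = ["cancer", "carcinoma", "adenocarcinoma", "tumor",
                "malignancy", "malignant disease"]
MARKER_TERMS = ["compound", "compounds", "marker", "markers",
                "biomarker", "biomarkers"]
VOC_TERMS = ["VOC", "VOCs", "breathprint", "breath-print", "breath print"]
BREATH_TERMS = ["breath", "exhaled", "air"]


def any_of(terms):
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query():
    """Assemble: cancer terms AND (volatile-marker terms OR VOC terms) AND breath terms."""
    volatile_block = "(volatile AND " + any_of(MARKER_TERMS) + ")"
    voc_block = any_of([volatile_block] + VOC_TERMS)
    return " AND ".join([any_of(CANCER_TERMS), voc_block, any_of(BREATH_TERMS)])


query = build_query()
```

Keeping the groups as separate lists makes it straightforward to document or extend one group without disturbing the parenthesization of the others.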
Data extraction was carried out independently by two of the authors, AK and JAH, and included characteristics of study populations, such as numbers of cases and controls, their age, sex and smoking prevalence, as well as the country where study populations were recruited. Study design and methods used for breath analysis were also recorded. Indicators of diagnostic value were extracted both for individual VOCs and for multi-VOC classifiers where provided. The following statistical parameters were considered: sensitivity, specificity, accuracy and area under the receiver operating characteristic curve (AUC). Correction for overoptimism and validation were recorded for each study. To avoid overoptimism, the most reliable information was considered, i.e., bootstrapped or cross-validated values were extracted wherever such results were provided. For studies which used a random sample split to create a model and validate it separately, values corresponding to the validation set were considered.
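For readers less familiar with these parameters, the sketch below shows how sensitivity, specificity and AUC can be computed from classification counts and classifier scores. It uses the standard definitions, with the AUC taken as the proportion of case-control pairs in which the case has the higher score (ties counted as one half); the function names and example numbers are illustrative and not taken from any reviewed study.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of cancer cases correctly classified as cases."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of controls correctly classified as controls."""
    return true_negatives / (true_negatives + false_positives)


def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the case-control pairs in which the case scores higher than the
    control; tied pairs contribute one half.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented example: 50 cases (45 detected) and 50 controls (40 correctly negative).
example_sens = sensitivity(45, 5)
example_spec = specificity(40, 10)
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

The pairwise form is quadratic in the number of participants, which is acceptable for the study sizes typical of this field; rank-based formulations give the same value more efficiently.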
Additionally, we extracted the names of VOCs which showed a significantly different concentration in exhaled breath between cancer cases and controls, or which were used to build a classification model. The IUPAC name [6] was checked for all extracted compounds to detect synonyms and to ensure comparable results.
Missing information in the tables was calculated where possible, e.g., accuracy was assessed as the sum of correctly classified cases and controls divided by the total number of people in a classification model. Also, when comparison of exhaled breath was made just between two groups (cases and controls) but the authors provided a study population description for separate smaller sub-groups, numbers were added up or weighted averages (e.g., of age) were calculated where possible.
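The two calculations described above can be sketched as follows. The function names and example figures are illustrative assumptions, not values from the reviewed studies; the second accuracy helper additionally assumes that only sensitivity, specificity and group sizes were reported.

```python
def accuracy(correct_cases, correct_controls, total):
    """Correctly classified cases and controls divided by all participants."""
    return (correct_cases + correct_controls) / total


def accuracy_from_rates(sens, spec, n_cases, n_controls):
    """Accuracy recovered from sensitivity, specificity and group sizes."""
    return (sens * n_cases + spec * n_controls) / (n_cases + n_controls)


def weighted_mean(values, sizes):
    """Average of sub-group values (e.g., mean age) weighted by sub-group size."""
    return sum(v * n for v, n in zip(values, sizes)) / sum(sizes)


# Invented example: 45 of 50 cases and 40 of 50 controls correctly classified.
example_accuracy = accuracy(45, 40, 100)
# Invented example: two case sub-groups with mean ages 60 (n = 20) and 66 (n = 10).
example_age = weighted_mean([60.0, 66.0], [20, 10])
```

Weighting by sub-group size reproduces the overall mean exactly when the sub-group values are means; for medians it is only an approximation.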
Additionally, some quality criteria were checked for the studies and included in this systematic review. Comparability with respect to gender (or smoking status) was determined by the difference, in percentage points, in the proportion of males (or smokers) between cancer cases and controls. As for age, the difference in median (or mean) age between study groups was calculated. Comparability with respect to these variables was set to "yes" if the difference was not greater than 10 units and "no" otherwise. Other potentially important information for evaluating and comparing the results between studies was extracted, such as collection time, which part of the breath was used for analysis, analysis time, restrictions, potential preceding treatment of cancer cases, and exclusion criteria used when recruiting patients in each of the studies.
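The comparability rule can be expressed as a short check. This is a minimal sketch with a hypothetical function name and invented example values; the threshold of 10 units is the one stated above (percentage points for the proportion of males or smokers, years for age).

```python
def comparability(value_cases, value_controls, threshold=10.0):
    """Return "yes" if cases and controls differ by no more than `threshold` units.

    For gender and smoking status the values are percentages; for age they
    are median (or mean) ages in years.
    """
    difference = abs(value_cases - value_controls)
    return "yes" if difference <= threshold else "no"


# Invented examples.
gender_comparable = comparability(60.0, 52.0)   # 8 percentage points apart
smoking_comparable = comparability(70.0, 55.0)  # 15 percentage points apart
age_comparable = comparability(65.0, 55.0)      # exactly 10 years apart
```

A difference of exactly 10 units counts as comparable, matching the "not greater than 10 units" wording.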

ACKNOWLEDGMENTS
We acknowledge Ms. Prudence Carr for writing assistance.