An integromic signature for lung cancer early detection

We previously developed three microRNAs (miRs-21, 210, and 486-5p), two long noncoding RNAs (lncRNAs) (SNHG1 and RMRP), and two fucosyltransferase (FUT) genes (FUT8 and POFUT1) as potential plasma biomarkers for lung cancer. However, the diagnostic performance of the individual panels is not sufficient for clinical use. Given the heterogeneity of lung tumors, which arise from multifactorial molecular aberrations, we determined whether integrating different classes of molecular biomarkers could improve the diagnosis of lung cancer. Using droplet digital PCR, we analyzed the expression of the seven genes in plasma from a development cohort of 66 lung cancer patients and 33 cancer-free individuals. The panels of three miRNAs (miRs-21, 210, and 486-5p), two lncRNAs (SNHG1 and RMRP), and two FUTs (FUT8 and POFUT1) had sensitivities of 81-86% and specificities of 84-87% for diagnosis of lung cancer. From the seven genes, we developed an integromic plasma signature comprising miR-210, SNHG1, and FUT8 that produced higher sensitivity (95.45%) and specificity (96.97%) than the individual biomarker panels (all p<0.05). The diagnostic value of the signature was confirmed in a validation cohort of 40 lung cancer patients and 29 controls, independent of the stage and histological type of the lung tumor and of patients' age, sex, and smoking status (all p>0.05). Integrating different categories of biomarkers might improve the diagnosis of lung cancer.


INTRODUCTION
Over 85% of lung cancers are non-small cell lung cancers (NSCLC). NSCLC mainly consists of adenocarcinoma (AC) and squamous cell carcinoma (SCC). Tobacco smoking is the major cause of NSCLC. Because the prognosis of patients with lung cancer is strongly correlated with tumor stage, diagnosing lung cancer at a curable stage can reduce mortality [1]. In a large randomized trial, early detection of lung cancer using low-dose CT (LDCT) reduced lung cancer mortality by 20% compared with chest X-rays [1]. However, LDCT is associated with over-diagnosis, excessive cost, and radiation exposure [2,3]. Circulating biomarkers that can accurately and cost-effectively identify early-stage lung cancer are therefore needed [4].
Emerging evidence has demonstrated that aberrant glycosylation leads to cancer development and progression [16]. Fucosylation is a major type of glycosylation and is regulated by fucosyltransferases (FUTs) [16,17]. We recently found that the combined use of two plasma FUTs (FUT8 and POFUT1) had 81% sensitivity and 84% specificity for diagnosis of lung cancer, thus providing a new category of cell-free circulating biomarkers for lung cancer.
Because NSCLC is a heterogeneous disease that develops from multifactorial molecular aberrations [18], analysis of a single type of molecular change may not achieve the performance required to move forward to clinical application. Indeed, although our individual panels of plasma biomarkers show promise for lung cancer diagnosis, their sensitivities (81-86%) and specificities (84-87%) are not sufficient for use in clinical settings.
Because miRNAs, lncRNAs, and FUTs play highly diverse roles in driving the development of lung cancer [7-10,16], we hypothesized that integrating these different classes of biomarkers might improve the early detection of lung cancer. Here we evaluate the individual and combined use of the three categories of plasma molecular biomarkers for lung cancer.

The three individual panels of plasma biomarkers displayed different levels in NSCLC patients vs. cancer-free smokers
Droplet digital PCR (ddPCR) was used to quantify the seven genes (miRs-21, 210, and 486-5p, SNHG1, RMRP, FUT8, and POFUT1) in plasma from a development cohort of 66 lung cancer patients and 33 cancer-free individuals. All seven genes generated at least 10,000 droplets in each well of the plasma samples and could therefore be successfully "read" by ddPCR for absolute quantification in plasma. Each gene had a significantly different expression level in plasma of the NSCLC patients compared with the control individuals (all P<0.05). Individually, the genes yielded sensitivities of 50.09-75.76% and specificities of 63.64-90.91% for detection of NSCLC (Table 1). The panel of three miRNA biomarkers (miRs-21, 210, and 486-5p) had an area under the receiver operating characteristic curve (AUC) of 0.92 with 86.36% sensitivity and 87.88% specificity; the panel of two lncRNA biomarkers (SNHG1 and RMRP) had an AUC of 0.89 with 83.33% sensitivity and 84.85% specificity; and the panel of two FUTs (FUT8 and POFUT1) had an AUC of 0.85 with 81.82% sensitivity and 84.85% specificity for diagnosis of lung cancer (Table 2). The individual panels showed no significant association with stage or histology of the NSCLC or with the age, gender, or smoking status of the participants (all p>0.05). The seven genes are therefore potential plasma biomarkers for lung cancer.
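The per-gene sensitivities, specificities, and AUCs above follow standard ROC definitions. As an illustration only (not the authors' code), the sketch below computes a rank-based (Mann-Whitney) AUC and the sensitivity and specificity of a "value ≥ cut-off" rule in plain Python. The function names and toy numbers are hypothetical, and the sketch assumes higher plasma levels indicate cancer; a marker that is lower in cancer would need its values negated first.

```python
def auc_mann_whitney(cases, controls):
    """AUC as the probability that a random case scores above a random
    control, counting ties as one half (equivalent to the Mann-Whitney U)."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(cases, controls, cutoff):
    """Call a subject positive when its value is >= cutoff."""
    tp = sum(1 for x in cases if x >= cutoff)     # cases correctly called positive
    tn = sum(1 for y in controls if y < cutoff)   # controls correctly called negative
    return tp / len(cases), tn / len(controls)
```

For example, with four hypothetical cases [5, 6, 7, 2] and three controls [1, 3, 2], the AUC is 0.875, and a cut-off of 4 gives 75% sensitivity and 100% specificity.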

An integromic plasma signature for lung cancer early detection
We used logistic regression models with LASSO (least absolute shrinkage and selection operator) constrained parameters, together with AUCs, to determine the performance of different combinations of the genes. From the seven genes, one miRNA (miR-210), one lncRNA (SNHG1), and one FUT (FUT8) were selected as the best biomarkers (all P<0.001). A logistic regression model combining these three genes, one from each class, was developed as an integromic signature for diagnosing lung cancer: U = -7.29 + 2.8 × log(SNHG1) + 3.83 × log(FUT8) + 3.36 × log(miR-210). Combined analysis of the three biomarkers with this logistic regression model produced a higher AUC (0.97) (Figure 1) than did the individual biomarker panels (p<0.05). We used the highest Youden's J index to set the corresponding cut-off value [19]. The optimal cut-off for the integromic signature was U=0.79, and any subject with U≥0.79 was classified as a lung cancer case. As a result, the integromic plasma signature yielded significantly higher sensitivity (95.45%), specificity (96.97%), and accuracy (95.96%) than the individual biomarker panels (all p<0.05) (Table 2). Combined use of all seven genes did not produce higher sensitivity and specificity than the integromic plasma signature (p>0.05). In addition, Pearson's correlation analysis showed that the correlations among the levels of the three genes were very low (all p>0.05), implying that the different classes of molecular biomarkers provide complementary classification information. Moreover, the integromic plasma signature showed no significant association with histological type of the NSCLC or with the age, gender, or smoking status of the participants (all p>0.05). The sensitivity and specificity of the integromic signature did not differ significantly across stages of NSCLC (Supplementary Figure 1).
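To make the scoring rule concrete, the sketch below evaluates U with the reported coefficients and applies the U ≥ 0.79 rule, together with a generic search for the cut-off that maximizes Youden's J. It is a hypothetical illustration, not the authors' code: the text states neither the logarithm base nor the units of the inputs, so natural log and ddPCR copies/μL are assumptions, and the helper names are ours.

```python
import math

# Reported signature coefficients. ASSUMPTION: "log" is the natural log and the
# inputs are ddPCR concentrations (copies/uL); the text states neither.
INTERCEPT = -7.29
COEFFS = {"SNHG1": 2.8, "FUT8": 3.83, "miR-210": 3.36}
CUTOFF = 0.79  # subjects with U >= 0.79 are called lung cancer


def integromic_score(levels, log=math.log):
    """Return U = intercept + sum(coefficient * log(level)) over the 3 genes."""
    return INTERCEPT + sum(c * log(levels[gene]) for gene, c in COEFFS.items())


def classify(levels, log=math.log):
    """Apply the U >= CUTOFF decision rule."""
    return "lung cancer" if integromic_score(levels, log) >= CUTOFF else "control"


def youden_cutoff(case_scores, control_scores):
    """Scan each observed score as a 'score >= cutoff' threshold and return
    (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1."""
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(case_scores) | set(control_scores)):
        sens = sum(s >= c for s in case_scores) / len(case_scores)
        spec = sum(s < c for s in control_scores) / len(control_scores)
        j = sens + spec - 1
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j
```

With all three levels equal to 1, every log term is zero and U = -7.29, so the subject is classified as a control; with all three equal to e (natural log 1), U = -7.29 + 2.8 + 3.83 + 3.36 = 2.70, which is above the cut-off.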

Validating the integromic plasma signature for lung cancer detection
The plasma expression levels of the three genes (miR-210, SNHG1, and FUT8) were assessed by ddPCR in a validation cohort of an additional 40 NSCLC patients and 29 healthy controls. Combined analysis of the three genes using the logistic regression model produced an AUC of 0.94 for lung cancer diagnosis. There was no significant difference between the development and validation cohorts with regard to the signature's AUCs (0.95 vs. 0.94, p=0.46) (Figure 2). In the validation cohort, the three genes used in combination differentiated the NSCLC patients from healthy controls with a sensitivity of 95.00% (82.08% to 99.12%) and a specificity of 96.55% (80.37% to 99.82%). In line with the findings in the development cohort, the sensitivity and specificity of the integromic plasma signature did not differ significantly across stages and subtypes of NSCLC (all p>0.05). Moreover, the expression levels of the genes were not associated with the age, gender, or smoking status of the individuals (all p>0.05).
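The interval reported with each validation-cohort estimate is a confidence interval for a binomial proportion. The text does not say which method was used, so the sketch below shows one common choice, the Wilson score interval, purely as an illustration. Other methods (for example Clopper-Pearson or continuity-corrected intervals) give somewhat different bounds, so this is not expected to reproduce the reported figures exactly; the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)
```

For 38 of 40 cases detected (95.00%), this gives roughly 83.5% to 98.6%. The reported 82.08% to 99.12% is somewhat wider, so a different interval method was presumably used.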

DISCUSSION
Although promising, the individual miRNA, lncRNA, and FUT biomarker panels used alone have only moderate sensitivities (81-86%) and specificities (84-87%). The miRNAs, lncRNAs, and FUTs have highly different functions in carcinogenesis [20-22]. Given the heterogeneous nature of lung cancer and the numerous cellular pathways involved, we hypothesized that integrating the different classes of molecular biomarkers might improve the early detection of lung cancer. Intriguingly, integrated analysis of only one biomarker of each type, measured on a single platform (ddPCR), yielded significantly higher diagnostic performance than any panel of a single type of gene. Furthermore, the correlations among the levels of the miRNA, lncRNA, and FUT were very low, suggesting that the diagnostic values of the three classes of molecular alterations are complementary to each other. These observations therefore support our hypothesis. Moreover, because the integromic plasma signature shows similar sensitivity and specificity in early vs. advanced stages of NSCLC, it might be a useful approach for the early detection of lung cancer, which remains a clinical challenge.
The study has some limitations. 1) The sample sizes of the cohorts are small. We will perform a new study to prospectively validate the integromic signature for lung cancer early detection in a large population. 2) It is well known that lung cancer-associated molecular genetic changes are also related to chronic obstructive pulmonary disease (COPD) [25]. Many lung cancer patients who are smokers, as well as cancer-free heavy smokers, have COPD [25], and COPD could affect molecular genetic profiles in plasma. In the present project, COPD information was not available for the enrolled cases and controls. Therefore, we were not able to evaluate whether the biomarkers identified in this study are associated with COPD. We will recruit lung cancer patients and cancer-free smokers who have COPD and determine whether COPD confounds the molecular changes. 3) Early detection of NSCLC using LDCT followed by appropriate treatment can significantly reduce lung cancer mortality in smokers [1], and LDCT is now recommended for lung cancer screening in smokers. Yet LDCT has low specificity for the early detection of lung cancer, presenting a major clinical challenge [1]. The participants enrolled in this project are not representative of smokers in the LDCT screening setting. We will perform a prospective trial to determine whether the integromic signature could improve the specificity of LDCT for the early detection of lung cancer in smokers.

Patients and clinical specimens
Using a protocol approved by the local Institutional Review Boards, we recruited lung cancer patients and cancer-free smokers according to the inclusion and exclusion criteria recommended by the U.S. Preventive Services Task Force [26]. Briefly, we enrolled smokers aged 55-80 years who had a smoking history of at least 30 pack-years and were former smokers who had quit within the past 15 years. Exclusion criteria included pregnancy, current pulmonary infection, surgery within 6 months, radiotherapy within 1 year, and life expectancy of <1 year. We collected blood in BD Vacutainer spray-coated K2EDTA tubes (BD, Franklin Lakes, NJ) and prepared plasma using the standard operating protocols developed by the NCI Early Detection Research Network [27]. The specimens were processed within 2 hours of collection by centrifugation at 1,300 × g for 10 minutes at 4°C. Surgical-pathologic staging of NSCLC was used as the ground truth, according to the TNM classification of the International Union Against Cancer (UICC), the American Joint Committee on Cancer (AJCC), and the International Staging System for Lung Cancer [28,29]. A total of 106 NSCLC patients and 62 cancer-free smokers were recruited. Of the cancer patients, 27 were female and 79 were male; 24 had stage I NSCLC, 19 stage II, 28 stage III, 30 stage IV, and 5 unknown stage. Fifty-six lung cancer patients were diagnosed with AC and 40 with SCC. Of the cancer-free smokers, 16 were female and 46 were male. There were no significant differences in age, gender, or smoking status between the NSCLC patients and cancer-free smokers. The cases and controls were randomly assigned to two cohorts: a development cohort and a validation cohort. The development cohort consisted of 66 lung cancer patients and 33 cancer-free smokers, while the validation cohort comprised 40 lung cancer patients and 29 cancer-free smokers.
The demographic and clinical variables of the two cohorts are shown in Table 3.

ddPCR
RNA was extracted from plasma using Trizol LS reagent (Invitrogen, Carlsbad, CA) and the RNeasy Mini Kit (Qiagen, Hilden, Germany) [11,12]. RNA quality and quantity were assessed using a Biospectrometer (Hutchinson Technology Inc, Hutchinson, MN) and an Electrophoresis Bioanalyzer (Agilent Technologies, Foster City, CA). Reverse transcription (RT) was carried out to generate cDNA using an RT Kit (Applied Biosystems, Foster City, CA) [11,12]. ddPCR analysis of gene expression levels was performed as described in our published work using a QX200™ Droplet Digital™ PCR System (Bio-Rad, Hercules, CA) [11-15]. Briefly, the PCR reaction mix containing cDNA was partitioned into aqueous droplets in oil via the QX100 Droplet Generator and then transferred to a 96-well PCR plate. A two-step thermocycling protocol (95°C × 10 min; 40 cycles of [94°C × 30 s, 60°C × 60 s]; 98°C × 10 min) was run on a Bio-Rad C1000 thermal cycler (Bio-Rad, Pleasanton, CA). The PCR plate was then transferred to the QX100 Droplet Reader for automatic reading of the samples in all wells. The copy number of each gene per μL of PCR reaction was determined directly. Primers and probes for the targeted genes are shown in Supplementary Table 1. We used QuantaSoft 1.7.4 analysis software (Bio-Rad) and Poisson statistics to compute droplet concentrations (copies/μL). Only genes that generated at least 10,000 droplets were considered robustly detectable by ddPCR in plasma and underwent further analysis [31]. All assays were performed in triplicate, and one no-template control and two interplate controls were included in each experiment.
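The Poisson step used to turn droplet counts into copies/μL can be written out explicitly: if k of n accepted droplets are positive, the mean number of target copies per droplet is λ = -ln(1 - k/n), and dividing λ by the droplet volume gives the concentration. The sketch below is an illustration, not QuantaSoft's implementation. The droplet volume (0.85 nL, a commonly used QX200 value) is an assumption that should be checked against the instrument and software version, and the function name is hypothetical. It also applies the 10,000-droplet acceptance rule described above.

```python
import math

# ASSUMPTION: droplet volume of 0.85 nL (0.00085 uL); verify for your system.
DROPLET_VOLUME_UL = 0.00085
MIN_ACCEPTED_DROPLETS = 10_000  # acceptance threshold stated in the text


def copies_per_ul(positive, total, droplet_volume_ul=DROPLET_VOLUME_UL):
    """Poisson-corrected target concentration in copies per uL of reaction."""
    if total < MIN_ACCEPTED_DROPLETS:
        raise ValueError(f"only {total} droplets; below acceptance threshold")
    if positive >= total:
        raise ValueError("all droplets positive; concentration not estimable")
    lam = -math.log(1 - positive / total)  # mean copies per droplet
    return lam / droplet_volume_ul
```

The Poisson correction matters because a positive droplet may contain more than one copy; simply counting positive droplets would underestimate concentration as occupancy rises.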

Statistical analysis
To estimate sample size, we set the AUC under the null hypothesis (H0) at 0.5 and under the alternative hypothesis (H1) at 0.75. To achieve high reproducibility with adequate precision, we required ≥28 subjects per group. With this sample size, we would have 85% power to detect an AUC of 0.75 at the 2% significance level. The sample sizes of both cohorts therefore provided adequate statistical power. Pearson's correlation analysis was applied to assess the relationships between gene expression and the demographic and clinical characteristics of the patients and control individuals. Receiver operating characteristic curves and AUCs were used to determine the accuracy, sensitivity, and specificity of each gene. We used the highest Youden's J index (sensitivity + specificity - 1) to set the corresponding cut-off value [19]. Logistic regression models with LASSO-constrained parameters were used to eliminate irrelevant genes, develop composite biomarker panels, and optimize a signature with the highest sensitivity and specificity. To compare the signature with our previously developed plasma biomarker panels, we compared their AUCs, sensitivities, and specificities as previously described [15].
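The text does not name the method behind the sample-size calculation. One common approach approximates the standard error of an AUC with the Hanley-McNeil formula and computes power for a two-sided test of H0: AUC = 0.5 against H1: AUC = 0.75. The sketch below implements that approach as an assumption, not as the authors' procedure; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def hanley_mcneil_se(auc, n_cases, n_controls):
    """Standard error of an AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)            # P(two random cases both outrank a control)
    q2 = 2 * auc * auc / (1 + auc)  # P(a case outranks two random controls)
    var = (auc * (1 - auc)
           + (n_cases - 1) * (q1 - auc * auc)
           + (n_controls - 1) * (q2 - auc * auc)) / (n_cases * n_controls)
    return math.sqrt(var)


def auc_power(auc0, auc1, n_cases, n_controls, alpha):
    """Approximate power of a two-sided test of H0: AUC = auc0 vs H1: AUC = auc1."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    se0 = hanley_mcneil_se(auc0, n_cases, n_controls)
    se1 = hanley_mcneil_se(auc1, n_cases, n_controls)
    return norm.cdf((abs(auc1 - auc0) - z_alpha * se0) / se1)
```

With 28 cases and 28 controls, `auc_power(0.5, 0.75, 28, 28, 0.02)` comes out at about 0.85, consistent with the stated 85% power at the 2% significance level if a two-sided test is assumed.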

CONCLUSIONS
Given the heterogeneous nature of NSCLC developed from multifactorial molecular aberrations, we have for the first time demonstrated that the integration of miRNA, lncRNA, and FUT biomarkers could provide an efficient approach for diagnosis of lung cancer. Nonetheless, a large multi-center clinical project to prospectively validate the full utility of the integromic signature is required.

Author contributions
QL, YL, MZ, and FJ conducted the experiments and participated in study design, coordination, data interpretation, and manuscript preparation. All authors read and approved the final manuscript.

CONFLICTS OF INTEREST
The authors declare no conflicts of interest.

Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.