A plasma miRNA signature for lung cancer early detection

The early detection of lung cancer continues to be a major clinical challenge. Using whole-transcriptome next-generation sequencing to analyze lung tumors and matched noncancerous tissues, we previously identified 54 lung cancer-associated microRNAs (miRNAs). The objective of this study was to investigate whether these miRNAs could be used as plasma biomarkers for lung cancer. To develop biomarkers, we measured expression of the lung tumor-associated miRNAs by reverse transcription PCR in plasma of a development cohort of 180 subjects, comprising 92 lung cancer patients and 88 cancer-free smokers. We validated the biomarkers in a validation cohort of 64 individuals comprising 34 lung cancer patients and 30 cancer-free smokers. Of the 54 miRNAs, 30 displayed a significantly different expression level in plasma of the lung cancer patients vs. cancer-free controls (all p < 0.05). A plasma miRNA signature (miRs-126, 145, 210, and 205-5p) with the best prediction was developed, producing 91.5% sensitivity and 96.2% specificity for lung cancer detection. Diagnostic performance of the plasma miRNA signature had no association with the stage and histological type of the lung tumor, or with patients’ age, sex, and ethnicity (all p > 0.05). The performance of the plasma miRNA signature was reproducibly confirmed in the validation cohort. The plasma miRNA signature may provide a blood-based assay for diagnosing lung cancer at an early stage, and thereby reduce the associated mortality and cost.


Oncotarget, 2017, Vol. 8, (No. 67), pp: 111902-111911. www.impactjournals.com/oncotarget/
Research Paper

INTRODUCTION
About 155,870 Americans will die from lung cancer in 2017, more than from the other 3 leading cancers combined (breast, prostate, and colorectal cancers). Worldwide, 1.37 million deaths are attributed to lung cancer annually. Over 85% of lung cancers are non-small cell lung cancers (NSCLC). NSCLC mainly consists of adenocarcinoma (AC) and squamous cell carcinoma (SCC). Tobacco smoking is the major cause of NSCLC. The disease is usually diagnosed at advanced stages when the prognosis is poor, resulting in an overall 5-year survival rate of approximately 14% [1]. However, the 5-year survival rate in patients with stage I NSCLC that has been surgically resected can be as high as 83%. Therefore, finding NSCLC earlier may reduce mortality [1]. In a large randomized trial, early detection of lung cancer using low-dose CT (LDCT) produced a 20% reduction in mortality as compared to chest X-rays [2]. However, there are some limitations in using LDCT for lung cancer early detection and screening, including over-diagnosis, excessive cost, and the harm associated with radiation exposure [3,4]. Furthermore, most countries outside the US cannot afford to institute widespread CT screening, and many medical centers in the US do not yet follow guidelines. In addition, European studies show less significant results, and thus LDCT scanning is not recommended for lung cancer screening [5]. Therefore, the development of noninvasive or circulating biomarkers that can accurately and cost-effectively diagnose early stage lung cancer is required [6].
MicroRNAs (miRNAs) are a class of small noncoding RNAs (~22 nt) that can regulate gene expression [7]. Dysregulation of some miRNAs has been linked to oncogenesis and tumor metastasis [8-11]. Importantly, plasma miRNAs directly released from primary tumors or circulating cancer cells might provide biomarkers for malignancies [11]. We were among the first to show that miRNAs are highly stable in peripheral plasma, due to their small size and relative resistance to nucleases [10-14]. Using a microarray platform, we further identified three plasma miRNAs (miRs-21, 210, and 486-5p), which used in combination had 75% sensitivity and 85% specificity for diagnosis of stage I NSCLC [15]. To date, numerous plasma miRNAs have been identified that show the potential to distinguish lung cancer patients from non-cancer subjects [11,16-21]. However, none of them has been accepted in clinical settings for lung cancer diagnosis, mainly due to low sensitivity and specificity.
Since next-generation deep sequencing (NGS) can analyze clinical specimens to detect novel genes with high throughput and a wide detectable expression range [22], we recently used whole-transcriptome NGS to define a miRNA profile of primary lung tumor tissues [23]. We successfully identified 54 lung cancer-related miRNAs (Supplementary Table 1) [23], which included not only the previously published lung cancer-related miRNAs [21,24-28], but also miRNAs that had not been identified as associated with lung cancer. The lung tumor-associated miRNAs defined by NGS may provide a comprehensive list of biomarker candidates for developing high-quality circulating biomarkers of lung cancer. The objective of this study was to investigate whether the miRNAs defined by NGS could be used as plasma biomarkers for lung cancer with high sensitivity and specificity.

Developing miRNA biomarkers for lung cancer detection
The 54 miRNAs had a Ct value of <35 in plasma of all the subjects, implying that the miRNAs could be reliably measured in peripheral plasma samples. Among the 54 miRNAs, 30 (55.6%) displayed a significantly different plasma expression level in lung cancer patients vs. cancer-free smokers (all p < 0.05) (Table 1). The individual miRNAs exhibited AUC values of 0.52-0.80 in distinguishing lung cancer patients from cancer-free controls in the development cohort (Table 1). We used logistic regression models with constrained parameters as in LASSO based on ROC criterion to identify and optimize a panel of biomarkers. The four miRNAs (miRs-126, 145, 210, and 205-5p) were selected as the best biomarkers (all p < 0.001) and incorporated into a logistic model: probability of a lung cancer patient = $e^{x} / (1 + e^{x})$, where $x$ is the linear predictor of the model, that is, an intercept plus a weighted sum of the expression levels of the four miRNAs. The logistic model produced an AUC of 0.96 for lung cancer detection (Figure 1). Furthermore, Pearson correlation among expression levels of the four plasma miRNAs was low (p > 0.05), implying that their diagnostic values were complementary to each other. Using Youden's index, we set the optimal cutoff at 0.76. An individual whose probability calculated with the signature was ≥0.76 might be considered a lung cancer patient. Subsequently, combined use of the four miRNAs as a signature by simply calculating the equation produced 91.5% sensitivity and 96.2% specificity. In addition, the plasma miRNA signature had significantly higher AUC (0.96 vs. 0.85) (Figure 1), sensitivity (91.5% vs. 76.0%), and specificity (96.2% vs. 85.3%) than did our previously developed three-plasma miRNA panel (all p < 0.05) (Table 2). Moreover, including other miRNAs in the signature did not improve the accuracy for lung cancer diagnosis. The expression level of miR-205-5p was associated with SCC (p < 0.05). The expression levels of miRs-210 and 205-5p were related to smoking pack-years (all p < 0.05). However, the four-plasma miRNA signature could diagnose lung cancer independently of the histological type and stage of the NSCLC and of the age, gender, and ethnicity of subjects (all p > 0.05), but not of their smoking pack-years (p < 0.05).
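As a minimal sketch of how such a signature could be applied to one subject, the Python snippet below evaluates the logistic form $e^{x} / (1 + e^{x})$ and compares the result with the 0.76 cutoff. Only the logistic form and the cutoff come from the text. The intercept, the weights, and the function names are hypothetical placeholders chosen for illustration and are not the fitted values of the model.

```python
import math

# Cutoff reported in the text (selected with Youden's index).
CUTOFF = 0.76

# HYPOTHETICAL intercept and weights, for illustration only.
# They are NOT the fitted coefficients of the published model.
INTERCEPT = -1.0
WEIGHTS = {"miR-126": -0.8, "miR-145": -0.6, "miR-210": 0.9, "miR-205-5p": 0.7}


def lung_cancer_probability(expression):
    """Return e^x / (1 + e^x), where x is the intercept plus the weighted
    sum of the four miRNA expression levels given in `expression`."""
    x = INTERCEPT + sum(WEIGHTS[name] * expression[name] for name in WEIGHTS)
    # 1 / (1 + e^-x) is algebraically identical to e^x / (1 + e^x).
    return 1.0 / (1.0 + math.exp(-x))


def is_positive(expression, cutoff=CUTOFF):
    """Call a subject positive when the probability reaches the cutoff."""
    return lung_cancer_probability(expression) >= cutoff


# Made-up expression values for one subject.
subject = {"miR-126": 0.0, "miR-145": 0.0, "miR-210": 5.0, "miR-205-5p": 0.0}
print(is_positive(subject))  # True (x = 3.5 with the placeholder weights)
```

With real coefficients, the same two functions would be all that is needed to score a new plasma sample, which is what makes a fixed equation plus a fixed cutoff convenient in practice.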

Validating the plasma miRNA signature in a different cohort of cases and controls
The four miRNAs were successfully assessed in the validation cohort, and displayed a different plasma expression level between lung cancer patients and cancer-free individuals (all p < 0.05). We used the optimal cutoff established in the development cohort above to determine the diagnostic performance of the four-plasma miRNA signature. The plasma miRNA signature produced similar sensitivity (91.2%) and specificity (96.7%) in

DISCUSSION
So far, numerous circulating miRNA profiles in plasma and serum have been identified as noninvasive biomarkers for lung cancer [11,16-21,29-32]. However, there is little overlap between the identified miRNA biomarkers, for several reasons: (1) the biomarkers were developed from a limited number of miRNA biomarker candidates; (2) the lack of correlation of miRNA expression levels between serum and plasma may lead to differences in the biomarkers between the two liquid components of blood [33]; and (3) variation in pre-analytical factors, such as sample preparation procedures and different normalization strategies, makes comparison between studies difficult.
To address these challenges, we systematically and comprehensively characterized a miRNA profile of NSCLC in surgically resected lung tumor tissues by using whole-transcriptome NGS [23]. We identified 54 miRNAs with a fold change (FC) ≥2.0 in NSCLC tissues vs. noncancerous tissues [23]. Interestingly, the identified miRNAs of lung cancer comprised not only the previously published lung cancer-related miRNAs, including the ones discovered by The Cancer Genome Atlas [11,16-21,24-32], but also miRNAs whose changes had not been found in lung tumors. These miRNAs may provide potential high-quality biomarkers for lung cancer diagnosis. Furthermore, given that RNA released during the coagulation process may alter the composition of circulating miRNAs in serum, plasma may be the preferred sample choice [33]. Therefore, the primary emphasis of this study was to evaluate the comprehensive set of 54 miRNA biomarker candidates in plasma to develop more accurate and robust circulating biomarkers for lung cancer early detection. To reduce the variation in pre-analytical factors in terms of bias linked to sampling methods, storage, and purification, we collected blood and prepared plasma using the well-established SOPs developed by the NCI-EDRN [34,35]. We used lysis solution to maximally reduce the possible contamination from RBCs in plasma [13,44-49]. To further diminish the bias linked to plasma quality, we tested the samples for hemolysis by measuring the free hemoglobin content using spectrophotometry [36]. Samples with absorbance peaks at 414 nm were considered positive for free hemoglobin and excluded from analysis. Plasma samples that were positive for blood cell-related miRNAs were also excluded from the study. To further ensure reliable results, we avoided repeated cycles of thawing and re-freezing plasma and RNA samples to minimize degradation. 
Only RNA samples that had a 260/280 ratio of 1.8-2.0 and an RNA integrity number of ≥7 were analyzed. In addition, we used a well-established and validated qPCR assay [37] to quantify expression of the plasma miRNAs. As a result, our newly developed four-plasma miRNA signature achieved higher sensitivity and specificity for lung cancer detection than did the three-plasma miRNA panel [38]. Furthermore, the diagnostic performance of the biomarkers was blindly validated in a different cohort, suggesting that the plasma miRNA signature might be a robust assay for lung cancer diagnosis. In addition, the four miRNAs (miRs-126, 145, 210, and 205-5p) of the biomarker signature were not expressed in any type of blood cells [39]. Moreover, the performance of this plasma miRNA signature for lung cancer diagnosis was independent of tumor stage and histology. This finding in plasma might be an important characteristic if the signature is employed for more precisely and easily identifying early stage lung cancer.
Nadal et al. identified a four-miRNA signature (miRs-193b, 301, 141 and 200b) in serum for lung cancer detection with a sensitivity of 96% and a specificity of 95% [16]. In our present study, all four of these miRNAs also showed a different expression level in plasma of lung cancer patients vs. cancer-free smokers. However, our logistic regression models with constrained parameters as in LASSO analysis did not include the four serum miRNAs in our plasma miRNA signature. The differences in miRNAs between our plasma signature and Nadal's serum signature could partially be due to the lack of correlation of miRNA expression levels in the two different body fluids. Since RNA released during the coagulation process may alter the composition of circulating miRNAs in serum rather than plasma samples, plasma may be the preferred sample choice for the development of circulating biomarkers [33]. Furthermore, our analysis of the plasma signature by simply calculating the equation with the established cut-off would be a convenient tool in the clinic. Nonetheless, a new study directly comparing the miRNA signatures of lung cancer patients in plasma and serum samples is needed.
Dysregulation of the four miRNAs has been shown to be associated with lung tumorigenesis. For example, change of circulating miR-126 could act as a significant biomarker in the prognosis of various cancers, including NSCLC [40]. miR-145 is dysregulated in lung cancer cells [41]. miR-145 inhibits lung cancer cell migration and invasion by targeting PDK1 via the mTOR signaling pathway [42]. miR-210 can regulate the hypoxic response of tumor cells [43-45]. We have reported that miR-210 overexpression in plasma is associated with lung cancer [46-48]. Elevated miR-205-5p expression participates in the development and progression of lung SCC [23,49]. We have shown that miR-205-5p is one of three miRNAs that could be used as sputum biomarkers for the early detection of lung cancer [50,51].
There are some limitations in the present study. (1) The plasma samples were obtained from hospital-based patients with a clinical diagnosis. The participants might not be representative of high-risk populations (e.g., heavy smokers) in a screening setting for lung cancer. We will perform a prospective, multisite lung cancer screening trial to validate the diagnostic value of the plasma miRNA signature. (2) The National Lung Screening Trial (NLST) indicated that the early diagnosis of lung cancer by using LDCT could considerably reduce mortality [4]. However, LDCT has a low specificity for the early detection of lung cancer, presenting a major clinical challenge [52]. We are evaluating whether the plasma miRNA signature could improve the specificity of LDCT for the early detection of NSCLC by specifically distinguishing malignant from benign pulmonary growths. (3) Exploration of biomarkers in blood exosomal miRNA profiles has recently become a hot research topic [53]. Yet critical issues, including methods to specifically and cost-effectively isolate exosomal miRNAs, need to be well addressed before exosomal miRNAs can be employed as noninvasive biomarkers [54]. We will perform a separate study to compare the cell-free plasma and exosomal miRNA biomarkers to determine which would be better or whether their combined use has synergistic efficiency for lung cancer detection. (4) Based on patient and pulmonary nodule characteristics on CT images, several clinical predictive models have been developed to discriminate lung cancer from benign growths [55-58]. However, they have only moderate diagnostic performance. Furthermore, the assessments of cell-free circulating tumor DNA (ctDNA) or DNA methylation status of gene promoters have attracted increasing attention as potential liquid biopsy tests for lung cancer [59-61]. Our ongoing efforts are to compare the diagnostic efficiency of the plasma miRNA signature with those of the cell-free DNA biomarkers and clinical prediction models in the early detection of lung cancer.
In summary, a four-plasma miRNA signature that could accurately differentiate early stage NSCLC patients from cancer-free smokers was successfully developed and validated. Nevertheless, a prospective study to further validate this plasma miRNA signature is required.

Patients and clinical specimens
This study and the related protocols were approved by the Institutional Review Boards (IRB) of University of Maryland Baltimore and Veterans Affairs Maryland Health Care System. Using the inclusion and exclusion criteria recommended by the U.S. Preventive Services Task Force for lung cancer screening in heavy smokers [62], we recruited lung cancer patients and cancer-free smokers. Briefly, we enrolled heavy smokers between the ages of 55 and 80 who had at least a 30 pack-year smoking history and were former smokers (quit within 15 years). Exclusion criteria included pregnancy, current pulmonary infection, surgery within 6 months, radiotherapy within 1 year, and life expectancy of < 1 year. We collected blood in BD Vacutainer spray-coated K2EDTA Tubes (BD, Franklin Lakes, NJ) and prepared plasma using the standard operating protocols (SOPs) developed by the NCI-Early Detection Research Network (EDRN) [34,35]. The blood samples from cancer patients were collected at the time of initial consultation, prior to definitive surgical management and/or adjuvant therapy. The specimens were processed within 2 hours of collection by centrifugation at 1,300 × g for 10 minutes at 4°C. We used red blood cell (RBC) lysis buffer to maximally reduce the possible contamination from RBCs in plasma [13,44-49], which was immediately transferred to a fresh tube and stored at −80°C until use. A total of 126 NSCLC patients and 118 cancer-free smokers were recruited. Among the cancer patients, 44 were African American and 82 were Caucasian, and 39 were female and 87 were male. Forty had stage I NSCLC, 43 had stage II, and 43 had stage III-IV. Of the cancer-free smokers, 41 were African American and 77 were Caucasian, and 36 were female and 82 were male. There were no significant differences in age, race, gender, and smoking status between the NSCLC patients and cancer-free smokers. 
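The enrollment rules described above can be sketched as a simple screening function. This is only an illustration in Python: the `Candidate` record, its field names, and the handling of boundary values (for example, treating surgery exactly 6 months ago as "within 6 months") are assumptions made here, not part of the study protocol.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record layout; field names are illustrative only.
    age: int
    pack_years: float
    years_since_quit: float
    pregnant: bool = False
    pulmonary_infection: bool = False
    months_since_surgery: float = float("inf")      # inf = no surgery
    years_since_radiotherapy: float = float("inf")  # inf = no radiotherapy
    life_expectancy_years: float = float("inf")


def is_eligible(c: Candidate) -> bool:
    """Apply the inclusion and exclusion criteria listed in the text."""
    included = (
        55 <= c.age <= 80
        and c.pack_years >= 30
        and c.years_since_quit <= 15
    )
    excluded = (
        c.pregnant
        or c.pulmonary_infection
        or c.months_since_surgery <= 6      # boundary handling is assumed
        or c.years_since_radiotherapy <= 1  # boundary handling is assumed
        or c.life_expectancy_years < 1
    )
    return included and not excluded


print(is_eligible(Candidate(age=62, pack_years=40, years_since_quit=3)))  # True
print(is_eligible(Candidate(age=50, pack_years=40, years_since_quit=3)))  # False
```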
In this study, the cases and controls were randomly grouped into two cohorts: a development cohort and a validation cohort. The development cohort consisted of 92 lung cancer patients and 88 cancer-free smokers, while the validation cohort comprised 34 lung cancer patients and 30 cancer-free smokers. The demographic and clinical variables of the two sets are shown in Tables 3-4.
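A random assignment of this kind could be sketched as follows in Python. The text does not describe how the randomization was carried out, so the shuffling procedure, the fixed seed, and the placeholder subject identifiers are assumptions. Only the group sizes are taken from the text.

```python
import random


def split_cohort(subject_ids, n_development, seed=0):
    """Randomly assign subjects to a development set and a validation set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_development], shuffled[n_development:]


# Placeholder identifiers; the counts are the ones reported in the text.
cases = [f"case-{i}" for i in range(126)]
controls = [f"control-{i}" for i in range(118)]

dev_cases, val_cases = split_cohort(cases, 92)
dev_controls, val_controls = split_cohort(controls, 88)

print(len(dev_cases) + len(dev_controls))  # 180
print(len(val_cases) + len(val_controls))  # 64
```

Splitting cases and controls separately keeps the case-to-control ratio of each cohort fixed at the reported sizes, rather than leaving it to chance.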

RNA isolation
RNA was extracted from plasma by using a mirVana miRNA Isolation Kit (Ambion, Austin, TX) as described in our previous studies [12,36,38]. Purity and concentration of RNA were determined by using a dual beam UV spectrophotometer (Eppendorf AG, Hamburg, Germany). Integrity of RNA was determined by using a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA).
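The acceptance thresholds stated in the Discussion (a 260/280 ratio of 1.8-2.0 and an RNA integrity number of ≥7) can be expressed as a small filter. The Python sketch below is illustrative only: the function name and the sample values are made up and are not study data.

```python
def passes_rna_qc(a260_a280, rin):
    """True if a sample meets the purity (260/280 ratio of 1.8-2.0)
    and integrity (RNA integrity number >= 7) thresholds."""
    return 1.8 <= a260_a280 <= 2.0 and rin >= 7


# Illustrative (ratio, RIN) pairs, not measurements from the study.
samples = {
    "S1": (1.92, 8.1),
    "S2": (1.65, 8.4),  # fails purity
    "S3": (1.88, 6.2),  # fails integrity
}

kept = [name for name, (ratio, rin) in samples.items() if passes_rna_qc(ratio, rin)]
print(kept)  # ['S1']
```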

Quantitative reverse transcriptase PCR (qRT-PCR)
RT was carried out by using a RT Kit (Applied Biosystems, Foster City, CA) as described in our published works [12,36,38]. Briefly, RNA was reverse transcribed using the Applied Biosystems 9700 Thermocycler (Applied Biosystems) according to the manufacturer's instructions. The reaction comprised 50 nM stem-loop RT primer, 1x RT buffer, 0.25 mM each of dNTPs, and 3.33 U/μl MultiScribe reverse transcriptase in a total volume of 15 μl. Real-time PCR was performed to measure expression of target miRNAs by using a PCR kit (Applied Biosystems) on a Bio-Rad iQ5 Multicolor Real-Time PCR Detection System (Bio-Rad, Hercules, CA). The 20 μl PCR reaction included RT product, 1x TaqMan® Universal PCR Master Mix (Applied Biosystems), and the corresponding primers and TaqMan probe for the target genes. The reactions were incubated in a 96-well plate at 95°C for 15 min, followed by 40 cycles of 95°C for 15 s and 60°C for 1 min. Expression levels of the 54 miRNAs in plasma were determined using the comparative cycle threshold (Ct) method with the equation $2^{-\Delta\Delta Ct}$ as previously described [12,23,28,38,50]. Ct values of the target miRNAs were normalized in relation to that of U6 [63]. For comparison, our previously identified panel of three plasma miRNA biomarkers (miRs-21, 210, and 486-5p) for lung cancer diagnosis [38] was also tested in the specimens by using the same protocol.
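The comparative Ct calculation can be illustrated with a short Python sketch. It assumes the standard form of the method: ΔCt is the target Ct minus the U6 Ct, ΔΔCt is the sample ΔCt minus a calibrator ΔCt, and relative expression is 2 raised to the power of −ΔΔCt. The choice of calibrator (for example, the mean ΔCt of the control group) is not specified in the text and is an assumption here, as are the function names and the example Ct values.

```python
def delta_ct(ct_target, ct_reference):
    """Normalize a target miRNA Ct to the reference (U6) Ct."""
    return ct_target - ct_reference


def relative_expression(ct_target, ct_u6, calibrator_delta_ct):
    """Relative expression by the comparative Ct method, 2^(-ddCt).

    `calibrator_delta_ct` is the delta-Ct of the comparison sample or group;
    which calibrator to use is an assumption of this sketch.
    """
    ddct = delta_ct(ct_target, ct_u6) - calibrator_delta_ct
    return 2.0 ** (-ddct)


# Made-up Ct values: target 28, U6 22, so dCt = 6; calibrator dCt = 8.
# ddCt = 6 - 8 = -2, so the target is 2^2 = 4 times the calibrator level.
print(relative_expression(28.0, 22.0, 8.0))  # 4.0
```

A lower Ct means more template, which is why a negative ΔΔCt corresponds to higher expression in the sample than in the calibrator.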

Statistical analysis
To determine sample size, we used the area under receiver operating characteristic (ROC) curve (AUC) analysis and set the null hypothesis (H0) at 0.5. Accordingly, at least 28 subjects were required in each category of cases and controls to show a minimum difference of interest between an AUC of 0.75 versus an AUC of 0.5 with 80% power at the 5% significance level. Therefore, the development and validation cohorts of subjects in the present study would provide sufficient statistical power for identification and verification of the biomarkers. Pearson's correlation analysis was applied to assess the relationship between plasma miRNA expression and demographic and clinical characteristics of the patients and control individuals. The ROC curve and AUC analyses were used to determine the sensitivity, specificity, and corresponding cut-off value of each miRNA [64]. To determine sensitivity and specificity, clinicopathologic results were used as the gold standard. We utilized logistic regression models with constrained parameters as in least absolute shrinkage and selection operator (LASSO) based on ROC criterion to eliminate the large number of irrelevant genes, develop composite panels of biomarkers, and optimize a diagnostic signature with the highest sensitivity and specificity. To compare the new signature and our previously developed three-plasma miRNA panel [38], we compared their AUCs to determine the sensitivity and specificity as previously described [56-58].
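The ROC-based quantities used here can be sketched in plain Python. The snippet computes the AUC as the fraction of case-control pairs in which the case scores higher (ties count one half), sensitivity and specificity at a given cutoff, and the cutoff that maximizes Youden's index J = sensitivity + specificity − 1. It is a sketch under stated assumptions rather than the software used in the study: the function names and the toy scores are made up, and a subject is called positive when the score is greater than or equal to the cutoff.

```python
def auc(case_scores, control_scores):
    """AUC as the probability that a case scores higher than a control
    (ties count one half); equivalent to the Mann-Whitney U statistic
    divided by the number of case-control pairs."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    true_pos = sum(s >= cutoff for s in case_scores)
    true_neg = sum(s < cutoff for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)


def youden_cutoff(case_scores, control_scores):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1,
    searching over all observed scores."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(case_scores) | set(control_scores)):
        sens, spec = sensitivity_specificity(case_scores, control_scores, cutoff)
        j = sens + spec - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy predicted probabilities, not study data.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.5, 0.2, 0.1]

print(auc(cases, controls))            # 0.875
print(youden_cutoff(cases, controls))  # (0.7, 0.75)
```

In the toy data, 14 of the 16 case-control pairs are ordered correctly, giving an AUC of 0.875, and a cutoff of 0.7 yields 75% sensitivity with 100% specificity.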