Non-small cell lung cancer associated microRNA expression signature: integrated bioinformatics analysis, validation and clinical significance

Recently, increasing studies of miRNA expression profiling has confirmed that miRNA plays an essential role in non-small cell lung cancer (NSCLC). However, inconsistent or discrepant results exist in these researches. In present study, we performed an integrative analysis of 32 miRNA profiling studies compared the differentially expressed miRNA between NSCLC tissue and non-cancerous lung tissue to identify candidate miRNAs associated with NSCLC. 7 upregulated and 10 downregulated miRNAs were identified as miRNA integrated-signature using Robust Rank Aggregation (RRA) method. qRT-PCR demonstrated that miR-21-5p, miR-210, miR-205-5p, miR-182-5p, miR-31-5p, miR-183-5p and miR-96-5p were up-regulated, whereas miR-126-3p, miR-30a-5p, miR-451a, miR-143-3p and miR-30d-5p were down-regulated more than 2 folds in the NSCLC, which was further validated in Tumor Cancer Genome Atlas (TCGA) database. Receiver operating characteristic (ROC) curve analysis confirmed that 9 miRNAs had good predictive performance (AUC > 0.9). Cox regression analysis revealed that miR-21-5p (hazard ratio [HR]: 1.616, 95% CI: 1.114–2.342, p = 0.011) and miR-30d-5p (HR: 0.578, 95% CI: 0.400–0.835, p = 0.003) were independent prognostic factors in NSCLC for overall survival. The accumulative effects of the two miRNAs on the prognosis of NSCLC were further estimated. The results showed that patients with two positive markers had a worse prognosis than those with one or none positive marker. In conclusion, this study contributes to the comprehension of the role of miRNAs in NSCLC and provides a basis for further clinical application.


INTRODUCTION
Lung cancer, mainly consisted of non-small cell lung cancer (NSCLC), is the most frequently diagnosed carcinoma and the leading cause of cancer-related death worldwide [1].Despite it has a little improvement, the prognosis of lung cancer is still poor, with less than 18% of patients surviving more than 5 years, partly due to the fact that majority of cases are diagnosed at a late stage [2].The survival of these patients relies remarkably on diagnosis.Therefore, identification of new biomarkers for the early diagnosis of NSCLC is crucial for the patients to receive optimal therapeutic regimen As early as possible.
MiRNAs, a novel class of short non-coding RNA molecules, play essential roles via posttranscriptional regulation of gene expression in almost all biological processes, including cell proliferation, differentiation, apoptosis, etc.Many miRNA profiling studies reveal the correlation between miRNAs and human diseases, especially cancers.Large numbers of differentially dysregulated miRNAs involved in cancer have been identified by high-throughput technologies across various normal and cancer tissues [3][4][5].Thus, miRNAs have been regarded as potential biomarkers for diagnosis and precise predictions of prognosis for NSCLC, as well as targets for treatment [6,7].Although emerging evidences supported miRNAs as potential biomarkers of NSCLC, there were

Research Paper
inconsistent or discrepant results among these researches due to the existence of various drawbacks, such as limited sample size, application of different profiling platforms, diverse methods for data collection and analysis, and increasing discovery of new miRNAs [8][9][10].
To overcome these limitations, we need an integrated and unbiased manner to analyze the results and obtain more significant miRNAs.Robust rank aggregation (RRA) approach [11], used for comparison of various ranked gene lists, is an accurate and effective method for identification of differentially expressed miRNA integrated-signature [12], in which a P-value would be assigned for all miRNAs in the ranked lists to re-rank these miRNAs and decide their significance.
Thus, we performed the integrated analysis to identify the key miRNA signatures as well as their molecular pathway in NSCLC patients, and then we validated the expression of these miRNA signatures using qRT-PCR and TCGA datasets.Obtained results might contribute to the diagnostics, therapeutics and prognosis for NSCLC patients.

Selection and characteristics of the datasets
According to the inclusion criteria, 32 independent full-text studies were retrieved from public databases (Pubmed, GEO, and ArrayExpress), All these studies were published between 2006 and 2015.MiRNA sequencing technique was applied in 4 studies [8,[13][14][15] to identify the differentially expressed profiles, and the others used miRNA microarray chip technology.A simple summary of the studies was described in Table 1.In total, 1321 tumor and 930 non-cancerous samples were extracted in our study.The number of cases included ranged from 3 to 187 (median 20) over the studies.Diverse profiling platforms were applied and the median number of miRNAs detected was 688 (ranging from 235 to 2006) in these studies.Distribution of differentially dysregulated miRNA was shown in Figure 1.A total of 314 significantly upexpressed and 373 significantly down-expressed miRNAs were extracted from all studies.Moreover, 80 miRNAs with inconsistent alteration was found, suggesting that they were both up-regulated and down-regulated in different studies.The counts of remarkably alterative miRNAs varied extremely over these studies, but at least one up-and two down-regulated miRNAs was reported in each study.
The findings indicated the high expression of miR-21-5p and the low expression of miR-30d-5p were all significant predictors of poor prognosis for OS in patients with NSCLC.To assess the integrated effects of the two miRNAs expression on the prognosis, we divided the 423 patients into three groups according to the number of positive markers from miR-21-5p high expression and miR-30d-5p low expression.Namely, group1 (none positive marker), group2 (one positive marker), group3 (two positive markers).Cox regression analysis and Kaplan-Meier survival analysis were executed and the differences between three groups were estimated.The results demonstrated that the patients with two positive markers (n = 122) had a worse OS than those with one positive marker (n = 180) (HR: 1.776, 95% CI 1.181-2.674,p = 0.006), than those with none positive marker (n = 121) (HR: 2.212, 95% CI 1.376-3.559,p = 0.001).However, the group2 failed to reveal a significantly statistical difference compared to group1 (p > 0.05) (Figure 6D).

Targets prediction and functional enrichment
The high-precision target prediction for integratedsignature miRNAs was performed.Target genes were acquired from both prediction algorithms (TargetScan v7.1, miRDB and DIANA) and experimentally supported targets from TarBase and starBase.The results showed that miR-96-5p, miR-30a-5p and miR-30d-5p had the largest number of consensus target genes, whereas miR-126-3p, miR-210 and miR-451a had the smallest number of consensus targets.The counts of predicted targets were presented in Supplementary Figure 1.To elucidate the biological function of the miRNAs, we implemented enrichment analyses using consensus targets.As the result, 71 Panther pathways, 126 KEGG pathways, and 829 GO processes were enriched by the miRNAs targets.The most significantly enriched panther pathways concentrated on EGF receptor signaling pathway, integrin signaling pathway, PDGF signaling pathway, Wnt signaling pathway, FGF signaling pathway, angiogenesis, Ras pathway, and TGF-beta signaling pathway, the  7).KEGG pathways enriched remarkably were mainly associated with characteristics of cancer, such as pathways in cancer, focal adhesion, MAPK signaling pathway, regulation of actin cytoskeleton, and neurotrophin signaling pathway.The GO processes that were markedly enriched mostly involved in regulation of transcription.Ten GO processes, KEGG pathways and Panther pathways most strongly enriched by consensus targets were shown in Table 3.

DISCUSSION
MiRNAs have been identified as the potential biomarkers for diagnosis and clinical prognosis of many cancerous diseases.Although increasing miRNA profiling studies were performed to find the differentially dysregulated miRNAs, these efforts often presented inconsistent results across the different studies.Traditional meta-analysis has been done previously to determine differentially expressed miRNAs in cancer through a classical vote-counting method [37,38].However, such rigorous approach was often defective due to the existence of heterogeneity between different miRNA profiling platforms and the unavailability of raw data.In current study, to overcome these defects, an integrated analysis using RRA method was performed for identification of differentially dysregulated miRNAs from 32 independent profiling experiments.The miRNAs were re-ranked and their significances were re-determined by assigning a P-value for each miRNA.Finally, 7 upregulated and 10 downregulated integrated signature miRNAs were identified.
The majority of integrated-signature miRNAs were known to be functionally crucial in NSCLC.MiR-21 is upregulated and associated with poor prognosis in NSCLC [19,31].MiR-21 could promote tumorigenesis of NSCLC by potentiating indirectly the Ras/MEK/ERK pathway and inhibiting cell apoptosis [39].Besides, it is shown to promote the growth and invasion of NSCLC through directly targeting PTEN [40].Overexpression of miR-210 in NSCLC can decrease mitochondrial components related to cell survival and the activation of HIF-1 activity [26].MiR-182-5p and miR-183-5p were found to be up-regulated in NSCLC tissue compared with adjacent normal lung tissue [17], and the miR-183~96~182 cluster could inhibit the invasion and metastasis of lung cancer through directly suppressing the expression of Foxf2 [41].
MiR-205 was identified as a potential accurate marker to distinguish SCC from other classifications of NSCLC [42].It has also been shown to promote the growth, metastasis and chemoresistance of NSCLC cells by inhibiting the expression of PTEN [43].MiR-31, one of oncogenic miRNAs, promotes lung cancer cell growth and tumorigenesis by suppressing specific tumor suppressors [44].MiR-126-3p and miR-126-5p, two forms of miR-126, were associated with angiogenesis and cell proliferation in cancer [45], which were also down-regulation in present study.MiR-126 suppressed the expression of vascular endothelial growth factor A and potentiated the sensitivity of NSCLC cells to chemotherapy drugs [46].MiR-30a-5p and miR-30d-5p had highest number of consensus   to cisplatin [48].Furthermore, miR-451a could regulate survival of NSCLC cells through the inhibition of rasrelated protein 14 (RAB14) and might to be a promising therapeutic target for NSCLC patients [49].MiR-143 inhibited the genesis and development of NSCLC through directly suppressing the expression of its target genes [50].MiR-145 is down-regulation in NSCLC and the restoration of miR-145 expression remarkably suppresses the growth of lung adenocarcinoma cells with EGFR mutation [17].MiR-139-5p is associated with cancer cell proliferation, metastasis, and promotes apoptosis via the inhibition of c-Met [51].MiR-338-3p inhibited the invasion and migration of NSCLC cells through the suppression of EMT-related transcription factor Sox4 and might serve as a potential therapeutic target for NSCLC [52].Interestingly, many extensively investigated lung cancer-related miRNAs such as let-7, miR-34, miR-155 or miR-223 were not part of NSCLC integrated-miRNA signature.For example, miR-223 has been confirmed to play an essential role in development of NSCLC, but failed to reach a statistical significance in present study.
The expression of 17 integrated-signature miRNAs was validated using qRT-PCR and TCGA database.To assess the diagnostic value of these validated miRNAs in NSCLC tissue classification, we performed ROC curve analysis.The result showed that each miRNA had a good diagnostic performance, with the exception of miR-205-5p.Therefore, these miRNAs may serve as prospective biomarkers for the diagnosis of NSCLC.Further studies could be executed to evaluate the optimal diagnostic threshold value of these miRNAs in NSCLC through the use of more clinical data.In present study, we also estimated the performances of validated miRNAs to identify metastasis of NSCLC.The results demonstrated that miR-21-5p was associated with distant metastasis.Previous study has shown that inhibition of miR-21 expression was able to repress proliferation, invasion, and migration of A549 cells [53].Thus, miR-21 may be useful as an effective indicator related to metastasis.
In addition, the Cox regression analysis showed that high expression of miR-21-5p and low expression of miR-30d-5p were independent hazard factors for shorter OS in NSCLC.Consistent with our findings, high expression of miR-21-5p as a perfect prognostic biomarker has been confirmed to be associated with worse OS both in NSCLC tissues and plasma [19,31], respectively.Moreover, we reported that low expression of miR-30d-5p was associated with poor OS in NSCLC in the first time.Conversely, previous study showed that high expression   of miR-30d in serum of NSCLC patients was associated with unfavorable survival [54].A possible reason for the inconsistency was the use of different samples in both two studies.miR-30d-5p was down-regulation in NSCLC tissues in our integrated analysis (13 studies), but up-regulation in NSCLC serum samples compared to that in normal volunteers [55].Further studies might be performed to elucidate the specific mechanisms.
The accumulative effect of the two miRNAs expression on the prognosis of NSCLC was further investigated.The result demonstrated that patients with 2 positive markers had a significantly worse prognosis (OS) compared with 1 or 0 positive marker.Therefore, the combination of the two markers may be better prognostic factor for shorter OS in NSCLC.Previous studies revealed that miR-143-3p was down-regulated in NSCLC tissue and suppressed metastasis of NSCLC cells.However, the Cox regression analysis and K-M survival analysis showed that high expression of miR-143-3p was positive correlation to worse RFS in NSCLC, consistent with the study in bladder cancer [56].The possible reason was that Cox regression analysis only considered single factor of miRNA expression, and other factors (like age, TNM stages, treatment strategies, etc.) associated with survival were not included owing to the incomplete data in TCGA database.Multi-factor regression analysis based on more complete clinical data might be performed to illustrate the result in further studies.
The enrichment analysis of these miRNAs target genes suggested that the validated miRNAs were crucial regulator in the oncogenic process, which involved in several signaling pathways related to tumor genesis and progression.Therefore, these miRNAs might be also significant candidate biomarkers for identifying or monitoring relapse during postoperative follow-up in NSCLC.However, there are some limitations in our study.First, we did not perform validation experiments for biological function of these miRNAs and their targets.Functional analysis for the mechanism of these candidate miRNAs and potential targets in the progression of NSCLC should be researched in our future work.Second, considering miRNAs of carcinoma tissues are difficult to be detected, the serum miRNA level as biomarker for clinical diagnosis is more ideal.In addition, we believe that these miRNAs might be potential biomarkers for the early diagnosis of NSCLC patients and future studies should be performed in this issue using large, prospective and multi-center cohort.
In conclusion, 17 integrated-signature miRNAs were identified from 32 miRNA expression profiling studies.The expression of these miRNAs was validated using qRT-PCR test and TCGA database.The analyses of clinical significance showed that the majority of validated miRNAs were prospective biomarkers for diagnosis and predictions of prognosis for NSCLC.To gain sufficient sensitivity and specificity, the optimal diagnostic threshold value of the miRNAs should be determined in further clinical studies.In addition, targets prediction and functional enrichment analysis would contribute to the comprehension of the role of miRNAs in NSCLC.Further studies could focus on the underlying mechanisms of these miRNAs in genesis and progression of NSCLC and their clinical application.

Studies inclusion and exclusion
MiRNA expression profiling studies for lung cancer, published prior to December 31st, 2015, were searched via Gene Expression Omnibus (GEO), ArrayExpress and Pubmed database.The search strategy was performed according to the term: (miR-* OR miRNA OR microRNA) AND (lung AND (tumor OR cancer OR carcinoma)) AND (profil* OR expression).Every study was estimated thoroughly through Full text.The included studies met the following criteria: 1. miRNA expression profiling studies (miRNA microarray chip technology or miRNA sequencing technique); 2. Primary experimental studies published in English language; 3. Studies compared miRNA expression between lung cancer and noncancerous lung tissue in human.Expression studies using cell lines or body fluid (such as serum, saliva, peripheral blood, etc.) were excluded.Studies that only analyzed diverse histologic subtypes but did not include noncancerous tissue were also excluded.

Standardization of miRNA names
For these studies focused on only NSCLC (AD and/ or SCC), the lists of differentially expressed miRNAs were extracted from corresponding studies.If the miRNAs lists were not available in the publications, the authors would be contacted directly for Supplemental Data.The studies that included multiple tumor subtypes were re-analyzed by using GEO2R tool or a limma package in R language between only NSCLC samples and non-cancerous lung tissue, other tumor subtypes were excluded.Studies without publication or obtained from GEO were also re-analyzed.The differentially dysregulated miRNAs were obtained according to the criterion: fold change > ± 1.5, FDR < 0.05.All miRNA names were standardized according to miRBase version 21 [57].The nomenclature of miR/miR* were replaced with the -5p/-3p nomenclature, and many previous "major" miRNA names were re-designated according to miRBase database vesion21.Non-human miRNAs probes were eliminated from the studies.Pre-miRNAs were used in the analyses after the standardization, which were reported in some of the studies.

Datasets construction
The extracted miRNAs were ranked on the basis of statistical test fold changes.A hierarchical cluster analysis was carried out to estimate the correlativity between the results of these studies.The rank matrixes of up-regulated and down-regulated miRNA lists were constructed separately.Moreover, the rank of miRNA from upregulated and down-regulated miRNAs lists were both normalized, and a value was assigned to each miRNA.The value was one or 0.5 minus normalized rank of miRNA in upregulated or downregulated miRNA lists, separately.Value 0.5, above 0.5 or below 0.5 means that the miRNAs were not reported, upregulated or downregulated in corresponding study, separately.

Statistical analysis
We used a new RRA method performed as an R package RobustRankAggreg to identify integrated miRNAs [11].The leave one out cross-validation was executed in this method to estimate the stability of acquired p-values.The average p-value, representing the best p-value for each miRNAs, was acquired by repeating analysis 10,000 times.The integrated signature miRNAs selected in current study must reach statistical significance after Bonferroni correction and were reported by at least 1/3 studies.

Quantitative real-time PCR of the integratedsignature miRNAs
To validate the expression of integrated miRNAs, 12 pairs of fresh NSCLC and adjacent noncancerous lung tissue samples were obtained from the Qilu Hospital of Shandong University between January and June, 2015.Written informed consent was obtained from all patients or their guardians.The use of all patient samples was approved by the Ethical Committee of the Qilu Hospital of Shandong University.All patients didn't undergo any chemotherapy or radiation therapy prior to operation and had no other malignancies accompanied.Clinical information of these patients was described in the Supplementary Table 1.Total RNA was extracted from these samples using TRIzol reagent (Life Technologies), following the manufacturer's instructions.The synthesis of first-strand cDNA contained two procedures.Firstly, the poly (A) tailing was synthesized with a Poly (A) Polymerase (RiboBio).Secondly, cDNA was synthesized using miDETECT A TrackTM Uni-RT Primer (RiboBio), Reverse Transcriptase mix (RiboBio) and the product of poly (A) tailing.Real-time PCR was performed with SYBR Premix Ex Taq TM (TakaRa) on RT-PCR system (ABI Prism 7000) according to the following protocol: 95°C for 5 min, followed by 35 cycles of 95°C for 10 sec, 60°C for 35 sec and then 95°C for 15 sec, 60°C for 1 min and 95°C for 15 sec, with addition of a cycle for every 0.5°C.The primers of these miRNAs and U6 were obtained from RiboBio Corporation (Guangzhou, China).The information of the primers was listed in Supplementary Table 2.The fold change of each miRNA expression was calculated using the ΔΔCT method.

Validation of the integrated-signature miRNAs and clinical significance using TCGA database
To validated the expression of integrated-signature miRNAs, 39 pairs of lung adenocarcinoma and adjacent non-tumorous lung tissues were obtained from the TCGA data base (LUAD miRNA expression (IlluminaHiseq), n = 498).MiRNA expression information and corresponding clinical data for NSCLC were downloaded from the Cancer Browser (https://genome-cancer.ucsc.edu/).This study met the publication guidelines provided by TCGA.A limma package of R language was used to identify the differentially expressed miRNAs between lung adenocarcinoma and adjacent non-tumorous lung tissues.Corresponding heat map of miRNAs expression was constructed using pheatmap package of R language.Furthermore, the expression level of each miRNA was log2-transformed for further analysis.ROC curve was applied to estimate the predicted performances of the validated miRNAs for NSCLC tissue classification.The Cox regression analysis and Kaplan-Meier method were used to estimate the prognostic significance of the miRNAs for OS and RFS, and survival curves were compared through the log-rank test.The median value of miRNA expression was defined as cut-off value between high expression and low expression.The statistical analyses were carried out by using the SPSS 18.0.Statistical significance was defined as p < 0.05.

Enrichment analysis
To conduct enrichment analyses, the GeneCodis web tool (http://genecodis.dacya.ucm.es/)[63] was used for Panther and KEGG pathways and GO processes analyses.Predicted targets of all integrated miRNAs were used as input for enrichment analyses.The pathways or GO processes that the number of supportive genes was more than four were selected.A visualized heat map was constructed through False discovery rate (FDR)-corrected p-values.Pearson correlation and average linkage were applied to Cluster analysis of the heat map.

Figure 1 :
Figure 1: Distribution of NSCLC differentially dysregulated miRNAs extracted from 32 miRNA profiles studies.Upregulated and downregulated miRNAs were shown as short red and blue vertical bars, respectively.The number of miRNAs in each study is graphically depicted on the right.The positions of NSCLC integrated-signature miRNAs were marked.

Figure 2 :
Figure 2: RT-PCR analysis of up-regulated miRNAs expression in the NSCLC tissues and the adjacent noncancerous lung tissues.

Figure 3 :
Figure 3: RT-PCR analysis of down-regulated miRNAs expression in the NSCLC tissues and the adjacent noncancerous lung tissues.

Figure 4 :
Figure 4: Validation of miRNAs expression in NSCLC on the TCGA dataset.(A) Upregulated miRNAs expression.(B) Downregulated miRNAs expression.(C) ROC curve and AUC for performances of the miRNAs in NSCLC tissue classification.For boxplots, expression values of miRNAs were log2-transformed.

Figure 5 :
Figure 5: A heat map of differential expression of validated miRNAs for 39 pairs of lung adenocarcinoma and adjacent non-tumorous lung tissues in TCGA data base.Red indicates high expression; black indicates moderate expression; green indicates low expression.

Figure 6 :
Figure 6: Kaplan-Meier survival analysis of overall survival and recurrence-free survival for validated miRNAs.(A,B).The high expression of miR-21-5p and the low expression of miR-30d-5p were associated with poor prognosis for OS.(C) The high expression of miR-143-3p was associated with worse prognosis for RFS.(D) Kaplan-Meier analysis in patients with NSCLC according to the number of positive markers.The patients were divided into 3 Groups: 0 positive marker (group 1), 1 positive marker (group 2), 2 positive markers (Group 3).Patients of group 3 had a shorter OS.

Figure 7 :
Figure 7: Panther pathway enrichment of targets by validated miRNAs.The heat map was constructed using the validated targets and GeneCodis web tool, which showed the results of panther pathway enrichment analysis.The intensity of color represents the FDR-corrected p-value.Clustering was performed using Pearson correlation and average linkage method.

Table 1 : Characteristics of the studies First author and reference Region Number of miRNA probes Tumor type Number of samples Time
AD, adenocarcinoma; SCC, squamous cell carcinoma; TU, tumor samples; N, non-cancerous samples; pairs, TU and N samples from the same patient; '*', without publication.

Table 3 : The pathways and GO processes most strongly enriched by targets of validated microRNAs
[47]ets in our study, which associated with the regulation of angiogenesis, apoptosis, invasion and metastasis.Consistent with our results, previous study has proved that miR-486-5p was significantly down-regulated in NSCLC tissues, and could inhibit cell proliferation, promote apoptosis, and hinder cell-cycle progression by targeting oncogene CDK4[47].Previous study had shown that miR-451a could enhance the sensitivity of NSCLC cells