Identification of a seven-miRNA signature as prognostic biomarker for lung squamous cell carcinoma

Background Specific biomarkers for outcome prediction of lung squamous cell carcinoma (LUSC) are still lacking. This study assessed the prognostic value of differentially expressed miRNAs of LUSC patients. Results Twelve of the 133 most significantly altered miRNAs were associated with overall survival (OS) across different clinical subclasses of the Cancer Genome Atlas (TCGA) LUSC cohort. A linear prognostic model of seven miRNAs was developed to divide patients into high- and low-risk groups. Patients assigned to the high-risk group exhibited poor OS compared with patients in the low-risk group, which was further validated in the validation cohort and entire LUSC cohort. Methods MiRNA expression profiles with clinical information of 447 LUSC patients were obtained from TCGA. Most significantly altered miRNAs were identified between tumor and normal samples. Using survival analysis and supervised principal components method, a seven-miRNA signature for prediction of OS of LUSC patients was established. Survival receiver operating characteristic (ROC) analysis was used to assess the performance of survival prediction. The biological relevance of predicted miRNA targets was also analyzed using bioinformatics method. Conclusions The current study suggests that seven-miRNA signature may have clinical implications in the outcome prediction of LUSC.


INTRODUCTION
Lung cancer is the most common and fatal cancer in the world [1]. Non-small cell lung cancer (NSCLC) is the most frequent type of lung cancer and can be further subtyped mainly as adenocarcinomas (LAD) and squamous cell carcinomas (LUSC). Despite decades-long improvements in early detection and treatment, survival of advanced stage patients remains poor [2]. Effective biomarkers to identify patients with high recurrence and death risk are still lacking. Moreover, since the genetic and epigenetic alterations between LUSC and lung LAD are quite different, the targeted agents for LAD cannot be applied to LUSC. Thus, there is an urgent need to identify effective biomarkers for the prognosis of LUSC.
MicroRNAs (miRNAs) are small, non-coding RNAs which regulate gene expression at the posttranscriptional level [3]. By binding to the 3' [4] or 5' untranslated region (UTR) [5] of the target transcripts, miRNAs can modulate genes expression through translational repression or cleavage of mRNA. MiRNAs can function as either tumor suppressors or oncogenes by regulating genes involved in tumorigenesis. Moreover, miRNAs have been found in the blood samples and have been demonstrated to be remarkable stable even under harsh conditions such as treatment with RNase and DNase or multiple freeze-thaw cycles [6]. This suggests that miRNAs can be potential noninvasive biomarkers for cancer.
Although a number of miRNAs have been identified for predicting outcome for NSCLC, there is significant

Research Paper
inconsistences among previous studies. This may have resulted from the small sample sizes as well as the heterogeneous histological subtypes and tumor stages. Thus, our study aims to identify and assess the prognostic value of candidate miRNA biomarkers for LUSC patients with large cohort.

TCGA dataset and patients characteristics
In this study, a total of 447 LUSC patients from The Cancer Genome Atlas (TCGA) were enrolled. 45 normal tissues (n = 45) were also included for differential expression analysis of miRNAs. The patients were separated into the training set (n=224) and testing set (n=223) randomly. No significant difference in clinical covariates was observed between the two sets ( Table 1).

Identification of dysregulated miRNAs in LUSC
The miRNA expression profiles of 45 pairs of LUSC tumor tissues with normal lung tissues were analyzed. 133 miRNAs were identified differentially expressed miRNAs with p value less than 0.05 after FDR adjustment (Supplementary Table S1). Among these, 85 miRNAs were up-regulated and 58 miRNAs were down-regulated.

Association of miRNAs expression and clinical parameters with overall survival of LUSC patients
We conducted univariate Cox regression assays to identify common miRNAs correlated with overall survival (OS) within each subclass of the following clinical parameters: pathologic N stage, pathologic T stage, and pathologic M stage. MiRNAs were selected if they were significantly correlated with OS in at least two subclasses. Twelve miRNAs were identified in this analysis. The hazard ratios (HR) for the association of miRNAs with OS in each category were shown in Table 2.

Prognostic value of the seven-microRNA signature in LUSC
The ability of prognostic prediction of the seven-miRNA signature was then tested in the testing set and the entire LUSC cohort, respectively. Kaplan-Meier analysis revealed that patients in high-score group had poorer OS in both testing set (p=0.027, Figure 2A) and the entire LUSC cohort (p<0.001, Figure 3A).
Time-dependent ROC curves were also applied to evaluate the prognostic value of the signature model. The area under ROC curve (AUC) of signature model for the testing set and the entire LUSC cohort was 0.604 ( Figure  2B) and 0.610 ( Figure 3B), respectively.
We also assessed the prognostic power of the seven-miRNA signature in early stage patients (stage I and II patients, n = 364). Kaplan-Meier analysis revealed that patients in high-risk group are associated with worse OS (p<0.001, log-rank test).
Finally, multivariate Cox regression analysis were used to investigate the independent prognostic value of the seven-miRNA signature. Age, gender, T stage, N stage and the miRNA signature were used as covariates. The miRNA signature (HR = 1.965, p < 0.001) and T stage (HR = 1.684, p =0.004) are showed to be as independent prognostic factors related with OS (Table 3).

Correlation between miRNA signature and clinical characteristics
We examined the association of seven-miRNA signature with clinical factors in LUSC. No significant differences were observed when patients were stratified by gender, age and smoke status (Supplementary Table S2).

In silico analysis of target genes and pathways
The target genes of seven miRNAs was downloaded from miRecords. A total of 4242 target genes predicted by more than 4 data sets were selected for further analysis. Next, we performed a functional enrichment analysis to elucidate the biological function of target genes of seven miRNAs. A total of 39 Kyoto Encylopedia of Genes and Genomes (KEGG) pathways and 732 Gene Ontology www.impactjournals.com/oncotarget    Table  S3). The results showed that the predicted target genes involved in many important pathways associated with cancer development, e.g., adherens junction, Wnt, TGFbeta and MAPK signaling pathways (Table 4). Moreover, many target genes were enriched in cancer-related pathways for lung cancer.

DISCUSSION
Accumulating evidence reveals that miRNAs could play a crucial role in tumorigenesis and progression of lung cancer. Many studies have reported the potential of miRNAs as biomarkers for early detection, molecular classification, prediction of outcome and treatment efficacy for lung cancer [7]. Although several studies had identified a number of miRNAs with prognostic value, most of these studies did not accurately analyze the expression of miRNAs in the different histotypes and stages. Since the miRNA expression patterns are quite different in different pathological types and tumor stages, the miRNAs identified in those studies may have less power in the clinical application for lung cancer outcome prediction.  In the current study, from the 133 differential expressed miRNAs, 12 miRNAs associate with OS of LUSC patients were identified in clinical subgroups. By using supervised principal components method, a seven-miRNA (miR-101-2, miR-139, miR-182, miR-183, miR-190, miR-326 and miR-944) signature was established and was validated to be an independent factor for outcome prediction for LUSC patient. This signature was demonstrated with high prognostic ability in both the entire LUSC set and the early stage patients.
Dysregulation of miR-139 has been observed in a variety of cancers [8]. Liu et al. reported that downregulation of miR-139 was correlated with poor survival of colon cancer patients [9]. MiR-139 regulates diverse biological processes, such as proliferation, invasion and metastasis [8]. Low expression of miR-139 was significantly related with invasiveness and lymph node metastasis of NSCLC [10]. Wang et al. reported that downregulation of miR-101 accelerates the progression of lung cancer in RUNX1 dependent manner [11]. MiR-101 was also reported to suppress lung tumorigenesis through inhibition of DNMT3a [12]. By upregulating miR-101, Curcumin could inhibit the expression of EZH2, thus inhibiting lung cancer growth and metastasis [13]. MiR-190 could downregulate the expression of PH domain leucine-rich repeat protein phosphatase (PHLPP) and enhance the activation of Akt which leads to carcinogenesis [14]. Gennarino et al. reported that miR-190 could inhibit TGF-beta signaling and its effects on cell proliferation, morphology and scattering in NSCLC [15]. Recent studies reported that mir-326 was involved in carcinogenesis [16], metastasis, invasion of tumor [17,18], and chemotherapy resistance [19]. MiR-326 was also identified as a biomarker for outcome prediction for prostate cancer and gastric cancer [20]. MiR-326 could regulate cell proliferation and migration of lung cancer by targeting phox2a [21], CCND1 [22] and NSBP [23]. Several reports of miR-182 highlighted its crucial roles in various cancers [24][25][26]. Stenvold et al. reported that up-regulation of miR-182 was associated with a good prognosis in LUSC [27]. By targeting different target genes, miR-182 could inhibit cell proliferation and invasion of LAD and LUSC [28][29][30]. Zhang et al. reported that serum miR-182 and miR-183 could be an effective biomarker for early diagnosis of NSCLC [31].
Powrozek et al. reported that plasma miR-944 could serve as biomarker for diagnosis of LUSC [32]. Through comprehensive analysis of characterization of LAD and LUSC, miR-944 was identified a potential drive-miRNA in classifying tumor histology [33]. Liu et al. reported that miR-944 affects cell growth by targeting EPHA7 [34] or SOCS4 [35] in NSCLC. Up-regulation of miR-944 is correlated with lymph node metastasis and advanced-stage of LUSC [35].  Based on the public data with large sample size, the prognostic value of miRNAs have been evaluated in other tumor types, including lung adenocarcinoma [36], gliomas [37], bladder cancer [38], pancreatic cancer [39], breast cancer [40], hepatocellular carcinoma [41], head and neck squamous cell carcinomas [42,43], cervical cancer [44], melanoma [45]. Except miR-326 which have been reported as a member of a seven-miRNA signature predicting survival in hepatocellular carcinoma, the other six miRNAs identified in our study have not been recommended as biomarkers with best prognostic value in other tumor types. These results further supported the specificity of the seven-miRNA signature in predicting outcome for LUSC.
To gain a further insight into the functional role of the seven miRNAs, we retrieved their target genes and analyzed their related pathways. Bioinformatic analysis revealed that some important tumor-related genes are simultaneously by two or more miRNAs. Type 1 insulinlike growth factor receptor (IGF1R) is predicted to be the target gene of three miRNAs (miR-182, miR-326 and miR-944). The over-expression of IGF1R, which mediates tumor growth, adhesion, and protection from apoptosis, has been observed in various cancers [46]. ATRX is the predicted target of miR-101, miR-139 and miR-944. Misexpression of ATRX could modify the histone variant composition of chromatin by promoting genomic instability or gene expression changes related with tumorigenesis [47]. In addition, we also found that these miRNAs could regulate several crucial signaling pathways. Dysregulation of TGF-beta signaling pathway is important in cancer progression and cell invasion. Pajares et al. found that TGF-beta-induced protein expression was an independent outcome predictor for adjuvanttreated LUSC patients. Our informatics analysis identified several target genes that are involved in TGF-beta signaling pathway. Yang et al. found that high expression levels of VEGF-B is associated poor survival of LUSC patients. Three target genes related to VGEF signaling pathway have been identified. However, further molecular investigations are needed to confirm these predictions.
Nevertheless, some limitations may exist in the current study. Firstly, the censored rate of TCGA LUSC dataset was relatively high. This may have an impact on the reliability of the survival analysis. Secondly, the value of the miRNA signature should be validated in other independent cohort with long-term follow up.
In summary, our current study identified a seven-miRNA signature as potential outcome predictor for LUSC patients. Future studies using independent cohorts of large sample size from multiple institutions are needed to validate our findings for clinical practice. Future functional investigations are also required to explore the underlying mechanisms of these miRNAs in LUSC development.

TCGA dataset
The LUSC dataset, including Level 3 miRNA expression data with clinical information, was downloaded from TCGA data portal (March 2016). To exclude unrelated causes of death, cases with less than 1-month follow-up less were excluded in the subsequent analysis.

Identification of dysregulated miRNAs in LUSC
The raw counts of miRNA expression data (Illumina HiSeq Systems) of 45 LUSC with their paired normal tissue were obtained from the TCGA dataset. MiRNA-expression data was normalized by using the R/ Bioconductor package edgeR, which is designed for digital gene expression data [48]. MiRNAs with logFC (log2 fold change) < −1 or >1 (p less than 0.05, after FDR adjusted) were considered as differentially expressed miRNAs and were selected for further analysis.

Identification of miRNAs with prognostic value in LUSC
Semi-supervised method which combines the gene expression profile with clinical imformation was used to conduct univariate Cox regression analyses [49,50]. Common miRNAs associated with OS were identified within each of the subgroups stratified by the TNM system. Common miRNAs identified in at least two independent subclasses were selected for the subsequent studies, using a HR>1 or HR<1 with p<0.05 as the cutoff.

Definition of prognostic risk model and ROC curve analysis
An importance score was calculated by the SPC method and was assigned to each miRNA [49]. Ten-fold cross validation was used to calculate the best threshold in SPC model and to select significant miRNAs. The TCGA dataset was separated into the training group and the testing group randomly. The linear signature prognostic model was developed based on the SPC method. Then, using the prognostic model, risk scores were compute for all the 447 patients. The best cutoff value of prognostic score was decided in the ROC curve analysis for predicting 5-year survival of the training set. The OS curves were evaluated using the Kaplan-Meier and log-rank method. Time-dependent ROC curves were also applied to assess the predict power of the prognostic model. All analyses were performed by the R/BioConductor (version 3.3.1).