A novel microRNA signature predicts survival in stomach adenocarcinoma

Recent microRNA (miRNA) expression profiling studies suggest the clinical use of miRNAs as potential prognostic biomarkers in various malignancies. In this study, aiming to identify microRNAs with prognostic value for overall survival (OS) in stomach adenocarcinoma (STAD) patients, we analyzed the miRNA expression profiles and the associated clinical characteristics in 380 STAD samples from The Cancer Genome Atlas (TCGA) dataset. An eight-miRNA signature for predicting OS in STAD patients was identified and self-validated by survival analysis and semi-supervised principal components method. We developed a linear prognostic model composed of these miRNAs to divide patients into high- and low-risk groups according to the calculated prognostic scores. Kaplan-Meier analysis demonstrated that patients in the high-risk group had worse OS compared with patients in the low-risk group. Notably, this miRNA prognostic model showed prognostic significance to the STAD patients in early stages and the chemo-resistant patients, who would potentially benefit from additional medical interventions. Finally, this eight-miRNA signature is an independent prognostic biomarker and demonstrates a good predictive performance for 5-year survival. Thus, this signature may serve as a novel biomarker for predicting survival as well as chemotherapy response in STAD patients.


INTRODUCTION
Gastric cancer (GC) represents a major health problem worldwide, ranking the fifth most common malignancy and the third leading cause of global cancer mortality [1]. Early diagnosis is critical to the prognosis of stomach adenocarcinoma (STAD) patients, which constitute the majority (> 90%) of GC patients. Unfortunately, most STAD patients are diagnosed at advanced stage disease which is manifested by extensive tumor invasion and/or distant metastasis, and results in a poor 5-year survival rate with the median survival of less than 1 year [2][3][4]. Over the past few decades, despite significant progress in diagnostics and therapeutics, the overall outcomes of GC patients has only modestly improved [5]. Thus, it is essential to discover specific prognostic factors that may help guide the clinical therapeutic implications and improve the overall survival (OS) for these patients.

Research Paper
inconsistency, which may be primarily due to the fact that these results were mostly derived from the analysis based on microarray data. As we know, besides the limitations in the dynamic range, sensitivity, and specificity of the microarray data themselves, different technological detection platforms, small sample size, various methods employed for data processing and analysis may result in significant variations.
With the advent of next-generation sequencing and bioinformatics as well as the launches of large-scale cancer genome projects, such as The Cancer Genome Atlas (TCGA), comprehensive and multi-dimensional maps of the key genomic and epigenomic changes in cancer can be readily achieved and accessed. Bioinformatic analysis of TCGA datasets has been shown to be an outstanding tool in identifying genetic and epigenetic changes related to clinical outcomes, thus opening a new avenue for the discovery of novel prognostic markers for various malignancies [20,21]. In this study, aiming to identify a panel of specific miRNAs associated with OS in STAD patients, we applied the bioinformatics analysis based on the genome-wide miRNA-Seq profiles derived from TCGA dataset, which represents the largest STAD cohort available up to date. To minimize the variations among different individuals, we first characterized the differentially expressed miRNAs by comparing miRNA profiles in paired STAD and normal tissues. We then developed a linear prognostic model composed of eight miRNAs to divide patients into high-and low-risk groups according to their calculated prognostic scores. The following Kaplan-Meier analysis demonstrated that the eight-miRNA signature correlated with a good predictive performance for 5-year survival and chemotherapy response in STAD patients, suggesting that this signature may serve as a novel biomarker to complement the traditional histopathological prognostic factors and help guide individual therapy for these patients.

Differentially expressed miRNAs in STAD versus paired adjacent normal tissue
After filtering out unqualified cases, miRNA expression and clinical data of 380 STAD patients were retained for survival analysis. It contained 252 male and 128 female, among all the participants a total of 41 patients with adjacent non-tumor tissues. The patients were randomly divided into the training set (n = 190) and testing set (n = 190). No significant difference in clinical covariates was observed between the two sets ( Table 1).

Establishment of the miRNA prognostic model
Using univariate Cox regression, we characterized the common miRNAs that were associated with OS within each of the following independent subclass: differentiation, pathologic N stage, pathologic T stage, and pathologic M stage. Within each subset of clinical characteristics, the patient subclasses represented nonoverlapping sets, respectively. MiRNAs were selected if they were associated significance with OS in at least two independent subclasses. The respective HRs for the association of miRNA with OS in each subclass were shown in Table 2. Seventeen miRNAs were identified in this analysis.
Eight of 17 miRNAs were selected using the supervised principal component method in the training set. Among these eight miRNAs, five were associated with high risk (mir-145, mir-184, miR-20b, miR-9-1 and miR-9-2, HR > 1) and three were shown to be protective (miR-1537, miR-549 and miR-802, HR < 1). We then developed a miRNA prognostic model for predicting 5-year survival in the training set, by which the samples were classified into high-risk or low-risk groups using the optimum cutoff point of miRNA scores according to ROC curve. Figure 1 showed the distribution of patient prognostic scores and miRNA expression of all 380 STAD patients, ranked by the prognostic score values for the eight-miRNA signature. Patients with high prognostic scores tended to express highrisk miRNAs, whereas patients with low prognostic scores tended to express protective miRNA s ( Figure 1A and 1B).

Validation of the eight-miRNA signature in STAD patients
Using the optimum cutoff value obtained from the training set, patients were assigned into high-risk and lowrisk groups. The ability of prognostic prediction of the eight-miRNA signature was examined in the testing set and the entire STAD cohort, respectively. Kaplan-Meier analysis revealed that patients in the high-risk group had poor OS, compared with patients in the low-risk group in both testing set (p < 0.001, using the log-rank test, Figure 2A) and the entire STAD cohort (p < 0.001, using the log-rank test, Figure 3A). Time-dependent ROC curves were also used to assess the prognostic power of the eight-miRNA signature. The AUC of the eight-miRNA signature prognostic model for the testing set and the entire STAD cohort was 0.586 ( Figure 2B) 0.607 ( Figure 3B), respectively.
Since patients with early stage disease may benefit significantly from a prognostic biomarker signature, we evaluated the prognostic power of the eight-miRNA signature in stage I and II patients (n = 179). Kaplan-Meier analysis revealed that patients in the high-risk group associated with worse OS (p < 0.001, using the log-rank test). Thus, this eight-miRNA signature could distinctly predict the survival for STAD patients in an early stage, suggesting that it could have potential clinical value for help in guiding additional medical interventions to this patient subgroup. (Figure 4) It is often challenging to distinguish chemoresistant STAD patients from those that have good responses to chemotherapy. In order to evaluate the prognostic value of the eight-miRNA signature to chemotherapy response in STAD patients, we applied the eight-miRNA signature to the 112 patients, whose postoperative chemotherapy responses were recorded in the TCGA STAD cohort. Using our eight-miRNA prognostic model, the 112 patients were divided into low-risk and high-risk subgroups according to their responses to chemotherapy. As shown in Table 3, among 68 low-risk patients, most patients had either a complete or partial response after chemotherapy, while only 7 patients in this subgroup showed disease progression. In contrast, among 44 high-risk patients, approximately one-third of patients showed progression after chemotherapy. Thus, this eight-miRNAs signature showed a good prognostic power for predicting chemotherapy response in STAD patients (P value < 0.05).
We also examined the association of eight-miRNA signature with clinical characteristics in STAD patients. No significant differences were observed when patients were stratified by gender and age (Supplementary Table 2).

The eight-miRNA signature is an independent prognostic factor
We further conducted a multivariate analysis to evaluate the independent prognostic value of the eight-miRNA signature. Age, gender, Grade, T stage, N stage, M stage and the miRNA signature were used as covariates. The Cox multivariate regression analysis revealed that the miRNA signature is an independent prognostic factor associated with OS (Table 4; HR = 2.003, p < 0.001).

In silico analysis of target genes and pathways
The list of predicted target genes of these eight miRNAs was downloaded from miRecords. A total of 3510 target genes which predicted by more than 5 data sets were identified to be potentially regulated by the eight miRNAs. We then performed a functional enrichment analysis to elucidate the biological function of these target genes by Kyoto Encyclopedia of Genes and Genomes (KEGG) categories and Gene Ontology (GO) categories.    (Table 5). These results highlighted critical roles of these eight miRNAs in STAD onset and progression, and the underlying mechanisms warrant further investigation.

DISCUSSION
MiRNAs are 22-26 nucleotides small RNAs that play vital roles in modulating gene expression at the post-transcriptional level by binding to the 3′ or 5′ untranslated region of targeted mRNAs. Here we found 138 miRNAs were differentially expressed between STAD and adjacent normal tissues, among which 17 miRNAs levels were associated with at least two of following histopathological factors: T stage, lymph  node metastasis, distant metastasis and tumor grade. Using survival analysis and semi-supervised principal components method, an eight-miRNA signature (miR-145, miR-184, miR-20b, miR-9-1, miR-9-2, miR-1537, miR-549 and miR-802) for predicting OS of STAD patients was identified and self-validated in a large cohort. Importantly, this eight-miRNA signature was further confirmed to be an independent prognostic factor. Furthermore, with respect to the association between their expression levels and patient survival, the eight miRNAs in the signature were divided into two groups: five risky miRNAs that were negatively associated with the survival and three protective miRNAs which were positively associated with the survival.
Among the five risky miRNAs (mir-145, mir-184, miR-20b, miR-9-1 and miR-9-2), overexpression of miR-145 has been reported to be a key factor for the prediction of poor overall survival in bladder cancer [22]. However, a recent study based on qRT-PCR showed that downregulation of miR-145 correlates with a poor survival in a clinical cohort composed of 145 GC patients from the north of China, which is inconsistent with our result that mir-145 was identified as one of the five risky miRNAs leading to poor 5-year survial. The discrepancy among these studies could be because there are distinct expression patterns of miR-145 in either different cancer types or the same cancer type with subtle variations, such as race. Upregulation of miR-184 enhanced the malignant phenotype of glioma cancer cells by reducing FIH-1 protein expression and facilitated the proliferation and invasion in pancreatic ductal adenocarcinoma [23,24]. Overexpression of miR-20b was reported to be associated with poor outcomes in GC patients [6,25]. Also, high level of miR-20b facilitated brain metastasis of breast cancer and promoted hepatocellular carcinoma invasion and progression [26,27]. The role of miR-9 has been investigated in many types of malignancy, but the results were inconsistent and inconclusive. Recently, a systematic meta-analysis, representing a quantified synthesis of all published studies of miR-9, found that high expression of miR-9 was significantly associated with poor survival in patients with malignancies, including colorectal cancer, non-small lung cancer, breast cancer, hepatocellular carcinoma, laryngeal and esophageal squamous cell carcinomas, glioma, ovarian, osteosarcoma, adrenocortical cancer, bladder cancer, and leukemia [28].
Among the protective miRNAs (miR-1537, miR-549 and miR-802), deletion of mir-1537 has been reported in aggressive neuroblastoma [29]. During neoadjuvant therapy in the patients with advanced adenocarcinomas of the gastroesophageal junction, there was a significant increase of serum miR-549, suggesting a suppressive role it played in tumor progression [30]. Downregulation of miR-802 was significantly associated with overall survival and cancer-specific survival in rectal cancer patients [31]. Taken together, these biological and clinical studies of the miRNAs have provided some insights into their potential prognostic value, although future work is needed to validate their roles in clinical applications.
It is of note that this eight-miRNA prognostic model could predict the high-risk patients in early stages (stage I-II), which accounts for approximately 10% of all STAD recurrences [32]. These patients may benefit significantly from a prognostic biomarker to guide their further medical interventions, such as the shorter interval between the exams for monitoring the recurrence or alternative treatment approaches. Since chemotherapy resistance is one of the major factors leading to a poor OS for STAD patients, we applied our miRNA signature for the prediction of chemotherapy response. Indeed, the chemoresponsive or chemoresistant patients were well distinguished using this eight-miRNA signature, which made it possible in clinical practice to stratify patients into two groups: one that could benefit from standard therapy and another that should be placed on alternative therapeutic protocols or novel clinical trials.
Bioinformatic analysis of potential and validated targets of miRNA and the subsequent analysis of KEGG pathways clustered by target genes are promising strategies for gaining insights into plausible biomarkers and key events involved in cancer development and progression. Enriched KEGG pathway analysis of the eight-miRNA model indicated that both predicted and validated target genes of this miRNA signature were clustered in cancer-associated KEGG pathways. Furthermore, our in silico analysis proved that the eight-miRNA signature is biologically meaningful. Specifically, Ago2, the co-target of miRNA-145, miR-184 and miR-802, played an important role in GC differentiation, lymph node invasion and clinical stage [33]. Myc, a multiplayer in carcinogenesis, progression and metabolism, was also the predicted target of miR-145 and miR-184 [34]. Moreover, many target genes related to VEGF receptor signaling pathway have been identified, such as VEGFR-3. Indeed, high expression level of VEGFR-3 has been found to be associated poor survival of gastric adenocarcinoma [35]. Certainly, further molecular biological experiments need to be performed to validate these predictions.
In summary, using integrated bioinformatic analysis of the largest available cohort of STAD patients and the corresponding genome-wide miRNA sequencing results, we identified a specific eight-miRNA signature that could serve as an independent prognostic factor for the prediction of OS in STAD patients. However, before this finding can be applied to clinical practice, further validation in independent large cohorts is required in the future. To exclude unrelated causes of death, only patients with follow-up longer than 1 month were included in the subsequent analysis.

Identification of dysregulated miRNAs in STAD
The raw counts of miRNA expression data of 41STAD with their paired normal tissue were obtained   from the TCGA dataset (Illumina HiSeq Systems). MiRNA-expression data was normalized by the R/ Bioconductor package edgeRv [36]. The expression differences were characterized by logFC (log2 fold change). MiRNAs with logFC < −1 or logFC >1 (FDRadjusted p < 0.05) were considered as differentially expressed miRNAs and were selected for further analysis.

Identification of miRNAs with prognostic value in STAD
The semi-supervised method which combines both the gene expression data and the clinical data was used to identify candidate miRNAs with prognostic value [37,38]. Univariate Cox regression analyses were conducted to identify common miRNAs related to OS within each of the subgroups stratified by the TMN stage. Within each group of clinical characteristics, the patient subclasses represented non-overlapping sets. Common miRNAs associated with OS in at least two independent subgroups were selected for the subsequent studies, using an HR>1 or HR<1 with p < 0.0.5 as the cutoff.

Definition of prognostic risk model and ROC curve analysis
An importance score was calculated by the supervised principal components method and was assigned to each miRNA [37]. Ten-fold cross-validation was used to estimate the optimal feature threshold in supervised principal components and to select significant miRNAs. The TCGA dataset was randomly divided into the training set and the testing set. The linear miRNA signature prognostic model was developed based on the supervised principal component method. The miRNAs expression level was as the log2 reads per million of total aligned miRNA reads. The prognostic score was calculated as follows: Prognostic-score = (0.1482 × miR-145) + (-0.0987 × miR-1537) + (0.1126 × miR184) + (0.0964 × miR20b) + (-0.1662 × miR-549) + (-0.1374 × miR-802) + (0.0915 × miR-9-1) + (-0.0148 × miR-9-2). Then, the prognostic scores were computed for the 380 patients using our miRNA prognostic model. The best cutoff values of the prognostic score were decided in the ROC curve analysis for predicting 5-year survival of the training set. The OS curves were evaluated using the Kaplan-Meier and log-rank method. Time-dependent ROC curves were used to evaluate the predicted power of the miRNAs signature model. All analyses were performed using the R/ Bioconductor (version 3.3.1).

Bioinformatic analysis of miRNA-target genes and pathways
The list of predicted target genes of the candidate miRNAs was obtained from miRecords v4.0 (www. mirecords.biolead.org) database, which offers a comprehensive data of possible miRNA-targets of 11 different data sets. The pathway enrichment analysis was conducted with the GeneTrail gene set enrichment tool. The results were considered significant when the p-value was less than 0.05 after FDR corrected [39,40].