The prognostic value of a seven-microRNA classifier as a novel biomarker for the prediction and detection of recurrence in glioma patients

Glioma is often diagnosed at a later stage, and the high risk of recurrence remains a major challenge. We hypothesized that the microRNA expression profile may serve as a biomarker for the prognosis and prediction of glioblastoma recurrence. We defined microRNAs that were associated with good and poor prognosis in 300 specimens of glioblastoma from the Cancer Genome Atlas. By analyzing microarray gene expression data and clinical information from three random groups, we identified 7 microRNAs that have prognostic and prognostic accuracy: microRNA-124a, microRNA-129, microRNA-139, microRNA-15b, microRNA-21, microRNA-218 and microRNA-7. The differential expression of these miRNAs was verified using an independent set of glioma samples from the Affiliated People's Hospital of Jiangsu University. We used the log-rank test and the Kaplan-Meier method to estimate correlations between the miRNA signature and disease-free survival/overall survival. Using the LASSO model, we observed a uniform significant difference in disease-free survival and overall survival between patients with high-risk and low-risk miRNA signature scores. Furthermore, the prognostic capability of the seven-miRNA signature was demonstrated by receiver operator characteristic curve analysis. A Circos plot was generated to examine the network of genes and pathways predicted to be targeted by the seven-miRNA signature. The seven-miRNA-based classifier should be useful in the stratification and individualized management of patients with glioma.


INTRODUCTION
Glioblastoma multiforme (GBM), which arises from transformed astrocytes, is the most malignant form of brain neoplasms. GBM is a highly aggressive glioma that is resistant to standard therapeutic interventions, thus its prognosis is often dire.. Currently, GBM has a median survival of 13-16 months post diagnosis, and an overall survival (OS) of 2.5-70 months [1]. While GBM tumors may share similar histological characteristics, their varied molecular characteristics contribute to the vastly different survival rates associated with GBM. Genomic analysis studies have identified molecular signatures which respond to specific treatment regimens [2]. Data from phase III clinical trials suggests that monotherapy with the angiogenesis inhibitor bevacizumab preserves quality of life, reduces corticosteroid use, and improves diseasefree survival (DFS); however, bevacizumab monotherapy does not substantially improve OS [3]. Therefore, drugs such as cilengitide are currently used in combination with bevacizumab with the goal of improving OS [4]. A major hurdle in GBM research is identification of prognostic biomarkers that can sensitively predict the clinical outcome of patients.
MicroRNAs (miRNAs) are short 18-25 nucleotide non-coding RNAs that regulate gene expression at the post-transcription level by inhibiting mRNA. Currently, miRNA profiling is an important method that is used to characterize tumors [5]. Bioinformatics tools suggest that miRNAs may regulate more than 60% of human genes including oncogenes, tumor suppressors, and those that impact chemoradioresistance [6]. Moreover, reports suggest that miRNA signatures may be prognostic indicators of GBM thus predict clinical outcome [7][8][9][10][11][12][13]. While Cox proportional hazards regression statistics is typically used for modeling covariate analysis associated with patient Research Paper www.impactjournals.com/oncotarget survival, it is not suitable analyzing high-dimensional microarray data with variable sample size ratio (for example data of less than 10:1). Instead, the least absolute shrinkage and selection operator (LASSO) method has been introduced to eliminate this limitation associated with Cox regression analysis [14]. The LASSO method has been applied broadly to the Cox proportional hazard regression model for survival analysis with high-dimensional data [15]. However, most of the studies that have been performed to identify miRNAs in glioma have used small sample sizes and have not used comprehensive statistical approaches.
The aim of this study was to use advanced statistical methods and an expanded data set to determine a miRNA signature that can proficiently predict DFS and OS in GBM. Using the LASSO Cox regression model and GBM tumor tissues compared with non-tumor brain tissues from patients who underwent surgery, we developed a multi-miRNAbased classifier to predict DFS. We assessed the prognostic accuracy of this classifier using two validation patient groups and confirmed our findings using an independent patient group. Moreover, using bioinformatics analysis, we investigated the functional relevance and ability of the selected miRNAs as potential biomarkers for recurrence and progression. The identification of a validated miRNA signature should be useful in predicting patient outcomes and tailoring treatment regimens for glioma.

Identification of miRNAs that are dysregulated in GBM
To identify miRNAs that are dysregulated during GBM, we analyzed data from 300 patients from TCGA database who underwent surgical resection for glioma. A training set (89 patients), a validation set (102 patients), and an independent verification set (109 patients) were randomly assigned to GBM patients. A set of 10 unmatched samples from normal non-cancerous tissue was also analyzed for comparison. The distribution of the clinicopathological characteristics of malignant glioma in the three subgroups of GBM patients is shown in Table 1. According to the microarray data, 49 miRNAs were differentially expressed between the 300 GBM samples and 10 matched non-cancerous normal brain tissue samples (fold change ≥ 3.0; false discovery rate 0) ( Figure  1A). Of these, 28 miRNAs were upregulated and 21 were downregulated in the 300-sample GMB set (Table 2). Similar dysregulation was observed in the training set, validation set, and independent set (Supplemental Tables  1-3). Hierarchical clustering based on the 49 differentially expressed miRNAs determined that the 300 tumor tissue and 10 normal brain tissue samples could be successfully separated into two discrete groups (p < 0.005); similar groups could be distinguished all three sets.

Identification of seven miRNAs significantly associated with DFS in GBM
To further investigate whether the miRNAs are significantly correlated with DFS in GBM, we performed univariate analysis. We used a LASSO Cox regression model to build a prognostic classifier, which selected seven miRNAs from among the 49 miRNAs identified in the training set ( Figure 1B and 1C). Among the seven miRNAs, the expression of five miRNAs (miR-124a, miR-129, miR-139, miR-218 and miR-7) was down-regulated in GBM tissues, and the expression of the other two (miR-15b and miR-21) was up-regulated. The association with DFS was verified in all three patient sets (Table 3).
Quantitative RT-PCR was then used to validate the seven miRNAs for an additional set of 34 pathologically proven malignant glioma tissues and 10 non-cancer normal brain tissues from the Affiliated People's Hospital of Jiangsu University. The expression of these miRNAs was significantly different between malignant glioma and non-cancerous normal brain samples and was consistent with the microarray expression patterns ( Figure 2).

Establishment of a seven-miRNA-based signature that is associated with GBM disease risk
To assess the utility of the seven miRNAs in predicting disease risk, we derived a formula using the LASSO Cox regression model to calculate a score for the risk of disease recurrence for patients based on their individual seven-miRNA expression levels. By introducing and observing auxiliary problems with continuous variables, a simple and efficient algorithm was established. Training set patients were further stratified into high-and low-risk groups, with the median risk score (4.25) as the cutoff value ( Figure 3). No differences in clinical characteristics were observed between the two groups. However, Kaplan-Meier analysis showed that high-risk patients had shorter DFS and OS, which was confirmed for the three patient sets ( Figure 4).
To further validate that these seven miRNAs are important for the signature, constructed six-miRNA signature was respectively created wherein one miRNA was sequentially deleted from the original seven-miRNA signature and survival analysis was compared by log-rank testing. Results showed that none of the iterations of the six-miRNA signatures was consistently correlated with DFS or OS in any of the three patient sets ( Table 4).

Comparison of the seven-miRNA-based signature to other parameters of disease risk
To compare the correlative ability of the seven-miRNA signature to that of other disease parameters, we performed univariate analysis. The results demonstrate that the risk score established by the miRNA expression was more www.impactjournals.com/oncotarget effective at distinguishing DFS time than other criteria that are traditionally used to distinguish disease status, including Kamofsky score, tumor location, recurrence status, MGMT methylation, IDH1 mutation, smoking and family history of cancer. Similar results were obtained upon dividing the group into training and validation sets (Table 5).
To further compare the seven-miRNA signature to other disease parameters, pairwise Pearson correlations were calculated for the signal values between 11 tracks (DFS, miRNA expression, age, sex, Kamofsky score, tumor location, recurrence, MGMT methylation, IDH1 mutation, smoking, and family history of cancer). The resulting matrix provides a pictorial representation showing that the miRNA signature has a high correlation with DFS and a high negative correlation with recurrence, whereas other parameters do not correlate as highly with DFS ( Figure 5).

Assessment of the prognostic capability of the seven-miRNA-based signature
To assess the prognostic capability of the seven-miRNA signature, we performed ROC analysis. The seven-miRNA signature provided a high sensitivity and specificity of survival prediction in the training set, validation set and independent set, with areas under the ROC curve (AUROC) ranging from 0.51 to 0.71 for DFS prediction and 0.69 to 0.73 for OS prediction ( Figure 6). To further examine the prognostic capability of the miRNA, we applied the same cutoff risk scores to all samples from TCGA in the cohort. A significant segregation of patients based on DFS (P < 0.001) was observed (Figure 7). These analyses underscore the robustness of our miRNA-based classifier in predicting glioma patient outcome.

Analysis of the network of genes and pathways targeted by the 7-miRNA signature using Circos plots
Though initially designed for displaying comparative genomic data, Circos plots have also been used to analyze mutations in cancer metagenomic regulatory networks [18]. To provide a comprehensive evaluation of the putative biological functions of the seven-miRNA signature, we applied Circos plots to integrate data from the seven miRNAs, their predicted transcriptional targets, and annotated functions in glioma. To our knowledge, this is the first time that data from miRNAs and their predicted targets have been simultaneously combined in this manner. To decrease the complexity of the graphs and allow for better readability, we only showed selected transcriptional targets and inferred functions with known relevance to glioma. The results suggest that many oncogenes and tumor suppressor genes are targeted by miRNAs, including STAT3 and E2F1 (Figure 8-14).

DISCUSSION
An excessive number of cancers occur without an effective means of prevention or treatment. In routine clinical practice, disease recurrence determines the prognosis of many glioma patients. However, the  A. The relative expression of miR-124a, miR-129, miR-139, miR-15b, miR-21, miR-218 and miR-7 from microarray analysis of 300 GBM samples and 10 healthy control samples from TCGA database. B. Relative expression of these same miRNAs was determined by quantitative RT-PCR of 34 GBM samples and 10 control non-cancerous glial tissues. Expression was normalized to U6 expression. Data are presented as the mean± SEM, and the statistical significance calculated using the unpaired t-test is indicated. underlying molecular differences in glioma, even those with the same staging, leads to variation in patient outcomes, thus suggesting that there is a need for improvement of the current staging system [1]. To gain better insights into the biology of cancers, precision medicine has emerged as a promising approach for disease treatment and prevention that considers individual variability in genes, environment, and lifestyle [19]. Therefore, identifying novel genetic signatures may help clinicians to determine the likelihood of recurrence and tailor treatments accordingly.
In this study, we established a seven-miRNA diagnostic tool that can improve the predictability of disease recurrence. This novel prognostic tool was validated and our results showed that this tool can successfully stratify GBM patients in high-and lowrisk categories and predict patient DFS. Additionally, in comparison with other clinicopathological risk factor analysis, the tool could more accurately predict patient survival, suggesting that the seven-miRNA-based classifier provides prognostic value that complements clinicopathological features.
Previous studies have identified multiple miRNAs that are differentially regulated in glioma compared with normal tissue. Each of the miRNAs in this study has previously been shown to be associated with prognosis or therapeutic outcome in patients with malignant  glioma [7,[11][12][13][20][21][22][23][24][25]; however, the integration of these miRNAs as a seven-miRNA signature for glioma prognosis is unique to our study. Interestingly, the two up-regulated miRNAs (miR-15b and miR-21) have been identified as circulating markers for glioma [24]. Furthermore, miR-15b has recently been identified as a "hub" gene that regulates multiple pathways associated with glioma progression [12]. Conversely, each of the down-regulated genes has been directly shown to have tumor suppressor properties, including the ability to inhibit glioblastoma invasion, migration and viability [13,[26][27][28][29]. These findings support the inclusion of the miRNAs within the seven-miRNA signature.
Our study is unique from previous studies in that it provides an approach for classifications of gliomas using a large sample size and advanced statistical approaches. Previous studies that have assessed miRNA biomarkers in glioma have been limited by small numbers of miRNAs screened, small sample sizes, lack of independent validation sets, and the use of inappropriate statistical methods to mine miRNA microarray data. The use of the LASSO Cox regression model has allowed us to integrate multiple miRNAs into one tool, which has significantly greater prognostic accuracy than that of single miRNAs alone [14]. We demonstrated that similarly-staged glioma patients can be classified into high-and low-risk groups based on their seven-miRNA signature identified by this method. Additionally, the seven-miRNA signature had a better survival prognostic capability compared with other indices as determined by ROC analysis. Thus, classification of patients by this approach could provide useful information to clinicians on the glioma recurrence risk and guide decisions for treatment. Based on the highly sensitive results of our prognostic model, modifications can be made on disease management, for example over-treatment of low-risk patients can be avoided which could save extraordinary expense and unnecessary aggravation. Notably, we validated the differential expression of the seven miRNAs identified in this study using an independent set for formalin-fixed paraffin-embedded tissues. As we could not obtain fresh paired frozen specimens, a direct miRNA expression comparison between corresponding frozen and formalin-fixed paraffin-embedded clinical specimens could not be performed. However, studies on mouse liver show similar miRNA expression results for both freshly frozen and formalin-fixed paraffin-embedded specimens [30]; however, it still would be important to repeat the RT-PCR analyses of the seven-miRNA signature using fresh frozen samples. As an additional potential limitation, our study was performed using samples from the United States and China. The distribution of clinical characteristics could potentially differ in other geographic locations. Therefore, the generalizability of our results should be further validated on other prospective studies for population-based analysis.
In summary, our results suggest that the novel seven-miRNA based prognostic tool can effectively classify malignant glioma patients into groups at low or high risk of glioma relapse, thereby adding prognostic value to the traditional clinicopathological risk factors used to assess these patients' prognosis. Overall, we believe that the distinctive seven-miRNA classifier provides a useful prognostic tool with clinical value for appropriately categorizing patients with malignant glioma.

Patients and glioma samples
Clinical information for 495 patients with GBM and 10 unmatched non-tumor samples from patients undergoing brain trauma were obtained from The Cancer Genome Atlas (TCGA) (tcga-data.nci.nih.gov"; accessed November 2013). The database includes both primary and secondary GBM. To rule out changes in miRNA expression caused by prior disease treatment, 308 patients that had not received radiotherapy of chemotherapy prior to biopsy sampling were extracted from the 495 in the database. We stochastically used computer-generated random numbers to select 300 of the 308 samples for analysis and to assign 89 of the samples to the training set, 102 samples to the validation set and the other 109 samples to the independent set. The demographic and clinical features of all patients and healthy controls are listed in Table 1.
In addition to the data from TCGA, we obtained 34 pathologically proven glioma (grade III-IV) specimens and 10 paraffin-embedded normal brain tissues from patients with brain trauma from the Affiliated People's Hospital of Jiangsu University between 2009 and 2015. The demographic and clinical features of these patients and healthy controls are shown in Table 6. All samples were assessed by pathologists of the Affiliated People's Hospital of Jiangsu University. The institutional review The matrix was generated by R language. The numbers in the squares indicate how the independent prognostic factors correlate with each other and seven-microRNA, the larger absolute value is the larger the correlation. Blue indicates positive correlation and read indicates negative correlation boards at each participating institution approved retrospective analysis of anonymous data.

RNA Isolation, Microarray Analysis, and Quantitative RT-PCR
The isolation of total RNA from the 300 GBM samples and 10 unmatched non-tumor samples and microarray analysis has been described previously [16].
Data that were obtained using the Agilent human miRNA 8×15k microarray (level 3) containing 534 miRNA probes were downloaded from the 2013 TCGA repository. Level 3 data were derived after the raw signals per probe (level 1) were normalized per probe set (level 2) and then averaged for each miRNA. We performed a significance analysis of the microarray data, with a false discovery rate of less than 0.001, to identify miRNAs differentially expressed between the paired cancer and normal samples [17].      (miR-7, miR-15b, miR-21, miR-124a, miR-129, miR-139 and miR-218. www.impactjournals.com/oncotarget     (miR-7, miR-15b, miR-21, miR-124a, miR-129, miR-139 and miR-218. www.impactjournals.com/oncotarget MiRNAs were classified as differentially expressed if the expression change was >3.0-fold, and the difference were considered to be significant if p-values were less than 0.01. We perform hierarchical clustering analysis with the average linkage method, and un-centered Pearson's correlation coefficients with MEV version 4.9. RNA was isolated from the 34 glioma (grade III-IV) specimens and 10 normal brain tissues from the Affiliated People's Hospital of Jiangsu University by the Trizol method (Invitrogen, USA). RT-PCR was performed using the one-step SYBR RT-PCR Kit II (Takara ® , Japan). Values were standardized to the expression of U6 RNA.

Statistical analysis
The statistical software SPSS version 19.0 was used for data analysis. All computational work was performed using R language (v3.1.1). We analyzed data from TCGA for the selected miRNAs to evaluate their prognostic value in malignant glioma. The primary outcome of this paper was DFS. We assessed the relationship between the clinical characteristics and miRNA expression using the Student's t test, χ 2 test or Fisher's exact test. We analyzed correlations between the microarray data and the RT-PCR data of our samples using the Spearman correlation test. We used the Kaplan-Meier method and the log-rank test to estimate DFS and OS, and calculated hazard ratios with an adjusted multivariate Cox regression analysis. We used the LASSO Cox regression model [14] to select the most useful prognostic markers among the GBM-associated miRNAs identified with the cohort set, and we constructed a multi-miRNA-based classifier for predicting the DFS of patients with malignant glioma in the training set. We also used the R software version 3.1.1 "glmnet" package to perform the LASSO Cox regression model analysis. To identify the differentially expressed miRNAs that were significantly associated with DFS in the training set, we use LASSO Cox regarding analysis with R language. MiRNAs that were associated significantly with DFS were selected to construct an miRNA signature with the risk-score method and were used for survival analysis. The following algorithm was used to divide patients into high-risk (score > 4.25) and low-risk (score < 4.25) groups: risk score = (0.039 × status of miR-124a) + (0.389 × status of miR-129) + (0.495 × status of miR-139) -(0.396 × status of miR-15b) -(0.124 × status of miR-21) + (0.387 × status of miR-218) + (0.062 × status of miR-7). We compared the two groups using the t test for continuous variables and χ 2 test for categorical variables.
For survival analyses, we used the Kaplan-Meier method to analyze the correlation between variables and DFS, and the log-rank test to compare survival curves. We used the Cox regression model to perform the multi-variable survival analysis, and Cox regression coefficients to generate nomograms. Calibration plots were generated to explore the performance characteristic of the nomograms. The x-axis represents the prediction calculated with use of the nomogram, and the y-axis represents the actual freedom from cancer recurrence for our patients. The 45-degree dashed line represents the performance of an ideal nomogram. Nomogram and calibration plots were generated with the rms using R software version 3.9.0. Statistical significance was set at 0.05.
We performed multivariate analysis using a backward stepwise approach to test if the signature was an independent prognostic factor of OS, age, gender, Karnofsky score, tumor location, recurrence, MGMT methylation, IDH1 mutation, smoking, or family history of cancer. Individual miRNAs within the 7-miRNA-based classifier were used as covariates. The observed two-tailed significance level was less than 0.001.
We also used R software version 3.1.1 and the "survival ROC" package to perform time-dependent receiver operating characteristic (ROC) curve analysis. We investigated the prognostic accuracy of each feature and multi-miRNA-based classifier using time-dependent ROC analysis. We assessed the area under the curve at different cutoff times to measure prognostic accuracy.

Generation of a circos plot for identification of genes and pathways targeted by the seven-miRNA classifier
The Circos plots show glioblastoma signature genes regulated by miRNA. Several utility tools are bundled with Circos to help analyze, filter, and format data. Circos plots apply simulated annealing to a linked data set to generate an ideogram order that minimizes/maximizes the number of links that cross in the image. R language is a collection of tools that is used to parse tabular data and generate data and configuration files for visualizing tables with Circos.