Seven protective miRNA signatures for prognosis of cervical cancer

Cervical cancer is the second cause of cancer death in females in their 20s and 30s, but there were limited studies about its prognosis. This study aims to identify miRNA related to prognosis and study their functions. TCGA data of patients with cervical cancer were used to build univariate Cox's model with single clinical parameter or miRNA expression level. Multivariate Cox's model was built using both clinical information and miRNA expression levels. At last, STRING was used to enrich gene ontology or pathway for validated targets of significant miRNAs, and visualize the interactions among them. Using univariate Cox's model with clinical parameters, we found that two clinical parameters, tobacco use and clinical stage, and seven miRNAs were highly correlated with the survival status. Only using the expression level of miRNA signatures, the model could separate patients into high-risk and low-risk groups successfully. An optimal feature-selected model was proposed based on two clinical parameters and seven miRNAs. Functional analysis of these seven miRNAs showed they were associated to various pathways related to cancer, including MAPK, VEGF and P53 pathways. These results helped the research of identifying targets for targeted therapy which could potentially allow tailoring of treatment for cervical cancer patients.


INTRODUCTION
Cervical cancer is the second cause of cancer death in females in their 20s and 30s. It was estimated that there would be 12,900 new cases and 4,100 death in the United States in 2015 [1]. Patients with cervical cancer may have significantly different clinical outcomes, which are hard to predict. If the group of high-risk patients were identified, they could got modified treatment including higher radiation doses. As a result, however, it is difficult to apply personalized treatment to patients with cervical cancer due to the fact we cannot predict the outcome using prediction models to date, which may improve clinical outcomes significantly.
The expression level of miRNA have been considered as potential biomarkers for predicting survival status for various human cancers [2]. However, there were few studies that investigated the prognostic value of miRNAs in cervical cancer. Only Hu et al [3] found two miRNA biomarkers for cervical cancer prognosis with a smaller size of patients, but other studies [4] focused on specific miRNAs and cell line experiments instead of real patients survival status. Given the fact that the poor survival for patients with advanced stages of cervical cancer, here we aim to identify miRNA which can predict the prognosis of cervical cancer and study their functions by analyzing their targeted mRNAs, which may be used to predict the survival time and help determine the therapy in the future.

Clinical parameters related to prognosis
After filtering, 166 censored and 48 non-censored patients were used in the following analysis. Their clinical characteristics were shown in Table 1. Selected clinical parameters were utilized to fit the Cox's model Table 2. Parameters which were unavailable in many patients, like whether lymphovascular was involved or whether the patient was infected with HPV, were not included, even though lymphovascular involvement had a significant p-value and HPV infection is an important cause of cervical cancer. Among these clinical characteristics, only tobacco use and clinical stage had a p-value less than 0.05. On the other hand, tumor grade or histologic diagnosis was not significant.

miRNA signature related to prognosis
Seven miRNAs had an FDR less than 0.05, which were regarded as significantly correlated with survival time ( Table 3). All of these miRNAs have protective impacts on the survival time based on relationship between their expression and survival status; that is, the high expression of these miRNAs leads to a longer survival time.

Performance of miRNA signature
After building the multivariate Cox's model, the hazard ratio was assigned to each patients. The comparison of hazard ratio with patient survival time was shown in Figure 1A. Patients were divided into two groups evenly. Most of patients who died of cancer and survived for a relatively shorter time were located on the right side of the cutoff. To ensure that the selected miRNA signature was able to separate patients into low-risk and high-risk groups, the Kaplan-Meier curve was plotted using the group separation of hazard ratio ( Figure 1B). The p-value, representing the separation of two groups, was only 1.13E-08 by log-likelihood test.
The heat map ( Figure 2) showed that when the patients were clustered using scaled expression level of seven miRNA signatures, these miRNAs was able to separate the patients into low-risk and high-risk group, which indicated that the high expression of these miRNAs led to a lower hazard ratio and a longer survival time.

Multivariate model with clinical parameters
With seven miRNAs and two clinical parameters, the multivariate model only had two variables having a p-value lower than 0.05. As a result, after building the multivariate model, a feature-selected procedure was took. The final model contains miR-150, miR-3607, miR-378c, miR-642a, smoking status and clinical stage. Due to the amount of variables in the original model, we believed this final model was more suitable to use in clinics. Considering that the arbitrary coefficients change with different methods or kinds of microarray used, they had no value to be shown here. However, we proposed to utilize the expression levels of four miRNAs and two clinical characteristics as signature for predicting prognosis of cervical cancer, which should be tested in the wet lab or in clinics.
After splitting patients into the train group and test group randomly, the final model highly depended on how the patients were separated. The main reason is the limited non-censored patients (only 48) and the within-group variations. To test the final model, other independent data set is needed.

Validated mRNA targets
Nowadays, tools for predicting mRNA targets of miRNAs, which may cause high false positive and false negative rate, are still immature. As a result, we used validated targets of miRNA from miRTarBase [5] to finish the following analysis (Supplementary Table S1). Still, considering the high false positive hit of NGS data, only mRNAs with strong evidences were left. Some of miRNAs didn't have any mRNA targets with strong evidences, so they were neglected here. The validated miRNA-mRNA interactions were shown in Table 4.

Gene ontology and pathway enrichment
Gene ontology and pathway enrichment using online tool STRING [6]. The gene ontology included here are just slim gene ontology, which gives an overview without the details of the specific terms [7]. Interestingly, the enriched gene ontology seemed more related to embryo development than cancer. In the category of biological processes, only homeostatic process had close relationship with the cancer progression. On the other hand, other three terms were all related to structure development, which might be induced by the fact that these miRNAs were obtained from cancer in cervix, which in turn bind to mRNA highly expressed in cervix ( Table 5).
Most of the enriched KEGG pathways were associated with cancer, which supported our predicted miRNAs that they played important roles in the cancer development and progression (Table 6). Six out of eight enriched pathways were involved in cancer directly, except that two pathways were associated with virus infection. Supplementary Table S2 contained more enriched KEGG pathways which had adjusted p-value with method Bonferroni less than 0.05. Most of the additional pathways were also associated with cancer or virus infection. STRING [6] was used to visualize the protein interaction. Figure 3A should the network with methods of www.impactjournals.com/oncotarget neighborhood, gene fusion, co-occurrence, co-expression, experiments, databases and text-mining. This network was enriched in interaction, which assumed that these protein worked altogether and joined in the related pathways.

Network analysis of targeted miRNAs
After removing evidences with lower confidence and only using data experiments, a network with fewer genes was built ( Figure 3B). In both networks, TP53 and EP300 were both the centers and had high degrees, indicating the important roles of them in the prognosis of cervical cancer. In fact, both of TP53 and EP300 are the targets of human papillomavirus (HPV), which is the top risk factor of cervical cancer.

DISCUSSION
In this study, we identified two clinical parameters and seven miRNAs as biomarkers for the prognosis of cervical cancer. All of these miRNAs are shown to have protective impacts on the survival time in this study, whose high expression leads to a longer survival time.
Compared with results of Hu, Schwarz [3], they found that only the number of lymph node involved in cervical cancer was negatively associated with the disease outcome. In this study, lymph node was also significant after removing missing data. However, we found that clinical stage and tobacco use were both highly correlated with survival time. In fact, tobacco use is an important risk factor for cervical cancer, which was ignored in the study of Hu, Schwarz [3]. Clinical stages were simplified into four stage; that is, I, II, III and IV. They were treated as numeric. % Significant variables had a p-value less than 0.05 It was reported that miR-142 worked as a cancer suppressor in colon cancer [8]. On the meantime, it can be used as one part of prognosis predictor in gastric cancer and esophageal squamous cell cancer [9,10]. Few studies between miR-642a and cancer were found. For miR-101, it was proposed that the loss of it lead to overexpression of EZH2 and cancer progression in further [11,12]. miR-502 inhibited colon cancer growth by a negative feedback loop with p53 [13] and was correlated with early development of breast cancer [14]. miR-378c was downregulated in the rectal cancer and gastric cancer [15,16] but its expression may enhance cell survival and tumor growth [17]. This result fought against its protective role in this study. miR-510 was suggested to be a potential biomarker of colorectal cancer and lower expression suggested a shorter survival time [18]. However, Wu, Jin [19] reported that  overexpression of miR-150 resulted in proliferation and growth of gastric cancer. Such contradiction may come from the fact that each miRNA has dozens of targets and iteither promotes or inhibits the cancer through different pathways based on the cancer types and specific cellular environment.
Most of the seven miRNA predictors were reported to play roles in the cancer, which strengthened the possibility to use the seven miRNA signature as predictor of prognosis of cervical cancer. The mRNAs regulated by miRNA signatures were enriched in pathways which are involved in cancer progression orprognosis. The most significant pathway enriched was microRNAs in cancer, which was certainly accordance with analysis procedures. The results suggested that our obtained seven miRNA signature were in fact highly correlated with cancer.  Pathways of colorectal cancer and pancreatic cancer were enriched, while no cervical cancer pathway was obtained since the specific pathway of cervical cancer is unavailable in KEGG. MAPK and VEGF signaling pathway were enriched, which are critically important pathways in cancer development. It was also reported that oncoprotein E5 activated the MAPK pathway and helped the activities of E6 and E7 [20]. Another stuff should be noticed is the HTLV-I infection and Hepatitis B pathway. Infection of both viruses may lead to cancer, and involve MAPK and P53 signaling pathway [21]. Interestingly, oncoprotein E6 of HPV targets TP53 and EP300 and results in inhibition of apoptosis, cancer proliferation and accumulations of mutations shown in Figure 3C [21]. When it comes to the results of protein network, TP53 and EP300 were both the centers of the network [22].
Moreover, due to the amount of variables in the original model, we believed our final model was more suitable to use in clinics. Considering that the arbitrary coefficients change with different methods or kinds of microarray used, they had no value to be shown here. However, we proposed to utilize the expression levels of four miRNAs and two clinical characteristics as signature for predicting prognosis of cervical cancer, which should be tested in the wet lab or in clinics.
To summary, our study figured out seven miRNAs which had protective impacts on the survival status of cervical cancer patients, which suggested that these miRNAs played a critical role in the pathogenesis, progression and prognosis of cervical cancer. Patients with a higher hazard ratio had worse survival status in the Cox's model. These miRNAs play their role by binding to mRNAs enriched in various pathways related to cancer. mRNAs may affect the prognosis of cervical cancer through MAPK, P53 and VEGF pathway. These results helped the research of identifying targets of targeted therapy. The only drug of targeted therapy now used in advanced cervical cancer is bevacizumab and targets the   [23]. This study suggested that drugs targeted the MAPK and P53 pathways may also work.

TCGA cervical cancer dataset
Expression levels of miRNA and clinical information of patients with cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) were retrieved from The Cancer Genome Atlas (TCGA). Level 3 data of miRNA expression level, including 1046 miRNAs, were downloaded. The dataset contained 307 patients with solid tumor, 2 patients with metastatic tumor and 3 samples from normal solid tissues. Only patients with solid tumor were used in this analysis considering the limited size of samples with metastatic tumor. The clinical information and follow-up data of these patients were also obtained to extract the information of recurrence time.
Patients were filtered by the following criteria. First, patients who did not have tumor recurrence but died were removed. Second, the cases that patients were alive and the last contact days were unavailable were discarded. Third, patients with less than 50-day follow-up were removed, which were all censored data.

Univariate survival analysis
Clinical information was used to build univariate Cox proportional hazard ratio model [24], including clinical stage, tumor grade and so forth. If one parameter was unavailable in certain patients, these patients were treated as missing. Significant parameters were filtered out using 0.05 as the cutoff.
Each miRNA was fitted to univariate Cox's model as well [24]. FDR adjustment was carried out on the results of miRNA expression data and significant miRNAs were selected with a cutoff of 0.05. Seven miRNAs were identified as biomarker of prognosis.
Next, whether these miRNAs can really separate patients into high-risk and low-risk groups was checked. A multivariate Cox's model was built based on significant miRNA expression data first and patients were separated The network predicted only with experiments. Disconnected nodes were hidden. This network was not enriched in interaction. C. KEGG pathway related to cervical cancer. Only the part related to results of this study is plotted. This graph was reorganized based on viral carcinogenesis (hsa05203) from KEGG. www.impactjournals.com/oncotarget into two groups using proportional hazard ratio [25]. Median of the all hazard ratios was used as a cutoff to evenly separate the patients in this case. The log-rank test [26] was carried out to get the quantitative measure (p-value) of the separation of the two groups and Kaplan-Meier curves [27] was plot to visualize such separation. Moreover, using scaled expression level of these seven miRNAs, a heat map was plotted with the annotation of high-risk or low-risk group to visualize the relationship between expression level and groups.

Multivariate survival analysis
On the meantime, the clinical information was considered to be significant and was included into the Cox's model. Seven miRNAs, together with two clinical characteristics, were used to build the model. After that, an optimal formula was constructed using Akaike information criterion (AIC) based on the log-likelihood ratio [28] from this model, which was believed to be a better model.

GO and pathway of targets of miRNAs
Validated targets of miRNAs were available in miRTarBase [5]. Only miRNA-target interactions with strong evidences, either by reporter assay or western blot, were extracted. Gene ontology enrichment [7] and pathway enrichment [21] were carried out based on these validated targets using online tool STRING [6]. The interactions among products of these targets were visualized by the same tool.