A novel microRNAs expression signature for hepatocellular carcinoma diagnosis and prognosis

This study aims to identify prognostic microRNAs (miRNAs) biomarkers for diagnosis and survival of hepatocellular carcinoma (HCC) based on large patients cohort analysis. HCC patient cohort data were downloaded from The Cancer Genome Atlas, including paired HCC and adjacent non-cancer tissues. Receiver operating characteristic curve method was used to classify cancer and non-cancer tissues according to microRNAs expression levels. The aberrant microRNAs expression level were ranked and risked for building a prognostic miRNAs signature model. Kaplan–Meier survival was used to analyze the differences among various risk factors in accordance with miRNAs ranking scores. The study showed 33-miRNA signature, 11 were down-regulated and 22 were up-regulated through comparison between cancer samples and non-cancer samples. The maximum correct classification rate is up to 98.7%. Five microRNAs, hsa-mir-3677, hsa-mir-421, hsa-mir-326, hsa-mir-424 and hsa-mir-511-2, significantly correlated with patient survival. The survival rate and time negatively associated with lowering miRNAs index. In the low risk group, over 70% patients showed 5 years survival, while none patients survived longer than 5 years in the high risk group. MiR-424, miR-326 and miR-511 could be applied for HCC diagnostic biomarkers. These five miRNAs were significantly associated with lysosome pathway and D-Glutamine and D-glutamate metabolism pathway via Kyoto Encyclopedia of Genes and Genomes pathway analysis and Gene Ontology annotation. Conclusively, the five miRNAs expression signature could be used as HCC prognostic and diagnostic biomarkers.


INTRODUCTION
In the past 30 years, liver cancer (mostly hepatocellular carcinoma, HCC) is mainly prevalent mostly in Asia and Africa. However it has become a global disease nowadays [1]. In developing countries, HCC is the second leading cause in male cancer death, while it ranked sixth in more developed countries [2]. Up to now, the early screening of hepatocellular carcinoma mainly depends on liver ultrasound and alpha-fetoprotein (AFP). Liver ultrasound is undoubtedly an economical choice with sensitivity of 60%-90% and specificity of above 90% [3]. Even though serum AFP has been utilized for 40 years with sensitivity of 60%-80% and specificity of 70%-90%, respectively [4]. It was found that serum AFP concentration was influenced by the tumor size and cancer stage [1]. Moreover, its rise is also commonly seen in chronic liver inflammation and other diseases. Thus its specificity is not satisfying. The European Association for the Study of Liver (EASL) and American Association for the Study of Liver Diseases (AASLD) guidelines do not even recommend AFP as a diadynamic criteria of hepatocellular carcinoma [5]. The AASLD and EASL guidelines only consider the results made from four-phase computed tomography (CT) and dynamiccontrast magnetic resonance (MR), while the Asian Pacific Association for the Study of the Liver (APASL) concerns the size of the lesion [6]. Unfortunately, even if pathological biopsy was deemed as the gold standard, it still has a high false negative rate, regular follow-up inspection is of great necessity [7].
Many studies focused on exploration of cancer diagnosis and prognosis biomarkers using microRNAs (miRNAs, miRs) expression signature. miRNA as biomarker has its advantages, such as stable and high sensitivity. It is reported that detection of miRNAs in section slides was successfully applied [8]. Tissue specific miRNAs are unique identifiers for tumor origin and type [9]. However, the most prominent advantage would be the high-through put sequencing of miRNAs [10]. It has been demonstrated that combination of miR-10b, miR-106b and miR-181a could discriminated HCC patients from normal controls (area under curve (AUC) of 0.85, 0.82, and 0.89, respectively) [11]. Zhang et al. reported that they found serum miR-143 distinguished HCC from healthy individuals with 71% sensitivity and 83% specificit, and miR-215 with 80% sensitivity and 91% sepcificity [12]. Another study identified a panel of 7 miRNAs (miR-122, miR-192, miR-21, miR-223, miR-26a, miR-27a and miR-801) that provided a high diagnostic accuracy of HCC (AUC for training and validation data set are 0.864 and 0.888, respectively) [13].
In this study, we introduced a novel set of miRNAs for HCC diagnosis and prognosis using TCGA database. Because the combined miRNAs signature have more convincing power than every single miRNA, we ranked risk factor for each miRNA, and scored them.

RESULTS
Identification of a 33-miRNA signature to discriminate HCC from corresponding noncancerous liver tissues miRNAs expression profile of 377 patients were downloaded from TCGA database using TCGA-Assembler [14], specially, 37 paired tumor and non-tumor data were also included. A total of 207 miRNAs were found differently expressed between cancer and noncancer tissue (student's t test, p<0.05). We got 33-miRNA signature by class prediction and clustering of these paired data using MultiExperiment Viewer v4.2 software. The maximum correct classification rate is up to 98.7% for HCC and noncancerous liver ( Figure 1). These 33 miRNAs, in which, 22 were down-regulated and 11 were up-regulated, were listed in Table 1.

miRNAs signature for HCC prediction
We randomly divided the TCGA cohort into two groups: training group and test group respectively, using SPSS software. The training group was used to get the area under the ROC curve using ROC method, and the test group was used to validate effect of having or not having cancer outcome. For the above five miRNAs, we got miR-424, miR-326 and miR-511, which sensitivity and specificity were greater than 0.9 (Figure 2A Figure  2B, p<0.0001). The sensitivity and specificity of miR-125b-2 and miR-451 were greater than 0.85 both in the training group and the test group (Supplementary Figure  2A and 2B, p<0.0001). The signature could be served as a diagnostic marker, and enhanced the accuracy when combined with pathological diagnosis.

Identification of five miRNAs associated with HCC patients survival
In order to identify survival sensitive miRNAs profile, we used ROC curve to discriminate the 33 miRNAs in 304 patients. These patients have completed miRNAs data and clinical data. Among the 33 miRNAs, 8 miRNAs including hsa-mir-3677, hsa-mir-421, hsa-mir-125b-2, hsa-mir-326, hsa-mir-424, hsa-mir-511-1, hsamir-511-2 and hsa-mir-451, showed significantly different outcome after ROC curve analysis. We dichotomized 304 patients according to the miRNAs value comparison to ROC cutoff score, and named high level or low level group. The 8 miRNAs with patients survival time were analyzed Kaplan-Meier survival analysis. Only hsamir-3677, hsa-mir-421, hsa-mir-326, hsa-mir-424 and hsa-mir-511-2, were significantly correlated with patients survival (Figure 3, p<0.05, Log-rank test). The shorter time group was named high risk group, and the longer time as low risk group.

Prognostic five miRNAs signature index for HCC survival analysis
In the procedure of miRNAs signature index definition, we defined high risk group patients getting 1 score, while low risk group getting 0 score for each miRNA. Under this criteria, the highest score would be 5, and the lowest score was 0. We ranked 304 HCC patients according to the miRNAs signature index, and grouped them into 3 groups. High risk group represented miRNAs signature index was above 4 scores, and medium risk group was 2-3 scores, when low risk group was below 2 scores. Kaplan-Meier survival analysis showed these 3 groups are significantly correlated with patients survival (Figure 4). In the low risk group, over 70% patients showed 5 years survival, while none patient survived longer than 5 years in the high risk group (Figure 4, blue line as low risk, red line as high risk). In the result, we could also conclude that the survival rate and time increase companied with lowed miRNAs signature index. MiR-3677 and other miRNAs showed less sensitivity and specificity in cancer prediction (less than 0.7, data not shown)

Kyoto encyclopedia of genes and genomes (KEGG) signal pathway and gene ontology (GO) annotation of five miRNAs predicted genes
We used KEGG pathway to analyze the 5-miRNA potentially down-or up-regulated genes in light of their pivotal role in patients survival prognosis and disease diagnosis. KEGG pathway was usually facilitated to illustrate all the associated pathways containing differentially expressed genes. Interestingly, only four significant pathways were shown even the p-value was larger than 0.05 (Table 2), including lysosome pathway and glutamate metabolism pathways. In Go annotation, three main classes of processes were distinguished according to the ordering of the ontology system. They are biological processes, cellular components and molecular function. In this study, we defined the significantly process with p-value less than 0.05. In molecular functions, 19 processes were exhibited to be associated with 5-miRNA predicted genes (Table 3). In the table, many processes were involved in membrane protein transporter and enzyme activity function, especially, glutamate transmembrane transporter activity and dehydrogenase activity. Over 34 processes were significant shown in the GO term biological process (Supplementary Table 1). The four most genes involved processes were membrane transporter, organic development and function associated, cell growth and ubiquitination. In cellular components, endomembrane system and plasma membrane part consisted of the main process (Supplementary Table 2, GO:0012505 and GO:0044459).
When combined the analyses of KEGG and GO annotation, we considered that the most significantly associated pathway was lysosome pathway, because it is tightly linked to cell metabolism and dead cell clearance, and to organ development, cell growth and ubiquitination in GO term. Another appealing pathway was the D-Glutamine and D-glutamate metabolism pathway in KEGG analysis. Many processes and functions indicated that membrane transporters were pivotal in GO term, which might be involved in glutamine-axis function. However, the map of KEGG and GO annotation is too large to get a very specific result or to draw a convinced conclusion. The axis of miRNA-targets-function-disease in HCC diagnosis and prognosis needs to be deeply studied.

DISCUSSION
miRNAs are promising biomarkers for cancer diagnosis and prognosis. Increasing studies on miRNAs as biomarkers have been reported. Wei et al. conducted a microarray consisting of 683 miRNAs used in profiling 166 hepatocellular carcinomas in South China, and found 4 miRNAs (hsa-mir-451, hsa-mir-766, hsa-mir-103 and hsa-mir-18a) were potential biomarkers [15]. In another study, an integrated miRNA signature was identified from 26 published HCC datasets, and validated by TCGA database. Three miRNAs were the same with our findings: hsa-mir-214, hsa-mir-145 and hsa-mir-199a [16]. In our study, a 33-microRNA signature was identified to discriminate hepatocellular carcinoma from corresponding noncancerous liver tissues, among which 5 miRNAs (hsamir-3677, hsa-mir-421, hsa-mir-326, hsa-mir-424 and hsa-mir-511-2) were identified significantly associated with patients survival. In addition, five miRNAs (miR-424, miR-326, miR-511, miR-125b-2 and miR-451) were identified to provide high diagnostic accuracy of HCC. The main reason might be probably that miRNAs expression profiles varied considerably in different studies due to different technological platforms and sample conditions, like stage and pathological grading.
The miRNAs panel also differentiated HCC from the healthy (AUC 0.941), chronic hepatitis B (AUC 0.842), and cirrhosis (AUC0.884), respectively [13]. MiRNA-21 ROC analysis showed an AUC of 0.773 with 61.1% sensitivity and 83.3% specificity when differentiating HCC from chronic hepatitis [17]. The combination of miR-16, AFP, AFP-L3 and DCP correctly identified HCC from healthy controls up to 92.4% sensitivity and 78.5% specificity [18]. To our disappointment, we did not find the association between miRNAs signature and HBV/HCV, or AFP, or other routine biomarkers. It suggested that there are some differences between the western and eastern countries in terms of tumorigenesis and tumor progression of HCC. Chinese patients are mainly infected with HBV whereas most US patients usually carry HCV instead.
Moreover, we have taken Go annotation into consideration. The result of GO annotation showed glutamate transmembrane transporter activity and dehydrogenase activity were the most related cellular    TPK1, ACTR3, ACTR2, ANKRD17, PAK2, AAK1,  DHX33, TLK2, EIF2B2, ARL5B, AKT3, RBM12, RHOH,  RAP2B, ARL1, KIF5C, PRKCI, PPARGC1A, RAD50,  MARK1, RND3, ACVR2A, KIF1B, RFK, TESK2, STEAP2,  RAB10, SLC27A4, GLUD2, GLUD1, MKNK1, IGF2BP3,  EPHB4, MAP3K2, CHD2, PPIL4, MSI2, IDH1, DCLK2,  HCN4, POLQ, MYO5B, RAB2A, DNM1L, ELAVL2,  ELAVL3, DOCK8, RAB33B, SIRT3, MEF2D, HSPA12B,  PLK2, ILF2,  components and molecular function. This further confirmed our conclusion. In KEGG pathway analysis of five miRNAs potential targets, several genes were reported to increase HCC development. Lysosome pathway makes organelle and protein homeostasis and acts as a cell survival mechanism under a variety of stress conditions [19]. AKT/mTOR activates autophagic lysosome pathway, which is regarded as autophage or apoptosis switch in injured liver cell fate [19]. It is reported impaired lysosomal maturation may be crucial to the carcinogenesis of HBV-related HCC [20]. Autophagy is an important mediator for the suppression of liver tumorigenesis. Its deficiency is associated with a poor prognosis of HCC [21]. The HCC development was also associated with expression of early HCC markers (glutamine synthetase, glypican 3, heat shock protein 70, and the serum marker AFP) [22][23][24]. The neoplastic nature of the HCC was confirmed by histology and expression of the HCC marker glutamine synthetase (GS) [25]. GS is the target of the Wnt/beta-catenin pathway in the liver, therefore, glutamine metabolism by beta-catenin is a contributing factor to HCC development [26]. Several miRNAs of our signatures have not been reported previously in HCC, which may provide a novel molecular approach for HCC diagnosis and prognosis. Our 33-microRNA signature was also essential for identifying potential targets for HCC therapy and monitoring the tumor progression and recurrence. As HCC is a highly complex, multi-factorial and heterogeneous disease, many miRNAs are dysregulated during tumorigenesis and progression. Therefore, a combination of multiple circulating miRNAs or a plasma/serum miRNA panel could provide more accurate information than just one single miRNA for the diagnosis and prognosis of HCC. Moreover, the combination of serum/plasma miRNAs with already established markers (such as AFP, FP-L3 and DCP etc.) may also improve the performance of HCC diagnosis [27]. In spite of a great amount of evidence for the presence of circulating miRNAs, their functions and mechanisms have not thoroughly been clarified yet. We have to agree that there are several limitations in the study design and the findings. The pivotal limit of the study design is the lack of cross-validation with different HCC patient cohort. Such cohorts could be from other database, or our collected patient data. After cross-validation with other cohort, the miRNAs signature for diagnosis and prognosis will be more convincing. Secondly, the finding of tissue miRNAs signature limits the use of early diagnosis for HCC detection, unless such signature is tested in serum samples. Actually, we are building up a database related to HCC patient which belongs to our cancer institute. The data includes gene expression array, miRNA expression profile, gene promoter methylation array and clinical laboratorial markers from serum or tissues. We look forward to solving the above limitations after the accomplishment of the database.

Patients selection
The results shown here are wholly based upon data generated by The Cancer Genome Atlas (TCGA) Research Network: https://gdc.cancer.gov/. The expression profile of 1042 miRNAs was from 377 patients who were diagnosed of HCC and 37 normal controls. Thirty-seven paired tumor and non-tumor tissues data were also included. Due to the missing follow-up survival data, we excluded those samples which survival status was not recorded. Actually, 334 samples were used for further analysis. The flowchart of the study was shown in Supplementary Figure 1.

Setting cutoff score based on ROC curve
We used the maximum value of the sum of specificity and sensitivity as a cut-off point for each miRNAs [28,29]. For each miRNAs, two groups were separated, and named as high level or low level group. Kaplan-Meier survival analysis was employed to evaluate the 8 miRNAs' correlation with patients survival.

Prognosis prediction using miRNAs scoring
In the survival analysis, patients were stratified into the low risk group and high risk group based on their expression level of the 5 miRNAs which were significantly correlated with patients survival. Higher level was considered as high risk group, lower level as lower risk group. We defined high risk group patients getting 1 score, while low risk group getting 0 score for each miRNA. Therefore, the highest score would be 5, and the lowest score was 0.

Statistics
Statistical analyses were used Statistical Package for the Social Science (SPSS) 13.0, except for the hierarchical clustering analysis (HCL) was conducted by MultiExperiment Viewer version 4.2. Student's t test was applied to analyze the differentially expressed miRNAs. Receiver operating characteristic (ROC) analysis was used to discriminate 8 miRNAs which were significantly different among the 33 miRNAs, and to classify HCC and noncancerous liver samples. Kaplan-Meier survival analysis was employed to evaluate the correlation of 8 miRNAs with patients survival. All p<0.05 were marked as significantly different.