Hepatocellular carcinoma associated microRNA expression signature: integrated bioinformatics analysis, experimental validation and clinical significance.

microRNA (miRNA) expression profiles varied greatly among current studies due to different technological platforms and small sample size. Systematic and integrative analysis of published datesets that compared the miRNA expression profiles between hepatocellular carcinoma (HCC) tissue and paired adjacent noncancerous liver tissue was performed to determine candidate HCC associated miRNAs. Moreover, we further validated the confirmed miRNAs in a clinical setting using qRT-PCR and Tumor Cancer Genome Atlas (TCGA) dataset. A miRNA integrated-signature of 5 upregulated and 8 downregulated miRNAs was identified from 26 published datesets in HCC using robust rank aggregation method. qRT-PCR demonstrated that miR-93-5p, miR-224-5p, miR-221-3p and miR-21-5p was increased, whereas the expression of miR-214-3p, miR-199a-3p, miR-195-5p, miR-150-5p and miR-145-5p was decreased in the HCC tissues, which was also validated on TCGA dataset. A miRNA based score using LASSO regression model provided a high accuracy for identifying HCC tissue (AUC = 0.982): HCC risk score = 0.180E_miR-221 + 0.0262E_miR-21 - 0.007E_miR-223 - 0.185E_miR-130a. E_miR-n = Log 2 (expression of microRNA n). Furthermore, expression of 5 miRNAs (miR-222, miR-221, miR-21 miR-214 and miR-130a) correlated with pathological tumor grade. Cox regression analysis showed that miR-21 was related with 3-year survival (hazard ratio [HR]: 1.509, 95%CI: 1.079-2.112, P = 0.016) and 5-year survival (HR: 1.416, 95%CI: 1.057-1.897, P = 0.020). However, none of the deregulated miRNAs was related with microscopic vascular invasion. This study provides a basis for further clinical application of miRNAs in HCC.


INTRODUCTION
Hepatocellular carcinoma (HCC) is one of the most common cancers worldwide and the third most common cause of cancer-related death worldwide [1]. The prognosis of HCC is very poor, with a median survival of 6 to 20 months and less than 5% of symptomatic patients surviving more than 2 years. Identification of new biomarkers for the early detection of HCC is critical for the patients to receive proper therapeutic treatment as early as possible.
MicroRNA (miRNA), as a class of short noncoding RNA molecules, controlling approximate one third of the protein-coding genes by posttranscriptional regulation of gene expression can directly or indirectly affect almost all cellular pathways [2]. Recent years, the number of miRNA profiling studies has increased rapidly. Specific miRNA aberrations involved in cancer development and progression have been identified by high-throughput technologies across different normal and cancer tissues [3][4][5]. Therefore, many miRNAs are proposed as promising biomarkers for early detection of HCC and accurate predictions of prognosis, as well as targets for treatment [6,7]. Unfortunately, the common drawback of miRNA expression profiling studies is a lack of agreement due to many factors including application of different technological platforms, small sample size, inconsistent annotation, ongoing discovery of novel miRNAs, and use of different methods for data processing and analysis [8][9][10].
To overcome these limitations, we need to integrate their results in an unbiased manner. Robust rank aggregation (RRA) approach has been specifically designed for comparison of several ranked gene lists [11]. The tool looks at how each item is positioned in the ranked lists and compares this to the baseline case where all the preference lists are randomly shuffled. As a result, a P-value would be assigned for all items, showing how much better it was positioned in the ranked lists than expected by chance. This P-value is used both for reranking the items and deciding their significance. RRA is a suitable and effective solution for identification of statistically significant miRNA integrated-signature and is particularly useful when input experiments are performed by different technological platforms cover different sets of genes and full rankings of miRNAs are not available.
Herein, we would use this integrated bioinformatics approach to obtain a consistent miRNA expression signature as well as novel miRNA-regulated molecular pathways that contribute to the pathogenesis of HCC, which would help to prioritize the putative targets for further experimental studies of HCC development. Moreover, we further validated the most dysregulated miRNAs in a clinical setting.

Characteristics of the datasets
According to the inclusion criteria, 26 independent full-text studies retrieved from public databases (GEO, ISI Web of Science, and ArrayExpress) were used to build the 26 HCC miRNA expression profiling datasets. Of the 26 HCC miRNA expression profiling datasets, 1 profiling datasets were re-analyzed in GEO DataSets [12], while the others were extracted from the published studies directly. The description of the studies was provided in Supplementary Table 1.
Our integrated dataset included a total of 1250 pairs tumor and adjacent noncancerous tissues. The number of samples investigated ranged from 8 to 241 pairs (median 21) across the studies. Various microarray platforms were used in the studies (either commercial or custom) and the median number of miRNA probes assayed was 384 (ranging from 114 to 208818). Distribution of HCCspecific miRNA alterations as reported by primary studies was shown in Figure 1, which reflected the expansion of studied miRnome from 2006 to 2013. In total, 278 significantly up-regulated miRNAs and 231 significantly down-regulated miRNAs were reported in at least one study, respectively. In addition, 39 discordant alteration miRNAs were reported, indicating that they both up-regulated and down-regulated across the different studies. The number of significantly dysregulated miRNAs varies greatly across the studies. At least two upregulated or downregulated miRNAs were reported in each study, with the exception of GL, 2007 [13], in which only one up-regulated miRNA was reported. The deregulated miRNA lists varies across the studies (Supplementary Figure 1). Finally, the rank matrixes of normalized upregulated and downregulated miRNAs lists were separate constructed (Table 1).

Targets prediction and functional enrichment
The high-stringency target prediction for validated miRNAs was conducted. Target genes were obtained from both prediction algorithms and experimentally supported databases. The counts of predicted targets, experimentally validated targets, prediction based consensus targets, and consensus targets were summarized in Supplementary  Figure 4. miR-21-5p, miR-195-5p, and miR-214-3p had highest number of consensus targets, whereas miR-199a-3p were the miRNAs with smallest number of consensus targets. In addtion, we performed enrichment analyses to elucidate the biological function of miRNA integrated-signature using target genes. Finally, 72 Panther pathways, 143 KEGG pathways, and 857 GO processes were enriched with the miRNAs targets.
The top enriched panther pathways maps regulated by the miRNAs converge on apoptosis signaling pathway, interleukin signaling pathway, angiogenesis, PDGF signaling pathway, PI3 kinase pathway, p53 pathway feedback loops 2, and Ras Pathway, most of which are known to play an important role in carcinogentics ( Figure 6). KEGG pathways that were significantly enriched with the miRNAs targets were mainly associated with cancer, e.g. pathways in cancer, chronic myeloid leukemia, pancreatic cancer, colorectal cancer, prostate cancer, and MAPK signaling pathway (Supplementary Figure 5). The miRNAs target genes are statistically enriched in GO processes of regulation of transcription. Ten pathways and GO processes most strongly enriched by integrated-signature miRNA targets were shown in Supplementary Table 2.
Furthermore, to evaluate association between these pathways and HCC, the published papers which described HCC related constituent objects in the pathways were searched in PubMed. The pathway maps in which the constituent objects were supported in previous literature were considered to be HCC-related. After text mining, 39 Panther and 71 KEGG pathways pathways were found to be HCC-related. To visualize the most significantly enriched pathways, volcano plots were constructed by plotting the -log10 of p-value versus gene enrichment ratio on the y-and x-axes, respectively. Finally, 12 out of the 39 Panther pathways and 16 out of the 71 KEGG pathways were highly saturated with HCC (enrichment ratio > 0.15, p-value < 0.0001) (Figure 7, Table 2).

DISCUSSION
miRNA profiling efforts have often led to inconsistent results between the studies. Systematic review or meta-analysis has been done previously to determine differentially expressed genes in cancer at the gene level [32,33]. However, such rigorous approach is often not possible due to the lack of cross-platform standardization of miRNA profiling technologies or the unavailability of raw data. In current study, we overcame the drawback of lack of agreement among miRNA expression profiling studies in HCC using RRA method which directly analyzed 26 prioritized miRNA lists detected from a total of 1250 paired HCC tissues and adjacent noncancerous tissues. The miRNAs would be reranked and their significance would be re-decided. A true combined P value were calculated for each miRNA. In addition, we determined the overlap among many studies using different platforms and observe which miRNAs are consistently reported as differentially expressed. Finally, an integrated-signature of 5 upregulated and 8 downregulated miRNAs was identified. The miRNAs from every dataset matched at least one of the integratedsignature miRNAs. These integrated-signiture miRNAs likely showed biological relevance to the tumorigenesis of HCC, as opposed to sporadically reported genes.
Futhermore, we attempted to divide the patients into smaller subgroups according to etilogies of HCC. Most cases of HCC were secondary to either a viral hepatitis infection (hepatitis B or C) or cirrhosis (alcoholism being the most common cause of hepatic cirrhosis). However, in the included datasets, 8 datasets derived from HBV related HCC samples, 1 from samples with HCV infection, while the others were muti-etiologies. Therefore, the only subset of datasets derived from HBV related HCC samples was reanalyzed. The results showed that 6 aberant miRNAs involving only HBV related HCC samples were included in the integrated-signature miRNAs with overall HCC datasets. This indicated that miRNA profile of the HBV related and non-HBV-related HCC might not be fundamentally different and the most significant aberrations probably reflect the mechanisms that are common to all subtypes of the disease.
Most integrated-signature miRNAs were known to be functionally associated with cancer development. miR-93 has been identified as a potential biomarker for detection of many cancers [34,35]. Downregulation of miR-93 expression could reduce cell proliferation and clonogenicity of HepG2 cells. Furthermore, it is shown to directly target some tumor-suppressors [36,37]. Many of validated targets of miR-93 were notably associated with the regulation of angiogenesis, apoptosis, and cell cycle regulation [38,39]. Previous evidence has shown that miR-224 and miR-222 may function as an onco-miRNA in HCC cells by activating AKT signaling [15,40]. Furthermore, miR-224, miR-222 and miR-221 were increased in HCC tissues and might be an independent poor prognostic factor [41,42]. miR-21 has been validated as specific biomarker for many cancers, including HCC [43,44]. miR-223, miR-199a, miR-145, miR-195 and miR-130a are commonly repressed in hepatocellular carcinoma. Downregulation of miR-214 contributes to HCC via activation of the HDGF paracrine pathway for tumor angiogenesis [8]. miR-214 could inhibit the tumorigenesis of HCC through suppression of β-catenin. miR-195 may exert its tumor suppressive function by decreasing the expression of multiple NF-κB downstream effectors by way of the direct targeting of IKKα and TAB3 in HCC. miR-150 is not extensively studied in HCC, but it functions as a tumour suppressor in human colorectal cancer by targeting c-Myb, an important pro-invasive molecule [45]. In addition, the results showed that corrected p-values of integrated-signature miRNAs were less than 1.0E-03 (Table 1). Interestingly, miR-122, as the most frequent miRNA in the liver, and a central player in liver biology and disease, was not part of HCC integrated-miRNA signature. miR-122 has been shown to be an essential regulator in the development of HCC [46]. miR-122 was found among downregulated miRNAs (4 studies), but did not reach the statistical significance in our integrated-analysis.
To determine whether these 13 miRNAs have been previously validated to have diagnostic/prognostic values as biomarkers in HCC, we also performed a validation experiment, and our data confirmed that miR-93-5p, miR-224-5p, miR-221-3p and miR-21-5p were up-regulated and miR-214-3p, miR-199a-3p, miR-195-5p, miR-150-5p and miR-145-5p were down-regulated in HCC tissues, which further supported the findings obtained in the present integrated bioinformatics analysis. Consistent with our initial analysis, 11 miRNAs were found to be significantly dysregulated in HCC tissues in TCGA data base, except miR-199a-5p and miR-199a-3p which were not listed. In current study, as was validated by qRT-PCR, the miRNAs were all expressed in liver tissues. Therefore, this miRNA panel might be novel potential biomarkers for the diagnosis of HCC. The miRNA based score using LASSO regression model provided a high classification accuracy of HCC tissue. Further studies could be performed to evaluate the diagnostic value of the miRNA expression signature in HCC. In addition, the target genes enrichment analysis suggested that the validated miRNAs were key regulatory drivers of the oncogenic process, which indicated very strong impact on several pathways related to signaling, regulation of transcription and tumor development. Therefore, these miRNAs may be good candidate biomarkers for diagnosing or monitoring remission during postoperative follow-up in HCC. In current study, tumor grades were also identified by some of the 13 miRNAs (miR-93-5p, miR-222-3p, miR-221-3p, miR-21-5p and miR-214-3p). Using LASSO regression, the signature can separate patients into well-differentiated and moderately/ poorly differentiated tumor grades and may have clinical utility for decisions on patient management. However, none of the 13 most deregulated miRNAs was related with MVI in our initial analysis and TCGA data. Furthermore, we used Cox regression analysis to build a prognostic classifier, by which only miR-21 was selected.
The biological function of each validated miRNA were thoroughly investigated in our study. A single miRNA may target multiple target genes, and a specific mRNA may be regulated by many different miRNAs, which allow the miRNAs to induce changes in various pathways and processes and to present a further level of mechanism via which HCC may be induced. Supplementary Table 2 listed the ten most strongly enriched pathways and GO processes. The most significant pathways enriched in KEGG and Panther pathway by targets of rank aggregation miRNAs were pathways in cancer and apoptosis signaling pathway respectively, which highlighted the essential roles of miRNAs in cancer development. Regulation of transcription, known as the primary functions of miRNAs, was ranked first in the in the GO processes list. In functional enrichment analysis, when mapped to higher functional levels, inconsistent microRNA lists could fall within the same functional modules, pathways or networks and become more consistent. A better understanding of the functions of the miRNAs would advance their use in clinical settings. In addition to the known pathways in HCC tumorigenesis, we also performed text mining at pathway to evaluate C. 5-year survival analysis X-tile plots are shown in the left panels. The plot showed the chi-squared log-rank values created when the cohort was divided into two groups. The optimal cut-point highlighted by the black circle in the left panels is shown on a histogram of the entire cohort (middle panels) and a Kaplan-Meier plot (right panels).
the relevance of the enriched pathways in HCC. Sixteen KEGG pathways and 12 Panther pathways were shighly saturated with HCC ( Figure 6).
Although our analysis was limited to comparison and validation between tumor and noncancerous tissue only, the 13 most significantly and consistently reported differentially expressed miRNAs could be used as potential diagnostic and/or prognostic biomarkers. In a clinical setting, sufficient sensitivity and specificity of the panel of miRNAs should be determined in the further welldesigned clinical studies. Furthermore, targets prediction and functional enrichment analysis may provide a clue for elucidating the role of miRNAs in tumorigenesis of HCC and the precise underlying mechanisms. Taken together, the findings of the current study may have substantial clinical significance or implications.
In conclusion, a HCC associated microRNA expression signature, consisting of 11 highly significant and consistently dysregulated miRNAs, were identified in our integrated bioinformatics analysis and experimental validation study, which may be potential candidate biomarkers for HCC. The rigorous evaluation of integrated-miRNA signature and functional enrichment analysis of their targets were promising them as candidates

Studies selection and datasets
Gene Expression Omnibus (GEO, www.ncbi.nlm. nih.gov/geo/), ISI Web of Science (thomsonreuters.com/ web-of-science/), and ArrayExpress (www.ebi.ac.uk/ arrayexpress) were searched for hepatocellular carcinoma miRNA expression profiling studies that had been published prior to December 31 st , 2013. The search strategy was based on a combination of (mirna* OR microrna* OR mir-*) AND profil* AND ((liver AND (cancer* OR tumor* OR tumour* OR carcinoma)) OR (hepato* AND (cancer* OR tumor* OR tumour* OR carcinoma)). Citations of retrieved articles were also screened. Only original experimental articles published in English language were included. Full text of each study was carefully evaluated. The studies analyzed miRNA expression between HCC and noncancerous liver tissue in human were further analyzed. Expression studies of individual preselected candidate genes or studies using only cell lines were excluded. Studies that profiled different histologic subtypes but did not include noncancerous tissue were also excluded. indicate pathways of interest that display both large enrichment ratio (> 0.15, x-axis) as well as high statistical significance (P < 0.0001, y-axis). HCC, hepatocellular carcinoma; miRNA, microRNA. www.impactjournals.com/oncotarget

Standardization of miRNA names
The lists of miRNAs with statistically significant (less than 0.05 was considered significant) expression changes between HCC and noncancerous liver tissue were extracted from the included studies. Authors were contacted for supplemental data, if the gene list was not available in the publication. For a comprehensive integrated analysis of miRNA expression, it is essential that the miRNA names are comparable across the studies and follow the same nomenclature. Because of the relative novelty of the miRNA profiling fi eld and frequent updates in the miRBase, miRNA nomenclature can vary depending on when the study was conducted. Therefore, all miRNA FDR, false discovery rate names were standardized according to miRBase version 21 (http://www.mirbase.org/). Many traditional "major" miRNA names throughout the main text were redesignated according to miRBase database vesion 21. Viral miRNAs and non-miRNA probes were excluded from the analysis. Pre-miRNAs, reported in some of the studies, were used in the analyses after the standardization of precursor names.

Datasets construction
The extracted miRNAs were ranked based on statistical test fold changes where reported, and p-values where old changes were not reported. The rank matrixes of upregulated and downregulated miRNAs lists were separate analyzed, which constructed the overall rank matrix. Furthermore, the rank of miRNA from the analysis of upregulated and downregulated miRNAs lists were both normalized, which was the original rank divided by the maximal possible rank in the study. In the normalized rank matrixes, a value was given to each miRNA, which was one minus normalized rank of miRNA from the analysis of upregulated gene lists, or normalized rank from analysis of downregulated gene lists. Value 0.5 means that this miRNA was not reported in that study, value above 0.5 means it is upregulated and value below 0.5 means that this miRNA is downregulated in that study.

Statistical analysis
A novel RRA method implemented as an R package RobustRankAggreg was used to identify miRNAs that were ranked consistently better than expected by chance [47]. This method detects genes that are ranked consistently better than expected under null hypothesis of uncorrelated inputs and assigns a P-value for each gene. To assess the stability of acquired p-values, the leave one out cross-validation was applied on the robust rank aggregation algorithm. Analyses were repeated 10,000 times, and one random gene list was excluded from the analysis each time. Acquired P-values from each round for each miRNA were then averaged. All integratedsignature miRNAs that reached statistical significance after Bonferroni correction and were reported by at least 1/3 datasets were selected.

Validation of the integrated-signature miRNAs using quantitative real-time PCR
To validate the results of integrated bioinformatics analysis, 11 pairs of fresh HCC and adjacent noncancerous liver tissues were obtained from 11 patients by experienced surgeons and examined by experienced pathologists at the the First Affiliated Hospital of Wenzhou Medical University between July and December, 2014. Written informed consent was obtained from all patients or their guardians. The samples were frozen immediately and stored in liquid nitrogen after being surgically resected. Clinical information was summarized in the Table 3. Total RNA was extracted using the Qiagen RNeasy Kit (QIAGEN GmbH, Germany) according to the manufacturer's instructions. First-strand complementary DNA (cDNA) was synthesized from 2 μl of total RNA using an oligo-dT primer and superscript II reverse transcriptase (Invitrogen). Then, quantification of the significantly up-regulated or down-regulated miRNAs was performed by real-time PCR, using SYBRRPremix Ex Taq TM (TakaRa). The primers of each dys-regulated miRNAs The paired test was used to examine the difference of miRNA expression levels between tumor and adjacent nontumor liver tissues. The prognosis of HCC strongly depends upon nuclear grade and the presence of microscopic vascular invasion (MVI). Therefore, the difference of miRNA expression levels were also tested in samples with or without MVI, different tumor grades and survival. MVI was defined by the presence of tumour emboli within either the central hepatic vein, the portal, or the large capsular vessels [48]. Edmondson and Steiner's nuclear grades were used to classify the tumor grade, in which grades 1 and 2 were defined as well-differentiated, and grades 3 and 4 as moderately/poorly differentiated [49]. The results were validated on the TCGA datasets. miRNA expression data and corresponding clinical information for HCC dataset were downloaded from TCGA data portal in January 2015. TCGA data are classified by data type (clinical, mutations, gene expression) and data level, to allow structured access to this resource with appropriate patient privacy protection (Supplementary Table 4). This study meets the publication guidelines provided by TCGA. The miRNA expression profiling was performed using the Illumina HiSeq 2000 miRNA sequencing platforms (Illumina Inc, San Diego, CA). The miRNA expression level was demonstrated as reads per million miRNA mapped data. The miRNA expression analyses were performed using BRB-ArrayTools (version 4.4) developed by Dr. Richard Simon and the BRB-ArrayTools Development Team. In brief, the miRNAs with missing data exceeded 10% of all subjects were excluded from the dataset and the expression level of each individual miRNA was log2-transformed for further analysis. The predicted performances of the validated miRNAs for classifying HCC, MVI, and tumour grade were estimated on the TCGA datasets using ROC curve. The TCGA samples were assessed using a LASSO penalized regression analysis to predict HCC, MVI, tumor grade and survival using microRNA expression with leave-one-out crossvalidation using R software (v3.1.2) and the Lars package (v1.2) [50]. A risk score was generated using the sum of microRNA expression values weighted by the coefficients from the LASSO regression, as described. The statistical analyses were performed using the SPSS 18.0 (SPSS Inc.). Statistical significance was defined as p < 0.05.
For survival analysis, we used the Kaplan-Meier method to analysis the correlation between overall survival and the miRNAs, and the logrank test was used to compare survival curves. The optimum cut-off value for the miRNAs using X-tile plots based on the association with mortality of the patients. X-tile plots provide a single and intuitive method to assess the association between variables and survival. The X-tile program can automatically select the optimum data cut point according to the highest χ 2 value (minimum p value) defined by Kaplan-Meier survival analysis and log-rank test [51]. We did the X-tile plots using the X-tile software version 3.6.1 (Yale University School of Medicine, New Haven, CT, USA).

Enrichment analysis
Enrichment analyses for Panther and KEGG pathways and Gene Ontology terms were carried out with GeneCodis web tool (http://genecodis.dacya.ucm.es/) [52]. Predicted target genes for each miRNA were used as input and false discovery rate (FDR)-corrected p-values were visualized as a heatmap. Clustering of the heatmap was based on Pearson correlation and average linkage. Furthermore, the association between the pathways affected by altered expression of miRNAs and HCC was evaluated.

FUNDING
This work was supported by grants from the Scientific Research Foundation of Wenzhou, Zhejiang Province, China (Y2013073, Y20120183) and Incubator program of the First Affiliated Hospital of Wenzhou Medical University (HFY2014050).