Identification and functional analysis of differentially expressed genes in poorly differentiated hepatocellular carcinoma using RNA-seq

Poorly differentiated (PD) hepatocellular carcinoma (HCC) has a worse prognosis compared to moderately differentiated (MD) and well differentiated (WD) HCC. We aimed to identify differentially expressed genes (DEGs) to explore the mechanism of PD HCC. Transcriptome sequencing was performed on tumor and adjacent non-tumorous tissues of PD, MD and WD HCC patients (3 for each group). DEGs were thus identified and functionally analyzed. Further RT-PCR was performed to validate DEGs specific for PD HCC in 47 pairs of samples (15 for PD, 18 for MD, 14 for WD). A total of 681 PD DEGs were detected, including 368 up-regulated and 313 down-regulated genes. Less DEGs were found for MD and especially for WD HCC. Through bioinformatics analysis, PD HCC DEGs were enriched in liver tissue and liver cancer cells, and in biological process and pathway including metabolism, cell cycle, translation and blood coagulation. Potential drugs and genetic perturbations were found to reverse the cancer condition. The RT-PCR results showed consistency with RNA-seq in the validation of 4 DEGs specific for PD HCC. This study detected and validated DEGs of PD HCC, which provides useful information on molecular mechanism of PD HCC for development of new biomarkers, therapeutic targets and drugs.


INTRODUCTION
As one of the most common malignancies worldwide, hepatocellular carcinoma (HCC) represents the second-leading cause of cancer deaths globally with 745,000 deaths per year [1].The HCC were categorized into poorly-, moderately-, and well-differentiated types.Hepatic resection is currently the most optimal choice for HCC treatment.However, poorly differentiated HCC has a worse prognosis with high recurrence rate compared to other two types [2,3].In addition, it is hard to discriminate poorly differentiated HCC from other two types of HCC before treatment.The lack of good diagnostic markers and therapeutic targets has rendered HCC a major challenge.
Recently, high-throughput technologies like RNA sequencing make it possible and easy to illustrate the transcriptome characteristics of cancers including HCC [4].Some potential biomarkers for HCC were identified using transcriptome sequencing e.g.SERPINA11 whose expression is correlated with pathology stages lack documented expression profiles in liver cancer [5].Some signaling pathways like cell cycle were also involved

Research Paper
via analysis of transcriptome sequencing data [6].Those studies show a great promise of exploring the molecular basis of HCC.
In this study, we performed transcriptome sequencing for patients with poorly differentiated (PD), moderately differentiated (MD) and well differentiated (WD) HCC. 3 patients diagnosed as HCC of each grades were recruited, respectively.Differentially expressed genes (DEGs) were identified through analysis of transcriptome sequencing data.DEGs were then subjected to gene set enrichment analysis.Furthermore, RT-PCR was performed to validate potential biomarkers in 15 pairs of poorly differentiated tumor and adjacent non-tumorous samples.Our investigation may provide new clues on the molecular event responsible for the progression of HCC and potential biomarkers and therapeutic targets for diagnosis and treatment of HCC patients.

Overview of transcriptome sequencing statistics
Pair-end second-generation transcriptome sequencing was performed in 9 HCC patients.Sample characteristics were shown in Table 1.An average of 35,772,695 pair-end 125 bp clean reads was generated (Table 2).The average mapping rate was 93.17%, resulting an average coverage of depth of 32 × (Table 2).Expression levels of more than 25,200 genes were calculated using Tophat/Cufflinks (Supplementary Table 1).The heatmap of all expressed genes were drawn in Figure 1, which showed a big group of differentially expressed genes (DEGs) between tumor and adjacent non-tumorous samples of poorly differentiated HCC.

Identification of DEGs between tumor and adjacent non-tumor tissues
We next detected DEGs between tumor and adjacent non-tumorous samples of poorly differentiated, moderately differentiated and well differentiated HCC.Through paired-T test analysis (P value < 0.05, Fold change > 2), a total of 1020 DEGs were detected including 372 up-regulated genes (313, 47 and 12 for poorly-, moderately-and well-differentiated HCC, respectively) and 648 down-regulated genes (368, 249 and 31 for poorly-, moderately-and well-differentiated HCC, respectively) (Figure 2A; Supplementary Table 2).Only 1 down-regulated gene (CD44) was overlapped in all 3 grades.Top 20 up-regulated genes and Top 20 down-regulated genes of poorly differentiated HCC were shown in Table 3 and Table 4, respectively.

Significantly enriched functions for DEGs of poorly differentiated HCC
DEGs of poorly differentiated HCC were subjected to functional enrichment analyses to illustrate their biological function characteristics using Enrichr tool [7].The top enriched terms for categories were shown in Figure 2B (P value < 0.1).From Human Gene Atlas and Cancer Cell Line Encyclopedia, those DEGs were enriched in liver tissue and various liver cancer cells.The enriched biological process and pathway included metabolism, cell cycle, translation and blood coagulation.The DEGs tended to lie in mitochondrial, chromosome and ribosome.From the ENCODE TF ChIP-seq data, 5 transcription factors in liver cancer cell HEPG2 were enriched to regulate the poorly differentiated DEGs.

Potential perturbations for reverse of abnormal gene expression change of poorly differentiated HCC
LINCS is a collection of signatures of gene expression for a broad range of conditions such as drug treatment, ligand treatment, gene knockdown, and gene over-expression in many different types of human cells [8].We inputted the up-regulated genes and down-regulated genes into lincscloud to find perturbations that had opposite gene expression change with differentiated HCC.Top 12 perturbations with high negative connectivity scores (Figure 2C), which included 4 compounds, 7 knockdown genes and 1 over-expression gene.

Comparison of tumor tissues between PD HCC and MD&WD HCC
Besides DEGs between tumor and adjacent nontumorous samples, we compared tumor tissues between PD HCC and MD&WD HCC.As a result, 725 differentially expressed genes were identified (Supplementary Table 2); among them, 209 were identified as DEGs when comparing tumor and adjacent non-tumorous samples (Figure 2D).Gene ontology enrichment analysis was also done based on this 725gene list.Similarly, these genes were enriched in organelle fission, mitotic nuclear division, positive regulation of cell cycle phase transition and serine family amino acid catabolic process.

Validation of DEGs specific for poorly differentiated HCC by RT-PCR
4 DEGs specific for poorly differentiated HCC as well as related with cell cycle or proliferation were validated by RT-PCR.In the discovery phase, the expressions of NOVA1, NSMCE2 and KIAA0196 were significantly up-regulated, while expression of AQP9 was significantly down-regulated in poorly differentiated samples, as compared with that in adjacent non-tumorous samples, moderately differentiated samples and well differentiated samples (P < 0.05) (Table 5).RT-PCR results showed that all these 4 genes were successfully validated (Figure 3), and the dysregulation trend matched with those observed in the RNA-seq data.

DISCUSSION
We have applied the transcriptome sequencing approach to illustrate the gene expression characteristics of poorly differentiated HCC.The number of DEGs for poorly differentiated HCC is significantly bigger than that of other two stages.Those DEGs are enriched in liver tissue and cells, which is easily understood.Pathway analysis showed that two pathways, cell cycle and complement and coagulation cascades, were overrepresented with DEGs.Deregulation of the cell cycle pathway is expected since uncontrolled cell division is the major character of cancer cells.As for the complement and coagulation cascades pathway, consistent with our results, both gene expression [9] and proteomics [10] analysis have shown that this pathway is related to the pathogenesis of HCC.Some transcription factors were found to regulate the DEGs.For example, HNF4-α could represent a central regulator of gene transcription in hepatocytes, and a strong candidate to be involved in liver cancer cell development [11].In addition, we also compared tumor tissues between PD HCC and MD&WD HCC.And similarly, the differentially expressed genes are enriched in the process of cell cycle and division.
After query of the poorly differentiated HCC signature in the LINCS server in our study, some compounds, knockdown genes and overexpression genes were found to have strong negative connections with the signature.PI-103 is a potent ATP-competitive dual inhibitor of phosphatidylinositol 3-kinase (PI3K) and mTOR (mammalian target of rapamycin).The combination of PI-103 and sorafenib was found to inhibit hepatocellular carcinoma cell proliferation by blocking Ras/Raf/MAPK and PI3K/AKT/mTOR pathways [12].MK-2206 is an oral selective allosteric inhibitor of Akt that targets all three isoforms of human Akt (Akt-1, Akt-2 and Akt-3).MK2206 was found to inhibit hepatocellular carcinoma cellular proliferation via induction of apoptosis and cell cycle arrest [13].Among 8 genetic perturbations, 5 genes are enriched in pathways in cancer or PI3K-Akt signaling pathway (p<0.001).Myc proto-oncogene protein (MYC) as a transcription factor can activate the transcription of growth-related genes.HCC is frequently associated with overexpression of MYC.In this study, MYC is up-regulated in the poorly differentiated HCC.Knockdown of oncogenic KRAS was found to suppress tumor growth in non-small cell lung cancers [14].As mentioned, AKT2 is the target of MK2206, knockdown and inhibition of AKT2 may both help to reverse HCC.Overexpression of cyclin dependent kinase inhibitor 1B (CDKN1B) has the reverse effect in our study.CDKN1B binds to and prevents the activation of cyclin complexes, and thus controls the cell cycle progression at G1 [15].All those perturbations may be the potential therapies or targets for treatment of HCC.In the validation cohort, further RT-PCR was performed to validate 4 DEGs specific for poorly differentiated HCC, of which NOVA1, NSMCE2 and KIAA0196 were significantly up-regulated, while AQP9 was significantly downregulated in poorly differentiated samples, as compared with that in 9 adjacent non-tumor samples, 3 moderately differentiated samples and 3 well differentiated samples.The RT-PCR results showed consistency with RNA-seq.Interestingly, these DEGs were closely related with cell cycle or proliferation.High expression of NOVA1 was found correlated with poor survival rate and increased recurrence rate in HCC [16].NSMCE2 was required for G1-S transition in breast cancer cells and manipulation of NSMCE2-mediated sumoylation may alter the growth rates of breast cancer cells [17].KIAA0196, involving in meiosis-related spindle assembly [18], has been showed increased expression in clinical prostate carcinomas and also amplified in 30-40% of xenografts and hormonerefractory tumors [19].However, AQP9 has been found to be down-regulated in hepatocellular carcinoma and its over-expression suppresses hepatoma cell invasion through inhibiting epithelial-to-mesenchymal transition [20].In conclusion, this study explored the molecular mechanism of hepatocarcinogenesis through assessment of RNA seq data of HCC and validation of 4 DEGs specific for poorly differentiated HCC in an independent cohort.It provides useful information on the transcriptomic landscape as well as a mechanistic overview of HCC.
Our findings offer novel insights and useful support in biomarker development and suggest new potential targets in poorly differentiated HCC characterization.

Ethics statement
Our study design was approved by the Ethics Committee of the Fujian Provincial Hospital.Written informed consent was obtained from all subjects.Subjects 47 subjects were diagnosed as primary HCC in the Fujian Provincial Hospital (Table 1), of which 33 subjects were present with cirrhosis on the non-neoplastic background, including 29 subjects with hepatitis B virus (HBV), 1 subjects with hepatitis C virus (HCV) and 3 subjects with NBNC.HBV related tumors were defined according to the presence of HB surface antigen (HBsAg) in serum, and HCV related tumors were according to the presence of antibody to HCV (HCVAb) in serum.NBNC tumor was defined according to the absence of both HBsAg and HCVAb in serum.Primary tumor and adjacent non-tumorous samples were obtained from all patients who underwent surgical tumor resection.All samples were frozen immediately at -80°C until RNA extraction.Total RNA was isolated by using RecoverAll™ Total Nucleic Acid Isolation Kit (Life Technologies, Carlsbad, CA, Note: N1, N2, N3 and T1, T2, T3 represent adjacent non-tumor tissue and tumor tissue of P1, P2, P3 with poorly differentiated HCC, respectively.N4, N5, N6 and T4, T5, T6 represent adjacent non-tumor tissue and tumor tissue of M1, M2, M3 with moderately differentiated HCC, respectively.N7, N8, N9 and T7, T8, T9 represent adjacent non-tumor tissue and tumor tissue of W1, W2, W3 with well differentiated HCC, respectively.p value refers to T1, T2, T3 vs N1, N2, N3, N4, N5, N6, N7, N8, N9, T4, T5, T6, T7, T8, T9 by t test.

Transcriptome sequencing
Sequencing libraries were prepared by using TruSeq RNA Sample Prep Kit (Illumina, San Diego, CA, USA) according to standard protocols.Briefly, total RNA was firstly randomly fragmented and poly-A-selected.Secondly, the RNA fragments were reverse transcribed to cDNA, end-repaired and ligated with adapters.The libraries then underwent size selection, PCR and purification.The quality of libraries was assessed by using Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA).Sequencing was then performed on an Illumina HiSeq2500 sequencer with 125 bp pair-end reads.

Reads processing
Raw sequencing reads were firstly filtered for adapters and ribosomal RNA.Reads containing five or more low quality (quality score < 20) bases were also removed.The remained high-quality reads were then aligned to human genome (hg19) by using Tophat [21].The mapped reads were then subjected to alignment against the human transcriptome (Ensembl, GRCh37.73).Gene expression level measured by FPKM (fragments per kilobase per million) was calculated by Cufflinks [22].

Differentially expressed genes (DEGs) analysis
For each differentiated HCC group, DEGs between the tumor and matched non-tumorous tissues were identified with pair-wise t test and the significant threshold was set as adjusted p-value of less than 0.05 and |log 2 (fold change, FC)| > = 1.Enrichr was used to do functional enrichment analysis like Gene Ontology (GO) and KEGG pathway [7].The significant threshold for enrichment was set as p < 0.1.In addition, the list of up-regulated genes and down-regulated was uploaded into lincscloud [8] to find perturbations which can reverse the cancer condition.
Using similar methods (t test instead of paired t test), DEGs were identified and analyzed between tumor tissues of PD HCC and those of MD&WD HCC.

DEGs specific for poorly differentiated HCC validated by RT-PCR
DEGs specific for poorly differentiated HCC (Compared with 3 matched adjacent non-tumor tissue of poorly differentiated HCC, P<0.05, |log 2 (fold change, FC)| > = 1; Compared with all adjacent non-tumor tissue, 3 moderately differentiated tissue and 3 well differentiated tissue, P<0.05, |log 2 (fold change, FC)| > = 1 ) were subjected to validation using RT-PCR.The validation cohort included 15 poorly differentiated HCC, 18 moderately differentiated HCC and 14 well differentiated HCC (Table 1).For the RT-PCR reactions, total RNA was converted to cDNA with random hexamer primers using the High-Capacity cDNA Reverse Transcription kit (Applied Biosystems, Foster City, CA, USA).Realtime PCR was performed with SYBR Green I (Applied Biosystems, United States) on ABI 7300 (Applied Biosystems).The primers used were described in Table 6.

Statistical analysis
T-test was used to compare continuous measurement data with a normal distribution between two groups.Mann-Whitney U test was used to evaluate continuous data with a non-normal distribution.Statistical analyses were performed with the SPSS13.0software (SPSS, United States).P < 0.05 was considered statistically significant.

Figure 1 :
Figure 1: Heatmap of expressed genes.For the sample labels, N and T represented matched adjacent non-tumor tissue specimens and tumor tissue specimens respectively.Each row represented of Z scores of genes across different samples.

Figure 2 :
Figure 2: Functional characterization of DEGs.(A) Venn diagram of DEGs between poorly differentiated (PD), moderately differentiated (MD) and well differentiated (WD) HCC.(B) Functional enrichment analysis of DEGs of poorly differentiated HCC.(C) Top 12 perturbations with high negative connectivity scores with DEGs of poorly differentiated HCC.(D) Venn diagram of DEGs between PD HCC tumor tissues and MD&WD tumor tissues, DEGs of PD HCC, DEGs of MD HCC, and DEGs of WD HCC.

Figure 3 :
Figure 3: RT-PCR validation of 4 DEGs specific for poorly differentiated HCC.GAPDH mRNA was used as an internal control.Expression data were obtained as 2 -ΔΔCT relative to adjacent non-tumor tissue values.N PD and T PD represented matched adjacent non-tumor tissue specimens (n = 15) and poorly differentiated HCC tissue specimens (n = 15) respectively.N MD and T MD represented matched adjacent non-tumor tissue specimens (n = 18) and moderately differentiated HCC tissue specimens (n = 18) respectively.N WD and T WD represented matched adjacent non-tumor tissue specimens (n = 14) and well differentiated HCC tissue specimens (n=14) respectively.(A) NOVA1; (B) NSMCE2; (C) KIAA0196; (D) AQP9.# P < 0.05 vs T MD and T WD .

Table 1 : Detailed characteristics of the patients Patient Age Gender Hepatitis Serum AFP level (ng/mL) Metastasis Glisson capsule invasion Tumor size (mm) Differentiation Grade
Note: The nine samples used for sequencing are labeled in bold and italic.

Table 2 : Summary statistics of the transcriptome sequencing Patient Differentiation grade Sample Type Total reads Mapped reads
Note: T and N represent tumor and adjacent non-tumor tissue, respectively.www.impactjournals.com/oncotarget