G2/M checkpoint plays a vital role at the early stage of HCC by analysis of key pathways and genes

The present study was designed to explore the molecular mechanism at the early stage of hepatocarcinoma (HCC) and identify the candidate genes and pathways changed significantly. We downloaded the gene expression file dataset GSE6764 from GEO, adopted the Robust Multi-array Average (RMA) algorithm to preprocess the raw file. 797 differentially expressed genes (DEGs) were screened out based on the SAM method using R language. Ingenuity Pathway Analysis (IPA) was used to perform canonical pathway analysis in order to calculate the most significantly changed pathways and predict the upstream regulators. In order to confirm the results from the DEGs which based on the individual gene level, the gene set enrichment analysis (GSEA) was done from the gene set level and the leading edge analysis was performed to find out the most appeared genes in several gene sets. The PPI network was built using GeneMANIA and the key genes were calculated using cytoHubba plugin based on cytoscape 3.4.0. We found that the Cell Cycle: G2/M DNA damage checkpoint regulation is the top-ranked pathways at the early stage of HCC by IPA. The high expression of several genes including CCNB1, CDC25B, XPO1, GMPS, KPNA2 and MELK is correlated with high risk, poor prognosis and shorter overall survival time in HCC patients by use of Kaplan-Meier Survival analysis. Taken together, our study showed that the G2/M checkpoint plays a vital role at the early HCC and the genes participate in the process may serve as biomarkers for the diagnosis and prognosis.


INTRODUCTION
Hepatocellular carcinoma (HCC) is the fifth most common cause of cancer and responsible for a third of the cancer-related deaths worldwide. The occurrence of HCC comprises many changes such as gene mutations, chromosomal aberrations and molecular pathways which always accompanied by cell cycle dysregulation, evasion of apoptosis [1]. So far, the best therapeutic approach for the HCC patients is liver transplantation which can eliminate HCC. However, recurrence rates remain high. Methods for early HCC detection are often evaluated on specificity and sensitivity [2] and many guidelines have been established for the early liver cancer diagnosis [3].
To identify potentially useful biomarkers and targets for the early diagnosis of HCC, the molecular mechanism of the cancer has been studied intensely especially the onset of HCC [4][5][6][7][8]. SPRTN could decrease DNA replication stress in DNA replication and G2/Mcheckpoint regulation and the mutation of SPRTN could cause early onset of hepatocellular carcinoma [9]. In order to determine candidate genes and the most significant pathways associated with the early stage of HCC, we performed the individual and gene set level analysis by www.impactjournals.com/oncotarget/ Oncotarget, 2017, Vol. 8, (No. 44), pp: 76305-76317 Research Paper www.impactjournals.com/oncotarget use of a series of bioinformatics approaches. Especially, the differential expressed genes (DEGs) were screened out using the SAM method and the pathways enrichment was performed using Ingenuity Pathway Analysis (IPA). Furthermore, in order to avoid the drawback of individual gene analysis, GSEA was performed to verify the former result. Then, we built the PPI network from DEGs to identify the key genes using cytoHubba plugin. And then the co-expression network was built from the key genes by use of the geneMANIA plugin based on Cytoscape.

Microarray analysis and data pre-processing
In order to guarantee the quality of every chip before the next analysis, we performed quality control (QC) for every raw file. The results of QC plot and box plot before and after normalization were shown in Figure 1.

Identification of DEGs
A total of 981probes were screened out at the delta = 2.44 with the FDR<0.1% (Figure 2A) (Supplementary Table 1). The minimum FDR value was reserved if several probes corresponded the same gene. At last, 797 DEGs between the early HCC and normal controls were screened out using SAM, including 421 up-regulated and 376 down-regulated genes (Supplementary Table 1). All of these DEGs are classed into 14 types according to IPA as shown in Figure 2B.
According to the classification, enzymes, TFs, transporter, kinase composed most of the DEGs.

Canonical pathway analysis
We compared the early HCC group with the control group using IPA tool. 78 canonical pathways were identified with a p-value<0.05 and the top 26 pathways associated with the onset of HCC are shown in

The upstream regulator analysis
The upstream regulator analysis was performed by IPA and 7 transcription factors (TFs) were predicted to be activated and 6TFs be inhibited as shown in Table 1. The 7 predicted activated TFs and their target genes are shown in Figure 4. The DEGs regulated by FOXO1 participate in cell cycle mainly.   Red represents the up-regulated genes, the green represents the down-regulated genes and the grey represents the no overlap genes with dataset. The significance (-log p value) of every pathway is indicated in parenthesis.

Gene set enrichment analysis confirmed the enrichment of G2/M checkpoint at the early stage of HCC
The results from IPA showed that G2/M checkpoint regulation was the most significantly changed biological process which relates to cell proliferation closely. So we selected 14 gene sets related to the G2/M checkpoint from all 15142 gene sets in GSEA to confirm the enrichment of G2/M checkpoint or related process (Supplementary Table 2). We set the number of permutations was 1000, the permutation type was gene-set, the max and min size of gene sets selected was 500 and 10 respectively with the other parameters were default. As a result, 13/14 genes sets were up-regulated in HCC with 7 gene sets were significantly enriched at FDR<25% and one gene set was enriched at nominal p-value<0.05. 1/14 gene sets was upregulated in the control group (BHATI_G2M_ARREST_ BY_2METHOXYESTRADIOL_DN). The enrichment plot of 8 up-regulated gene sets are shown in Figure 5.

Leading edge analysis
In order to determine which genes appeared frequently in 8 genes sets associated with G2/M checkpoint and explore the genes that have the highest impact on G2/M checkpoint, 8 gene sets were dedicated to perform the leading edge analysis as shown in Figure 6. Three terms from GO overlapped mostly. CCNA2 appeared in 7 gene sets, CDC25B appeared in 6 gene sets, and NEK2,NBN,CCNB1,CDC7,ATM,XPO1, MRE11A,CENPF,TAOK3 appeared in 3 gene sets.

PPI network construction and analysis from all DEGs
From the 797 DEGs, a network with 721 nodes and 30900 edges was constructed using GeneMANIA plugin. And eleven scoring methods including the newly developed algorithms MCC were performed by use of cytoHubba plugin. At last, 15 genes were screened out according to local-based method MCC and global-based method bottleneck and stress. The co-expression network from the 15 top-ranked genes was constructed as shown in Figure  7. 14 out 15 genes were up regulated and only C8A down regulated. 6 out of 20 top related genes were DEGs and were all up regulated. Most of these genes related to cell cycle.

Kaplan-Meier survival analysis
In order to find the relationship between the key genes and survival of the HCC patients, we performed the Kaplan-Meier Survival analysis. The data showed

DISCUSSION
In order to explore the most significantly dysregulated pathways and key genes which play roles at the early stage of HCC and gain an insight into the onset of HCC which  ADAR,APOBEC3B,CLIC4,HLA-F,IFI27,IFI44,IFI6,IFIT2,ISG15  ,OASL,PARP12,PLAC8,PNP,STAT1,STAT2,TAP1,TDRD7,TLR4   could be applied to the early diagnosis and therapy, a series of bioinformatics methods were performed. According to our studies, the Cell Cycle: G2/M checkpoint regulation was the most dysregulated pathways with 11 DEGs are all up regulated. As the second checkpoint within the cell cycle, G2/M checkpoint prevents cells with damaged DNA from entering the M phase so that these DNAs could be repaired. This kind of regulation is critical to prevent cells from going through malignant transformation. The deficiency of p53 in most human cancers make G1 checkpoint defective. The S-phase checkpoint slows rather than arrest of the cell cycle. So the cancer cell with damaged DNA could accelerate through the cell cycle and arrest at the G2 checkpoint. All the above makes the G2 checkpoint an attractive therapeutic target for anticancer therapy [10]. It has been demonstrated that Polo-like kinase (PLK) may be an early diagnostic marker for the development of HCC by regulating the G2/M checkpoint [11]. lncRNA16 is a promising biomarker for early diagnosis of lung cancer by promoting the G2/M transition by regulating the transcription of cyclin B2. As a promising antitumor agent, Isocorydine(ICD) could induce G2/M cycle arrest of HCC through activation of GADD45A-p21 pathway [12]. So, we inferred the dysregulation of this pathway is very important to the onset of HCC.
In order to confirm the result, GSEA was performed from the gene set level. GSEA is a computational method which determines whether an a priori defined set of genes shows statically significant between two biological states at the level of gene sets instead of an individual gene. GSEA can make up for the deficiency of traditional strategies which focused on the DEGs. The result of 13 gene sets associated with G2/M checkpoint upregulated in HCC and 1 upregulated in control confirmed that the G2/M checkpoint changed significantly. In order to determine the genes which contributed most to the enrichment result of 13 gene sets, the leading edge analysis was performed. 11 genes including CCNA2 (DEG), CDC25B, NEK2 (DEG), NBN, CCNB1(DEG), CDC7, ATM, XPO1(DEG), MRE11A, CENPF(DEG), TAOK3 appeared most often in several gene sets. The transcriptional factor FOXO1 was predicted active and the target genes regulated by it were associated with cell cycle most. The crosstalk of genes participated in the G2/M checkpoint is shown in Figure 9. from the molecular activity predictor(MAP).
As a plugin in cytoscape, cytoHubba provides an effective method to identify important nodes in biological networks. It could accomplish the computation of eleven methods in one stop shopping way including four local-    except C8A are upregulated and most of them participate in the cell cycle. From the analysis of co-expression network, we can find that the co-expressed genes which belong to DEGs upreguated too. Notably, XPO1 and CENPF is also screened out in the leading edge analysis. SR (serine/arginine-rich domain) proteins play a critical role in many process including nuclear export of mature mRNA, polymerase II transcription and nonsensemediated mRNA decay. SRPK1 (Serine/threonine-protein kinase) can phosphorylate SR proteins through its PKc superfamily kinase domain. But, SRPK1 displays pleiotropic effects in various cancers and regulates different cellular properties which might be related to preferential activation of different downstream signaling pathways [13][14][15][16][17][18][19][20]. Expression of SRPK1 was significantly upregulated frequently in HCC cell lines and HCC samples compared with the normal tissue sample both at the mRNA and protein level [21]. In the study of HCC, SRPK1 may be located downstream of AKT and activated AKT may induce the autophosphorylation of SRPK1 which lead to the phosphorylation of downstream splicing factors [20]. SRPK1 may be associated with FAK signaling, MAPK signaling, Wnt/β-catenin signaling and angiogenesis [13,22]. More research implied that SRPK1 may be a novel target for cancer diagnosis and therapy. But the detailed roles and mechanisms of SRPK1 in cancer especially in HCC are not clear.
KPNA2 (karyopherin alpha 2) may participate in carcinogenesis by regulating the translocation of some cargo proteins which are involved in cancer. It was demonstrated that KPNA2 promotes cell proliferation and tumorigenicity in epithelial ovarian carcinoma by regulating c-Myc and FOXO3a [23]. The knockdown of KPNA2 could inhibit proliferation of several cancer cells including liver and lung and KPNA2 may be a useful prognostic biomarker to monitor cancer prognosis [24,25].
To date, over 230 kinds of proteins were verified as the cargo of XPO1(Chromosome region maintenance 1,CRM1) including p53, p21, IkB, ribosomal subunits and so on [26]. XPO1 plays a vital role in nucleo-cytoplasmic transport through RanGTP dependent mechanism. For most typical tumor suppressive proteins such as p21, they play different functions according to their subcellular localization which elucidate that XPO1 plays an vital role in the process of cancer and may lead to a new method for cancer therapy which associated with cell cycle arrest and induction of apoptosis [27][28][29].
As glutamine amidotransferases involved in de novo purine biosynthesis, GMPS (GMP synthetase) was shown to have a striking role in cell proliferation. Under genomic stress, GMPS plays a vital role in the relay of p53 stabilization by TRIM21-GMPS-USP7 molecular cascade. The guanine nucleotides is essential for nucleotide formation, energy storage and nuclear transport which could be provided by guanine biosynthesis pathway in cancer cells [30,31]. Repression of GMPS by p53 through p21 is a functionally relevant part of the p53mediated process in inhibiting tumor cell growth in liver cancer [32].
It is supposed that MELK (maternal embryonic leucine zipper kinase) plays critical roles in many aspects including cell cycle, cell proliferation, embryogenesis and oncogenesis due to its overexpression in many kinds of cancers. MELK is associated with early HCC recurrence and poor patients' survival but the mechanism has not been elucidated [33]. MELK knockdown or deletion in GC (gastric cancer) and ovarian cancer cells activates G2/M arrest and enhances apoptosis [34,35].
Previous studies have revealed that amplified CENPF (centromere protein F) may play a role as common cancerdriver genes in human cancers. CENPF contains many leucine zipper motifs and is regulated in a cell cycledependent manner [36]. It amplified and overexpressed not only in HCC but also in many other types of human cancer including breast cancer, colorectal cancer, prostate cancer [37,38]. Silence of CENPF arrests HCC cells at the G2/M transition with the accumulation of MPF (mature promoting factor) and CCNB1/CDC2 complex [39]. In consistence with these studies, the present study found that CENPF is identified as a key gene not only in DEGs but also in leading edge analysis which based on gene set level.
As the single down regulated one in all the key genes, C8A (complement component alpha) involves in the complement system and participates in the formation of MAC (membrane attack complex) combined with other complement proteins such as C5b, C6, C7, C8 and C9. In addition, the expression of several other complement components or subunits are all down regulated ,including C1S, C2, C5, C6, C7, C8B, C8G, C9. Obviously, it is not coincidence. That is to say, the activity of complement system down regulated at the early stage of HCC. The transcription of C8A and C5 is regulated by HNF1α (hepatocyte nuclear factor 1 alpha) both are essential components of MAC [40]. The relationship between C8A and cancer is not clear yet.
G2/M checkpoint provides an opportunity for DNA repair by increasing the time for repair and by transcriptionally inducing gene expression and stopping the proliferation of damaged cells [41]. To the best of our knowledge, G2/M checkpoint transition activated at the early stage of HCC was provided for the first time through the microarray analysis. The negligent G2/M checkpoint enhanced the possibility for DSB (double strand breakage). And the unrepaired DSBs before mitosis will pose a higher risk for genomic instability and tumor cell development [42]. So, the defect of G2/M checkpoint may play a critical role at the onset of HCC. www.impactjournals.com/oncotarget

Data source
The gene expression dataset GSE6764 were downloaded from the Gene Expression Omnibus (GEO) database by Wurmbach E et.al (https://www.ncbi.nlm. nih.gov/geo/). 75 tissue samples obtained from patients undergoing resection or liver transplantation were divided into 8 groups from pre-neoplastic lesions to HCC and normal liver were used as control (http://www.ncbi.nlm. nih.gov/geo/). We combined very early and early HCC to the case group including 18 tissue samples altogether, and the control group, with 10 normal liver tissue. All tissue samples are hybridized on the human U133 plus 2.0 array (Affymetrix).

Microarray data analysis and identification of differentially expressed genes (DEGs) using SAM
Robust Multi-array Average (RMA) algorithm including background correction, normalization and summarization was performed to convert the .CEL raw file to expression data which based on R language [43]. The simpleaffy package was utilized to perform the quality control. Once the signal value for each probe set calculated in every microarray, the t-test based significance analysis of microarrays (SAM) which make use of permutations to simulate for every gene a situation in which there is no difference between the two groups was utilized to determine the DEGs. SAM method adjusts the p-value to false discovery rate (FDR) to reduce the false-positive through multiple testing. A <0.1% False discovery rate (FDR) cut-off was used for all differential expression calculations [44].

Ingenuity pathway analysis
Ingenuity Pathway Analysis (IPA) is a functional analysis tool (Ingenuity Systems, Mountain view, CA, USA). We use IPA to identify the most significant pathways (including 302 metabolic pathways and 360 signaling pathways) and construct molecular interaction networks from the DEGs. In brief, we uploaded the DEGs list file containing gene symbols, FC, p-values to IPA and performed the core analysis. In general settings, the node types, data sources, confidence, species, tissues & cell lines and mutation were specified.

The IPA upstream transcriptional regulator analysis
In order to explain the biological activities due to the DEGs, we identified the cascade of upstream transcriptional regulators with p-value of overlap <0.05 and the absolute activation z-score>2.

Gene set enrichment analysis (GSEA) and leading edge analysis
GSEA is a kind of gene enrichment method considering the full list of genes different from single gene method [45]. In GSEA, genes are ranked by their correlation with phenotype and every enrichment gene set will get an enrichment score (ES). In this study, 2000 gene permutations were used to generate a null distribution for ES, then each pathway will attain a normalization enrichment score(NES). Gene sets with considered significantly enriched with a relatively relax p-value and FDR<0.25. A leading edge analysis was performed to elucidate key genes associated with the early stage of HCC, especially the G2/M checkpoint regulation [46].

Construction of PPI network from all DEGs and the screening of key genes
In order to comprehend the specific molecular mechanism of early HCC, we constructed the PPI network based on GeneMANIA plugin and calculated the key DEGs using cytoHubba plugin [47,48]. At last, we built the co-expression network of top-ranked genes from all DEGs and performed the visualization and analysis by use of Cytoscape 3.4.0(http://cytoscape.org/).

Kaplan-Meier survival analysis
SurvExpress(http://bioinformatica.mty.itesm. mx:8080/Biomatec/SurvivaX.jsp) was employed to perform the survival analysis in the datasets TCGA-liver cancer containing 422 samples provided by SruvExpress using the key genes as an input. For the duplicated genes, all probe sets/records will be averaged per sample using the original (Quantile-Normalized) data. The maximum risk groups were selected for the cox survival analysis. This method uses an optimization algorithm from the ordered PI to produce risk groups as described in the tutorial provided in SurvExpress website [49].

CONCLUSION
The combinatorial effect of the GSEA, DEGs, and leading edge analysis output shed a light on the elucidating of key pathways and genes which genetically dysregulated at the early stage of HCC. The study unveiled that the G2/M checkpoint plays a vital role at the onset of HCC. And the genes SRPK1, XPO1, GMPS, MELK, DUT, TCERG1, RAD21, CENPF, PTTG1, EZH2, ANLN, KPNA2, RACGAP1, ADAR, C8A could be considered as critical genes for this process. These findings contributed to a better understanding of the onset of HCC. Further studies were required to elucidate the mechanism of the process.