Integrated analysis of genes associated with poor prognosis of patients with colorectal cancer liver metastasis

Colorectal cancer (CRC) is one of the most common malignances in the gut. Liver is the most common metastasis site of CRC. This study focuses on the primary CRC and its liver metastasis, aiming to discover several liver metastasis related genes and provide therapeutic candidates. We compared gene expression patterns among the groups of normal colorectal mucosa, primary tumor and the liver metastasis using a CRC gene expression dataset. 84 genes were found to be upregulated in both primary tumor and liver metastases. Function enrichment analysis indicated that these genes are enriched in pathways such as chemotaxis, coagulation and lipid metabolism which are crucial in multi-step cancer metastasis. Gene network analysis identified several important hub genes that may be involved in carcinogenesis and liver metastasis. Then we used a validation dataset containing 562 CRC samples with detailed clinical information, to screen prognostic biomarkers for overall survival (OS) and relapse free survival (RFS). Finally, overexpression of THBS2 (thrombospondin 2), INHBB (inhibin, beta B) and BGN (biglycan) were proved to be correlated with poor OS and RFS. In conclusion, this study indicated that chemotaxis, coagulation and lipid metabolism might play critical roles in the processes of carcinogenesis and liver metastasis. THBS2, INHBB and BGN are prognostic markers and potential therapeutic targets for CRC.


INTRODUCTION
Colorectal cancer (CRC) is one of the most common malignances in the digest system [1][2][3]. Frequently, distal metastasis affects clinical outcomes. Colorectal cancer liver metastasis (CRCLM) is the most common metastasis pattern in CRC [4]. It is possible to have a radical surgery on advanced colorectal cancer accompanied with liver metastasis in certain cases [4,5]. It is also probable to cure thoroughly by radical surgery, radio-chemotherapy, and molecular targeted therapy [6][7][8][9]. Unfortunately, such patients only accounts for a very small proportion while most CRCLM patients have poor outcomes. Therefore, two questions need to be addressed. Firstly, which subgroup of CRC patients will progress to liver metastasis? Secondly, which subgroup of patients with CRCLM will have poorer prognosis? Investigations on novel prognostic biomarkers and therapeutic targets are needed to improve the outcomes of CRCLM patients.

Flow chart of this study
This study focuses on the carcinogenesis and liver metastasis of CRC ( Figure 1). Firstly, we searched Gene Expression Omnibus (GEO) for choosing appropriate dataset for further analysis (GSE68468). Then we compared the gene expression pattern between normal colon mucosa and primary tumors; primary tumor and liver metastasis, respectively. Genes upregulated in both primary tumors and liver metastasis were selected for function enrichment analysis, gene-gene interaction analysis and prognostic power evaluation.

Identification of upregulated genes in both primary tumor and liver metastasis
The genes expression patterns of 54 normal colon mucosa (N), 186 primary tumors (PT) and 54 liver metastases (LM) were compared to discover upregulated genes in both PT and LM. 100 probes representing 84 genes were found to be upregulated in both PT and LM ( Figure 2 and Table 1). Heat map visualization of 84 upregulated genes in N, PT and LM samples was demonstrated in Figure 3. Function enrichment of these 84 upregulated genes indicated that chemotaxis, coagulation and lipid metabolism etc. were critical signaling pathways (Figure 4). Gene-gene interaction network was constructed ( Figure 5) and hub genes were listed in Table 2 (degree ≥ 15).

Evaluation the prognostic power of 84 upregulated genes
A dataset containing 562 samples was utilized to evaluate the prognostic power of 84 upregulated genes. The expression of each gene was defined as high expression and low expression by quartile division at the cutoff of 25%, 50% and 75%. Then we analyzed the relationship between survival and gene expression at different cutoff level. Survival analysis results indicated that four hub genes, namely thrombospondin 2 (THBS2), inhibin beta B (INHBB), biglycan (BGN), and serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1 (SERPIN1) were associated with both OS and RFS ( Figure 6, cut-off value: upper quartiles of gene expression, p<0.05). Since the association between Figure 1: Flow chart of this study. www.impactjournals.com/oncotarget SERPIN1 and CRC metastasis has been reported [10,11], we chose THBS2, INHBB and BGN for further study.

The prognostic power of three hub genes in different tumor stages
Patients were divided into groups of stage 0-II and stage III-IV. Then the prognostic power of each hub genes were tested on these two groups separately. Figure  7 indicated that prognostic power of THBS2 and INHBB were independent from tumor staging (cut-off value: upper quartiles, P<0.05). However, BGN was not an independent prognostic marker (cut-off value: upper quartiles, P>0.05).

Combination of three hub genes for predicting OS and RFS
According to the expression level of three hub genes (cut-off value: upper quartiles), samples are divided into four groups: triple high expression, double high expression, double low expression and triple low expression, respectively (TH, DH, DL and TL). TL patients have the best survival for OS and RFS while the survival of TH patients are the worst (p<0.5). There was no significant difference between DL and DH groups ( Figure 8).

DISCUSSION
Colorectal cancer with liver metastasis (CRCLM) is the research hotspot in recent years [12]. As is known, liver is the most common metastasis site of colorectal cancer, because of dissemination via the portal venous system. About 10% CRCLM can be resected by surgery, however, only 25%-50% of them could be free of recurrence after treatments [13]. Cheng D et al. reported that MicroRNA -20a-5p, as an intermediator, may promote the ability of invasion and metastasis in colorectal cancer by suppressing   [14]. Mismatch repair deficiency status, such as frequent PIK3CA mutation, was also seemed as a cause to carcinogenesis in colorectal cancer [15,16]. Moreover, microsatellite instability in colorectal cancer was also very important. For example, methylation of ITF2 plays a role in modulating WNT signaling in CRC [17]. However, key gene regulators related to carcinogenesis and metastasis of CRC are still unclear. Therefore, discovering novel genes associated with CRCLM and distinguishing new prognostic biomarkers are critical for managing patients with CRCLM.
In this study, 84 genes were found to be associated with the carcinogenesis and liver metastasis of CRC. Chemotaxis, coagulation and lipid metabolism are key signaling pathways involved in carcinogenesis and liver metastasis. By performed gene-gene interaction network analysis, 32 hub genes were discovered. Kaplan-Meier analysis was performed for each gene separately, and SERPINE1, THBS2, INHBB and BGN were found to be associated with poorer OS and RFS.
Metastasis to distant organs is a multistage process: local invasion, intravasation, survival in the circulation, extravasation and colonization [18]. The  By forming cell aggregates that protect tumour cells from immune surveillance or by collaborating during extravasation, tumour cell-platelet interactions could also enable dissemination [20][21][22][23]. Chemotaxis and vasculogenesis in cancer are crucial for increasing vascular permeability which could improve extravasation and lead to metastasis eventually [19]. In the present study, we showed that chemotaxis and coagulation processes etc. are key characteristics CRC progression and are potential therapeutic targets.
SERPINE1 encodes a member of the serine proteinase inhibitor superfamily which is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is a inhibitor of fibrinolysis. It reacts directly to integrin, uPA-uPAR and ECM, causing to invade and migrate to surrounding and distance [10,24]. Meanwhile, SERPINE1 has been considered as biomarker related to poorer prognosis in CRC [11]. Since there are few reports on THBS2, INHBB and BGN in CRC, especially in CRCLM, we chose these three genes for further analysis.
THBS2, a member of thrombospondin family, is a disulfide-linked homotrimeric glycoprotein that mediates cell-to-cell and cell-to-matrix interactions, and hence modulates the cell adhesion and migration. Initially, THBS2 was thought to be related to heart failure [25]. Subsequently, it's also found linked to poorer prognosis in oral cavity squamous cell carcinoma [26]. Interestingly, in many other carcinomas, THBS2 was found downregulated and was correlated with poorer prognosis in tumors such as gastric cancer [27,28], prostate cancer [29] and breast cancer [30]. In the present study, we found that THBS2 was overexpressed in both primary colon tumor and liver metastases. Moreover, its expression was negatively correlated with OS and RFS.
INHBB is the subunit of inhibin, which regulates gonadal stromal cell proliferation negatively and has tumorsuppressor activity. It seems as an independent prognostic parameter in uterine non-endometrioid cancer, and a low expression demonstrates a significant better cause-specific survival [31]. Investigating the effects of INHBB gene knockdown on the development of mouse granulosa cells in vitro, Mohamed et al. found INHBB has the function of inhibiting apoptosis in mouse granulosa cell [32]. However,    the role of INHBB in gastrointestinal cancer, especially in CRC, has not been thoroughly studied from now on. Our analysis confirmed the overexpression of INHBB in CRCLM and its association with poorer OS and RFS, which may result from its function of inhibiting apoptosis. BGN gene encodes a member of the small leucinerich proteglycan family of proteins. The encoded protein may contribute to atherosclerosis and aortic valve stenosis in human patients. It seems to enhance the ability of migration and invasion in endometrial cancer [33]. BGN also promotes tumor invasion and metastasis of gastric cancer both in vitro and in vivo [34]. Activated FAK signaling pathway, which regulates cell adhesion and motility by relaying ECM signals from integrin to the intracellular compartment, leads to tumor invasion and metastasis [35]. However, there are few reports on BGN in CRC. In the present study, BGN was found overexpression in CRCLM, and it was correlated with poorer OS and RFS. The underlying molecular mechanism was not clear at present, but FAK signaling pathway may take an important role in this process.
The correlation among three hub genes and survival were also explored in respective to TNM stage. THBS2 and INHBB were independent prognostic biomarker for OS and RFS in both stage 0-II and stage III-IV, while the prognostic value of BGN was associated with TNM staging. Nevertheless, synchronous high expression of these three hub genes indicates the worst clinical outcome, while synchronous low level indicates the best, emphasizing their contributions to poor prognosis.
In summary, several critical signaling pathways such as chemotaxis, coagulation and lipid metabolism may have critical roles in the processes of CRC carcinogenesis and liver metastasis. THBS2, INHBB and BGN are prognostic markers and potential therapeutic targets for CRC. Validation of large cohorts and wet lab experiments are still needed before achieving any clinical significance.

Ethics statement
The databases used in our study are available online. Anyone is permitted to use all the data in the website of ArrayExpress (http://www.ebi.ac.uk/arrayexpress/), which contains more than 65 thousands of experiments and about 2 millions of assays. It also supports for a search facility to get what type of array and the related information we want. Gene expression datasets were downloaded from Gene Expression Omnibus (GEO) website (www.ncbi. nlm.nih.gov/geo/), which can be linked in the experiments searched out in ArrayExpress. GEO is a public functional genomics data and open for everyone. It contains more than 4300 datasets, 77000 series and 2000000 samples. It provides us some useful tools, such as GEO2R, to get the information we needs. Since all the data are publicly available, The Research Ethics Committee of Zhejiang Provincial People's Hospital therefore waived the requirement for ethical approval.

Selection of databases
We searched in the ArrayExpress with the condition of colorectal adenocarcinoma and liver metastasis, filtered by organism of Homo sapiens, experiment type of array assay and RNA assay. As a result, we searched out 29 experiments that satisfied the conditions. According to the total sample capacity and the volume of colorectal mucosa, primary tumor and liver metastasis respectively, we chose GSE68468 as the database looking for upregulated genes both in primary CRC and in liver metastasis. Then we researched the ArrayExpress with the condition of colorectal adenocarcinoma, filtered by organism of Homo sapiens, experiment type of array assay and RNA assay. As a result, there are 479 experiments satisfied. According to the dataintegrity and sample quantity, we focused on the dataset of GSE40967 on the platform of GPL570, which contains the follow-up of OS and RFS in 562 samples testing the gene expression of primary tumor.

Genomic analyses
GEO dataset [36] GSE68468 [37], which containing 54 normal colon mucosa, 186 primary tumors and 47 liver metastases, was reanalyzed for discovering gene associated with both colon carcinogenesis and liver metastasis. While GSE40967 [38] was used for validating the prognostic value of these genes in 562 patients with colon cancer. Differentially expressed genes were computed using Limma package in R environment (version 3.2.1, R Foundation for Statistical Computing [http://www.r-project.org/]). Venn diagram was plotted by Venny online software (Oliveros, J.C. [http://bioinfogp. cnb.csic.es/tools/venny/index.html]). MeV was employed for Heat map and clustering analyses [39]. Function enrichment was performed using GATHER [40], while gene-gene interaction network was constructed by utilizing GeneMANIA plugin [41].

Statistical analyses
Differentially expressed genes were selected by the following criteria: P < 0.05, Log 2 Transformed Fold Change > 0.6 or < -0.6. Kaplan-meier plot analysis was performed using SPSS software (IBM, version 21.0) and log-rank test was employed to evaluate statistical significance. All the data were analyzed using standard statistical tests. Significance was defined as P value less than 0.05.

ACKNOWLEDGMENTS
We thank Key Laboratory of Gastroenterology of Zhejiang Province for helpful suggestion and guidance. This study was supported by Medical science Foundation