Transcriptome profiling identifies a recurrent CRYL1-IFT88 chimeric transcript in hepatocellular carcinoma

Yi Huang; Jiaying Zheng; Dunyan Chen; Feng Li; Wenbing Wu; Xiaoli Huang; Yanan Wu; Yangyang Deng; Funan Qiu

doi:10.18632/oncotarget.17244

Research Papers:

Transcriptome profiling identifies a recurrent CRYL1-IFT88 chimeric transcript in hepatocellular carcinoma

PDF | Full Text | Supplementary Files | How to cite

Oncotarget. 2017; 8:40693-40704. https://doi.org/10.18632/oncotarget.17244

Metrics: PDF 2357 views | Full Text 3630 views

Yi Huang1,2,*, Jiaying Zheng1,2,*, Dunyan Chen1,2, Feng Li1,3, Wenbing Wu1,2, Xiaoli Huang1,2, Yanan Wu1,2, Yangyang Deng4 and Funan Qiu1,5

1Provincial Clinical College, Fujian Medical University, Fuzhou 350001, Fujian, China

2Department of Clinical Laboratory, Fujian Provincial Hospital, Fuzhou 350001, Fujian, China

3Department of Pathology, Fujian Provincial Hospital, Fuzhou 350001, Fujian, China

4Department of Bioinformatics, MyGene Diagnostics Co., Ltd, Guangzhou 510300, Guangdong, China

5Department of Hepatobiliary Surgery, Fujian Provincial Hospital, Fuzhou 350001, Fujian, China

*These authors contributed equally to this work

Correspondence to:

Funan Qiu, email: [email protected]

Yi Huang, email: [email protected]

Keywords: HCC, transcriptome sequencing, fusion transcript, CRYL1-IFT88, tumorigenesis

Received: January 28, 2017 Accepted: April 05, 2017 Published: April 19, 2017

ABSTRACT

We performed transcriptome sequencing for hepatocellular carcinoma (HCC) and adjacent non-tumorous tissues to investigate the molecular basis of HCC. Nine HCC patients were recruited and differentially expressed genes (DEGs) were identified. Candidate fusion transcripts were also identified. A total of 1943 DEGs were detected, including 690 up-regulated and 1253 down-regulated genes, and enriched in ten pathways including cell cycle, DNA replication, p53, complement and coagulation cascades, etc. Seven candidate fusion genes were detected and CRYL1-IFT88 was successfully validated in the discovery sequencing sample and another 5 tumor samples with the recurrent rate of about 9.52% (6/63). The full length of CRYL1-IFT88 was obtained by 3′ and 5′ RACE. The function of the fusion transcript is closed to CRYL1 because it contained most of domain of CRYL1. According to the bioinformatics analysis, IFT88, reported as a tumor suppressor, might be seriously depressed in the tumor cell with this fusion because the transcript structure of IFT88 was totally changed. The function depression of IFT88 caused by gene fusion CRYL1-IFT88 might be associated with tumorigenesis or development of HCC.

INTRODUCTION

Hepatocellular carcinoma (HCC) is one of the most common malignancies worldwide [1–3]. About 745,000 deaths per year can be attributed to HCC [4]. Hepatic resection is currently the most optimal choice for HCC treatment. However, surgical resection is not applicable in most patients, and its long-term prognosis remains unsatisfactory [5]. To date, it is known that both cellular changes and etiological agents (i.e., virus infection and alcohol) are responsible for the cause of HCC [6, 7]. However, like any other complex diseases [8], the molecular pathogenesis of HCC remains poorly understood [9]. The lack of good diagnostic markers and therapeutic targets has rendered HCC a major challenge.

Recently, with dramatically increased throughput, next-generation sequencing provides an efficient tool to illustrate the transcriptome characteristics of cancers, including HCC. Transcriptome sequencing has been used to identify latent biomarkers for HCC [10–12], implicating its great potential in exploring the molecular basis of HCC. Of note, by using transcriptome sequencing, a recurrent chimeric transcript DNAJB1-PRKACA [13] was identified in fibrolamellar HCC (FL-HCC) patients, suggesting that this fusion transcript contributes to the pathogenesis of the FL-HCC and may represent a therapeutic target. Therefore, transcriptome sequencing is a revolutionary tool to investigate the cancer transcriptome and identify possible therapeutic targets [14].

In this study, we performed transcriptome sequencing for HCC and adjacent non-tumorous tissues to investigate the molecular basis of HCC. Nine patients diagnosed as primary HCC were recruited and differentially expressed genes (DEGs) were identified. Candidate fusion transcripts were also identified by using defuse [15]. Further RT-PCR and Sanger sequencing experiments were performed to validate potential recurrent fusion transcripts in other 54 pairs of tumor and adjacent non-tumor samples. Our investigation may shed light on the molecular event responsible for the progression of HCC and offer new possibilities for clinical management of HCC patients.

RESULTS

Overview of transcriptome sequencing statistics

Pair-end second-generation transcriptome sequencing was performed in nine HCC patients. Sample characteristics are list in Table 1. An average of 35,772,695 pair-end 125 bp clean reads was generated (Table 2). The average mapping rate was 93.17%, resulting an average coverage of depth of 32x (Table 2).

Table 1: Sample characteristics

Patient	Age	Gender	Hepatitis	Serum AFP level(ng/mL)	Metastasis	Glisson capsule invasion	Tumor size (mm)	Multiple liver nodules
*P10*	59	M	*HBV*	*670.10*	No	–	35	-
P14	50	M	HBV	12483	No	+	50	-
P17	37	M	HBV	4.47	No	+	41	-
P21	62	M	HBV	266.5	No	+	32	-
P24	76	M	HCV	5.84	No	+	70	-
P26	59	M	HBV	3.73	No	–	6	-
P29	69	M	HBV	6.90	No	+	160	synchronous
P36	61	M	NBNC	1.11	No	+	80	-
P40	19	F	NBNC	2.85	No	–	29	-
L24	63	M	HBV	2975	No	+	70	synchronous
L26	47	M	HBV	5375	Yes	+	130	-
L44	72	M	HBV	303.8	No	+	40	synchronous
L134	66	M	HBV	60500	No	+	50	-
H19	59	M	NBNC	54.84	No	+	26	-
L6	48	M	NBNC	55.1	Yes	+	180	-
L21	41	M	HBV	4.70	Yes	+	50	-
L25	72	M	HBV	1110	No	+	80	-
L30	50	M	HBV	260.9	No	–	25	-
L36	64	M	HBV	244.6	Yes	+	65	synchronous
L39	67	M	HBV	321.1	No	+	55	synchronous
L45	71	M	HBV	4606	Yes	+	12	-
L46	18	F	HBV	37979	No	+	70	-
L49	45	M	HBV	6528	No	+	60	synchronous
L52	50	M	HBV	3.03	Yes	+	55	synchronous
L53	61	M	HBV	1.90	No	+	30	-
L54	52	M	HBV	26.53	No	+	20	-
L55	55	M	HBV	1.82	No	+	34	-
L57	58	F	HBV	2.55	Yes	+	55	synchronous
L61	42	M	HBV	3.38	No	+	20	synchronous
L64	64	M	HBV	774.2	No	+	10	-
L68	54	F	NBNC	23784	No	+	22	-
L72	40	M	HBV	1816	No	+	30	-
L73	74	M	NBNC	5.90	No	+	30	-
L85	76	M	NBNC	2.10	No	+	34	-
L154	47	M	HBV	62.47	Yes	+	105	-
L195	67	M	NBNC	3.01	Yes	+	80	-
H1	37	M	HBV	1810	No	–	55	-
H2	67	M	HBV	3103	No	+	40	-
H3	48	M	HBV	2.07	No	–	35	-
H4	68	M	HBV	15.87	No	+	35	-
H5	71	M	HBV	11.65	No	+	35	-
H6	56	M	HBV	33571	No	+	57	-
H7	52	M	HBV	2100	Yes	–	35	-
H8	52	M	NBNC	5.30	No	+	65	-
H9	37	M	HBV	>60500	Yes	–	95	-
H11	46	M	HBV	4.64	No	–	45	-
H13	52	F	NBNC	2.45	No	+	22	-
H16	75	M	NBNC	4.14	Yes	+	87	synchronous
H22	47	M	HBV	5.78	No	+	40	-
H25	60	M	HBV	61.54	No	–	25	-
H27	65	M	HBV	5.34	No	+	70	-
H28	63	M	HBV	3396	No	+	40	-
H29	52	M	HBV	66.50	No	+	22	-
H30	58	M	HBV	6.52	Yes	+	58	synchronous
H31	42	M	HBV	3.31	No	+	30	-
H32	62	M	HBV	570	No	+	28	-
H33	46	M	HBV	3780	No	+	21	synchronous
H34	64	M	HBV	36541	Yes	+	140	-
H35	47	F	HCV	250.5	No	–	22	-
H36	39	M	HBV	8.10	Yes	+	22	synchronous
H37	63	M	HBV	208	No	+	40	-
H38	48	M	NBNC	3.07	No	+	33	-
H39	66	M	HBV	82.81	No	+	34	-

Note: The nine samples used for sequencing are shown in bold. The six samples with validated fusion CRYL1-IFT88 are shown in italic.

Table 2: Summary statistics of the transcriptome sequencing

Patient	Sample type	Total reads	Mapped reads	Total base (bp)	Mapped base (bp)	Mappping ratio	Coverage (X)
P10	T	35,930,408	33,819,287	4,491,301,000	4,227,410,875	94.12%	33
P10	N	34,902,228	32,863,973	4,362,778,500	4,107,996,625	94.16%	32
P14	T	35,315,378	33,125,921	4,414,422,250	4,140,740,125	93.80%	32
P14	N	34,383,802	32,279,443	4,297,975,250	4,034,930,375	93.88%	31
P17	T	35,532,658	33,258,212	4,441,582,250	4,157,276,500	93.60%	32
P17	N	34,537,914	32,152,860	4,317,239,250	4,019,107,500	93.09%	31
P21	T	34,762,386	32,701,011	4,345,298,250	4,087,626,375	94.07%	31
P21	N	35,314,918	33,021,170	4,414,364,750	4,127,646,250	93.50%	32
P24	T	35,610,444	32,509,118	4,451,305,500	4,063,639,750	91.29%	31
P24	N	34,637,398	31,280,461	4,329,674,750	3,910,057,625	90.31%	30
P26	T	34,566,526	32,670,216	4,320,815,750	4,083,777,000	94.51%	31
P26	N	35,384,486	33,407,169	4,423,060,750	4,175,896,125	94.41%	32
P29	T	34,564,642	32,385,294	4,320,580,250	4,048,161,750	93.69%	31
P29	N	34,771,048	32,792,475	4,346,381,000	4,099,059,375	94.31%	32
P36	T	35,371,920	32,571,695	4,421,490,000	4,071,461,875	92.08%	31
P36	N	35,959,532	32,990,753	4,494,941,500	4,123,844,125	91.74%	32
P40	T	42,279,434	39,097,967	5,284,929,250	4,887,245,875	92.50%	38
P40	N	40,083,384	36,852,292	5,010,423,000	4,606,536,500	91.90%	35
Average		35,772,695	33,321,073	4,471,586,847	4,165,134,146	93.17%	32
Total		643,908,506	599,779,317	80,488,563,250	74,972,414,625

DEGs analyses results

We next detected DEGs between tumor and non-tumor samples. A total of 1943 DEGs were detected, including 690 up-regulated and 1253 down-regulated genes.

DEGs were subjected to KEGG pathway analyses. As shown in Table 3, with the cut off of FDR < 0.05, DEGs were enriched in ten pathways, including cell cycle (hsa04110), DNA replication (hsa03030), p53 (hsa04115) and complement and coagulation cascades pathway (hsa04610), as well as retinol, xenobiotics by cytochrome P450, drug, arachidonic acid, tyrosine and fatty acid metabolism pathway (hsa00830, 00980, 00982, 00590, 00350, 00071). The GO category enrichment analyses resulted extensive items overrepresented with DEGs. As shown in Table 3, GO items (FDR < 10^–3) enriched with DEGs were associated with cell cycle and other processes, such as immune response, DNA replication, complement activation, oxidation, metabolism, and so on.

Table 3: KEGG pathway and gene ontology biological process items enrichment analyses result for the DEGs

Term	DEG Count	FDR
KEGG pathway
hsa04110:Cell cycle	31	6.10 × 10^–6
hsa00830:Retinol metabolism	17	1.82 × 10^–4
hsa00980:Metabolism of xenobiotics by cytochrome P450	17	7.69 × 10^–4
hsa03030:DNA replication	12	2.97 × 10^–3
hsa00982:Drug metabolism	16	4.21 × 10^–3
hsa04610:Complement and coagulation cascades	15	9.85 × 10^–3
hsa00590:Arachidonic acid metabolism	13	1.03 × 10^–2
hsa00350:Tyrosine metabolism	11	3.96 × 10^–2
hsa00071: Fatty acid metabolism	10	4.17 × 10^–2
hsa04115:p53 signaling pathway	13	4.58 × 10^–2
GO item
GO:0022403~cell cycle phase	84	5.15 × 10^–16
GO:0000278~mitotic cell cycle	77	4.11 × 10^–15
GO:0000279~M phase	70	3.45 × 10^–14
GO:0007067~mitosis	54	1.87 × 10^–13
GO:0000280~nuclear division	54	1.87 × 10^–13
GO:0048285~organelle fission	55	2.72 × 10^–13
GO:0000087~M phase of mitotic cell cycle	54	4.11 × 10^–13
GO:0007049~cell cycle	118	1.26 × 10^–12
GO:0022402~cell cycle process	94	3.43 × 10^–12
GO:0009611~response to wounding	87	6.60 × 10^–11
GO:0006954~inflammatory response	61	7.36 × 10^–10
GO:0051301~cell division	57	1.09 × 10^–9
GO:0007059~chromosome segregation	26	8.40 × 10^–9
GO:0006952~defense response	90	1.38 × 10^–8
GO:0006955~immune response	97	2.41 × 10^–8
GO:0051726~regulation of cell cycle	53	4.06 × 10^–6
GO:0006260~DNA replication	36	9.33 × 10^–6
GO:0002526~acute inflammatory response	24	1.13 × 10^–5
GO:0002253~activation of immune response	23	2.07 × 10^–5
GO:0050778~positive regulation of immune response	29	4.88 × 10^–5
GO:0007051~spindle organization	15	5.21 × 10^–5
GO:0031960~response to corticosteroid stimulus	21	5.78 × 10^–5
GO:0051384~response to glucocorticoid stimulus	20	5.84 × 10^–5
GO:0048545~response to steroid hormone stimulus	34	9.52 × 10^–5
GO:0055114~oxidation reduction	79	1.70 × 10^–4
GO:0010033~response to organic substance	86	2.60 × 10^–4
GO:0002684~positive regulation of immune system process	38	2.84 × 10^–4
GO:0002252~immune effector process	26	3.08 × 10^–4
GO:0051329~interphase of mitotic cell cycle	22	3.60 × 10^–4
GO:0045087~innate immune response	26	5.19 × 10^–4
GO:0000079~regulation of cyclin-dependent protein kinase activity	15	5.37 × 10^–4
GO:0000226~microtubule cytoskeleton organization	27	5.58 × 10^–4
GO:0051325~interphase	22	5.67 × 10^–4
GO:0006956~complement activation	13	6.96 × 10^–4
GO:0008203~cholesterol metabolic process	20	7.33 × 10^–4
GO:0000070~mitotic sister chromatid segregation	12	7.35 × 10^–4
GO:0009719~response to endogenous stimulus	54	7.58 × 10^–4
GO:0000819~sister chromatid segregation	12	9.75 × 10^–4

Fusion genes detection and validation

To detect fusion transcripts that might be the potential cause for tumorigenesis, the software defuse was used and after the filtering process as described in the methods section. Seven candidate fusion genes were detected (Table 4). Using RT-PCR and Sanger sequencing, the fusion gene CRYL1-IFT88 was successfully validated in the discovery sequencing sample (P10 tumor sample) and it was also validated in another 5 tumor samples (Figures 1–2). Therefore, this fusion gene is considered as a recurrent fusion transcript related to HCC with the recurrent rate of about 9.52% (6/63). To investigate how the gene fusion CRYL1-IFT88 affected their expression and proteins, we observed the reads coverage in genomes by IGV (Integrative Genomics Viewer) and domains by NCBI CD-search (Conserved Domains search). We found that quite a lot RNA-Seq reads covered in the intron between exon 15 and exon 16 of IFT88 in tumor sample of P10, but this was not observed in the matched normal sample (Figure 3 top), which indicated that the transcription structure of IFT88 was very likely changed in the tumor cells with this fusion. This break was just occurred in the middle of IFT88 protein sequence ((Figure 3 bottom), which might totally destroy the function of IFT88. However, the change of reads coverage was not observed in the genes CRYL1, it might be the high expression of wild-type CRYL1 masked the change of expression structure.

Table 4: Information of the identified fusion genes

Sample id	Sample type	Gene1-Gene2	Break point1	Break point2	Break point1 location	Break point2 location
P10	tumor	CRYL1-IFT88	chr13:21063509	chr13:21184715	coding	intron
P14	tumor	BRD9-OR4N2	chr5:886707	chr14:20230047	coding	5′-UTR
P14	tumor	MPV17-TRIM54	chr2:27545315	chr2:27527827	coding	coding
P24	tumor	BHMT-MTATP6P1	chr5:78427385	chr1:569753	3′-UTR	exon
P26	tumor	NOMO3-XYLT1	chr16:16372646	chr16:17323802	coding	intron
P26	tumor	NOMO3-XYLT1	chr16:16372646	chr16:17318037	coding	intron
P29	tumor	TNFSF14-C3	chr19:6664861	chr19:6677181	3′-UTR	downstream

Figure 1: Detection and Sanger sequencing validation of a recurrent fusion transcript CRYL1-IFT88. (A) Illustration of the CRYL1-IFT88 fusion gene (top) and Sanger sequencing validation result for the transcriptome tumor sequencing sample (P10) (bottom). (B) Sanger sequencing validation result for the other five tumor samples. The IDs of samples corresponded to Table 1. The fusion point was marked with dash line.

Figure 2: The agarose gel electrophoresis of the six samples with successfully validated fusion transcript CRYL1-IFT88. The RT-PCR product for the fusion gene was 230 bp. The IDs of samples corresponded to Table 1. T and N represented tumor and adjacent non-tumor tissue, respectively.

Figure 3: The RNA-Seq reads coverage and functional domains of IFT88. The RNA-Seq reads coverage of gene IFT88 by IGV in the samples of patient P10 was shown at the top, and the functional domains of IFT88 according to NCBI CD-search was shown at the bottom. The red dotted line indicated the breakpoint.

To discover the function of the fusion transcript, we performed both 3′ and 5′ RACE (rapid-amplification of cDNA ends) experiments and obtained the full-length CRYL1-IFT88 fusion sequence (779 bp). The longest ORF was 279 bp (predicted by NCBI ORF-finder) and corresponding to a 92 aa protein sequence (see Supplementary File.docx). And the fusion protein contained most of the domain of CRYL1, including 3-hydroxybutyryl-CoA dehydrogenase domain, NAD binding domain, and so on, which indicated that the functional of the CRYL1-IFT88 were similar with CRYL1 (see Supplementary File.docx).

DISCUSSION

We have applied the transcriptome sequencing approach to illustrate the gene expression characteristics of HCC. Pathway analyses showed that ten pathways, including cell cycle, DNA replication, p53 and complement and coagulation cascades as well as six metabolism processes, were overrepresented with DEGs. Deregulation of the cell cycle [16], DNA replication [17] and p53 pathways [18] are expected since uncontrolled cell division and aberrant tumor suppressor are the major character of cancer cells. As for the complement and coagulation cascades pathway, consistent with our results, both gene expression [19] and proteomics [20] analyses have shown that this pathway is related to the pathogenesis of HCC.

Gene fusion is an important event involved in the development of various types of malignancies, which is the consequence of the genomic rearrangements with a deletion, insertion, translocation or inversion of distal intra- or inter-chromosomal sequences [21]. The oncogenic activation may be triggered by the gene activation or repression due to gene fusion. The Philadelphia chromosome found in chronic myelogenous leukemia (CML) consisting of the BCR-ABL fusion gene is a classical example of gene fusion, an activated tyrosine kinase that drives CML [22]. Recent technology advances especially transcriptome sequencing and bioinformatics strategy for processing cancer profiling data, make the discovery of more fusion genes in cancers, including a recurrent chimeric transcript DNAJB1-PRKACA in FL-HCC [9] and SLC45A3-ELK4 in prostate cancers [23], etc. These known fusion genes provide key insights into tumor biology and have significant clinical impact by serving as potential diagnostic markers or therapeutic targets. In our study, we detected seven chimeric transcripts in total by transcriptome sequencing. Fusion genes were subjected to validation and successfully confirmed that CRYL1-IFT88 is a recurrent fusion transcript related with HCC. To our knowledge, this fusion gene was first reported and validated in HCC. Protein encoded by CRYL1 catalyzes the dehydrogenation of L-gulonate into dehydro-L-gulonate in the uronate cycle, which is an alternative glucose metabolic pathway. Reduced expression of CRYL1 in HCC has been observed in many studies [24–26]. Moreover, it was reported that reduced CRYL1 expression in HCC confers cell growth advantages and correlates with adverse patient prognosis [26]. Protein encoded by IFT88 is a member of the tetratrico peptide repeat (TPR) family, and is involved in liver oval cell proliferation, differentiation, and ploidy control [27]. Tumor suppression activity of this gene was demonstrated and this gene was reported as a liver neoplasia tumor suppressor gene in a previous study [28]. According to our bioinformatics analysis, the protein function of IFT88 might be suppressed due to the gene fusion. Considering the potential involvement of these two genes in HCC, the fusion transcript we identified here might be responsible for the tumorigenesis and serve as potential targets for further therapeutic strategy by the overexpression of IFT88 to overcome the function inhibition due to gene fusion CRYL1-IFT88.

In conclusion, we used transcriptome sequencing approach to illustrate the gene expression characteristics of HCC. Cell cycle, DNA replication, p53 and complement and coagulation cascades pathways as well as some metabolism processes were overrepresented with DEGs. Of note, we detected and successfully validated CRYL1-IFT88 as a recurrent fusion transcript in HCC with the recurrent rate of about 9.52% (6/63).

MATERIALS AND METHODS

Ethics statement

Our study design was approved by the institutional review board of the Fujian provincial hospital. Written informed consent was obtained from all subjects.

Subjects

Sixty-three subjects aged from 18 to 76 were diagnosed as primary HCC in the Fujian provincial hospital during the period from 03/01/2014 to 12/31/2015. Hepatitis B virus (HBV) related tumors were defined according to the presence of HB surface antigen (HBsAg) in serum, and hepatitis C virus (HCV) related tumors were according to the presence of antibody to HCV (HCVAb) in serum. NBNC tumor was defined according to the absence of both HBsAg and HCVAb in serum. Primary tumor and adjacent non-tumorous samples were obtained from all patients who underwent surgical tumor resection. All samples were frozen immediately at –80°C until RNA extraction. Total RNA was isolated by using RecoverAll™ Total Nucleic Acid Isolation Kit (Life Technologies, Carlsbad, CA, USA). Integrity of RNA was assessed by Agilent 2100 bioanalyzer (Agilent, Santa Clara, CA, USA). RNA from nine samples was subjected to sequencing and other samples were used in the validation experiments.

Transcriptome sequencing

Sequencing libraries were prepared by using prepared by using TruSeq RNA Sample Prep Kit (Illumina, San Diego, CA, USA) according to standard protocols. Briefly, total RNA was firstly randomly fragmented and poly-A-selected. Secondly, the RNA fragments were reverse transcribed to cDNA, end-repaired and ligated with adapters. The libraries then underwent size selection, PCR and purification. The quality of libraries was assessed by using Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA). Sequencing was then performed on an Illumina HiSeq 2500 sequencer with 125 bp pair-end reads. All raw data have been deposited in the NIH Short Read Archive database (Access number: SRP 102722).

Reads processing

Raw sequencing reads were firstly filtered for adapters and ribosomal RNA. Reads containing five or more low quality (quality score < 20) bases were also removed. The remained high-quality reads were then aligned to human genome (hg19) by using Tophat [29]. The mapped reads were then subjected to alignment against the the human transcriptome (Ensembl, GRCh37.73). Gene expression level measured by FPKM (fragments per kilobase per million) was calculated by Cufflinks [30]. All processed expression data have been submitted to GEO database (Access number: GSE 97214).

Differentially expressed genes (DEGs) analysis

For each group, DEGs between the tumor and matched non-tumor tissues were selected with pair-wise t test and the significant threshold was set as p-value of less than 0.05 and |log₂(fold change, FC)| > = 1. DAVID [31] was used to do the Gene Ontology (GO) and KEGG pathway annotation and enrichment analyses. The significant threshold for enrichment was set as false discovery rate (FDR) < 0.05.

Fusion gene detection

Fusion transcripts were identified by defuse [15]. The default filtering processes of defuse were carried out as previously described [15]. The results of deFuse were further filtered to reduce false positives with the following criteria: 1) predictions supported by less than eight reads were removed; 2) predictions between adjacent genes were filtered unless implied in genomic inversion or eversion; 3) predictions related to ribosomal proteins or small nuclear ribosomal proteins were removed.

Fusion gene validation

Selected fusion transcripts were subjected to validation using RT-PCR and Sanger sequencing. For the RT-PCR reactions, total RNA was converted to cDNA with random hexamer primers using the High-Capacity cDNA Reverse Transcription kit (Applied Biosystems, Foster City, CA, USA). The RT-PCR products were gel purified and sequenced by Sanger sequencing using an ABI 3730 DNA Sequencer (Applied Biosystems, Foster City, CA, USA).

5′ RACE and 3′ RACE

5′ RACE. RLM-RACE was performed with the SMARTer^TM RACE cDNA Amplification Kit (Clontech). Total RNA (10 mg) was first treated with calf intestinal alkaline phosphatase (CIP) to remove 5′ phosphate groups, followed by tobacco acid pyrophosphatase to remove 5′ cap structures. After RNA linker ligation, mRNA transcripts were reverse-transcribed with SMARTScribe^TM reverse transcriptase. To amplify first-strand cDNAs, we performed NEST PCR. Firstly, outer 5′ PCR using 5′ RACE outer primers (provided in kit) and a IFT88 intron 15 primer (TAGGGAATGACAGGAAACGGGGAT) with SuperTaq Plus polymerase (Applied Biosystems). Subsequently, inner 5′ PCR was performed with a 5′ RACE inner primer (provided in kit) and a CYRL1 exon 3 primer (TTCCA CACTCAGGGAGCCTTTCA). After gel electrophoresis, PCR bands of interest were excised and cloned. The PCR production were sequenced bidirectionally on an ABI 3730 automated sequencer (Applied Biosystems). A minimum of 3 independent colonies were sequenced in each experiment.

3′ RACE

RLM-RACE was performed with the SMARTer^TM RACE cDNA Amplification Kit (Clontech). Total RNA (10 mg) was first treated with calf intestinal alkaline phosphatase (CIP) to remove 3′ phosphate groups, followed by tobacco acid pyrophosphatase to remove 3′ cap structures. After RNA linker ligation, mRNA transcripts were reverse-transcribed with SMARTScribe^TM reverse transcriptase. To amplify first-strand cDNAs, we performed NEST PCR , firstly, outer 3′ PCR using 3′ RACE outer primers (provided in kit) and a CYRL1 exon 2 primer (GAGGCTTCCAGGTGAAACTCTATGA) with SuperTaq Plus polymerase (Applied Biosystems). Subsequently, inner 3′ PCR was performed with a 3′ RACE inner primer (provided in kit) and a IFT88 intron 15 primer (TTATCC CCGTTTCCTGTCATTCCCT). After gel electrophoresis, PCR bands of interest were excised and cloned. The PCR production were sequenced bidirectionally on an ABI 3730 automated sequencer (Applied Biosystems). A minimum of 3 independent colonies were sequenced in each experiment.

Abbreviations

HCC (hepatocellular carcinoma); DEGs (differentially expressed genes) RACE (Rapid-Amplification of cDNA Ends); RLM-RACE (RNA Ligase-Mediated-Rapid Amplification of cDNA Ends); FL-HCC (fibrolamellar hepatocellular carcinoma); GO enrichment analyses ( Gene Ontology-based statistical enrichment analysis); FDR(false discovery rate); IGV (Integrative Genomics Viewer); CD-search (Conserved Domains search); ORF (open reading frame); NAD (Nicotinamide adenine dinucleotide); TPR (tetratrico peptide repeat); HBV (Hepatitis B virus); HBsAg (HB surface antigen); HCV (Hepatitis C virus); NBNC tumor (non-B non-C tumor); FPKM (fragments per kilobase per million); CIP (calf intestinal alkaline phosphatase); CRYL1 (crystallin, lambda 1); IFT88 (intraflagellar transport 88); BRD9 (bromodomain containing 9); OR4N2 (olfactory receptor, family 4, subfamily N, member 2); MPV17 (MpV17 mitochondrial inner membrane protein); TRIM54 (tripartite motif containing 54); BHMT (betaine-homocysteine S-methyltransferase); MTATP6P1 (mitochondrially encoded ATP synthase 6 pseudogene 1); NOMO3 (NODAL modulator 3); XYLT1 (xylosyltransferase I); TNFSF14 (tumor necrosis factor (ligand) superfamily, member 14); C3 (complement component 3)

Authors’ contributions

Yi Huang guarantor of integrity of the entire study, study concepts, study design, definition of intellectual content, literature research, experimental studies, data acquisition, data analysis, statistical analysis, manuscript preparation, manuscript editing, manuscript review; Jiaying Zheng guarantor of integrity of the entire study, study concepts, study design, definition of intellectual content, literature research, experimental studies, data acquisition, data analysis, statistical analysis, manuscript preparation; Dunyan Chen experimental studies, data acquisition, data analysis, statistical analysis; Feng Li experimental studies, data acquisition, data analysis; Wenbing Wu experimental studies, data acquisition; Xiaoli Huang data analysis, statistical analysis; Yanan Wu data analysis, statistical analysis; Yangyang Deng bioinformatics analysis and data submittion; Funan Qiu guarantor of integrity of the entire study, study concepts, study design, definition of intellectual content, literature research, manuscript editing, manuscript review; all authors read and approved the final manuscript. Yi Huang and Jiaying Zheng contributed equally to this study.

ACKNOWLEDGMENTS

We thank AIC biotechnology for their support in transcriptome sequencing and analysis.

CONFLICTS OF INTEREST

The authors have no financial conflicts of interest.

FUNDING

This study is supported by the Middle-young age backbone talent cultivation program of Fujian health system (No. 2013-ZQN-JC-2); Key projects of science and technology plan of Fujian province (No. 2014Y0009).

REFERENCES

1. Colombo M. Hepatocellular carcinoma. Journal of hepatology. 1992; 15:225–236.

2. Raza A, Sood GK. Hepatocellular carcinoma review: current treatment, and evidence-based medicine. World journal of gastroenterology. 2014; 20:4115–4127. doi: 10.3748/wjg.v20.i15.4115.

3. He B, Qiu X, Li P, Wang L, Lv Q, Shi T. HCCNet: an integrated network database of hepatocellular carcinoma. Cell Res. 2010; 20:732–734.

4. Organization WH. Cancer. 2015.

5. Fan ST, Mau Lo C, Poon RT, Yeung C, Leung Liu C, Yuen WK, Ming Lam C, Ng KK, Ching Chan S. Continuous improvement of survival outcomes of resection of hepatocellular carcinoma: a 20-year experience. Annals of surgery. 2011; 253:745–758. doi: 10.1097/SLA.0b013e 3182111195.

6. He B, Zhang H, Shi T. A comprehensive analysis of the dynamic biological networks in HCV induced hepatocarcinogenesis. PLoS One. 2011; 6:e18516.

7. He B, Li T, Guan L, Liu FE, Chen XM, Zhao J, Lin S, Liu ZZ, Zhang HQ. CTNNA3 is a tumor suppressor in hepatocellular carcinomas and is inhibited by miR-425. Oncotarget. 2016; 7:8078–8089. doi: 10.18632/oncotarget.6978.

8. He B, Lu C, Wang M, Zheng G, Chen G, Jiang M, He X, Bian Z, Zhang G, Lu A. Drug discovery in traditional Chinese medicine: From herbal fufang to combinatory drugs. Science. 2015; 350:S74–S76.

9. Bruix J, Boix L, Sala M, Llovet JM. Focus on hepatocellular carcinoma. Cancer cell. 2004; 5:215–219.

10. Simon EP, Freije CA, Farber BA, Lalazar G, Darcy DG, Honeyman JN, Chiaroni-Clarke R, Dill BD, Molina H, Bhanot UK, La Quaglia MP, Rosenberg BR, Simon SM. Transcriptomic characterization of fibrolamellar hepatocellular carcinoma. Proceedings of the National Academy of Sciences of the United States of America. 2015; 112:E5916–5925. doi: 10.1073/pnas.1424894112.

11. Ho DW, Kai AK, Ng IO. TCGA whole-transcriptome sequencing data reveals significantly dysregulated genes and signaling pathways in hepatocellular carcinoma. Frontiers of medicine. 2015; 9:322–330. doi: 10.1007/s11684-015-0408-9.

12. Lin KT, Shann YJ, Chau GY, Hsu CN, Huang CY. Identification of latent biomarkers in hepatocellular carcinoma by ultra-deep whole-transcriptome sequencing. Oncogene. 2014; 33:4786–4794. doi: 10.1038/onc.2013.424.

13. Honeyman JN, Simon EP, Robine N, Chiaroni-Clarke R, Darcy DG, Lim II, Gleason CE, Murphy JM, Rosenberg BR, Teegan L, Takacs CN, Botero S, Belote R, et al. Detection of a recurrent DNAJB1-PRKACA chimeric transcript in fibrolamellar hepatocellular carcinoma. Science. 2014; 343:1010–1014. doi: 10.1126/science.1249484.

14. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews Genetics. 2009; 10:57–63. doi: 10.1038/nrg2484.

15. McPherson A, Hormozdiari F, Zayed A, Giuliany R, Ha G, Sun MG, Griffith M, Heravi Moussavi A, Senz J, Melnyk N, Pacheco M, Marra MA, Hirst M, et al. deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data. PLoS computational biology. 2011; 7:e1001138. doi: 10.1371/journal.pcbi.1001138.

16. Visconti R, Della Monica R, Grieco D. Cell cycle checkpoint in cancer: a therapeutically targetable double-edged sword. J Exp Clin Cancer Res. 2016; 35:153. doi: 10.1186/s13046-016-0433-9.

17. Wasylishen AR, Lozano G. Attenuating the p53 Pathway in Human Cancers: Many Means to the Same End. Cold Spring Harb Perspect Med. 2016; 6. doi: 10.1101/cshperspect.a026211.

18. Vassilev A, DePamphilis ML. Links between DNA Replication, Stem Cells and Cancer. Genes (Basel). 2017; 8. doi: 10.3390/genes8020045.

19. Zhang L, Guo Y, Li B, Qu J, Zang C, Li F, Wang Y, Pang H, Li S, Liu Q. Identification of biomarkers for hepatocellular carcinoma using network-based bioinformatics methods. European journal of medical research. 2013; 18:35. doi: 10.1186/2047-783X-18-35.

20. Tsai TH, Song E, Zhu R, Di Poto C, Wang M, Luo Y, Varghese RS, Tadesse MG, Ziada DH, Desai CS, Shetty K, Mechref Y, Ressom HW. LC-MS/MS-based serum proteomics for identification of candidate biomarkers for hepatocellular carcinoma. Proteomics. 2015; 15:2369–2381. doi: 10.1002/pmic.201400364.

21. Heim S, Mitelman F. Molecular screening for new fusion genes in cancer. Nature genetics. 2008; 40:685–686. doi: 10.1038/ng0608-685.

22. Nowell PC, Hungerford DA. Chromosome studies in human leukemia. II. Chronic granulocytic leukemia. Journal of the National Cancer Institute. 1961; 27:1013–1035.

23. Rickman DS, Pflueger D, Moss B, VanDoren VE, Chen CX, de la Taille A, Kuefer R, Tewari AK, Setlur SR, Demichelis F, Rubin MA. SLC45A3-ELK4 is a novel and frequent erythroblast transformation-specific fusion transcript in prostate cancer. Cancer research. 2009; 69:2734–2738. doi: 10.1158/0008-5472.can-08-4926.

24. Chen J, Yu L, Li D, Gao Q, Wang J, Huang X, Bi G, Wu H, Zhao S. Human CRYL1, a novel enzyme-crystallin overexpressed in liver and kidney and downregulated in 58% of liver cancer tissues from 60 Chinese patients, and four new homologs from other mammalians. Gene. 2003; 302:103–113.

25. Chen CF, Yeh SH, Chen DS, Chen PJ, Jou YS. Molecular genetic evidence supporting a novel human hepatocellular carcinoma tumor suppressor locus at 13q12.11. Genes, chromosomes & cancer. 2005; 44:320–328. doi: 10.1002/gcc.20247.

26. Cheng IK, Ching AK, Chan TC, Chan AW, Wong CK, Choy KW, Kwan M, Lai PB, Wong N. Reduced CRYL1 expression in hepatocellular carcinoma confers cell growth advantages and correlates with adverse patient prognosis. The Journal of pathology. 2010; 220:348–360. doi: 10.1002/path.2644.

27. Richards WG, Yoder BK, Isfort RJ, Detilleux PG, Foster C, Neilsen N, Woychik RP, Wilkinson JE. Isolation and characterization of liver epithelial cell lines from wild-type and mutant TgN737Rpw mice. The American journal of pathology. 1997; 150:1189–1197.

28. Isfort RJ, Cody DB, Doersen CJ, Richards WG, Yoder BK, Wilkinson JE, Kier LD, Jirtle RL, Isenberg JS, Klounig JE, Woychik RP. The tetratricopeptide repeat containing Tg737 gene is a liver neoplasia tumor suppressor gene. Oncogene. 1997; 15:1797–1803. doi: 10.1038/sj.onc.1201535.

29. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009; 25:1105–1111. doi: 10.1093/bioinformatics/btp120.

30. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology. 2010; 28:511–515. doi: 10.1038/nbt.1621.

31. Huang da W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009; 4:44–57. doi: 10.1038/nprot.2008.211.

Publication Alerts

European Cancer and Oncology Congress

Post-Publication Promotion

Research Papers:

Transcriptome profiling identifies a recurrent CRYL1-IFT88 chimeric transcript in hepatocellular carcinoma

Publication Alerts

European Cancer and Oncology Congress

Oncoscience

Post-Publication Promotion

Research Papers:

Transcriptome profiling identifies a recurrent CRYL1-IFT88 chimeric transcript in hepatocellular carcinoma