Integrated analysis of chromosome copy number variation and gene expression in cervical carcinoma

Objective This study was conducted to explore chromosomal copy number variations (CNV) and transcript expression and to examine pathways in cervical pathogenesis using genome-wide high resolution microarrays. Methods Genome-wide chromosomal CNVs were investigated in 6 cervical cancer cell lines by Human Genome CGH Microarray Kit (4x44K). Gene expression profiles in cervical cancer cell lines, primary cervical carcinoma and normal cervical epithelium tissues were also studied using the Whole Human Genome Microarray Kit (4x44K). Results Fifty common chromosomal CNVs were identified in the cervical cancer cell lines. Correlation analysis revealed that gene up-regulation or down-regulation is significantly correlated with genomic amplification (P=0.009) or deletion (P=0.006) events. Expression profiles were identified through cluster analysis. Gene annotation analysis pinpointed cell cycle pathways was significantly (P=1.15E-08) affected in cervical cancer. Common CNVs were associated with cervical cancer. Conclusion Chromosomal CNVs may contribute to their transcript expression in cervical cancer.


INTRODUCTION
Although the most important etiological agent in cervical cancer is human papillomavirus (HPV) infection, only a small proportion of infected women developed cervical cancer. HPV infection alone is insufficient to induce malignant changes. Copy number variation (CNV) is a very common phenomenon in cervical cancer and may be important in its pathogenesis [1]. Comparative genomic hybridization (CGH) studies for cervical cancer progression have shown that chromosome 3q gain was associated with the transition from pre-invasive to invasive cervical carcinoma. Subsequently, array-based CGH (aCGH), where arrays of genomic sequences replaced metaphase chromosomes as hybridization targets, was established. More detailed and precise genomic variations had been found in cervical cancer by using aGCH [2]. Lando et al. reported several potential driver genes for cervical carcinogenesis aCGH [3]. However, a recent study reported a limited correlation between chromosomal CNV www.impactjournals.com/oncotarget/ Oncotarget, 2017, Vol. 8, (No. 65), pp: 108912-108922 Research Paper and gene expression by single nucleotide polymorphisms array platform [4].
In this study, a high resolution Human Genome CGH Microarray Kit was used to detect genome-wide chromosomal CNV in 6 cervical cancer cell lines. We also performed gene expression studies in the cervical cancer cell lines, cervical carcinoma tissues and normal cervical epithelia by using Whole Human Genome Microarray Kit. Using appropriate bioinformatics software, we identified several chromosomal CNV regions and aberrantly expressed genes in cervical cancer. Statistical analysis and gene annotation analysis were also performed for the array data.

Fifty common chromosomal CNVs were identified in cervical cancer cell lines
Using aCGH at a genome-wide resolution of 250 kb, a total of 50 common chromosomal CNV regions were identified, ranging from 0.5 Mb to 80 Mb. Of these, 21 common amplification regions (including 13 significant amplification regions) and 29 deletion regions (including 2 significant deletion regions) were identified ( Table 1).

Chromosomal CNVs could contribute to their transcript expression in cervical cancer
To evaluate if there is any association between chromosomal CNVs and gene expression changes in cervical cancer, we analyzed the gene expression profiles of the cervical cancer cell lines and normal cervical epithelium samples. 17.52% of transcripts (7,211 out of 41,152) exhibited a 2-fold over-expression and 9.02% of transcripts (3,712 out of 41,152) displayed a 2-fold downregulation in 6 cervical cancer cell lines compared with 3 normal cervical epithelium tissues. Within the 21 common genomic amplification regions, 27.94% of the transcripts (772 out of 2,794) showed 2-fold over-expression. In the 13 significant amplification regions, the percentage was 29.56% (459 out of 1,553). In the 29 deletion regions, 10.46% (287 out of 2744) revealed 2-fold down-regulation. In the 2 significant deletion regions, the percentage was 11.67% (7 out of 60) ( Figure 1A and 1B). Statistical analysis showed that gene up-regulation or down-regulation was significantly correlated with genomic amplification (P < 0.01, Spearsman correlation test) or deletion (P < 0.01) events. Thus, chromosomal CNVs can contribute to their transcript expression in cervical cancer. Two tumor related genes, ABL1 and p53, which were located in the genomic amplification regions, were found to be over-expressed by at least 2-fold in cervical cancer.

Profiles differed between cervical cancer cell lines, primary cervical carcinoma and normal cervical epithelium tissues
Expression profiles of transcripts across 6 different cervical cancer cell lines and 2 cervical carcinoma tissues and 3 normal, age-matched, cervical epithelium samples were analyzed using a hierarchical clustering algorithm (unsupervised K-means clustering). Cervical cancer cell lines, cervical carcinoma and normal cervical epithelium were divided two main groups ( Figure 2A): cervical cancer cell lines for one group, and clinical cervical carcinoma and normal cervical epithelium for the other group. Interestingly, when gene tree clustering analysis was used to analyze genes with aberrant expression within the 50 common chromosome CNV regions and 15 significant chromosome CNV regions, similar results were obtained ( Figure 2B and 2C)

Gene ontology analysis for aberrantly expressed genes
By using a volcano plot in GeneSpring, 9,446 transcripts (27.4%) were found to be changed over two fold in the "Cancer group" (including 6 cervical cancer cell lines and 2 clinical cervical carcinomas) compared with the "Normal group" (including 3 cervical epithelium tissues). Among these genes, 6,001 transcripts were upregulated by over 2 fold, and 3,445 transcripts were downregulated by over 2 fold.

Pathway analysis showed that cell cycle pathways, cell communication pathways and DNA polymerase pathways were significantly affected pathways in cervical cancer
To further investigate the biological significance of these aberrantly expressed genes, pathway analysis was performed. The analysis results showed that cycle cycle pathways (P=1.15e-8), cell communication pathways (P=1.15e-8) and DNA polymerase pathways (P=1.15e-8) were significantly were affected in the cervical cancer ( Table 2). Pathway analysis for the 446 differentially expressed transcripts in the 15 significant chromosomal CNV regions also showed that the cell cycle pathway involved the highest number of transcripts (including ABL1, DUSP9, E2F4, TP53, PKMYT1 and PPP1CA) (Figure 3), and the P value is 0.00062 (Table 3).

DISCUSSION
In our genome-wide CNV analysis, we identified 50 frequently altered genomic regions (ranging from 0.5 Mb to 80 Mb), of which 11 have not been previously described in cervical cancer (Table 1). These differences in our results may be due to the different platforms of assay, different settings of analysis or the different cervical cancer cells. In these 50 commonly altered genomic regions, 3514 genes are included. Some of these, especially oncogenic or tumor suppressor genes, may be associated with the development of cervical cancer. The gene tree clustering result suggested that during the development of cervical carcinoma, gene expression significantly changes, and cervical carcinoma can be distinguished from normal cervical epithelium tissue by clustering analysis of the gene expression profile. Since the cervical cancer cell lines were separate from primary cervical carcinoma and normal cervical epithelium tissue, we assumed that extended culturing of cervical cancer cell lines may also significantly alter their gene expression profiles.
The integrated analysis of genome-wide chromosomal copy number changes and gene expression profiling indicated that the identified CNVs could contribute to the expression of some but not all genes ( Figure 1). This finding is consistent with a report by Vazquez-Mena et al. which used a different array platform to detect the correlation between CNVs and gene expression variation in cervical cancer cell lines. Other factors, such as epigenetic changes or transcription factors, may also contribute to variation of gene expression in cervical cancer [4]. Pathway analysis indicated that significant changes of some pathways, especially those involving the cell cycle, may be involved in the pathogenesis of cervical cancer.

Region
Region length  ABL1 and p53, the two cell cycle pathway tumorrelated genes which were located in the significant genomic amplification regions, were found to be overexpressed at least 2-fold in cervical cancer. ABL1, which was located in the chr9:128,223,213-139,309,447 genomic amplification region, plays a role in apoptosis. p53 is a well-known tumor suppressor gene, and an increase in p53 levels plays a critical role in the induction of genes that results in cell cycle arrest [7], allowing repair of damaged DNA or activation of apoptotic pathways [8]. In cervical cancer with high risk of HPV-infection, the E6 protein from high-risk HPV can bind to tumor suppressor protein p53 for rapid degradation via a cellular ubiquitin ligase [9]. Other studies have indicated that p53 protein over-expression is not common or associated with survival in cervical carcinoma [10,11]. However, from our array CGH and gene expression array data, p53 was located in the chr17:7,175,150-8,229,647 genomic amplification region and was over-expressed at the mRNA level (the median of the expression of p53 in cervical cancer cell lines was 1.316 [from 0.82 to 3.38); the median of the expression of p53 in normal cervical epithelium was 0.295 [from 0.153 to 0.485]). This suggests that gene dosage of p53 contributes to RNA over-expression in some cervical cancers.

Cytoband location
Our results demonstrated that over-expression of transforming growth factor-beta 1 (TGF-β1), a gene important in cell cycle pathways, may be due to a chromosomal CNV. TGF-β1 was amplified and was also over-expressed (>2 fold) in the cancer group compared with the normal group. TGF-β1 is involved in many different critical processes, such as embryonic development, cellular maturation and differentiation, wound healing, immune regulation and inflammation. TGF-β1 is a potent inhibitor of cell proliferation at the beginning of carcinogenesis [12,13]. When cells become resistant to TGF-β1, tumor growth may be enhanced and metastasis promoted via immune evasion and angiogenesis. An increased expression of TGF-β1 has been found in cervical cancer. Kirma et al. suggested that TGF-β1 may be a factor in inducing over-expression of an oncogene, c-fms. Blocking c-fms has been demonstrated to result in increased apoptosis and decreased motility in cervical cancer [14].
In summary, we have identified several chromosomal CNV regions and demonstrated that chromosomal CNVs are a common phenomenon which can affect the level of RNA expression in cervical cancer. Pathway analysis for the aberrantly expressed genes   suggested that significant changes of some pathways, especially those involving the cell cycle, may contribute to the pathogenesis of cervical cancer. This study provided some clinical significance for us to have a better understanding of cervical cancer pathogenesis.

Cervical cancer cell lines and specimens
Six human cervical cancer cell lines (HeLa, SiHa, C33A, ME180, CC2 and CC3) were used for aCGH and gene expression array. Five clinical specimens, including three normal cervical tissues and two cervical carcinoma specimens (FIGO stage: II A ), were collected from patients (aged between 36-42 years old) at the Department of Obstetrics and Gynaecology at the Prince of Wales Hospital in Hong Kong from January 2014 to December 2014. Informed consent was obtained from all participating subjects, and Institutional Review Board approval was obtained.

Tissue micro-dissection
Micro-dissection was used as described in our previous study [5]. Briefly, the tissue specimens were frozen in OCT cryomoulds (SAKURA, Japan), sectioned (8 μm) by a cryostat at -20°C (Leica Corp., CRYOCUT 1800), and then mounted onto glass slides (SAIL BRAND, Cat No 7105) at room temperature. Sections were stained by 0.1% methyl green (Sigma) and micro-dissected using a sterile surgical blade (AESCULAP) and collected immediately for further experiments.

Microarray comparative genomic hybridization analysis
Microarray comparative genomic hybridization using Human Genome CGH Microarray Kit (4x44K) (Agilent Technologies, Santa Clara, CA, USA) was used for identifying chromosomal CNV of 6 cervical cancer cell lines and 3 normal cervical samples as per the manufacturer's protocol. Briefly, genomic DNA of cervical cancer cell lines was extracted using a DNeasy Blood & Tissue Kit (QIAGEN, Cat No. 69506). 1 μg genomic DNA of test sample and 1 μg human sex-matched control DNA as a reference sample (Promega G1521A; Lot no. 20929604) were digested using Alu I and Rsa I. This was followed by fluorescent labeling, clean-up of labeled genomic DNA, microarray hybridization and scanning. The data were extracted using the Agilent Feature Extraction (FE) v11.0 program. After calculating the background signal, non-uniform signal and the average raw signal on each probe, the resulting data files were generated and transferred to the bioinformatics software, Nexus Copy Number version 6.1 (BioDiscovery, Inc., El Segundo, CA, USA) for analysis [6].

Gene expression analysis
The Whole Human Genome Microarray Kit, 4x44K (G4112F) was used to probe gene expression in 6 cervical cancer cell lines, 2 cervical carcinomas and 3 normal cervical epithelium tissue samples. The resulting data files were generated and transferred to GeneSpring GX version 11.5 (Agilent Technologies, Santa Clara, CA, USA) for further analysis.

Statistical analysis
For the Microarray Comparative Genomic Hybridization Analysis, 0.37 was used as the cut off value for amplification or 0.5 for deletion of a single probe. Putative chromosome copy number changes were defined by intervals of three or more adjacent probes with log 2 ratios suggestive of a deletion or duplication when compared with the log 2 ratios of adjacent probes. The p value for significant difference was set to less than 0.05 to reduce the false discovery rate (FDR). For Gene Expression Analysis, the cut-off value defining an aberrant change of gene expression was set at 2 fold for data analysis. Pathway analysis for the gene expression data was performed by GeneSpring GX version 11.5, and the pathway was downloaded from the KEGG database (ftp://ftp.genome.jp/pub/kegg/).