Association of genetic variants in lncRNA H19 with risk of colorectal cancer in a Chinese population

Objective The long non-coding RNA (lncRNA) gene H19 has been implicated in multiple biological functions and plays a vital role in colorectal carcinogenesis. However, the association between genetic variants in H19 and colorectal cancer susceptibility has not been reported. In this study, we aimed to explore whether H19 polymorphisms are related to colorectal cancer susceptibility. Methods We conducted a case-control study to evaluate the association between four selected single nucleotide polymorphisms (SNPs) (rs2839698, rs3024270, rs217727, and rs2735971) in H19 and the risk of colorectal cancer in a Chinese population. Results Individuals carrying the rs2839698 A allele had a significantly increased risk of colorectal cancer compared with those carrying the G allele [odds ratio (OR) = 1.20, 95% confidence interval (CI) = 1.05–1.36 in the additive model]. In stratified analyses, the association was significant among cases with colon tumor site, well-differentiated grade, and Dukes' stage C/D (P < 0.05). Additionally, bioinformatic analysis showed that rs2839698 may change crucial folding structures and alter the target microRNAs of H19. Conclusions Our results provide evidence that rs2839698 in H19 is associated with an elevated risk of colorectal cancer and may be a potential biomarker for predicting colorectal cancer susceptibility.


INTRODUCTION
Colorectal cancer is one of the most common malignant tumors worldwide, with over 132,700 new cases and 49,700 deaths estimated every year in the United States [1]. Epidemiological data reported by the International Agency for Research on Cancer (IARC) show that colorectal cancer accounts for 8.3% of cancer incidence and 6.3% of cancer mortality in China [2]. The occurrence and development of colorectal cancer are driven by multiple complex factors, including environmental and genetic ones [3,4]. In recent years, numerous genome-wide association studies (GWAS) have identified genetic variants that affect the risk of colorectal cancer [5][6][7][8]. Zhang et al. conducted a GWAS in East Asians and identified 6 new loci associated with colorectal cancer risk [9]. In addition, Jia et al. identified three new colorectal cancer susceptibility loci [10]. These studies provide additional insights into the genetic and biological basis of colorectal cancer.
It is well known that abnormal gene expression may increase the risk or severity of disease [11][12][13]. Long noncoding RNAs (lncRNAs) have also been implicated in various biological processes involved in cancer susceptibility [14,15]. The lncRNA H19, located in a highly conserved region on human chromosome 11p15.5, is a maternally expressed gene that plays key roles in embryogenesis during fetal development [16,17]. However, its expression is down-regulated in maturing tissues after birth [18]. Accumulating evidence suggests that H19 is up-regulated in a variety of cancer types, including breast cancer [19,20], esophageal cancer [21], bladder cancer [22], and colorectal cancer [23]. In addition, the differentially methylated regions (DMRs), which are located upstream of the transcription start site of H19, act as a methylation-sensitive insulator [24]. Furthermore, emerging studies indicate that H19 may promote tumorigenicity by acting as a precursor of microRNAs (miRNAs) or as a competitive endogenous RNA (ceRNA) [25][26][27]. Tsang et al. observed that miR-675, derived from H19, may decrease the expression of retinoblastoma (RB) and increase the growth and development of colorectal cancer cells [28]. Induction of epithelial-mesenchymal transition (EMT) in cancer cells due to aberrant H19 expression can promote pancreatic ductal adenocarcinoma cell invasion and migration [29].
Recently, several single nucleotide polymorphisms (SNPs) within lncRNA genes have been shown to modulate the expression and function of lncRNAs and thereby influence tumor susceptibility and prognosis [30,31]. For SNPs in lncRNA H19, several studies have identified associations with malignant diseases [32,33]. In this study, we conducted a case-control study to genotype candidate SNPs in H19 (rs2839698, rs3024270, rs217727, and rs2735971) and investigate their association with the risk of colorectal cancer.

Characteristics of the study subjects
A total of 1,147 colorectal cancer patients and 1,203 controls were recruited in this study. No significant differences in age or gender were observed between patients and cancer-free controls (P = 0.751 and P = 0.116, respectively), indicating satisfactory matching on these factors. There were also no significant differences in smoking or drinking status between patients and controls (P > 0.05). However, more colorectal cancer patients than controls had a family history of cancer (P < 0.001). For tumor grade, 7.4% of colorectal cancer cases were low grade, 76.7% intermediate grade, and 15.9% high grade. The frequencies of Dukes' stages were 8.4% (A), 43.1% (B), 36.8% (C), and 11.7% (D).

Associations of selected SNPs in H19 and colorectal cancer risk
The positions of the four selected SNPs in H19 are shown in Figure 1. The genotype distributions of all four SNPs in the control group were consistent with those expected under Hardy-Weinberg equilibrium (HWE) (P = 0.666 for rs2839698, P = 0.979 for rs3024270, P = 0.959 for rs217727, and P = 0.175 for rs2735971). We then calculated the genotype frequencies of the H19 tagSNPs among cases and controls and assessed their associations with colorectal cancer risk under different genetic models (additive, dominant, recessive, and co-dominant) (Table 1, Supplementary Table S1). In multivariate logistic regression adjusted for age, gender, smoking, and drinking status, rs2839698 was significantly associated with the risk of colorectal cancer in the additive model [odds ratio (OR) = 1.20, 95% confidence interval (CI) = 1.05-1.36, P = 0.007; P = 0.028 after Bonferroni correction]. No significant associations between rs3024270, rs217727, or rs2735971 and colorectal cancer risk were found in the additive model.
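As a rough arithmetic check on the figures above, the following Python sketch applies a Bonferroni correction and recovers an approximate Wald P-value from the reported OR and CI. It assumes that the correction multiplies the P-value by the number of SNPs tested (four) and that the CI is a symmetric Wald interval on the log-odds scale; the function names are illustrative and are not taken from the study, which used SAS.

```python
import math

def bonferroni(p, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p * n_tests)

def wald_p_from_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Approximate two-sided Wald P-value recovered from an OR and its 95% CI.

    Assumes the interval is symmetric on the log scale (an assumption,
    not something stated in the paper).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# Values reported for rs2839698 in the additive model
p_adj = bonferroni(0.007, 4)
p_approx = wald_p_from_ci(1.20, 1.05, 1.36)
print(round(p_adj, 3))
print(round(p_approx, 4))
```

The adjusted value is 0.007 × 4 = 0.028, matching the text. The recovered Wald P-value should fall close to the reported 0.007; any small gap is plausibly due to rounding of the OR and CI limits.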

Stratification analysis of associations between rs2839698 and colorectal cancer
To examine whether potential confounders influenced the association, we stratified the analysis of rs2839698 and colorectal cancer risk by age, sex, smoking status, drinking status, and family history of cancer. Because the AA genotype group was small, we performed the stratified analysis under the dominant model. As shown in Table 2, a more pronounced increase in colorectal cancer risk was observed among younger subjects (age ≤ 61) (P = 0.007), males (P = 0.022), drinkers (P = 0.008), and smokers (P = 0.002).

Association between rs2839698 and clinicopathologic characteristics of colorectal cancer
Next, we performed subgroup analyses by clinicopathologic variables to evaluate the relationship between rs2839698 and colorectal cancer risk. As shown in Supplementary Table S2, the rs2839698 GA/AA genotypes were associated with an increased risk of colorectal cancer in the subgroups with colon tumor site (OR = 1.25, 95% CI = 1.02-1.52, P = 0.033), well-differentiated grade (OR = 1.54, 95% CI = 1.13-2.11, P = 0.007), and Dukes' C/D stage (OR = 1.37, 95% CI = 1.12-1.68, P = 0.002). No significant association was observed in the other subgroups.

Prediction of rs2839698 on H19 folding structures and target miRNAs
We performed in silico analyses using RNAfold and SNPfold to predict the effect of the selected SNPs on H19 secondary structure. The predicted secondary structure changed markedly between the rs2839698 G/A alleles (Figure 2), the rs3024270 C/G alleles, and the rs217727 G/A alleles (Supplementary Figure S1). However, few changes were predicted for the rs2735971 T/C alleles.
Because rs2839698 (G/A) is located in the exon (3ʹ untranslated region) of the H19 gene, we speculated that this variant may change the promoter activity and function of H19 to some extent by altering its target miRNAs, and subsequently contribute to colorectal cancer. We therefore used miRNASNP v2.0 to predict whether rs2839698 (G/A) in the 3ʹ untranslated region of H19 leads to gain or loss of target miRNAs. We found that hsa-miR-24-1-5p, hsa-miR-4486, hsa-miR-566, and hsa-miR-24-2-5p may lose H19 as a target, whereas binding sites may be created for hsa-miR-612, hsa-miR-5189, hsa-miR-1285-3p, and hsa-miR-3187-5p (Supplementary Table S3).

DISCUSSION
H19 is a paternally imprinted oncofetal lncRNA gene locus on chromosome 11p15.5 that is downregulated after birth and possesses oncogenic properties. Previous studies have indicated that H19 is involved in the complex biological process of oncogenesis [26,34,35]. Liang et al. reported that the lncRNA H19 acts as a miRNA sponge to promote EMT in colorectal cancer [36]. H19 may also exert its function as a primary miRNA precursor [37]. Moreover, H19 has the potential to produce 91H RNA, which regulates insulin-like growth factor 2 (IGF2) expression and is overexpressed in breast cancer cells [38]. Despite extensive evidence, the function of H19 in the molecular mechanism of tumorigenesis is still not clear. Emerging evidence implies that genetic variants in lncRNAs may modify the risk of multiple tumors [30,39]. Verhaegh et al. found that H19 gene polymorphisms were associated with susceptibility to bladder cancer in European Caucasians [33]. However, to our knowledge, no previous study has explored the correlation between H19 genetic variants and colorectal cancer susceptibility in a Chinese population.
In this study, we selected four tagSNPs (rs2839698, rs3024270, rs217727, and rs2735971) in the H19 gene and DMR to estimate the association between these variants and colorectal cancer susceptibility. We observed that carriers of the rs2839698 GA/AA genotypes had an increased risk of colorectal cancer in a Chinese population compared with carriers of the GG genotype. Yang et al. also demonstrated that rs2839698 contributed to the risk of gastric cancer in a Chinese population [32]. Taken together, these findings suggest that H19 genetic variants play an important role in cancer susceptibility.
Some environmental factors, such as alcohol intake and tobacco smoking, are related to elevated colorectal cancer risk [40][41][42]. Our stratified analyses demonstrated that smokers and drinkers carrying the rs2839698 GA/AA genotypes had a significantly increased susceptibility to colorectal cancer. Therefore, the markedly increased risk of colorectal cancer associated with the variant rs2839698 genotypes could be partly attributed to accumulated exposure to alcohol or tobacco carcinogens. Moreover, we found that the increased colorectal cancer risk associated with rs2839698 was more pronounced in younger individuals and in males, suggesting that the promoting effects of H19 variants on colorectal cancer may be modulated by specific epidemiological features. These results support the view that colorectal tumorigenesis is a complex, multistep process involving diverse genetic and environmental modifications. However, we found that the associations with colorectal cancer risk in subjects with and without a family history of cancer were almost the same. This may be because fewer controls than patients had a family history of cancer.
We further found that the rs2839698 GA/AA genotypes were associated with an increased risk of colorectal cancer among patients with Dukes' stage C or D. It is plausible that the genetic variant plays a vital role in advanced-stage colorectal cancer, leading to the present result. In addition, subjects carrying the variant had a clearly increased risk of colorectal cancer in the colon site and well-differentiated grade subgroups, indicating that colorectal cancers of different sites and grades, regulated by different molecular mechanisms, may confer different levels of risk in colorectal carcinogenesis [43].
Given the functional importance of SNP-induced changes in lncRNA folding structure, we predicted the secondary structure changes of H19 attributable to the selected SNPs using the RNAfold and SNPfold algorithms. We found that the folding architectures changed markedly with the genetic variants rs2839698, rs3024270, rs217727, and rs2735971, suggesting that SNPs may be involved in the occurrence and development of colorectal cancer by altering specific structural motifs of H19 and thereby affecting H19 expression and function [44]. In addition, accumulating studies have revealed that lncRNAs can be directly regulated and modified by miRNAs [34,43], and SNPs are a plausible reason for altered interactions between miRNAs and lncRNAs [45]. Based on this evidence, we used miRNASNP v2.0 to predict the miRNAs lost from the wild-type sequence and the miRNAs gained by the SNP sequence. We found four gained miRNAs and four lost miRNAs possibly linked with lncRNA H19. These changes in target miRNAs due to the rs2839698 variant may affect the expression and function of H19, and ultimately modulate the risk of colorectal cancer.
In summary, we have provided evidence that H19 rs2839698 contributes to susceptibility to colorectal cancer in a Chinese population. Larger prospective studies and functional research are needed to validate this finding in different ethnicities.

Study participants
The present study was approved by the Institutional Review Board of Nanjing Medical University. All study participants were genetically unrelated Chinese and provided written informed consent. Briefly, we consecutively recruited 1,147 patients with colorectal cancer and 1,203 cancer-free controls. All cases had histopathologically confirmed colorectal tumors and were recruited from the Affiliated Nanjing First Hospital and the First Affiliated Hospital of Nanjing Medical University in September 2010, without age or sex restrictions. Control individuals were matched to the cases on age (± 5 years) and sex. Details of the study participants have been described previously [39,46].

SNP selection
We focused on the lncRNA H19 gene and its promoter (including the DMR), located on human chromosome 11p15.5, using the UCSC Genome Browser (http://genome.ucsc.edu/). Four SNPs were selected on the basis of four filtering criteria: (a) minor allele frequency (MAF) > 0.05 in the CHB and JPT populations from the 1000 Genomes Project; (b) r² > 0.8 in pairwise linkage disequilibrium analysis using Haploview version 4.0; (c) a change in predicted secondary structure using RNAfold; and (d) Gibbs binding free energy (∆G, kJ/mol) > 0.
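Criterion (b) relies on the pairwise linkage disequilibrium statistic r². As a minimal sketch of that statistic (not the Haploview implementation, which estimates haplotype frequencies from unphased genotypes), r² can be computed directly when haplotype and allele frequencies are known. The function name and the input frequencies below are hypothetical and serve only as illustration.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must be strictly between 0 and 1")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical frequencies, for illustration only
r2_linked = ld_r_squared(0.5, 0.5, 0.5)        # the two alleles always co-occur
r2_independent = ld_r_squared(0.25, 0.5, 0.5)  # no association between the SNPs
print(r2_linked, r2_independent)
```

Values near 1 indicate that one SNP is a near-perfect proxy for the other, while values near 0 indicate independence.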

Genotyping
Genotyping was performed using the TaqMan allelic discrimination assay. The 384-well ABI 7900HT real-time PCR system (Applied Biosystems, Foster City, CA, USA) was used to amplify all samples, and SDS 2.4 software (Applied Biosystems) was used to read and analyze allelic discrimination. The sequences of the primers and fluorescent probes are shown in Supplementary Table S4. The average call rates for the four SNPs were more than 99%. Additionally, we randomly selected over 10% of the samples for repeated assays, and the concordance rate between duplicate samples was 100%.

Statistical analysis
Differences in the distribution of epidemiological variables between cases and controls were assessed using Student's t-tests (continuous variables) and chi-square (χ²) tests (categorical variables). Crude and adjusted ORs and 95% CIs from unconditional univariate and multivariate logistic regression analyses under different genetic models were used to examine the association between genotypes and colorectal cancer risk. Age, sex, smoking status, and drinking status were included as possible confounders in the multivariate logistic regression analyses. HWE in the cancer-free group was tested with a goodness-of-fit chi-square test. Linkage disequilibrium (LD) between SNPs in H19 was calculated using Haploview 4.0 software. Bonferroni correction was applied to conservatively account for multiple comparisons. All P-values were two-sided, and P < 0.05 was considered statistically significant. All tests were performed using the SAS software package (version 9.1.3; SAS Institute, Inc., Cary, NC).
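The HWE goodness-of-fit test mentioned above can be sketched as follows. This is a minimal Python illustration, not the SAS procedure used in the study. It assumes a biallelic SNP and a chi-square test with one degree of freedom, and the genotype counts are hypothetical because the per-genotype counts are not given in this text.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value) for one biallelic SNP, using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts, for illustration only
chi2_fit, p_fit = hwe_chi_square(25, 50, 25)  # exactly at HWE proportions
chi2_dev, p_dev = hwe_chi_square(30, 40, 30)  # deficit of heterozygotes
print(chi2_fit, p_fit)
print(chi2_dev, round(p_dev, 4))
```

A P-value above 0.05, as reported for all four SNPs in the control group, gives no evidence of departure from HWE. The second hypothetical example falls just below that threshold.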