Risk of eighteen genome-wide association study-identified genetic variants for colorectal cancer and colorectal adenoma in Han Chinese

Background Recent genome-wide association studies (GWAS) identified eighteen single-nucleotide polymorphisms (SNPs) as significantly associated with the risk of colorectal cancer (CRC). However, the overall results of subsequent replication studies are inconsistent, and little is known about whether these associations also exist in colorectal adenomas (CRA). Methods SNP genotyping was performed using a Sequenom MassARRAY to investigate the association of these eighteen SNPs with colorectal neoplasm in a case-control study consisting of 1049 colorectal cancers, 283 adenomas, and 1030 controls. Results Two of these SNPs, rs10505477 and rs719725, showed evidence of an association with both CRC and CRA in our study population. In addition, seven SNPs (rs10808555, rs7014346, rs7837328, rs704017, rs11196172, rs4779584, and rs7229639) were significantly associated with CRC, and one further SNP, rs11903757, was over-represented in CRA compared with controls. The strongest associations were observed for rs11196172 (OR = 2.02, 95% CI = 1.66-2.46, P < 0.0001) and rs11903757 (OR = 1.96, 95% CI = 1.28-3.00, P = 0.0026). Conclusion These results suggest that some previously reported SNP associations also affect CRC and CRA predisposition in the Han Chinese population. Part of the genetic risk of CRC is possibly mediated by susceptibility to adenomas.


INTRODUCTION
Colorectal cancer (CRC) is one of the major causes of morbidity and mortality worldwide [1]. In China, CRC is currently the fifth most common malignancy [2]. Twin studies have shown that inherited susceptibility accounts for about 35% of the variance in CRC risk [3]. According to the classical adenoma-carcinoma cascade model, genetic mutation is a critical accelerant contributing to the malignant transformation and progression of CRC [4]. However, high-penetrance susceptibility mutations are thought to explain < 5% of CRC cases, which suggests that the remaining heritability lies in common low-penetrance variants [5].
The application of genome-wide association studies (GWAS) has made high-throughput genotyping of such common genetic variants possible. Several recent GWAS have uncovered multiple common single nucleotide polymorphisms (SNPs) associated with CRC risk [6][7][8][9][10][11]. Among these SNPs, rs10808555, rs719725, rs11632715, rs4813802, and rs3824999 have been studied almost exclusively in Caucasians, and little or nothing is known about the role of such variants in other populations. Moreover, the results of subsequent replication studies of the associations between CRC risk and seven other SNPs (rs7014346, rs4939827, rs4779584, rs961253, rs11903757, rs16969681, and rs10505477) are not always concordant [12][13][14]. In addition, the association between six newly identified CRC risk variants in Asians (rs704017, rs11196172, rs10849432, rs12603526, rs7837328, and rs7229639) and colorectal neoplasm predisposition in Han Chinese is still unclear [8,15]. Investigating the role of these variants in CRC susceptibility in other populations is crucial, since the effect of a risk variant may differ greatly between populations [16].
The large majority of CRCs develop from adenomas [17]. It has been reported that much of the genetic risk of CRC is likely to be mediated, at least in part, by predisposition to adenomas [18]. CRC-related SNPs might act by increasing the risk of CRC, CRA, or both.
Thus, to capture a broader spectrum of colorectal carcinogenesis, our subjects included both CRC patients and CRA patients. We conducted a case-control study to clarify the association between eighteen GWAS-identified CRC risk-related SNPs and colorectal neoplasm, both colorectal cancer and adenoma, in Han Chinese. We identified seven CRC-predisposing SNPs, one CRA-predisposing SNP, and two SNPs predisposing to both.

Subject characteristics
The characteristics of the 1049 CRC cases, 283 CRA cases, and 1030 controls are detailed in Table 1. CRC cases were older on average and more likely to be male (P < 0.05). Of the CRC cases, 557 had colon cancer, 482 had rectal cancer, and 10 had cancer at both sites. Regarding histological differentiation, 118, 615, and 78 cases were classified as high, intermediate, and low grade, respectively, with the remaining 238 cases unclear. As for tumor stage, 147, 362, 359, and 181 cases were classified as stage I, II, III, and IV, respectively. Of the 283 cases with colorectal adenoma, 110 had at least one advanced adenoma.

Association between individual SNPs with CRC and CRA risk
The genotypes of the eighteen SNPs in the cases and controls are shown in Tables 2-3. Notably, the genotype frequencies of rs10505477 were significantly associated with the risk of CRC (CT genotype: OR = 1.50, 95% CI = 1.23-1.84; TT genotype: OR = 1.56, 95% CI = 1.21-2.02) as well as of CRA, with an OR of 1.44 (95% CI = 1.08-1.92) for the CT genotype under the dominant model. For another SNP, rs719725, the AC genotype was also associated with an increased risk of both CRC (OR = 1.27, 95% CI = 1.04-1.54) and CRA (OR = 1.37, 95% CI = 1.03-1.82). In addition, the G allele of rs11196172 showed a strong association with the risk of CRC: subjects with the rs11196172 AG or GG genotype had an OR of 2.02 (95% CI = 1.66-2.46) or 1.83 (95% CI = 1.31-2.56), respectively, compared with the AA genotype. The rs10808555 AG genotype (OR = 1.26, 95% CI = 1.05-1.53), but not the GG genotype (OR = 1.21, 95% CI = 0.89-1.64), was associated with an increased risk compared with the AA genotype. An increased risk was also observed among individuals with the rs7014346 GA genotype (OR = 1.29, 95% CI = 1.07-1.55); however, the rs7014346 AA genotype was not associated with risk (OR = 1.21, 95% CI = 0.89-1.64). Likewise, the rs7837328 GA and rs704017 AG genotypes had ORs of 1.48 (95% CI = 1.20-1.82) and 1.21 (95% CI = 1.01-1.46), respectively. The G allele of rs7229639 increased the likelihood of CRC only in a dominant genetic model (OR = 1.23, 95% CI = 1.02-1.48). For rs4779584, the TC genotype was associated with a significantly decreased risk of CRC, with an OR of 0.70 (95% CI = 0.57-0.86); this was the only protective factor in our study.
Among the CRA risk-associated genetic variants, the TC genotype of rs11903757 was associated with an increased risk of CRA (OR = 1.96, 95% CI = 1.28-3.00), whereas no homozygous variant carrier was found in our population. There were no significant associations between the other SNPs and risk of CRC or CRA.
The four SNPs genotyped in the 8q24.21 region within 17 kb showed high linkage disequilibrium (r² range: 0.97-0.99). Haplotype analysis revealed that the haplotype containing the variant allele at rs10505477, rs10808555, rs7837328, and rs10505476 was associated with an increased risk of CRC (OR = 1.251-1.487).
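As an illustration of the r² statistic quoted above, the following minimal sketch computes pairwise linkage disequilibrium between two biallelic SNPs from haplotype and allele frequencies. It shows only the textbook definition; the study itself inferred LD with Haploview from unphased genotypes, and the function name and the frequencies in the usage line are hypothetical.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Pairwise LD r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    All inputs are hypothetical illustration values, not study data.
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical example: two alleles that almost always travel together
print(round(ld_r_squared(p_ab=0.39, p_a=0.40, p_b=0.40), 3))
```

Values close to 1, such as the 0.97-0.99 range reported here, indicate that the SNPs carry nearly the same information, which is why single-SNP and haplotype associations in such a block tend to be of similar strength.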

Association between SNPs and clinical characteristics
Furthermore, we assessed the associations between SNP genotype and clinical characteristics (Tables 4-6). After stratification by tumor site, four SNPs (rs10505477, rs7837328, rs11196172, and rs4779584) were associated with cancer risk in the rectum while retaining their association with cancer risk in the colon under the dominant model. We also found a significant histological grade-specific difference in risk for six SNPs: rs10505477, rs10808555, rs7014346, rs7837328, rs4779584, and rs4813802. All of them were significantly associated with increased risk of low or intermediate histological grade, while rs719725 and rs11196172 also showed significantly elevated risk in high histological grade. With respect to tumor stage, rs10505477, rs719725, rs11196172, and rs4779584 were significantly associated with stage I or II disease, whereas eight SNPs (rs10505477, rs10808555, rs7014346, rs7837328, rs704017, rs11196172, rs4779584, and rs7229639) were significantly associated with stage III or IV. The high-risk alleles of rs10505477, rs7014346, rs7837328, rs11196172, and rs4813802 were more frequently found in patients with metastatic CRC. However, all of the CRC risk variants mentioned above showed significant associations with non-metastatic CRC. Among all these CRC risk SNPs, rs10808555, rs7014346, and rs7837328 appear to significantly increase the risk of more aggressive CRC.
With regard to CRA, rs11903757 was associated only with non-advanced CRA, and the association of rs11196172 and rs3824999 with colorectal adenoma risk was limited to advanced adenoma. Moreover, the effect of the risk variant rs11903757 on non-advanced adenoma was more pronounced than that on overall adenoma, with an OR of 2.27 (95% CI = 1.39-3.71).

DISCUSSION
Although a number of genetic variants associated with CRC risk have now been identified, the studies have been performed almost exclusively in Caucasian populations, and the effects of these risk variants in the Han Chinese population are as yet unknown [19]. Because the prevalence of CRC, the allele frequencies of SNPs, and linkage disequilibrium (LD) structure differ greatly among populations, it is important to know whether these associations still exist in different ethnic populations [20][21][22]. We therefore systematically evaluated the association between eighteen SNPs and colorectal neoplasm risk in CRC as well as CRA to examine the full spectrum of colorectal carcinogenesis. Furthermore, we explored the potential effect of clinicopathological variables of CRC and CRA on these variant-associated susceptibilities. SNPs rs10505477 and rs719725 were found to be significantly associated with both CRC and CRA. In addition, we confirmed that SNPs at 8q24.21, 10q22.3, 10q25.2, 15q13.3, and 18q21.1 were genetic risk factors for CRC. We also observed an association between the SNP at the 2q32.2 risk locus and the risk of CRA (Supplementary Table S1).

Associations of both colorectal cancer and adenoma risk with individual SNP
SNP rs10505477 on chromosome 8q24.21 was first identified as a risk locus for CRC by a GWAS in populations derived from South America and Scotland [23], and was then supported by two further GWAS in Spanish and East Asian populations as well as several other population-based studies [6,[24][25][26][27][28][29][30][31]. However, several studies failed to confirm the association between rs10505477 and CRC risk [12,24,32]. It has also been reported that the locus influences susceptibility to both CRA and CRC [33]. Although a GWAS identified rs10505477 as the SNP most likely to be associated with CRA, it did not achieve a genome-wide significant P value [34].
We also found a significant association of rs10505477 with both colon and rectal cancer in the stratified analysis. In a previous study, the association between this SNP and CRC risk did not differ by tumor site [26], and a similar result held in our study, whereas Monir et al. observed a tumor site-specific association of rs10505477 with CRC risk only in the colon [31]. SNP rs10505477 lies near the gene putative POU domain, class 5, transcription factor 1B (POU5F1B). Overexpression of this gene has recently been reported in gastric cancer, and it can promote tumor formation and growth in vivo [35]. However, no association was observed between POU5F1B expression and rs10505477 genotype [36]. This SNP also maps to an intron of a long noncoding RNA (lncRNA), cancer-associated region long noncoding RNAs-1 (CARLos-1). In silico studies have shown that rs10505477 changes the local folding structure of CARLos-1, which suggests that it may lead to aberrant expression of CARLos-1 [37]. Although the biological function of CARLos-1 is still unclear, CARLos-5, which lies in the same region as CARLos-1, has a function in tumor development [38], indicating that CARLos-1 may also play a role in carcinogenesis.
Another CRC and CRA risk-associated genetic variant, rs719725, resides in an intergenic region on chromosome 9p24.1. The most proximal gene, tumor protein D52-like 3 (TPD52L3), one of three members of the human TPD52 (hTPD52) family, is 37 kb distal to it. hTPD52 was first identified in breast cancer, and its family members have been shown to participate in cell proliferation, apoptosis, and vesicle trafficking [39]. This SNP was previously identified as a genetic susceptibility factor for CRC in a GWAS of various Caucasian populations [40]. Although a majority of subsequent validation attempts failed, several lines of evidence still support its association with CRC risk. However, its role in CRA predisposition has drawn little attention [41][42][43][44][45][46]. Only marginally significant evidence for an association between rs719725 and CRA risk was observed in a case-control study [47]. In our study, compared with the rs10505477 CC or rs719725 AA genotype, respectively, individuals who carried the rs10505477 CT/TT or rs719725 AC genotype had an elevated risk of colorectal carcinoma as well as adenoma. This suggests that the association between these two SNPs and CRC susceptibility may be mediated through adenoma risk; that is, the SNPs might be involved in tumor initiation, might also act at the progression stage, or both.

Colorectal cancer susceptibility genetic variants
Three SNPs, rs10808555, rs7837328, and rs7014346, all mapping to 8q24.21, were associated with CRC susceptibility. This finding is consistent with a meta-analysis of genome-wide association data [7]. Moreover, after stratification by tumor differentiation, stage, and presence or absence of distant metastasis, we found an association of these three SNPs with aggressive advanced cancer, providing evidence that they play a role in tumor initiation and development. The identification of rs10808555 as a risk variant for CRC was confirmed by a study in Caucasians [33], and SNP rs7837328 has been identified as a susceptibility variant for prostate cancer in various studies [48][49][50]. The finding that colorectal epithelial cell proliferation was higher in the presence of either the rs10808555 GG genotype or the rs7837328 AA genotype suggests a plausible mechanism for the CRC risk association [13]. Like rs10505477, SNP rs7014346 also resides near POU5F1B. One study confirmed the association between rs7014346 and susceptibility to CRC in Hong Kong Chinese, although this association was not observed in two other studies in Chinese populations [51,52].
We further determined the haplotype block structure across 8q24.21 containing the variant allele at rs10505477, rs10808555, rs7837328, and rs10505476 to capture information on the LD structure in the region and hence potentially provide greater power to detect association. Haplotype analysis of these variants showed that the haplotype containing rs10505477, rs10808555, rs7837328, and rs10505476 was associated with an increased risk of CRC (Table 7). However, the strength of the association for the haplotype (OR per T-A-G-G haplotype = 1.487) was not greater than that of the single variants rs10505477 and rs7837328 (OR = 1.52 for both under the dominant model), although it was slightly larger than that in a previous haplotype analysis of this region (OR = 1.17) [33].
Two further SNPs, located on 10q22.3 (rs704017) and 10q25.2 (rs11196172), were new CRC-associated loci reported by a recent large-scale genetic study in East Asians, which yielded ORs of 1.10 (95% CI = 1.10-1.18) and 1.14 (95% CI = 1.10-1.18), respectively, based on the risk allele [8]. SNP rs704017 maps within the third intron of the gene ZMIZ1 antisense RNA 1 (ZMIZ1-AS1), a miscRNA whose function is still unknown. In our study, rs11196172 GG genotype associated with 2.02-fold
SNP rs4779584 is proximal to the GREM1 locus on chromosome 15q13.3, and the association between this SNP and CRC susceptibility has been validated in Hong Kong and Taiwan Chinese [55,56]. It has been reported that rs4779584 is also associated with CRA risk [57][58][59], but this association was not observed in the Han Chinese population we studied. GREM1 encodes a member of the bone morphogenetic protein (BMP) antagonist family and is an essential component of the transforming growth factor-β (TGF-β) pathway. Moreover, the TGF-β/BMP pathway has been shown to be implicated in colorectal carcinogenesis. GREM1 promotes the loss of cancer cell differentiation [60]. A recent study found that aberrant expression of epithelial GREM1 initiates colonic tumorigenesis [61]. This indicates that GREM1 may be a possible explanation for the association between rs4779584 and CRC risk.
Another SNP in the TGF-β family signaling pathway, rs7229639, was also found to be associated with CRC risk. This SNP is located in the third intron of the SMAD7 gene, which is a key member of the TGF-β pathway. The association of this variant with CRC risk was initially observed in a GWAS conducted in East Asians in 2014 [62]. In the stratified analysis of that study, a per-allele OR of 1.20 (P = 6.65 × 10⁻⁵) was obtained in Chinese, and the association between this SNP and CRC was replicated for the first time. Our similar finding further validates the association between this variant and susceptibility to CRC in Chinese.
The lack of evidence for involvement of these seven CRC-predisposing variants in CRA risk suggests that they might affect only the malignant stage of colorectal carcinogenesis in our study population.

Colorectal adenoma risk related variants
Interestingly, we found that SNP rs11903757 was associated with CRA but not CRC risk. Previous studies of rs11903757 focused on CRC risk, but the overall results were inconclusive, and little is known about its role in CRA susceptibility. In our study, the TC genotype of rs11903757 showed the most significant association with CRA risk, with an OR of 1.96 (95% CI = 1.28-3.00, P = 0.0026). This SNP lies close to the gene nucleic acid binding protein 1 (NABP1), which encodes a single-stranded DNA (ssDNA)-binding protein that plays a critical role in genomic stability and participates in the DNA damage response [63,64]. Functional experiments are needed to identify the causal loci.
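In this study, neither CRC nor CRA predisposition was related to the previously identified SNPs rs2287939, rs10849432, rs7229639, and rs961253. The failure to replicate Caucasian (CAU)-identified variants could be due to clear differences in allele frequency between CAU and Han Chinese in Beijing (CHB) (Table 2) or to different patterns of linkage disequilibrium. Another reason for the disparity might lie in gene-environment interaction. It is also possible that the association is real but the effect size is too weak to be detected with sufficient statistical power.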
Recently, a GWAS of colorectal cancer in East Asians [65] and two in Chinese (labeled as Chinese 1 [66] and Chinese 2 [67]) identified novel loci for colorectal cancer risk. Most of our variants were included in the three GWAS (Supplementary Table S2). However, the majority of them failed to reach statistical significance. We noticed that the study subjects of the GWAS in East Asians and in Chinese 2 were enrolled mainly from the three largest cosmopolitan cities of China: Beijing, Shanghai, and Guangzhou. It was also notable that the minor allele of rs4779584 was C in our population but T in these studies. Genetic heterogeneity arising from sampling a mixture of migrant workers from all over China would probably lead to different study findings. In addition, in the Chinese 2 GWAS, almost every association of interest in our study showed a P value of less than 0.05. These variants possibly did not reach genome-wide significance because of insufficient sample size, with only 1,049 colorectal cancer cases and 1,315 controls in the GWAS stage. These factors could explain the disparity.
Our study has several limitations. First, our sample size, especially for CRA, was relatively small, and the subjects were not representative of the general Chinese population. Second, carcinogenic exposures that may modify the effects of the genetic factors were not taken into consideration. Third, the bioinformatics analysis we performed is insufficient to establish a direct causal association. Therefore, our results should be interpreted with caution.
In summary, our case-control study provides convincing evidence for assigning seven SNPs as CRC-predisposing, one as CRA-predisposing, and two as both. These variants may have implications for tailoring colorectal neoplasm screening measures and improve our understanding of the initiation and progression of CRC. Further studies are warranted to characterize the functional sequences that cause carcinogenesis and to identify novel variants contributing to susceptibility to colorectal tumors.

Study subjects
All participating individuals were genetically unrelated Han Chinese. The characteristics of the CRC cases, CRA cases, and controls included in this study are summarized in Table 1. CRC patients were recruited between 2011 and 2013 at the Second Affiliated Hospital, School of Medicine, Zhejiang University, China. CRA patients and controls were enrollees of a colorectal cancer screening programme conducted in the same region and underwent colonoscopy during the same period. The diagnoses of CRC and CRA were confirmed histologically, and no patient had received any prior treatment. Individuals with negative colonoscopic findings and no previously recorded or currently diagnosed CRC or CRA were defined as the controls of this study. The exclusion criteria for all subjects included a family history of CRC, a history of previous cancer, genetic colorectal cancer syndromes (e.g., familial adenomatous polyposis), bowel resection, inflammatory bowel disease, and partial colonoscopy. Based on colonoscopy and pathology reports, adenomas were defined as advanced if they 1) were ≥1 cm in diameter or 2) had ≥20% villous components or showed high-grade dysplasia. Written informed consent was obtained from each subject before collection of blood samples and information. This study was approved by the Institutional Review Board of Clinical Research, the Second Affiliated Hospital, Zhejiang University School of Medicine, and was completed in accordance with the ethical principles of the Declaration of Helsinki (Oct 2008).
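The definition of advanced adenoma given above can be written as a simple rule. The sketch below is only an illustration of that rule; the function and parameter names are hypothetical and do not come from the study's data dictionary.

```python
def is_advanced_adenoma(diameter_cm, villous_fraction, high_grade_dysplasia):
    """Apply the advanced-adenoma rule stated in the text.

    diameter_cm:          largest diameter of the adenoma in centimetres
    villous_fraction:     proportion of villous component, between 0.0 and 1.0
    high_grade_dysplasia: True if pathology reports high-grade dysplasia
    (All parameter names are hypothetical.)
    """
    return (
        diameter_cm >= 1.0
        or villous_fraction >= 0.20
        or bool(high_grade_dysplasia)
    )


# Hypothetical example: a small tubular adenoma with high-grade dysplasia
print(is_advanced_adenoma(0.6, 0.05, True))
```

A patient would then count toward the advanced group if at least one of their adenomas satisfies the rule, matching the "at least one advanced adenoma" wording used for the subject characteristics.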

Genotyping
Blood samples collected from each study subject were stored at -80°C before DNA extraction. Genomic DNA was isolated from peripheral blood leukocytes using the DNA Isolation Kit for Mammalian Blood (Roche, Indianapolis, USA). Genotyping of the selected SNPs was performed using the Sequenom MassARRAY platform with iPLEX Gold chemistry on a matrix-assisted laser desorption/ionization time-of-flight mass spectrometer (Sequenom, San Diego, California) according to the supplier's instructions. The extension primers were designed using MassARRAY Assay Design 4.0 software (Sequenom, San Diego, California). Products generated from the polymerase chain reaction (PCR) were transferred to 384-well SpectroCHIPs (Sequenom, San Diego, California) and analyzed in a Compact Mass Spectrometer using the MassARRAY Typer 4.0 software. For quality control, each 384-well plate included positive and negative controls and duplicate samples. All genotyping results were generated and checked by laboratory staff blinded to sample status.

Statistical analysis
Differences in demographic variables were examined by the χ² test and t-test. All eighteen SNPs were tested for Hardy-Weinberg equilibrium using the χ² test, and all genotype distributions in controls complied with Hardy-Weinberg equilibrium. Haploview version 4.2 was used to infer LD structure and haplotypes.
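To make the Hardy-Weinberg check concrete, the following minimal sketch implements the standard one-degree-of-freedom Pearson χ² test on genotype counts from controls. It is an illustration under stated assumptions rather than the software used in the study; the function name and the genotype counts in the usage line are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the three genotypes (hypothetical data).
    Returns (chi2, p_value). Assumes both alleles are observed at least once.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("locus is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control genotype counts
chi2, p_value = hwe_chi_square(420, 470, 140)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A P value above the chosen threshold (commonly 0.05) in controls is read as no evidence of departure from equilibrium, which is the condition reported for all eighteen SNPs.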
The associations between risk and SNPs were estimated as odds ratios (OR) and 95% confidence intervals (95% CI) computed by logistic regression under a dominant or recessive genetic model, adjusted for age and gender. Two-tailed P values of less than 0.05 were considered significant. All statistical analyses were carried out with SPSS 18.0.
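The dominant-model comparison described above can be illustrated with an unadjusted 2×2 calculation. Note the substitution: the study used logistic regression adjusted for age and gender in SPSS, whereas this sketch uses the crude odds ratio with a Woolf (log-scale) confidence interval and a Wald test, so its output would not reproduce the adjusted estimates. The function name and all genotype counts are hypothetical.

```python
import math


def dominant_model_or(case_counts, control_counts, z=1.96):
    """Crude odds ratio for carriers vs non-carriers (dominant model).

    case_counts, control_counts: (reference homozygote, heterozygote,
    variant homozygote) genotype counts; hypothetical data.
    Returns (odds_ratio, ci_low, ci_high, p_value). Unadjusted estimate.
    """
    a = case_counts[1] + case_counts[2]        # carrier cases
    b = case_counts[0]                         # non-carrier cases
    c = control_counts[1] + control_counts[2]  # carrier controls
    d = control_counts[0]                      # non-carrier controls
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; the Woolf interval is undefined")
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    ci_low = math.exp(log_or - z * se)
    ci_high = math.exp(log_or + z * se)
    # Two-sided Wald P value from the standard normal distribution
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return odds_ratio, ci_low, ci_high, p_value


# Hypothetical genotype counts (reference homozygote, heterozygote, variant homozygote)
or_, lo, hi, p = dominant_model_or((400, 450, 150), (500, 400, 100))
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}, P = {p:.2g}")
```

For a recessive model the same function could be reused after regrouping the counts so that only variant homozygotes form the exposed group.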

Table 2: Association of 18 SNPs with CRC risk in Chinese
Column headers: location a; rsID; CAU b (minor allele, MAF); CHB b (minor allele, MAF); GWAS f (OR, P-value, Ref g); non-risk / risk allele; MAF controls; MAF CRC; colorectal cancer (OR (95% CI), P-value d)
a SNP locations based on Human Genome build 36.
b Minor allele and minor allele frequencies of Caucasian and Han Chinese in Beijing from HapMap Release 28.
f Results in validation stage.
g References: 1. Ulrike Peters et al. Gastroenterology 2012; 2. Brent W Zanke et al. Nature Genetics 2007; 3. Richard S Houlston et al. Nature Genetics 2008; 4. Wei-Hua Jia et al. Nature Genetics 2013; 5. R Cui et al. Gut 2011; 6. Brent W Zanke et al. Nature Genetics 2007; 7. Ben Zhang et al. Nature Genetics 2014; 8. Luis M. Real et al. PLOS ONE 2014; 9. Tomlinson et al. Nature Genetics 2011; 10. Ian PM Tomlinson et al. Nature Genetics 2007; 11. Ben Zhang et al. International Journal of Cancer 2014; 12. COGENT Nature Genetics 2008.

Table 3: Association of 18 SNPs with CRA risk in Chinese
a SNP locations based on Human Genome build 36.
b Minor allele and minor allele frequencies of Caucasian and Han Chinese in Beijing from HapMap Release 28.
c Not available.
d Adjusted for age and sex.
e OR and 95% CI for rs11903757 TC vs TT genotype in our study.

Table 5: Association of 18 SNPs with tumor stage and metastatic status of CRC
a OR and 95% CI for rs11903757 TC vs TT genotype.

Table 7: Association of CRC risk with the haplotypes comprising rs10505477, rs10808555, rs7837328, and rs7014346
Column headers: Haplotype a; Frequency; P b; OR (95% CI)
a Haplotypes of rs10505477, rs10808555, rs7837328, and rs7014346.
b P values from unconditional logistic regression analyses, adjusted for age and gender.
replicate Caucasian (CAU)-identified variants could be due to clear difference in terms of allele frequency present between CAU and Han Chinese in Beijing (CHB) (Table 2) or different pattern of linkage disequilibrium.