Functional single nucleotide polymorphisms within the cyclin-dependent kinase inhibitor 2A/2B region affect pancreatic cancer risk

The CDKN2A (p16) gene plays a key role in pancreatic cancer etiology. It is one of the most commonly somatically mutated genes in pancreatic cancer, rare germline mutations have been found to be associated with increased risk of developing familiar pancreatic cancer and CDKN2A promoter hyper-methylation has been suggested to play a critical role both in pancreatic cancer onset and prognosis. In addition several unrelated SNPs in the 9p21.3 region, that includes the CDNK2A, CDNK2B and the CDNK2B-AS1 genes, are associated with the development of cancer in various organs. However, association between the common genetic variability in this region and pancreatic cancer risk is not clearly understood. We sought to fill this gap in a case-control study genotyping 13 single nucleotide polymorphisms (SNPs) in 2,857 pancreatic ductal adenocarcinoma (PDAC) patients and 6,111 controls in the context of the Pancreatic Disease Research (PANDoRA) consortium. We found that the A allele of the rs3217992 SNP was associated with an increased pancreatic cancer risk (ORhet=1.14, 95% CI 1.01-1.27, p=0.026, ORhom=1.30, 95% CI 1.12-1.51, p=0.00049). This pleiotropic variant is reported to be a mir-SNP that, by changing the binding site of one or more miRNAs, could influence the normal cell cycle progression and in turn increase PDAC risk. In conclusion, we observed a novel association in a pleiotropic region that has been found to be of key relevance in the susceptibility to various types of cancer and diabetes suggesting that the CDKN2A/B locus could represent a genetic link between diabetes and pancreatic cancer risk.


INTRODUCTION
The majority of pancreatic cancer patients die within a year of diagnosis [1]. The poor prognosis is caused by various factors, including the lack of appropriate markers for early detection, the aggressiveness of the disease and the dearth of effective treatment possibilities available. One of the best strategies to reduce the mortality of the disease is to improve early diagnosis, and it is, therefore, important to identify individuals at high risk in the population and subject them to enhanced surveillance.
Only a few epidemiologic risk factors have been established for pancreatic cancer, including cigarette smoking, heavy alcohol intake, diabetes mellitus, obesity, chronic pancreatitis and family history of pancreatic cancer [2,3]. Even less is known about the genetic contribution to the disease, since only a rather small number of susceptibility loci have been identified through genomewide association studies (GWAS) [4][5][6][7][8][9] and confirmed by follow-up studies [10]. Moreover it has been shown that a small proportion of pancreatic tumors arises as a result of high penetrance germline mutations in genes such as BRCA1, BRCA2, p16/CDKN2A, STK11/LKB, ATM, PALB2, and DNA mismatch repair genes, usually in the context of familial cancer syndromes [2,3,[11][12][13][14][15]. However, the very low frequency of those mutations cannot explain the bulk of genetic susceptibility to pancreatic cancer.
There are compelling epidemiologic and molecular evidences pointing to a key role for the CDKN2A gene in pancreatic cancer etiology. CDKN2A is one of the most commonly somatically mutated genes in pancreatic cancer [16], rare germline mutations have been found to be associated with increased risk of developing familiar pancreatic cancer [15,17], and also CDKN2A promoter hyper-methylation has been suggested to play a critical role both in pancreatic cancer onset and prognosis [18]. Additionally, the 9p21.3 region, that includes in addition to CDKN2A also CDKN2B and CDKN2B-AS1,is pleiotropic and several polymorphic variants in the region spanning the three genes are susceptibility markers for several cancer types [19][20][21][22][23][24]. In addition Li and colleagues performed an association study and meta-analysis across multiple cancers corroborating the pleiotropic role of the locus [25]. Several polymorphic variants in the region are also associated with type two diabetes mellitus (T2DM), which is a predisposing factor for pancreatic cancer, suggesting a possible role of the 9p21.3 region as a genetic link between the two diseases [26][27][28]. The pleiotropic role of the variants of this region is probably due to the central importance of the genes situated in it in cell cycle regulation. For example CDKN2A codes, by alternative splicing, for the two oncosuppressors p16INK4a and p14ARF [29,30]. Despite all the hints pointing towards an association between common genetic variability in the CDKN2A/B gene region and pancreatic cancer risk no one has attempted to directly relate them so far. We sought to fill this gap in a case-control study genotyping 13 single nucleotide polymorphisms (SNPs) in the context of the PANcreatic Disease ReseArch (PANDoRA) consortium.

Data filtering and quality control
The characteristics of the population enrolled in the study are shown in Table 1. All analyzed SNPs were in HWE in controls (P>0.0038) with the exception of rs3731246 in the controls from Mannheim (Germany) and Southern Italy and rs2811710 in the controls from Mannheim. The populations not respecting the HWE were not included in the statistical analyses for the relevant SNPs. Starting from a population of 9,796 subjects, 828 subjects with a call rate <75% were removed after genotyping, leaving 8,968 for further analysis. The average SNP call rate was 96% with a minimum of 79.95% (rs3731246) and a maximum of 99.19% (rs3218009). The analysis of the random duplicate samples showed a concordance rate of 99.68%. For the Japanese cases it was not possible to correctly genotype rs3731246 and therefore this SNP was not used in the risk analysis for the Japanese population. Supplementary Table S1 shows the call rate and the HWE equilibrium for each SNP.

SNP main effects
We analyzed separately Caucasian and Japanese individuals. For the individuals of Caucasian origin we observed that 6 SNPs showed a statistically significant association (p<0.05) with increased or decreased PDAC risk. The strongest association with an increased risk of PDAC was observed with the A allele of rs3217992 SNP (OR het =1.14, 95% CI 1.01-1.27, p=0.026, OR hom =1.30, 95% CI 1.12-1.51, p=0.00049, unadjusted p-trend =0.0002, adjusted p-trend= 0.32). Other, less significant associations were observed with rs3731249, rs2811708, rs3731211, and rs1063192. The frequencies and distributions of the genotypes (for the Caucasian group) and the OR for the association of each polymorphism with PDAC are described in Table 2. In the Japanese population the G allele of rs2811708 was associated with a decreased risk of PDAC (OR het =0.58, 95% CI 0.36-0.95, p=0.029) and the A allele of rs1063192 was associated with a decreased risk of PDAC (OR het =0.49, 95% CI 0.30-0.80, p=0.005, p-trend=0.03). The frequencies and distribution of the Japanese genotypes www.impactjournals.com/oncotarget and the OR for the association of each polymorphism with PDAC are described in Table 3.

Possible functional effects
We used several bioinformatic tools to predict possible functional relevance for the SNPs showing the most significant associations. Using Genevar, we observed that the C allele of rs3217992 was associated with increased expression of the interferon alpha 4 (IFNA4) gene (P=0.047). This association, however, was not below the threshold suggested by Genevar for significance (P<10 -3 ). GTEx did not show any signifcant eQTL for any of the SNPs associated with pancreatic cancer risk. RegulomeDB showed a score of 5, indicating the possible presence of a transcription factor binding motif or a DNase sensitivity peak for rs2811708, rs3217992 and rs1063192 and a score of 4 suggesting the presence of a transcription factor binding motif and a DNase sensitivity peak for rs3731211 and rs3731249. HaploReg did not suggest any relevant signals for the significant SNPs.

DISCUSSION
Several unrelated SNPs in the 9p21.3 region, that includes the CDNK2A, CDNK2B and the CDNK2B-AS1 genes, are associated with the development of cancer in various organs [19-21, 23, 24, 31-33] and with T2D [26][27][28]. In this study we have performed an in depth analysis of the region and found several promising associations between the common genetic variability and the disease onset. However, considering the correction for multiple testing only rs3217992 showed a statistically significant association with increased risk of developing PDAC.
This finding is interesting for two reasons. The first is that rs3217992 shows a plethora of associations with several human traits such as primary open-angle glaucoma [34], aggressive periodontitis, chronic periodontitis [35] and myocardial infarction [36]. These phenotypes are very different from each other, but all the studies, including the one presented here, have in common the fact that is always the A allele to be associated with the increased risk of the disease. This observation strongly suggests a pleiotropic role for rs3217992 and also highlights that the polymorphisms alters the function of the protein in a way that affects the related phenotypes. In recent years several pleiotropic SNPs have been identified to be associated with multiple human phenotypes or multiple cancer types. These pleiotropic stretches of DNA that are densely packed with risk alleles have been defined Nexus regions [37] and intensely studied in relation to cancer risk. PDAC has at least another such region in the TERT-CLPTM1L locus as widely demonstrated by several authors [6,9,38].
The second reason is that in addition to the strong statistical association and the putative pleiotropic effect, rs3217992 SNP lies in a miRNA target region (miR-138-2-3p). In a very recent study Ghanbari and collaborators showed, using cardiac cell lines, that the rs3217992 SNP might have an effect on the miRNA-mRNA interactions [39]. Specifically, the G allele increased the miRNA-dependent regulation of CDKN2B.  Given the role of CDKN2B in the cell cycle this result seems counterintuitive since the A allele is associated with increased risk in our study and in others as mentioned before. The 9p21.3 locus has, however, a very complex regulation, and it is possible that the binding between the G allele and miR-138-2-3p might be specific for the cardiac tissue. Moreover the 9p21.3 locus hosts several genes that are key cell cycle regulators and therefore the fact that rs3217992 changes the binding site of one or more miRNAs could influence the normal cell cycle progression and in turn increase PDAC risk. There are growing evidences that mir-SNPs could be involved in various pathologies including cancer and diabetes given their ability to affect gene regulation. However, the functional significance of this SNP needs thorough scrutiny in the pancreatic tissue to address its possible role in PDAC susceptibility. Another finding of potential significance is the association between the A allele of rs1063192 with decreased PDAC risk. This SNP also lies in the CDKN2B 3'UTR, is a putative mir-SNP and shows a weak linkage disequilibrium (LD) with rs3217992 (r 2 =0.474 in CEU and r 2 =0.170 CHBJPT, 1000 Genomes Project). Li and colleagues found this SNP to be associated with esophageal squamous cell carcinoma [25], while in a recent report [40] the A allele of rs1063192 was found to be associated with a decreased risk of developing gestational diabetes mellitus (GDM) suggesting a direct genetic link between diabetes and pancreatic cancer though the genetic variants analyzed in both studies.
The strongest association we observed in the Japanese population was between the A allele of rs1063192 and a decreased PDAC risk. This finding gives rise to two considerations: the first is that even though the association does not reach the threshold of significance considering Bonferroni's correction, it is unlikely to be a false finding giving the fact that it was found in two different populations. The second is that since in the Caucasian and in the Japanese the leading SNP is not the same it is possible that the real causal variant is yet untyped and mildly in LD with rs1063192 and rs3217992.
We also observed an association close to statistical significance (taking into account multiple testing) between the missense SNP rs3731249 and PDAC susceptibility. This variation is predicted by PolyPhen to be possibly damaging with a score of 0.487 and has been widely investigated in childhood acute lymphocytic leukemia [23,24,41], once again highlighting the pleiotropic nature of this genomic region. Through dbGaP we could perform a GWAS look-up for the results of our candidate SNPs. We found results only for Caucasians (PanScan2, [6]) and only for three of the significant SNPs rs2811708, rs3217992 and rs1063192. All the SNPs showed ORs that were in the same direction with our results but none showed a statistically significant association, the best was p=0.08 for rs1063192. In 2 out of 3 populations (PANDoRA Japanese and PanScan) tested the best association was observed for rs1063192 and in the last population (PANDoRA Caucasians) for rs3217992. The two SNPs are in weak LD. This suggests the hypothesis that another variant (possibly a low frequency/rare one) may be associated with both, that was not typed in either population and that has an LD pattern with rs1063192 and rs3217992 that varies in the different populations. The difference in the results between PANDoRA and PanScan may be explained by the fact that in PANDoRA the strongest statistical association was observed for the rare homozygous carriers compared with the common allele carriers suggesting a recessive model of inheritance, while in the PanScan data only the allelic model is shown. Additionally, the results from PanScan arise partly from a prospective cohort while ours are from a case-control study. Although the biological explanation of this phenomenon is difficult to understand, discrepancies between the two study designs have been observed several times in different neoplastic diseases [4,21,42,43].
One of the major strengths of this study is its size, since with a total of 8,968 subjects this is one the largest genetic analysis of pancreatic cancer risk published todate. Additionally, our selection of SNPs provides an extensive coverage of genetic variability in the regions of interest. In addition we could analyze simultaneously two ethnic groups, which helps to generalize the findings. A possible limitation of the study is that patients and controls in PANDoRA were recruited in various centers across Europe and therefore there might be some populationbased differences. Moreover we did not include rare variants and therefore we cannot exclude to have missed associations due to alleles with a MAF<0.05.
In conclusion we observed several novel associations in a pleiotropic region that has been found to be of key relevance in the susceptibility to various types of cancer and diabetes, confirming a key role for CDKN2A/B in pancreatic cancer and suggesting a possible involvement of the common genetic variability at this locus in PDAC risk.

Study population
In this study 2,857 pancreatic ductal adenocarcinoma (PDAC) patients and 6,111 controls were collected in nine countries (Germany, Czech Republic, Greece, Italy, Lithuania, Poland, Netherlands, UK and Japan) belonging to the PANDoRA consortium that has been described in detail elsewhere [44]. Briefly, cases were retrospectively collected and are defined by a confirmed diagnosis of PDAC. Controls were recruited in the same hospitals, or at least geographical region from where the cases were recruited. British and Dutch controls (N=176 and 102, respectively) were selected from healthy volunteers recruited from the general population in the www.impactjournals.com/oncotarget European Prospective Investigation on Cancer (EPIC), an ongoing prospective cohort being carried out in ten European countries [45]. The German controls used were partly blood donors from the blood transfusion center in Mannheim and partly healthy volunteers selected among EPIC subjects collected in Heidelberg. Patients provided written informed consent and the study was approved by Ethical Review Board of the University of Heidelberg (Medizinische Facultaet Heidelberg) the study was performed in accordance with Declaration of Helsinki.

SNP selection
To survey the common genetic variability in the CDKN2A/B locus we selected tagging SNPs using the Tagger tool of the Haploview software using the Caucasian population in the HapMap web site (International HapMap Project, version 28; http://www.hapmap.org) as reference (http://www.broad.mit.edu/mpg/haploview/; http://www. broad.mit.edu/mpg/tagger/). We considered a region centered on the CDKN2A gene region and we added 5k bp at 5' and 5k at 3' of the gene resulting in a window of around 40k base pair. We used a pairwise tagging method with a minimum r 2 of 0.8 and a minor frequency allele of 0.05 to select tagging SNPs inside the region. The first tagging SNP at the 5' end, rs3731257, is situated at 21956221 (Hg18) while the last tagging SNP at the 3' end, rs3217986, is situated at 21995330 (Hg18). In addition we added two functional SNPs (rs1063192 and rs3217992) that are putative miR-SNPs, i.e. predicted to alter the binding of one or more microRNAs to their target. The final selection resulted in 13 SNPs in the 9p21.3 region. We checked if the tagging selection used was valid also for the Asian ethnicities and we obtained that the tagging set for Chinese and Japanese is a subset of the one used for Caucasians.

Sample preparation and genotyping
For each sample DNA was extracted from whole blood or from paraffin-embedded pancreatic tissues of patients and controls using the AllPrep Isolation Kit (Qiagen, Hilden, Germany) or the Qiagen-mini kit (Qiagen, Hilden, Germany), according to the manufacturer's protocol. Genotyping was performed using TaqMan (ABI, Applied Biosystems, Foster City, CA, USA) and KASPar (KBioscence, Hoddesdon, UK) technologies. The order of DNA samples was randomized on plates in order to ensure that similar numbers of cases and controls were analyzed in each batch. Detection was performed using an ABI PRISM 7900 HT or Viia7 sequence detection system with SDS 2.2 or Viia7 software (Applied Biosystems, Foster City, CA, USA). Genotyping for British and Dutch controls was performed in the context of a genome-wide association study using the Human 660W-Quad BeadChip array according to manufacturer's instructions (Illumina, San Diego, CA, USA). For quality control, duplicates of 10% of the samples were interspersed throughout the plates. In addition, we discarded all the samples that had a call rate < 75%.

Statistical analysis
The observed genotype frequencies of all SNPs in the control subjects were tested for deviation from Hardy-Weinberg equilibrium (HWE) using Pearson's chi-square test. The association between the genotypes of all polymorphisms and PDAC risk was estimated using an unconditional logistic regression computing odds ratios (OR), 95% confidence intervals (95% CIs) and p values. The more common allele among the controls was assigned as the reference category and the co-dominant model inheritance model was assessed. All analyses were adjusted for age, gender and geographic origin (among the Europeans countries). European and Japanese individuals were kept separate in the analyses. All analyses were adjusted for multiple testing using the Bonferroni correction whereby a significance threshold of 0.0038 (0.05/13) was set.

Bioinformatic analysis
To assess the possible functional relevance for the SNPs showing the most significant associations with risk of developing PDAC several bioinformatic tools were used. RegulomeDB (http://regulome.stanford. edu/) [46] and HaploReg v2B [47] were used to identify the regulatory potential of the region nearby each SNP. Genevar (http://www.sanger.ac.uk/resources/software/ genevar/) [48] and GTEx (http://www.gtexportal.org/ home/) [49] were used to identify potential associations between the SNP and expression levels of nearby genes (eQTL).