Three new pancreatic cancer susceptibility signals identified on chromosomes 1q32.1, 5p15.33 and 8q24.21

Genome-wide association studies (GWAS) have identified common pancreatic cancer susceptibility variants at 13 chromosomal loci in individuals of European descent. To identify new susceptibility variants, we performed imputation based on 1000 Genomes (1000G) Project data and association analysis using 5,107 case and 8,845 control subjects from 27 cohort and case-control studies that participated in the PanScan I-III GWAS. This analysis, in combination with a two-staged replication in an additional 6,076 case and 7,555 control subjects from the PANcreatic Disease ReseArch (PANDoRA) and Pancreatic Cancer Case-Control (PanC4) Consortia uncovered 3 new pancreatic cancer risk signals marked by single nucleotide polymorphisms (SNPs) rs2816938 at chromosome 1q32.1 (per allele odds ratio (OR) = 1.20, P = 4.88×10−15), rs10094872 at 8q24.21 (OR = 1.15, P = 3.22×10−9) and rs35226131 at 5p15.33 (OR = 0.71, P = 1.70×10−8). These SNPs represent independent risk variants at previously identified pancreatic cancer risk loci on chr1q32.1 (NR5A2), chr8q24.21 (MYC) and chr5p15.33 (CLPTM1L-TERT) as per analyses conditioned on previously reported susceptibility variants. We assessed expression of candidate genes at the three risk loci in histologically normal (n = 10) and tumor (n = 8) derived pancreatic tissue samples and observed a marked reduction of NR5A2 expression (chr1q32.1) in the tumors (fold change -7.6, P = 5.7×10−8). This finding was validated in a second set of paired (n = 20) histologically normal and tumor derived pancreatic tissue samples (average fold change for three NR5A2 isoforms -31.3 to -95.7, P = 7.5×10−4-2.0×10−3). Our study has identified new susceptibility variants independently conferring pancreatic cancer risk that merit functional follow-up to identify target genes and explain the underlying biology.


INTRODUCTION
Although relatively rare, pancreatic tumors are highly lethal. Over 80% of patients present with advanced disease at the time of diagnosis and the five-year survival is only 7% [1,2]. This disease is currently the third leading cause of cancer deaths in the United States, sixth in Europe and seventh worldwide [3][4][5]. In contrast to most other cancers, mortality rates for pancreatic cancer are not improving [6,7]. In the U.S., it is predicted to become the second leading cause of cancer-related deaths by 2030 [7]. Pancreatic cancer risk has been associated with smoking, obesity, diabetes and pancreatitis [8]. A small fraction of the familial aggregation of pancreatic cancer can be accounted for by rare, moderately or highly penetrant mutations [9]. Furthermore, genome-wide association studies (GWAS) have identified common variants at 13 loci associated with risk of pancreatic cancer in European populations and at 5 loci in Asian populations (at the GWAS threshold of P < 5.0×10−8), or a total of 18 loci [10][11][12][13][14][15].
Imputation has proven to be a powerful tool in genome-wide association studies (GWAS) by facilitating investigation of variants not directly assessed on genotyping arrays, the merging of GWAS datasets genotyped on different arrays, and fine-mapping of risk loci [16]. To discover additional pancreatic cancer susceptibility loci for individuals of European ancestry, we imputed three GWAS datasets including a total of 5,107 cases and 8,845 controls (PanScan I-III, Stage I) [12]. For replication of promising signals, we first genotyped an additional 1,912 cases and 3,763 controls (PANDoRA; Replication 1), and then further assessed promising signals in a second set of 4,164 cases and 3,792 controls (PanC4; Replication 2). We identified three new susceptibility signals that achieved genome-wide significance for pancreatic cancer risk.

RESULTS
We conducted imputation of three published pancreatic cancer GWAS datasets performed in individuals of European ancestry, PanScan I, II and III [10][11][12] using the 1000G (Phase 1, version 3) reference dataset [17]. We included 9,132,527 genotyped or imputed SNPs with an imputation information (INFO) score >0.5 and minor allele frequency (MAF) >0.01, and performed a fixed effects meta-analysis to combine association results for a total of 5,107 pancreatic cancer cases and 8,845 control subjects [10][11][12]. Little evidence of systematic inflation due to population stratification was observed (λ = 1.02 for PanScan I+II and λ = 1.07 for PanScan III). We attempted replication of promising findings in two stages. In the first replication stage, we genotyped 15 promising variants in 1,912 pancreatic cancer cases and 3,763 control subjects from the PANcreatic Disease ReseArch (PANDoRA) consortium, a case-control consortium including studies from eight European countries [18]. In the second replication stage, we assessed the three most significant variants based on the meta-analysis results for PanScan I+II, PanScan III and PANDoRA using 4,164 pancreatic cancer cases and 3,792 controls from the Pancreatic Cancer Case-Control Consortium (PanC4), including studies from the U.S., Canada, Europe and Australia [15]. In total, the discovery and replication stages included 11,183 cases and 16,400 controls (Supplementary Table 1).
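The inflation factor λ quoted above can be illustrated with a short sketch. It assumes the common median-based definition (median of the 1-degree-of-freedom association χ² statistics divided by the median of the null χ² distribution, about 0.4549); the text does not spell out the exact estimator used, and the function name and z-scores below are hypothetical.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(z_scores):
    """Median-based inflation factor from per-SNP z statistics (assumed estimator)."""
    chi2_stats = [z * z for z in z_scores]
    return median(chi2_stats) / CHI2_1DF_MEDIAN

# Hypothetical z-scores; a real analysis would use millions of SNPs.
example_z = [0.12, -0.85, 0.67, -0.40, 1.30, -0.71, 0.05]
print(f"lambda = {genomic_inflation(example_z):.3f}")
```

A value near 1 indicates that the bulk of the test statistics follows the null distribution, which is how the reported values of 1.02 and 1.07 are interpreted.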

Bioinformatic analysis of susceptibility alleles and differential expression analysis
In order to take the first steps towards understanding the functional ramifications at these three new susceptibility signals, we conducted in silico bioinformatic analyses using HaploReg and RegulomeDB [25,26]. Supporting evidence for putative regulatory function on gene expression was seen for the three loci, particularly for chr1q32.1 and 5p15.33, with open chromatin, modified histones and transcription factor binding in multiple tissues, including those derived from the pancreas and other gastrointestinal tissues (Supplemental Table 3). At chr5p15.33, one of the four variants highly correlated with rs35226131 is a missense variant in the second exon of TERT (rs61748181: r2 = 1, D' = 1 in 1000G EUR) whereby the minor allele, associated with reduced risk of pancreatic cancer, changes amino acid 279 from alanine to threonine (A279T).

Technical validation of imputed SNPs
To assess imputation quality, we performed TaqMan genotyping in 678 samples from PanScan I and III (see Materials and Methods). The correlation (r2) between the imputed genotypes and those measured by TaqMan was 0.98 for rs2816938 (1q32.1), 0.90 for rs10094872 (8q24.21) and 0.37 for rs35226131 (5p15.33). Due to the lower correlation between imputed and directly assayed genotypes for rs35226131, we performed a second validation in an additional 875 samples, including both rs35226131 as well as the perfectly correlated coding SNP on 5p15.33 mentioned above, rs61748181. The imputed-genotyped r2 for rs35226131 improved to 0.44 in the second validation set, and was 0.55 for rs61748181. Genotype concordance for the most likely imputed genotypes and directly assayed genotypes (see Materials and Methods) for rs35226131 improved from 86.4% in the first validation set to 94.4% in the second set.
Table note: ... (11,143/16,308). Text in bold indicates the combined meta-analysis results. NA: the TaqMan assay for rs10094872 on chr8q24.21 failed manufacturing and was therefore not attempted in the PANDoRA samples.

DISCUSSION
In this study, we performed imputation across three pancreatic cancer GWAS datasets, namely PanScan I, II and III [10][11][12], using 1000G reference data. Through replication of promising variants in individuals from two independent pancreatic cancer case-control consortia, PANDoRA and PanC4, we identified three new GWAS significant risk signals for pancreatic cancer. They are independent signals in previously established pancreatic cancer risk loci on chromosomes 1q32.1, 5p15.33 and 8q24.21, as per conditional analysis, supporting their importance for pancreatic cancer risk.
The signal on 1q32.1 is located in NR5A2, a gene that encodes nuclear receptor subfamily 5 group A member 2 (NR5A2), a transcription factor important for pancreatic development and adult function in the pancreas, liver, intestine and ovary, where it regulates cholesterol synthesis, bile acid homeostasis and steroidogenesis [19,20]. NR5A2 is an important regulator of exocrine function in the adult pancreas where it maintains homeostasis and promotes regeneration of acinar cells after inflammation caused by chemically induced pancreatitis, and protects the pancreas from KRAS driven pre-neoplastic changes [29][30][31]. Other studies have indicated a growth inducing role for NR5A2 in pancreatic cancer [32,33]. Highly correlated variants (r2>0.7) span ~25 kb on chr1q32.1 from ~11 kb upstream of the TSS to within the second intron of the gene. We observed significantly lower mRNA expression of NR5A2 in the majority of pancreatic tumors and cell lines tested compared with histologically normal pancreatic tissue samples, indicating a possible role for reduced NR5A2 expression in pancreatic cancer. Although an expression QTL was not observed in GTEx data, the relationship between the two currently known pancreatic cancer risk loci on 1q32.1 and NR5A2 expression remains to be studied in greater detail.
The tag SNP on 8q24.21 is located ~28 kb upstream of MYC at an established bladder cancer risk locus [21][22][23] that is ~850 kb upstream of our previously reported pancreatic cancer susceptibility locus [12]. Multiple independent susceptibility loci on 8q24.21, distributed over a 2 Mb region, are known to influence risk of bladder, breast, prostate, colorectal, lung, ovarian, pancreatic, renal cancer, glioma and chronic lymphocytic leukemia (CLL) [34][35][36][37][38]. Deregulated expression of MYC, a transcription factor that regulates multiple aspects of cell growth and proliferation, occurs in a broad range of human tumors [39]. Although the proximity of rs10094872 to MYC indicates that it may be the most likely target gene, 8q24.21 is known for long range chromosomal interactions, and additional candidate genes, including PVT1 (183 kb), POU5F1B (290 kb), CCAT2 (305 kb) and MIR1205-MIR1208 (253-442 kb), could be involved [40][41][42][43][44]. Several of the 8q24.21 risk loci interact with the MYC and PVT1 promoters through long range chromosomal interaction, and allele-specific effects on gene expression have been reported for both genes [42,45]. An expression QTL for MYC has been described for the bladder cancer risk locus in histologically normal bladder samples from Chinese subjects, albeit from a very small set [46], but not in adipose or blood tissue samples from European subjects [22]. We noted an eQTL for rs10094872 and PVT1 expression in pancreatic tissue samples in GTEx, indicating that PVT1 may be a target gene for this locus. Replication of these findings is required in independent sample sets. PVT1 encodes a long noncoding RNA that is often amplified and upregulated along with MYC across multiple cancers. Recently, it has been shown to increase MYC protein levels and potentiate its activity [47]. In pancreatic cancer, PVT1 expression is associated with gemcitabine sensitivity in human pancreatic cancer cells and may be associated with poor prognosis [47][48][49].
Figure legend (fragment): ... NR5A2 isoform 2 and C. NR5A2 isoform 3. Note that no expression was seen for isoform 1 in either the normal or tumor derived sample from Subject 3. Error bars represent standard deviation from four replicates.
The signal on chr5p15.33 lies in another multicancer susceptibility region reported by GWAS for bladder cancer, breast cancer, chronic lymphocytic leukemia, glioma, lung cancer, melanoma, non-melanoma skin cancer, ovarian cancer, pancreatic cancer, prostate cancer and testicular germ cell cancer [11,12,23,37][50][51][52][53][54][55][56][57][58][59][60]. For the 6 independent susceptibility loci that have been identified in the TERT-CLPTM1L gene region, the same alleles are associated with an increased risk for some cancers but decreased risk of others [24,60,61]. Two independent pancreatic cancer susceptibility loci have previously been identified on chr5p15.33 through GWAS [11,12,24]. The first one, described in PanScan II [11], was marked by an intronic SNP (rs401681) in CLPTM1L that has since been fine-mapped to rs451360 (and a set of highly correlated variants including rs36115365) [24]. A second independent signal on 5p15.33 was identified in PanScan III, tagged by a synonymous SNP (rs2736098) in the second exon of TERT [12]. Recently, a third risk locus, marked by rs2853677, was identified in this genomic region through a candidate gene analysis of the TERT and TERC genes [62]; however, this variant did not attain GWAS significance in our study (PanScan I-III, P = 4.2×10−4). The TERT gene encodes the catalytic subunit of telomerase, known for its critical role in maintaining telomere ends and the increased telomerase activity frequently seen in human cancers [63][64][65]. Telomere-independent functions for TERT include regulation of gene expression, cell survival, epithelial to mesenchymal transition (EMT) and mitochondrial function [66]. The neighboring gene encodes cleft lip and palate associated transmembrane 1 like (CLPTM1L) protein that promotes growth and survival in pancreatic and lung cancer, respectively, and is overexpressed in some cancers [67][68][69].
The SNP (rs35226131) that marks the new risk signal on 5p15.33 reported here, and highly correlated variants, are located in the TERT promoter (~200-500 bp upstream of the TSS) and could potentially influence its expression. Additionally, it is perfectly correlated with a nonsynonymous variant in TERT (rs61748181, A279T) that was recently reported as a novel lung adenocarcinoma risk locus by deep sequencing and direct genotyping of 5,164 cases and 5,716 controls of European ancestry [70]. The threonine substitution at this amino acid in TERT negatively influences telomere length and proliferation in esophageal cancer cell lines compared with alanine, and leads to reduced Wnt signaling through destabilization of complexes containing TERT, transcription activator BRG-1 and β-catenin [71]. As the TERT-279T variant is protective for pancreatic cancer in our study, and for lung cancer [70], the underlying mechanism at this locus may relate to increased TERT activity via canonical and/or noncanonical TERT pathways. This hypothesis needs to be formally investigated by future molecular studies.
In conclusion, through imputation of three existing GWAS datasets and replication in two independent case-control consortia, we identified three new susceptibility signals for pancreatic cancer in populations of European ancestry. They are located in genomic regions previously reported by GWAS of pancreatic cancer, further supporting their importance for pancreatic cancer risk. Further work is required to identify target genes and explain the underlying biological mechanisms.

Study participants
Participants were drawn from the Pancreatic Cancer Cohort Consortium and the Pancreatic Cancer Case-Control Consortium (PanC4) and include individuals from 17 cohort and 11 case-control studies genotyped in three previous GWAS phases, namely PanScan I, PanScan II and PanScan III [10][11][12]. Two replication cohorts were included, the PANDoRA consortium [18] (Replication 1) and the Pancreatic Cancer Case-Control Consortium (PanC4) [15] (Replication 2). Cases were defined as individuals diagnosed with adenocarcinoma of the pancreas.
Each study obtained informed consent from study participants and approval from its Institutional Review Board (IRB) including IRB certification permitting data sharing in accordance with the NIH Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS). The PanScan and PanC4 GWAS data are available through dbGAP (accession numbers phs000206.v5.p3 and phs000648.v1.p1, respectively).

Genotyping, imputation and association analysis
GWAS genotyping was performed at the Cancer Genomics Research Laboratory (CGR) of the National Cancer Institute (NCI) of the National Institutes of Health (NIH) using the Illumina HumanHap series arrays (Illumina HumanHap550 Infinium II, Human 610-Quad) for PanScan I-II, and the Illumina Omni series arrays (OmniExpress, Omni1M, Omni2.5 and Omni5M) for PanScan III [10][11][12]. The 1000 Genomes (1000G) Phase 1, Release 3 [17] reference dataset was used to impute the PanScan I-III GWAS datasets using IMPUTE2 [72] as previously described [12,24]. Due to the large overlap of variants on genotyping arrays for PanScan I and II, these datasets were imputed and analyzed together. The PanScan III data were imputed and analyzed separately. For quality control, variants were excluded based on: 1) completion rate < 90%; 2) MAF < 0.01; 3) Hardy-Weinberg Proportion P value < 1×10−6; 4) low quality imputation score (IMPUTE2 INFO score < 0.5). After quality control, 9,132,527 SNPs in 5,107 pancreatic cancer cases and 8,845 controls of European ancestry were included in the analysis. The association analysis was performed using SNPTEST [73] based on probabilistic genotypes from IMPUTE2 [72] using the same adjustments for study, geographical region, age, sex and population substructure as were used in PanScan [10][11][12]. The score test of the log additive genetic effect was used. A meta-analysis of data from PanScan I & II with PanScan III was performed using the fixed-effects inverse-variance method based on β estimates and standard errors. Heterogeneity was not observed for the SNPs identified as GWAS significant or suggestive in the combined study (P heterogeneity ≥ 0.30). The estimated inflation of the test statistic, λ, was 1.02 for PanScan I+II and 1.07 for PanScan III, respectively (using variants with MAF>0.01 and INFO>0.5) [74].
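The fixed-effects inverse-variance step described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name and the per-study β and standard error values are hypothetical, and the two-sided P value is taken from a normal approximation.

```python
import math

def fixed_effects_meta(betas, ses):
    """Pool per-study log odds ratios with inverse-variance weights."""
    weights = [1.0 / (se ** 2) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided P value under a standard normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value

# Hypothetical per-study estimates (log odds ratio, standard error).
study_betas = [0.20, 0.16]
study_ses = [0.04, 0.05]

beta, se, z, p = fixed_effects_meta(study_betas, study_ses)
print(f"pooled OR = {math.exp(beta):.3f}, SE = {se:.4f}, z = {z:.2f}, P = {p:.2e}")
```

Studies with smaller standard errors receive proportionally more weight, so the pooled estimate is pulled toward the larger datasets.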

Replication
Fifteen variants giving promising signals (P < 5.0×10−6) were selected for replication in the PANDoRA consortium (Replication 1) [18]. Genotyping was performed by custom TaqMan genotyping assays (Applied Biosystems) at the German Cancer Research Center (DKFZ) in Heidelberg, Germany in 3,343 pancreatic cancer cases and 4,998 controls, of which 2,820 cases and 3,909 controls had complete demographic and clinical data and did not overlap with other study samples. Duplicate quality control samples (n = 541 pairs) showed 99.67% genotype concordance. Samples on a few plates were not genotyped for all variants. Unfortunately, these plates contained more cases than controls. We excluded 908 cases and 146 controls, either with low genotyping completion rate (< 80%) or not genotyped, resulting in a total of 1,912 cases and 3,763 controls in the final analyses. The association analysis for PANDoRA was adjusted for age, gender and study in the same manner as previously described [12].
Three variants from the meta-analysis of PanScan and PANDoRA were then selected for a second replication in the Pancreatic Cancer Case-Control Consortium (PanC4) [15] (Replication 2). Genotyping for PanC4 had previously been performed at the Johns Hopkins Center for Inherited Disease Research (CIDR) using the Illumina HumanOmniExpressExome-8v1 array followed by imputation using 1000G Phase 3, version 1 [75] and IMPUTE2. Association analysis was performed in 4,164 pancreatic cancer cases and 3,792 control subjects of European ancestry as previously described [15]. Variants at 3 chromosomal locations were extracted from the results and meta-analyses conducted as described above. Heterogeneity between studies was assessed using Cochran's Q test. IMPUTE2 information scores were 0.78 (rs2816938), 0.96 (rs10094872) and 0.87 (rs35226131) for the three reported variants.
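As an illustration of the heterogeneity assessment mentioned above, the sketch below computes Cochran's Q from per-study effect estimates, together with the derived I² statistic (I² is not reported in the text and is included only for context). The function name and inputs are hypothetical. Under the null hypothesis of no heterogeneity, Q is compared with a χ² distribution on k − 1 degrees of freedom, where k is the number of studies.

```python
def cochran_q(betas, ses):
    """Cochran's Q, its degrees of freedom, and I-squared (percent)."""
    weights = [1.0 / (se ** 2) for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    # Weighted sum of squared deviations from the fixed-effects estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared

# Hypothetical log odds ratios and standard errors from four studies.
q, df, i2 = cochran_q([0.18, 0.21, 0.15, 0.19], [0.05, 0.06, 0.07, 0.05])
print(f"Q = {q:.3f} on {df} df, I^2 = {i2:.1f}%")
```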

Validation of imputation accuracy
Imputation accuracy was assessed by direct TaqMan genotyping or Sanger sequencing. TaqMan genotyping assays (ABI, Foster City, CA) were optimized for three SNPs (rs2816938 on 1q32.1, rs35226131 on 5p15.33 and rs10094872 on 8q24.21) in the independent regions. In an analysis of 678 samples from PanScan I and III [10,12,77], the allelic r2 values measured between imputed and assayed genotypes [78] were 0.98, 0.37 and 0.90, respectively. A second validation in an additional 875 samples from PanScan I included two perfectly correlated SNPs on 5p15.33, rs35226131 and rs61748181; the allelic r2 in this set was 0.44 and 0.55, respectively. We also assessed concordance between the most likely imputed genotypes and directly measured genotypes as follows: samples with imputed allelic dosage ranging from 0-0.5 were designated as being of the homozygous common genotype; samples ranging from 0.51-1.5 as being of the heterozygous genotype and samples ranging from 1.51-2.0 as being of the rare homozygous genotype.
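The dosage thresholds and concordance calculation described above can be sketched as follows. Two assumptions are made for illustration: dosages that fall between the stated ranges (for example 0.505) are assigned to the higher genotype class, and the squared Pearson correlation is used as a stand-in for the allelic r2 of reference [78], whose exact definition is not given in the text. All function names and data are hypothetical.

```python
def call_genotype(dosage):
    """Map an imputed allelic dosage (0 to 2) to the most likely genotype."""
    if not 0.0 <= dosage <= 2.0:
        raise ValueError("dosage must lie between 0 and 2")
    if dosage <= 0.5:
        return 0  # homozygous common
    if dosage <= 1.5:
        return 1  # heterozygous
    return 2      # homozygous rare

def concordance(dosages, assayed):
    """Fraction of samples whose called genotype matches the assayed one."""
    calls = [call_genotype(d) for d in dosages]
    matches = sum(1 for c, g in zip(calls, assayed) if c == g)
    return matches / len(assayed)

def squared_correlation(xs, ys):
    """Squared Pearson correlation (assumed stand-in for allelic r2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical imputed dosages and directly assayed genotypes.
imputed = [0.10, 0.90, 1.80, 0.40, 1.20]
taqman = [0, 1, 2, 1, 1]
print(f"concordance = {concordance(imputed, taqman):.2f}")
print(f"r2 = {squared_correlation(imputed, taqman):.2f}")
```

Concordance and r2 can diverge for low-frequency variants, because most samples are homozygous for the common allele and agree trivially, which is consistent with the high concordance but modest r2 reported for rs35226131.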

Analysis of gene expression
Gene expression was assessed for five genes that are closest to the reported variants on chromosomes 1q32.1 (NR5A2), 5p15.33 (TERT and CLPTM1L), and 8q24.21 (MYC and PVT1). We first assessed differential expression of these genes in pancreatic tumor samples (PDAC, n = 8), histologically normal (non-malignant) pancreatic tissue samples (n = 10), and pancreatic cell lines (n = 9) by RNA-sequencing as described previously [28]. We compared gene expression in tumors and cell lines to histologically normal pancreatic tissue samples by EdgeR analysis [28]. P-values represent an exact test of the differential expression of each gene in histologically normal and tumor derived samples using normalized read counts in EdgeR.
We also assessed the expression of one of these genes (NR5A2) in a second independent tissue sample set that included 20 fresh frozen paired histologically normal pancreatic samples (adjacent to tumor) and pancreatic ductal adenocarcinoma (PDAC) tumor samples. RNA samples were isolated from fresh frozen tissues and reverse transcribed to cDNA as previously described [28]. Three NR5A2 isoforms (isoform 1: NM_205860, isoform 2: NM_003822, and isoform 3: uc009wzh.3) were tested using TaqMan gene expression assays (Thermo Fisher Scientific, isoform 1: Hs00894632_m1, isoform 2: Hs00892375_m1 and a custom assay for isoform 3, forward primer: 5'CTTTTCGCCGGAGTTGAAT3'; reverse primer: 5'GTCCGGAAGCCCAGCA3'; probe: 5'CTGTGCTGCCCGTGTCC3') and a 7900HT system (ABI). Each reaction was run in triplicate and analyzed according to the ΔΔCt method using B2M (Hs99999907_m1), GAPDH (Hs99999905_m1) and PPIA (Hs99999904_m1) as housekeeping genes. P-values represent two-sided t-tests of the difference in expression between histologically normal and tumor derived samples.
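The ΔΔCt calculation named above can be illustrated with a brief sketch. It assumes that the three housekeeping genes are combined by averaging their Ct values and that a reduction is reported as a negative fold change (the negative reciprocal of the expression ratio), which matches the sign convention of the reported values but is not stated explicitly in the text. Function names and Ct values are hypothetical.

```python
from statistics import mean

def delta_ct(target_ct, reference_cts):
    """Target Ct normalized to the mean Ct of the housekeeping genes (assumed)."""
    return target_ct - mean(reference_cts)

def relative_expression(tumor_target, tumor_refs, normal_target, normal_refs):
    """Tumor/normal expression ratio by the delta-delta-Ct method."""
    ddct = delta_ct(tumor_target, tumor_refs) - delta_ct(normal_target, normal_refs)
    return 2.0 ** (-ddct)

def signed_fold_change(ratio):
    """Report ratios below 1 as negative fold changes (assumed convention)."""
    return ratio if ratio >= 1.0 else -1.0 / ratio

# Hypothetical Ct values for one tumor/normal pair; the reference lists hold
# Ct values for three housekeeping genes in the same sample.
ratio = relative_expression(
    tumor_target=30.0, tumor_refs=[20.0, 19.5, 20.5],
    normal_target=25.0, normal_refs=[20.0, 19.5, 20.5],
)
print(f"ratio = {ratio:.5f}, fold change = {signed_fold_change(ratio):.1f}")
```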
All tissue samples were obtained from the Mayo Clinic in Rochester, MN. The project was approved by the Institutional Review Boards of the Mayo Clinic and the NIH. Nine pancreatic cancer cell lines (AsPC-1, BxPC-3, Hs766T, SU.86.86, SW1990, CFPAC-1, Capan-1, PANC-1, MIA PaCa-2) were purchased from ATCC and cultured as recommended (http://www.ATCC.com). The cell lines were tested for authentication with a panel of short tandem repeats (STR) using the Identifiler kit (Life Technologies) and compared with the ATCC and the DSMZ (German Collection of Microorganisms and Cell Cultures) STR Profile Databases. All cell lines matched the listed profiles.