Validation of loci at 2q14.2 and 15q21.3 as risk factors for testicular cancer

Testicular germ cell tumor (TGCT), the most common cancer in men aged 18 to 45 years, has a strong heritable basis. Genome-wide association studies (GWAS) have proposed single nucleotide polymorphisms (SNPs) at a number of loci influencing TGCT risk. To further evaluate the association of recently proposed risk SNPs with TGCT at 2q14.2, 3q26.2, 7q36.3, 10q26.13 and 15q21.3, we analyzed genotype data on 3,206 cases and 7,422 controls. Our analysis provides independent replication of the associations for rs2713206 at 2q14.2 (P = 3.03 × 10−2; P-meta = 3.92 × 10−8; nearest gene, TFCP2L1) and rs12912292 at 15q21.3 (P = 7.96 × 10−11; P-meta = 1.55 × 10−19; nearest gene, PRTG). Case-only analyses did not reveal specific associations with TGCT histology. TFCP2L1 joins the growing list of genes located within TGCT risk loci with biologically plausible roles in developmental transcriptional regulation, further highlighting the importance of this phenomenon in TGCT oncogenesis.


INTRODUCTION
Testicular germ cell tumor (TGCT) is the most common cancer in men aged between 18 and 45, with more than 52,000 men diagnosed annually worldwide [1].
Known risk factors include a family history of the disease, a previously diagnosed germ cell tumor, subfertility, undescended testis (UDT) [2] and testicular microlithiasis [3], characterized by intratesticular calcification. Histologically, TGCT can be divided into two main subtypes: seminomas, which resemble undifferentiated primordial germ cells, and nonseminomas, which show varying degrees of differentiation toward embryonal and extraembryonal structures. Some tumors show features of both classes (mixed histology). Both subtypes are thought to arise from progenitor germ cells through a pre-invasive phase of intratubular germ cell neoplasia (ITGCN) [4].
Research Paper
Oncotarget 12631 www.impactjournals.com/oncotarget
The cancer has a strong heritable basis, which is reflected in an observed four- to eight-fold familial relative risk [5][6][7][8]; heritability analyses utilizing familial data estimate that genetic factors contribute to nearly half of all disease risk [9]. Despite this sizable heritable component, high-penetrance TGCT susceptibility variants accounting for a substantial proportion of genetic susceptibility have not been identified. Using exome sequencing, we recently described enrichment in familial TGCT of rare disruptive mutations in genes relating to ciliary and microtubule functions; however, these variants account for only a minor fraction of disease heritability [10]. In contrast, interrogation via genome-wide association studies (GWAS) for common variants of more modest effect size has proved far more fruitful, with a total of 50 independent risk loci proposed to date [11][12][13][14][15][16][17][18][19][20][21][22].
The two most recent, contemporaneously published TGCT GWAS reported a total of 26 novel TGCT susceptibility loci, more than doubling, from 24 to 49, the number of regions identified by preceding efforts [21,22]. In Litchfield et al. (2017) [21], we performed a new GWAS in UK TGCT cases using the OncoArray platform (UK OncoArray study, 3,206 cases, 7,422 controls). These data were combined in meta-analysis with two previously published GWAS datasets from the UK and Scandinavia (2,313 cases, 11,633 controls), followed by replication genotyping performed for the most strongly associated loci (1, In the contemporaneously published study reported by Wang et al. (2017) [22], under the auspices of the TEsticular CAncer Consortium (TECAC), meta-analysis of data from five TGCT GWAS datasets (totaling 3,558 TGCT cases and 13,970 controls and inclusive of the UK/Scandinavian datasets used in Litchfield et al. 2017) identified associations at eight loci (2q14.2, 3q26.2, 4q35.2, 7q36.3, 10q26.13, 15q21.3, 15q22.31 and Xq28), two of which were also identified in the UK OncoArray study (4q35.2 and 15q22.31). Additional associations were also reported in the TECAC meta-analysis at the previously established TGCT loci at 9p24.3 and 19p12. In the present study, we sought independent evidence of replication at the five novel autosomal loci unique to the TECAC meta-analysis (2q14.2, 3q26.2, 7q36.3, 10q26.13, 15q21.3) using the UK OncoArray GWAS data.

RESULTS
The UK OncoArray GWAS includes data from 10,628 UK individuals, comprising 3,206 TGCT cases and 7,422 controls. The final number of SNPs passing quality control filters was 371,504, which were used to impute genotypes at over 10 million SNPs. We looked for evidence of association for the five index SNPs reported in the TECAC meta-analysis using a frequentist approach under an additive model. We also performed meta-analysis combining data from the UK OncoArray GWAS and the TECAC meta-analysis using a fixed-effects model.
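As an illustration of the fixed-effects approach mentioned above, the following is a minimal sketch of inverse-variance pooling of per-study log odds ratios, together with Cochran's Q and I² as heterogeneity summaries. The function name, the returned field names and the example effect sizes are hypothetical and are not taken from the study; the software actually used for the meta-analysis is not specified in this section.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects pooling of per-study log odds ratios.

    betas: per-study effect estimates (log odds ratio per risk allele)
    ses:   per-study standard errors of those estimates
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q and I^2 describe between-study heterogeneity
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": pooled_beta, "se": pooled_se, "p": p_value, "q": q, "i2": i2}


# Hypothetical per-study estimates, for illustration only
result = fixed_effects_meta(betas=[0.20, 0.12], ses=[0.05, 0.04])
print("pooled OR = %.3f, P = %.2e, I2 = %.1f%%"
      % (math.exp(result["beta"]), result["p"], 100 * result["i2"]))
```

Each study is weighted by the inverse of its variance, so larger and more precise datasets dominate the pooled estimate; a high I² would indicate that the studies disagree more than sampling error alone would suggest.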
The strongest evidence for an association amongst the five loci in the UK OncoArray GWAS dataset was at 15q21.3 (Table 1; Figure 1). The reported index SNP from the TECAC meta-analysis, rs12912292, showed a highly significant association in the UK OncoArray dataset (P = 7.96 × 10−11), as did its most strongly linked directly genotyped tagging SNP (rs12899976, r² > 1.0, D′ > 1.0, P = 2.34 × 10−11). Notably, SNPs in this region, including rs12912292, did show evidence of association in the meta-analysis undertaken in Litchfield et al. However, due to poor Phet and I² values associated with rs12912292, an alternative SNP (rs7175728) had been chosen for replication genotyping in 1,801 cases and 4,027 controls, and this SNP failed to replicate (P = 0.97, OR = 0.9986). The reported index SNP at 2q14.2, rs2713206, was not significant in the UK OncoArray dataset after correcting for multiple testing (i.e. five tests), though it was significant at a nominal threshold (P = 3.03 × 10−2; Table 1; Figure 1). Of note, a nearby directly genotyped tagging SNP in strong LD with rs2713206 (rs2713207; r² > 0.7, D′ > 0.9) showed a stronger level of association (P = 9.44 × 10−3). For the reported index SNP at 2q14.2, the point estimate for the effect size was smaller in the UK OncoArray dataset than reported in the TECAC meta-analysis, likely a reflection of "winner's curse". For the reported index SNPs at both loci, genome-wide significance (P < 5 × 10−8) was achieved in meta-analysis of the UK OncoArray data with the constituent TECAC datasets (Table 2).
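The distinction drawn above between nominal significance and significance after correcting for five tests can be made concrete with a short sketch. It assumes a Bonferroni correction and a family-wise level of 0.05; neither is stated explicitly in the text, and the function name is hypothetical. The P-values are those reported above for the two index SNPs.

```python
ALPHA = 0.05   # assumed family-wise significance level (not stated in the text)
N_TESTS = 5    # five index SNPs were tested

# P-values reported in the UK OncoArray dataset for the two index SNPs
reported_p = {
    "rs12912292 (15q21.3)": 7.96e-11,
    "rs2713206 (2q14.2)": 3.03e-2,
}


def classify(p, alpha=ALPHA, n_tests=N_TESTS):
    """Label a P-value relative to nominal and Bonferroni-corrected thresholds."""
    if p < alpha / n_tests:
        return "significant after correction"
    if p < alpha:
        return "nominally significant only"
    return "not significant"


for snp, p in reported_p.items():
    print("%s: P = %.2e -> %s" % (snp, p, classify(p)))
```

Under these assumptions the corrected threshold is 0.05 / 5 = 0.01, which rs12912292 passes comfortably and rs2713206 does not, matching the description in the text.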
Analysis of the UK OncoArray data did not find any evidence of association with TGCT risk for the loci at 3q26.2, 7q36.3 and 10q26.13, and these loci did not achieve genome-wide significance when the UK OncoArray data were combined with the TECAC data in meta-analysis (Table 1, Table 2). Directly genotyped tagging SNPs at these three regions did not show any evidence of association with TGCT risk.
Finally, we investigated whether the two SNPs showing evidence of association in the current study (rs12912292 and rs2713206) showed differences in risk allele frequency in phenotypically defined subgroups of TGCT cases (Table 3). Neither of the two SNPs showed a significant difference in frequency between cases with seminoma (n = 1,120) compared to nonseminoma/mixed histology (n = 643), cases with testicular maldescent (n = 308) compared to those with normal descent (n = 2,837), cases with a family history of TGCT (n = 53) compared to those without (n = 3,122) or cases with unilateral (n = 3,028) compared to bilateral (n = 78) disease.
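A case-only comparison of risk allele frequency between two subgroups can be sketched as a two-sided z-test on allele counts. The statistical test actually used for Table 3 is not described in this section, so the approach below is an assumption, and the allele counts in the usage example are hypothetical; only the subgroup sizes (1,120 seminoma and 643 nonseminoma/mixed cases) come from the text.

```python
import math


def allele_frequency_z_test(risk_alleles_a, n_cases_a, risk_alleles_b, n_cases_b):
    """Two-sided z-test for a difference in risk allele frequency
    between two case subgroups (each case contributes two alleles)."""
    chrom_a = 2 * n_cases_a
    chrom_b = 2 * n_cases_b
    freq_a = risk_alleles_a / chrom_a
    freq_b = risk_alleles_b / chrom_b
    # Pooled frequency under the null hypothesis of no difference
    pooled = (risk_alleles_a + risk_alleles_b) / (chrom_a + chrom_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / chrom_a + 1 / chrom_b))
    z = (freq_a - freq_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return freq_a, freq_b, z, p_value


# Subgroup sizes from the text; risk allele counts are hypothetical
freq_sem, freq_non, z, p = allele_frequency_z_test(1300, 1120, 740, 643)
print("seminoma RAF = %.3f, nonseminoma RAF = %.3f, z = %.2f, P = %.2f"
      % (freq_sem, freq_non, z, p))
```

A logistic regression of subgroup membership on allele dosage would be an equivalent alternative that also allows adjustment for covariates.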

DISCUSSION
In summary, we present independent evidence supporting associations between loci at 2q14.2 and 15q21.3 and susceptibility to TGCT. rs2713206 at 2q14.2 localizes to an intron of TFCP2L1 in an LD block of ~50 kb. TFCP2L1, a member of the CP2 family of transcription factors, is a component of a complex transcriptional network involved in the establishment and maintenance of pluripotency in embryonic stem cells. TFCP2L1 is highly expressed in primordial germ cells during embryogenesis [23] and is downregulated during transition of fetal gonocytes into spermatogonia [24]. TFCP2L1 is not expressed in normal adult testes, though it is expressed in intratubular germ cell neoplasia unclassified (ITGCN, formerly known as carcinoma in situ, CIS) [24], a non-invasive precursor lesion from which TGCT is widely accepted to originate. As previously reported, there are eQTL variants in LD with the index SNP. The TGCT risk allele is associated with reduced TFCP2L1 expression, supporting transcriptional regulation of this gene as the functional mechanism through which the association may be mediated [22]. rs12912292 at 15q21.3 resides in a 130 kb region of LD that contains only PRTG (protogenin), which encodes an immunoglobulin superfamily transmembrane protein expressed in the developing nervous system [25]. rs12912292 displays strong eQTL effects for PRTG in skeletal muscle (GTEx data, P = 1.9 × 10−13) and thyroid (P = 5.1 × 10−12) tissues; there is, however, no evidence for association of rs12912292 with expression of PRTG in either normal testes or TGCT [22].
Our data did not provide evidence supporting association with TGCT risk for SNPs at three of the loci analyzed (rs3755605 at 3q26.2, rs11769858 at 7q36.3 and rs61408740 at 10q26.13). The absence of demonstrable association for these loci in the UK OncoArray dataset could be due to limited power, sampling error or winner's curse. Variable LD between genotyped markers and causal variants and/or differences in their frequency, hidden population substructure not accounted for by principal component analysis, differences in effect modifiers and technical artefacts induced by the use of different genotyping platforms or quality-control criteria may also contribute. More detailed and/or larger studies are required to further explore the observed differences.
We investigated whether the loci at 2q14.2 and 15q21.3 are associated with different risks in subgroups of TGCT cases defined by specific phenotypic characteristics. Neither locus showed a significant difference in effect for the subtypes analyzed. For bilaterality, maldescent and family history, there was very limited power to detect a difference because of the small numbers examined. However, analysis of seminoma compared to nonseminoma was better powered, with > 95% power to detect an effect difference of ≥ 1.5-fold; the absence of a difference in effects is consistent with observations for SNPs identified in earlier GWAS [11,13], suggesting that, despite their distinct histological and biological features, these two subclasses of TGCT share a common biological pathway of oncogenesis.
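The power statement above can be illustrated with a normal-approximation calculation for detecting a difference in risk allele frequency between two case subgroups. This is a sketch under stated assumptions, not the calculation used in the study: the "1.5-fold" difference is interpreted here as an allelic odds ratio of 1.5, the reference risk allele frequency of 0.30 and the two-sided significance level of 0.05 are hypothetical, and the function name is illustrative. Only the subgroup sizes come from the text.

```python
import math
from statistics import NormalDist


def power_allele_freq_difference(raf_b, odds_ratio, n_cases_a, n_cases_b, alpha=0.05):
    """Approximate power of a two-sided z-test to detect an allelic odds
    ratio between subgroup A and subgroup B (two alleles per case)."""
    std_normal = NormalDist()
    chrom_a, chrom_b = 2 * n_cases_a, 2 * n_cases_b
    # Risk allele frequency in subgroup A implied by the odds ratio
    odds_a = odds_ratio * raf_b / (1 - raf_b)
    raf_a = odds_a / (1 + odds_a)
    pooled = (chrom_a * raf_a + chrom_b * raf_b) / (chrom_a + chrom_b)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / chrom_a + 1 / chrom_b))
    se_alt = math.sqrt(raf_a * (1 - raf_a) / chrom_a + raf_b * (1 - raf_b) / chrom_b)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in the direction of the effect
    return std_normal.cdf((abs(raf_a - raf_b) - z_crit * se_null) / se_alt)


# Subgroup sizes from the text; allele frequency and odds ratio are assumptions
power = power_allele_freq_difference(raf_b=0.30, odds_ratio=1.5,
                                     n_cases_a=1120, n_cases_b=643)
print("approximate power: %.4f" % power)
```

Rerunning the function with the much smaller subgroups (for example 78 bilateral or 53 familial cases) shows how quickly power falls as one group shrinks, which is the limitation noted in the text.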
TFCP2L1 joins the growing list of candidate genes within TGCT risk loci linked to developmental transcriptional regulation, a key disease mechanism [21]. Further functional evaluation is required to explore the cellular mechanisms through which the associations are mediated. The set of 50 GWAS loci identified to date is more strongly predictive of disease than the SNP sets for cancer types such as breast, colorectal and prostate cancer, despite much larger GWAS in these cancer types having identified much greater numbers of SNPs: men in the highest centile of risk estimated from the TGCT SNP set have a relative risk of > 14 compared with the population risk [21,26]. The continued success of GWAS in TGCT provides a strong rationale for continuing studies to identify additional risk loci via these methods.

GWAS
Genotyping was conducted using a custom Infinium OncoArray-500K BeadChip (OncoArray) from Illumina (Illumina, San Diego, CA, USA), comprising a 250K SNP genome-wide backbone and 250K SNP custom content selected across multiple consortia within COGS (Collaborative Oncological Gene-environment Study). OncoArray genotyping was conducted in accordance with the manufacturer's recommendations by the Edinburgh Clinical Research Facility, Wellcome Trust CRF, Western General Hospital, Edinburgh EH4 2XU.
OncoArray data was filtered as follows: we excluded individuals with low call rate (< 95%), with abnormal autosomal heterozygosity (> 3 SD above the mean) or with > 10% non-European ancestry (based on multi-dimensional scaling); we excluded SNPs with minor allele frequency < 1%, a call rate of < 95% in cases or