Pooled analysis of genome-wide association studies of cervical intraepithelial neoplasia 3 (CIN3) identifies a new susceptibility locus

Recent genome-wide association studies (GWASs) in subjects of European descent have identified associations between cervical cancer risk and three independent loci as well as multiple classical human leukocyte antigen (HLA) alleles at 6p21.3. To search for novel loci associated with development of cervical cancer, we performed a pooled analysis of data from two GWASs by imputing over 10 million genetic variants and 424 classical HLA alleles, for 1,553 intraepithelial neoplasia 3 (CIN3), 81 cervical cancer and 4,442 controls from the Swedish population. Notable findings were validated in an independent study of 961 patients (827 with CIN3 and 123 with cervical cancer) and 1,725 controls. Our data provided increased support for previously identified loci at 6p21.3 (rs9271898, P = 1.2 × 10−24; rs2516448, 1.1 × 10−15; and rs3130196, 2.3 × 10−9, respectively) and also confirmed associations with reported classical HLA alleles including HLA-B*07:02, -B*15:01, -DRB1*13:01, -DRB1*15:01, -DQA1*01:03, -DQB1*06:03 and -DQB1*06:02. In addition, we identified and subsequently replicated an independent signal at rs73730372 at 6p21.3 (odds ratio = 0.60, 95% confidence interval = 0.54–0.67, P = 3.0 × 10−19), which was found to be an expression quantitative trait locus (eQTL) of both HLA-DQA1 and HLA-DQB1. This is one of the strongest common genetic protective variants identified so far for CIN3. We also found HLA-C*07:02 to be associated with risk of CIN3. The present study provides new insights into pathogenesis of CIN3.

In addition, we identified and subsequently replicated an independent signal at rs73730372 at 6p21.3 (odds ratio = 0.60, 95% confidence interval = 0.54-0.67, P = 3.0 × 10 -19 ), which was found to be an expression quantitative trait locus (eQTL) of both HLA-DQA1 and HLA-DQB1. This is one of the strongest common genetic protective variants identified so far for CIN3. We also found HLA-C*07:02 to be associated with risk of CIN3. The present study provides new insights into pathogenesis of CIN3.

IntroductIon
Worldwide, cervical cancer is the fourth most common cancer in women [1]. Persistent infection with carcinogenic human papillomavirus (HPV) strains is the main cause of cervical cancer and its precursor lesions, cervical intraepithelial neoplasia (CIN) [2]. More advanced lesions, designated CIN3, are considered to be the same as carcinoma in situ (CIS) or Stage 0 cervical cancer. Although both population screening and a prophylactic vaccine are available, cervical cancer continues to be a major threat to female health globally.

Research Paper
Oncotarget 42217 www.impactjournals.com/oncotarget Therefore, further understanding of the contribution of genetic susceptibility to persistent HPV infection as well as tumorigenesis is important for preventing the disease.
The statistical power of individual GWAS has been limited by the modest effect size of genetic variants and the number of variants that could be studied. To identify additional cervical precancer susceptibility loci, we expanded the present GWAS by pooling its data with the data from another GWAS, together providing data on 1,553 cases with CIN3, 81 cases with cervical cancer and 4,442 controls, all from the Swedish population. To bring genotype data obtained from different single-nucleotide polymorphism (SNP) arrays into a common framework and provide information on ungenotyped genetic variants, we imputed >10 million genetic variants and 424 classical HLA alleles, using the 1000 Genome Project data and the Type 1 Diabetes Genetics Consortium (T1DGC) panel as reference. We then conducted a comprehensive examination of the association between CIN3 risk and common variants, rare variants as well as classical HLA alleles. Notable findings were validated in an independent study of 961 cases (827 with CIN3 and 123 with cervical cancer) and 1,725 controls.

Genome-wide association results
After stringent quality control (QC), data from 1,634 cases (1,553 CIN3 and 81 cervical cancer) and 4,442 controls were available for 5,471,179 SNPs with an overall call rate of 99.32%. The pooled analysis showed modest evidence of over-dispersion (λ = 1.08). Genetic variants mapping to the previously identified susceptiblity loci at 6p21.3 provided the best evidence for an association with cervical precancer (Supplementary Figure S1). A total of 1,576 SNPs showed evidence for an association with CIN3 risk at P = 5 × 10 -7 , 1,575 of which are located within the MHC region at 6p21.3 and one of which is located at 18q12.3. After excluding the genetic variants in the extended MHC region where extensive LD extends over long distances, λ is only 1.02 (Supplementary Figure S1). The strongest association was attained for rs9271898 (odds ratio [OR] = 0.64, 95% confidence interval [CI] = 0.59-0.70, P = 1.2 × 10 -24 for the minor allele A), which is located between HLA-DRB1 and HLA-DQA1 and is highly correlated with rs9272143 (D' = 0.99, r 2 = 0.98), the top hit previously reported (Table 1). A borderline genome-wide significant association was observed between CIN3 risk and a locus outside MHC region, that is rs73000875 at 18q12.3 (OR = 0.61, 95% CI = 0.50-0.74, P = 3.8 × 10 -7 for the minor allele G).
Concordance between the genotyped and imputed data was 0.96 for HLA-A, 0.97 for HLA-B, 0.92 for HLA-DRB1, 0.92 for HLA-DQB1 and 0.91 for HLA-DPB1, at two field level of HLA typing, respectively. As shown in Table 2 Table S1).

replication and combined results
A genotyping assay for replication of rs115625939 at 6p21.3 was not possible to design. Instead, a strong proxy of rs115625939, rs73730372 (D' = 1, r 2 = 1), was successfully genotyped in the replication study, including an independent set of 956 cases (827 with CIN3 and 123 with cervical cancer) and 1,715 control subjects. The association between rs73730372 and CIN3 risk was independently replicated in the replication study (OR = 0.64, 95% CI = 0.54-0.77, P = 9.8 × 10 -7 for the minor allele A) with no evidence of heterogeneity between two study phases (P-heterogeneity = 0.39). In contrast, no association was observed between rs73000875 at 18q12.3 (OR = 0.88, 95% CI = 0.69-1.11, P = 0.26 for the minor allele G) and risk of CIN3 in the replication series.
After pooling the discovery and replication data, the ORs (95% CI) were 0.62 (0.55-0.70) and 0.31(0.19-0.51) for heterozygous (genotype CT) and homozygous carriers (genotype TT) of rs73730372, compared to CC, respectively, yielding P = 3.0 × 10 -19 for trend in the combined analysis (Figure 3). No heterogeneity for the association of rs73730372 was noted by tumor grade (CIN3 vs cervical cancer). 0.69 CI, confidence interval; HLA, human leukocyte antigen; SNP, single-nucleotide polymorphism; OR, odds ratio; P, two-sided P value corresponding to the OR. a Derived from logistic regression in log-additive model with adjustment for study and one informative eigenvector generated by principal components analysis. b Derived from logistic regression in log-additive model conditioning on rs9271898, rs2516448, rs3130196 and rs73730372 with adjustment for study and one significant principal component generated by principal components analysis. www.impactjournals.com/oncotarget dIscussIon We performed a pooled analysis of data from two GWASs by imputing over 10 million genetic variants and 424 classic HLA alleles. The results provided increased support for multiple independent susceptibility loci and classic HLA alleles within the MHC region, previously identified to be associated with cervical cancer. We have also provided support for the existence of a novel, independent, disease locus, as indicated by the SNP rs73730372 at 6p21.3. These findings suggest that immunological factors play an important role in the pathogenesis of CIN3.The risk or protective alleles may affect the binding affinity to high-risk HPVs or expression level of HLA molecules, hence influence susceptibility to infection and/or persistence of high-risk HPVs.
The minor allele of rs73730372 was associated with a 40% reduced risk of CIN3 with the homozygous genotype conferring a 69% reduced risk ( Figure 2). This is one of the strongest common genetic protective variants identified CIN3 SNP rs73730372 correlates (r 2 ≥ 0.9) with 16 SNPs. It is located 11 kb upstream of HLA-DQA1 and 45 kb downstream of HLA-DQB1. rs73730372 has recently been identified as expression quantitative trait locus (eQTL) that influence the expression of both HLA-DQA1 and HLA-DQB1 in blood [4]. HLA-DQA1 and HLA-DQB1 belong to the HLA class II alpha and beta chain paralogs, respectively. Together, their gene products form a protein complex called an antigen-binding DQαβ heterodimer. This complex presents foreign peptides to the immune system to trigger the body's immune response, and therefore play a key role in the immune recognition process and subsequent clearance of virally-infected cells. Class II molecules are expressed in antigen presenting cells (APC: B Lymphocytes, dendritic cells, macrophages) and are known to be important in the regulation of the immune response to viral and other infections. Class II molecules present antigenic peptides to CD4 + T helper cells to initiate a cell-mediated immune response [5]. Our study points to the importance of expression level of HLA-DQA1 and HLA-DQB1 for the susceptibility to develop cervical carcinoma. Impaired class II gene expression has been reported in genital HPV infections and in lesions caused by HPV [6][7][8]. The increased incidence and progression Oncotarget 42221 www.impactjournals.com/oncotarget of HPV infections in immunosuppressed individuals illustrates the critical importance of the CD4 + T-cellregulated cell-mediated immune response in the resolution of HPV infection [5,9]. Failure to develop an effective cell-mediated immune response to clear infection results in a persistent infection and, in the case of the oncogenic HPVs, an increased probability of progression to CIN3 and cervical cancer [9]. The minor allele G also shows lower affinity of binding forkhead box L1 (Foxl1) and homeobox D10 (Hoxd10) [10] proteins. Foxl1 encodes a member of the forkhead/winged helix-box (FOX) family of transcription factors, which play a critical role in the regulation of multiple processes including metabolism, cell proliferation and gene expression during ontogenesis.
Hoxd10 is a member of the Abd-B homeobox family and encodes a protein with a homeobox DNA-binding domain. Recent studies suggest that HoxD10 functions as a candidate tumor suppressor [11,12].
A rs73730372, a proxy of rs115625939 (D'=1, r 2 = 1) and cervical cancer in the combined data was estimated by logistic regression analysis for the minor allele in log-additive model with adjustment for study after pooling individual level data from the discovery and replication phase. P-het , P for heterogeneity, was derived from Cochran's Q test. a Derived from logistic regression analysis for the minor allele in log-additive model with adjustment for study and the one significant principal component generated by principal components analysis. Table S6, unexpected duplicates (11 subjects) and firstdegree and second-degree relatives (9 subjects) were removed based on identity-by-state (IBS) estimates calculated in PLINK [22]. Utilizing a set of 15,872 SNPs genotyped in both GWASs and evenly distributed across the genome in low linkage disequilibrium (LD) (pair-wise r 2 < 0.02), a principal components analysis (PCA) using the EIGENSTRAT software [23] excluded 8 additional samples detected as outliers.

Genotyping and quality control
A genotyping assay for rs115625939 was not possible to design. To validate the findings from the whole-genome scan, SNP rs73730372 was genotyped for replication as a proxy for rs115625939 (D' = 1, r 2 = 1.00) with template-directed dye-terminator incorporation with fluorescence-polarization detection (FP-TDI) (Tecan, Männedorf, Switzerland). SNP rs73000875 was also genotyped using the same assay. Eight percent of the samples were selected for repeat genotyping as duplicates, yielding a reproducibility of 100%. Genotype success rate was greater than 98.7% and genotype distributions were consistent with that expected by HWE for each SNP.

statistical analysis
In the discovery phase, the potential for population stratification was investigated by PCA undertaken with the EIGENSTRAT package [23]. One significant PC derived from PCA based on the Tracy-Widom statistic (P < 0.05) was statistically significantly associated with case-control status (P < 0.05). Adjustment for population stratification was performed by including significant PC as covariate in the logistic regression. The association between each genetic variant and risk of cervical disease was estimated by the OR per minor allele and 95% CI using multivariate unconditional logistic regression, assuming a log-additive model with adjustment for study and the significant PC with PLINK. Conditioning analysis on significant HLA alleles/haplotypes was performed in PLINK using logistic regression in logadditive model conditioning on B*07:02-C*07:02, B*15:01, DRB1*13:01-DQA1*01:03-DQB1*06:03 and DRB1*15:01-DQB1*06:02 with adjustment for study and one significant principal component generated by principal components analysis. The conditioning HLA alleles/haplotypes were entered into the model simply as covariates.
In the replication study, the association between each genetic variant and risk of cervical disease was estimated by the OR per minor allele and 95% CI using unconditional logistic regression assuming a logadditive model. Individual level data from the discovery and replication phase were pooled, making for a total of 2,565 cervical cases and 6,079 controls. In the combined analysis, association between each genetic variant and risk of cervical disease was estimated by the OR per minor allele and 95% CI using unconditional logistic regression assuming a log-additive model with adjustment for study. We then conducted stratified analysis by study phase and tumor grade. Heterogeneity of ORs was assessed using the Cochran's Q test. Results that obtained a level of significance of a P < 5.0 × 10 -8 were considered statistically significant at the genomewide level. All replication and combined analyses were conducted using SAS 9.3 software. All statistical tests were two-sided.

Genotyping of classical HlA loci
The typing of HLA-A, -B, -DRB1, -DQB1 and -DPB1 in 254 cervical cancer patients included in the present study was previously performed [25] by PCR amplification of groups of alleles using biotinylated PCR primers, followed by hybridization to immobilized sequence-specific oligonucleotide probes in a lineararray format. Genotypes were determined using a computer algorithm on the basis of the pattern of sequence-specific oligonucleotide-probe hybridization.