Inherited variants affecting RNA editing may contribute to ovarian cancer susceptibility: results from a large-scale collaboration

RNA editing in mammals is a form of post-transcriptional modification in which adenosine is converted to inosine by the adenosine deaminases acting on RNA (ADAR) family of enzymes. Based on evidence of altered ADAR expression in epithelial ovarian cancers (EOC), we hypothesized that single nucleotide polymorphisms (SNPs) in ADAR genes modify EOC susceptibility, potentially by altering ovarian tissue gene expression. Using directly genotyped and imputed data from 10,891 invasive EOC cases and 21,693 controls, we evaluated the associations of 5,303 SNPs in ADAD1, ADAR, ADAR2, ADAR3, and SND1. Unconditional logistic regression was used to estimate odds ratios (OR) and 95% confidence intervals (CI), with adjustment for European ancestry. We conducted gene-level analyses using the Admixture Maximum Likelihood (AML) test and the Sequence-Kernel Association test for common and rare variants (SKAT-CR). Association analysis revealed top risk-associated SNP rs77027562 (OR (95% CI)= 1.39 (1.17-1.64), P=1.0×10−4) in ADAR3 and rs185455523 in SND1 (OR (95% CI)= 0.68 (0.56-0.83), P=2.0×10−4). When restricting to serous histology (n=6,500), the magnitude of association strengthened for rs185455523 (OR=0.60, P=1.0×10−4). Gene-level analyses revealed that variation in ADAR was associated (P<0.05) with EOC susceptibility, with PAML=0.022 and PSKAT-CR=0.020. Expression quantitative trait locus analysis in EOC tissue revealed significant associations (P<0.05) with ADAR expression for several SNPs in ADAR, including rs1127313 (G/A), a SNP in the 3′ untranslated region. In summary, germline variation involving RNA editing genes may influence EOC susceptibility, warranting further investigation of inherited and acquired alterations affecting RNA editing.


INTRODUCTION
Over the past decade it has been recognized that the complexity of higher organisms is related to the information stored in non-protein-coding regions of the genome. Such complexity may be attributed to a range of processing events and post-transcriptional modifications that affect the fate of RNA, including alternative splicing, 5' capping, 3' polyadenylation, and RNA editing [1][2][3]. The most common type of RNA editing in eukaryotes is site-selective hydrolytic deamination of adenosine into inosine (A-to-I) within double-stranded RNAs, and recent bioinformatic analyses and high-throughput sequencing efforts have revealed that A-to-I editing is widespread and alters non-coding and protein-coding sequences throughout the genome [4].
A-to-I editing is mediated by a family of adenosine deaminases acting on RNA (ADARs), and this process modulates expression of genes and biological pathways via several mechanisms [4]. Indeed, altered expression and/or activity of ADAR enzymes has been linked to a variety of conditions, including cardiovascular and neurological diseases and cancers [4]. Epithelial ovarian cancer (EOC) is the fifth leading cause of cancer death among women in the United States [5], and ADAR expression levels have been reported to be significantly higher in serum and peritoneal fluid from patients with EOCs compared with benign ovarian tumors [6,7], suggesting ADARs may be useful biomarkers for the diagnosis and management of EOC.
We hypothesized that germline single nucleotide polymorphisms (SNPs) involving ADAR-related/ RNA editing genes may contribute to EOC risk. The main purpose of this investigation was to determine whether SNPs in five ADAR genes (ADAD1, ADAR, ADAR2, ADAR3, and SND1) were associated with EOC susceptibility. We used data available from a large-scale genotyping collaboration involving 10,891 EOC cases and 21,693 controls from the international Ovarian Cancer Association Consortium (OCAC) [8]. We also sought to evaluate the overall contribution of each gene on EOC susceptibility and to determine whether candidate SNPs associated with altered expression of corresponding genes in EOC tumor tissue.

Study population
The study sample included 10,891 invasive EOC patients and 21,693 controls of European ancestry (Supplementary Table 1). Selected subject characteristics are shown in Table 1. The mean age at diagnosis for cases was 58.1 years, the mean age at interview for controls was 56.1 years. Cases were more likely than controls to be nulliparous and to have never used oral contraceptives. Most cases had serous histology (59.7%), distant stage (63.0%), and high-grade disease (58.9%).

Variant-level association analysis and overlap with regulatory domains
SNP-level association analysis revealed top-ranked SNPs (defined as the top 5% of SNPs having the most statistically significant P values) in ADAR, ADAR3, and SND1 in the all-histologies and serous-only analyses ( Figure 1A and 1B). Table 2 summarizes association results for the most statistically significant SNPs overall or by serous histology (P < 4.0x10 -3 ); associations were not significant after correction for multiple testing (FDR > 0.15). Most of the top-ranked variants were imputed, rare or low frequency (MAF < 0.05), and not part of a shared haplotype. rs77027562 (A>G; MAF = 0.009), the top risk-associated variant among all histologies (OR (95% CI) = 1.39 (1.17-1.64, P = 1.0 x10 -4 )), resides in an intron of ADAR3. ADAR3 SNP rs77027562 and its proxies (r 2 >0.80) reside in genomic regions that overlap with regulatory domains, particularly enhancers in blood and brain (Table 3). The next top-ranked variant, SND1 rs185455523 (T>A), was associated with a decreased EOC risk (OR (95% CI) = 0.68 (0.56-0.83), P = 1.5 x10 -4 ), but this SNP and its proxies do not appear to overlap with regulatory domains. When analysis was restricted to the 6,500 patients with invasive serous adenocarcinomas, the magnitude of association was slightly attenuated for ADAR3 rs77027562 (OR = 1.33, P = 6.1x10 -3 ) and slightly stronger for SND1 rs185455523 (OR = 0.60, P = 1x10 -4 ). Exploratory analysis for the less common histologic subtypes (endometrioid (n = 1,439), mucinous histology (n=6,500), the magnitude of association strengthened for rs185455523 (OR=0.60, P=1.0x10 -4 ). Gene-level analyses revealed that variation in ADAR was associated (P<0.05) with EOC susceptibility, with P AML =0.022 and P SKAT-CR =0.020. Expression quantitative trait locus analysis in EOC tissue revealed significant associations (P<0.05) with ADAR expression for several SNPs in ADAR, including rs1127313 (G/A), a SNP in the 3' untranslated region. In summary, germline variation involving RNA editing genes may influence EOC susceptibility, warranting further investigation of inherited and acquired alterations affecting RNA editing.

Gene-level analyses
Gene-level analyses based on AML and SKAT-CR revealed that variation in ADAR was nominally associated (P < 0.05) with susceptibility to all invasive EOC, with P = 0.02 using both methods (Table 4). Histology-specific analyses revealed that ADAR variation was associated with endometrioid EOC susceptibility (P SKAT-CR = 0.005/P AML = 0.008). When using a Bonferroni threshold of 0.0025, only ADAR3 variation was significantly associated with mucinous histology (P SKAT-CR = 0.0016/P AML = 0.031).
To examine associations between genotype and gene expression for the 5 candidate RNA editing genes, expression quantitative trait locus (eQTL) analysis was performed using matched genotype and tissue expression data from The Cancer Genome Atlas (TCGA) high-grade serous adenocarcinoma tumors (https://tcga-data.nci.nih. gov/tcga/). eQTL analysis revealed statistically significant associations (P < 0.05) with ADAR expression for several SNPs in ADAR, including rs1127313 (G/A), a SNP in the 3'UTR within a putative miRNA binding site that was associated with susceptibility in all histologies (OR = 1.05, P = 0.009). rs1127313 is also in high LD (r 2 = 0.86) with top ADAR risk SNP rs9426826 (see Table 2). ADAR tumor tissue expression was slightly higher among G allele carriers of rs1127313 compared to A allele carriers (P = 0.027; Figure 3). rs1127313 is also an eQTL for ADAR in whole blood (Supplementary Table 2), and lies in a genomic region with enhancer features and DNase I hypersensitivity site in several tissues, including ovary. Statistically-significant cis-eQTLs were not detected for SNPs in other candidate RNA editing genes.

DISCUSSION
An emerging body of data suggest that defects in RNA editing may contribute to a range of human diseases, including cancer [2][3][4][9][10][11]. The current largescale collaboration represents the first comprehensive association study of germline variants involving RNA editing genes and susceptibility to epithelial ovarian cancer. At the SNP-level, the strongest associations were observed for SNPs in RNA editing genes ADAR3 and SND1, but no associations reached genome-wide  levels of statistical significance. Gene-level analyses highlighted ADAR and ADAR3 as potential contributors to EOC susceptibility within the set of ADAR-related genes. Finally, positive eQTLs were also observed between ADAR genotype and ADAR expression in EOC tumor tissue. Focused evaluations of RNA editing SNP-disease associations are limited [12], especially with cancer as an outcome, so it is not possible to compare our SNP findings to those of other studies of cancer risk. We are, however, unaware of GWAS hits in or near these genes. Several recent studies [2,3] have evaluated the genomic landscape and clinical relevance of RNA editing in numerous human tissue types. These analyses used RNA-sequencing data from both tumor and normal samples profiled as part of TCGA Project. Striking differences in RNA-editing patterns were observed in tumors relative to matched normal tissues for 12 cancer types [2]. Further analyses revealed that altered RNA editing patterns in tumors correlated with ADAR expression, and that non-random, clinically-relevant RNA editing events (frequently located in noncoding RNAs, nonsynonymous sites, intronic regions, and non-Alu elements) correlated with tumor classification and patient survival and with increased cell survival and altered drug sensitivity [2,3]. Interestingly, gene amplification-associated overexpression of ADAR was recently shown to enhance lung tumorigenesis and contribute to poor outcomes by affecting downstream RNA editing patterns [10]. As mentioned previously, ADAR expression levels have been reported to be significantly higher in serum and/or peritoneal fluid from patients with EOCs compared with benign ovarian tumors [6,7]. Although high-grade serous EOCs from TCGA were not profiled as part of the aforementioned genomic investigations [18,19], Haploreg 4.1 effectively integrates GTEX eQTL results for normal ovary.
Taken together with several lines of investigation from ovarian [6,7] and other cancers [2,3,10] the current study suggests that ADARs (and ADAR in particular) may be useful biomarkers for the diagnosis and management of EOC. Thus, with replication, ADAR genotype status and/ or expression level may serve as a risk factor for EOC. Indeed, we find that our top risk SNP in ADAR, rs9426826, has several proxy variants (r 2 >0.8, Supplementary Table 2) that are strongly associated with expression of this gene in blood (rs1127313: 7.23x10 -14 ) and to a lesser extent, expression in high-grade serous EOC tumors (rs1127313: P = 0.027). Based on growing data which demonstrate  N = 10,891) or serous histology (N  = 6,500) versus controls (N = 21,693), sorted by gene and p- [13] and other therapeutic agents such as the IGFR-1R inhibitor BMS536924 and the MEK inhibitors CI1040 and trametinib [2], ADAR genotype and/or expression may help identify women whose tumors may respond to new combinations of therapies. Strengths of the current study include the large sample size that primarily enabled detection of small effects for common variants, the relatively homogeneous population of EOC cases, and the multi-tiered genomic evaluation. However, this study was underpowered to detect the rare variants that were identified and is burdened by the low imputation quality. Additionally, the study is limited in that eQTL analysis did not permit adjustment for somatic copy number changes and DNA methylation status, factors that can influence transcript abundance and confound associations between germline polymorphisms and gene expression [14][15][16]. Moreover, it is possible that the top-ranked SNPs could potentially affect genes other than the RNA editing genes that drive candidate  selection. Efforts to replicate these findings are needed; data will be available soon from a large, independent cohort of EOC cases genotyped by OCAC for this purpose (Amos et al, The OncoArray Consortium: a Network for Understanding the Genetic Architecture of Common Cancers (provisionally accepted, CEBP). Mechanistic studies to reveal how ADAR polymorphisms may affect oncogenic phenotypes will also be required, as will systematic investigations of the genomic landscape and clinical relevance of RNA editing in EOC using data from TCGA or other sources. In summary, this study provides data to support the hypothesis that germline polymorphisms in ADAR related genes may influence gene expression and susceptibility to EOC. Further investigations are needed to determine whether inherited and acquired alterations affecting RNA editing serve as biological mechanisms to promote the development of EOC.

Study population
A total of 41 studies (32 case-control and 9 caseonly) from OCAC contributed to this investigation (Supplementary Table 1). Briefly, cases were women diagnosed with histologically confirmed primary invasive EOC (95%), fallopian tube cancer (1%), or primary peritoneal cancer (4%). Controls were women without cancer and with at least one intact ovary on the reference date. Individual studies were grouped into 26 case-control strata. All studies provided data on disease status, age at diagnosis/interview, self-reported racial group, and histologic subtype.

Genotyping, quality control (QC), and imputation
Peripheral blood was the primary source of germline DNA and was collected in the course of clinical care or research at each of the participating sites. The candidate SNPs selected for the current investigation were genotyped using a custom Illumina Infinium iSelect Array as part of the international Collaborative Oncological Gene-environment Study (iCOGS), an effort to evaluate 211,155 genetic variants for association with cancer risk [17].
Briefly, OCAC genotyping was conducted at McGill University and Génome Québec Innovation Centre (Montréal, Canada) and Mayo Clinic Medical Genomics Facility. Each 96-well plate well contained 250ng genomic DNA (or 500 ng whole genome-amplified DNA). Raw intensity data files were sent to the COGS data coordination center at the University of Cambridge for genotype calling and QC using the GenCall algorithm. Sample and SNP quality control procedures have been described previously; in brief, samples were excluded with call rates < 95%, >1% discordance, < 80% European ancestry, or ambiguous gender, and SNPs were excluded with call rates < 95% or monomorphism [18,19].
To improve genomic coverage and power [14], we imputed genotypes based on data from the 1000 Genomes Project (1KGP); we used IMPUTE2 version 2 after prephasing with SHAPEIT [20]. All 14 populations in the 1KGP were used as the reference. Before imputation, we excluded poorly performing SNPs according to the genotyping success rates, deviation from Hardy-Weinberg equilibrium (HWE) (P < 1x10 -7 ), and replicate errors. To ensure the quality of the imputed genotypes, maximum likelihood genotype imputation was carried out and an estimate of the squared correlation between the imputed and true genotypes was calculated. Imputation quality is significantly decreased for low and rare frequency variants [21]. To be more inclusive of rare variants, we considered imputed SNPs with an r 2 > 0.40 as well-imputed [22] and included them in our analyses. The average imputation quality for included variants is detailed in Supplementary

Population stratification
HapMap DNA samples from European (CEU, n = 60), African (YRI, n = 53) and Asian (JPT+CHB, n = 88) populations were also genotyped as part of the same custom Illumina iSelect Array. The program LAMP [24] was used to estimate intercontinental ancestry based on the HapMap (release no. 23) genotype frequency data for these three populations. Eligible subjects with greater than 90 percent European ancestry were defined as European (n = 39,773). We then used a set of 37,000 unlinked autosomal markers to perform principal components analysis within each major population subgroup. To enable this analysis on very large sample sizes we used an in-house program written in C++ using the Intel MKL libraries for eigenvectors (available at http://ccge.medschl. cam.ac.uk/software/).

Statistical analysis
Descriptive statistics were calculated in terms of means and standard deviations for continuous variables and frequencies and percents for categorical variables. The primary association analysis focused on individuals of European ancestry. Unconditional logistic regression was used to estimate odds ratios (OR) and their 95% confidence intervals (CI) between genotype and case status under a log-additive genetic model, with adjustment for the first five principal components representing sub-European ancestry. Due to the heterogeneous nature of EOC, subgroup analyses were conducted to estimate genotype-specific odds ratios by histologic subtype: serous, endometrioid, mucinous, and clear cell carcinomas. False discovery rates (FDR) [25] were used to adjust for multiple comparisons, and FDR of 15% was used to declare significance. Two methods of gene-level evaluations were also conducted to combine association evidence from SNPs within each gene evaluated: the Admixture Maximum Likelihood (AML) Test [26] and the Sequence-Kernel Association test for the combined effect of common and rare variants (SKAT-CR) [27]. AML is an approach that simultaneously examines the global null hypothesis (of no SNP-outcome associations) and estimates the proportion of underlying false hypotheses. The AML uses univariate SNP-level results to calculate the AML Cochran-Armitage Trend test. Compared to other methods, AML has been shown to have similar or higher statistical power to detect associations except under the unlikely scenario that greater than 20% of all variants are associated with the outcome [26]. SKAT-CR evaluates the cumulative effect of rare and common variants, but does not consider low-frequency variants. These gene-level approaches were undertaken to complement SNP-level findings, and aimed to reduce the degrees of freedom, avoid model-fitting issues due to multicollinearity from LD, and to improve statistical power. The Bonferroni method was used to account for multiple comparisons.
Expression quantitative trait locus (eQTL) analysis was performed to examine for association between genotype (n = 5,303, imputed as above in n = 5 genes) and corresponding gene expression for the 5 candidate RNA editing genes. Matched genotype and gene expression profiling data were obtained for 402 high-grade serous EOC samples evaluated in the Cancer Genome Atlas (TCGA) Project using previously described methods [19]. Briefly, germline genotypes and matched tumor gene expression data were downloaded from the TCGA data portal. To conduct the eQTL analysis, we used germline genotypes of SNPs/proxies as independent variables and expression levels as traits. Expression levels between minor allele carriers versus non-carriers were compared using the Wilcoxon rank sum statistic. Haploreg v4.1 http://www.broadinstitute.org/mammals/haploreg/ www.impactjournals.com/oncotarget haploreg.php) [28] was used to evaluate the putative function of candidate SNPs.