Two-stage genome-wide association study identifies a novel susceptibility locus associated with melanoma

Genome-wide association studies have identified 21 susceptibility loci associated with melanoma. These loci implicate genes affecting pigmentation, nevus count, telomere maintenance, and DNA repair in melanoma risk. Here, we report the results of a two-stage genome-wide association study of melanoma. The stage 1 discovery phase consisted of 4,842 self-reported melanoma cases and 286,565 controls of European ancestry from the 23andMe research cohort. The stage 2 replication phase consisted of 1,804 melanoma cases and 1,026 controls from the University of Texas M.D. Anderson Cancer Center. We performed a combined meta-analysis totaling 6,628 melanoma cases and 287,591 controls. Our study replicates 20 of 21 previously known melanoma loci and confirms the association of the telomerase reverse transcriptase gene, TERT, with melanoma susceptibility at genome-wide significance. In addition, we uncover a novel polymorphism, rs187843643 (OR = 1.96; 95% CI = [1.54, 2.48]; P = 3.53 × 10⁻⁸), associated with melanoma. The SNP rs187843643 lies within a noncoding RNA 177 kb downstream of BASP1 (brain abundant membrane attached signal protein 1). We find that BASP1 expression is suppressed in melanoma compared with benign nevi, which provides additional evidence for a putative role in melanoma pathogenesis.

Crowd-sourced data have recently been used to identify susceptibility loci for a wide range of disease phenotypes [4]. Here, we use crowd-sourced data in a two-stage genome-wide association meta-analysis of melanoma totaling 6,628 cases and 287,591 controls. In this GWAS, we replicate 20 of 21 previously identified melanoma-associated loci and discover one novel susceptibility locus at genome-wide significance.

RESULTS
Stage 1 consisted of 4,842 self-reported melanoma cases and 286,565 controls of European ancestry from the 23andMe, Inc. research cohort (Table 1). Validation of self-reported melanoma history, using the same 23andMe survey questions against adjudicated medical records, showed a sensitivity of 100% and a specificity of 98.8% (p < 0.0001; Fisher's exact test) (Supplementary Table 1). Selecting the most significant melanoma-associated SNP at each locus (P < 10⁻⁵) yielded nine index single nucleotide polymorphisms (SNPs) (Table 2, Figure 1). Stage 2 consisted of 1,804 melanoma cases and 1,026 non-Hispanic controls from the MD Anderson Cancer Center. Four of the nine index SNPs were replicated in the stage 2 analysis (P < 0.05). Although some loci did not reach statistical significance in stage 2, the 95% confidence intervals of their odds ratios overlapped with the corresponding stage 1 confidence intervals.

DISCUSSION
Some loci associated with pigmentation phenotypes may also contribute to melanoma risk independently of pigmentation. For example, in addition to affecting hair color, MC1R variants impair DNA repair: previous studies have demonstrated that primary human melanocytes carrying MC1R variants have impaired DNA repair [9].
Our study identified one SNP not previously associated with melanoma, rs187843643. Although rs187843643 did not reach statistical significance in stage 2, likely because of its low allele frequency and the limited number of stage 2 cases, the 95% confidence interval of its odds ratio overlapped with the corresponding stage 1 interval. Rs187843643 (P = 3.53 × 10⁻⁸; OR = 1.96, logistic regression), located at 5p15.1, lies 177 kb downstream of brain abundant membrane attached signal protein 1 (BASP1) and within a poorly characterized long noncoding RNA, RP11-321E2.4. BASP1 is a protein-coding gene whose product contains several PEST motifs, which are associated with proteins that have high turnover. The role of the BASP1 protein in cancer has not been well established. One study found that increased BASP1 expression in stage III and stage IV melanoma tumor cells was associated with improved melanoma survival [10]. Consistent with a protective role for BASP1 in melanoma, we found that BASP1 expression was suppressed 0.26-fold in melanoma (N = 45) relative to benign nevi (N = 18) (P = 0.007, moderated t-statistic) using publicly available expression data (GEO, GDS1375/GSE3189) (Figure 2) [11,12]. BASP1 expression has also been shown to be downregulated in hepatocellular carcinoma through epigenetic regulation [13]. Together, these observations suggest a potential tumor-suppressive role for BASP1 in melanoma. Interestingly, this locus was not identified by the recent meta-analysis of Law et al., possibly because of differences in imputation panels and QC filters [14].
Telomere homeostasis has been previously associated with melanoma risk. Multiple studies now support an association between longer telomere length and increased melanoma susceptibility, as well as increased nevus count [15,16]. In addition, telomere-related loci associated with melanoma risk in GWAS include ATM, TERT [17], and, more recently, OBFC1 [14]. TERT, the catalytic subunit of telomerase, plays a critical role in maintaining telomere length and has been shown to support cancer progression through both telomere-dependent and telomere-independent mechanisms [18]. Polymorphisms at the TERT locus have been associated with melanoma in multiple candidate studies, and rare mutations in TERT have been identified in high-incidence melanoma families [3]. Our GWAS found an association between the TERT marker rs139996880 (5p15.33) and increased melanoma risk at genome-wide significance (P = 7.16 × 10⁻¹²; OR = 1.26), confirming the association of TERT with melanoma. Our findings further support the importance of telomere homeostasis in melanoma.

Counts and percentages for cases and controls (n (%)) are listed above, stratified by stage of GWAS. We also report the number and percentage of male subjects, subjects with age < 30 years, subjects with age 30-45 years, subjects with age 45-60 years, and subjects with age > 60 years.

SNPs that met genome-wide significance (P < 5 × 10⁻⁸) in the overall meta-analysis are listed. Additionally, we report the genetic locus, nearest genes, major allele, minor allele, minor allele frequency (MAF) in stage 1 controls, average imputation r² (a measure of imputation quality) for stage 1, and the odds ratio (OR) with P value for each stage, calculated with respect to the minor allele. Stage 1 included 4,842 melanoma cases and 286,565 controls from 23andMe. Stage 2 included 1,804 melanoma cases and 1,026 controls from the MD Anderson Cancer Center. The combined fixed-effect meta-analysis totaled 6,628 melanoma cases and 287,591 controls. Statistics for effect heterogeneity (P_het and I²) are included in Supplementary Table 4.

All subjects were from the US and of European ancestry. ¹ MAF = minor allele frequency in stage 1 controls. ² Meta-analysis = combined 23andMe + MD Anderson. ³ CI = 95% confidence interval. ** = not previously associated with melanoma risk. * = imputation r² = 0.2968 in the MD Anderson dataset.
This two-stage GWAS validates the use of consumer self-report data as a platform for discovering new cancer-related genes, confirms 20 of the 21 previously known melanoma-associated loci, and identifies one novel susceptibility locus (5p15.1; BASP1) associated with 1.96-fold increased odds of melanoma. Further exploration of the role of the BASP1 locus in melanoma pathogenesis is warranted.

Stage 1 study design and population
23andMe, Inc. (Mountain View, CA), a personal genetics company, provided free access to aggregated genetic and phenotypic information for stage 1 of the GWAS. 23andMe research participants provided informed consent in accordance with 23andMe's human subjects protocol (reviewed and approved by Ethical and Independent Review Services, an AAHRPP-accredited IRB). 23andMe gathers genetic information for research by genotyping sample material provided by customers who have consented to research; phenotypic information is collected through online surveys taken by research participants. Inclusion and exclusion criteria are discussed below.

Stage 1 genome-wide association analysis
Association analysis for stage 1 was performed using logistic regression, assuming an additive model for allelic effects. The analysis was adjusted for age, sex, and population stratification (using the first five principal components), giving the following model: (1) melanoma diagnosis ~ age + sex + pc.0 + pc.1 + pc.2 + pc.3 + pc.4 + genotype. To exclude ancestry outliers, analyses were restricted to individuals with > 97% European ancestry as estimated by local ancestry analysis. The use of five principal components was extensively evaluated to verify that they robustly capture ancestry structure within Europe. The association test p value was computed using a likelihood ratio test. Results for the X chromosome were computed similarly, with male genotypes coded as if they were homozygous diploid for the observed allele. In addition to principal component adjustment, test statistics were corrected by genomic control for residual population stratification; the genomic control inflation factor was 1.016 (computed from the median p value of results that passed quality control). Regions of interest were defined by identifying SNPs with P < 10⁻⁵, grouping these into intervals separated by gaps of at least 250 kb, and choosing the SNP with the smallest p value within each interval.
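The genomic control correction and the region-of-interest rule described above can be illustrated with a minimal, standard-library Python sketch. It assumes two-sided p values from 1-df tests and SNP records given as hypothetical `(snp_id, chromosome, position, p_value)` tuples; the function names are illustrative and this is not 23andMe's pipeline. λ is estimated as the median 1-df χ² statistic divided by its expected median under the null (about 0.455), and each statistic is divided by λ. Here λ is applied as-is even if it falls below 1, which is an assumption.

```python
import math
from statistics import NormalDist, median

# Expected median of a chi-square distribution with 1 df (about 0.4549).
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def p_to_chi2(p: float) -> float:
    """Convert a two-sided p value into a 1-df chi-square statistic."""
    z = -NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile keeps precision for tiny p
    return z * z


def chi2_to_p(chi2: float) -> float:
    """Upper-tail p value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def genomic_control(pvalues):
    """Return (lambda_gc, GC-adjusted p values) for a list of p values."""
    stats = [p_to_chi2(p) for p in pvalues]
    lam = median(stats) / CHI2_1_MEDIAN
    return lam, [chi2_to_p(s / lam) for s in stats]


def define_regions(snps, p_threshold=1e-5, gap_bp=250_000):
    """Group SNPs with P < p_threshold into intervals separated by gaps of at
    least gap_bp and return the smallest-P (index) SNP of each interval."""
    hits = sorted((s for s in snps if s[3] < p_threshold), key=lambda s: (s[1], s[2]))
    index_snps, current = [], []
    for snp in hits:
        new_chrom = current and snp[1] != current[-1][1]
        big_gap = current and snp[2] - current[-1][2] >= gap_bp
        if new_chrom or big_gap:
            index_snps.append(min(current, key=lambda s: s[3]))
            current = []
        current.append(snp)
    if current:
        index_snps.append(min(current, key=lambda s: s[3]))
    return index_snps
```

Because λ is a ratio of medians, a genome-wide set of null p values yields λ close to 1, and a value such as 1.016 indicates only slight residual inflation.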

Sensitivity and specificity of stage 1 self-reported data
To assess the validity of the self-reported phenotypic data in stage 1, the 23andMe surveys (pertaining to skin cancer history and pigmentation) were randomly administered to 188 patients seen in Stanford outpatient clinics. Survey answers were then compared with medical records to determine the sensitivity and specificity of self-reported melanoma diagnosis. P values were determined with Fisher's exact test because of the presence of low-frequency events. This sub-study was approved by the Stanford University Institutional Review Board with a waiver of documentation of informed consent.
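The validation compares survey answers with medical records in a 2×2 table. The minimal Python sketch below shows how sensitivity, specificity, and a two-sided Fisher's exact p value can be computed from such a table. It assumes a layout of [[TP, FN], [FP, TN]] with medical records as the reference. The counts used in the tests are hypothetical and are not the study's Supplementary Table 1. The two-sided p value sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb


def sens_spec(tp: int, fn: int, fp: int, tn: int):
    """Sensitivity and specificity of self-report against medical records."""
    return tp / (tp + fn), tn / (tn + fp)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with the observed table are kept.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

For example, `sens_spec(10, 0, 2, 88)` gives a sensitivity of 1.0 and a specificity of 88/90 for a hypothetical table. An exact test is preferred over a χ² test here because some cells (such as false negatives) have very small expected counts.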

Stage 2 study design and population
The study participants were drawn from a hospital-based case-control study of melanoma in which non-Hispanic white cases and controls were recruited at MD Anderson between March 1998 and August 2008. Samples and data were available from 931 melanoma patients and 1,026 cancer-free controls (friends of other patients attending the clinics), who were frequency-matched on age and sex, had completed a comprehensive skin lifestyle questionnaire, and had passed quality control filters for genotyping. This questionnaire was administered by an interviewer to 70% of patients and controls and was self-administered by the remaining 30%. An additional case series of 873 individuals presenting for melanoma treatment at MD Anderson was also included, bringing the total number of melanoma patients to 1,804. The study protocols were approved by the Institutional Review Board at MD Anderson, and informed consent was obtained from all participants.

Stage 2 genome-wide association analysis
Association with melanoma risk was tested for genotyped SNPs, or for the most likely genotypes from the imputation study, using the PLINK --logistic and --covar options. A logistic regression model was fitted to measure the additive effect of each SNP on melanoma susceptibility. Under the null hypothesis, the likelihood ratio statistic was compared with a χ² distribution with one degree of freedom. The first two principal components (PCs) were included as covariates to adjust for population structure [19].
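The additive logistic model and its likelihood ratio test can be sketched from scratch with NumPy. This is an illustrative Newton-Raphson (IRLS) fit, not PLINK's implementation. It assumes hard-called genotypes coded 0/1/2 (minor-allele count) and a numeric covariate matrix, for example age, sex, and PCs. The function names are hypothetical. The LRT compares the covariate-only model with the model that adds the genotype term, and 2·Δlog-likelihood is referred to a 1-df χ² distribution.

```python
import math

import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 50, tol: float = 1e-10):
    """Fit logistic regression by Newton-Raphson; return (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))              # fitted probabilities
        w = mu * (1.0 - mu)                          # IRLS weights
        grad = X.T @ (y - mu)                        # score vector
        hess = X.T @ (X * w[:, None])                # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    loglik = float(np.sum(y * eta - np.logaddexp(0.0, eta)))  # stable log(1 + e^eta)
    return beta, loglik


def additive_lrt(genotype: np.ndarray, covariates: np.ndarray, y: np.ndarray):
    """Return (odds ratio per allele, LRT statistic, 1-df p value) for one SNP."""
    n = len(y)
    base = np.column_stack([np.ones(n), covariates])  # intercept + covariates
    full = np.column_stack([base, genotype])          # + additive genotype term
    _, ll0 = fit_logistic(base, y)
    beta, ll1 = fit_logistic(full, y)
    stat = 2.0 * (ll1 - ll0)
    p = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))    # chi-square upper tail, 1 df
    return math.exp(beta[-1]), stat, p
```

In stage 1 the covariates would correspond to age, sex, and five PCs. In stage 2 they would correspond to the first two PCs. The exponentiated genotype coefficient is the per-allele odds ratio reported in the tables.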

Meta-analysis
For each SNP, associations in stage 1 and stage 2 were combined in an inverse-variance-weighted meta-analysis using the METAL software [20]. Imputation quality was compared across 23andMe genotyping-array batches to flag variants that behaved differently across arrays. Heterogeneity of per-SNP effect sizes among the studies contributing to stage 1, stage 2, and the overall meta-analysis was assessed. All r² and D′ values between individual SNPs were calculated from the 1000 Genomes Pilot 1 dataset, CEU population (http://www.broadinstitute.org/mpg/snap/ldsearchpw.php).
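The inverse-variance-weighted combination and the heterogeneity statistics mentioned above can be sketched in a few lines of Python. This is an illustration of the fixed-effect method, not METAL's code. It assumes per-study log odds ratios with their standard errors, and the function names are hypothetical. Each study is weighted by 1/SE². Cochran's Q measures the dispersion of the study estimates around the pooled estimate, and I² = max(0, (Q − df)/Q).

```python
import math
from statistics import NormalDist


def ivw_meta(log_ors, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study log odds ratios.

    Returns (pooled OR, pooled SE of log OR, two-sided p, Cochran's Q, I^2).
    """
    weights = [1.0 / se ** 2 for se in ses]
    wsum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, log_ors)) / wsum
    se = math.sqrt(1.0 / wsum)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))   # two-sided normal p value
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return math.exp(beta), se, p, q, i2


def se_from_ci(or_low: float, or_high: float, level: float = 0.95) -> float:
    """Recover the SE of log(OR) from a symmetric (log-scale) confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (math.log(or_high) - math.log(or_low)) / (2.0 * z)
```

As a consistency check on the published numbers, `se_from_ci(1.54, 2.48)` gives an SE of about 0.12 for rs187843643. Together with OR = 1.96, this implies a P value of the same order as the reported 3.53 × 10⁻⁸.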

Proportion of familial relative risk
We used the formula for the proportion of familial relative risk (FRR) explained, as outlined by the Collaborative Oncological Gene-environment Study (COGS) [5]. The odds ratios derived from our meta-analysis of stage 1 and stage 2 are assumed to approximate relative risks. We estimated the FRR explained by each SNP (FRR_snp) as:

$$\mathrm{FRR}_{snp} = \frac{p r^{2} + q}{(p r + q)^{2}} \qquad (1)$$

Here, p and q are the frequencies of the risk allele and the alternative allele, respectively, and r is the odds ratio for the risk allele. Allele frequencies are derived from the stage 1 population data. Assuming that the loci combine multiplicatively and are not in linkage disequilibrium, the combined effect of all loci is given by:

$$\lambda_{T} = \prod_{i} \mathrm{FRR}_{snp,i} \qquad (2)$$

Here, the product is across all loci. The proportion of the familial relative risk attributable to the SNPs, on a log scale, is then given by $\log(\lambda_{T}) / \log(\lambda_{P})$, where $\lambda_{P}$ is the familial relative risk observed in epidemiological studies; we assumed $\lambda_{P} = 2.19$ for melanoma [14].
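The per-SNP FRR formula, its multiplicative combination across loci, and the log-scale proportion described above can be written as a short Python sketch. It assumes biallelic SNPs (so q = 1 − p), no linkage disequilibrium between loci, and λ_P = 2.19 as stated in the text. The function names and the example inputs in the tests are hypothetical and are not allele frequencies or odds ratios from this study.

```python
import math


def frr_snp(p: float, r: float) -> float:
    """FRR explained by one SNP: (p r^2 + q) / (p r + q)^2, with q = 1 - p."""
    q = 1.0 - p
    return (p * r * r + q) / (p * r + q) ** 2


def proportion_frr_explained(loci, lambda_p: float = 2.19) -> float:
    """log(lambda_T) / log(lambda_P), where lambda_T multiplies FRR_snp over loci.

    loci: iterable of (risk_allele_frequency, odds_ratio) pairs.
    """
    lambda_t = math.prod(frr_snp(p, r) for p, r in loci)
    return math.log(lambda_t) / math.log(lambda_p)
```

An odds ratio of 1 gives FRR_snp = 1 and therefore contributes nothing. Because the loci are combined multiplicatively, each locus adds its own log(FRR_snp) to the numerator.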

Gene expression analysis
Processed gene expression data for melanoma and nevi (GSE3189) were obtained from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/). Forty-five melanoma samples and 18 benign nevi (controls) were included [12]. Genes of interest were selected by their proximity to the novel risk locus. Geo2R, which uses a linear model for microarray analysis, was used to compare gene expression between melanoma and benign nevus controls [21]. Differential gene expression (in melanoma tissue relative to controls) was considered significant at P < 0.05.
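Geo2R performs its comparisons with the limma R package. The moderated t-statistic used there shrinks each gene's residual variance toward a prior value shared across genes. The Python sketch below shows that statistic for one gene and a two-group comparison (melanoma vs. nevi), assuming log2-scale expression values. In limma, the prior degrees of freedom d0 and prior variance s0² are estimated from all genes by empirical Bayes. Here they are hypothetical inputs supplied by hand, the function name is illustrative, and no P value is computed.

```python
import math
from statistics import mean, variance


def moderated_t(group1, group2, d0: float, s0_sq: float):
    """Two-group moderated t for a single gene (log2 expression values).

    Returns (log2 fold change, linear fold change, moderated t, total df).
    d0 and s0_sq are prior df and prior variance, supplied by hand here.
    """
    n1, n2 = len(group1), len(group2)
    d = n1 + n2 - 2                                      # residual degrees of freedom
    s_sq = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / d
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)      # shrunken variance
    log2_fc = mean(group1) - mean(group2)
    t = log2_fc / math.sqrt(s_tilde_sq * (1.0 / n1 + 1.0 / n2))
    return log2_fc, 2.0 ** log2_fc, t, d0 + d
```

With d0 = 0, the statistic reduces to the ordinary pooled two-sample t. Larger d0 pulls the gene's variance toward s0², which stabilizes results for small groups such as the 18 nevi.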