Systematic meta-analyses of gene-specific genetic association studies in prostate cancer

In the past twenty-five years, over 700 case-control association studies on the risk of prostate cancer have been published worldwide, but their results were largely inconsistent. To facilitate following and explaining these findings, we performed a systematic meta-analysis using allelic contrasts for gene-specific SNVs from at least three independent population-based case-control studies, which were published in the field of prostate cancer between August 1, 1990 and August 1, 2015. Across 66 meta-analyses, a total of 20 genetic variants involving 584,100 subjects in 19 different genes (KLK3, IGFBP3, ESR1, SOD2, CAT, CYP1B1, VDR, RFX6, HNF1B, SRD5A2, FGFR4, LEP, HOXB13, FAS, FOXP4, SLC22A3, LMTK2, EHBP1 and MSMB) exhibited significant association with prostate cancer. The average summary OR was 1.33 (ranging from: 1.016–3.788) for risk alleles and 0.838 (ranging from: 0.757–0.896) for protective alleles. Of these positive variants, FOXP4 rs1983891, LMTK2 rs6465657 and RFX6 rs339331 had not been previously meta-analyzed. Further analyses with sufficient power design and investigations of the potential biological roles of these genetic variants in prostate cancer should be conducted.


INTRODUCTION
Prostate cancer (PCa) is the second most common type of solid cancer and the sixth leading cause of cancer death in men worldwide [1]. In the United States in 2015, there were an estimated 220,800 new cases of PCa and 27,540 deaths from this disease (American Cancer Society: Cancer Facts and Figures 2015. Atlanta, GA: American Cancer Society, 2015). Prostate cancer has a clear familial clustering feature [2][3][4].The risk of developing PCa rises two to five fold with increasing numbers of affected relatives [5]. Although data from twin studies showed that the genetic contribution to PCa risk is approximately 40% [6,7], only a few mutations in genes such as BRCA1, BRCA2, and HOXB13 definitely account for a small proportion of cases of hereditary prostate cancer [8][9][10]. Based on a large number of family studies, linkage analyses have identified approximately twenty chromosomal regions harboring potential PCa susceptible loci. However, only a few overlap across studies suggests that PCa exhibits locus heterogeneity [11][12][13]. Populationbased case-control association studies using single nucleotide variants (SNVs) powerfully fine mapped susceptible variants for common diseases [14,15]. To fine map and identify risk loci of PCa in the population, over 700 candidate gene association studies and 19 GWAS have been published in the past 25 years (between 1990 and 2015). However, the statistical association results claimed by these studies are not consistent. Meta-analysis is a helpful procedure to increase statistical power and to estimate whether the genetic association of a disease is true or not through combining data from the initial reports and subsequent replication studies [16].
To better assess and explain the association of genetic variants with PCa, we have collected and catalogued all population-based case-control genetic association studies, which were published in the field of PCa between August 1, 1990 and August 1, 2015.

Meta-analyses
Among the 217 SNVs, 66 variants in 51 different genes with sufficient data were finally included for our meta-analysis (Supplementary Table S1 and Supplementary Figure S1). Combining studies of all ethnic populations, we calculated the summary OR and 95% c.i. values across the 66 included variants using an allelic contrast model. Of these 66 SNVs, 20 in 19 different genes (KLK3, IGFBP3, ESR1, SOD2, CAT, CYP1B1, VDR, RFX6, HNF1B, SRD5A2, FGFR4, LEP, HOXB13, FAS, FOXP4, SLC22A3, LMTK2, EHBP1 and MSMB) had significant summary ORs (referred to as "positive" association) (Tables 1 and 2; Supplementary  Figures S2 and S3). These results suggested an increased or a decreased risk for prostate cancer. The total of sample size involved in these positive meta-analyses was 584,100. The average sample size combining cases with controls across these 20 positive meta-analyses was 29,205 subjects (ranging from 2,581 to 84,391). On average, ~8 independent studies were included for each SNV (ranging from 3 to 18) (Tables  1 and 2). Fourteen of twenty positive SNVs (summary ORs >1, ranging from 1.039-3.788) increased the risk for prostate cancer by an average of 1.34-fold. Six SNVs (in VDR, FAS, KLK3, RFX6 and HNF1B) with an average protective summary OR of 0.838 (ranging from 0.757-0.896) decreased the risk for PCa by approximately 14%. Apart from 17 SNVs that have been previously meta-analyzed using published data (Tables 1 and 2), 3 additional positive associations were identified by this studies (including FOXP4 rs1983891, LMTK2 rs6465657 and RFX6 rs339331).
After the initial publications were removed, twelve (60%) of these twenty positive meta-analyses showed notable changes in statistical significance. The P values of eleven variants clearly decreased, and three positive variants (rs1800682 in FAS, rs9364554 in SLC22A3 and rs6465657 in LMTK2) became insignificant (P >0.05) (Tables 1 and 2), due to a substantial decrease in sample size (range: 34-56%). By contrast, one variant rs1544410 in VDR became more significant (P value changed from 0.011 to 0.007). Of twenty positive SNVs, eleven variants showed study homogeneity (P >0.1 in Q statistic) in the meta-analyses combining all ethnic groups, and their summary ORs and 95% c.i. values were calculated using the fixed-effect model (Table 1). Nine other positive associations showed large between-study heterogeneity (P< 0.1 in Q statistic), and the summary ORs of them were calculated using the random-effect model ( Table 2).
Forty-six SNVs in thirty-five different genes did not show significant summary ORs (referred to as negative SNVs) utilizing random-effect models, after all published population-based case-control studies were allelic-specifically meta-analyzed in all ethnic groups (Supplementary Table S2). The combined sample sizes of these negative meta-analyses ranged from 1,070-32,738 (mean of 9,096 individuals), and the average case-control studies were 15 (range from 3 to 33). Thirty-eight of these 46 variants were close to statistical significance. The average of OR was 1.05 (ranging from 0.921-1.249).
Eight of the twenty positive variants in the control subjects showed deviation from Hardy-Weinberg equilibrium (HWE). While four variants (rs9282858 in SRD5A2, rs1001179 in CAT, rs1056836 in CYP1B1 and rs1544410 in VDR) became insignificant after the exclusion of HWE-deviation studies (Tables 1 and 2). For positive variants that still showed between-study heterogeneity after excluding the HWE-deviation studies, we then stepwise removed outlier studies until homogeneity was obtained. Eight (~90%) of nine variants with heterogeneity were corrected (Heterogeneity P-value >0.1), except RFX6 rs339331. Although the overwhelming majority of significant results only showed slight changes after the process, one positive variant (rs9340799 in ESR1) lost significant effect size (Supplementary Table S4).
Apart from two positive variants (HOXB13 rs138213197 and LEP rs2167270) for which all published case-control studies were of Caucasian-ancestry, we stratified other eighteen positive variants into racial subgroups and re-meta-analyzed summary ORs and 95% c.i. values. Two variants (in EHBP1 and HNF1B) consistently showed significant association (P values: 0.036 to 0.000) with PCa across Asian-ancestry, Caucasianancestry and African-ancestry; Five variants (RFX6 rs339331, FOXP4 rs1983891, CYP1B1 rs1056836, MSMB rs10993994 and FGFR4 rs351855) showed positive results in both Asian-ancestry and Caucasian-ancestry; Six variants in KLK3, SRD5A2, SOD2, CAT, SLC22A3 and LMTK2 only maintained statistical significance in Caucasian-ancestry; Note that 'All ethnicities' represented that all studies across all ethnic groups were meta-analyzed; 'all excl. initial study' represented that the first publication reporting on statistic association for any variant was excluded; 'all excl. HWE study' represented that the studies deviating from HWE were excluded. a, the summary OR and 95% c.i. values. b, Q statistic across crude ORs was calculated for each independent study. c, P < 0.1 was usually considered as a significant evidence for between-study heterogeneity.
Two variants in the VDR gene showed positive results only in Asian ancestry; While the variant rs9340799 in ESR1 was significantly associated with PCa only in African-ancestry patients. Of note, no positive results were seen in IGFBP3 rs2854744 and in FAS rs1800682 in all ethnic subgroups ( Table 3, Figure 1 and Supplementary Figure S5). Fifteen of forty-six negative variants (calculation using random-effects model combining all subjects in all ethnic populations) became positive results (P value range: 0.04-0.000) in one or two ethnic groups, when re-analyzed them based on different ethnic ancestries. Of these fifteen positive variants, seven (~47%) were seen only in Asian-ancestry population. Except for one protective variant (MTHFR rs1801133) for prostate cancer (OR = 0.684, 95% c.i.=0.565-0.828), six of the seven variant (in MGMT, XRCC1, OGG1, ERCC2 and CASC8) showed risks for prostate cancer. The average risk OR was 1.417 (range: 1.148-1.911), and the mean sample size combining case-control individuals in all Asian-ancestry studies was 2091 (range: 407-3430). Six of the fifteen significant variants were observed in African and Caucasian ancestries, respectively (that is, RNASEL rs627928, CDH1 rs16260 and PTGS2 rs2745557 in African-ancestry; MPO rs2333227, MDM2 rs2279744 and THADA rs1465618 in Caucasian-ancestry). Among the fifteen variants, only one (rs10486567 in JAZF1) shared association with prostate cancer in 5,747 African (OR=0.855, 95% c.i.=0.787-0.929) and in 19,461   Figure S6). We estimated the potential publication bias of all twenty positive variants in combining all sample sizes across all ethnic groups using Egger's linear regression test. The results suggested that five (SOD2 rs4880, ESR1 rs9340799, VDR rs1544410, FOXP4 rs1983891 and EHBP1 rs721048) showed evidence of significant publication bias (P value from 0.048 to 0.01) (Supplementary Figure S7). The results produced by trim and fill algorithm showed that sixteen of the twenty positive variants had a adjusted effect size. The imputed missing studies ranged from 1 to 5 (Supplementary Figure S7).
The statistical power for detection of these twenty significant summary ORs (meta-analyzed in all ethnic groups) ranged from 0.28 to 1, and the average power was 0.77. The power of nine meta-analyses with OR<1.2 was below 80%. The genetic power of five variants (VDR rs731236 and KLK3 rs2735839 with OR <0.8; SRD5A2 rs9282858, HOXB13 rs138213197 and MSMB rs10993994 with OR > 1.2) approached 80% (Supplementary Table S5).
To estimate the sufficiency and stability of the positive meta-analyses, we performed one-studyremoved tests and cumulative meta-analyses (see methods). The results from the one-study-removed test revealed that most of positive meta-analyses (85%) had a slight change in the summary ORs and 95% c.i. values in one direction, but the summary ORs of three variants (LEP rs2167270, LMTK2 rs6465657 and FAS rs1800682) became negative when the studies with a large sample size were removed (Supplementary Figure S8). Cumulatively meta-analyzing all twenty positive variants, we found that all meta-analyses achieved consistent and significant effect sizes when an average of two-thirds of the independent studies were accumulated (mean of cumulative OR = 0.83 for protective variants, range:0.72-0.91; mean of cumulative OR=1.29 for risk variants, range:1.04-3.03) (Supplementary Figure S9 and Supplementary  Table S6). Four variants (ESR1 rs9340799, CYP1B1 rs1056836, SLC22A3 rs9364554 and LMTK2 rs6465657) reached significant cumulative ORs until their all samples were included.

DISCUSSION
As of 1 August, 2015, we collected and extracted data from 728 publications reporting on 217 genetic SNVs in 136 different genes. Based on our inclusion criteria,we systematically assessed the summary ORs of 66 SNVs across 51 different genes using meta-analysis. 35 meta-analyzed SNVs in 32 genes showed statistical significance. 20 of these positive results were derived from the meta-analyses combining all ethnic groups. The other 15 significant SNVs resulted from the analysis procedure in different ethnic ancestries. The average allelic risk showing statistical significance in both the initial meta-analyses combining all ethnic ancestries and in the analyses based on classified racial groups were indicated in dark blue text. The positive variants that only showed statistic significance in the analyses based on classified racial groups were indicated in red. summary OR was 1.338, and the average protective summary OR was 0.791. 70% (46/66) of SNVs in 371 independent studies involving 418,393 subjects showed no significant association with PCa when combining all samples across all ancestral groups. As we detected the sample size using the medium genotype relative risk of 1.16 that was observed in the present study, α defined as 0.05 and a disease prevalence of 0.1%, the sample size of case subjects ranged from ~3,500 to 14,500 with minor allelic frequencies between 0.1 and 0.5, when a statistical power approached 80% (estimated using genetic power calculator, see method). According to this estimation, 29 and 44 of 46 negative results could not meet these sample size requirements, respectively. Therefore, the small effect sizes across the current negative meta-analyses might account for the insufficient power.
Meta-analysis is a powerful way not only to measure heterogeneity of associations across independent studies, but also to quantify the extent of between-study heterogeneity [17]. In total, 43 (~65%) of 66 current meta-analyses showed evidence of significant heterogeneity (P<0.1 in Q statistic) across 560 published studies (range of heterogeneity P-value: 0.093-0.000). Of them, nine were positive associations and thirty-four were negative associations.
The initial positive study (the first association study) often exhibits an underlying statistical inflation [18]. In our current analyses, we observed the same phenomenon. After excluding the initial study from each of the twenty positive meta-analyses, we found that over half (11/20) of positive meta-analyses showed a reduction in statistical significance. The P value decreased, on average, by 9.6-fold (range: 1.429 to 44.214) (Tables 1  and 2, Supplementary Table S2). Four meta-analyses with weak significance became negative results, supported by the evidence from cumulative meta-analysis plots (Supplementary Figure S9). These results suggested that the actual genetic effect of these positive association might be overestimated by the "winner's curse" bias [19].
After removing 37 HWE-deviated studies (including 9 positive associations and 26 negative associations), the heterogeneity values of only nine negative analyses were corrected (heterogeneity P-value >0.1). We further excluded the outlier studies from nine positive metaanalyses with significant between-study heterogeneity until reaching homogeneity. Eight of nine could be reduced to display no between-study heterogeneity. We also stratified all studies into the same racial groups and re-performed the meta-analyses. Thirty-three of sixtysix meta-analyses showed positive associations in one or more racial groups. Fifteen negative SNVs identified in the initial meta-analyses became positive (Supplementary Table S3). Eight of forty-three meta-analyses with significant heterogeneity across all ethnic ancestries showed no between-study heterogeneity in ethnic subgroups (Table 3 and Supplementary Table S3). Notably, thirty-six of sixty-six meta-analyses existed significant between-study heterogeneity (heterogeneity P-value <0.1) within a single-ancestry population, suggesting that an evidence of bias for positive results might partly be attributed to undetected population stratification in Asian-, African-and Caucasian-ancestry.Finally, ten initially "negative variants" (in CASC8, CYP1A1, CYP3A4, GPX1, IL18, JAZF1, MDM2, MPO, THADA and TP53) under the random-effects model became significant when analyzed using the fixed-effects model (Supplementary Table S7). However, Cochran's Q test suggested that all ten variants have a large between-study heterogeneity (the average P value=0.021).
Publication bias tends to occur when small studies with insignificant results are not published. We observed the presence of significant publication bias in five of the current 20 positive meta-analyses using Egger's regression. The bias was further shown by the trim and fill procedure (an average of imputed missing studies was three) (Supplementary Figure S7). These bias evidence suggested that small studies led to larger effects than larger studies for these five positive variants (FOXP4 rs1983891, EHBP1 rs721048, SOD2 rs4880, ESR1 rs9340799 and VDR rs1544410). Although the trim and fill analyses also showed that thirteen positive variants need to impute average missing studies of 2, only small shift of adjusted ORs was observed (shift range: 0.1%-0.7%) (Supplementary Figure S7). Statistical methods used to test publication bias are generally underpowered [20,21], thus the conclusion of the genetic association of these variants with PCa should be treated cautiously.
Of the twenty positive associations identified here, three significant SNVs (two in FOXP4 and LMTK2 with risk effects and one in RFX6 with protective effects), to our current knowledge, were not reported in previous publications. The vast majority of differences between the results of previously published meta-analyses and the results reported here arose from the different inclusionand exclusion criteria. For example, five previous metaanalyses included family-based publications. In addition, one difference likely arose from miscalculation in one previous meta-analysis (Supplementary Table S8).
In this meta-analyses, five protein-coding variants (SRD5A2 p.Ala49Thr, FGFR4 p.Gly388Arg, HOXB13 p.Gly84Glu, CYP1B1 p.Leu432Val and SOD2 p.Val16Ala) were significantly associated with PCa risk, but the last two variants with small summary ORs (an average of 1.12) had significant between-study heterogeneity (heterogeneity P-value <0.1). The rs351855 C>T resides in exon 9 of the FGFR4 gene leads to a glycine to arginine substitution at codon 388 (p.Gly388Arg). The risk allele (T) of the variant was significantly associated with a poor cancer prognosis [22,23]. The rs138213197 C>T in HOXB13, yielding a rare G84E missense variant, significantly increases the risk of prostate cancer, and the biological function of this 84E remains unclear. www.impactjournals.com/oncotarget In summary, we have performed a comprehensive estimation of the genetic association between populationbased gene-specific SNVs and PCa using currently available data. Our meta-analyses yielded twenty significant associations, but the bulk of their genetic effects were small or modest. Thus, the interpretation of these positive results should be cautious. Further analyses with sufficient power and investigations of the potential biological roles of these genetic variants are needed.

Inclusion criteria
To be considered for inclusion in this meta-analysis, a study must be satisfy four criteria. (i) It must be an association study between a SNV and PCa. Note that in this study, as we exclusively focused on population-based case-control designs to investigate the genetic risk loci of disease, studies based on family designs or quantitative trait analyses were not included in any of the statistical analyses. Studies on microsatellite variants or on genetic markers with complex allelic architecture (for example, SNVs with more than two alleles for which it is difficult to obtain a consistent distribution of alleles and genotypes across different studies) were also not included. As SNVs occurring in known gene sequences can exert severe effects on biological functions, we thus exclusively metaanalyzed gene-specific SNVs here. (ii) The study must be a research published in a peer reviewed journal. (iii) Only studies published in English were included. Because non-English-publishing genetic association papers in PubMed and EMBASE represent 5.6% of all population-based case-control association studies of PCa, the exclusion of these papers is not expected to produce a drastic effect on any of the overall conclusions. (iv) The SNVs were reported on at least three independent case-control studies. For studies with underlying duplicate publications, we contacted the corresponding authors by e-mail to ask for clarification. Those publications that could not be clarified by authors after twice e-mails were considered as duplication publications. We only included the initial publication in our study.
We first performed a search for all publications included in the PubMed database using the keywords "prostate cancer AND (polymorphism OR association OR variation OR variant OR risk OR susceptible OR susceptibility OR sequencing OR case-control OR gene)" between August 1, 1990 and August 1, 2015. A total of 41,955 articles were obtained. The titles and/or abstracts of these papers were scanned for the inclusion criteria above. This step yielded 560 studies containing 66 SNVs in 51 different genes that were eligible for the inclusion criteria in this study. Next, to test the completeness of our search strategies, we searched the EMBASE and Cochrane database using 7 random chosen SNVs or 7 matched genes combined with the keyword "prostate cancer", respectively.

Data extraction
First, full-text versions of all papers used for metaanalyses were obtained. Next, the first author, year of publication, ethnic group, genotype or allele data and PubMed ID were extracted from each paper. In this study, ethnic groups encompassed four general populations (African, Asian, Caucasian and Mixed population). All 66 SNVs for subsequent meta-analyses were represented using dbSNP identifiers ("rs" numbers). For the papers in which genotype or allele distributions were not provided, we requested genotype information by directly e-mailing the first or corresponding authors. For SNVs for which we could not obtain the genotype data after two e-mail, genotype distributions were deduced from their allelic data. All information extracted from the publications was summarized in Supplementary Table S1.

Statistical analyses
SNVs with genotype and/or allele data in three or more independent studies were included for metaanalyses. We performed meta-analyses with i) all studies including general populations (all ethnic groups); ii) studies excluding the initial publication (because the initial publication often shows inflated evidence of association); iii) studies after the exclusion of samples deviating from HWE; iv) studies sorted by subgroups which have same ancestries. We calculated the crude ORs and 95% c.i. values using allele contrasts. Summary ORs and 95% c.i. values were calculated using the Mantel-Haenszel fixed-effects model [24] and the DerSimonian and Laird random-effects model [25]. Statistical heterogeneity across studies was assessed using Cochran's Q test, for which a P value< 0.1 showed the presence of significant heterogeneity. For meta-analyses that still showed evidence of between-study heterogeneity (Q statistic, P value< 0.1) after excluding deviations from HWE, we performed a heterogeneity correction by the stepwise removal of outlier studies until homogeneity was achieved. We performed one-study-removed analyses to show the effect of each study on the summary OR. In the one-studyremoved analysis, a single study was removed from the meta-analysis and the summary ORs and 95% c.i. values of the remaining studies were re-calculated. We estimated publication bias for all statistically significant SNVs. Egger's regression procedure was used to test the extent of publication bias, with a funnel plot to show bias. Duval and Tweedie's trim and fill procedure to impute a shift of ORs and to impute the number of missing studies when a apparent bias was to be removed [26]. We also estimated sufficiency and stability of the positive meta-analyses using meta-cumulative method [27]. In this approach,