The precision relationships between eight GWAS-identified genetic variants and breast cancer in a Chinese population

Some of the new breast cancer susceptibility loci discovered in recent genome-wide association studies (GWASs) have not been confirmed in Chinese populations. To determine whether eight novel single-nucleotide polymorphisms (SNPs) are associated with breast cancer risk in women from southeast China, we conducted a case-control study of 1,156 breast cancer patients and 1,256 healthy controls. We show for the first time that the SNPs rs12922061, rs2290203, and rs2981578 are associated with overall breast cancer risk in southeast Chinese women, with per-allele ORs of 1.209 (95%CI: 1.064-1.372), 1.176 (95%CI: 1.048-1.320), and 0.852 (95%CI: 0.759-0.956), respectively. Rs12922061 and rs2290203 remained significant after Bonferroni correction (P value threshold: 0.00625). In stratified analysis, another three SNPs were significantly associated with breast cancer risk within different subgroups. However, after Bonferroni correction (P value threshold: 0.000446), no statistically significant association was observed. In gene-environment interaction analysis, we observed that gene-environment interactions may play a role in the risk of breast cancer. These findings provide new insight into the associations between genetic susceptibility and the fine classifications of breast cancer. Based on these results, we encourage further large-scale studies and functional research to confirm these findings.


INTRODUCTION
Breast cancer is the most commonly diagnosed cancer and the most common cause of cancer death among females worldwide [1]. In China, the incidence of and mortality from breast cancer are rising rapidly [2]. To effectively reduce the incidence and mortality, the etiology of breast cancer must be determined. Genetic predisposition is an important factor associated with breast cancer risk. Linkage and family-based studies have identified numerous predisposition factors for breast cancer, including BRCA1, BRCA2, TP53 and PTEN, which are known as high-penetrance breast cancer susceptibility genes [3][4][5][6]. However, only about 5% of the sporadic breast cancer risk and less than 25% of the familial risk can be explained by these high-penetrance susceptibility genes because of their low mutation rates [7]. Meanwhile, GWASs have discovered more than 90 independent low-penetrance susceptibility loci that are associated with breast cancer risk [8]. Unlike high-penetrance susceptibility genes, these low-penetrance susceptibility loci account for a substantial portion of the sporadic breast cancer risk and approximately 16% of the familial risk [8]. Most of the susceptibility loci were discovered in women from European populations [9]. Because linkage disequilibrium (LD) patterns differ among populations, it still needs to be determined whether these SNPs have strong statistical associations with the risk of breast cancer in other populations. In addition, unlike high-penetrance breast cancer susceptibility genes, each low-penetrance susceptibility locus has only a weak association with the risk of breast cancer and a small effect on that risk. However, the cumulative effect of multiple alleles may increase the contribution of low-penetrance susceptibility loci to the risk of breast cancer.
Breast cancer is a complex disease that results from both genetic factors and traditional risk factors. A number of traditional risk factors have been reported to be associated with breast cancer, including age, age at menarche, reproductive and menstrual history, body mass index (BMI), alcohol intake, smoking, physical activity, benign breast diseases, oral contraceptives, and hormone therapy [10][11][12][13]. Although many traditional risk factors have been incorporated into risk prediction models for breast cancer [14], little is known about how common susceptibility loci interact with traditional risk factors for breast cancer. The study conducted by the Breast Cancer Association Consortium (BCAC) in 2010 failed to validate the effects of common susceptibility loci on the associations of traditional risk factors with breast cancer [15]. Similarly, a comprehensive study performed by the Breast and Prostate Cancer Cohort Consortium was also unable to validate the effects [16]. However, a study by Nickels et al. including 24 studies from BCAC has provided strong evidence for the important role of gene-environment interactions in the risk of breast cancer [17]. Moreover, in recent years, exposure data have been incorporated into GWASs, and it is imperative to evaluate gene-environment interactions for breast cancer with the goal of better determining breast cancer susceptibility.
In this study, we conducted a case-control study of 1,156 breast cancer patients and 1,256 healthy controls from a southeast Chinese population to investigate the associations between eight novel GWAS-identified independent genetic susceptibility loci and the risk of breast cancer [18][19][20][21][22]. In addition, we performed stratified analysis, including subgroups defined by breast cancer subtype, to gain more understanding of these variants in breast cancer etiology. Moreover, we evaluated the combined effects of SNPs. Furthermore, a gene-environment interaction analysis was conducted to explore the role of gene-environment interactions in the risk of breast cancer.

RESULTS
A total of 1,156 breast cancer patients and 1,256 healthy controls were selected for this study, and their characteristics are summarized in Table 1. The age of the breast cancer patients (46.7±10.4 years) was appropriately matched with the age of the controls (47.4±10.8 years). Compared with healthy controls, the breast cancer patients were more likely to have a lower education level, a lower mean BMI, fewer live births, a shorter period of breast feeding, an earlier age at menarche, a higher incidence of natural premenopausal status, a higher incidence of prior hormone replacement therapy, a higher incidence of previous benign breast disease diagnosis, and a greater frequency of breast cancer family history (P≤0.05). There were no statistical differences in the other risk factors between patients and controls (P>0.05). With regard to the ER/PR status of the breast cancer patients, there were 778 (67.3%) ER positive cases and 709 (61.3%) PR positive cases included in this study. Table 2 shows the allele and genotype distribution of the eight SNPs in breast cancer patients and controls and their association with overall breast cancer risk. Among the control group, the genotype distributions of the eight SNPs are in Hardy-Weinberg equilibrium (P>0.05). After adjusting for age, age at menarche, and family history of breast cancer, we found that three of the eight SNPs are significantly associated with overall breast cancer risk. The SNP rs12922061 is most strongly associated with breast cancer risk. The per-allele OR is 1.209 [18,19]. We found that the minor and major alleles of our study and of previous reports are switched, indicating that the difference of the associations is due to different minor alleles among different ethnicities. Additionally, environmental risk factors and breast cancer subtypes must also be considered as possible causes of the difference. No significant association was observed for rs2296067, rs4951011, and rs9693444. However, the statistical power for the five negative loci is <70%. Therefore, some of the null findings may be false negatives.
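As a rough illustration of how a per-allele OR and its 95%CI can be obtained, the following Python sketch computes an unadjusted allelic OR with a Woolf (log-scale) confidence interval from a 2×2 table of allele counts. It is not the adjusted regression model used in this study, the function name is our own, and the allele counts are hypothetical placeholders rather than data from Table 2.

```python
import math

def allelic_odds_ratio(case_risk, case_other, control_risk, control_other,
                       z=1.959964):
    """Unadjusted allelic OR with a Woolf (log-scale) 95% CI.

    Arguments are allele counts (two alleles per genotyped person).
    All counts must be positive, otherwise the log or division fails.
    """
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of ln(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / case_risk + 1 / case_other
                          + 1 / control_risk + 1 / control_other)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical allele counts (NOT taken from the study).
or_, lo, hi = allelic_odds_ratio(20, 80, 10, 90)
print(f"OR={or_:.3f}, 95%CI: {lo:.3f} to {hi:.3f}")
```

An interval that includes 1 corresponds to a non-significant association at the 0.05 level under this approximation.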
The results of stratified analysis are displayed in Table 3 to Table 6. Regardless of BMI, age at menarche, and the length of the breast feeding period, rs12922061 is associated with an increased risk of breast cancer, and a significant association is observed in women of older age, higher education, premenopausal status, more years of menstruation, younger age at first live birth, more live births, without a family history of breast cancer, and with ER positive, HR positive, and Luminal or HER-2 overexpression type tumors. Subsequent heterogeneity analysis shows that there is heterogeneity between the breast cancer subtypes (P=0.031). Specifically, rs12922061 has a more significant effect in Luminal and HER-2 overexpression type cases (P=0.001 and P=0.011, respectively). Meanwhile, among women of a younger age, with lower BMI, lower education, a younger age at menarche, postmenopausal status, older age at menopause, more menstruation years, older age at first live birth, longer period of breast feeding, ER positive, HR positive, and with Luminal or HER-2 overexpression type, the association for rs2290203 is significant, regardless of the family history of breast cancer. Moreover, regardless of the family history of breast cancer, rs2981578 shows a protective effect in women of a younger age, with lower BMI, higher education, younger age at menarche, premenopausal status, fewer menstruation years, younger age at first live birth, fewer live births, shorter period of breast feeding, ER positive, HR positive, and Luminal type or HER-2 overexpression type. In the heterogeneity analysis, we observed heterogeneity between the two subgroups for age and for BMI (P=0.036 and P=0.032, respectively). Specifically, rs2981578 presents a more significant protective effect in the group with a younger age and the group with a lower BMI (P=0.002 and P=4.53×10⁻⁴, respectively).
Although the other loci are not associated with overall breast cancer risk, there are still significant associations between these loci and different subgroups. The SNP rs10474352 shows a significant protective effect in women with a younger age at menarche, those with a longer period of breast feeding, those without a family history of breast cancer, and those who are ER negative and HR negative. In addition, rs10816625 is significantly associated with breast cancer risk in women of an older age, a higher BMI, lower education, and younger age at first live birth, and in those who are ER positive and HR positive. Moreover, rs2296067, rs9693444, and rs4951011 are associated with women of an older age at first live birth, shorter period of breast feeding, and Basal-like type, respectively. There is heterogeneity between the subgroups of age at first live birth for rs2296067 (P=0.047), between the subgroups of breast cancer subtype for rs9693444 (P=0.042), and between the subgroups of breast cancer subtype for rs4951011 (P=0.042). Only rs4951011 shows a more significant association with the Basal-like type of breast cancer (P=0.005). No heterogeneity was observed in the rest of the subgroups. After Bonferroni correction (P value threshold: 0.05/8/14=0.000446), no statistically significant association was observed in stratified analysis.
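The two Bonferroni thresholds quoted in the text follow directly from dividing the nominal significance level by the number of comparisons. The short Python sketch below reproduces that arithmetic; the helper name is our own and is only illustrative.

```python
def bonferroni_threshold(alpha, *numbers_of_tests):
    """Divide the nominal alpha by the product of the test counts."""
    total_tests = 1
    for n in numbers_of_tests:
        total_tests *= n
    return alpha / total_tests

# Overall analysis: eight SNPs.
overall = bonferroni_threshold(0.05, 8)
# Stratified analysis: eight SNPs across fourteen subgroups.
stratified = bonferroni_threshold(0.05, 8, 14)
print(f"overall={overall:.5f}, stratified={stratified:.6f}")
```

Under this correction, the smallest stratified P value reported above (P=4.53×10⁻⁴) lies just above the stratified threshold, which is consistent with no subgroup association remaining significant.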
As shown in Table 7, we selected the two loci that are significantly associated with overall breast cancer risk to calculate their combined effects (rs12922061-T and rs2290203-G). With carriers of no risk allele as the reference, individuals carrying more risk alleles have a higher OR: 1-2 alleles, OR=1.238 (95%CI: 0.965 to 1.588); 3-4 alleles, OR=1.716 (95%CI: 1.268 to 2.324). This indicates that the susceptibility loci have a cumulative effect on the risk of breast cancer (P trend = 3.97×10⁻⁴).
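The grouping by number of risk alleles can be sketched as follows. This is a minimal Python illustration, assuming genotypes are stored as two-letter strings; the risk alleles (rs12922061-T and rs2290203-G) are taken from the text, whereas the alternative allele letters, the data layout, and the function names are hypothetical.

```python
# Risk alleles as stated in the text.
RISK_ALLELES = {"rs12922061": "T", "rs2290203": "G"}

def count_risk_alleles(genotypes):
    """Count risk alleles (0 to 4) across the two loci for one person."""
    return sum(genotypes[snp].count(allele)
               for snp, allele in RISK_ALLELES.items())

def risk_group(n_risk_alleles):
    """Map a risk-allele count to the categories reported in the text."""
    if n_risk_alleles == 0:
        return "0"  # reference group
    return "1-2" if n_risk_alleles <= 2 else "3-4"

# Hypothetical participant; the non-risk allele letters are placeholders.
person = {"rs12922061": "CT", "rs2290203": "GG"}
n = count_risk_alleles(person)
print(n, risk_group(n))
```

The resulting category could then be entered as a covariate in a regression model to estimate the OR of each group relative to the reference.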
In the gene-environment interaction analysis, we found interactions between rs2981578 and age, BMI, and a family history of breast cancer in reducing the risk of breast cancer (P=0.046, P=0.035, and P=0.007, respectively). Additionally, rs10816625 interacts with age at first live birth in the risk of breast cancer (P=0.017). Moreover, there is a significant interaction between rs9693444 and the length of the breast feeding period for breast cancer risk (P=0.032) (Table 3 and Table 5).

DISCUSSION
In the present study, we confirmed that three of the eight SNPs, rs12922061 on 16q12.2, rs2290203 on 15q26.1, and rs2981578 on 10q26.13, are significantly associated with overall breast cancer risk in southeast Chinese women. In addition, rs12922061 and rs2290203 even passed the threshold for Bonferroni correction.
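As a rough plausibility check on which loci pass the Bonferroni threshold, an approximate two-sided P value can be back-calculated from a reported OR and its 95%CI under a normal (Wald) approximation. The Python sketch below does this for the three reported per-allele ORs. It is only an approximation from rounded published values, not the P values of the adjusted models, and the function name is our own.

```python
import math

BONFERRONI_THRESHOLD = 0.05 / 8  # eight SNPs, as stated in the text

def wald_p_from_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Approximate two-sided P value from an OR and its 95% CI."""
    # The CI width on the log scale gives the standard error of ln(OR).
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Per-allele ORs and 95% CIs as reported in the text.
reported = {
    "rs12922061": (1.209, 1.064, 1.372),
    "rs2290203": (1.176, 1.048, 1.320),
    "rs2981578": (0.852, 0.759, 0.956),
}
for snp, (or_, lo, hi) in reported.items():
    p = wald_p_from_ci(or_, lo, hi)
    verdict = "passes" if p < BONFERRONI_THRESHOLD else "does not pass"
    print(f"{snp}: approx. P={p:.4f}, {verdict} 0.00625")
```

Under these assumptions the approximation agrees with the text: rs12922061 and rs2290203 fall below 0.00625, while rs2981578 is nominally significant but does not pass the corrected threshold.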
The SNP rs12922061, located in the first intron of LOC643714, was identified as a susceptibility variant of breast cancer in a Japanese GWAS [18]. In the present study, we validated for the first time that rs12922061 has a significant association with breast cancer risk in southeast Chinese women, with an allelic OR of 1.209, consistent with the initial GWAS. According to Entrez Nucleotide, the LOC643714 locus is predicted to code for a small mRNA, which could hypothetically be translated into a 55-amino-acid protein. However, the specific function of LOC643714 is still uncertain. A high expression level of LOC643714 was found in ER positive tumors [23]. In the present study, we observed that rs12922061 has a significant association with the ER positive subgroup, which corresponds with the expression of LOC643714. We hypothesize that the SNP rs12922061 may participate in the regulation of the expression of LOC643714 in ER positive tumors; further analysis is warranted.
The SNP rs2290203 was discovered in an East Asian GWAS, and has been confirmed in European populations [19]. In this study we found that this locus is associated with breast cancer risk in women from southeast China, and the effect is similar to that in the initial GWAS (OR=1.176; 95%CI: 1.048 to 1.320). This locus lies in intron 14 of the protein regulator of cytokinesis 1 (PRC1) gene, which encodes the PRC1 protein and is suspected of being strictly regulated in a cancer-specific manner. The PRC1 protein is a mitotic spindle midzone-associated protein and is a substrate of a cyclin-dependent kinase [24]. The PRC1 gene is down-regulated by p53, whereas in p53 defective cells, it is over-expressed [25]. Also, the expression level of the PRC1 gene is significantly higher in breast tumor tissue, compared with adjacent normal tissue [19]. A study indicated that a higher expression level of the PRC1 gene could be a predictor of poor prognosis for breast cancer patients [26]. However, no association has been reported between rs2290203 and the expression of the PRC1 gene; instead, the SNP is related to the expression of the PCCD1 gene (5,712 bp upstream of rs2290203) [19,27]. The function of the PCCD1 gene is still unknown.
The SNP rs2981578 is located in intron 2 of the fibroblast growth factor receptor 2 (FGFR2) gene. In 2007 this gene was reported to be associated with breast cancer in women of European descent [28,29]. Subsequently, rs2981578 was discovered to be associated with breast cancer in a fine-mapping study of African American populations [30]. This locus has also been identified in an African population [31]. Interestingly, in our study, we found that rs2981578 is associated with a decreased breast cancer risk (OR=0.852; 95%CI: 0.759 to 0.956), which is contrary to the previous reports. This may be due to the diverse genetic background among different ethnicities combined with environmental risk factors and breast cancer subtypes. Additionally, fibroblast growth factor receptor type 2, encoded by the FGFR2 gene, is a receptor tyrosine kinase, which is an essential part of the signaling pathway for the growth and differentiation of cells in breast tissue [32]. Meanwhile, rs2981578 is reported to cause differential expression of the common and minor haplotypes of the FGFR2 gene [33].
In postmenopausal women the endogenous estrogen is mainly provided by adipose tissue [34], and it is well demonstrated that estrogen has a significant linear correlation with breast cancer in these women [35]. It was also reported that some polymorphisms in FGFR2 were associated with breast cancer risk in postmenopausal women [36]. Our results suggest that rs2981578 is associated with a reduced breast cancer risk in women of a younger age, lower BMI, younger age at menarche, premenopausal status, and fewer menstruation years, particularly for ER positive, HR positive, and Luminal type disease. The gene-environment interaction analysis also showed that rs2981578 interacts with BMI. Considering the above results, we speculate that rs2981578 may play an important role in regulating pathways that are related to estrogen. Nevertheless, further functional research is still needed to confirm the relationship between the susceptibility locus and breast cancer.
For rs2296067, rs4951011, and rs9693444, no significant ORs were observed in the present study. However, there were still significant associations between these loci and different subgroups. Moreover, it has been previously reported that rs9693444 is associated with breast cancer risk in Chinese women (P=6.44×10⁻⁴) [37]. We believe that the reason for failing to confirm their previously established role in breast cancer risk is that LD patterns differ between populations. In addition, environmental risk factors and breast cancer subtypes should also be taken into consideration.
Several limitations need to be taken into consideration in this study. Above all, the sample size in our study is still limited, which will affect the sensitivity and accuracy of the results. Particularly in the stratified analysis, some epidemiological characteristics were too infrequent for the data to be analyzed efficiently. In addition, as our study was a hospital-based case-control study, there was a certain selection bias compared with the general population. The self-reported lifestyle factors of the participants might also be subject to recall bias. Therefore, in the next few years, we will expand the sample size and perform a larger study to improve the sensitivity and accuracy of the study, aiming to reduce these biases and better understand the relationship between breast cancer risk and these susceptibility loci.
In conclusion, our study is the first to validate that rs12922061 on 16q12.2, rs2290203 on 15q26.1, and rs2981578 on 10q26.13 are associated with overall breast cancer risk in southeast Chinese women. The SNPs rs12922061 and rs2290203 even passed the threshold for Bonferroni correction. In addition, three other SNPs (rs10474352 on 5q14.3, rs10816625 on 9q31.2, and rs4951011 on 1q32.1) were found to have a significant association within different subgroups. Moreover, gene-environment interaction analysis revealed interactions between rs2981578 and age, BMI, and family history of breast cancer, between rs10816625 and age at first live birth, and between rs9693444 and the length of the breast feeding period. These findings may provide new insight into the association between genetic susceptibility and the fine classifications of breast cancer, which may help guide clinical therapy in the future. Finally, further large-scale studies and functional research are still warranted.

Study participants
All study participants were genetically unrelated Chinese females from Fujian province. A total of 1,166 breast cancer patients and 1,258 healthy controls were enrolled in this hospital-based case-control study. Patients were randomly enrolled from Fujian Medical University Union Hospital, Fujian, China, between January 2005 and December 2015, and each case was histopathologically confirmed by at least two oncologists. Estrogen receptor (ER) status, progesterone receptor (PR) status and human epidermal growth factor receptor 2 (HER-2) status of patients were evaluated by immunohistochemical analysis. A result was considered positive when the percentage of stained cancer cell nuclei was ≥10%. The rest of the clinicopathological data were obtained from medical records. Healthy controls were selected from people who were undergoing routine health examinations in the same hospital during the corresponding period. Controls were age-matched (±5 years) healthy individuals without breast
Genomic DNA was extracted from leukocytes from EDTA anti-coagulated whole blood using the Whole-Blood DNA Extraction Kit (Bioteke, Beijing, China), according to the manufacturer's protocol. The concentration of the DNA samples was quantified with an Epoch Microplate Spectrophotometer (BioTek Instruments, Winooski, VT, USA), and the quality of the DNA samples was determined by agarose gel electrophoresis. Qualified DNA samples were genotyped by SNPscan, a high-throughput SNP genotyping technology (Genesky Biotechnologies Inc., Shanghai, China). Finally, the raw data were analyzed with the GeneMapper 4.0 Software (Applied Biosystems, Foster City, CA). A five percent sample of both the cases and controls was randomly selected as blinded duplicates for quality assessment purposes, and 100% agreement was obtained. Due to insufficient DNA quality or quantity, genotyping failed for ten cases (0.86%) and two controls (0.16%). The per-SNP call rate was 99.5%. After removing all data from these 12 participants, there were 1,156 cases and 1,256 controls in the final analyses.

Statistical analyses
Differences in demographic characteristics, risk factors and frequencies of alleles and genotypes between cases and controls were evaluated by t-test, for continuous variables, or χ² tests, for categorical variables. Genotype data of control samples were evaluated for consistency with the Hardy-Weinberg equilibrium (HWE) by a goodness-of-fit χ² test. The associations between SNPs and the risk of breast cancer were assessed by computing odds ratios (ORs) and 95% confidence intervals (CIs) using conditional logistic regression models (co-dominant model and additive model) with adjustment for potential confounders such as age, age at menarche, and family history of breast cancer. The power of the study was calculated using Quanto, version 1.2.4, assuming a disease risk in the Chinese population of 268.6 per 100,000. The data were then stratified into fourteen subgroups (Table 3 to Table 6). Subsequently, we used the χ²-based Q-test to estimate the heterogeneity of associations between subgroups. Moreover, we categorized all cases and controls into five groups based on the number of risk alleles they carried (from 0 to 4, with 0 risk alleles used as the reference), and assessed the cumulative effect of multiple genetic risk variants by calculating ORs and 95%CIs with adjustment for potential confounders as described above. Furthermore, a gene-environment multiplicative interaction analysis was applied to explore the interactions between susceptibility loci and traditional risk factors, and it was performed with a multinomial logistic regression model. All of the statistical analyses were two-sided and a P value equal to or less than 0.05 was taken as the significance level. The Bonferroni correction was adopted to correct for multiple comparisons. Analyses were carried out using the Statistical Package for the Social Sciences (SPSS, version 18.0).
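The goodness-of-fit χ² test for HWE mentioned above can be illustrated with a minimal Python sketch. It compares observed genotype counts in controls with the counts expected from the estimated allele frequency, using one degree of freedom. The analyses in this study were run in SPSS, so this is only an illustration; the function name and the genotype counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.

    Takes observed counts of the three genotypes and returns
    (chi-square statistic, P value with 1 degree of freedom).
    Both alleles must be present, otherwise an expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical control genotype counts (NOT taken from the study).
chi2, p_value = hwe_chi_square(300, 400, 300)
print(f"chi2={chi2:.2f}, P={p_value:.3g}")
```

A P value above 0.05 in controls, as reported for all eight SNPs, indicates no detectable departure from HWE.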