Genetic variant rs4072037 of MUC1 and gastric cancer risk in an Eastern Chinese population

Published data on the association between the MUC1 rs4072037 A > G polymorphism and gastric cancer (GCa) risk are inconclusive. To derive a more precise estimate of this association, we conducted a case-control study of 1,124 cases and 1,192 controls in an Eastern Chinese population. The G allele was significantly associated with a decreased GCa risk in the study population [GG vs. AA, odds ratio (OR) = 0.47, 95% confidence interval (CI) = 0.31–0.73; AG/GG vs. AA, OR = 0.82, 95% CI = 0.68–0.99; GG vs. AA/AG, OR = 0.48, 95% CI = 0.32–0.74]. These associations remained significant in subgroups defined by age, tumor site, drinking status and smoking status, and they were supported by an additional meta-analysis of published studies. In summary, these results suggest that the MUC1 rs4072037 G allele may be a low-penetrance protective factor for GCa in Chinese populations.


INTRODUCTION
Gastric cancer (GCa) is one of the most common cancers and a leading cause of cancer-related death worldwide. In 2012, there were 951,600 new GCa cases and 723,100 GCa deaths, accounting for 8% of all cancer cases and 10% of all cancer deaths, respectively [1]. GCa is therefore a major public health problem, yet its mechanism of carcinogenesis is still not fully understood. Both environmental factors and low-penetrance susceptibility genes are thought to contribute to the etiology of GCa. For example, the prevalence of Helicobacter pylori (HP) infection is higher in developing countries (70-90%) than in developed countries (25-50%), which may contribute to the higher GCa risk in developing countries [2,3]. However, not all HP carriers develop GCa, suggesting that other factors, such as tobacco smoking, alcohol use and dietary habits, are also involved [4]. Genetic factors matter as well: associations between genetic variants and GCa risk have shown encouraging success in identifying at-risk populations [5][6][7][8], but genetic factors reported to be important in the etiology of GCa still need independent confirmation.
The Mucin 1 (MUC1) gene is a member of the mucin family, which encodes membrane-bound glycoproteins. The mucin 1 protein protects gastric epithelial cells from a variety of external insults that can cause inflammation and, ultimately, carcinogenesis. Although 712 MUC1 SNPs have been reported to the dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?showRare=on&chooseRs=all&go=Go&locusId=4582), only 11 SNPs (Figure 1A) have been confirmed in the HapMap database, and of these only rs4072037 has a minor allele frequency (MAF) > 0.05, representing a block of 4 SNPs (Figure 1B). The rs4072037 A > G polymorphism, located in the 5′ untranslated region (UTR) of the second exon of MUC1 at chromosome 1q22, alters transcriptional regulation and determines MUC1 splice variants [9]. Several studies have reported an association between the MUC1 rs4072037 A > G polymorphism and GCa risk [10][11][12][13][14][15][16], but the results were inconsistent, particularly across ethnic groups and primary tumor sites. To further confirm this association, we conducted a replication study with a relatively large sample and subgroup analyses in an Eastern Chinese population, and we performed a meta-analysis to further validate the association.

RESULTS
The characteristics of participants in this hospital-based case-control study have been described elsewhere [17]; one case sample and four control samples failed genotyping in the current study. Thus, the final analysis included 1,124 GCa cases and 1,192 cancer-free controls. Cases and controls were well matched by age and sex, although the controls included more smokers and drinkers; these variables were adjusted for in the multivariate analysis. Table 1 lists allele frequencies of the rs4072037 A > G SNP in cases and controls and the estimated association between this SNP and GCa risk. Overall, the G allele was associated with a decreased GCa risk in the study population [GG vs. AA, OR = 0.47, 95% CI = 0.31-0.73; AG/GG vs. AA, OR = 0.82, 95% CI = 0.68-0.99; GG vs. AA/AG, OR = 0.48, 95% CI = 0.32-0.74]. In the stratified analysis (Table 2), these associations remained significant in subgroups defined by age, tumor site, drinking status and smoking status.
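Each genetic model amounts to a different way of collapsing the 2 × 3 genotype table into a 2 × 2 table. The Python sketch below illustrates this for crude (unadjusted) ORs only; the ORs reported above come from logistic regression adjusted for age, sex, smoking and drinking, so they will differ from crude values. The Woolf (log-based) confidence interval, the allele-count version of the G vs. A comparison, and the genotype counts in the usage example are illustrative assumptions, not the study's data.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Crude OR with a Woolf (log-based) 95% CI for a 2x2 table.

    a: exposed cases, b: reference cases,
    c: exposed controls, d: reference controls (all counts must be > 0).
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)

def genetic_model_ors(cases, controls):
    """Collapse genotype counts ('AA', 'AG', 'GG') into each genetic model."""
    aa, ag, gg = cases["AA"], cases["AG"], cases["GG"]
    kaa, kag, kgg = controls["AA"], controls["AG"], controls["GG"]
    return {
        "heterozygous (AG vs AA)": odds_ratio(ag, aa, kag, kaa),
        "homozygous (GG vs AA)": odds_ratio(gg, aa, kgg, kaa),
        "dominant (AG+GG vs AA)": odds_ratio(ag + gg, aa, kag + kgg, kaa),
        "recessive (GG vs AA+AG)": odds_ratio(gg, aa + ag, kgg, kaa + kag),
        # Allele-count comparison: every person contributes two alleles.
        "allelic (G vs A)": odds_ratio(ag + 2 * gg, 2 * aa + ag,
                                       kag + 2 * kgg, 2 * kaa + kag),
    }

# Hypothetical counts for illustration only (not the study's Table 1).
cases = {"AA": 100, "AG": 50, "GG": 10}
controls = {"AA": 80, "AG": 60, "GG": 20}
for model, (or_, lo, hi) in genetic_model_ors(cases, controls).items():
    print(f"{model}: OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

The same counting logic carries over to the adjusted analysis, where the collapsed genotype indicator becomes the exposure variable in the logistic regression.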
We then performed a mini meta-analysis of eight studies, including the present study, with 7,312 cases and 6,112 controls [10][11][12][13][14][15][16]. The pooled data indicated that the G allele was strongly associated with a decreased GCa risk (Table 3; Figure 3), without significant publication bias. However, significant heterogeneity across studies was present in these genetic models. We therefore performed a sensitivity analysis to assess the influence of each study on the pooled results. The pooled ORs were not materially changed when each study was omitted in turn (data not shown), suggesting that the overall results are robust.
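The leave-one-out sensitivity analysis can be sketched as repeated inverse-variance pooling of log ORs, omitting one study each time. This Python sketch assumes a fixed-effect model and recovers each study's standard error from its reported 95% CI; the study labels and values in the usage example are hypothetical placeholders, not the eight studies pooled here.

```python
import math

Z = 1.959964  # two-sided 95% standard normal quantile

def log_or_and_se(or_, lo, hi):
    """Recover log(OR) and its SE from a reported OR and 95% CI."""
    return math.log(or_), (math.log(hi) - math.log(lo)) / (2 * Z)

def pooled_fixed(studies):
    """Inverse-variance fixed-effect pooled OR and 95% CI.

    studies: list of (OR, CI lower, CI upper) tuples.
    """
    num = den = 0.0
    for or_, lo, hi in studies:
        y, se = log_or_and_se(or_, lo, hi)
        w = 1 / se ** 2  # inverse-variance weight
        num += w * y
        den += w
    est, se_pooled = num / den, math.sqrt(1 / den)
    return math.exp(est), math.exp(est - Z * se_pooled), math.exp(est + Z * se_pooled)

def leave_one_out(studies, names):
    """Pool the remaining studies after omitting each study in turn."""
    return {name: pooled_fixed(studies[:i] + studies[i + 1:])
            for i, name in enumerate(names)}

# Hypothetical per-study ORs and 95% CIs (placeholders only).
studies = [(0.5, 0.25, 1.0), (0.8, 0.4, 1.6), (2.0, 1.0, 4.0)]
names = ["Study A", "Study B", "Study C"]
print("all studies:", pooled_fixed(studies))
for omitted, result in leave_one_out(studies, names).items():
    print(f"without {omitted}:", result)
```

If one omitted study shifts the pooled OR across 1 or moves it far outside the other estimates, that study is a candidate source of the heterogeneity.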

DISCUSSION
In addition to environmental and lifestyle factors, genetic factors are important for identifying at-risk populations for the primary prevention of GCa. The results presented here consistently showed that the G allele of the MUC1 rs4072037 A > G SNP was associated with a decreased risk of GCa. This SNP, located in the 5′ UTR of the second exon, is predicted to affect splicing of the primary transcript, which in turn determines which transcript variant is produced. Studies suggest that the G allele results in the expression of variant 2, whereas the A allele results in the expression of variant 3 [9,24]. The structural difference between these two variants is an insertion/deletion of nine amino acids encoded by the second exon, which lie within the N-terminal signal peptide. This difference in the signal peptide may alter the function of the encoded protein. Moreover, the A allele reduces transcriptional activity, which may result in decreased MUC1 expression [9]. Furthermore, MUC1 can block the binding of the HP blood group antigen-binding adhesin and sialic acid-binding adhesin to the gastric mucosa, thereby limiting HP colonization [25,26], and MUC1 acts as a barrier against exogenous insults in normal epithelial cells [25]. Therefore, low MUC1 expression may weaken this barrier function in the stomach and thereby increase GCa susceptibility. This hypothesis needs to be tested in additional mechanistic studies.
This study has some limitations. First, although age, sex, smoking and drinking status, and tumor site were considered in the subgroup analyses, data on other important risk factors that may contribute to the etiology of GCa, such as diet and HP infection, were not available. Second, newer classifications of GCa tumor types were not available for patients diagnosed years ago, although different tumor types may have different genetic bases. Third, the number of cases in each subgroup was considerably smaller in the stratified analysis, which may have limited the statistical power of those analyses.
In conclusion, the present study confirmed that the G allele of the MUC1 rs4072037 SNP is a low-penetrance protective factor for GCa risk. Future studies should incorporate diet, HP infection status and Lauren classification to better understand the association between the MUC1 rs4072037 SNP and GCa susceptibility.

Study subjects
This study included participants recruited for our ongoing molecular epidemiology study of GCa; the cases and controls have been described previously [17][18][19]. Briefly, 1,125 unrelated ethnic Han Chinese patients with newly diagnosed and histopathologically confirmed primary gastric cardia adenocarcinoma or nongastric cardia adenocarcinoma (NGCA) were recruited from Fudan University Shanghai Cancer Center (FUSCC) in Eastern China between January 2009 and March 2011. Patients without histopathologically confirmed primary GCa were excluded. In addition, 1,196 age- and sex-matched cancer-free ethnic Han Chinese controls were recruited from the Taizhou Longitudinal (TZL) study, conducted during the same period in Eastern China, as described previously [20]. Blood samples from GCa patients and cancer-free controls were provided by the tissue bank of FUSCC and the TZL study, respectively. All participants signed written informed consent to donate their biological samples to the tissue bank for scientific research. Demographic data and environmental exposure history were collected for each participant. The overall response rate was approximately 91% for cases and 90% for controls. The research protocol was approved by the FUSCC institutional review board.

SNP genotyping
Genomic DNA was extracted from peripheral blood according to a standard protocol. The rs4072037 SNP was genotyped with the TaqMan assay on an ABI7900HT real-time PCR system (Applied Biosystems) as reported previously [17]. Genotyping was performed blind to participants' case-control status. As recommended by the manufacturer, four negative controls (without DNA template) and two duplicate samples were included in each 384-well plate for quality control. The assays were repeated for 5% of the samples, and the results were 100% concordant.

Statistical methods
The χ² test was used to assess differences in the distribution of demographic characteristics between cases and controls. The association between the SNP and GCa risk was estimated by odds ratios (ORs) and 95% confidence intervals (CIs) under heterozygous (AG vs. AA), homozygous (GG vs. AA), dominant (AG + GG vs. AA), recessive (GG vs. AG + AA), and additive (G vs. A) models. ORs were calculated with both univariate and multivariate logistic regression models, and the multivariate model for each genetic model was adjusted for age, sex, drinking status and smoking status. The association between the MUC1 rs4072037 SNP and GCa risk was also examined in analyses stratified by age, sex, smoking status, drinking status and primary tumor site. All of the above analyses were performed with SAS software (version 9.1; SAS Institute, Cary, NC).
To validate the results of the present study, we performed a mini meta-analysis of studies retrieved from Medline, PubMed and Embase. Using the search terms and inclusion and exclusion criteria described in previous studies [21,22], two authors independently reviewed all primary reports and searched the references cited in these papers. Data were then extracted from the eligible studies, and pooled crude ORs were calculated for the heterozygous, homozygous, dominant and recessive models. Heterogeneity between studies was assessed with the χ²-based Q test, and pooled ORs were calculated with a fixed- or random-effects model depending on the heterogeneity between studies [23]. A leave-one-out sensitivity analysis was performed to assess the stability of the pooled results and to identify sources of heterogeneity. Publication bias was assessed with funnel plots, and funnel-plot asymmetry was evaluated with Egger's linear regression test, with P < 0.05 (t test) taken to indicate statistically significant publication bias.
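The heterogeneity-dependent choice between fixed- and random-effects pooling can be illustrated with a minimal Python sketch. It assumes inverse-variance weighting on the log OR scale, the DerSimonian-Laird estimator for the random-effects model, and a switching rule of P(Q) < 0.10; reference [23] may specify a different estimator (for example, Mantel-Haenszel) or threshold. The chi-square tail probability comes from a series expansion so that the block needs only the standard library, and the usage values are hypothetical.

```python
import math

Z = 1.959964  # two-sided 95% standard normal quantile

def chi2_sf(x, df):
    """Upper-tail chi-square probability via the regularized lower
    incomplete gamma series P(df/2, x/2); illustrative, adequate for
    moderate Q values."""
    if x <= 0:
        return 1.0
    a, t = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= t / (a + n)
        total += term
        n += 1
    lower = math.exp(-t + a * math.log(t) - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def meta_analysis(log_ors, ses, p_threshold=0.10):
    """Inverse-variance pooling with Cochran's Q; switches to the
    DerSimonian-Laird random-effects model when P(Q) < p_threshold."""
    k = len(log_ors)
    w = [1 / s ** 2 for s in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    p_q = chi2_sf(q, k - 1)
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    if p_q < p_threshold:
        # DerSimonian-Laird estimate of the between-study variance tau^2
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)
        w = [1 / (s ** 2 + tau2) for s in ses]
        model = "random"
    else:
        model = "fixed"
    est = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return {"model": model, "Q": q, "P_Q": p_q, "I2": i2,
            "OR": math.exp(est),
            "CI": (math.exp(est - Z * se), math.exp(est + Z * se))}

# Hypothetical log ORs and SEs (placeholders, not the studies pooled here).
print(meta_analysis([math.log(0.7), math.log(0.9), math.log(0.6)], [0.10, 0.15, 0.20]))
```

Adding the between-study variance to every study's variance flattens the weights, which is why a random-effects pooled CI is wider than its fixed-effect counterpart when heterogeneity is present.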
All meta-analyses were performed with STATA version 10.0 (Stata Corporation, College Station, TX).
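Egger's test regresses each study's standardized effect (log OR divided by its standard error) on its precision (1 divided by the standard error); an intercept far from zero indicates funnel-plot asymmetry. The Python sketch below is a standard-library illustration, not the STATA routine used in the study. It returns the intercept, its t statistic and the degrees of freedom (k - 2); the P value would then be read from a t distribution with those degrees of freedom (for example, with SciPy's `scipy.stats.t.sf`, left out here to keep the block dependency-free). The usage values are hypothetical.

```python
import math

def egger_test(log_ors, ses):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (log OR / SE) on precision (1 / SE)
    by ordinary least squares and tests whether the intercept is zero.
    Requires at least three studies.
    """
    k = len(log_ors)
    x = [1 / s for s in ses]                            # precision
    y = [lor / s for lor, s in zip(log_ors, ses)]       # standardized effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (k - 2)           # residual variance
    se_intercept = math.sqrt(s2 * (1 / k + x_bar ** 2 / sxx))
    return {"intercept": intercept, "se": se_intercept,
            "t": intercept / se_intercept, "df": k - 2}

# Hypothetical log ORs and SEs (placeholders, not the studies pooled here).
print(egger_test([-0.36, -0.11, -0.51, -0.22, -0.69], [0.10, 0.15, 0.20, 0.25, 0.30]))
```

With only eight studies, as in this meta-analysis, the test has limited power, so a non-significant intercept does not rule out publication bias.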