The rs1550117 A>G variant in DNMT3A gene promoter significantly increases non-small cell lung cancer susceptibility in a Han Chinese population

We conducted a case-control study to explore the association between the rs1550117 A>G variant of the DNMT3A gene promoter and non-small cell lung cancer (NSCLC) susceptibility in a Han Chinese population. Genotyping of the rs1550117 A>G variant was performed by polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) and confirmed by sequencing. Allele G of rs1550117 was associated with an increased risk of NSCLC. Moreover, individuals carrying the GG genotype had a higher risk of developing NSCLC than AA and GA genotype carriers. Further stratified analysis showed that the association of rs1550117 A>G with NSCLC risk was significant in subjects older than 60 years, males, smokers and drinkers. In vivo detection of DNMT3A mRNA levels in NSCLC tissues and in vitro luciferase assays consistently showed that allele G significantly decreased DNMT3A transcription. Additional functional analysis revealed that the increased binding affinity of the transcription repressor SP1, which was associated with allele G of rs1550117, led to significantly decreased expression of DNMT3A. Collectively, our results suggest a suppressive role of DNMT3A in NSCLC development and emphasize the dual roles of DNMT3A in tumorigenesis.


INTRODUCTION
Lung cancer has for many years been the most commonly diagnosed cancer and the leading cause of cancer death in China, especially in Hubei province, which has high incidence and mortality rates [1,2]. Non-small cell lung cancer (NSCLC) is the most common type of lung cancer, accounting for 80%-90% of cases [3]. Surgical resection is an effective curative treatment for patients with early-stage lung cancer [2]. Unfortunately, most lung cancer patients are diagnosed at stage IIIB/IV, when the tumor is no longer resectable. Novel strategies for risk prediction and early diagnosis of lung cancer are therefore urgently needed.
DNA methylation is one of the best-characterized epigenetic modifications and plays an important role in cancer development [4]. Of note, aberrant DNA methylation patterns have been found in most human cancers, including NSCLC [5]. As a de novo DNA methyltransferase, DNMT3A contributes to the establishment of genomic DNA methylation patterns [6], indicating that abnormal DNMT3A expression may be responsible for the aberrant DNA methylation in carcinogenesis. In addition, certain genetic variants in the 5′- and 3′-UTR (untranslated region) of genes have been shown to influence promoter activity (gene expression) and messenger RNA (mRNA) conformation (stability) [7]. Therefore, identification of functional variants in the DNMT3A gene and analysis of their effects may lead to a better understanding of their impact on DNMT3A gene expression and individual susceptibility to cancer.

Research Paper
Oncotarget 23471 www.impactjournals.com/oncotarget
Recently, a number of studies have investigated the association between DNMT3A variants and cancer risk [8][9][10][11][12][13][14][15], and proposed a putative functional variant (rs1550117) located 448 bp upstream of the transcription start site in the DNMT3A gene promoter [10]. However, the results of previous studies are conflicting rather than conclusive [16,17]. This discrepancy may be largely attributable to insufficient sample sizes and differences between ethnic populations. Moreover, to the best of our knowledge, the association of DNMT3A rs1550117 with NSCLC susceptibility has not yet been elucidated. To address these issues, we conducted a case-control study with a larger sample size to estimate the association between the DNMT3A rs1550117 A>G variant and NSCLC risk in a Hubei Han Chinese population.

RESULTS
Characteristics of study subjects
The distributions of age, gender, smoking status and alcohol status did not differ significantly between NSCLC patients and normal controls, suggesting that matching based on these four variables was adequate (Table 1). Moreover, the mean ages of the NSCLC patients and normal controls were similar: 60.1 years (range: 23-81 years) and 58.6 years (range: 27-85 years), respectively.

The DNMT3A rs1550117 A>G variant significantly increases the risk of NSCLC
The genotype frequencies of the rs1550117 A>G variant were in agreement with Hardy-Weinberg equilibrium (HWE) in normal controls (p = 0.537), suggesting that the enrolled control subjects were representative. As shown in Table 2, the genotype distributions of rs1550117 differed significantly between the NSCLC patients and normal controls (p = 0.001). Moreover, the G allele frequency was significantly higher among NSCLC patients than among normal controls (p = 0.001, OR = 1.36, 95% CI = 1.18-1.71), indicating that allele G was associated with an increased risk of NSCLC. Similarly, we found a significant association between the GG genotype of the rs1550117 A>G variant and increased risk of NSCLC in three genetic models: GG vs. GA (p = 0.010, OR = 1.33, 95% CI = 1.06-1.71), GG vs. AA (p = 0.032, OR = 1.95, 95% CI = 1.03-3.60) and GG vs. GA+AA (p = 0.002, OR = 1.39, 95% CI = 1.15-1.80). These results indicate that the DNMT3A 5′-regulatory variant rs1550117 A>G significantly increases the risk of NSCLC. In addition, among NSCLC patients, the frequencies of DNMT3A rs1550117 did not differ significantly between those aged ≤ 60 years and > 60 years (p genotype = 0.768, p allele = 0.603), males and females (p genotype = 0.656, p allele = 0.607), smokers and non-smokers (p genotype = 0.347, p allele = 0.224), or drinkers and non-drinkers (p genotype = 0.482, p allele = 0.811) (Table 3).

The susceptibility to NSCLC with rs1550117 A>G variant is strongly related to age (> 60 years), male sex, smoking and drinking
Age, sex, smoking and drinking are regarded as important factors in lung carcinogenesis [1,2]. We therefore performed a stratified analysis of rs1550117 according to age, sex, smoking status and alcohol status. The results showed that these factors affect the association between the rs1550117 A>G variant and NSCLC risk. Specifically, individuals carrying the G allele or the GG genotype exhibited significantly increased NSCLC risk compared with individuals carrying the A allele or the GA/AA genotypes in the > 60 years, male, smoking and drinking subgroups, but not in the ≤ 60 years, female, non-smoking and non-drinking subgroups (Table 4). Moreover, all genotype frequencies were in agreement with HWE among normal controls in each subgroup (p > 0.05). These results suggest that the rs1550117 A>G variant confers an increased risk of NSCLC, particularly in males older than 60 years who smoke and drink.
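As a rough illustration of how an odds ratio and its 95% confidence interval can be obtained from allele counts, the following Python sketch computes a crude (unadjusted) OR with a Woolf log-based interval. The allele counts are hypothetical placeholders, not the study's data, and the function name `odds_ratio_ci` is illustrative. The ORs reported here were adjusted by logistic regression, so this sketch would not be expected to reproduce them exactly.

```python
import math


def odds_ratio_ci(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table using Woolf's log method."""
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of ln(OR): root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts (NOT taken from Table 2):
# 600 cases -> 1200 alleles, 998 controls -> 1996 alleles.
or_value, ci_low, ci_high = odds_ratio_ci(
    case_risk=300, case_other=900, ctrl_risk=400, ctrl_other=1596
)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An interval whose lower bound exceeds 1 corresponds to a significantly increased risk at the 0.05 level.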

The rs1550117 A>G variant decreases DNMT3A transcriptional activity in NSCLC
Although a previous study demonstrated that the rs1550117 A>G variant affected the transcriptional activity of the DNMT3A promoter in Chinese hamster ovary cells [10], it was unclear whether this mechanism also applies in NSCLC. Dual luciferase assays showed that the plasmid containing the G allele had significantly lower luciferase activity than the plasmid containing the A allele, with a 48% decrease in A549 cells, a 45% decrease in PC14 cells and a 50% decrease in Hek293 cells (Figure 1A). A novel short isoform, the DNMT3A2 protein (about 82 kDa), was recently identified. Transcription of this isoform is initiated from a different promoter in the sixth intron of the DNMT3A gene, which encodes the full-length isoform, the DNMT3A protein (about 120 kDa). Therefore, the rs1550117 A>G variant in the DNMT3A gene promoter may specifically affect the expression of DNMT3A but not DNMT3A2. We also measured the DNMT3A mRNA levels in 56 NSCLC tissue samples with different rs1550117 A>G genotypes, and DNMT3A expression was significantly higher in GA samples (1.6-fold) and AA samples (3.1-fold) than in GG samples (Figure 1B). These results consistently suggest that the rs1550117 A>G variant decreases DNMT3A transcriptional activity in NSCLC.

The rs1550117 A>G variant increases the transcription repressor SP1 binding affinity
Alibaba2 software (http://gene-regulation.com/pub/programs/alibaba2/index.html?) predicted that the rs1550117 A>G variant creates transcription factor (TF) binding sites for SP1 and GR (Figure 2A). However, the chromatin immunoprecipitation (ChIP) sequencing results in the ChIPBase v2.0 database (http://rna.sysu.edu.cn/chipbase/) and previous investigations collectively suggested that SP1, but not GR, could bind to the DNMT3A promoter region [18,19]. In this study, ChIP assays demonstrated that the DNMT3A promoter fragment containing the −448 site was occupied by SP1 (Figure 2B). Moreover, surface plasmon resonance (SPR) analysis revealed that, compared with the A allele oligonucleotide probe, the G allele oligonucleotide probe had higher binding affinity for Hek293 nuclear proteins or purified recombinant SP1 protein (Figure 2C). The co-transfection experiment showed that ectopic SP1 expression generally decreased the luciferase activities of the plasmids containing the DNMT3A rs1550117 A allele or G allele, and that the rs1550117 variant amplified the disparity in promoter function (Figure 2D). Taken together, these results indicate that SP1 acts as a transcription repressor of the DNMT3A gene, and that rs1550117 A>G increases the binding affinity of SP1 for the DNMT3A promoter, which ultimately contributes to the decreased expression of DNMT3A.

DISCUSSION
DNMT3A was previously suggested to promote tumorigenesis [20]. However, the underlying molecular mechanism remains elusive. One possibility is that overexpressed DNMT3A may lead to the silencing of certain tumor suppressor genes (TSGs) in tumorigenesis. Indeed, it was shown that knockdown of DNMT3A upregulates the expression of some immune response genes in melanoma [21]. Similarly, depletion of DNMT3A restored the expression of various TSGs (including PTEN) that participate
Table footnotes: 1 Numbers in parentheses, percentage. 2 The p value was calculated using two-sided χ² test. 3 Adjusted for age, gender, smoking status and alcohol status.
On the other hand, numerous genetic association studies have investigated the association between the DNMT3A rs1550117 A>G variant and cancer risk in gastric cancer, urothelial cancer, esophageal cancer, ovarian cancer, breast cancer, colorectal cancer and hepatocellular cancer, but the results were inconsistent [8][9][10][11][12][13][14][15]. In view of this, Zhang et al. conducted a comprehensive meta-analysis and found that the rs1550117 A allele significantly increased cancer risk [16]. An interesting question is why the rs1550117 variant affects individual susceptibility to cancer. This study reveals for the first time that SP1 acts as a transcription repressor of the DNMT3A gene and that SP1 has a higher binding affinity for the rs1550117 G allele than for the A allele. Therefore, compared with the G allele, the rs1550117 A allele would decrease SP1 binding affinity and thereby increase the expression of DNMT3A, which in turn would increase cancer susceptibility.
Although this evidence clearly assigns an oncogenic role to DNMT3A in tumorigenesis, the recently discovered DNMT3A mutations in acute myeloid leukemia (AML) suggest that its role in cancer is more complex than previously thought [23][24][25][26][27]. Further functional analysis showed that the DNMT3A mutations affect the global DNA methylation pattern and the expression of HOXA genes [28], which play an important role in AML tumorigenesis [29]. In addition, two recent publications showed that Dnmt3a deletion in a mouse lung cancer model promoted tumor progression but not initiation [30,31], and that Dnmt3a maintained the methylation-dependent repression (MDR) of specific oncogenes involved in key steps of lung tumor progression, including angiogenesis, cell adhesion and movement [30]. These results suggested that Dnmt3a may be a tumor suppressor gene and a critical determinant of lung tumorigenesis [30]. However, whether DNMT3A acts in human lung cancer as it does in the mouse lung cancer model remains an open question.
In this study, the putative functional rs1550117 A>G variant in the DNMT3A gene promoter was selected to evaluate the association between the DNMT3A gene and NSCLC susceptibility in a Han Chinese population. Interestingly, we found that individuals carrying the rs1550117 GG genotype exhibited significantly increased NSCLC susceptibility compared with individuals carrying the GA and AA genotypes, suggesting that the G allele is the risk allele of the rs1550117 variant. Meanwhile, the GG genotype was associated with lower DNMT3A mRNA expression in NSCLC tissues than the GA and AA genotypes. Therefore, DNMT3A may participate in the suppression of NSCLC in humans, which is consistent with the findings of the previous studies in a mouse lung cancer model.
The present findings demonstrate for the first time that the DNMT3A rs1550117 A>G variant is associated with a significantly increased risk of NSCLC in a Han Chinese population. They also reveal for the first time that the DNMT3A rs1550117 A>G variant decreases DNMT3A expression by increasing the binding affinity of the transcriptional repressor SP1. Moreover, the stratified analysis implied that males older than 60 years who smoke and drink are more susceptible to NSCLC when carrying rs1550117 A>G. Our study proposes an unexpected role of DNMT3A in the suppression of NSCLC development and further emphasizes the dual roles of DNMT3A in tumorigenesis. The results reported here may lead to a novel strategy for the prediction and prevention of NSCLC. However, further confirmatory studies should be undertaken in other ethnic populations because the present observations involved only a Han Chinese population.

MATERIALS AND METHODS
Subjects
A total of 998 normal controls and 600 patients with histologically confirmed NSCLC were recruited in the current study. All subjects were Han Chinese living in Hubei province. An increasing number of Chinese people now have a physical examination every year. The normal controls were selected from cancer-free individuals who visited Wuhan Xinzhou District People's Hospital for annual physical examinations or who volunteered to participate in the epidemiology survey during the same period. Normal controls were required to have passed all annual physical examinations in the latest three years. The patients were confirmed histopathologically and were volunteers recruited from the same hospital. This study was approved by the Ethical Committees of Wuhan Xinzhou District People's Hospital and Wuhan University of Technology, and written informed consent for the genetics analysis was obtained from all subjects or their guardians.

DNMT3A rs1550117 A>G variant genotyping
Samples were collected into blood vacuum tubes containing ethylenediaminetetra-acetic acid (EDTA) and stored at 4°C. Genomic DNA was extracted within 1 week of sample collection by proteinase K digestion as previously described [32]. Because the A>G transition of the DNMT3A rs1550117 variant creates a TspRI restriction site, PCR-RFLP was used to detect this transition at position -448 of the DNMT3A promoter (GenBank accession No. NT_022184.14:g.4381840). The PCR reaction was performed in a total volume of 15 μl containing 50 ng genomic DNA, 1.5 μl 10× Taq Buffer (Mg²⁺ Plus), 0.2 μl 10 mM dNTP, 1 μl 1 mM Primer (forward: 5′-ACACACCGCCCTCACCCCTT-3′; reverse: 5′-TCCAGCAATCCCTGCCCACA-3′), and 0.5 U Taq polymerase (Takara Biotechnology Co. Ltd, Dalian, China). PCR cycle conditions consisted of an initial melting step of 94°C for 5 min, followed by 36 cycles of 94°C for 30 s, 63°C for 30 s and 72°C for 30 s, and a final extension step of 72°C for 10 min. The 358 bp fragment was then digested with TspRI (Takara Biotechnology Co. Ltd, Dalian, China) overnight at 37°C. The digested products were separated on a 2.0% agarose gel and the RFLP bands were visualized under ultraviolet light with Gel-Red staining. The wildtype G allele contains a TspRI restriction site that results in three bands (155 bp, 121 bp and 82 bp), while the A allele produces two bands (276 bp and 82 bp). For quality control, genotyping analysis was performed blind with respect to case/control status and repeated twice for all subjects. The results of genotyping were 100% concordant. To confirm the genotyping results, 20% of the PCR-amplified DNA samples were randomly selected and examined by DNA sequencing, and the results were also 100% concordant.
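The band patterns described above map directly onto genotypes, which the following Python sketch makes explicit. It is only an illustration: the function name `call_genotype` is hypothetical, and matching exact fragment sizes is a simplifying assumption, since bands read from a real gel are approximate.

```python
# Fragment sizes (bp) expected after TspRI digestion of the 358 bp amplicon,
# as stated in the text.
G_ALLELE_BANDS = frozenset({155, 121, 82})
A_ALLELE_BANDS = frozenset({276, 82})


def call_genotype(observed_bands):
    """Map observed band sizes (bp) to a genotype, or None if unexpected."""
    bands = frozenset(observed_bands)
    if bands == G_ALLELE_BANDS:
        return "GG"
    if bands == A_ALLELE_BANDS:
        return "AA"
    if bands == G_ALLELE_BANDS | A_ALLELE_BANDS:
        return "GA"  # heterozygotes show the bands of both alleles
    return None  # e.g. undigested product or an unexpected pattern


# Each allele pattern should add up to the undigested amplicon length.
assert sum(G_ALLELE_BANDS) == sum(A_ALLELE_BANDS) == 358

for lane in ([155, 121, 82], [276, 82], [276, 155, 121, 82], [358]):
    print(lane, "->", call_genotype(lane))
```

Returning `None` for any other pattern flags lanes that would need to be repeated or sequenced rather than forcing a call.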

Plasmid constructs, host cell culture and dual luciferase assays
To construct the DNMT3A reporter plasmids, we amplified by PCR from genomic DNA a 588 bp DNMT3A promoter fragment (from -684 bp to -97 bp) containing the A or G allele of the rs1550117 A>G variant. Notably, the amplified fragment contains the putative promoter sequence of the DNMT3A gene (-312 bp ~ -262 bp: TCAGCACTTCAGCTATATCACAGTGCCCTGAGCTCCCTGACTGGCACAGG), as predicted with the BDGP online software (http://www.fruitfly.org/seq_tools/promoter.html).
The PCR products were then subcloned into the NheI and HindIII restriction sites of the pGL3-Basic vector (Promega, Madison, WI, USA). We verified all of the recombinant clones by DNA sequencing. The primers used were: 5′-CTAGCTAGCTCAGCACTGGGGCTG-3′ (forward) and 5′-CCCAAGCTTCTGTGACGCTAAAA-3′ (reverse). Two NSCLC cell lines (A549 and PC14) and human embryonic kidney 293 (Hek293) cells (1 × 10⁵) were seeded in 24-well culture plates. After 24 h of culture, the host cells were co-transfected with the pGL3-Basic (blank control), pGL3-A allele or pGL3-G allele plasmid and the pRL-TK plasmid as a normalization control; half of the cells were additionally co-transfected with the pcDNA3.1-SP1 expression plasmid or an equivalent amount of pcDNA3.1-basic vector. Transfections were performed using Lipofectamine 2000 (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. After an additional 24 h of culture, the transfected cells were assayed for luciferase activity using the Dual-Luciferase Reporter Assay System (Promega, Madison, WI, USA). Three independent transfection experiments were performed, and each luciferase assay was carried out in triplicate.
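To make the normalization step concrete, the Python sketch below divides each firefly reading (pGL3 reporter) by the matching Renilla reading (pRL-TK control) and expresses the G allele activity as a percent decrease relative to the A allele. The readings are hypothetical values chosen for illustration, not the study's measurements, and the function names are our own.

```python
from statistics import mean


def normalized_ratios(firefly, renilla):
    """Divide each firefly reading by the matching Renilla reading."""
    return [f / r for f, r in zip(firefly, renilla)]


def percent_decrease(reference, test):
    """Percent drop of the mean test ratio relative to the mean reference ratio."""
    return 100.0 * (1.0 - mean(test) / mean(reference))


# Hypothetical triplicate readings in arbitrary light units (NOT study data).
a_allele = normalized_ratios([1000.0, 1100.0, 900.0], [100.0, 100.0, 100.0])
g_allele = normalized_ratios([500.0, 550.0, 510.0], [100.0, 100.0, 100.0])

decrease = percent_decrease(a_allele, g_allele)
print(f"G allele activity is {decrease:.0f}% lower than A allele activity")
```

Normalizing to the co-transfected control corrects for well-to-well differences in transfection efficiency before the two alleles are compared.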

Quantitative real-time RT-PCR
A total of 56 NSCLC tissue samples were obtained from NSCLC patients who had undergone surgical resection at the Wuhan Xinzhou District People's Hospital (Wuhan, Hubei Province, China). Total RNA was extracted from the human NSCLC samples preserved in RNAlater (Qiagen, Valencia, CA, USA) and converted to cDNA using random hexamers, oligo (dT) primers and Moloney murine leukemia virus reverse transcriptase (Takara Biotechnology Co. Ltd, Dalian, China). The DNMT3A mRNA levels were measured by quantitative real-time RT-PCR using the Applied Biosystems 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, CA, USA), and GAPDH was used as an internal reference gene. The primers used for DNMT3A amplification were 5′-ACCCAGCGCAGAAGCAG-3′ (forward) and 5′-ATAGATCCCGGTGTTGAGCC-3′ (reverse); the primers for GAPDH were 5′-TGCACCACCAACTGCTTAGC-3′ (forward) and 5′-GGCATGGACTGTGGTCATGAG-3′ (reverse). Relative quantification of DNMT3A mRNA was calculated using the 2^(−ΔΔCT) method, and each reaction was performed in triplicate.
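The relative quantification step can be sketched in a few lines of Python. The Ct values below are hypothetical and the function name `fold_change` is illustrative; the calculation also assumes roughly 100% amplification efficiency for both DNMT3A and GAPDH, which is the usual premise of this method.

```python
def fold_change(ct_target_sample, ct_ref_sample,
                ct_target_calibrator, ct_ref_calibrator):
    """Relative expression of the target gene by the 2^(-ddCt) method."""
    # Normalize the target gene to the reference gene within each group.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the sample group with the calibrator group.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical mean Ct values (NOT study data): DNMT3A and GAPDH in an AA
# sample, with a GG sample used as the calibrator.
aa_vs_gg = fold_change(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_calibrator=25.0, ct_ref_calibrator=18.0,
)
print(f"DNMT3A expression in AA relative to GG: {aa_vs_gg:.1f}-fold")
```

A lower Ct means more template, so a ΔΔCT of −1 corresponds to a two-fold higher expression in the sample than in the calibrator.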

SPR analysis
The SPR analysis was carried out using the ProteOn XPR36 Protein Interaction Array System (Bio-Rad, Hercules, CA, USA). Biotinylated duplex oligonucleotide probes representing the rs1550117 A or G alleles (sequences were rs1550117 [A] Forward: 5′-CAGCCACTCACTATGTGCTCATCTC-3′, [A] Reverse: 5′-GAGATGAGCACATAGTGAGTGGCTG-3′; rs1550117 [G] Forward: 5′-CAGCCACTCACTGTGTGCTCATCTC-3′, [G] Reverse: 5′-GAGATGAGCACACAGTGAGTGGCTG-3′) were immobilized on the streptavidin-modified surfaces of the different channels from DNA solutions at a fixed concentration (400 nM) to ensure identical surface density. Nuclear extracts from Hek293 cells or purified SP1 recombinant protein were diluted in PBST (10 mM Na-phosphate, 150 mM NaCl and 0.005% Tween 20, pH 7.4) to different concentrations and then pre-incubated with non-specific DNA for 15 min before passing across the DNA-immobilized surface. The results presented in the sensorgram were converted by BIA evaluation software. Each experiment was repeated three times.
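As a quick consistency check on the duplex probes listed above, the following Python sketch confirms that each reverse strand is the reverse complement of its forward strand and locates the single position at which the A and G probes differ. The helper name `reverse_complement` and the dictionary layout are illustrative choices, not part of the original protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Probe sequences copied from the text (5' to 3').
PROBES = {
    "A": ("CAGCCACTCACTATGTGCTCATCTC", "GAGATGAGCACATAGTGAGTGGCTG"),
    "G": ("CAGCCACTCACTGTGTGCTCATCTC", "GAGATGAGCACACAGTGAGTGGCTG"),
}

for allele, (forward, reverse) in PROBES.items():
    # A correctly designed duplex pairs perfectly along its whole length.
    assert reverse_complement(forward) == reverse, allele

# 0-based positions at which the two forward probes differ.
mismatches = [
    i for i, (a, g) in enumerate(zip(PROBES["A"][0], PROBES["G"][0])) if a != g
]
print("Variant position(s) in the forward probe:", mismatches)
```

The two probes differ at one central base only, so any difference in binding between them can be attributed to the rs1550117 allele.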

ChIP assays
The ChIP assays were performed using the EZ ChIP Kit (Upstate, Lake Placid, NY, USA). First, Hek293 cells and two NSCLC tissue samples were crosslinked with 1% formaldehyde for 10 min. DNA was then sheared by sonication into fragments of 200 to 1000 bp in length. The sheared chromatin was immunoprecipitated by incubation with antibodies against SP1 or non-specific rabbit IgG (Santa Cruz Biotechnology, Santa Cruz, CA) overnight at 4°C. The immunoprecipitated DNA fragments were detected by PCR, and the primers used were: 5′-CACCGCCCTCACCCCATCA-3′ (forward) and 5′-TGCCCAGCCGCAAGTCCTA-3′ (reverse).

Statistical analysis
The χ² test was used to compare the differences in age, gender, smoking status and alcohol status between NSCLC patients and normal controls. Genotypic frequencies of the rs1550117 A>G variant were tested for departure from Hardy-Weinberg equilibrium (HWE) using the χ² test. To evaluate the association between the rs1550117 A>G variant and NSCLC risk, ORs and 95% confidence intervals (CIs) were calculated by unconditional logistic regression analysis with adjustments for age, sex, smoking status and alcohol status. Other differences were evaluated using Student's t-test. Data were expressed as means and standard deviations (SD) from at least three independent experiments. All statistical tests were two-tailed with P < 0.05 set as the significance level and were performed using SPSS 15.0 software (SPSS, Chicago, IL, USA).
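For readers who wish to reproduce the HWE check outside SPSS, a minimal Python sketch of the χ² goodness-of-fit test with one degree of freedom is given below. The genotype counts are hypothetical, not the study's data, and the function name `hwe_chi_square` is illustrative; the p value uses the identity that the upper tail of a χ² distribution with 1 df equals erfc(√(χ²/2)).

```python
import math


def hwe_chi_square(n_gg, n_ga, n_aa):
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium."""
    n = n_gg + n_ga + n_aa
    p = (2 * n_gg + n_ga) / (2 * n)  # frequency of the G allele
    q = 1.0 - p                      # frequency of the A allele
    observed = (n_gg, n_ga, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for 1000 controls (NOT study data).
chi2_eq, p_eq = hwe_chi_square(640, 320, 40)      # exactly in equilibrium
chi2_dev, p_dev = hwe_chi_square(500, 200, 300)   # strong heterozygote deficit
print(f"in equilibrium: chi2 = {chi2_eq:.3f}, p = {p_eq:.3f}")
print(f"deviating:      chi2 = {chi2_dev:.1f}, p = {p_dev:.3g}")
```

A p value above 0.05 among controls, as reported for this study, gives no evidence of departure from equilibrium.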