The breast cancer susceptibility-related polymorphisms at the TOX3/LOC643714 locus associated with lung cancer risk in a Han Chinese population

It is well established that, besides environmental factors, genetic factors are associated with lung cancer risk. However, the genetic variants and loci identified to date explain only a small fraction of the familial risk of lung cancer. It is therefore vital to investigate the remaining missing heritability in order to understand the development and progression of lung cancer. In this study, to test our hypothesis that the previously identified breast cancer risk-associated genetic polymorphisms at the TOX3/LOC643714 locus might contribute to lung cancer risk, 16 SNPs at the TOX3/LOC643714 locus were evaluated in a case-control study of a Han Chinese population. Pearson's chi-square test or Fisher's exact test revealed that rs9933638, rs12443621, and rs3104746 were significantly associated with lung cancer risk (P < 0.001, P < 0.001, and P = 0.005, respectively). Logistic regression analyses showed that the lung cancer risk of individuals with rs9933638(GG+GA) was 1.89 times that of rs9933638AA carriers (OR = 1.893, 95% CI = 1.308-2.741, P = 0.001). Similar findings were obtained for rs12443621 (OR = 1.824, 95% CI = 1.272-2.616, P = 0.001, rs12443621(GG+GA) carriers vs. rs12443621AA carriers) and rs3104746 (OR = 1.665, 95% CI = 1.243-2.230, P = 0.001, rs3104746TT carriers vs. rs3104746(TA+AA) carriers). The study showed for the first time that three SNPs (rs9933638, rs12443621, and rs3104746) at the TOX3/LOC643714 locus contribute to lung cancer risk, providing new evidence that lung cancer and breast cancer are linked at the molecular and genetic level to a certain extent.


INTRODUCTION
Lung cancer is a major public health concern worldwide, causing as many deaths as the next four most deadly cancers combined (breast, prostate, colon, and pancreas). Non-small cell lung cancer (NSCLC) is the most common lung cancer histology, comprising primarily adenocarcinoma and squamous cell carcinoma [1]. In China, lung cancer has replaced liver cancer as the leading cause of cancer-related death and accounted for 29% of all male cancer deaths and 23% of all female cancer deaths, totaling approximately 220,5200 deaths, in 2014 (World Health Statistics, WHO, 2014). It has been established that multiple environmental factors (mainly cigarette smoking and asbestos) and genetic factors are involved in the development and progression of lung cancer, and that gene-environment interactions exist [2][3][4].
In the past few years, several genetic variants and loci have been identified as genetic risk factors for lung cancer [5][6][7][8][9]. However, to date, these known common loci explain only a small fraction of the familial risk of lung cancer. It is therefore vital to investigate the remaining missing heritability in order to understand the development and progression of lung cancer. The TOX3/LOC643714 locus on chromosome 16q12.1 was one of the first breast cancer susceptibility regions identified through genome-wide association study (GWAS) in populations of European and East Asian origin [10]. The chromosomal region spanning the 5' end of TOX3, the intergenic region between TOX3 and LOC643714, and the entire coding part of LOC643714 is located in a 133 kb linkage disequilibrium (LD) block [11]. LOC643714 is an uncharacterized gene of unknown function [http://www.ncbi.nlm.nih.gov/gene/643714]. First identified in a screen for transcripts containing trinucleotide repeat expansions, the TOX3 gene, also termed trinucleotide repeat containing 9 (TNRC9), belongs to the high-mobility group (HMG) family of nonhistone chromatin proteins, indicating a potential role as a transcription factor [12] and an involvement in the bending and unwinding of DNA and the alteration of chromatin structure [13]. In normal human tissues, TOX3 is expressed mainly in the frontal and occipital lobes of the brain, the central nervous system (CNS), and the ileum. Through interaction with the cAMP-response-element-binding protein (CREB), TOX3 regulates Ca2+-dependent neuronal transcription [14]. Overexpression of TOX3 induces transcription involving isolated estrogen-responsive elements and estrogen-responsive promoters, and protects neuronal cells from cell death caused by endoplasmic reticulum stress or BCL2-Associated X Protein (BAX) overexpression through the induction of anti-apoptotic transcripts and the repression of pro-apoptotic transcripts [15].
Although the known etiology and carcinogenesis of lung cancer differ from those of breast cancer, patients with either disease can be treated with some of the same chemotherapeutic agents, such as taxanes (paclitaxel and Taxotere), vinorelbine (Navelbine, NVB), and platinum-containing anticancer drugs (cisplatin and carboplatin). Most importantly and interestingly, a recent meta-analysis of four lung cancer GWAS in populations of European ancestry, namely the MD Anderson Cancer Center (MDACC) GWAS, the Institute of Cancer Research (ICR) GWAS, the National Cancer Institute (NCI) GWAS, and the International Agency for Research on Cancer (IARC) GWAS, linked a rare variant of the BRCA2 gene, a well-known risk factor for breast, ovarian, and aggressive prostate cancers, to an increased risk of squamous cell lung cancer among cigarette smokers, suggesting that lung cancer and breast cancer are linked at a molecular and genetic level to a certain extent [44].
Given the evidence of a molecular and genetic link between lung cancer and breast cancer and the potential involvement of TOX3 in the bending and unwinding of DNA and the alteration of chromatin structure, we hypothesized that the previously identified breast cancer susceptibility-associated variants and loci at the TOX3/LOC643714 locus may contribute to lung cancer risk. To test the hypothesis, 16 SNPs at the TOX3/LOC643714 locus were selected and genotyped in a case-control study of a Han Chinese population from Southwestern China. The genotyping data demonstrated that three SNPs (rs9933638, rs12443621, and rs3104746) at the TOX3/LOC643714 locus were associated with an elevated risk of lung cancer and might be biologically relevant to lung carcinogenesis.

Subject characteristics
In total, 352 unrelated patients and 407 unrelated controls were recruited from Southwestern China for the case-control study. There were no female cigarette smokers among the subjects. The general characteristics of the study population are given in Table 1. The median number of pack-years of the combined cases and controls was used as the cut-point to stratify the cigarette-smoking subjects. As shown in Table 1, there was no significant difference in gender or age between the controls and cases. As expected, cases smoked more cigarettes (P < 0.001). The distribution of tumour types among the patients was as follows: adenocarcinoma, 42.05%; squamous cell carcinoma, 28.13%; other non-small cell carcinoma, 16.19%; and small cell carcinoma, 13.64%.
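The smoking stratification described above can be illustrated with a minimal Python sketch. It uses the standard definition of a pack-year (20 cigarettes per day for one year). The smoker records are hypothetical, the helper names `pack_years` and `stratify_by_median` are ours, and assigning values equal to the median to the lighter stratum is an assumption that the study does not specify.

```python
from statistics import median


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Cumulative exposure: 1 pack-year = 20 cigarettes per day for 1 year."""
    return cigarettes_per_day / 20.0 * years_smoked


def stratify_by_median(values):
    """Return the median cut-point and a 'light'/'heavy' label per value.

    Assumption: values equal to the median fall in the 'light' stratum.
    """
    cut = median(values)
    return cut, ["heavy" if v > cut else "light" for v in values]


# Hypothetical smokers (cases and controls pooled):
# (cigarettes per day, years smoked)
smokers = [(10, 20), (20, 30), (40, 25), (15, 10), (20, 40)]
exposure = [pack_years(c, y) for c, y in smokers]
cut_point, labels = stratify_by_median(exposure)
print(cut_point, labels)
```

Pooling cases and controls before taking the median, as the study does, keeps the cut-point identical for both groups.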

Association of the alleles of the 16 SNPs with lung cancer risk
The basic information regarding the 16 SNPs at the TOX3/LOC643714 locus is given in Supplementary Table S3. The 16 SNPs were genotyped in all of the lung cancer patients and healthy controls, and all conformed to Hardy-Weinberg equilibrium (HWE) in the study population (Supplementary Table S3). As shown in Table 2, Pearson's chi-square test or Fisher's exact test demonstrated that, of the 16 SNPs, three (rs9933638, rs12443621, and rs3104746) were significantly associated with lung cancer risk (P < 0.001, P < 0.001, and P = 0.005, respectively) and rs3095661 displayed marginal significance (P = 0.041) (Table 2). Among the four SNPs, rs9933638, representing a haplotype block covering 12 SNPs including rs12443621, is located in intron 2 of TOX3. Rs3095661 and rs3104746 are located in intron 4 of TOX3 and intron 2 of LOC643714, respectively (Supplementary Table S3).
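The HWE check mentioned above can be illustrated with a one-degree-of-freedom chi-square goodness-of-fit test. The Python sketch below is only an illustration: the genotype counts are hypothetical, the function name `hwe_chi_square` is ours, and we do not claim that this is the exact procedure implemented in SNPStats.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 df) of observed genotype counts against HWE.

    Assumes both alleles are present (no expected count equals zero).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts (AA, AB, BB)
chi2, p_value = hwe_chi_square(180, 170, 57)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A SNP is usually retained when the test in controls is non-significant, that is, when the observed genotype counts do not deviate from the proportions p², 2pq, and q².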
Stratification by gender demonstrated that rs9933638, rs12443621, and rs3104746 were significantly associated with lung cancer risk in both male (P < 0.001, P = 0.003, and P = 0.038, respectively) and female subjects (P = 0.007, P = 0.005, and P = 0.049, respectively). Additionally, separate analyses of the 16 SNPs among patients with adenocarcinoma and squamous cell carcinoma, the two most common types of NSCLC, revealed that rs9933638, rs12443621, and rs3104746 were significantly associated with the risk of both adenocarcinoma (P < 0.001, P < 0.001, and P = 0.047, respectively) and squamous cell carcinoma (P = 0.006, P = 0.020, and P = 0.008, respectively), while rs3095661 was related only to the risk of adenocarcinoma (P = 0.009) (Table 2).

Association of the genotypes of the four SNPs (rs9933638, rs12443621, rs3104746, and rs3095661) with lung cancer risk
As shown in Table 3, consistent with the association between the alleles of the three SNPs (rs9933638, rs12443621, and rs3104746) and lung cancer risk, the distribution of the genotypes of the three SNPs differed significantly between lung cancer cases and controls (P < 0.001, P < 0.001, and P = 0.003, respectively). Multivariate logistic regression analyses adjusted for age, gender, and smoking revealed that individuals with rs9933638GG had an elevated risk of lung cancer compared with rs9933638GA and rs9933638AA carriers (codominant model, OR = 2.571, 95%CI = 1.710-3.867, P < 0.001, and OR = 1.509, 95%CI = 1.022-2.229, P = 0.038, respectively). The dichotomized analysis further demonstrated that individuals with rs9933638GG had an increased risk of lung cancer compared with rs9933638(GA+AA) carriers (dominant model, OR = 1.877, 95%CI = 1.423-2.476, P < 0.001). Additionally, individuals with rs9933638(GG+GA) also had an increased risk of lung cancer compared with rs9933638AA carriers (recessive model, OR = 1.893, 95%CI = 1.308-2.741, P = 0.001), suggesting that individuals carrying the G allele of rs9933638 were susceptible to lung cancer in a dose-dependent manner. Similar findings were obtained for rs12443621. For rs3104746, the dichotomized analysis demonstrated that rs3104746TT carriers had an increased risk of lung cancer compared with individuals with rs3104746(TA+AA) (dominant model, OR = 1.665, 95%CI = 1.243-2.230, P = 0.001).
Moreover, although a marginally significant difference in rs3095661 alleles was found between controls and cases (P = 0.041, Table 2), no significant difference in rs3095661 genotypes was found between controls and cases in any of the four models (P = 0.362, P = 0.066, P = 0.999, and P = 0.197, respectively, Table 3). Notably, the detection of rs3095661CC carriers only in lung cancer patients (6/352, 1.70%) and not in controls suggested that rs3095661CC might be a risk factor for lung cancer.
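As a consistency check on the odds ratios reported in this section, the standard error of log(OR) can be recovered from a 95% confidence interval, since a Wald interval is symmetric on the log scale. The Python sketch below takes only the OR and CI values quoted above as input. The function name `wald_from_ci` and the row labels are ours, and the resulting P values are normal-approximation estimates, not the values produced by the original SPSS analysis.

```python
import math


def wald_from_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Recover SE(log OR), Wald z, two-sided P, and the CI midpoint."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    midpoint = math.sqrt(lower * upper)     # geometric mean of the bounds
    return se, z, p, midpoint


# (OR, lower 95% bound, upper 95% bound) as quoted in the text
reported = {
    "rs9933638 GG vs. GA": (2.571, 1.710, 3.867),
    "rs9933638 GG vs. AA": (1.509, 1.022, 2.229),
    "rs9933638 GG vs. GA+AA": (1.877, 1.423, 2.476),
    "rs9933638 GG+GA vs. AA": (1.893, 1.308, 2.741),
    "rs3104746 TT vs. TA+AA": (1.665, 1.243, 2.230),
}

for label, (or_, lo, hi) in reported.items():
    se, z, p, mid = wald_from_ci(or_, lo, hi)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, P ~ {p:.4f}, midpoint = {mid:.3f}")
```

If an interval is a Wald interval, the geometric mean of its bounds should reproduce the reported OR up to rounding, which gives a quick way to spot transcription errors in tabulated results.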
Distribution of rs9933638, rs12443621, and rs3104746 among lung cancer patients stratified by cigarette smoking
Because rs9933638, rs12443621, and rs3104746 were found to be associated with lung cancer risk, the

DISCUSSION
Lung cancer is one of the major causes of cancer-related death worldwide. Besides environmental factors, inherited genetic variants or polymorphisms are also involved in lung cancer risk. In this case-control study, genotyping of 16 SNPs at the TOX3/LOC643714 locus in a Han Chinese population revealed that rs9933638/rs12443621 and rs3104746 might contribute to the risk of lung cancer. To the best of our knowledge, this study shows for the first time that the previously identified breast cancer susceptibility-associated SNPs at the TOX3/LOC643714 locus are risk factors for lung cancer.
The TOX3/LOC643714 locus was one of the first breast cancer susceptibility regions identified through GWAS in populations of European and East Asian origin [10]. In recent years, a number of SNPs at the TOX3/LOC643714 locus, including rs3803662, rs3104746, rs8051542, rs4784227, rs12443621, rs3112612, rs3112562, rs3104793, rs8046994, rs3104788, and rs3104767, have been shown to be independently associated with an elevated risk of breast cancer and other human diseases. Of the SNPs mentioned above, only rs12443621 and rs3104746 were found to be significantly associated with an increased risk of lung cancer in this study. Initially, Pearson's chi-square test or Fisher's exact test revealed that rs9933638 was strongly associated with lung cancer risk (Table 2). This was confirmed by logistic regression analyses, in which the lung cancer risk of individuals with rs9933638GG was shown to be 2.57 and 1.51 times that of individuals with rs9933638GA and rs9933638AA, respectively. In addition, the lung cancer risk of rs9933638(GG+GA) carriers was 1.89 times that of rs9933638AA carriers (Table 3). The findings suggested that individuals carrying the G allele of rs9933638 were susceptible to lung cancer in a dose-dependent manner. To validate the findings, rs12443621, previously identified as a breast cancer risk factor in the Chinese population [25,28,37] and covered by the haplotype block represented by rs9933638, was genotyped in the study. The genotyping data showed that the G allele of rs12443621 was also a risk factor for lung cancer (Table 2), as verified by the stratification analysis, which revealed that subjects with rs12443621GG and rs12443621GA had an increased risk of lung cancer compared with individuals with rs12443621AA (Table 3). The validation results confirmed that rs9933638, representing a haplotype block covering rs12443621, was associated with the risk of lung cancer.
Most importantly, the finding that rs12443621, previously identified as a breast cancer risk factor in the Chinese population [25,28,37], is also a lung cancer risk factor in the Chinese population provides new evidence that lung cancer and breast cancer are linked at a molecular and genetic level, at least in part, in the Chinese population, which may help in exploring novel mechanisms of carcinogenesis.
Rs3803662 was reported to be associated with breast cancer risk in both male and female subjects [13,16-20]. Consistent with the findings for rs3803662, stratification by gender revealed that all three SNPs (rs9933638, rs12443621, and rs3104746) were associated with an increased lung cancer risk in both male and female individuals. Furthermore, all three SNPs were related to an elevated risk of both adenocarcinoma and squamous cell carcinoma. The findings suggested that the association of the three SNPs with lung cancer risk was independent of gender and histology. Moreover, the analysis of the three
In summary, the present study shows for the first time that rs9933638/rs12443621 and rs3104746, previously identified breast cancer susceptibility-related SNPs at the TOX3/LOC643714 locus, contribute to an individual's risk of lung cancer in the Southwestern Han Chinese population. The findings provide additional evidence that lung cancer and breast cancer are correlated at a molecular and genetic level, at least in part, in the Chinese population.

Study population
Patients (n = 352) with primary lung cancer diagnosed from September 2007 to December 2008 were recruited from the Institute of Human Respiratory Disease of Xinqiao Hospital, the Third Military Medical University. All patients were newly diagnosed, histologically confirmed, and previously untreated. A total of 407 healthy controls, frequency-matched to the cases by age and sex, were recruited at the Centre of Physical Examination of Xinqiao Hospital between November 2007 and December 2008. The exclusion criterion for the control group was any history of cancer. All of the subjects were unrelated within at least three generations. After the purpose and procedures of the study had been explained, all participants signed a written informed consent form, completed a detailed questionnaire regarding their smoking habits, and donated 5 ml of peripheral blood. Blood samples were drawn into Na-EDTA tubes and stored at -70°C until genomic DNA extraction. The study was approved by the Ethical Committee of Xinqiao Hospital, the Third Military Medical University.

Selection of SNPs
In total, 16 SNPs at the TOX3/LOC643714 locus were selected for the study. Six of the 16 SNPs (rs8051542, rs12443621, rs3803662, rs3104746, rs3112562, and rs4784227) were selected from published reports suggesting that they confer susceptibility to breast cancer or other human diseases. The other 10 SNPs were selected from the genetic variation data for the TOX3 gene obtained from the HapMap project for 45 healthy Han Chinese in Beijing (CHB) adults (www.hapmap.org). Haplotype blocks, representing regions inherited without substantial recombination in the ancestors of the current population, were constructed throughout the entire TOX3 gene using Haploview (version 4.0, Broad Institute of MIT and Harvard, Cambridge, MA) [45]. The history of recombination between a pair of SNPs can be estimated using the normalized measure of allelic association D' (the value of D prime between the two loci) [46,47]. The criterion for including SNPs in a haplotype block was that all SNPs in the region must be in strong LD, with D' > 0.98 for the upper 95% confidence bound and > 0.7 for the lower bound. A maximally informative htSNP was then selected from each block using the Tagger program (http://www.broad.mit.edu/mpg/haploview). This algorithm selects a subset of variants that captures all known common genetic variation in the TOX3 gene based on an LD threshold of r² ≥ 0.8. The inverse of r² represents the ratio of the sample size needed to detect an indirect association with an un-analyzed SNP to that needed to detect a direct association at the same power.
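The LD measures used in the selection procedure above can be illustrated with a short Python sketch that computes D, D', and r² for a pair of biallelic loci from a haplotype frequency and the two allele frequencies, together with the sample-size inflation factor 1/r². The input frequencies are hypothetical, the function names are ours, and the sketch does not reproduce the confidence bounds on D' that Haploview estimates.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


def sample_size_inflation(r2: float) -> float:
    """Factor by which the sample size grows when a tag SNP is tested
    in place of the un-analyzed SNP, at equal power (1 / r^2)."""
    return 1.0 / r2


# Hypothetical frequencies for a pair of SNPs
d, d_prime, r2 = ld_measures(p_ab=0.28, p_a=0.30, p_b=0.35)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r2 = {r2:.3f}")
print(f"inflation at r2 = 0.8: {sample_size_inflation(0.8):.2f}")
```

With the threshold r² ≥ 0.8 used here, an indirect test through a tag SNP needs at most 1.25 times the sample size of a direct test.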

Data analyses
Cigarette smoking was stratified by the median number of pack-years of the combined cases and controls (1 pack-year = 20 cigarettes per day for 1 year). Cases and controls were compared using Student's t-test for continuous variables and Pearson's chi-square test or Fisher's exact test for categorical variables. The Hardy-Weinberg equilibrium of each SNP was tested with SNPStats (http://bioinfo.iconcologia.net/snpstats/start.htm). Four genetic models were evaluated: the codominant model (major allele homozygotes vs. heterozygotes vs. minor allele homozygotes), the dominant model (major allele homozygotes vs. heterozygotes + minor allele homozygotes), the recessive model (major allele homozygotes + heterozygotes vs. minor allele homozygotes), and the overdominant model (major allele homozygotes + minor allele homozygotes vs. heterozygotes). To assess the independent effect of each SNP, multivariate logistic regression analyses adjusted for possible confounding factors (age, gender, and smoking habits) were performed to estimate the association between the SNPs and cancer risk as well as possible gene-environment interactions. All associations are presented as odds ratios (ORs) with the corresponding 95% confidence intervals (95%CI). All statistical analyses were performed using the Statistical Package for Social Science 15 for Windows (SPSS Inc, Chicago, IL, USA). All statistical tests were two-sided, and P < 0.05 was considered significant.
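The genotype groupings defined above can be illustrated with a minimal Python sketch that collapses genotype counts according to the dichotomous models and computes a crude odds ratio with a Wald 95% confidence interval. This is not the adjusted multivariate logistic regression performed in SPSS: the estimates are unadjusted, the genotype counts are hypothetical, and the names `MODELS` and `odds_ratio_ci` are ours. The codominant model compares the three genotypes separately and is therefore not collapsed here.

```python
import math

# Genotype keys: "MM" = major-allele homozygote, "Mm" = heterozygote,
# "mm" = minor-allele homozygote. Each model returns (group 1, group 2),
# following the model definitions given in the text.
MODELS = {
    "dominant": lambda c: (c["MM"], c["Mm"] + c["mm"]),
    "recessive": lambda c: (c["MM"] + c["Mm"], c["mm"]),
    "overdominant": lambda c: (c["MM"] + c["mm"], c["Mm"]),
}


def odds_ratio_ci(case_groups, control_groups, z_crit=1.96):
    """Crude OR (group 1 vs. group 2) with a Wald confidence interval.

    Assumes all four cell counts are non-zero.
    """
    a, b = case_groups       # cases in group 1, group 2
    c, d = control_groups    # controls in group 1, group 2
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z_crit * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z_crit * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical genotype counts
cases = {"MM": 60, "Mm": 30, "mm": 10}
controls = {"MM": 40, "Mm": 40, "mm": 20}

results = {
    name: odds_ratio_ci(split(cases), split(controls))
    for name, split in MODELS.items()
}
for name, (or_, lo, hi) in results.items():
    print(f"{name}: OR = {or_:.3f}, 95% CI = {lo:.3f}-{hi:.3f}")
```

An adjusted estimate would instead enter the collapsed genotype indicator into a logistic regression together with age, gender, and smoking, as described above.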