PADI4 rs2240337 G>A polymorphism is associated with susceptibility of esophageal squamous cell carcinoma in a Chinese population

Background: Esophageal cancer (EC) remains one of the major causes of cancer incidence and mortality worldwide. Genetic factors, such as single nucleotide polymorphisms (SNPs), may contribute to the carcinogenesis of EC.
Methods: We conducted a hospital-based case-control study to evaluate the association between SNPs and genetic susceptibility to EC. A total of 629 esophageal squamous cell carcinoma (ESCC) cases and 686 controls were enrolled. Seven PADI4 SNPs were genotyped by the ligation detection reaction method.
Results: The PADI4 rs2240337 GA/AA variants were significantly associated with a decreased risk of ESCC. The PADI4 haplotypes A rs2477137 C rs1886302 G rs11203366 G rs16825533 G rs2240337 A rs1635564 A rs1635562 and C rs2477137 T rs1886302 G rs11203366 A rs1635564 G rs2240337 C rs1635564 T rs1635562 were correlated with decreased susceptibility to ESCC, while C rs2477137 T rs1886302 A rs11203366 A rs1635564 G rs2240337 A rs1635564 A rs1635562 was correlated with increased susceptibility to ESCC. Stratification analyses demonstrated that smoking significantly increased ESCC risk in carriers of the PADI4 rs11203366 AG/AA, rs1886302 CC/CT, rs1635562 AT, rs1635564 CA and rs2477137 AC genotypes. Alcohol drinking increased ESCC risk in carriers of the PADI4 rs11203366 AG, rs1635562 AT, rs1635564 CA, rs2477137 AC and rs1886302 CT genotypes. In the younger cohort (<63 years), the rs11203366 AA genotype was associated with an increased risk of ESCC. The PADI4 rs1886302 CC variant was associated with ESCC susceptibility in the female cohort.
Conclusions: Our study suggests that the PADI4 rs2240337 G>A polymorphism may be correlated with individual susceptibility to ESCC. The PADI4 rs11203366, rs1886302, rs1635562, rs1635564 and rs2477137 polymorphisms were associated with altered susceptibility to ESCC depending on sex, age, smoking status and alcohol consumption. However, larger studies in different ethnic populations and further experiments using genetically modified cells or animals are warranted to verify our conclusions.


INTRODUCTION
Esophageal cancer is one of the most common cancers worldwide, and mortality is high when the disease is diagnosed after the onset of symptoms [1]. Cancer of the esophagus occurs in two major histological forms, esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC). ESCC predominates in most parts of the world, especially in high-risk areas such as China, where it accounts for about 90% of all esophageal cancer cases [2,3]. Smoking and alcohol consumption are implicated in more than 90% of ESCC cases in Western countries [4,5], but they play a less important role in China. The risk factors for ESCC in China include poor nutrition, low intake of fruit and vegetables, consumption of hot beverages, and opium use [3,6].
Peptidylarginine deiminase IV (PADI4, also known as PAD4) converts arginine residues in histone tails to citrulline [7]. PADI4 has been demonstrated to co-localize with cytokeratin, an intermediate filament protein that plays a role in cell differentiation and apoptosis [8][9][10]. In cancer, high PADI4 expression has been linked to tumor growth [11], as PADI4 is overexpressed in numerous malignant cancers but not in healthy tissues [8]. A recent immunohistochemistry study further confirmed significant PADI4 expression in various malignancies, including esophageal squamous cancer cells [12]. Consistently, the PADI4 level in the blood increased dramatically in patients with various malignant tumors, but declined considerably after tumor excision surgery [12]. Notably, PADI4 can disrupt the apoptotic process via the citrullination of histone H3 in the promoters of p53 target genes [13]. Therefore, we postulated that PADI4 might play an important role in the carcinogenesis of esophageal cancer.
Single nucleotide polymorphisms (SNPs) account for more than 90% of genetic variation. Although the evidence described above indicates a correlation between PADI4 and ESCC, few molecular epidemiological studies have explored the relationship between PADI4 SNPs and susceptibility to ESCC, and their results have been inconsistent [13]. In a small cohort of esophageal cancer patients (including ESCC and EAC), PADI4 rs10437048 and rs41265997 were found to be significantly associated with the risk of esophageal cancer [13]. To specifically examine the potential associations between genetic variants in PADI4 and ESCC risk, we applied a tagging SNP strategy in a larger cohort of 629 ESCC cases and 686 controls.

RESULTS
Characteristics of the study population
Characteristics of the cases and controls included in the study are summarized in Table 1. The cases and controls appeared to be adequately matched on age and sex, as suggested by the χ² test. As shown in Table 1, smoking status differed significantly between the cases and the controls (p<0.001), and the drinking rate was higher in ESCC patients than in control subjects (p<0.001).

Associations between PADI4 tagging polymorphisms and risk of ESCC
The seven tagging SNPs were selected on the basis of their pairwise linkage disequilibrium (LD), with an r² threshold of 0.8 and a minor allele frequency (MAF) ≥0.05, to capture all the common SNPs. Among eligible SNPs, linkage disequilibrium analysis was performed in the Chinese Han population (https://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/), and the SNP loci with moderate correlation were chosen for further analyses. The LD structure across the PADI4 genomic region is presented in Figure 1, in which three blocks were defined. Next, we applied the "block-based" method, which exploits the principle of linkage disequilibrium observed within haplotype blocks, to search for tag SNPs. Several algorithms have been devised to partition chromosomal regions into haplotype blocks based on haplotype diversity, LD, the four-gamete test and information complexity. We then used an online database to predict the function of the SNPs (http://www.regulomedb.org/) and selected seven tag SNPs for analysis (see Figure 1).
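The pairwise r² statistic used as the tagging threshold can be illustrated with a minimal sketch. It assumes phased haplotype counts for two biallelic loci are available; the function name and the counts are hypothetical and are not taken from the study data, and the actual selection in this study was done with Haploview.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise LD r^2 from phased haplotype counts at two biallelic loci.

    n_AB, n_Ab, n_aB, n_ab are counts of the four possible haplotypes
    (hypothetical inputs for illustration only).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total              # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total      # allele A frequency at locus 1
    p_B = (n_AB + n_aB) / total      # allele B frequency at locus 2
    d = p_AB - p_A * p_B             # LD coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Two SNPs would be treated as tagging each other when r^2 >= 0.8.
R2_THRESHOLD = 0.8
r2 = ld_r_squared(480, 20, 30, 470)  # hypothetical counts
print(f"r^2 = {r2:.3f}, tagged together: {r2 >= R2_THRESHOLD}")
```

In this sketch a pair of SNPs with r² at or above the threshold is considered redundant, so only one of them needs to be genotyped.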
As shown in Table 2, the genotyping success rates ranged from 95.13% to 98.47%. In the control subjects, the genotype frequencies for these seven polymorphisms were in Hardy-Weinberg equilibrium (p-value for HWE, all p>0.05). The minor allele frequency (MAF) in our controls was comparable with that of the Chinese cohort in the database for all seven SNP loci.
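The Hardy-Weinberg check in controls can be sketched as a one-degree-of-freedom goodness-of-fit χ² test. The function name and genotype counts below are hypothetical and serve only to illustrate the calculation; they are not values from Table 2.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed genotype counts (major homozygote,
    heterozygote, minor homozygote). Returns (chi2, p_value) with 1 df.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("HWE test is undefined for a monomorphic locus")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p_value = hwe_chi_square(590, 85, 5)  # hypothetical control counts
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A p-value above 0.05 in controls, as reported for all seven SNPs, indicates no detectable departure from equilibrium.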
The genotype distributions of PADI4 SNPs in the cases and the controls are shown in Table 3. When the PADI4 rs2240337 G>A SNP GG homozygote genotype (AA) was used as the reference group, both the GA heterozygote genotype (AB) and the AA mutated homozygote genotype (BB) were associated with a significantly decreased risk of ESCC (AB vs. AA: adjusted OR = 0.52, 95% CI = 0.39-0.71, p<0.0001; BB vs. AA: adjusted OR = 0.30, 95% CI = 0.13-0.68, p = 0.004). Logistic regression analyses

Associations between PADI4 rs2240337 polymorphism and pathologic character of ESCC
Furthermore, we analyzed the correlation between the PADI4 rs2240337 G>A SNP and clinicopathologic status. However, the PADI4 rs2240337 G>A SNP did not correlate with clinical tumor stage (p = 0.215) or grade (p = 0.497) (Table 4).

Stratification analyses of seven polymorphisms and risk of ESCC
To further evaluate the effects of these seven SNPs on the risk of ESCC according to age, gender, smoking and alcohol drinking status, stratification analyses were performed as shown in Tables 5-11. We showed that smoking significantly increased ESCC risk in carriers of the PADI4 rs11203366 AG/AA, rs1886302 CC/CT, rs1635562 AT, rs1635564 CA, rs2240337 AG and rs2477137 AC genotypes. Alcohol drinking increased ESCC risk in carriers of the PADI4 rs11203366 AG, rs1635562 AT, rs1635564 CA, rs2477137 AC and rs1886302 CT genotypes. In the younger cohort (<63 years), the PADI4 rs16825533 AG genotype was associated with a decreased risk of ESCC, while the rs11203366 AA genotype was associated with an increased risk of ESCC. In the non-drinking cohort, the PADI4 rs11203366 AA variant was associated with an increased risk of ESCC. The PADI4 rs1886302 CC variant was associated with ESCC susceptibility in the female cohort. In the non-alcohol drinking cohort, the PADI4 rs1886302 CC and CT variants were associated with a decreased risk of ESCC. In the rs1635562 TT subgroup, older people (≥63 years) were more susceptible to ESCC.
In the non-alcohol drinking cohort, the PADI4 rs11203366 AA variant (p=0.020) was associated with an increased risk of ESCC. In the dominant model (p=0.023), PADI4 rs11203366 A>G was associated with an increased risk of ESCC. In the PADI4 rs11203366 AG subgroup, alcohol drinking significantly increased the risk of ESCC (p_h=0.013).

Linkage disequilibrium analyses and association test
Linkage disequilibrium analyses were conducted in both controls and cases, as shown in Tables 12-13; correlations were observed among the seven loci. An association test was performed using Haploview software (v 4.2), and associations were found among the seven loci (Figure 2).

Haplotype analyses of PADI4 polymorphisms and susceptibility to ESCC
As shown in Table 14, haplotype analyses showed that PADI4 C rs2477137 T rs1886302 A rs11203366 A rs1635564 G rs2240337 C rs1635564 A rs1635562 was the most common haplotype in both groups (24.5% in controls, 25.5% in cases). The frequencies of the haplotypes PADI4 A rs2477137 C rs1886302 G rs11203366 G rs16825533 G rs2240337 A rs1635564 A rs1635562 and PADI4 C rs2477137 T rs1886302 G rs11203366 A rs1635564 G rs2240337 C rs1635564 T rs1635562 were significantly lower in ESCC cases than in controls (0.019 vs. 0.036, p=0.007; 0.019 vs. 0.031, p=0.038, respectively), suggesting that both PADI4 A rs247

Power calculation
The power calculation was performed with the "Power and Sample Size Calculation" software (http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize). The type I error probability for a two-sided test (α) was assumed to be 0.05, and the probability of exposure in controls (p0) was 0.0698 for rs2240337 in the Chinese Han population according to the NCBI project. In the current study, using the ligation detection reaction method, the genotyping success rates all exceeded 95%. There were 1,200 alleles successfully genotyped. The ratio of controls to cases (m) was 1.085, and the correlation coefficient for exposure between matched cases and controls (f) was 2.058 for rs2240337. The resulting power was 1.000.
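For readers who want to explore how power depends on these inputs, the following is a minimal sketch using a normal approximation for comparing two independent proportions. It is not the algorithm implemented in the "Power and Sample Size Calculation" software and is not expected to reproduce the value reported above; the function names are hypothetical and the odds ratio of 0.5 is an assumed effect size chosen only for illustration. The exposure probability in controls and the group sizes are taken from the text.

```python
import math
from statistics import NormalDist


def exposure_in_cases(p0, odds_ratio):
    """Exposure probability in cases implied by p0 in controls and an odds ratio."""
    return odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))


def approx_power(p0, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate two-sided power for an unmatched case-control comparison.

    Uses the normal approximation for the difference of two proportions
    with unpooled variance (an assumption of this sketch).
    """
    p1 = exposure_in_cases(p0, odds_ratio)
    se = math.sqrt(p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls)
    z_effect = abs(p1 - p0) / se
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    # Probability of rejecting in either tail.
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


# p0, 629 cases and 686 controls come from the text; OR = 0.5 is hypothetical.
power = approx_power(p0=0.0698, odds_ratio=0.5, n_cases=629, n_controls=686)
print(f"approximate power: {power:.3f}")
```

Under this approximation the power equals α when the odds ratio is 1 and rises as the assumed effect moves away from 1.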

DISCUSSION
In this hospital-based case-control epidemiological study in a Chinese population, we investigated whether tagging SNPs in PADI4 were associated with the risk of developing ESCC. We found that the PADI4 rs2240337
Recently, PADI4 has emerged as a novel transcriptional corepressor [14][15][16]. This enzyme catalyzes the posttranslational modification of arginine residues (to form citrulline) in histones H2A, H3, and H4 at the estrogen-regulated pS2 promoter [15][16][17] and at the promoters of the apoptosis-related genes p21 and OKL38 [14,18], thereby repressing gene transcription. Additionally, the histone deiminating activity of PADI4 has been shown to downregulate the expression of numerous p53-dependent genes, including p21, PUMA, and GADD45 [14,18]. PADI4 is overexpressed in numerous malignant cancers (e.g., breast, metastatic carcinomas, colon, bladder, lung, ovarian, and many others). In parallel, under normal circumstances PADI4 exists as an intracellular protein, but in patients with malignant tumors it can be detected in the plasma [16]. The PADI4 level in blood increased in the presence of a tumor and decreased after tumor excision [12]. These studies support a pathogenic role for PADI4 in carcinogenesis. Furthermore, expression of PADI4 was detected in esophageal cancer but not in normal tissues. Notably, PADI4 levels were positively correlated with the pathological classification of esophageal cancer [13].
In the present study, seven PADI4 gene variations were tested in a Chinese population, and associations between these variations and ESCC outcomes were explored. Of the seven SNPs, rs2240337 G>A was identified as an ESCC susceptibility locus, showing highly significant evidence in both the heterozygote group (p<0.0001) and the homozygote group (p = 0.004). A previous study in a small cohort of patients with EC (83 cases and 67 controls, including ESCC and EAC) has reported that the PADI4   while the association between this SNP and rheumatoid arthritis severity has also been reported [20]. As the sample size in our study was limited, the correlation between rs2240337 and the pathologic character of ESCC was not evident. Further investigation is needed to demonstrate the functional relevance of the rs2240337 polymorphism in ESCC.
Smoking and alcohol drinking are widely acknowledged risk factors for ESCC. This notion is in line with our findings: although PADI4 rs11203366, rs1886302, rs1635562, rs1635564, rs16825533 and rs2477137 were not associated with susceptibility to ESCC, smoking significantly increased ESCC risk in carriers of the PADI4 rs11203366 AG/AA, rs1886302 CC/CT, rs1635562 AT, rs1635564 CA and rs2477137 AC genotypes, while alcohol drinking increased ESCC risk in PADI4 rs11203366 AG, rs1635562 AT, rs1635564 CA,   disequilibrium analyses of PADI4 rs11203366, rs1886302, rs1635562, rs1635564, rs16825533  A rs1635564 A rs1635562 and PADI4 C rs2477137 T rs1886302 G rs11203366 A rs1635564 G rs2240337 C rs1635564 T rs1635562 genetic polymorphism may be correlated with decreased susceptibility to ESCC, while haplotype PADI4 C rs2477137 T rs1886302 A rs11203366 A rs1635564 G rs2240337 A rs1635564 A rs1635562 genetic polymorphism may be correlated with increased susceptibility to ESCC. This indicates that a single-locus polymorphism might not significantly modify susceptibility to cancer, whereas the combined (chain) effect of different loci may have a more profound impact on cancer risk.
Our study provides evidence that the PADI4 rs2240337 G>A polymorphism is associated with altered susceptibility to ESCC. We acknowledge several limitations of this study. First, the study subjects were all recruited from local medical centers within the same area and might not fully represent the general Chinese population, especially given the diversity of regional environmental factors. Second, detailed information regarding cancer metastasis and survival was not available because the follow-up study is still ongoing, which prevented analyses of the impact of these SNPs on ESCC progression and prognosis. Further studies with more loci and larger sample sizes are warranted to elucidate the effect of PADI4 SNPs on ESCC risk. Last but not least, constrained by limited technical support, we have not evaluated the biological function of the SNP in the carcinogenesis of ESCC in the current study. Because rs2240337 is located in an intronic region of the PADI4 gene, overexpressing the wild-type or mutant PADI4 coding sequence would not be informative. We speculate that rs2240337 may cause alternative splicing of PADI4 mRNA, thereby regulating PADI4 protein function. Further studies using a cell or mouse model carrying the rs2240337 G>A mutation are needed to clarify the function of the mutant PADI4.

MATERIALS AND METHODS
Ethical approval of the study protocol
We have complied with the World Medical Association Declaration of Helsinki regarding ethical conduct of research involving human subjects and/or animals. The Review Board of Jiangsu University (Zhenjiang, China) approved this hospital-based case-control study. To be included in the study, all subjects provided written informed consent.

Patients and controls
Between October 2008 and June 2013, 629 subjects with ESCC were consecutively recruited from the Affiliated People's Hospital of Jiangsu University and the Affiliated Hospital of Jiangsu University (Zhenjiang, China). All cases of ESCC were diagnosed pathologically. Patients with a history of cancer, any metastasized cancer, or previous radiotherapy or chemotherapy were excluded. The 686 controls were patients without cancer and were matched to the cases with regard to age (±5 years) and sex. Most of the controls were admitted to the hospitals for the treatment of trauma. They were recruited from the two hospitals mentioned above during the same time period.
Haplotype PADI4 C rs2477137 T rs1886302 G rs11203366 A rs1635564 G rs2240337 C rs1635564 T rs1635562 frequency was significantly lower in ESCC cases as compared with controls (0.019 vs. 0.031, p=0.038), suggesting that haplotype PADI4 C rs2477137 T rs1886302 G rs11203366 A rs1635564 G rs2240337 C rs1635564 T rs1635562 genetic polymorphism may be correlated with a decreased susceptibility to ESCC (OR=0.568, 95% CI: 0.330-0.975).
Trained interviewers used a pre-tested questionnaire to interview each subject in person and obtain demographic data (e.g., age, sex) and information on related risk factors (including tobacco smoking and alcohol consumption). After the interview, 2 mL of venous blood was collected from each subject. Individuals who smoked one cigarette per day for >1 year were defined as "smokers". Subjects who consumed more than three alcoholic drinks a week for >6 months were considered to be "alcohol drinkers".

Isolation of DNA, SNPs selection and genotyping by ligation detection reaction
Blood samples were collected from patients using vacutainers and transferred to tubes lined with ethylenediamine tetra-acetic acid (EDTA). Genomic DNA was isolated from whole blood with the QIAamp DNA Blood Mini Kit (Qiagen, Berlin, Germany) as described [21].
To find tagging SNPs, we used a block-based tagging strategy using Haploview 4.2 software, according to the HapMap database (http://www.hapmap.org/, phase II Nov08, on NCBI B36 assembly, dbSNP b126; population: Chinese Han population). Seven PADI4 tagging SNPs were selected on the basis of Hardy-Weinberg equilibrium (HWE) p ≥ 0.05, call rate ≥ 95% and minor allele frequency ≥ 0.05. The samples were genotyped using the ligation detection reaction (LDR) method, with technical support from the Shanghai Biowing Applied Biotechnology Company [22]. For quality control, repeated analyses were done for 110 (11.73%) randomly selected samples with high DNA quality.

Statistical analyses
Differences in the distributions of demographic characteristics, selected variables and genotypes of the PADI4 variants, as well as the correlation between genotype and pathologic state, were evaluated using the χ² test. The associations between the seven SNPs and the risk of ESCC were estimated by computing odds ratios (ORs) and their 95% confidence intervals (CIs) using logistic regression analyses, yielding crude ORs and ORs adjusted for age, sex, smoking and drinking status. HWE was tested with a goodness-of-fit χ² test comparing the observed genotype frequencies with the expected frequencies among the control subjects. Because testing multiple hypotheses increases the likelihood of incorrectly rejecting a null hypothesis (type I error), the Bonferroni correction was applied. All statistical analyses were performed with the SPSS 23.0 Statistical Package (SPSS Inc., Chicago, IL).
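The odds ratio and Bonferroni steps can be illustrated with a short sketch. It computes a crude OR with a Wald 95% CI from a 2x2 table, which is a simplification of the adjusted logistic regression run in SPSS for this study. The function names and cell counts are hypothetical, and the number of tests used for the Bonferroni threshold (seven, one per SNP) is an assumption, since the text does not state how many comparisons were corrected for.

```python
import math


def crude_odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp, z=1.96):
    """Crude odds ratio with a Wald confidence interval from a 2x2 table.

    All four cell counts must be positive. z=1.96 gives a 95% interval.
    """
    odds_ratio = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    # Standard error of log(OR).
    se = math.sqrt(1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts: variant carriers vs. non-carriers in cases and controls.
odds_ratio, lower, upper = crude_odds_ratio(70, 540, 130, 530)
threshold = bonferroni_threshold(0.05, 7)  # seven tests is an assumption
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
print(f"Bonferroni-adjusted threshold = {threshold:.4f}")
```

An association would be declared significant after correction only if its p-value falls below the adjusted threshold.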

Author contributions
LW, HG and TL carried out the molecular genetic studies, selected the tagged SNPs, performed the statistical analysis and drafted the manuscript. HP, LL, YS, JZ, YS, WT, GD, SC, YF, HD, QW, JY recruited the patients and collected the samples. SC, JY and YF participated in the design and coordination of the study. LT and JY conceived of the study, and participated in its design and coordination. All authors read and approved the final manuscript.