Genetic variations in chromodomain helicase DNA-binding protein 5, gene-environment interactions and risk of sporadic Alzheimer’s disease in Chinese population

CHD5 is an essential factor in neuronal differentiation and neurodegenerative diseases. Here, targeted next generation sequencing and TaqMan genotyping of the CHD5 gene were carried out in a two-stage case-control study in a Chinese population. Genetic associations and gene-environment interactions were analyzed to identify risk factors for Alzheimer's disease. We found that the intronic variant rs11121295 was associated with the risk of Alzheimer's disease in both stages and in the combined cohort. In the stratified analysis, this risk effect was consistently significant in the alcohol-drinking subgroups at all stages. The gene-environment interaction analyses further supported these findings. Our study highlights the potential role of CHD5 variants in conferring susceptibility to sporadic Alzheimer's disease, a risk that appears to be modified by alcohol intake.


INTRODUCTION
Alzheimer's disease, a progressive neurodegenerative disorder, is one of the major forms of dementia in elderly people; it usually starts slowly and worsens over time [1]. It poses a tremendous societal challenge because its prevalence continues to rise. It affects about 6% of people over 65 years of age, and approximately 30 million people worldwide have Alzheimer's disease [2].
The cause of Alzheimer's disease is poorly understood. About 70% of the risk is believed to be genetic, with many genes usually involved [3]. Recent studies have shown that a number of polymorphisms affect disease progression in patients with Alzheimer's disease [4,5]. However, more than 90% of cases are sporadic; as a result, only a few percent of cases have a clear genetic cause. One of the most frequently described genetic alterations is the deletion of the short arm of chromosome 1, found in around 35% of neuroblastomas [6]. Other risk factors include a history of head injuries, depression and anxiety, or hypertension.

Research Paper: Gerotarget (Focus on Aging), Oncotarget (www.oncotarget.com)
CHD5 is one of the nine members of the chromodomain helicase DNA-binding (CHD) family of enzymes, which belongs to the SNF2 (snf2 DNA methylase/helicase) superfamily of ATP-dependent chromatin remodeling proteins [7,8]. Recent reports show that CHD5 is expressed in several brain regions and in neurons [9,10]. CHD5 also directly regulates target genes that are important for aging, Alzheimer's disease, and neuronal development. In addition, intellectual impairment has been linked to the deletion of a region of chromosome 1 near CHD5 [11]. However, the specific role of CHD5 in brain development and function remains to be confirmed. Based on these findings, we set out to investigate whether CHD5 single-nucleotide polymorphisms (SNPs) were associated with the risk of Alzheimer's disease in a two-stage case-control study from China.

RESULTS
In the discovery study, the genotype distributions of 164 candidate variants in the case and control groups are shown in Supplementary Table 1. We found three variants that were potentially associated with Alzheimer's disease risk (rs11121295, P for alleles = 6.00 × 10⁻⁴; rs10864393, P for alleles = 0.0049; and rs9434741, P for alleles = 0.0036; Figure 1A). Even after 10⁵ permutations, rs11121295 remained significant (P = 0.0387; Figure 1B). The observed χ² values deviated from those expected under the null hypothesis at values above approximately 4.3 (Figure 1C). This deviation was no longer significant after rs11121295 was removed (Figure 1D).
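The allelic test and label permutation described above can be illustrated with a minimal sketch. It assumes a Pearson χ² test on a 2 × 2 table of allele counts and a single-marker permutation of case/control labels; the function names and the toy data are hypothetical, and the procedure in Haploview, which permutes across all markers, may differ in detail.

```python
import random

def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts.
    Each argument is a tuple (derived_count, ancestral_count)."""
    a, b = case_alleles
    c, d = control_alleles
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a margin is empty, so no association can be tested
    return n * (a * d - b * c) ** 2 / denom

def allele_counts(genotypes, labels, group):
    """Count derived and ancestral alleles in one group.
    genotypes holds the number of derived alleles per subject (0, 1 or 2)."""
    derived = sum(g for g, lab in zip(genotypes, labels) if lab == group)
    total = 2 * sum(1 for lab in labels if lab == group)
    return derived, total - derived

def permutation_p_value(genotypes, labels, n_perm=1000, seed=0):
    """Empirical P value from shuffling case (1) / control (0) labels."""
    rng = random.Random(seed)
    observed = allelic_chi_square(allele_counts(genotypes, labels, 1),
                                  allele_counts(genotypes, labels, 0))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stat = allelic_chi_square(allele_counts(genotypes, shuffled, 1),
                                  allele_counts(genotypes, shuffled, 0))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

The add-one correction means the smallest reportable empirical P value is 1/(n_perm + 1), which is one reason a large number of permutations is needed to resolve small P values.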
We identified three blocks with high linkage disequilibrium (Figure 1E). Block 1 includes SNP9 and SNP10 (rs745759 and rs3888071). Block 2 includes SNP35 and SNP36 (rs2273041 and rs2273040). Block 3 includes SNP115 and SNP116 (rs12564469 and rs9434711). The results of the haplotype-based case-control analysis between the Alzheimer's disease and control groups are shown in Supplementary Table 2. We found that haplotype AG in block 3 showed a significant association with Alzheimer's disease risk (P = 1.620 × 10⁻⁵). Nevertheless, it is difficult to draw conclusions about Alzheimer's disease risk from this haplotype, because its frequencies were very low in both groups (0.055 in cases and 0.005 in controls).
Table 1: Association results of the three SNPs selected from next generation sequencing in the discovery, replication and combined studies. D, derived alleles; A, ancestral alleles; Het, heterozygous variants; Hom, homozygous variants. P trend values were calculated in logistic regression models with adjustment for age, gender, smoking and drinking status.
For Alzheimer's disease risk in the older group, the false-positive report probability (FPRP) values of rs11121295 GG were below 0.20 for the assigned prior probabilities (0.008 for the prior probability of 0.1 in the discovery study; 0.005 and 0.010 for the prior probabilities of 0.1 and 0.01, respectively, in the replication study; and 0.002, 0.009 and 0.081 for the prior probabilities of 0.1, 0.01 and 0.001, respectively, in the combined study). For Alzheimer's disease risk in the drinking group, under a prior probability of 0.1, the findings were noteworthy in both the replication study and the combined study (FPRP = 0.011 and 0.004, respectively). Moreover, under a prior probability of 0.01, this association remained noteworthy only in the combined study (FPRP = 0.010; Table 3).
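For readers unfamiliar with the FPRP, the following minimal sketch shows how such a value can be computed. It assumes the commonly used definition FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the observed P value, 1 − β is the statistical power to detect the assumed odds ratio, and π is the prior probability of a true association. The power and P value in the usage example are hypothetical and are not taken from this study.

```python
def fprp(p_value, power, prior):
    """False-positive report probability under the assumed definition.

    p_value: observed significance level (alpha)
    power:   statistical power (1 - beta) to detect the assumed effect
    prior:   prior probability (pi) that the association is real
    """
    false_positive = p_value * (1.0 - prior)  # weight of a null finding
    true_positive = power * prior             # weight of a real finding
    return false_positive / (false_positive + true_positive)

def is_noteworthy(p_value, power, prior, threshold=0.20):
    """Apply the 0.20 FPRP criterion used in the text above."""
    return fprp(p_value, power, prior) < threshold
```

With a hypothetical P value of 0.01, power of 0.8 and prior probability of 0.1, `fprp(0.01, 0.8, 0.1)` evaluates to 0.009/0.089, about 0.101. Lowering the prior raises the FPRP, which is why an association can be noteworthy at a prior of 0.1 but not at 0.001.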

DISCUSSION
Genome-wide association studies have identified several chromosomal regions, including rare variants, that appear to affect Alzheimer's disease risk [3,12,13]. Candidate genes include CD2AP, SLC24A4 and HLA-DRB5, among others. For example, variants in the TREM2 gene might be associated with a three to five times higher risk [14]. In our study, using targeted next generation sequencing and TaqMan genotyping, we found a positive association between the rs11121295 homozygous variant and Alzheimer's disease not only in the discovery cohort but also in the replication and combined studies, which has not been reported before.
Table 2: Stratification analysis for associations between rs11121295 and AD risk in the discovery, replication and combined studies. P, P value for the haplotype model, obtained by logistic regression with adjustment for age, sex, smoking status and drinking status. Pᵢ denotes P₂/P₁ or P₁/P₂.

Table 4: MDR analysis for the prediction of AD risk with and without rs11121295
Labels: 1, age; 2, rs11121295; 3, drinking status; 4, education; 5, gender; 6, smoking status. a, P value for 1000-fold permutation test. b, the best model with maximum cross-validation consistency and minimum prediction error rate.
A quantile-quantile (Q-Q) plot plots the quantiles of two distributions against each other and can be used to examine whether two data sets come from the same distribution. If they do, the points fall along the 45-degree reference line. In our study, the plot changed markedly after rs11121295 was removed, further indicating that it is the likely risk locus.
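As an illustration of how Q-Q plot coordinates for association statistics can be obtained, the sketch below pairs sorted observed χ² values with the quantiles expected under the null hypothesis. It assumes a χ² distribution with 1 degree of freedom and plotting positions (i + 0.5)/n; both choices and the function names are assumptions rather than the settings used for Figure 1.

```python
from statistics import NormalDist

def chi2_1df_quantile(p):
    """Quantile of the chi-square distribution with 1 df.
    Uses the fact that a 1-df chi-square variable is a squared
    standard normal variable."""
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z

def qq_points(observed_stats):
    """Return (expected, observed) pairs for a Q-Q plot."""
    n = len(observed_stats)
    observed = sorted(observed_stats)
    expected = [chi2_1df_quantile((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))
```

Points lying above the 45-degree line at the upper end indicate statistics larger than the null distribution predicts, which is the pattern expected when a true risk locus is present.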
The 1p36 region is frequently deleted in neural crest tumors including neuroblastomas, raising the possibility that this region contributes to neurological and developmental disorders. The CHD5 gene contains 42 exons and spans more than 78 kb; according to its gene ontology annotation, its product acts on acid anhydrides, specifically phosphorus-containing anhydrides. A downstream gene, CDKN2A, can regulate the p53 pathway, which in turn impedes cell proliferation [9]. These findings suggest that CHD5 plays an important role in neurogenesis and development by activating the expression of specific genes that promote terminal neuronal differentiation.
Moreover, this study is the first to explore potential gene-environment interactions through stratified analysis (using ORs) and, for high-order interactions, through FPRP and multifactor dimensionality reduction (MDR) analyses with five known risk factors (age, gender, education, smoking and drinking status). The results suggested that older age (> 60 years), drinking, and rs11121295 each contributed to an increased Alzheimer's disease risk at different levels. These experiment-wise results further suggest that gene-environment interactions may predispose individuals to Alzheimer's disease. Although the FPRP can yield serious inferential errors, it was proposed as a Bayesian safeguard against false reports of significant associations [15].
In developed countries, dementia is one of the most financially costly diseases. It leads to about 1.9 million deaths every year, and Alzheimer's disease is the cause of 60% to 70% of dementia cases. Its most common symptoms include language difficulties, disorientation, mood swings, loss of self-control and behavioural issues. Alzheimer's disease most often begins in people aged 60 years and older, although about 4% of cases are early-onset and begin before this age. Growing evidence from epidemiological studies and meta-analyses has firmly established older age as a risk factor for Alzheimer's disease [16-18]. A report from the World Health Organization indicated that prevalence rates in developing regions such as China are lower than those in developed regions [19]. Chan et al. estimated that the prevalence of Alzheimer's disease in China in 2010 was 2.6% in the 65-69 age group and 60.5% in the 95-99 age group [20]. This suggests that the prevalence of Alzheimer's disease increases with age. Although the use of controls that were not age-matched to the cases could bias the risk estimates, appropriate statistical analysis can control for differences between the groups: effect estimates can be adjusted for covariates that differ between cases and controls. Logistic regression is the most commonly used approach, with weighted logistic regression being one example that controls for confounders. We therefore consider the results to remain informative.
The genetic heritability of Alzheimer's disease ranges from 49% to 79% based on family studies, and about 0.1% of cases show autosomal dominant inheritance, with onset before age 65 [21-23]. Most cases, however, do not exhibit autosomal dominant inheritance and are termed sporadic Alzheimer's disease, in which genetic and environmental differences may act as risk factors. In our study, drinkers carrying the CHD5 rs11121295 variant showed a promising association with Alzheimer's disease risk compared with non-drinkers carrying that genotype, which indicates that this variant might act in response to drinking and that true associations might be revealed by alcohol exposure. It has been shown that alcohol can modulate the effects of various cytokines, receptors and neuroimmune signaling pathways in the brain, such as midkine (MDK) [24], the PPARgamma receptor [25], and HMGB1, miRNA and TLR receptors [26]. We therefore speculate that alcohol intake might trigger proinflammatory events through its induction of oxidative stress and extensive inflammation.
The disease process in Alzheimer's disease is associated with tangles and plaques in the brain [2,27]. The standard diagnosis is based on medical history, cognitive examinations, imaging and blood tests to rule out other possible causes. However, the evidence supporting such a diagnosis is often insufficient. Therefore, clinical genetic examinations may play an important role in the future.
In summary, we identified the rs11121295 variant in the CHD5 gene, which was strongly associated with the risk of developing Alzheimer's disease in individuals of Chinese descent. CHD5 is a nuclear protein that forms a nucleosome remodeling and deacetylation (NuRD) complex, and it may affect brain neurons at the level of chromatin remodeling and gene transcription [28]. Nevertheless, we do not yet understand how this variant affects the expression of CHD5 protein in the brain. Further investigations of this gene, not only in Alzheimer's disease patients but also in healthy individuals, will therefore be of considerable interest.

Subjects
In the first step, 205 unrelated Alzheimer's disease patients and 255 healthy controls with no history of Alzheimer's disease or other conspicuous diseases were recruited from Zibo Center Hospital in North China as the discovery study. Then, a replication study including 397 Alzheimer's disease cases and 510 controls (from Guangdong Provincial People's Hospital, Guangzhou Brain Hospital and Peking University Shenzhen Hospital in South China) was carried out. Finally, both cohorts were pooled as the combined study. At recruitment, each study participant (or his/her relative) was interviewed using a structured questionnaire to obtain information on demographic characteristics, habits of cigarette smoking and alcohol drinking, and personal and family history of major chronic illnesses. A pack of cigarettes was defined as 20 cigarettes in China. "Ever or current smoking" was defined as having smoked more than 5 packs in one's lifetime before the date of diagnosis for cases, or before the date of the interview for controls [29]. "Ever or current drinking" was defined as having consumed alcoholic beverages ≥ 1 time/week for ≥ 6 months previously; otherwise, subjects were defined as nondrinkers [30]. One drink was regarded as 30 g of spirits (12.9 g of ethanol), 103 g of wine (12.3 g of ethanol), or 360 g of beer (12.6 g of ethanol) [30]. Less education was defined as having received primary school education or less [31]. The main features of the included subjects are summarized in Table 5. The Ethics Committee of Guangdong Medical University approved the protocol of this study. The study also adhered to the tenets of the Declaration of Helsinki.

Targeted sequencing, variants selection and genotyping
Genomic DNA was extracted from whole blood leukocytes. We sequenced the whole CHD5 gene with next generation sequencing technology (Illumina Genome Analyzer) in 255 controls and 205 Alzheimer's disease samples. A targeted resequencing study was performed on the Illumina platform with paired-end 90 bp reads. Following the manufacturer's instructions, shotgun libraries were built from 5 μg of genomic DNA, and genomic DNA diluted in Tris-EDTA buffer was sheared into fragments of about 500 bp. The DNA fragments were subsequently A-tailed, and the Illumina sequencing adaptors were ligated to the samples. The adaptor-linked fragments were then enriched by PCR. The prepared library was subsequently hybridized to capture probes. The captured fragments were then amplified with the following protocol: incubation at 95 °C for 5 min, followed by 25 cycles of 95 °C for 15 s, 56 °C for 30 s and 72 °C for 60 s, and a final extension at 72 °C for 8 min. PCR products were purified and finally sequenced with standard 2 × 90 bp paired-end reads on the Illumina HiSeq 2000 sequencer.
The reads were aligned to the reference genome hg19 (NCBI build 37.1) [32]. Single-nucleotide variants that met any of the following criteria were filtered out: P for Hardy-Weinberg equilibrium < 10⁻⁴ [33], duplicated paired-end reads, overall depth ≤ 8×, copy number variant ≥ 2, or SNP within 10 bp of a gap. Only the qualified SNPs were considered in this evaluation, yielding a set of 164 SNPs that was used in the primary case-control study.
We analyzed the associations between variants (allele frequencies > 1%) and Alzheimer's disease risk. Only three variants (rs11121295, rs10864393 and rs9434741) were carried forward to the next stage because of their low P values. Then, genomic DNA from all the other subjects (510 controls and 397 cases) was genotyped for these three variants with TaqMan probes on the Applied Biosystems ABI 7500 Fast System (Foster City, CA). PCR samples were heated to 95 °C for 10 min, followed by 40 cycles of 92 °C for 15 s and 60 °C for 1 min.

Statistical analysis
The chi-square test and the Mann-Whitney U-test were used to assess differences in demographics between cases and controls. The genotype distributions in controls were tested for Hardy-Weinberg equilibrium (P for HWE > 0.01).
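A Hardy-Weinberg equilibrium check of this kind can be sketched as follows. The example assumes a χ² goodness-of-fit test with 1 degree of freedom on the three genotype counts of a biallelic SNP; the text does not state which HWE test was used, so this choice and the function name are assumptions.

```python
import math

def hwe_chi_square(n_aa, n_ad, n_dd):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    n_aa, n_ad, n_dd: counts of ancestral homozygotes, heterozygotes
    and derived homozygotes. Returns (chi2, p_value) with 1 df.
    """
    n = n_aa + n_ad + n_dd
    p = (2 * n_aa + n_ad) / (2 * n)  # ancestral allele frequency
    q = 1.0 - p                      # derived allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ad, n_dd)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Under the criterion stated above, a SNP would be retained when the returned P value in controls exceeds 0.01.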
Haplotype estimation and permutation-based association analysis were performed with the Haploview program [34] using 10⁵ permutations (the 'Single Markers Only' option was used), in which the subjects' phenotypes were randomly permuted. By convention, a difference was considered statistically significant if P < 0.05. A Q-Q plot was then generated to check the distribution of P values. We used the homozygote (DD vs. AA) and heterozygote (DA vs. AA) comparisons as the genetic models (D, derived allele; A, ancestral allele). Logistic regression, adjusted for gender, age, education, smoking and drinking, was used to test the associations. A dose-dependent effect was assessed by the trend test of odds ratios (ORs). Variants with P for genotypes < 0.0003 (0.05/164) were to enter the replication study. The FPRP and the MDR program [35] were used to evaluate possible high-order gene-environment interactions. The best candidate interaction model was required to have the minimum average prediction error and the maximum cross-validation consistency (CVC). SPSS 22.0 for Windows (SPSS, Chicago, IL) and R scripts (version 3.0.2) were used for the statistical analyses.
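The Bonferroni threshold and the odds ratio mentioned above can be illustrated with a short sketch. The odds ratio here is the crude (unadjusted) estimate from a 2 × 2 table with a Woolf-type 95% confidence interval, which is a simplification: the ORs in this study were obtained from adjusted logistic regression. The function names and the counts in the usage note are hypothetical.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval.

    a: cases with the risk genotype      b: cases without it
    c: controls with the risk genotype   d: controls without it
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

`bonferroni_threshold(0.05, 164)` gives 0.05/164, which rounds to the 0.0003 cut-off stated above. For hypothetical counts of 20 and 80 among cases and 10 and 90 among controls, `odds_ratio_ci(20, 80, 10, 90)` returns a point estimate of 2.25 together with its interval.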