Accumulated promoter methylation as a potential biomarker for esophageal cancer

We performed a two-stage molecular epidemiological study to explore DNA methylation profiles for potential biomarkers of esophageal squamous cell carcinoma (ESCC) in a Chinese population. The Infinium Methylation 450K BeadChip was used to identify genes with differentially methylated CpG sites. Sixteen candidate genes were validated by sequencing 1160 CpG sites in their promoter regions using the Illumina MiSeq platform. After excluding sites with negative changes, 10 genes (BNIP3, BRCA1, CCND1, CDKN2A, HTATIP2, ITGAV, NFKB1, PIK3R1, PRDM16 and PTX3) showed significantly different methylation levels among cancer lesions, remote normal-appearing tissues, and healthy controls. PRDM16 had the highest diagnostic value, with an AUC (95% CI) of 0.988 (0.965–1.000), followed by PIK3R1, with an AUC (95% CI) of 0.969 (0.928–1.000). In addition, methylation levels were higher in patients with advanced-stage cancer. These results indicate that aberrant DNA methylation may be a potential biomarker for the diagnosis of ESCC.


INTRODUCTION
Esophageal cancer is one of the most common cancers worldwide, with approximately 456,000 new cases and 400,000 deaths in 2012 [1,2]. Esophageal squamous cell carcinoma (ESCC) is the most prevalent esophageal cancer in the world, especially in Asian countries [3,4]. ESCC is highly invasive and rapidly metastatic, often resulting in a poor postoperative quality of life [5,6]. In spite of clinical advances in the field of oncology, the overall long-term survival rates of ESCC remain dismal [7]. If patients were diagnosed and treated at an early stage, the five-year survival rate after endoscopic mucosectomy could reach 100% [8]. Therefore, there is an urgent need to identify sensitive and specific biomarkers for the early diagnosis of ESCC.
Epigenetic changes are among the early events that occur during carcinogenesis [9,10]. Epigenetic modifications cause heritable changes in cells without altering the DNA sequence. Such modifications, including DNA methylation, histone modifications, DNA replication timing, nucleosome positioning, and heterochromatization, result in selective gene expression or repression [9,11]. DNA methylation is one of the most extensively characterized epigenetic modifications [12,13]. Aberrant DNA methylation has been associated with various human diseases, including cancer [14], autoimmune diseases [15], mental illness [16], and cardiovascular diseases [17]. Large-scale methylation analysis of human genomic DNA may provide a better understanding of the molecular mechanisms involved in esophageal carcinogenesis [18].

Research Paper, Oncotarget, www.impactjournals.com/oncotarget
In this epidemiological study, we analyzed the impact of aberrant DNA methylation levels on the clinical and pathological features of ESCC in a Chinese population, and we investigated the methylation profile as a potential biomarker for the diagnosis of esophageal cancer.

Identification of candidate genes
The heat map of hierarchical clustering of methylation according to the data from the Infinium Methylation 450K array is shown in Supplementary Figure 1. Based on diffScore, delta β and gene function,

Validation of methylated CpG sites
We collected 43 cancer lesion samples, 43 remote normal-appearing esophageal tissues, and 10 healthy control tissues. The patients included 28 males and 15 females, aged 46 to 81 years (Table 1). We also recruited 10 healthy controls, including 7 males and 3 females, aged 42 to 74 years (mean ± standard deviation: 58.8 ± 9.2 years). We sequenced 1160 CpG sites in the promoter regions of 16 candidate genes. After excluding loci with low calling rates, 961 CpG sites in 15 genes met the requirements for further analysis (Table 2). Of these, 325 CpG sites (33.82%, 325/961) showed significant differences in the distribution of methylation between ESCC and normal esophageal tissues (P < 0.05). The proportion of differentially methylated sites in each gene is shown in Figure 1. Between ESCC and normal esophageal tissues, 195 sites showed 2- to 10-fold changes and 58 sites showed more than 10-fold changes. Of the 325 differentially methylated CpG sites, 299 (92%) had higher methylation levels in ESCC samples than in healthy controls. In addition, 254 CpG sites had significantly different methylation between remote normal-appearing tissues and healthy controls, and 221 CpG sites had significantly different methylation between ESCC and remote normal-appearing tissues. These results are summarized in a Venn diagram in Figure 2. Sixty-four CpG sites were differentially methylated among all three groups (cancer lesions, remote normal-appearing samples, and healthy controls); 54 of them were located in PRDM16.
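The proportions quoted above can be rechecked with a few lines of arithmetic. The sketch below uses only the counts reported in this section; the helper name percent is introduced here for illustration.

```python
# Counts as reported in the text above.
total_sites = 961          # CpG sites passing the calling-rate filter
differential_sites = 325   # sites with P < 0.05, ESCC vs. normal tissue
hypermethylated = 299      # differential sites with higher methylation in ESCC


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


differential_pct = percent(differential_sites, total_sites)   # share of analyzed sites
hyper_pct = percent(hypermethylated, differential_sites)      # share of differential sites
hypomethylated = differential_sites - hypermethylated         # sites with lower methylation in ESCC

print(differential_pct, hyper_pct, hypomethylated)
```

The remaining differential sites are those with lower methylation in ESCC, which is consistent with 299 sites being retained once sites with negative changes are excluded.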

Diagnostic value analysis
We further analyzed the cumulative methylation levels by considering multiple CpG sites in each gene. The diagnostic values of selected CpG sites and genes were estimated based on three different models.
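As a rough illustration of a cumulative methylation level, the sketch below assumes that it is the sum of per-site methylation fractions within a gene. The text does not give the exact definition, so this is an assumption, and the site names and values are hypothetical. The keep argument shows how an analysis might restrict the sum to a chosen subset of CpG sites.

```python
from typing import Dict, Iterable, Optional


def cumulative_methylation(site_levels: Dict[str, float],
                           keep: Optional[Iterable[str]] = None) -> float:
    """Sum per-site methylation fractions, optionally restricted to `keep`.

    Assumption: the cumulative level is a plain sum over CpG sites.
    """
    sites = site_levels.keys() if keep is None else keep
    return sum(site_levels[s] for s in sites)


# Hypothetical per-site methylation fractions for one gene in one sample.
sample = {"cpg_1": 0.10, "cpg_2": 0.45, "cpg_3": 0.80, "cpg_4": 0.05}

all_sites = cumulative_methylation(sample)                        # every site
selected = cumulative_methylation(sample, keep=["cpg_2", "cpg_3"])  # chosen subset

print(round(all_sites, 2), round(selected, 2))
```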

Model 3
We further excluded CpG sites with negative correlations and kept 299 sites for analysis. Ten genes (BNIP3, BRCA1, CCND1, CDKN2A, HTATIP2, ITGAV, NFKB1, PIK3R1, PRDM16 and PTX3) had significantly different methylation status among the three groups. The methylation levels of BNIP3, CCND1, CDKN2A, HTATIP2, ITGAV, NFKB1, PIK3R1, PRDM16 and PTX3 were significantly different between esophageal cancer and healthy control tissues (Table 4). The AUC (95% CI) of each gene in the diagnosis of ESCC is listed in Table 4. The methylation of PRDM16 showed the highest diagnostic value, with an AUC (95% CI) of 0.988 (0.965–1.000), followed by PIK3R1, with an AUC (95% CI) of 0.969 (0.928–1.000). Compared with the findings from model 1 and model 2, the AUC of each gene increased greatly in model 3. In particular, for BRCA1 and CDKN2A, the AUC increased from less than 0.5 to 0.712 and 0.912, respectively. Based on model 3, the cumulative methylation level of most genes increased with the histologic changes from normal to normal-appearing tissues and cancer lesions (Figure 3). To avoid false positives caused by multiple comparisons between groups, we used the Bonferroni correction method. Using Bonferroni correction, 49 CpG sites in 4 genes were significant, including 1 site in BNIP3, 1 site in PIK3R1,

Methylation status and clinical characteristics
The methylation frequency was higher in patients at advanced cancer stages. For example, samples from patients at N1-3 stage had an average cumulative methylation value of 9.56 in the RASSF1 gene, which was significantly higher than that in patients at N0 stage (cumulative methylation value: 3.54). For the HTATIP2 gene, samples from patients at G1-2 stages also had a significantly higher cumulative methylation level than those from patients at G3 stage (P < 0.05, Figure 4). The cumulative methylation levels of these genes did not correlate with patients' gender (male and female) or age (< 60 and >= 60 years).

DISCUSSION
When cancer occurs, a massive global hypomethylation is frequently observed, while certain genes can be hypermethylated at the CpG islands [19]. Previous studies have indicated aberrant DNA methylation in esophageal cancer; however, those studies have focused on limited CpG sites [20–22]. In this study, we used a two-stage study design, sequenced 1160 CpG sites in the promoter region of 16 candidate genes, and demonstrated that aberrant DNA methylation can be a potential biomarker for esophageal cancer.
Compared with other methods, such as MSP, Q-PCR, MethyLight or bisulfite pyrosequencing, the NGS approach used in this study can capture full sample diversity with small amounts of DNA. In addition, NGS can enhance epigenetic analyses with high coverage density and flexibility, which helps advance our understanding of epigenetics at the genomic level [23]. A fluorescently labeled reversible terminator is utilized in this system, allowing qualitative and quantitative nucleic acid information to be collected at very high throughput and relatively limited cost [24].
One of the most robust epigenetic marks found in this study was the PRDM16 gene. PRDM16 is located near the 1p36.3 breakpoint and encodes a zinc finger transcription factor that contains an N-terminal PR domain. It is known to be a fusion partner of RPN1, RUNX1 and other genes in hematopoietic malignancies [25]. The malfunction of PRDM16 is related to a poor prognosis in cancer patients [26]. For example, PRDM16 is often methylated in lung cancer cells, with downregulated protein expression [27]. The demethylating drug 5-aza-2ʹ-dC upregulates PRDM16 expression and suppresses the growth of lung cancer cells [27]. Other genes with a high AUC (over 0.9) for distinguishing ESCC were PIK3R1 and CDKN2A. PIK3R1 encodes the p85 regulatory subunit alpha of PI3K (p85α) and appears to play a tumor suppressor role because p85α regulates and stabilizes p110α [28]. A previous study has reported that the expression of PIK3R1 negatively correlates with hypermethylation of CpG sites in PIK3R1 [29]. Our results also showed similar negative correlations, although they were not statistically significant; this may be due to the limited sample size. CDKN2A blocks phosphorylation of the Rb protein and inhibits cell cycle progression. CDKN2A is aberrantly methylated in esophageal cancer [30], and this methylation is associated with metastatic and invasive phenotypes [31]. Similar CDKN2A methylation patterns have been observed in gastric and nasopharyngeal carcinoma [32,33]. As regional lymph node metastasis is associated with patient prognosis, the methylation status of these genes might be used to assess the likelihood of recurrence and metastasis in ESCC patients and to help guide treatment. Moreover, our study shows that the methylation levels of selected genes, such as RASSF1 and HTATIP2, change with cancer stage, indicating their potential value in the prognosis of ESCC.
There are several limitations in this study. First, the bisulfite conversion efficiency is critical for the accuracy and reliability of the results. Incomplete conversion of unmethylated cytosine to uracil or inappropriate conversion of methylcytosine to thymine can cause over- or underestimation of the methylation level. It is also noteworthy that the bisulfite conversion technique cannot discriminate methylated cytosine from 5-hydroxymethylcytosine [34]. Second, false positives may arise from multiple comparisons when various CpG sites are compared between groups. We used the Bonferroni correction method to adjust the test level; however, this is an overcorrection when the tests are correlated [35]. Third, aberrant DNA methylation usually occurs somatically in cancer cells and can also be detected in peripheral blood samples [36]. To evaluate the clinical use of aberrant DNA methylation, a blood-based assay is preferable, since it uses a far less invasive procedure.
In conclusion, aberrant DNA methylation is a promising biomarker with good predictive value for identifying esophageal cancer in a molecular diagnostic laboratory. The hypermethylation status of the PRDM16, PIK3R1, and CDKN2A genes might serve as a potential biomarker for the diagnosis of ESCC.

Study design
First, we used the Illumina Infinium 450K Methylation Beadchip to construct a genome-wide DNA methylation profile. Then, candidate genes were selected for the validation using the Next-Generation Sequencing (NGS) platform (Illumina MiSeq platform).

Study subjects
This study was approved by the Ethics Committee of Nanjing Medical University. Written informed consent was obtained from all participants. The methods were carried out in accordance with the approved guidelines. Esophageal cancer patients were recruited at the Yangzhong People's Hospital from 2012 to 2016. Yangzhong is an area with high morbidity and mortality rates of upper digestive tract cancers [37]. The inclusion criteria were: (1) patients were diagnosed with ESCC based on histopathological evidence; (2) all patients were of Chinese Han origin and had lived in Yangzhong for longer than five years; (3) patients underwent esophagectomy and the lesions were eligible for sampling; (4) none of the patients had received preoperative radiotherapy or chemotherapy. Tissues from the center of the cancer lesion and from the remote normal-appearing esophagus were excised and immediately stored in a -80 °C freezer. Healthy control esophageal tissues were collected from individuals who had no cancer history and participated in a screening program for upper digestive tract cancers.

DNA extraction
Genomic DNA was extracted from tissues using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany). The quality and concentration were evaluated with a Thermo NanoDrop 2000-1 spectrophotometer (NanoDrop Technologies, Montchanin, DE, USA).

Infinium methylation 450K array
We used the Infinium 450K Methylation Beadchip (Illumina, San Diego, CA, USA) to evaluate the methylation status of five paired tumor samples and corresponding remote normal-appearing esophagus tissues, along with two normal controls from the healthy population.

Next-generation sequencing (NGS)

Primer design and optimization
Genomic regions were analyzed and transformed to bisulfite-converted sequences by gene CpG software. The primers were designed by the Gensky Bio-Tech Co., Ltd. (Shanghai) to amplify regions of interest from the bisulfite-converted DNA. Different sets of primers were compared using 1 ng of bisulfite-modified positive and negative control DNA samples. The final optimized primers are listed in Table 5.
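The in-silico transformation of a genomic region into its bisulfite-converted form can be sketched as follows. This is not the software named above, only an illustration of the idea for the top strand: unmethylated C is read as T after conversion and PCR, while methylated C stays C. The function names and the example sequence are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions=frozenset()) -> str:
    """Return the bisulfite-converted top strand as read after PCR.

    Unmethylated C becomes T; a C whose 0-based index is listed in
    `methylated_positions` is protected and stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def cpg_positions(seq: str):
    """Return 0-based indices of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


region = "ACGTCCGATCG"  # hypothetical sequence, not taken from the study
cpgs = cpg_positions(region)
fully_methylated = bisulfite_convert(region, set(cpgs))  # only CpG cytosines survive
unmethylated = bisulfite_convert(region)                 # every C is converted

print(cpgs, fully_methylated, unmethylated)
```

Primers for bisulfite-treated DNA are typically designed against such converted sequences, which is why the conversion is done before primer design.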

Bisulfite conversion and multiplex amplification
Genomic DNA (about 400 ng) was subjected to sodium bisulfite modification using the EZ DNA Methylation™-GOLD Kit (Zymo Research, Orange, CA, USA) according to the manufacturer's protocols. An unmethylated cytosine is converted to uracil when treated with bisulfite, whereas a methylated cytosine remains as cytosine [38]. A multiplex PCR was performed using the optimized primer sets. A 20 µl PCR reaction mixture was prepared for each reaction, including 1× buffer (TaKaRa, Tokyo, Japan), 3 mM Mg2+, 0.2 mM dNTP, 0.1 µM of each primer, 1 U HotStarTaq polymerase (TaKaRa, Tokyo, Japan) and 2 µl of template DNA. The cycling program was 95 °C for 2 min; 11 cycles of 94 °C for 20 sec, 63 °C for 40 sec with a decreasing temperature step of 0.5 °C per cycle, 72 °C for 1 min; followed by 24 cycles of 94 °C for 20 sec, 65 °C for 30 sec, 72 °C for 1 min; and 72 °C for 2 min.
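The touchdown phase of the cycling program can be written out as a per-cycle annealing schedule. The sketch below assumes that the first cycle anneals at the stated starting temperature and that each later cycle is 0.5 °C cooler; the function name is introduced here for illustration.

```python
def touchdown_annealing(start_c: float, step_c: float, cycles: int):
    """Annealing temperature (°C) for each touchdown cycle.

    Assumption: cycle 1 anneals at `start_c`, and every later cycle
    is `step_c` cooler than the one before it.
    """
    return [start_c - step_c * i for i in range(cycles)]


schedule = touchdown_annealing(63.0, 0.5, 11)  # values from the program above
total_cycles = len(schedule) + 24              # touchdown phase plus fixed-temperature phase

print(schedule)
print(total_cycles)
```

Under this assumption the touchdown phase ends at 58 °C, and the multiplex PCR runs 35 amplification cycles in total.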

Index PCR and sequencing
PCR amplicons were then diluted and amplified using the indexed primers. Specifically, a 20 µl mixture was prepared for each reaction, including 1× buffer (NEB, MA, USA), 0.3 mM dNTP, 0.3 µM forward primer, 0.3 µM index primer, 1 U Q5™ DNA polymerase (NEB, MA, USA) and 1 µl of diluted template (PCR amplicons from the previous step). The cycling program was 98 °C for 30 sec; 11 cycles of 98 °C for 10 sec, 65 °C for 30 sec, 72 °C for 30 sec; and 72 °C for 5 min. The PCR products (170–270 bp) were separated by agarose electrophoresis and purified using the QIAquick Gel Extraction kit (Qiagen, Hilden, Germany). Libraries from different samples were quantified and pooled together, followed by sequencing on the Illumina MiSeq platform according to the manufacturer's protocols. Sequencing was performed in 2 × 300 bp paired-end mode. Quality control of sequencing reads was performed with FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/). Filtered reads were aligned to the reference genome using the Bismark software (http://www.bioinformatics.babraham.ac.uk/projects/bismark/). After read recalibration with USEARCH [39], methylation and haplotypes were analyzed using a Perl script.
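Once reads are aligned, the methylation level at a CpG site is commonly expressed as the fraction of reads that still show C at that position. The sketch below illustrates that calculation in Python rather than Perl and is not the script used in the study; the read counts and the min_coverage cutoff are hypothetical, since the text does not state the calling-rate threshold.

```python
def methylation_level(methylated_reads: int, unmethylated_reads: int,
                      min_coverage: int = 1):
    """Fraction of reads supporting methylation at one CpG site.

    Returns None when the site has no reads or fewer than `min_coverage`.
    """
    coverage = methylated_reads + unmethylated_reads
    if coverage == 0 or coverage < min_coverage:
        return None
    return methylated_reads / coverage


# Hypothetical counts per site: (reads showing C, reads showing T).
counts = {"site_a": (180, 20), "site_b": (5, 95), "site_c": (0, 0)}
levels = {site: methylation_level(m, u, min_coverage=10)
          for site, (m, u) in counts.items()}

print(levels)
```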

Statistical analysis
We used IBM SPSS Statistics 19.0 (IBM Corp., NY, USA) and the R program (https://www.r-project.org/) to analyze the data. Individual and cumulative methylation statuses of candidate genes were analyzed. We used the t-test, ANOVA or a nonparametric test to compare methylation differences between groups. To account for false positives caused by multiple comparisons, the Bonferroni correction was applied. The receiver operating characteristic (ROC) curve was plotted to reflect the diagnostic value of biomarkers. The area under the curve (AUC) together with its 95% confidence interval (CI) was calculated.
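Two of the quantities named above can be illustrated with a short sketch. The AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half, and the Bonferroni correction divides the significance level by the number of tests. The analysis itself was run in SPSS and R; the Python below is only an illustration, the methylation values are hypothetical, and using 961 as the number of tests is an assumption based on the count of analyzed CpG sites. The confidence interval is not computed because its method is not stated.

```python
def auc(cases, controls):
    """AUC as the probability that a case scores above a control.

    Ties count one half; this equals the Mann-Whitney U statistic
    divided by the number of case-control pairs.
    """
    if not cases or not controls:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


# Hypothetical cumulative methylation values, not data from the study.
cancer = [9.1, 7.4, 8.8, 6.0, 5.2]
healthy = [3.1, 4.0, 5.2, 2.7]

example_auc = auc(cancer, healthy)
threshold = bonferroni_threshold(0.05, 961)  # assumed number of tests

print(example_auc, threshold)
```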