Comprehensive analysis identifying aberrant DNA methylation in rectal mucosa from ulcerative colitis patients with neoplasia

Background There are no biomarkers to facilitate the identification of patients with ulcerative colitis (UC) who are at high risk for developing colorectal cancer (CRC). In our current study, we used rectal tissues from UC patients to identify aberrant DNA methylations and evaluated whether they could be used to identify UC patients with coexisting colorectal neoplasia. Results Using a training set, we identified 484 differentially methylated regions (DMRs) with absolute delta beta-values > 0.1 in rectal mucosa by using the ChAMP algorithm. Next, pathway enrichment analysis was performed using 484 DMRs to select coordinately methylated DMRs, resulting in the selection of 187 aberrant DMRs in rectal tissues from UC-CRC. Then, the Elastic Net classification algorithm was performed to narrow down optimal aberrant DMRs, and we finally selected 11 DMRs as biomarkers for identification of UC-CRC patients. The 11 chosen DMRs could discriminate UC patients with or without CRC in a training set (area under the curve, 0.96) and the validation set (area under the curve, 0.81). Conclusions In conclusion, we identified 11 DMRs that could identify UC patients with CRC complications. Prospective studies should further confirm the validity of these biomarkers. Methods We performed genome-wide DNA methylation profiles in rectal mucosal tissues (n = 48) from 24 UC-CRC and 24 UC patients in a training set. Next, we performed comprehensive DNA methylation analysis using rectal mucosal tissues (n = 16) from 8 UC-CRC and 8 UC patients for validation.


INTRODUCTION
Patients with long-standing ulcerative colitis (UC) are at a higher risk than the general population for developing colorectal cancer (CRC). The prevalence of colitis-associated cancer (CAC) in patients with UC is 8% 20 years after the initial UC diagnosis and increases to 18% at 30 years [1]. CAC is a major cause of mortality in patients with UC [2,3], so that diagnosis at an early or precancerous stage is crucial.

Research Paper www.oncotarget.com
Surveillance colonoscopy with multiple random biopsies has been widely recommended for patients with long-standing and extensive UC [4]. However, the low yield and lack of clinical consequences from random biopsies in this high-risk population raise questions about the necessity and cost-effectiveness of such UC surveillance [5]. A recent randomized controlled trial to compare rates of neoplasia detected by targeted vs random biopsies in patients with UC from Japan revealed that these methods detected similar proportions of neoplasias. However, a targeted biopsy approach appears to be a more cost-effective method [6].
More accurate diagnostic modalities, such as chromoendoscopy and magnifying endoscopy, to identify potential sites of neoplasia in a non-neoplastic inflamed epithelium, together with analysis of p53 alterations, to distinguish neoplastic lesions from regenerative epithelium, have been evaluated [7,8]. However, the labor-intensive nature and expense of these adjunctive modalities preclude their use in the surveillance of all UC patients with longstanding and extensive colitis. We suggest that within this subgroup of patients, the ability to distinguish those who are at low vs. high risk of colorectal neoplasia would allow physicians to identify those patients most likely to benefit from these more extensive screening methods.
Recently, we reported that methylation of specific miRNAs (MIR1, MIR9, MIR124, MIR137 and MIR34B/C) occurred in an age- and cancer-dependent manner in UC patients, and that methylation of these 5 miRNAs in non-neoplastic rectal mucosa successfully discriminated patients with UC-CRC from those without in 2 independent patient cohorts [9]. However, the study was limited because it focused only on methylation of aging- and cancer-associated miRNAs. Therefore, a broader, unbiased genome-wide analysis may identify additional methylation loci for assessing the risk of UC-CRC.
In the current study, we examined the possibility of using genome-wide DNA methylation array analysis of rectal tissues of UC patients to identify aberrant DNA methylation and thereby identify UC patients who had coexisting colorectal neoplasia.

Pathway enrichment analysis
The 484 DMRs were subjected to a gene set enrichment analysis in order to select coordinately methylated DMRs. We assumed that coordinately methylated genes would have biologically significant roles and provide potentially robust diagnostic power. We searched for the overrepresented biological pathways associated with the differentially methylated genes using the Enrichr analysis tool [11]. The 464 genes associated with 484 DMRs were used as the input, 268 being hypermethylated and 196 being hypomethylated. Results for the enrichment analysis are shown in Tables 1 and 2. Enrichr's combined score, a combination of the P value and z-score, was used to prioritize enriched pathways.
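As an illustration of how such a combined score can rank terms, the following minimal sketch assumes the definition given in the Enrichr documentation as we understand it: the natural logarithm of the enrichment P value multiplied by the z-score of the deviation from the expected rank. The term names and values are hypothetical and are not results from this study.

```python
import math


def combined_score(p_value: float, z_score: float) -> float:
    """Assumed Enrichr-style combined score: ln(p) * z."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    return math.log(p_value) * z_score


# Hypothetical terms: (name, P value, z-score). Not data from the study.
terms = [
    ("term_B", 1e-2, -2.0),
    ("term_C", 0.2, -1.5),
    ("term_A", 1e-4, -1.8),
]

# Larger combined scores are listed first.
ranked = sorted(terms, key=lambda t: combined_score(t[1], t[2]), reverse=True)
ranked_names = [name for name, _, _ in ranked]
```

Because ln(p) is negative for p < 1, a negative z-score yields a positive combined score, so small P values and strongly negative z-scores reinforce each other.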
The hypermethylated genes were associated with 19 enrichment terms, including cancer cell lines of lung, ganglia, central nervous system and skin (Table 1). The hypomethylated genes were associated with 24 enrichment terms, including cancer cell lines as well as terms in the Disease_Perturbations_from_GEO_up library (Table 2). Colon cancer and adenocarcinomas are included in the library and all of the genes in these terms are categorized as upregulated. We extracted genes that appeared at least once in the enrichment terms. Of these, 90 genes were hypermethylated and 90 were hypomethylated. The 187 DMRs associated with these genes were used as input for the next step.

DISCUSSION
It has been proposed that in UC patients, chronic inflammation increases epithelial cell turnover in the colonic mucosa, resulting in the acquisition of genetic and epigenetic alterations in non-neoplastic mucosa during cancer development [13][14][15][16]. In fact, previous studies have demonstrated that patients with UC-CRC have widespread genetic alterations in non-neoplastic colonic mucosa [17][18][19][20]. These results suggested that the detection of molecular changes in non-neoplastic mucosa could provide biomarkers for predicting the risk of CRC in UC patients. However, these findings have not been confirmed by subsequent studies and are not routinely used in the clinical setting. On the other hand, inflammation in UC patients characteristically begins in the rectal mucosa and spreads progressively and contiguously to the proximal colon [21]. We suggest that non-neoplastic rectal mucosa might be the optimal site to investigate aberrant molecular changes during carcinogenesis. Few studies of rectal mucosa have been conducted to search for predictive markers capable of identifying UC patients at high risk of developing CRC. Watanabe et al. used microarray analysis of rectal mucosa from UC patients to identify a gene expression signature that was predictive of the development of UC-associated neoplasm [22]. We found 5 miRNAs to be hypermethylated in rectal mucosa from UC patients with dysplasia or CRC compared with patients without neoplasms, and they might be used to identify patients with UC at greatest risk for developing UC-CRC [9].
However, comprehensive high-throughput analysis of methylation status in rectal mucosa has not been reported.
In the current study, we undertook the first comparison of differential DNA methylation profiles in rectal mucosa between UC patients with CRC and those without using the Illumina HumanMethylation450 BeadChip. Then, we narrowed down specific DMRs by statistical analyses using several algorithms. Finally, we searched for a set of DMRs that were significantly different in rectal mucosa from UC-CRC in cohort 1. The 11 DMRs that we identified in rectal mucosa were robust in discriminating UC patients with CRC from those without, with AUC values of 0.96 (95% CI: 0.90, 1.00). Thereafter, 11 DMRs were successfully validated in an independent set of rectal samples from UC patients. In addition, the sensitivity and specificity were high in both the training set and validation cohort, as expected. Thus, our results suggest that analysis of the status of 11 DMRs in a single rectal biopsy could help identify UC patients that are at greatest risk of developing neoplasia, which would be a substantially more practical strategy in contrast to current surveillance protocols.
Pathway analysis identified DMR-associated enrichment terms. The hypomethylated genes were associated with 24 enrichment terms, including cancer cell lines as well as terms in the Disease_Perturbations_from_GEO_up library. Colon cancer [23] and adenocarcinomas [24,25] are included in the library and all of the genes in these terms are categorized as upregulated. Hypomethylation of these DMRs would be expected to be associated with upregulation of the corresponding genes. However, we did not perform gene expression analysis owing to the limitations of FFPE-derived RNA samples. Therefore, further study should be conducted to investigate the association between the DNA methylation status of the selected DMRs and the expression of their genes.
Our Elastic Net classification algorithm selected 11 DMRs. The aberrant methylation of the 11 DMRs was more remarkable in neoplastic tissues than in non-neoplastic tissues, which suggests that these genes are associated with carcinogenesis in colonic mucosa. Of these, 2 genes, heart and neural crest derivatives expressed 2 (HAND2) and SALL1, have been reported to be hypermethylated and associated with tumorigenesis in tumor tissues. HAND2 is a basic helix-loop-helix transcription factor that plays an important role in the development and differentiation of the heart and nervous system [26]. Recent studies revealed that HAND2 was significantly hypermethylated and downregulated in colon and rectal cancer [27,28]. In addition, continuous proliferation of the endometrium was observed in mice with knockdown of HAND2 [29]. Here, our bioinformatic results showed that the methylation status of HAND2 in rectal mucosa was dysregulated in UC patients with CRC, which, to our knowledge, has not been reported previously. Collectively, these findings suggest that HAND2 has the characteristics of a tumor suppressor gene in several types of tumors.
SALL1 is a multi-zinc finger transcription factor that regulates organogenesis and stem cell development [30]. In breast cancer, SALL1 acts as a tumor suppressor that recruits the NuRD complex and thereby induces cell senescence [30]. In addition, inhibition of SALL1 correlates with reduced levels of CDH1, an important contributor to the epithelial-to-mesenchymal transition [31].
GDNF was also selected as one of the 11 DMRs. The GDNF family of ligands and their receptors activate the Ret signaling pathway and regulate cell survival and proliferation [32]. In addition, Ret expression compromises neuronal cell survival in the colon [33]. Therefore, it has been postulated that GDNF is a novel member in the set of protective mucosal factors [34]. In this context, dysregulation of GDNF could lead to down-regulation of Ret expression and may finally result in failure of colonic mucosal protection. Saito et al. [35] demonstrated that increased methylation of CDH1 and GDNF is correlated with severe inflammation in the colonic mucosa of UC patients, which indicates a potential epigenetic mechanism underlying mucosal inflammation and the occurrence of dysplasia/cancer with chronic inflammation in UC patients.
In conclusion, in screening a patient cohort, we successfully selected 11 DMRs that could identify UC patients that would progress to developing CRC. This was achieved by whole-genome methylation analysis followed by pathway enrichment analysis and Elastic Net regularized regression modeling. Furthermore, whole-genome methylation data from a validation cohort confirmed the 11 DMRs to be promising biomarkers in UC with CRC. Although our findings were successfully validated with an external independent cohort, the number of patients with UC was still limited, and longitudinally collected rectal specimens from UC patients were not available in this study. Therefore, larger prospective studies will be needed to confirm the validity of these predictors. However, we believe the analysis of these 11 DMRs from a single rectal biopsy specimen might have robust predictive potential in permitting the identification of UC patients that are at high risk for neoplasia elsewhere in the colorectum.

Patients and samples
This study analyzed a total of 64 non-neoplastic rectal epithelial specimens that were obtained from 64 patients diagnosed with UC from 2 different patient cohorts enrolled at Hyogo College of Medicine and Mie University in Japan. In the training set, 48 non-neoplastic rectal epithelial specimens were collected from 48 UC patients with (n = 24) or without dysplasia or cancer (n = 24). All formalin-fixed paraffin-embedded (FFPE) samples were retrieved from colectomy specimens that were collected at the Hyogo College of Medicine between 2005 and 2011 (Supplementary Table 2). There was no significant difference between the UC patients with and without cancer in the training cohort. In the validation cohort, 16 non-neoplastic rectal epithelial specimens were collected from 16 UC patients with (n = 8) or without dysplasia or cancer (n = 8) (Supplementary Table 3). All tissues were also FFPE and were retrieved from colectomy specimens resected at the Mie University Hospital between 2005 and 2015. Although no significant difference was observed in the validation cohort, UC patients with cancer tended to have a longer disease duration and lower disease severity. Specimen collection and studies were approved by the Institutional Review Board of all participating institutions. All participants provided written informed consent and willingness to donate their tissue samples for research. The diagnosis of UC was based on medical history, endoscopic findings, histologic examination, laboratory tests, and clinical disease presentation. Patients who presented with their first attacks, or infectious colitis caused by C. difficile or cytomegalovirus were excluded from this study.

DNA extraction from FFPE samples
FFPE tissue blocks were serially sectioned at a thickness of 10 μm. Based on histologic findings, mucosal tissues from each region were micro-dissected and genomic DNA was extracted using the QIAamp DNA FFPE tissue kit (Qiagen) according to the manufacturer's instructions.

Methylation analysis
Whole-genome DNA methylation profiles were quantified at Riken Genesis Co., Ltd., Japan, using the Infinium HumanMethylation450 BeadChip Array (Illumina), which measures 485,577 CpG sites. Prior to the BeadChip Array analysis, quality control of FFPE DNA was performed using the Illumina FFPE QC Kit and Fast SYBR Green Master Mix. Amplified fluorescence was measured using a Step One Plus Real-Time PCR System. The Ct value of each sample was determined and the difference between the sample and the positive control (delta Ct) was calculated. Samples with a delta Ct below 5 passed quality control and were bisulfite treated using the EZ DNA Methylation Kit (D5004; Zymo Research, Inc., Irvine, CA, USA). To repair damaged DNA, the Infinium HD FFPE Restore Kit was used. The repaired DNA was isothermally amplified overnight at 37 °C, followed by an enzymatic fragmentation step. The fragmented DNA was precipitated, resuspended and loaded on the 12-sample BeadChip that was then incubated overnight at 48 °C, allowing the fragmented DNA to hybridize to locus-specific 50-mers. Non-specifically hybridized DNA was washed away, followed by a single-base extension reaction using DNP- and biotin-labeled ddNTPs (with the use of a Tecan EVO robot). Subsequently, hybridized DNA was removed from the labeled oligonucleotides and chips were dried under vacuum and imaged using an Illumina iScan device. Data were extracted using GenomeStudio (Illumina, Methylation Module v1.9), which was also used to subtract the background and to normalize staining intensities using internal controls present on the chip. A beta-value was calculated to estimate the methylation level of each CpG locus using the ratio of intensities between methylated and unmethylated alleles (0 = unmethylated, 1 = fully methylated).
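The delta Ct filter and the beta-value calculation can be sketched as follows. This is a minimal illustration rather than the GenomeStudio implementation. The stabilizing offset of 100 in the denominator follows the commonly cited Illumina convention and is an assumption here, as are the function names and the example values.

```python
def passes_ffpe_qc(sample_ct: float, control_ct: float,
                   max_delta_ct: float = 5.0) -> bool:
    """Return True when delta Ct (sample minus positive control) is below the cutoff."""
    return (sample_ct - control_ct) < max_delta_ct


def beta_value(methylated: float, unmethylated: float,
               offset: float = 100.0) -> float:
    """Beta = M / (M + U + offset), with negative intensities clipped to 0.

    The offset (assumed to be 100) stabilizes the ratio at low intensities,
    so a fully methylated locus approaches 1 but does not reach it exactly.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    denominator = m + u + offset
    if denominator == 0.0:
        raise ValueError("intensities and offset are all zero")
    return m / denominator


# Hypothetical intensities for three loci.
mostly_methylated = beta_value(900.0, 0.0)
unmethylated_locus = beta_value(0.0, 900.0)
half_methylated = beta_value(450.0, 450.0)
```

With these illustrative intensities, the three loci evaluate to 0.9, 0.0 and 0.45, respectively.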

Identification of differentially methylated regions
Differentially methylated regions (DMRs) were identified using the ChAMP methylation analysis package in R. Briefly, intensity data from IDAT files were loaded and normalized using default settings (i.e., beta-mixture quantile normalization; BMIQ), after which methylation variable positions (MVPs) were identified using the R package limma to compare the 2 groups. DMRs were identified using the "probe lasso" algorithm implemented in the ChAMP package. DMRs were defined as regions containing 3 or more adjacent probes showing unidirectional changes in methylation that attained nominal significance (unadjusted p < 0.05) in the MVP analysis. The lasso region was set to 2 kb and was scaled according to the local genomic/epigenomic landscape in order to account for uneven probe spacing across the genome.
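The DMR definition above can be illustrated with a deliberately simplified sketch. It is not the ChAMP probe lasso algorithm: it uses a fixed maximum gap of 2 kb between neighboring probes instead of a lasso scaled to the local landscape, and it applies the absolute delta beta-value threshold of 0.1 to the mean of each region. The `Probe` type, the `call_dmrs` function and the example probes are hypothetical.

```python
from typing import NamedTuple


class Probe(NamedTuple):
    chrom: str
    pos: int
    delta_beta: float  # mean beta difference between the 2 groups
    p_value: float     # unadjusted p value from the MVP analysis


def call_dmrs(probes, max_gap=2000, min_probes=3, alpha=0.05, min_abs_delta=0.1):
    """Group adjacent, nominally significant probes that change in one direction."""
    regions, run = [], []

    def flush():
        # Keep the current run only if it is long enough and the shift is large enough.
        if len(run) >= min_probes:
            mean_delta = sum(p.delta_beta for p in run) / len(run)
            if abs(mean_delta) > min_abs_delta:
                regions.append((run[0].chrom, run[0].pos, run[-1].pos,
                                len(run), mean_delta))
        run.clear()

    for probe in sorted(probes, key=lambda p: (p.chrom, p.pos)):
        if probe.p_value >= alpha:
            flush()  # a non-significant probe breaks adjacency
            continue
        if run:
            last = run[-1]
            same_direction = (probe.delta_beta > 0) == (last.delta_beta > 0)
            close_enough = probe.pos - last.pos <= max_gap
            if not (probe.chrom == last.chrom and close_enough and same_direction):
                flush()
        run.append(probe)
    flush()
    return regions


# Hypothetical probes, not data from the study.
example = [
    Probe("chr4", 1000, 0.15, 0.01),
    Probe("chr4", 1400, 0.20, 0.03),
    Probe("chr4", 2100, 0.12, 0.04),
    Probe("chr4", 9000, 0.30, 0.01),    # too far from the previous probe
    Probe("chr16", 500, -0.20, 0.01),
    Probe("chr16", 700, 0.20, 0.01),    # direction flips, so no run forms
]
found = call_dmrs(example)
```

In this toy input only the first three chr4 probes form a region; the isolated probe and the pair with opposite directions are discarded.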

Pathway enrichment analysis
Pathway analysis of differentially methylated genes was performed using enrichR, which provides an R interface to all databases of Enrichr [11], a web-based tool that analyzes gene sets and returns any enrichment of common annotated biological functions. Enrichr currently contains annotated gene sets from 128 gene set libraries organized in eight categories. We used the Cancer_Cell_Line_Encyclopedia, Disease_Perturbations_from_GEO_up, Disease_Perturbations_from_GEO_down, Disease_Signatures_from_GEO_up_2014, Disease_Signatures_from_GEO_down_2014, Jensen_DISEASES, MSigDB_Oncogenic_Signatures and NCI-60_Cancer_Cell_Lines gene set libraries to identify coordinately methylated genes. We considered "terms" in these libraries as enriched if their adjusted p value was lower than 0.05 and selected the genes included in the enriched terms.
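The selection rule (adjusted p value below 0.05, then extraction of the genes in the enriched terms) can be sketched with a generic over-representation test. The sketch assumes a one-sided hypergeometric test, which is equivalent to a one-sided Fisher exact test, and Benjamini-Hochberg adjustment. It does not call Enrichr, and the background size, term names and gene names are hypothetical.

```python
from math import comb


def hypergeom_tail(overlap, term_size, input_size, background):
    """P(X >= overlap) when input_size genes are drawn from the background."""
    upper = min(term_size, input_size)
    total = comb(background, input_size)
    hits = sum(comb(term_size, k) * comb(background - term_size, input_size - k)
               for k in range(overlap, upper + 1))
    return hits / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data: 10 input genes and a background of 1000 genes.
background_size = 1000
input_genes = {f"g{i}" for i in range(1, 11)}
library = {
    "term_X": {"g1", "g2", "g3", "g4", "g50"},
    "term_Y": {"g10"} | {f"g{i}" for i in range(100, 150)},
}

names = list(library)
raw_p = [hypergeom_tail(len(library[t] & input_genes), len(library[t]),
                        len(input_genes), background_size) for t in names]
adj_p = benjamini_hochberg(raw_p)

# Genes that appear at least once in an enriched term.
selected = set()
for name, q in zip(names, adj_p):
    if q < 0.05:
        selected |= library[name] & input_genes
```

Here only term_X is enriched, so the selected genes are the four input genes it contains.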

Logistic regression analysis
Average beta-values of CpG sites in each DMR were calculated and used to build an Elastic Net regularized regression model using the glmnet package in R. Elastic Net is a generalized linear model that operates as a mix of ridge regression and LASSO, and was specifically designed to overcome issues of large variable numbers and small sample size [36]. To account for the randomness of the procedure, we performed it 100 times [37]. After running the 100 iterations, we selected the subset of DMRs that appeared in all 100 to choose a robust subset of DMRs that might be more applicable to other studies. Receiver operating characteristic (ROC) analysis was performed and we calculated the area under the curve (AUC) using the pROC package in R.
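The pieces of this procedure that do not depend on a specific solver can be sketched as follows. The penalty uses the form documented for glmnet, lambda * (alpha * L1 + (1 - alpha) / 2 * L2 squared). The selection step keeps only features chosen in every run, and the AUC is computed as the Mann-Whitney statistic. Model fitting itself is omitted, and the per-run selections and scores below are hypothetical placeholders rather than study data.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Elastic Net penalty: lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2)."""
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


def stable_features(selections):
    """Return the features that were selected in every repeated run."""
    runs = [set(s) for s in selections]
    return set.intersection(*runs) if runs else set()


def auc(scores_positive, scores_negative):
    """Probability that a positive case outscores a negative one (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_positive for n in scores_negative)
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical selections from 3 repeated fits (the study used 100).
runs = [{"dmr_a", "dmr_b"}, {"dmr_a", "dmr_c"}, {"dmr_a", "dmr_b"}]
robust = stable_features(runs)

# Hypothetical predicted scores for cases (UC-CRC) and controls (UC).
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
penalty = elastic_net_penalty([1.0, -2.0], lam=0.5, alpha=0.5)
```

Taking the intersection across runs is a conservative choice: a DMR that drops out of even one run is excluded, which favors a smaller but more reproducible marker set.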