Methylation of MGMT and ADAMTS14 in normal colon mucosa: biomarkers of a field defect for cancerization preferentially targeting elder African-Americans.

Somatic hypermethylation of the O6-methylguanine-DNA methyltransferase gene (MGMT) was previously associated with G > A transition mutations in KRAS and TP53 in colorectal cancer (CRC). We tested the association of MGMT methylation with G > A mutations in KRAS and TP53 in 261 CRCs. Sixteen cases, with and without MGMT hypermethylation, were further analyzed by exome sequencing. No significant association of MGMT methylation with G > A mutations in KRAS, TP53 or in the whole exome was found (p > 0.5 in all comparisons). The result was validated by in silico comparison with 302 CRCs from The Cancer Genome Atlas (TCGA) consortium dataset. Transcriptional silencing was associated with hypermethylation and was stratified into monoallelic and biallelic. We also found a significant clustering (p = 0.001) of aberrant hypermethylation of MGMT and the matrix metalloproteinase gene ADAMTS14 in normal colonic mucosa of CRC patients. This suggested the existence of an epigenetic field defect for cancerization disrupting the methylation patterns of several loci, including MGMT and ADAMTS14, that may lead to predictive biomarkers for CRC. Methylation of these loci in normal mucosa was more frequent in older patients (p = 0.001), and particularly in African Americans (p = 1 × 10⁻⁵), thus providing a possible mechanistic link between somatic epigenetic alterations and CRC racial disparities in North America.


SM1. detection of KRAS and TP53 mutations
For KRAS analysis, 20 ng of genomic DNA were amplified by PCR using primers 5′-TAAGGCCTGCTGAAAATGA-3′ (forward) and 5′-GTCCTGCACCAGTAATATGC-3′ (reverse). PCR cycling was 94ºC for five minutes, followed by 30 cycles at 94ºC for 30 seconds, 55ºC for 30 seconds, and 72ºC for 90 seconds. A final extension step at 72ºC for seven minutes was performed. For TP53 analysis, PCR conditions were identical to those for KRAS amplification, except for the Tm, which was optimized for every PCR. Primers flanking the exons of interest are listed in supporting table S1.
Single-stranded conformation polymorphism (SSCP) analysis was used to screen for mutations within the codon 12/13 region of the KRAS gene and in the TP53 gene [1]. Five μl of PCR product was denatured for 5 minutes at 96ºC with 50 μl of formamide denaturing dye mixture (94% formamide, 10 mM EDTA, 0.3% xylene cyanol, and 0.3% bromophenol blue) and quickly cooled on ice. Amplified products were resolved by electrophoresis in 8% polyacrylamide gels, then stained with ethidium bromide and visualized on a UV transilluminator.

Illumina human methylation arrays 450K
One μg of DNA was treated with bisulfite. Before treatment, DNA concentration, integrity and purity were determined by Qubit DNA High Sensitivity fluorimetry, agarose gel electrophoresis and NanoDrop, respectively. The efficiency of the bisulfite treatment was tested by PCR amplification of the ADAMTS14 promoter region (see SM3). Transformed DNA was subjected to hybridization on Illumina Human Methylation Arrays 450K following the manufacturer's protocol. Results were analyzed using RnBeads software [6].

SM3. methylation analysis of ADAMTS14
One μg of genomic DNA was treated with the EZ DNA Methylation™ Kit (Zymo Research, CA). After treatment, DNA was subjected to PCR amplification with primers 5′-GTTTTTAGTTTGGGATTTGG-3′ and 5′-AACAACCTTAAACCACCCTAAC-3′, which amplify a 214 bp sequence containing 24 CpG sites at the 5′ end of ADAMTS14. PCR reactions were performed with the Qiagen HotStart Kit, in 25 μl of 1X Buffer, 0.5X Q-Solution, 0.125 mM dNTPs, 0.4 μM of each primer, 100 ng of bisulfite-treated genomic DNA and 1 unit of polymerase. Cycling conditions were as follows: 15 min at 95ºC, followed by 35 cycles of 30 sec at 95ºC, 30 sec at 55ºC and 30 sec at 72ºC, and a final extension of 10 min at 72ºC. For bisulfite sequencing, PCR amplification products were cloned into pCDNA3.2 TOPO-TA (Qiagen) and transformed into E. coli. Plasmids from individual clones were isolated and sequenced with primer T7. For COBRA, PCR products were digested with BstUI for 2 h at 60ºC, resolved by electrophoresis in 12% acrylamide gels and visualized on a UV transilluminator after staining with ethidium bromide.
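The logic behind the COBRA readout can be illustrated with a minimal in silico sketch. It assumes complete bisulfite conversion, treats methylation as all-or-none across the fragment, and uses a short toy sequence that is hypothetical (it is not the ADAMTS14 amplicon). BstUI recognizes CGCG, so a site survives conversion only when both CpG cytosines are methylated.

```python
def bisulfite_convert(seq, methylated):
    """Top-strand sequence after bisulfite treatment and PCR.

    Non-CpG cytosines always read as T. CpG cytosines stay C only
    when `methylated` is True (all-or-none assumption for this sketch).
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base == "C":
            is_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            out.append("C" if (is_cpg and methylated) else "T")
        else:
            out.append(base)
    return "".join(out)


def bstui_sites(seq):
    """0-based start positions of BstUI recognition sites (CGCG)."""
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CGCG"]


# Hypothetical toy fragment, for illustration only.
toy = "ACCGCGTTCAGCGA"
methylated_product = bisulfite_convert(toy, methylated=True)
unmethylated_product = bisulfite_convert(toy, methylated=False)

print(methylated_product, bstui_sites(methylated_product))
print(unmethylated_product, bstui_sites(unmethylated_product))
```

In this sketch only the product derived from the methylated template retains a BstUI site, which is why digestion followed by gel electrophoresis separates methylated from unmethylated alleles.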

SM4. exome DNA library preparation, sequencing and analysis
DNA concentration, integrity and purity were determined by Qubit DNA High Sensitivity fluorimetry, agarose gel electrophoresis and NanoDrop, respectively. 2.6 μg of high-quality genomic DNA in 130 μl of low TE (10 mM Tris pH 7.5, 0.1 mM EDTA pH 8) were fragmented with the Covaris S2 system in 8 stages of 30 seconds. Settings were: duty cycle = 10%, intensity = 5, bursts per second = 200, duration = 240 seconds, mode = frequency sweeping, power = 23 W, temperature 5.5-6.2ºC. Fragmentation size was assessed on 1 μl of a 1:10 dilution in water on a High Sensitivity DNA chip assay on the Bioanalyzer 2100 (Agilent) for an average median target size of 200-300 bp, well within the 100-900 bp range recommended by Illumina. 1 μg of fragmented DNA was used as input for the TruSeq DNA library protocol (Illumina) and subjected to end repair (30 min at 30ºC), A-tailing (30 min at 37ºC) and indexed adapter ligation (10 min at 30ºC), with intermediate AMPure XP magnetic bead cleanup steps, followed by PCR amplification on a BioRad Dyad thermocycler (denaturation: 30 sec 90ºC; cycling 10 times: 10 sec 90ºC, 30 sec 60ºC, 30 sec 72ºC; extension: 5 min 72ºC). 500 ng of each of 6 libraries with different indices were pooled and subjected to two rounds of targeted hybridization overnight at 58ºC, with subsequent washing and elution of the captured sequences, followed by a final PCR library amplification step of 10 additional cycles, as above (TruSeq Enrichment, Illumina). Pooled exome libraries were quantified using the KAPA SYBR Fast qPCR kit for Illumina and their size profile was assessed on a High Sensitivity DNA chip on the Bioanalyzer 2100. Exonic sequence fold-enrichment efficiency was verified by qPCR using an exon-specific primer set (HTT_F 5′-CCTCCCACATGTCATCAGC-3′ and HTT_R 5′-GCAACCACCTCAAGCACAG-3′) and an intergenic-specific primer set (BETA-ACTIN-LEFT 5′-AGTGTGGTCCTGCGACTTCTAAG-3′ and BETA-ACTIN-RIGHT 5′-CCTGGGCTTGAGAGGTAGAGTGT-3′), with 400-1000 fold enrichment in all cases.
Each exome library pool was clustered on three lanes of a paired-end flow cell in a cBot (Illumina) at 10-12 pM and sequenced using SBS v3 reagents (2 × 101 bp) on a HiScan-SQ system (Illumina), to achieve an average of 2 exomes per lane. Raw sequence data was monitored using the Real Time Analysis software from Illumina to assess cluster density and base quality during the run. It was subsequently exported offline to a computing cluster and storage facility. Basecalling was performed with CASAVA 1.8.2 from Illumina. Data was processed using an in-house pipeline that involved the following steps: first, reads were trimmed using Trimmomatic [7] and aligned using BWA [8], followed by assessment of the exome capture on-target rate and uniformity using the TEQC Bioconductor package [9]. After alignment, duplicates were marked using Picard (http://picard.sourceforge.net) and alignments around indel regions were refined by local realignment followed by the Base Quality Recalibration algorithm with GATK [10]. Variant calling was performed using the UnifiedGenotyper from GATK followed by the application of the Variant Quality Score Recalibration (VQSR) algorithm by GATK [11] in independent steps for SNP and indel calling. Finally, SNP and indel variants were merged and filtered using GATK and annotated using the SnpEff tool [12]. We performed an additional quality control step by determining the concordance of exonic SNPs between the sequencing and Illumina Exon v 1.0 or v 1.1 bead arrays, finding it higher than 95% for heterozygous SNPs and between 97 and 98% for homozygous SNPs in all cases. Germline variants and somatic mutations were identified using VarScan [13] with default parameters. Only somatic mutations with a frequency over 25% in the tumor sample, less than 5% in the normal samples and identified by at least 5 independent reads were considered for the subsequent analyses.
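The final somatic filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `Call` record and its field names are hypothetical, the "5 independent reads" criterion is assumed to refer to variant-supporting reads in the tumor, and counting C > T as the complementary-strand equivalent of G > A is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical record for one candidate somatic variant."""
    ref: str
    alt: str
    tumor_alt_reads: int
    tumor_depth: int
    normal_alt_reads: int
    normal_depth: int


def passes_filter(call, min_tumor_freq=0.25, max_normal_freq=0.05, min_reads=5):
    """Apply the thresholds from the text: >25% in tumor, <5% in normal,
    and at least 5 supporting reads (assumed to be tumor variant reads)."""
    if call.tumor_depth == 0 or call.normal_depth == 0:
        return False
    tumor_freq = call.tumor_alt_reads / call.tumor_depth
    normal_freq = call.normal_alt_reads / call.normal_depth
    return (tumor_freq > min_tumor_freq
            and normal_freq < max_normal_freq
            and call.tumor_alt_reads >= min_reads)


def is_g_to_a_transition(call):
    """G > A, or C > T when the change is reported on the opposite strand
    (an assumption of this sketch)."""
    return (call.ref.upper(), call.alt.upper()) in {("G", "A"), ("C", "T")}


# Hypothetical example calls, for illustration only.
calls = [
    Call("G", "A", tumor_alt_reads=12, tumor_depth=40, normal_alt_reads=0, normal_depth=35),
    Call("C", "T", tumor_alt_reads=4, tumor_depth=10, normal_alt_reads=0, normal_depth=30),
    Call("A", "T", tumor_alt_reads=20, tumor_depth=50, normal_alt_reads=5, normal_depth=40),
]
kept = [c for c in calls if passes_filter(c)]
n_g_to_a = sum(is_g_to_a_transition(c) for c in kept)
print(len(kept), n_g_to_a)
```

Here the second call fails on read support and the third on its frequency in the normal sample, so only the first is retained.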

SM6. analysis of the TCGA COAD and READ datasets
We downloaded and combined the data from the COAD (colon adenocarcinoma) and READ (rectum adenocarcinoma) databases from the TCGA (https://tcgadata.nci.nih.gov/). We employed Level 3 methylation data of 398 Human Methylation 450K (HM450) and 278 Human Methylation 27K (HM27) arrays, corresponding to 420 colon primary adenocarcinomas, 163 rectal primary adenocarcinomas, one rectal recurrent adenocarcinoma, 75 normal colonic mucosa and 12 rectal mucosa samples. For samples analyzed in duplicate (TCGA-A6-2672-01, TCGA-A6-5661-01, TCGA-A6-5665-01, TCGA-AG-3728-01 and TCGA-AG-A026-01), β-values of the probes of interest were combined by averaging across replicates. MGMT methylation status was determined by probes cg12434587, cg12981137 and cg02941816, common to both HM450 and HM27 platforms. These probes interrogate three CpG sites located -239 bp, 128 bp and 248 bp from the MGMT transcriptional start site, respectively, within regions previously shown to associate with transcription of MGMT [15]. β-values of these probes are below 0.2 in all colorectal normal samples analyzed, but exhibit a bimodal distribution in tumors (Figure S6). Tumors with an average β-value in these three probes greater than 0.2 were considered hypermethylated. Level 2 mutational analysis performed by exome sequencing was available for 386 CRC patients. Clinical information was available for all patients except for patient TCGA-F4-6857, whose exome sequencing results were nevertheless included in the COAD somatic mutations database. A summary table of the data employed in this work is available as supporting table S1. Tumors with more than 10 mutations/Mb, also referred to as hypermutated tumors [16], were considered to have a mutator phenotype due to defects in the MMR system.

[...] in grey, and non-MSI tumors in white. MSI frequency was similar between Caucasians and African-Americans, and higher in females and older patients (Figure S1, top). There were more MSI tumors located proximal to the splenic flexure, at Dukes' stages A or B, and with a degree of poor differentiation (Figure S1, middle), and tumors with MSI had fewer mutated KRAS and TP53 (Figure S1, bottom). These results are in agreement with our initial and subsequent studies on MSI in CRC (1,9). P-values were obtained by Fisher's exact test. P-values below 0.05 are in bold type.
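The classification rules stated in SM6 (averaging duplicate arrays, calling MGMT hypermethylation when the mean β-value of the three probes exceeds 0.2, and calling a tumor hypermutated above 10 mutations/Mb) can be sketched as below. The function names, the example β-values and the `covered_mb` parameter are hypothetical and serve only to illustrate the thresholds; they are not taken from the TCGA data.

```python
from statistics import mean

# Probe identifiers as given in the text.
MGMT_PROBES = ("cg12434587", "cg12981137", "cg02941816")


def average_replicates(replicates):
    """Combine duplicate arrays by averaging each probe's beta-value."""
    return {p: mean(r[p] for r in replicates) for p in MGMT_PROBES}


def is_mgmt_hypermethylated(betas, threshold=0.2):
    """Hypermethylated if the mean beta-value over the three probes > 0.2."""
    return mean(betas[p] for p in MGMT_PROBES) > threshold


def is_hypermutated(n_mutations, covered_mb, threshold=10.0):
    """Hypermutated if the mutation rate exceeds 10 mutations per Mb.
    `covered_mb` (sequenced megabases) is an assumed input of this sketch."""
    return n_mutations / covered_mb > threshold


# Hypothetical beta-values, for illustration only.
normal_like = {"cg12434587": 0.05, "cg12981137": 0.10, "cg02941816": 0.15}
tumor_rep1 = {"cg12434587": 0.6, "cg12981137": 0.7, "cg02941816": 0.5}
tumor_rep2 = {"cg12434587": 0.5, "cg12981137": 0.8, "cg02941816": 0.6}

tumor = average_replicates([tumor_rep1, tumor_rep2])
print(is_mgmt_hypermethylated(normal_like))  # mean is about 0.10
print(is_mgmt_hypermethylated(tumor))        # mean is above 0.6
print(is_hypermutated(500, 30.0), is_hypermutated(90, 30.0))
```

Averaging across the three probes before thresholding keeps a single outlying probe from deciding the call on its own.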