Detection of prognostic methylation markers by methylC-capture sequencing in acute myeloid leukemia

Clinical and genetic features incompletely predict outcome in acute myeloid leukemia (AML). The value of clinical methylation assays for prognostic markers has not been extensively explored. We assess the prognostic implications of methylC-capture sequencing (MCC-Seq) in patients with de novo AML by integrating DNA methylation and genetic risk stratification. MCC-Seq assessed DNA methylation level in 44 samples. The differentially methylated regions associated with prognostic genetic information were identified. The selected prognostic DNA methylation markers were independently validated in two sets. MCC-Seq exhibited good performance in AML patients. A panel of 12 differentially methylated genes was identified with promoter hyper-differentially methylated regions associated with the outcome. Compared with a low M-value, a high M-value was associated with failure to achieve complete remission (p = 0.024), increased hazard for disease-free survival in the study set (p = 0.039) and poor overall survival in The Cancer Genome Atlas set (p = 0.038). Hematopoietic stem cell transplantation and survival outcomes were not adversely affected by a high M-value (p = 0.271). Our study establishes that MCC-Seq is a stable, reproducible, and cost-effective methylation assay in AML. A 12-gene M-value encompassing epigenetic and genetic prognostic information represented a valid prognostic marker for patients with AML.

The concentration gradient samples were collected from one randomly selected patient with de novo AML, numbered C21 in this study. Ten milliliters (2 tubes) of ethylenediaminetetraacetic acid (EDTA)-anticoagulated fresh bone marrow were collected at diagnosis, of which 2 ml was used to identify the immunophenotype by flow cytometry (FCM) and the remainder was used to extract mononuclear cells. Based on the immunophenotype from FCM, the CD117 antibody was selected for labelling leukemia cells (70%). The mononuclear cells were divided equally into two samples, Sample 1 and Sample 2. Sample 2 was labelled with the CD117 antibody (Anti-Human CD117 (c-Kit) PE-Cyanine5) and sorted on a BD FACSAria (BD Biosciences, USA) to obtain leukemia cells of 100% purity. Finally, by calculating the required proportions and mixing Sample 1 with the purified leukemia cells, we obtained concentration gradient samples with leukemia cell purities of 70%, 80%, 90% and 100%, numbered S21-Sor1, S21-Sor2, S21-Sor3 and S21-Sor4, respectively. These samples were then used for MCC-Seq to assess the association between DNA methylation level (DMI) and the blast percentage of the specimens.
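The mixing proportions are not reported in the text. As a rough illustration only, the sketch below assumes that the unsorted sample contains 70% leukemia cells, that the sorted sample is 100% pure, and that mixing is done by cell count; the function name `sorted_fraction` is hypothetical and not part of the study's workflow.

```python
def sorted_fraction(target_purity, base_purity=0.70, sorted_purity=1.00):
    """Fraction of the final mixture (by cell count) taken from the sorted sample.

    Solves base_purity * (1 - x) + sorted_purity * x = target_purity for x.
    All default purities are assumptions taken from the percentages in the text.
    """
    if sorted_purity <= base_purity:
        raise ValueError("sorted sample must be purer than the unsorted sample")
    if not base_purity <= target_purity <= sorted_purity:
        raise ValueError("target purity must lie between the two input purities")
    return (target_purity - base_purity) / (sorted_purity - base_purity)


if __name__ == "__main__":
    targets = [("S21-Sor1", 0.70), ("S21-Sor2", 0.80),
               ("S21-Sor3", 0.90), ("S21-Sor4", 1.00)]
    for name, target in targets:
        x = sorted_fraction(target)
        print(f"{name}: {x:.3f} sorted cells + {1 - x:.3f} unsorted cells")
```

Under these assumptions, an 80% sample needs about one third of its cells from the sorted fraction, and a 90% sample about two thirds.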

Replicate samples
To assess the technical variability of MCC-Seq in bone marrow DNA samples, two further patients with AML in relapse (C22 and C23) were randomly selected. Five milliliters of EDTA-anticoagulated fresh bone marrow were collected from each patient and mononuclear cells were extracted. The mononuclear cells of each patient were then divided equally into two samples, yielding four replicate samples numbered S22-Rep1 and S22-Rep2, and S23-Rep1 and S23-Rep2. Subsequently, DNA was extracted for MCC-Seq.

MCC-Seq protocol
Allum F et al. [4] showed that MCC-Seq has accuracy comparable to alternative approaches while enabling more efficient cataloguing of functional and disease-relevant epigenetic and genetic variants for large-scale EWAS. MCC-Seq performed well on our platform (Supplementary Table 1 and Supplementary Figure 1).
Further details of the protocol used in this study are given below.

MCC-Seq library construction
The concentration and integrity of DNA were checked by electrophoresis to confirm quality. Genomic DNA (500 ng to 1 μg) was fragmented to a peak size of 200 bp using the Covaris focused-ultrasonicator E210. Fragment size was checked on a Bioanalyzer DNA 1000 Chip (Agilent), and the KAPA HTP Library Preparation Kit (KAPA Biosystems) was applied. End repair of the dsDNA, which produces blunt-ended, 5'-phosphorylated fragments; A-tailing, during which dAMP is added to the 3'-ends of the blunt-ended dsDNA library fragments; methylated adapter ligation; and clean-up steps were carried out as per KAPA Biosystems' recommendations. The cleaned-up ligation product was then analysed on a Bioanalyzer High Sensitivity DNA Chip (Agilent) and quantified by PicoGreen (Life Technologies).

Bisulfite conversion
The cleaned-up ligation products were bisulfite converted using the EZ DNA Methylation-Gold™ Kit (Zymo Research, Irvine, CA, USA) as described by the manufacturer. Bisulfite-converted DNA was quantified using Qubit 2.0 (Life Technologies) and, depending on the quantity, amplified by 9-12 cycles of PCR using the KAPA HiFi Uracil+ DNA polymerase (KAPA Biosystems), according to the manufacturer's protocol. The amplified libraries were purified using Ampure Beads, validated on Bioanalyzer High Sensitivity DNA Chips, and quantified by PicoGreen. All samples passed these QC tests and were subsequently entered into the MCC-Seq DNA methylation data production pipeline.

MCC-Seq DNA methylation profiling
Hybridization capture of the amplified bisulfite-converted library was performed with the SeqCap Epi Enrichment System protocol (Roche NimbleGen) as described by the manufacturer. A total input of 1 μg of library was hybridized with the probes at 47°C for 72 h. Washing and recovery of the captured library, as well as ligation-mediated PCR amplification and final purification, were carried out as recommended by the manufacturer. The quality, concentration and size distribution of the captured library were determined on a Bioanalyzer DNA 1000 Chip. Each capture was sequenced on the Illumina HiSeq2500 system using 125 bp paired-end sequencing.

MCC-Seq methylation analysis
Clean sequences were first mapped to the human genome (build GRCh37) using Bismark (v0.10.1; parameters: -pe, -bowtie2, -directional, -unmapped) [5]. The following reads were removed: (i) adapter-contaminated reads, defined as reads with > 5 bp of adapter sequence; for paired-end (PE) data, both reads of a pair were removed if either read was adapter-contaminated; (ii) low-quality reads, defined as reads in which bases with a quality score < 20 accounted for more than 15% of the read; for PE data, both reads of a pair were removed if either read was of low quality; (iii) reads with an N ratio > 5%; for PE data, both reads of a pair were removed if either read had an N ratio > 5% (Supplementary Table 1). To avoid potential biases in downstream analyses, CpGs were further filtered to remove those covered by fewer than five reads or by fewer than two reads per strand.
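The three read-level criteria above can be expressed as a simple predicate. The sketch below is illustrative only and is not the pipeline used in the study: the function names are hypothetical, and the number of adapter bases per read is assumed to come from a separate adapter-detection step that is not shown.

```python
def keep_read(seq, quals, adapter_bases,
              max_adapter=5, min_q=20, max_lowq_frac=0.15, max_n_frac=0.05):
    """Return True if a single read passes all three filters.

    seq           -- read sequence (string)
    quals         -- per-base Phred quality scores (list of ints)
    adapter_bases -- number of adapter bases found in the read (assumed input)
    """
    if not seq or len(seq) != len(quals):
        return False
    # (i) more than 5 bp of adapter sequence
    if adapter_bases > max_adapter:
        return False
    # (ii) bases with quality < 20 make up more than 15% of the read
    low_q = sum(1 for q in quals if q < min_q)
    if low_q / len(quals) > max_lowq_frac:
        return False
    # (iii) N bases make up more than 5% of the read
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    return True


def keep_pair(read1, read2):
    """Paired-end rule: drop both mates if either mate fails.

    Each argument is a (seq, quals, adapter_bases) tuple.
    """
    return keep_read(*read1) and keep_read(*read2)
```

For example, `keep_pair(("ACGT" * 5, [30] * 20, 0), ("ACGT" * 5, [30] * 20, 8))` returns `False`, because the second mate carries more than 5 adapter bases.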
We further selected CpG sites that exhibited a methylation difference of ≤ 20% between strands. The minimum read coverage required to call the methylation status of a base was set to 5. All off-target reads were removed. The methylation level at each site was determined by dividing the number of reads supporting methylation at that site by the total number of reads covering that site. CpGs were included in subsequent analyses if the number of sequence reads was 5 or greater. In some analyses, we also excluded sites at which the average sequence depth over all study individuals was below 5× in the complete data set. CpGs were counted once per location, combining both strands. Additionally, for downstream analysis, repeat regions, sex chromosomes and the mitochondrial chromosome were excluded, and the remaining sites were annotated using the University of California Santa Cruz (UCSC) hg19 annotation. Differentially methylated regions (DMRs) were analyzed in R 3.1.0 using the methylKit package on data with ≥ 10× coverage [6]. Data visualization and analysis were performed using the Integrative Genomics Viewer (IGV) and custom R and Perl scripts [6][7][8].
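The per-site calculation and filters described above can be sketched as follows. This is a minimal illustration, not the study's code: the names `StrandCounts` and `call_cpg` are hypothetical, and it assumes that the five-read threshold applies to the combined coverage of both strands, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class StrandCounts:
    meth: int   # reads supporting methylation on this strand
    total: int  # all reads covering the CpG on this strand


def call_cpg(fwd, rev, min_total=5, min_per_strand=2, max_strand_diff=0.20):
    """Return the strand-combined methylation level, or None if filtered out."""
    total = fwd.total + rev.total
    # coverage filters: at least 5 reads overall and at least 2 per strand
    if total < min_total:
        return None
    if fwd.total < min_per_strand or rev.total < min_per_strand:
        return None
    # strand-concordance filter: levels may differ by at most 20%
    diff = abs(fwd.meth / fwd.total - rev.meth / rev.total)
    if diff > max_strand_diff:
        return None
    # methylation level = methylated reads / all reads, both strands combined
    return (fwd.meth + rev.meth) / total


if __name__ == "__main__":
    print(call_cpg(StrandCounts(8, 10), StrandCounts(7, 10)))  # 15 of 20 reads methylated
    print(call_cpg(StrandCounts(9, 10), StrandCounts(1, 10)))  # strands disagree, filtered
```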
To assess genome-wide DNA methylation patterns between different clinical subgroups, we calculated a DNA methylation indicator (DMI) for each sample (Supplementary Table 2). Differentially methylated genes for each group were determined by comparing the DNA methylation profiles of patients within the group with those of patients outside the group using Student's t test. Genes were considered differentially methylated when DNA methylation levels differed with P ≤ 0.001 after correcting for multiple testing using the Benjamini-Hochberg method (denoted as the false discovery rate, FDR) [9]. Pathway analysis was performed using the Molecular Signatures Database, version 3.0, to detect enriched BioCarta pathways and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Pathways and/or gene sets were considered statistically significant when the P value derived from the hypergeometric test was less than or equal to 0.05 after correcting for multiple testing using the FDR.
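For readers unfamiliar with the two statistical steps named above, the following sketch shows a standard Benjamini-Hochberg adjustment and an upper-tail hypergeometric enrichment test using only the Python standard library. It is an illustration under our own assumptions, not the analysis code of the study; the function names and the example numbers are hypothetical.

```python
from math import comb


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]          # hypothetical raw p-values
    print(benjamini_hochberg(raw))
    # hypothetical example: 2 of 3 selected genes fall in a 4-gene pathway,
    # out of a background of 10 genes
    print(hypergeom_upper_tail(2, 4, 3, 10))
```

In practice the raw p-values would come from the per-gene t tests, and the enrichment p-values for all tested pathways would themselves be passed through `benjamini_hochberg` before applying the 0.05 threshold.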