Integrated epigenetic and genetic analysis identifies markers of prognostic significance in pediatric acute myeloid leukemia

Acute myeloid leukemia (AML) may be an epigenetically driven malignancy because it harbors fewer genomic mutations than other cancers. In recent studies of AML in adults, DNA methylation patterns were associated with clinical risk groups and prognosis. However, thorough evaluations of methylation in pediatric AML have not been done. Therefore, we performed an integrated analysis (IA) of the methylome and transcriptome with clinical outcome in 151 pediatric patients from the multi-center AML02 clinical trial discovery cohort. Intriguingly, reduced methylation and increased expression of DNMT3B were associated with worse clinical outcomes (IA p ≤ 10^−5; q ≤ 0.002). In particular, greater DNMT3B expression was associated with worse minimal residual disease (MRD; p < 10^−5; q = 0.01), a greater rate of relapse or resistant disease (RR; p = 0.00006; q = 0.06), and worse event-free survival (EFS; p = 0.00003; q = 0.04). Also, greater DNMT3B expression was associated with greater genome-wide methylation burden (GWMB; R = 0.39; p = 10^−6), and greater GWMB was associated with worse clinical outcomes (IA p < 10^−5). In an independent validation cohort of 132 similarly treated AAML0531 clinical trial patients, greater DNMT3B expression was associated with greater GWMB, worse MRD, worse RR, and worse EFS (all p < 0.03); also, greater GWMB was associated with worse MRD (p = 0.004) and worse EFS (p = 0.037). These results indicate that DNMT3B and GWMB may have a central role in the development and prognosis of pediatric AML.


CC-PROMISE analysis
The CC-PROMISE method was used to identify candidate genes based on the correlation of methylation and expression with each other and their pattern of association with minimal residual disease and risk of relapse. The Affymetrix annotation was used to annotate each expression probe-set to a gene. For each gene, canonical correlation was used to evaluate the association of expression with methylation, noting that expression and methylation may each be measured by multiple probe-sets in each gene. Classically, the sign of the canonical correlation statistic is arbitrary. To enhance biological interpretation, in this study we assigned the sign of the canonical correlation statistic to match that of the univariate correlation of the average expression with the average methylation. Also, canonical correlation empirically defines an expression score and a methylation score that were further evaluated for association with minimal residual disease and risk of relapse or resistant disease as described below.
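The sign convention described above can be illustrated with a minimal sketch. It assumes the unsigned canonical correlation has already been computed elsewhere, and it uses Spearman's correlation as the univariate correlation between the per-subject average expression and average methylation. The function names, the input layout (one list of probe values per subject), and the choice to treat a correlation of exactly zero as positive are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def signed_canonical_correlation(unsigned_cc, expression, methylation):
    """Attach the sign of corr(mean methylation, mean expression).

    expression / methylation: one list of probe-level values per subject
    (hypothetical layout). A correlation of exactly zero is treated as
    positive here, which is an arbitrary choice for this sketch.
    """
    x_star = [mean(row) for row in expression]
    m_star = [mean(row) for row in methylation]
    sign = 1.0 if spearman(m_star, x_star) >= 0 else -1.0
    return sign * abs(unsigned_cc)


# Toy data: expression rises across subjects while methylation falls.
expr = [[1, 2], [2, 3], [3, 4], [4, 5]]
meth = [[0.9], [0.7], [0.5], [0.2]]
signed = signed_canonical_correlation(0.6, expr, meth)
```

In this toy example the averages are perfectly inversely ordered, so the canonical correlation of 0.6 is reported as -0.6.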
We used Spearman's correlation to associate the canonical correlation (CC) methylation score and the CC expression score with minimal residual disease (MRD), ordinally categorized as undetected (or negative), 0.1-1%, and >1%. We used the rank-based statistic for censored time-to-event variables of Jung, Owzar, and George (JOG; Jung et al, 2005 [3]) to evaluate the association of the methylation score and the expression score with relapse. For the JOG statistic, we defined the time to relapse as the time from study enrollment to disease resistance or relapse; subjects without these events were censored at the date of last follow-up or death in remission. The PROMISE statistic was defined as a linear combination of four association statistics: the association of the CC expression score with MRD, the association of the CC expression score with risk of relapse, the association of the CC methylation score with MRD, and the association of the CC methylation score with risk of relapse, computed with the Spearman and JOG statistics. The absolute value of every coefficient in this linear combination was one. The signs of the coefficients were defined as described below.
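The outcome coding described above can be sketched as follows. This is only an illustration of the definitions in the text: the function names and argument layout are hypothetical, and treating any MRD value below 0.1% as "undetected (or negative)" is an assumption, since the text does not state how such values were handled.

```python
from datetime import date
from typing import Optional, Tuple


def mrd_category(mrd_percent: float) -> int:
    """Ordinal MRD code: 0 = undetected/negative, 1 = 0.1-1%, 2 = >1%.

    Assumption: values below 0.1% are coded as undetected/negative.
    """
    if mrd_percent < 0.1:
        return 0
    if mrd_percent <= 1.0:
        return 1
    return 2


def time_to_relapse(
    enrolled: date,
    relapse_or_resistance: Optional[date],
    last_followup_or_death_in_remission: date,
) -> Tuple[int, bool]:
    """Return (days from enrollment, event flag).

    The event is disease resistance or relapse. Subjects without an event
    are censored at last follow-up or death in remission (flag False).
    """
    if relapse_or_resistance is not None:
        return (relapse_or_resistance - enrolled).days, True
    return (last_followup_or_death_in_remission - enrolled).days, False


# One subject who relapsed and one who was censored (made-up dates).
relapsed = time_to_relapse(date(2005, 1, 1), date(2005, 7, 1), date(2008, 1, 1))
censored = time_to_relapse(date(2005, 1, 1), None, date(2006, 1, 1))
```

The resulting (time, event) pairs are the kind of input a rank-based statistic for censored data would take; the JOG statistic itself is not implemented here.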
The PROMISE statistic was defined so that a positive sign indicated that greater expression was associated with better clinical outcomes (i.e. reduced risk of relapse and reduced levels of MRD). The signs of the coefficients for the associations of the methylation score were defined to match those of the canonical correlation statistics as described above. In short, the PROMISE statistic was defined to identify a concordant pattern of associations among expression, methylation, MRD, and risk of relapse:

$$\mathrm{PR}(M, X, \mathrm{MRD}, \mathrm{RR}) = -\left[\mathrm{Sp}(X, \mathrm{MRD}) + \mathrm{JOG}(X, \mathrm{RR}) + \operatorname{sign}\left(\mathrm{Sp}(M^*, X^*)\right)\left(\mathrm{Sp}(M, \mathrm{MRD}) + \mathrm{JOG}(M, \mathrm{RR})\right)\right]$$

where Sp is Spearman's correlation, JOG is the Jung, Owzar, and George statistic, M is the CC methylation score, X is the CC expression score, MRD is the ordinal MRD (undetected, 0.1-1%, >1%), RR is the rate of relapse or resistant disease, M* is the average M-value across methylation probes annotated to the gene, and X* is the average expression level across expression probe-sets annotated to the gene. All association statistics were computed on a correlation scale ranging from -1 to +1.
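As a worked illustration of this linear combination, the sketch below takes the four association statistics and the average-level Spearman correlation as precomputed inputs and returns the PROMISE statistic. The function and argument names are hypothetical, the numeric inputs are invented for illustration, and the sketch assumes that a positive JOG value means a greater score goes with a greater risk of relapse, which is consistent with the leading minus sign but not stated explicitly in the text.

```python
def promise_statistic(sp_x_mrd, jog_x_rr, sp_m_mrd, jog_m_rr, sp_mstar_xstar):
    """PR = -[Sp(X,MRD) + JOG(X,RR) + sign(Sp(M*,X*)) * (Sp(M,MRD) + JOG(M,RR))].

    Each input is an association statistic on a correlation scale in [-1, 1],
    so the result lies in [-4, 4]. A zero correlation between M* and X* is
    treated as positive, an arbitrary choice for this sketch.
    """
    sign = 1.0 if sp_mstar_xstar >= 0 else -1.0
    return -(sp_x_mrd + jog_x_rr + sign * (sp_m_mrd + jog_m_rr))


# Invented values: expression goes with worse MRD and more relapse, while
# methylation is inversely correlated with expression and goes with better
# outcomes, i.e. a concordant pattern.
pr = promise_statistic(
    sp_x_mrd=0.40,
    jog_x_rr=0.30,
    sp_m_mrd=-0.35,
    jog_m_rr=-0.25,
    sp_mstar_xstar=-0.50,
)
```

With these inputs the four terms reinforce one another and the statistic is about -1.3; the negative sign marks greater expression going with worse outcomes.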

Validation of associations in the AAML0531 cohort
The AML TARGET project (https://ocg.cancer.gov/programs/target) has made clinical, methylation, and expression data publicly available for a subcohort of patients treated on the AAML0531 and AAML03P1 clinical trials performed by the Children's Oncology Group. Gemtuzumab ozogamicin (GO) was included in two courses of chemotherapy for all patients on the AAML03P1 trial. Patients on the AAML0531 trial were randomized to receive GO during two courses of chemotherapy on the experimental arm or no GO on the control therapy arm. Also, the AAML0531 trial found that GO significantly improved EFS. In the AML02 trial, the use of GO was very limited; a few patients with poor response to induction chemotherapy received one dose of GO. All trials were similar with respect to administration of other drugs. Thus, we chose to use the control therapy arm of the AAML0531 trial as a validation cohort for our study.
The TARGET project has made clinical outcome and microarray expression data available for 69 subjects on the control arm of AAML0531. We used this cohort of patients to test the association of DNMT3B expression with clinical outcomes: MRD, RR, and EFS. We performed one-sided tests to improve the power of this limited cohort to confirm the associations observed in AML02. The results of these analyses are shown in Figure 2 of the primary manuscript.
Of these 69 subjects, 68 had 27 K methylation array data and 3 had 450 K methylation array data. Due to the limited number of subjects with 450 K array data (the same platform used in the AML02 study), we used the cohort of 68 patients with both microarray expression data and 27 K methylation array data to evaluate the association of genome-wide methylation burden with DNMT3B expression. The result of this analysis is shown in Supplementary Figure 7A.
A cohort of 53 AAML0531 control-arm subjects had 450 K methylation array data available. The control-arm subjects with data available were not representative of the association of risk group with clinical outcome. The publicly reported outcomes for control-arm subjects on the trial as a whole are a three-year EFS of 64.0% for low-risk disease, 45.8% for intermediate-risk disease, and 27.2% for high-risk disease. However, for the 53 subjects with publicly available 450 K methylation array data, 7 of 8 (87%) low-risk patients, 35 of 42 (83%) intermediate-risk patients, and 8 of 10 (80%) high-risk patients experienced an EFS event. Due to the reverse ordering of risk groups by outcome among this cohort of patients with 450 K methylation array data (low risk was worse than intermediate risk, which was worse than high risk), we chose to use the cohort of 42 intermediate-risk patients to validate the association of GWMB with clinical outcomes that was observed in AML02. The results of that analysis are shown in Supplementary Figures S7b, S7c, and S7d.
These tests were one-sided to improve the power of this limited cohort to confirm the associations observed in AML02.
We also examined the association of 450 K array GWMB with clinical outcome within the low-risk and high-risk subjects (Supplementary Figure 8). In each case, we observed the trend of greater GWMB associating with worse clinical outcomes, but the associations were not statistically significant with these very limited sample sizes. The log-rank test was used to evaluate associations of specimen availability with event-free survival and overall survival. The rank-sum test was used to evaluate the association of WBC as a quantitative continuous variable with specimen availability. Fisher's exact test was used to evaluate all other associations. Characteristics of patients who were part of the parent clinical trial but were not included in the present analysis are also shown.
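For readers unfamiliar with the last of these tests, a two-sided Fisher's exact test on a 2 x 2 table can be sketched in a few lines. This is a generic textbook formulation, not the authors' code: the function name is hypothetical, the counts in the example are invented, and the two-sided p-value uses the common convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    With the margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of all tables that
    are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given the margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance guards against floating-point ties.
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Invented counts: a categorical characteristic (rows) cross-tabulated
# against specimen availability (columns).
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

For this small invented table the p-value is 34/70, about 0.486.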