Classification of pediatric acute myeloid leukemia based on miRNA expression profiles

Pediatric acute myeloid leukemia (AML) is a heterogeneous disease with respect to biology as well as outcome. In this study, we investigated whether known biological subgroups of pediatric AML are reflected by a common microRNA (miRNA) expression pattern. We assayed 665 miRNAs on 165 pediatric AML samples. First, unsupervised clustering was performed to identify patient clusters with common miRNA expression profiles. Our analysis revealed 14 clusters, seven of which had a known (cyto-)genetic denominator. Next, a robust classifier was constructed to discriminate six molecular aberration groups: 11q23-rearrangements, t(8;21)(q22;q22), inv(16)(p13q22), t(15;17)(q21;q22), NPM1 mutations and CEBPA mutations. The classifier achieved accuracies of 89%, 95%, 95%, 98%, 91% and 96%, respectively. Although lower sensitivities were obtained for the NPM1 and CEBPA groups (32% and 66%), relatively high sensitivities (84%-94%) were attained for the other groups. Specificity was high in all groups (87%-100%). Owing to the double-loop cross-validation procedure employed, the classifier needed only 47 miRNAs to achieve these accuracies. To validate the 47 miRNA signatures, we applied them to a publicly available adult AML dataset. Despite only partial overlap of the array platforms and molecular differences between pediatric and adult AML, the signatures performed reasonably well. This supports our claim that the identified miRNA signatures are not dominated by sample size bias in the pediatric AML dataset. In conclusion, cytogenetic subtypes of pediatric AML have distinct miRNA expression patterns. Reproducibility of the miRNA signatures in an adult dataset suggests that the respective aberrations have a similar biology in pediatric and adult AML.


INTRODUCTION
Currently, pediatric AML patients are stratified into risk categories according to response to induction therapy and genetic abnormalities, as defined in the WHO 2008 classification [1]. Although patient outcome has improved over the past decades, overall survival rates are 60-70% and the relapse rate is still high [2][3][4]. To further improve patient outcome, biological studies are needed that aim to identify leukemogenic drivers and/or signaling pathways that can be directly targeted. This is necessary because further intensification of chemotherapy may cause a higher frequency of early and late side effects, including therapy-related mortality [5]. Alternatively, patient outcome may be improved by refining the risk-group classification and identifying uniform subgroups using (epi-)genetic and molecular aberrations, which may also contribute to the design of targeted therapy [6]. Identification of hallmark aberrations and their accompanying molecular targets for the 15-20% of unclassified pediatric AML patients is the subject of many ongoing biological studies [7].
Mutations in AML can be categorized into Type I and Type II classes [42]. Type I aberrations occur in genes involved in cell proliferation and survival (e.g., FLT3, KIT, NRAS and TP53), while Type II aberrations lead to impaired differentiation. Examples of the latter are PML-RARα, MLL-rearrangements and CBFB-MYH11.
MiRNAs, non-coding RNAs of circa 22 nucleotides in length, influence gene expression by suppressing the translation of genes that have sequences complementary to the miRNA in their 3′UTR. Since a miRNA can target multiple genes, it can influence many physiological processes such as apoptosis, proliferation, differentiation, and ageing, as well as hematopoietic differentiation [8][9][10][11][12][13]. The epigenetic effect of miRNAs on gene expression contributes to leukemogenesis through interaction with tumor suppressor genes that are involved in cell proliferation and differentiation [10,[14][15][16][17]. Although cancer-promoting miRNAs have been described in adult AML, studies on the role of miRNAs in pediatric AML remain scarce [18][19][20][21]. Zhang et al. described miRNA expression differences between FAB-M1, FAB-M2 and FAB-M3 groups in pediatric AML [21]. In our previous work, we demonstrated the non-random distribution of miR-29a, miR-155, and miR-196a/b expression between clinically relevant genetic entities of pediatric AML [18]. Daschkey et al. performed clustering on pediatric samples with t(8;21), t(15;17) and MLL-rearrangements using their miRNA expression profiles [19]. We also showed that inv(16) and other genetic aberrations in pediatric AML have specific miRNA expression profiles. Furthermore, miR-9, which is expressed at low levels in pediatric AML samples with t(8;21), was identified to act as a tumor suppressor in cooperation with let-7 family members in a stringent cell-context dependent manner [20]. In another study, high expression of miR-99a, miR-125b and let-7c in pediatric AML with FAB M7 phenotype was found to promote leukemogenesis by switching the balance between TGFβ and Wnt signaling [22]. However, so far, no study has been conducted to identify novel prognostic subgroups of pediatric AML based on miRNA profiles.
To address this, in this work we investigated, in a large cohort, whether genetic and molecular subtypes of childhood AML can be classified using their miRNA expression profiles.

Unsupervised clustering
We began by performing hierarchical clustering of patients using a subset of miRNAs. A miRNA was selected if its expression was 6-fold higher or lower than the geometric mean of the cohort in at least one patient sample. This resulted in the selection of 563 out of 664 arrayed miRNAs. The clustering was performed using Pearson correlation as the distance metric and Ward's minimum variance as the linkage. Clustering results are shown in Figure 1. The clusters were subsequently examined, separately, for enrichment of cytogenetic and molecular aberrations, FAB-classification, mean age and white blood cell count (Supplementary Table 1). We observed that miRNA expression correlated with cytogenetic and molecular aberration groups, especially with MLL-rearrangements, NPM1 mutations, inv(16)(p13q22), t(8;21)(q22;q22), and t(15;17)(q21;q22). However, the clusters were not enriched for type I aberrations (FLT3-ITD, FLT3-TKD, and mutations in WT1, NRAS, KRAS, PTPN11, and C-KIT).
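The selection rule described above can be sketched as follows. This is a minimal illustration, not the authors' code (the original analysis was run in R). It assumes linear-scale, strictly positive expression values, takes the geometric mean per miRNA across the cohort, and treats the 6-fold threshold as inclusive. The function name `select_variable_mirnas` and the toy miRNA names are hypothetical.

```python
from statistics import geometric_mean

def select_variable_mirnas(expr, fold=6.0):
    """Return miRNAs that deviate >= `fold` from the cohort geometric mean
    (up or down) in at least one sample.

    expr maps a miRNA name to a list of positive, linear-scale expression
    values, one per patient sample (assumed layout).
    """
    selected = []
    for name, values in expr.items():
        gmean = geometric_mean(values)
        # Keep the miRNA if any single sample is far above or far below the mean.
        if any(v >= fold * gmean or v <= gmean / fold for v in values):
            selected.append(name)
    return selected

# Toy cohort with two hypothetical miRNAs and four samples.
expr = {
    "miR-A": [1.0, 1.0, 1.0, 216.0],  # one sample far above the geometric mean
    "miR-B": [2.0, 2.5, 1.8, 2.2],    # nearly constant across samples
}
selected = select_variable_mirnas(expr)
```

In this toy cohort only `miR-A` passes the filter, because one sample lies far above its geometric mean while `miR-B` barely varies.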
The samples with MLL-rearrangements were separated into two sub-clusters: cluster 1 and cluster 9. However, this separation was not related to the MLL-translocation partner. The majority of samples with NPM1 mutations were located adjacent to the samples with MLL-rearrangements, in cluster 2. Four samples with NPM1 mutations were scattered over cluster 1 (n = 1), cluster 3 (n = 2) and cluster 8 (n = 1).
Fifty-three percent (n = 17) of the samples with inv(16)(p13q22) were found in cluster 4, and the rest were scattered over 6 clusters. The samples with t(8;21)(q22;q22) grouped closely together in cluster 6 and cluster 10; two cases were found in the heterogeneous cluster 5, which encompasses samples with various aberrations. We presumed that the distribution of samples with t(8;21)(q22;q22) over two sub-clusters might be due to differences in type I mutations or morphological features. However, no significant differences between these clusters were observed with respect to these characteristics (Supplementary Table 2). All samples with CEBPA double mutations clustered together with the t(8;21)(q22;q22) samples.
Eleven out of the twelve samples with t(15;17)(q21;q22) were present in cluster 11. The outlying sample showed no distinguishing biological characteristic, e.g. a type I aberration.
Clusters 7 and 13 consisted of > 65% samples (n = 4 and n = 6) with "other" cytogenetics, but no known common cytogenetic or molecular denominator was found.

Classification of genetic and molecular subtypes
To investigate the potential of miRNAs in predicting known Type II aberration subtypes, we performed classification using samples with MLL-rearrangements, t(8;21)(q22;q22), inv(16)(p13q22), t(15;17)(q21;q22), CEBPA double mutations and NPM1 mutations. Subtype selection was based purely on sample size. We generated miRNA signatures specific to each genetic subtype using a Support Vector Machine (SVM). As previously described, a double-loop cross-validation (DLCV) strategy limits over-fitting and leads to stable signatures with high prediction accuracy [24]. Details of the classification model construction steps are given in the Supplementary Information.
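The general idea of double-loop cross-validation can be sketched as below: an inner loop chooses a model setting (here, the number of features) using only the training part of each outer fold, and the outer loop estimates accuracy on samples that played no part in that choice. This sketch is illustrative only and makes several assumptions. A simple nearest-centroid rule stands in for the SVM, features are ranked by the difference of class means, and the fold counts, candidate feature numbers and toy data are all hypothetical. The authors' actual procedure is described in the Supplementary Information and in [24].

```python
import random

def stratified_folds(rows, y, k, rng):
    """Split row indices into k folds, spreading each class evenly across folds."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [r for r in rows if y[r] == label]
        rng.shuffle(members)
        for i, r in enumerate(members):
            folds[i % k].append(r)
    return folds

def fit_predict(X, y, train, test, n_feat):
    """Nearest-centroid stand-in for the SVM; features are ranked on `train` only."""
    def class_mean(j, label):
        vals = [X[r][j] for r in train if y[r] == label]
        return sum(vals) / len(vals)
    ranked = sorted(range(len(X[0])),
                    key=lambda j: -abs(class_mean(j, 1) - class_mean(j, 0)))
    feats = ranked[:n_feat]
    centroids = {c: [class_mean(j, c) for j in feats] for c in (0, 1)}
    preds = []
    for r in test:
        dist = {c: sum((X[r][j] - m) ** 2 for j, m in zip(feats, centroids[c]))
                for c in (0, 1)}
        preds.append(min(dist, key=dist.get))
    return preds

def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

def double_loop_cv(X, y, candidates=(1, 2, 5), outer_k=5, inner_k=4, seed=0):
    rng = random.Random(seed)
    rows = list(range(len(y)))
    outer_scores = []
    for held_out in stratified_folds(rows, y, outer_k, rng):
        held = set(held_out)
        outer_train = [r for r in rows if r not in held]
        inner_folds = stratified_folds(outer_train, y, inner_k, rng)

        def inner_score(n_feat):
            # Inner loop: score a candidate setting without touching held_out.
            scores = []
            for val in inner_folds:
                val_set = set(val)
                tr = [r for r in outer_train if r not in val_set]
                scores.append(accuracy(fit_predict(X, y, tr, val, n_feat),
                                       [y[r] for r in val]))
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_score)
        # Outer loop: refit on all outer-training rows, test on the held-out fold.
        preds = fit_predict(X, y, outer_train, held_out, best)
        outer_scores.append(accuracy(preds, [y[r] for r in held_out]))
    return sum(outer_scores) / len(outer_scores)

# Toy data: 40 samples, one informative feature followed by five noise features.
data_rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = [[4.0 * label + data_rng.gauss(0, 0.5)] + [data_rng.gauss(0, 1) for _ in range(5)]
     for label in y]
estimate = double_loop_cv(X, y)
```

Because feature ranking and the choice of feature number are repeated inside every outer fold, the held-out samples never influence the signature that is evaluated on them.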

Validation of the 47 miRNA signatures
To examine the validity of the 47 miRNA signatures generated on the pediatric AML dataset, we tested their predictive power using an independent dataset. Because no sufficiently large pediatric AML dataset was available, we used an adult AML dataset from Jongen-Lavrencic et al. [42] instead. The t(15;17)(q21;q22) group was excluded from the analysis due to insufficient samples. Classification results are given in Table 2. Relatively high classification sensitivities were obtained in the t(8;21)(q22;q22), inv(16)(p13q22), CEBPA double-mutation and NPM1 mutation groups (83%-100%), whereas the MLL-rearrangement group displayed a lower sensitivity (50%). Similar to the pediatric dataset, high specificities were obtained in all groups (84%-100%).
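For reference, the performance measures quoted here (sensitivity, specificity, accuracy, and the predictive values) can be computed from one-versus-rest predictions as sketched below. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study data.

```python
def binary_metrics(y_true, y_pred):
    """One-vs-rest performance measures; 1 = sample belongs to the subtype."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of true cases detected
        "specificity": ratio(tn, tn + fp),  # fraction of non-cases rejected
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "accuracy": ratio(tp + tn, len(pairs)),
    }

# Toy example: 4 subtype-positive and 6 subtype-negative samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

When a subtype is rare, accuracy is dominated by the correctly rejected non-cases, which is one way a group can combine high accuracy with low sensitivity.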
Our results are in line with a previous study that reported strong miRNA signatures for pediatric AML carrying inv(16)(p13q22) and t(8;21)(q22;q22) [19]. Multiple studies on adult AML also reported predictive miRNA markers for t(15;17)(q21;q22) and the CBF-leukemias [33][34][35]. One of the clusters in our study was enriched for samples with NPM1 mutations (64%), indicating that there is an NPM1 mutation-specific miRNA expression profile, which is concordant with studies on adult AML [36]. Interestingly, although NPM1 cases clustered together in the unsupervised analysis, classification of these samples only resulted in a sensitivity of 32%. This might be due to (a) the number of included samples or (b) cooperating genomic events that influence the miRNA profiles of the samples [43]. The MLL-rearrangement-specific miRNA signatures, however, were not previously reported in either pediatric or adult AML.
Note: PPV, positive predictive value; NPV, negative predictive value.
To quantify the predictive power of miRNA expression in discriminating the cytogenetic subtypes in a supervised way, we performed classification using multiple well-known classification algorithms, which are given in the Supplementary Information. We observed that the classification accuracies were relatively uniform across algorithms. This corroborates our findings and suggests that our original decision to use SVM as the classifier did not introduce selection bias.
Overall, classification accuracies obtained with the samples' miRNA expression profiles did not exceed those previously reported for classification using gene expression profiles [24]. Superior performance of mRNA-based classifiers over miRNA-based ones has also been reported in two studies on adult AML [35,37].
The power of our classifier might be further validated using an independent pediatric AML cohort. Unfortunately, datasets that can be used for validation are currently scarce. In this study, we were only able to validate our classifier with an adult AML dataset, which we think is the closest replication cohort [35][36][37].
MiR-485-5p was the only miRNA used to classify both adult and pediatric AML samples with t(15;17) [35,37]. MiR-485-5p is encoded on chromosome 14q32.31, and overexpression of several miRNAs located adjacent to each other at this locus has been reported in adult AML. However, these miRNAs were not used for classifying the adult AML samples with t(15;17). Interestingly, the miRNA signatures specific to the t(15;17)(q21;q22) group are encoded on chromosome 14q32.31 [34]. The mechanism behind their overexpression and their possible function in AML have not been elucidated yet. We believe these findings might provide clues regarding the distinct biology of t(15;17)(q21;q22)-positive AML.
Downregulation of miR-126, miR-196b, and miR-9 was found in both pediatric and adult t(8;21)(q22;q22) AML. In our recent work, we showed that miR-9, which is downregulated in pediatric t(8;21)(q22;q22) AML, acts as a tumor suppressor and induces differentiation through its targets HMGA2 and LIN28B in cooperation with the let-7 family. It might be worthwhile to explore the therapeutic potential of ectopic expression of miR-9 in t(8;21)(q22;q22)-positive AML [20].
While miR-10a and miR-10b were used to classify adult AML samples with NPM1 mutations, neither of them was selected by our classifier when discriminating the samples with NPM1 mutations in our cohort [36,39]. This might reflect differences in leukemogenic pathways between children and adults, or may merely be due to the method of miRNA selection. In addition, the miRNA profiles of NPM1-mutated samples resemble the profiles of MLL-rearranged samples, which explains why these samples ended up in the same cluster.
Expression of five miRNAs (miR-149, miR-181a, miR-181c, miR-196b, and miR-9) in pediatric AML samples with CEBPA mutations was also reported in adult AML cases with CEBPA mutations [33,35,37,40]. Overexpression of the tumor-suppressive miR-181 family was found in CEBPA-mutated cases and has been shown to correlate with treatment response and better clinical outcome in AML patients. Treatment of AML blasts carrying CEBPA mutations with lenalidomide sensitized AML cells to chemotherapy and increased CEBPA-p30 protein levels and miR-181a expression [41].
To examine whether the miRNA signatures given in Supplementary Table 4 are characterized by distinct target gene patterns, thus reflecting disease biology, we performed the following analyses. We downloaded 289 miRNAs from five miRNA-mRNA target prediction databases: MicroCosm, miRecords, miRTarBase, PITA and TargetScan. Among the 47 unique miRNAs given in Table 3, only 25 were found in the miRNA-mRNA target prediction databases we downloaded (see Supplementary Table 3). To reliably predict miRNA target genes, we called a gene a target of the miRNA under investigation if the prediction was reported in at least three databases. The expression of a miRNA is generally expected to be inversely correlated with the expression of its target genes. To investigate whether this prior knowledge holds in our dataset, we obtained mRNA expression data that were measured on the same samples [24]. Visualizations of the correlation between miRNAs and their target genes are displayed in Supplementary Figures 1-58. While some genes behaved consistently with the prior assumption, i.e. inverse correlation, we also observed genes that violated the assumption. We believe this is worth further investigation.
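The "at least three databases" rule can be sketched as a simple vote over the databases. The data layout (one set of miRNA-gene pairs per database), the function name `consensus_targets`, and the placeholder miRNA and gene names are assumptions made for illustration. The pairs below are not real predictions from any of the databases.

```python
from collections import Counter

def consensus_targets(predictions, min_databases=3):
    """Keep (miRNA, gene) pairs reported by at least `min_databases` databases.

    predictions maps a database name to an iterable of (miRNA, gene) pairs.
    """
    # Each database votes at most once per pair, hence the set().
    votes = Counter(pair for pairs in predictions.values() for pair in set(pairs))
    return {pair for pair, n in votes.items() if n >= min_databases}

# Placeholder predictions; names are illustrative only.
predictions = {
    "MicroCosm":  {("miR-X", "GENE_A"), ("miR-X", "GENE_B")},
    "miRecords":  {("miR-X", "GENE_A")},
    "miRTarBase": {("miR-X", "GENE_A"), ("miR-Y", "GENE_C")},
    "PITA":       {("miR-X", "GENE_B")},
    "TargetScan": {("miR-Y", "GENE_C")},
}
targets = consensus_targets(predictions)
```

Here only the pair (`miR-X`, `GENE_A`) is supported by three databases and is retained; the other pairs appear in two databases each and are dropped.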
To sum up, (cyto)genetic aberration groups have specific miRNA expression profiles in both pediatric and adult AML. In the unsupervised analysis, some patients without a known common genetic or molecular denominator clustered together. Further investigations focusing on the common factor in these clusters might reveal new subgroups of pediatric AML. Although relatively high classification accuracies were obtained for the six aberration groups investigated in this study, they did not exceed the results from currently used methods, which limits the clinical applicability of the classifier. We believe that the miRNA signatures reported here may spark future lines of research, as they may provide insights into important biological pathways involved in pediatric AML.
MicroRNA expression profiling was performed with TaqMan® Array MicroRNA Cards v2.0 (Applied Biosystems, Foster City, CA, USA). Raw Ct-values were analyzed, summarized and exported using SDS 2.3 (Applied Biosystems). All further biostatistical analyses, including quality control, aggregation of data, data normalization, and filtering, were performed using R 2.11.1 [31].
Unsupervised hierarchical clustering of samples was performed using 563 miRNAs that were 6-fold higher or lower expressed compared to the geometric mean of the cohort in at least one patient sample. Clustering used Pearson correlation as the distance metric and Ward's method as the linkage, and was carried out within the R 3.0.3 statistical environment.
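A correlation-based distance between samples can be sketched as follows. The text does not state how the Pearson correlation was converted into a distance, so this sketch assumes the common choice d = 1 - r. The function names and toy profiles are hypothetical. The resulting matrix would then be passed to a Ward-linkage routine from a clustering library, which is not shown here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_distance_matrix(samples):
    """Symmetric matrix of 1 - r between sample profiles (assumed transform)."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = 1.0 - pearson(samples[i], samples[j])
    return dist

# Three toy sample profiles over four miRNAs.
samples = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # same pattern as the first sample, different scale
    [4.0, 3.0, 2.0, 1.0],  # opposite pattern
]
dist = correlation_distance_matrix(samples)
```

With this transform, samples that share an expression pattern are close regardless of overall expression level (distance near 0), while samples with opposite patterns are maximally distant (distance near 2).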
A classifier was constructed for those groups identified by unsupervised clustering that contained at least ten samples. We used the classification strategy described in Balgobind et al. [24]. Normalized adult AML data [42] were obtained from the last author. Due to differences between the data-generating platforms, only a subset of the 47 signature miRNAs was found in the adult AML dataset (see Table 3). For the validation, a classifier was trained on the pediatric AML data using only the overlapping signature miRNAs. The trained classifier was then applied to the adult AML dataset to quantify the prediction accuracy.
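The feature-alignment step of this cross-platform validation can be sketched as below: the signature is restricted to miRNAs measured on both platforms, and both expression matrices are subset to those columns in the same order before training and prediction. All names and values here are placeholders, and the helper functions are hypothetical rather than part of the authors' pipeline.

```python
def shared_signature(signature, pediatric_mirnas, adult_mirnas):
    """Signature miRNAs present on both platforms, in signature order."""
    pediatric, adult = set(pediatric_mirnas), set(adult_mirnas)
    return [m for m in signature if m in pediatric and m in adult]

def subset_columns(matrix, columns, keep):
    """Reorder and subset the columns of a samples-by-miRNA matrix."""
    index = [columns.index(m) for m in keep]
    return [[row[i] for i in index] for row in matrix]

# Placeholder platforms: the adult array lacks one signature miRNA.
signature = ["miR-a", "miR-b", "miR-c"]
pediatric_cols = ["miR-a", "miR-b", "miR-c", "miR-d"]
adult_cols = ["miR-c", "miR-a", "miR-e"]

shared = shared_signature(signature, pediatric_cols, adult_cols)

pediatric_X = [[0.1, 0.2, 0.3, 0.4]]
adult_X = [[3.0, 1.0, 5.0]]
train_X = subset_columns(pediatric_X, pediatric_cols, shared)  # used to fit the classifier
test_X = subset_columns(adult_X, adult_cols, shared)           # used to evaluate it
```

Aligning both matrices to the same ordered feature list ensures that each column carries the same miRNA in the training and validation data.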