High expression of miR-125b-2 and SNORD116 noncoding RNA clusters characterize ERG-related B cell precursor acute lymphoblastic leukemia

ERG-related leukemia is a B cell precursor acute lymphoblastic leukemia (BCP ALL) subtype characterized by aberrant expression of DUX4 and ERG transcription factors, and highly recurrent ERG intragenic deletions. ERG-related patients have remarkably favorable outcome despite a high incidence of inauspicious IKZF1 aberrations. We describe clinical and genomic features of the ERG-related cases in an unselected cohort of B-other BCP ALL pediatric patients enrolled in the AIEOP ALL 2000 therapeutic protocol. We report a small noncoding RNA signature specific of ERG-related group, with up-regulation of miR-125b-2 cluster on chromosome 21 and several snoRNAs in the Prader-Willi locus at 15q11.2, including the orphan SNORD116 cluster.


INTRODUCTION
Childhood B cell precursor acute lymphoblastic leukemia (BCP ALL) is an heterogeneous disease characterized by recurrent genetic aberrations and frequent mutations of genes involved in hematopoietic development, patients that lack common aberrations are defined "B-others" [1].
ERG-related leukemia is a BCP ALL subtype firstly identified by Yeoh and colleagues in the beginning of the gene expression era as a group of B-others characterized by a unique gene expression profile [2]. Recent studies uncovered IGH-DUX4 rearrangements, and rare ERG-DUX4 rearrangements, as universal feature and founder event in the ERG-related leukemia subtype [3][4][5][6]. In these DUX4 rearranged leukemias, the double-homeobox transcription factor DUX4 is in most cases placed under the control of the immunoglobulin heavy chain enhancer, is expressed at high level and activates transcription of many genes including a novel ERG (v-ets erythroblastosis virus E26 oncogene homolog) isoform called "ERGalt". ERGalt protein was proven to inhibit wild-type ERG transcriptional activity and to transform hematopoietic precursors [6].
High frequency of focal mono-allelic ERG intragenic deletions were exclusively identified in ERG-related patients [7], the deletion was shown to have subclonal nature and to be caused by aberrant RAG (recombination activating gene) mediated recombination [8,9]. ERGrelated patients were associated to a favorable outcome [10] also despite a marked incidence of IKZF1 aberrations, a known unfavorable prognostic marker in BCP ALL [8,9].

Research Paper
So far, very little is known about noncoding RNAs (ncRNAs) expression in this leukemia subtype. Expression of a long noncoding RNA proximal to the first exon of ERG, termed "Antisense long noncoding RNA associated with ERG (or ALE)", was described in the majority of ERG-related cases with IGH-DUX4 rearrangements and confined to this leukemia subtype. ALE transcripts were shown to be retained at the ERG locus in the nucleus and their function has still to be uncovered [6].
Here, we investigated the expression of small noncoding RNAs in ERG-related patients identified among a cohort of B-other BCP ALL enrolled in the AIEOP ALL 2000 therapeutic protocol. We focused on microRNA (miRNAs), protein post-transcriptional regulators, and small nucleolar RNAs (snoRNAs), conserved nuclear RNAs that guide post-transcriptional modification of ribosomal RNAs, and small nuclear RNAs. For the first time we report a specific noncoding RNA signature of ERG-related patients, characterized by high expression of miRNAs in the miR-125b-2 cluster and a subgroup of snoRNAs mapping in the Prader-Willi Syndrome locus.

Study cohort
We studied 143 "B-others" specimens at diagnosis of BCP ALL. Patients were enrolled in the AIEOP-BFM ALL 2000 study and lacked genomic aberrations (t(9;22), t(12;21), t(1;19), t(4;11)) or a hyperdiploid karyotype (DNA index >1. 16). Patients were mostly younger than ten years (76.9%), had non-high risk MRD levels at day 78 (72,8%) and were assigned to the non-high risk protocol strata (76.9%). High WBC at diagnosis and dismal early treatment response were over-represented in the study group (Supplementary Table  1). Nevertheless, event-free survival (EFS) and cumulative relapse incidence (CRI) analysis revealed no significant differences between B-others included and not included in the study (Supplementary Figure 1). Overall, patients in the study cohort can be considered representative for B-others enrolled in the AIEOP ALL 2000 protocol.

Highly specific and sensitive classification of patients with an ERG-related signature among B-other BCP ALL
Gene expression profile (GEP) analysis of an initial set of B-others (101 patients) identified a subgroup of patients that clustered separately from the rest of the cohort by unsupervised hierarchical cluster analysis. The same result was obtained in a second independent set of patients (42 patients) analyzed with the purpose to enlarge the study cohort. Twenty-four patients from the first cohort (23.8%) and 11 patients from the second cohort (26.2%) belonged to this tightly clustered subgroup, for a total of 35 out of 143 patients in the merged cohort ( Figure 1A). GEP data of the merged cohort were explored to uncover the features of the tightly clustered patients and class comparison analysis generated a gene lists of 1323 probe sets differently expressed compared to the remaining B-other patients (Supplementary Table 2).
Compared to previously reported BCP ALL subgroups our signature shared 86/100 probe sets with patients described by Harvey et al. [10] as "Cluster 6" in a cohort of high risk ALL (St. Jude cohort, Children's Oncology Group) and representing the ERG-related group in that cohort (Supplementary Figure 2). ERG-related BCP ALL, classified by GEP, have a prevalence of 24.5% (35 out 143) among the AIEOP B-others cohort with a higher frequency of female patients (p-value 0.02) and lower WBC (p-value 0.004). Other clinical parameters were not significantly different in ERG-related vs. non-ERG-related patients (Table 1).
ERG-related group showed a remarkably favorable response to therapy, indeed the cumulative relapse incidence of ERG-related patients was significantly lower compared to the rest of the cohort (5- and Supplementary Table 3).
Considering the remarkably favorable outcome of the ERG-related patients in our and previous studies a classifier was constructed applying LASSO [11] on all probe sets to improve identification of ERGrelated patients. A 91 probe sets classifier was identified (Supplementary Table 4) and was further assessed for discriminative ability using the misclassification rate as performance measure. Considering the outer loop cross validation, a misclassification rate of 0.0062 (sensitivity 0.98, and specificity 1) was obtained. Applying this classifier to the study cohort, all patients were correctly identified as either ERG-related or non-ERG-related.

Incidence of ERG aberrations in ERG-related BCP ALL
ERG intragenic deletions were early on described in ERG-related leukemia and mostly involve conserved breakpoints between exon2 and exon10 of ERG isoform 1 (NM_182918) [8,9].
In our study long-template PCR established to detect deletions between exon2 and exon10 of ERG identified intragenic deletions in 11 out of 27 ERGrelated samples and in none of non-ERG-related patients, genomic breakpoints were characterized in 8/11 samples. As reported previously [8,9] breakpoints downstream exon2 were very consistent among samples whereas the breakpoints upstream exon10 were located within a 2.5kb genomic region. In 4 samples co-occurrence of multiple breakpoints was identified (Supplementary Figure 3 and  Supplementary Table 5). Most recent data uncovered less common deletion patterns that involve the first exon or the whole gene ERG [6], and might have been missed in our analysis.
To further assess ERG aberrations in our cohort, the presence of ERG deleted transcripts was investigated on cDNA of 31 ERG-related and 84 non-ERG-related patients. Expression of ERG deleted transcripts revealing the presence of the genomic deletions was found in 14 (including the 11 samples identified by gPCR) out of 31 ERG-related patients and in none of the non-ERG-related samples. In all deleted samples wild type ERG mRNA was detected as well, in line with previous observations that BCP ALL ERG intragenic deletions are monoallelic.  To the left, features of 143 patients in the B-other cohort according to the distinction in unsupervised gene expression analyses: ERG-related, non-ERG-related. To the right, features of 34 out of 35 ERG-related patients according to the status of the ERG intragenic deletion (one ERG-related patient not analyzed for the ERG intragenic deletion was excluded). Clinical parameters were not evaluated by statistical tests because of low patients numerosity. WBC, white blood cells; yrs, years; MRD, minimal residual diseases.

Similarities and differences within ERG-related specimens: ERG deleted and ERG non-deleted leukemias
The presence of ERG intragenic deletions in about half of ERG-related patients raises the question if the latter can be considered a unique homogeneous group. ERG-wt gene expression level was analyzed by qRT-PCR (probe ERG-wt specific) in 31 ERG-related and 71 non-ERG-related B-others samples. Surprisingly, no difference in ERG-wt expression was found among ERG-related samples with and without ERG intragenic deletions. ERG-wt expression was highly heterogeneous in non-ERG-related samples and modestly higher when compared to ERG-related specimens, with a less than two-fold difference between the means (0.9916 ± 0.1176 relative expression in ERG-related vs. 1.789 ± 0.2513 in non-ERG-related; t-test, p-value=0.005; Supplementary Figure 4A-4B) in keeping with previous observations [12]. Western blotting analysis in a representative subgroup of patients (13 ERG-related and 4 non-ERG-related) revealed similar levels of ERG protein expression in all samples (data not shown). An additional short ERG isoform, supposedly the ERGalt protein described in most of ERG-related patients and shown to inhibit wild type ERG transcriptional activity [6], was expressed in 6 out of 13 ERG-related patients (46,1%; 2/5 with ERG deletion and 4/8 without ERG deletion) and in none of non-ERGrelated cases (Supplementary Figure 4C).
Wondering if inhibition of ERG transcriptional program was a general event in ERG-related patients, we applied GSEA analysis to our gene expression data. Consistently, a signature of ERG-activated genes from literature [13] was significantly enriched in non-ERGrelated patients when compared to ERG-related group (Supplementary Figure 5) suggesting an impaired ERG signaling in the ERG-related leukemia subtype.
We further investigated if global gene expression would distinguish ERG-deleted patients within the ERGrelated group. Unsupervised hierarchical cluster analysis of ERG-related expression profiles revealed a different gene expression signature among the samples with and without a deletion, indeed the two groups clustered separately (Supplementary Figure 6A). T-test analysis revealed a 232 probe sets signature (t-test, p<0.05, Supplementary Table 6).
Interestingly, among the differentially regulated genes we found a significant higher expression of RAG-1 and RAG-2 in ERG deleted compared to ERG not deleted samples (Supplementary Figure 6B-6C). This supports the hypothesis raised in previous studies [8,9] that ERG intragenic deletion takes place by RAG-mediated rearrangement.
We conclude that while ERG-related leukemias share most clinical features and can be considered a unique group, some biological differences between ERG deleted and ERG not deleted leukemias are noteworthy.

ERG-related status influences the incidence of genomic aberrations and outcome
Samples expressing the ERG-related signature lacked major chromosomal aberrations frequently found in BCP ALL. We questioned if the presence of the ERG-related mRNA signature was associated with the incidence of genomic micro-deletions of B cell differentiation pathway genes [14]. Eight recurrent aberration sites (IKZF1, P2RY8-CRLF2, CDKN2A/B, . When compared to the non-ERGrelated samples, ERG-related samples have a statistically significant lower incidence of aberrations involving the CDKN2A/B and PAX5 (49,2% vs. 14.7% and 47.4% vs. 11.8% respectively, p<0.001) while we found only a trend of lower frequency of P2RY8-CRLF2, ETV6, BTG1, RB1 and EBF1 aberrations (Table 2). Conversely, a slightly higher frequency of IKFZ1 aberrations was found in the ERG-related patients (35.3% of ERG-related vs. 26,2% of non-ERG-related, p-value=0.35). The difference was more evident considering ERG deleted patients compared to the rest of the cohort (50% vs. 26,25%, p-value=0.07), in line with previously reported data [8,9]. The presence of IKZF1 aberrations did not affect the highly favorable prognosis of ERG-related patients, both ERG deleted and not deleted, while it was associated to a worst EFS at 5 years from diagnosis in the non-ERG-related patients   (Figure 2A). Beside the overall higher incidence of IKZF1 aberrations, we noticed in ERG-related patients, both ERG deleted and not deleted, that IKZF1 aberrations occur mainly as single aberrations (IKZF1 only) while among the non-ERG-related patients IKZF1 aberrations co-occurred with ≥1 aberrations analyzed by MLPA (IKZF1 plus) (3 patients IKZF1 plus and 9 IKZF1 only in ERG-related vs. 15 IKZF1 plus and 1 IKZF1 only in non-ERG-related, p-value=0.0002; Figure 2B and Supplementary Table 7).

ERG related BCP ALL patients share a microRNA signature with overexpression of the miR-125b-2 cluster
Considering the highly distinct gene expression program that distinguishes ERG-related from non-ERGrelated patients, further analysis including other RNA species was pursued.
Noncoding RNAs expression profile analysis was performed on 24 BCP ALL patients of the study cohort, 8 ERG-related and 16 non-ERG-related B-others patients. Unsupervised cluster analysis clustered the ERG-related patients separately from the non-ERGrelated patients ( Figure 3A). Class comparison analysis (Wilcoxon test) resulted in a signature of 18 differentially regulated miRNAs (Table 3 and Figure 3B). Top-ranked in the signature were all miRNAs belonging to miR-125b-2 cluster (hsa-miR-125b, -125b-2*, -99a, let-7c). Overexpression in ERG-related patients was validated by qRT-PCR in a second group of 40 patients, 20 ERGrelated and 20 non-ERG-related, belonging to the initial study cohort (Supplementary Figure 7). Furthermore, hsa-miR-100 belonging to the miR-125b-1 cluster (chr11, MIR100HG host gene) and hsa-miR-125a-3p belonging to miR-125a cluster (chr19) were among the top ranked differentially expressed miRNAs, highlighting the involvement of the miR-125 family in the ERG-related leukemia phenotype.

Higher expression of snoRNAs in Prader-Willi Syndrome locus in ERG-related BCP ALL
Unsupervised cluster analysis on small nucleolar RNAs (snoRNAs) probe sets again showed tight clustering of ERG-related patients ( Figure 4A). Class comparison analysis detected a list of differentially expressed probe sets (Table 4) most of which mapping in the 15q11-q13 chromosomal region, a locus governed by genomic imprinting and involved in the neurodevelopmental congenital disease Prader-Willi Syndrome (PWS) [15,16] (Figure 4B).
The analysis revealed a specific up-regulation of SNORD109A, SNORD64, SNORD107 and 12 snoRNAs in the SNORD116 cluster (SNORD116-11, -14 The SNORD116 cluster is organized in a large tandemly repeated cluster containing 29 gene copies of snoRNAs with highly similar sequences. Based on sequence similarity, members of the SNORD116 cluster were proposed to be divided into three snoRNA groups (SNOG). SNOG1 (SNORD116-1 to -9) was reported to have significantly higher expression than SNOG2 (SNORD116-10 to -24) or SNOG3 (SNORD116-25 to-29) in several tissues [18]. Deregulated probe sets with fold change >1.5 or fold change <1/1.5 (adjusted p-value <0.05) are listed (probe sets identified by comparison analysis but with mean intensity <1 in both groups were excluded from the list). Interestingly, most of SNORD116 snoRNAs upregulated in ERG-related samples belong to SNOG2 and share high similar anti-sense elements (ASEs), the sequences up-stream to D and D' box that guide the snoRNA to the target by complementarity (Supplementary Table 8).

DISCUSSION
This study shows that a specific ERG-related GEP signature identifies 25% of patients among BCP ALL B-other patients lacking t(4;11), t(12;21), t(9;22), t(1;19) and with 1.16>DNA index <1.60. The study cohort of 143 B-other patients reported here includes all risk groups and is representative of B-others in the AIEOP ALL 2000 protocol study offering generality of our findings.
In our study 40% of patients with an ERG-related signature carry ERG intragenic deletions involving central exons, rare deletions involving first or last exons recently described [6] might have been missed in our analysis. Deletions were in all cases monoallelic and in many cases showing evidence of subclonal nature. We observed no age related enrichment neither for patients with an ERG-related signatures nor for ERG deleted patients. We did observe a sex related increase of ERG-related as well as ERG deleted cases being female patients overrepresented in both groups, which is remarkable considering that female patients are underrepresented among B-other patients.
Overall, patients with a common ERG-related signature comprising ERG deleted cases are commonly found among patients lacking classical genetic lesions and importantly, identifies patients showing a good response to assigned therapy.
In addition to the lack of major chromosomal alterations, ERG-related patients had a lower incidence of micro-aberrations of genes related to B cell differentiation known to be frequently deleted or amplified in BCP ALL [14]. Aberrations of IKFZ1 represent a relevant exception to this trend with a very high incidence (50%) in ERGrelated patients carrying intragenic ERG deletion. In ERGrelated patients IKZF1 aberrations are mostly found as isolated aberrations not associated with poor prognosis. This observation tempted to speculate that co-occurrence of additional aberrations supports the unfavorable prognosis of IKZF1 aberrations in non-ERG-related BCP ALL patients, and to reconsider the role of IKZF1 aberrations alone as poor prognostic marker. Though, another speculation is that ERG aberrations are dominant in phenotypic appearance over IKZF1 deletions. Importantly, IKZF1 was recently identified among ERG transcriptional regulators [19], suggesting that IKZF1 and ERG aberrations tightly interrelate in affecting patient's outcome.
We did find down-regulation of a set of ERG downstream target genes in ERG-related patients, suggesting impaired ERG signaling in these samples. ERG-wt expression was not altered in relation to ERG status but 46% of ERG-related patients expressed a short ERG isoform, supposedly the ERGalt protein recently described in another ERG-related cohort [6] and shown to inhibit wild type ERG transcriptional activity.
Strikingly, we show that among ERG-related patients the expression of a set of genes distinguishes patients with and without ERG intragenic deletions.
Patients with ERG intragenic deletions showed higher expression of recombination activating genes RAG1 and RAG2. The latter offers rational for the enrichment of ERG and IKZF1 intragenic deletions caused by illegitimate RAG-mediated recombination [20,21]. However, the higher RAG1/RAG2 expression cannot be considered the only cause of specific ERG and IKZF1 deletion accumulation, indeed other microdeletions mediated by illegitimate RAG-meditated recombination like BTG1, ETV6, CDKN2A/B [22] and the P2RY8/CRLF2 deletion [23] had a very low incidence in ERG-related patients.
Presumably additional factors guide the propensity of specific aberrations to occur e.g. active chromatin H3K4me3 was shown to tether the RAG enzyme complex to DNA and to increase RAG enzymatic activity [24].
Besides a unique GEP we show for the first time that ERG-related leukemia share a highly specific noncoding RNA signature. The most deregulated microRNAs belong to the miR-125 family and to the genomic clusters in which they are embedded. Indeed, the clusters miR-125b-2 and miR-125b-1 were shown to be transcribed as a single polycistronic RNA from their host genes LINC00478 and MIR100HG respectively in normal and leukemic hematopoietic cells [25]. All three guide-strands of the miR-125 family share the same seed region and were shown to have the same biological phenotype when overexpressed in murine cells [26]. The miR-125 family considered as 'oncomir' [27] has been reported to play a fundamental role in hematopoiesis, is highly expressed in several leukemia subtypes [28] and involved in rare chromosomal translocations in ALL [29] with transformation of hematopoietic progenitors and repression of pro-apoptotic genes.
In addition to a miRNA signature, ERG-related patients cluster apart from other BCP ALL patients with a specific snoRNA expression profile. Strikingly, the most differentially regulated snoRNAs map to the 15q11.2 chromosomal region governed by genomic imprinting and are involved in the Prader-Willi Syndrome (PWS). A subgroup of snoRNAs in the PWS locus, including the SNORD116 cluster, was specifically up-regulated in ERG-related patients whereas others were heterogeneously expressed in the cohort. With the exceptions of SNORD115, which has been shown to regulate the alternative splicing of the 5HT-2C serotonin receptor pre-mRNA [30], all other snoRNAs in the PWS locus are "orphan snoRNAs" that lack any complementarity with known ribosomal RNA or small nuclear RNA and their non-canonical function is mostly unknown [31].
Up-regulation of snoRNAs in the SNORD116 cluster were previously described in a subgroup of multiple myeloma patients where expression levels were associated to genome-wide copy number alterations in the genomic locus [32]. Specific patterns of snoRNA expression were described in cytogenetic subgroups of chronic lymphocytic leukemia, and in particular high SNORD116-18 expression was associated to shorter progression free survival [33]. Here for the first time we describe the up-regulation of snoRNAs in PWS locus associated to a favorable outcome in childhood BCP ALL. Interestingly, SNORD116 snoRNAs that are upregulated in ERG-related patients share high similar antisense elements (ASEs) suggesting a common biological function.
All together our data contribute to the knowledge on the recently defined ERG-related leukemia subtype, that despite the presence of IKZF1 deletions are marked by an excellent prognosis. A specific small noncoding RNA signature shared by ERG-related patients is presented and offers new clues to dissect their specific biological program.

RNA and DNA preparation and quantitative analysis
DNA and RNA were isolated from bone marrow or peripheral blood mononuclear cells separated by Ficoll-Hypaque (Pharmacia, Uppsala, Sweden). TaqMan gene expression assay (Applied Biosystems, Foster City, CA, USA) was used to assess ERG gene expression (Hs 01554629-m1). MiRNAs expression data were validated by TaqMan ® MicroRNA assays (Applied Biosystems). SnoRNAs expression were validated by miScrpt PCR System (Qiagen, Hilden, Germany).

Genes and ncRNAs expression profiling and data analysis
Gene expression profiles were obtained with HG-U133 Plus 2.0 GeneChip ® (Affymetrix, Santa Clara, CA, USA) arrays. A first set of patients (101) was processed as part of the MILE study as previously described [35]. A second set of patients (42) was processed starting from 100ng of total RNA using the GeneChip ® 3'IVT express kit and protocol (Affymetrix).
The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus [36] and are accessible through GEO Series accession number GSE79547 (http://www.ncbi.nlm.nih.gov/geo/query/acc. cgi?acc=GSE79547).
R/Bioconductor packages and Partek software (Partek ® Genomics Suite ® software, version 6.6 Copyright ©; 2014 Partek Inc., St. Louis, MO, USA) were used for microarrays data analysis. Gene expression profiles of all 143 samples in the study cohort were used to build a classifier and LASSO [11] was used as prediction method. Enrichment of relevant signatures previously published was analyzed using Gene Set Enrichment Analysis (GSEA) software [37,38].

Characterization of ERG intragenic deletions
Breakpoints on genomic DNA were investigated by long-range PCR using PCR Extender System (5 Prime, Hilden, Germany). A second PCR was run to better characterize the breakpoints in patients with deletions. Expression of deleted ERG transcripts were investigated by PCR on cDNA. PCR products were analyzed by Sanger sequencing.

Statistical analysis
Event Free Survival (EFS) and overall survival were estimated according to Kaplan-Meier, with Greenwood standard error and with the log-rank test for comparison; Cumulative Relapse Incidence (CRI) was estimated adjusting for competing risks of other events and compared with the Grey test. To assess associations between patients' features, the Chi-Square test was applied. GraphPad Prism software and SAS 9.2 were used for analyses (GraphPad Software, La Jolla, CA, USA).