Characterization and diagnostic application of genomic NPM-ALK fusion sequences in anaplastic large-cell lymphoma

Nucleophosmin-anaplastic lymphoma kinase (NPM-ALK) fusion genes resulting from the translocation t(2;5)(p23;q35) are present in almost 90% of childhood ALK-positive anaplastic large-cell lymphomas (ALCL). Detection and quantification of minimal disseminated disease (MDD) by measuring NPM-ALK fusion transcript levels in the blood provide independent prognostic parameters. Characterization of the genomic breakpoints provides insights into the pathogenesis of the translocation and allows for DNA-based minimal disease monitoring. We designed a nested multiplex PCR assay for identification and characterization of genomic NPM-ALK fusion sequences in 45 pediatric ALCL-patients, and used the sequences for quantitative MDD monitoring. Breakpoint analysis indicates the involvement of inaccurate non-homologous end joining repair mechanisms in the formation of NPM-ALK fusions. Parallel quantification of RNA and DNA levels in the cellular fraction of 45 blood samples from eight patients with NPM-ALK-positive ALCL correlated, as did cell-free circulating NPM-ALK DNA copies in the plasma fraction of 37 blood samples. With genomic NPM-ALK fusion sequence quantification, plasma samples of ALCL patients become an additional source for MRD-assessment. Parallel quantification of NPM-ALK transcripts and fusion genes in ALCL cell lines treated with the ALK kinase inhibitor crizotinib illustrates the potential value of supplementary DNA-based quantification in particular clinical settings.

The pathogenic mechanisms involved in the generation of the NPM-ALK fusion gene have not been analyzed in patients thus far. A systematic analysis of genomic fusion sequences from ALCL patients could provide insights into the pathogenesis of the translocation. The genomic fusion sites consistently fall within specific breakpoint cluster regions that comprise a 1 kb region around intron 4 within the NPM gene and a 2.2 kb region between exon 19 and exon 20 within the ALK gene [6,7].
Standard multi-agent chemotherapy reaches event-free survival rates of 70% at five years [8][9][10][11]. New therapeutic options are available to be tested for patients with a high relapse risk in addition to chemotherapy (e.g., ALK-kinase inhibitors or Brentuximab Vedotin) or for those with a low relapse risk as a lower toxicity backbone (Vinblastine monotherapy) [12][13][14]. Therefore, reliable prognostic factors are necessary. The tumor-specific NPM-ALK fusion transcript has been established as a minimal disease marker in both bone marrow and blood mononuclear cells. Several groups have established detection protocols for minimal disseminated disease (MDD) by qualitative PCR for NPM-ALK mRNA as an independent and potent prognostic parameter under BFM pulse-type chemotherapy [15][16][17][18][19]. Fifty-five to sixty percent of patients are MDD-positive, and their risk of relapse is about 50% compared to 15% for MDD-negative patients [15][16][17][18]. Quantification of MDD has been shown by one group to detect patients with very high risk of relapse of 70% [17]. Detection of minimal residual disease (MRD) before the second course of chemotherapy allowed for definition of very high-risk patients with a relapse risk of almost 80%, as well [18]. However, despite the proven reliability of the MDD-marker at the RNA level, the use of RNA has some intrinsic disadvantages such as possible degradation by RNases during transport of blood samples to central laboratories. In addition, supplementary quantification of DNA fusion sequences would allow for calculation of absolute tumor cell numbers independent of gene expression, and detection of quiescent tumor cells. The fact that the breakpoint cluster regions in the NPM and ALK genes in ALCL are relatively small facilitates the design of fusion gene detection assays.
In the present study, we developed a nested multiplex PCR assay for identification of genomic NPM-ALK fusion sequences and performed a detailed characterization of the genomic breakpoints in pediatric ALCL. We evaluated the genomic fusion sequence as a supplementary tool for minimal disease assessment in both the cellular and plasma fractions of blood in children and adolescents with ALK-positive ALCL.

Characterization of genomic NPM and ALK breakpoints in ALCL patients
The nested multiplex PCR assay enabled identification of the genomic ALK fusion sequences in all four tested ALK+ cell lines (Karpas 299, SR-786, L-82, and SuDHL-1) and in all 45 ALCL patients (Table 1).
In 43 patients, NPM was the fusion partner of ALK. ATIC and TPM3 were the fusion partners in the two remaining patients, respectively (Figure 1A). In one patient (UPN31), the NPM-ALK fusion gene was not detectable, but the reciprocal ALK-NPM fusion gene could be sequenced. In 30 ALCL patients and one cell line (SuDHL-1), we were able to detect both derivative fusion sites (NPM-ALK and ALK-NPM). There were no cases with a perfectly balanced translocation: nearly all patients had deletions at the fusion region with a median deletion size of 55 base pairs (bp) in NPM and 49 bp in ALK (Table 1).
The alignment of the genomic breakpoints to the breakpoint cluster region (bcr) of NPM showed a random distribution with no sub-clusters (Figure 1A-1B). All NPM breakpoints identified were located in intron 4 and were randomly distributed therein. Genomic breakpoints within the ALK bcr were mostly located in intron 19 (93%), with 3 breakpoints in exon 19 (7%). Although genomic ALK breakpoints appeared to be enriched in the first half of intron 19, kernel density analysis did not identify any significant clustering (Figure 1B).
Detailed characterization of the NPM-ALK fusion sites showed small microhomologies (1 to 6 bp) in 38% of patients and small fillers (1 to 8 bp) in 22% of patients (Figure 1C). These findings indicate that the formation of NPM-ALK translocations in ALCL involves inaccurate non-homologous end joining (NHEJ) repair mechanisms [20].
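The distinction between microhomologies and fillers at a fusion site can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function name, the toy sequences, and the simplifying assumption that the fusion read starts at the first base of the 5' partner reference and ends at the last base of the 3' partner reference are all hypothetical and chosen only for illustration.

```python
def classify_junction(left_ref: str, right_ref: str, fusion: str):
    """Classify a fusion junction as microhomology, filler, or blunt.

    Assumption (illustrative only): the fusion read starts at the first base
    of left_ref (5' partner) and ends at the last base of right_ref
    (3' partner), and contains no sequencing errors.
    """
    n = len(fusion)

    # Number of leading fusion bases that match the 5' partner.
    p = 0
    while p < min(n, len(left_ref)) and fusion[p] == left_ref[p]:
        p += 1

    # Number of trailing fusion bases that match the 3' partner.
    s = 0
    while (s < min(n, len(right_ref))
           and fusion[n - 1 - s] == right_ref[len(right_ref) - 1 - s]):
        s += 1

    if p + s > n:
        # Bases explained by both partners: microhomology at the junction.
        return ("microhomology", fusion[n - s:p])
    if p + s < n:
        # Bases explained by neither partner: non-templated filler.
        return ("filler", fusion[p:n - s])
    return ("blunt", "")


# Toy reference sequences (hypothetical, not NPM or ALK sequence).
LEFT = "CCCCCCAGTTTT"
RIGHT = "AAAAAGGTGTGT"

print(classify_junction(LEFT, RIGHT, "CCCCCCAGGTGTGT"))
print(classify_junction(LEFT, RIGHT, "CCCCCCAGCAGTGTGT"))
print(classify_junction(LEFT, RIGHT, "CCCCCCGTGTGT"))
```

In the first toy read, the bases AG could derive from either partner and are reported as a microhomology; in the second, the bases CA match neither partner and are reported as a filler.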
We further analyzed the genomic breakpoints for co-localization with repeat regions or with other DNA sequence motifs that could support the initiation of ALCL chromosomal translocation as described for other lymphoma subtypes [21,22]. No significant correlation was observed with repeat elements or other sequence motifs at the exact breakpoint position or in expanded breakpoint regions (50 bp upstream and downstream), which is consistent with inaccurate DNA strand repair at the fusion site (Figure 2).

Comparative quantification of NPM-ALK fusion transcripts and NPM-ALK fusion gene sequences in blood and plasma samples
To evaluate the potential application of DNA-based minimal disease monitoring for ALCL patients, we compared the standard RNA-based technique with DNA-based quantification using patients' individual fusion sequences from both the cellular and plasma fractions in 51 blood samples. The 51 samples were collected from eight high-risk patients identified as MDD-positive by the standard method during the course of their treatment. Seven of the patients relapsed. Consequently, more than half the samples showed quantifiable copy numbers using RNA-based MDD/MRD measurement. This provided sufficient samples with measurable copy numbers for a quantitative comparison of the two methods. NPM-ALK transcripts in the mononuclear cell fraction were quantifiable in 48 of the 51 samples (when applying the quality criterion of 2000 copies ABL). Of those 48 samples, 23 were negative and 25 positive. Three of the 48 samples quantifiable by RT-qPCR were not quantifiable by the DNA-based assay. Conversely, the DNA-based assay was able to quantify eight samples that could not be evaluated at the RNA level (Supplementary Table 1).
The DNA breakpoint method and the standard RNA method were well correlated for NPM-ALK quantification, with a correlation coefficient of 0.77 (p < 0.0001) (Figure 3A). Eight samples identified as negative by the RNA-based method were identified as positive at the DNA level, usually with very low copy numbers (0.08, 0.2, 0.7, 0.8, 1, 2.1, 4.7, 24 NPM-ALK/10^4 ALB copies). Three samples identified as negative at the DNA level had measurable RNA copies (1.1, 1.1, and 1.6 NPM-ALK/10^4 ABL copies).
In 37 available concordant plasma and cell samples, we were able to perform NPM-ALK quantification with RNA in the cellular fraction and with cell-free circulating tumor DNA (ctDNA) in the plasma fraction. Spearman's rank correlation showed that ctDNA quantification data correlated with NPM-ALK RNA levels (r = 0.77, p < 0.0001) as well as with cellular NPM-ALK DNA levels (r = 0.81, p < 0.0001) (Figure 3).
The MRD courses of three initially MDD-positive patients are shown in Figure 4. Patient UPN45 was MDD-positive using RNA- and DNA-based quantification. The patient's MRD then became negative in cells (RNA and DNA level) and plasma (ctDNA) before the second course of chemotherapy, and stayed MRD-negative at all subsequent available timepoints according to all three methods. Patient UPN35, however, never became MRD-negative according to any of the methods and suffered two relapses. These two patients showed complete concordance between all three methods.
Despite the overall correlation of the MRD results obtained by both the RNA and DNA methods, the MRD course of patient UPN43 contained two timepoints at which NPM-ALK was detectable by the DNA-based method at low copy numbers while fusion gene transcripts were not. The prognostically important MRD timepoint before the second course of chemotherapy [18], however, showed concordance in all three patients.

In vitro evaluation of RNA- and DNA-based therapy monitoring during treatment with ALK kinase inhibitors
To evaluate whether DNA-based minimal disease measurement might provide additional information beyond the standard RNA-based method, we compared RNA- and DNA-based quantification of NPM-ALK fusion sequences in two ALK+ ALCL cell lines (Karpas 299 and SR-786) incubated with different concentrations of the ALK kinase inhibitor crizotinib for 72 hours (Figure 5). To measure changes in NPM-ALK fusion genes and fusion gene transcripts under minimal disease conditions, ALK-negative DG-75 cells were added to the ALK+ ALCL cells for a dilution of 1:100. We also determined the amount of living and dead cells following the 72-hour exposure (Figure 5).
In the SR-786/DG-75 cell suspension mix, we observed a concentration-dependent reduction of the NPM-ALK fusion transcripts after 72 hours of incubation with crizotinib (Figure 5A). At the DNA level, the reduction was far less pronounced (Figure 5B). The ratio of NPM-ALK RNA to DNA revealed reduced NPM-ALK expression per NPM-ALK-containing cell (Figure 5C). In concordance with the results of our cell death measurements, NPM-ALK quantification was not significantly reduced at either the RNA level or the DNA level after 72 h of incubation with crizotinib. In contrast to SR-786 cells, the less sensitive cell line Karpas 299 (Figure 5D) showed a smaller reduction of NPM-ALK RNA transcripts even after treatment with the highest concentration (1000 nM crizotinib) (Figure 5A), resulting in nearly constant RNA/DNA ratios (Figure 5C).

DISCUSSION
In the present study, we established a multiplex PCR assay for the detection of genomic NPM-ALK fusion sequences in children with ALK-positive ALCL in order to investigate the pattern of fusion sites in ALCL and to assess genomic breakpoints as biomarkers for minimal disease quantification. Our multiplex PCR assay identified the genomic fusion gene sequences in all 45 pediatric ALCL patients investigated, permitting further characterization of the breakpoint features and breakpoint distribution in a large cohort of ALK-positive pediatric ALCL patients.
Of the few genomic NPM-ALK fusion sequences that have been previously published, all were identified with long-range or nested long-range PCRs [6,7,[23][24][25]. These methods require high-molecular-weight DNA that often cannot be retrieved from the formalin-fixed, paraffin-embedded tumor tissue available from routine diagnostics. Our multiplex PCR, however, is less dependent on high-quality DNA because it generates smaller amplification products. The NPM-ALK breakpoint distribution allows complete coverage of the breakpoint cluster regions with a number of primers compatible with multiplex PCR. In principle, genomic fusion sequences can also be identified with library enrichment strategies and next generation sequencing techniques that allow parallel sequencing of several patients in an automated pipeline. Given the small breakpoint cluster regions in the NPM and ALK genes, easily accessible with one multiplex PCR assay, and the rare occurrence of this disease, costly enrichment assays in preparation for next generation sequencing may not offer substantial benefit for diagnostic laboratories.
The detailed sequence analysis of fusion site sequences revealed an NHEJ repair pattern similar to other chromosomal translocations in leukemia and sarcoma [26][27][28][29][30][31]. Features of VDJ recombination or AID signatures identified in B-cell lymphomas, Burkitt lymphoma, mantle cell lymphoma, or myeloma are not predominant in NPM-ALK-positive ALCL [21].
Detection and quantification of NPM-ALK fusion transcripts (MDD) as well as early MRD measurement have been established as independent prognostic markers in children and adolescents with ALK-positive ALCL [15][16][17][18]. We optimized the quantification of NPM-ALK fusion genes by choosing patient-specific primer and probe sets from the individual fusion sequences. Our results applying both RNA- and DNA-based minimal disease assessments on patient samples show that the patient-specific NPM-ALK DNA breakpoints can be used to design primers allowing for minimal disease assessment with at least the same sensitivity as the standard RNA-based method. Quantified copy numbers of the NPM-ALK fusion gene and NPM-ALK fusion transcript correlate well. The results obtained by the RNA-based method have proven prognostic value for patients with ALCL. In addition, the method does not require identification of the DNA breakpoint and development of a patient-specific assay. Therefore, the assessment of MDD and MRD at the RNA level is the standard method during current chemotherapy. In addition, several examples show that MRD analysis for NPM-ALK transcripts is helpful for guiding treatment decisions in patients with very high risk or relapsed ALCL [32,33]. New treatment options against ALCL, such as ALK-kinase inhibitors, are emerging. These agents induce cell cycle arrest in in vitro cell line experiments with variable effects on cell death [34,35]. Clinically, rapid development of relapses with MRD reappearance has been observed after discontinuation of crizotinib in patients treated with the drug for ALCL relapse, suggesting that quiescent tumor cells might exist that are detectable by DNA-based methods but underestimated by RNA-based methods [33]. We therefore suggest studying possible clinical implications of DNA-based MRD screening in addition to the standard method in clinical trials with ALK inhibitors.
The high stability of DNA furthermore enables the quantification of cell-free tumor DNA (ctDNA) that is released into the plasma from primary tumor and metastasis [36][37][38]. For many solid cancers, e.g., colorectal, lung, and breast cancer, or Ewing sarcoma, quantification of ctDNA has been established as a valuable tool for non-invasive therapy monitoring and even risk stratification [38][39][40][41].
Mussolin et al. investigated the presence of total cell-free DNA and NPM-ALK fusion sequences in initial plasma samples of 43 NPM-ALK-positive ALCL patients [42]. They used a SYBR green-based real-time PCR assay and the same primer pair for NPM-ALK quantification for all patients. They observed no correlation of the total cell-free DNA with the presence or absence of MDD as determined by qualitative PCR for NPM-ALK transcripts in mononuclear cells. In addition, their method did not analyze whether there was any association between the amount of cell-free NPM-ALK DNA and the presence or absence of MDD. In our patient-specific assays, genomic NPM-ALK copies in the patients' plasma correlated with the amounts of both NPM-ALK DNA and RNA fusion sequences in the cellular blood fraction, suggesting that ctDNA is either released from circulating tumor cells or reflects total tumor burden. The correlation of ctDNA with classical MDD/MRD quantification as well as the concordance of all results in samples obtained during therapy suggests a possible prognostic role for quantification of ctDNA in patients with ALK-positive ALCL. However, that possibility will need to be further analyzed in a larger, unselected patient cohort.
In summary, we established a multiplex PCR assay for reliable identification of ALCL patients' individual genomic NPM-ALK fusion sequences that can be easily adopted for routine diagnostics and enables DNA-based minimal disease monitoring for ALK-positive ALCL patients. We propose that supplementary MRD assessment including RNA and DNA quantification may allow for a better understanding of the mode of action of new targeted therapies and may contribute to improved therapy assessment and risk stratification by detecting quiescent tumor cells.

Patients and material
Cryopreserved tumor material and EDTA-blood or -bone marrow from NPM-ALK-positive ALCL patients included in the Berlin-Frankfurt-Muenster group study NHL-BFM95 or the NHL-BFM Registry 2012, or German patients enrolled in the European intergroup trial ALCL99 was included in the analysis after informed consent of the patients or their legal guardians. Both the studies and the registry were approved by the institutional ethics committee of the primary investigator of the NHL-BFM study group (A.R., W.W.). Tumor samples were available from 45 patients. The tumor material was from the initial biopsy in 25 patients, from a relapse biopsy in 15 patients, and from bone marrow or peripheral blood in five patients with a high amount of circulating tumor cells measured by NPM-ALK-specific quantitative real-time PCR. Patient characteristics are shown in Table 1. The patient cohort was not representative, with an overrepresentation of relapse patients, since tumor cells for sequencing were obtained at relapse in several patients and infiltrated blood/bone marrow was used as well.

Nested multiplex PCR assay for identification of genomic NPM-ALK fusion sites in ALCL patients
The genomic NPM-ALK fusion sequence was analyzed in four ALCL cell lines (Karpas 299, SR-786, L-82, and SuDHL-1) and 45 ALCL patients. Genomic DNA was isolated from tumor samples, bone marrow, or blood samples by Trizol reagent (Thermo Fisher Scientific).
To amplify genomic NPM-ALK fusion sequences, we developed a two-round multiplex PCR assay. For the first round, 100 ng of DNA was combined with one forward primer located at the 5' end of the NPM breakpoint cluster region (~1 kb; exons 4-5 [Chr5:170,818,710-170,819,820]) and five reverse primers covering the ALK breakpoint cluster region (~2.2 kb; exons [19][20]448,446,208]) to enable targeted amplification of PCR products with a maximal length of several hundred base pairs. Primer sequences are shown in Supplementary Table 2. Next, the amplified DNA was used in second-round single PCRs with corresponding nested primers. In five separate PCR reactions, the internal NPM forward primer was combined with one of the five internal ALK reverse primers to identify the ALK primer located closest to the fusion site. Systematic optimization of multiplex PCR parameters was carried out with DNA from NPM-ALK-positive cell lines L-82 and SR-786. To calculate the sensitivity of the multiplex PCR assay, we quantified a dilution series of NPM-ALK-positive cells in NPM-ALK-negative HL60 cells. A minimum of one tumor cell in a thousand wild-type cells can be detected by the multiplex PCR assay.
The amplification product was sequenced after purification with the QIAquick PCR Purification Kit (Qiagen). Patient-specific breakpoints were confirmed in an independent PCR using primer sets next to the patient's fusion site and 50 ng of original tumor DNA. All PCR reactions were performed with the LongAmp® Taq DNA Polymerase System (NEB) according to the manufacturer's instructions.
Components of the free software environment R (http://www.r-project.org) were used for kernel density analysis as described previously [30].
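The idea behind a kernel density analysis of breakpoint positions can be sketched in a few lines of Python. This is only an illustration of a Gaussian kernel density estimate and not the R-based analysis used here; the function name, the bandwidth, and the breakpoint positions below are hypothetical.

```python
import math


def gaussian_kde(positions, bandwidth):
    """Return a function estimating breakpoint density at a position x.

    Each breakpoint contributes one Gaussian kernel of width `bandwidth`;
    the estimate is the average of these kernels, so it integrates to 1.
    """
    if not positions or bandwidth <= 0:
        raise ValueError("need at least one position and a positive bandwidth")
    n = len(positions)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions
        )

    return density


# Hypothetical breakpoint offsets (bp) within a breakpoint cluster region.
breakpoints = [120, 180, 260, 310, 900, 1500]
density = gaussian_kde(breakpoints, bandwidth=100.0)

# Density is higher near the group of closely spaced breakpoints.
print(density(220) > density(1200))
```

A pronounced local maximum of such a density estimate would point to a sub-cluster of breakpoints; whether a maximum is statistically significant requires an additional test that this sketch does not include.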

Quantification of tumor-specific RNA, genomic DNA, and cell-free circulating DNA using the individual NPM-ALK fusion sequence
As a proof of principle, eight high-risk patients with detectable NPM-ALK fusion transcripts were monitored during treatment course by parallel quantification of the NPM-ALK fusion transcript in blood or bone marrow cells, the NPM-ALK fusion gene in blood or bone marrow cells and the NPM-ALK fusion gene in cell-free plasma samples. In total, 50 RNA samples, 48 DNA samples and 42 plasma samples were analyzed. Genomic DNA and RNA were isolated from bone marrow or peripheral blood samples using Trizol reagent (Thermo Fisher Scientific). cDNA synthesis was performed using 1 μg total RNA, random hexamers, and Superscript II reverse transcriptase (Invitrogen). Cell-free circulating DNA was isolated from frozen plasma samples with the QIAamp Circulating Nucleic Acid Kit (Qiagen).
For MRD monitoring, the NPM-ALK fusion transcripts (RNA) were quantified using real-time quantitative PCR as previously described [17]. Genomic NPM-ALK fusion sequences (DNA) were quantified by droplet digital PCR with the QX200 Reader (BioRad) using patient-specific, breakpoint-spanning primer and probe sets (Supplementary Table 3). To calculate the absolute number of NPM-ALK copies, the fusion-specific probe signal was normalized to the signal of the single-copy human albumin gene.
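The normalization of the fusion signal to albumin can be sketched as follows. The Poisson correction reflects the general principle of droplet digital PCR rather than a calculation spelled out in this text, and the function names and droplet counts are hypothetical. The sketch assumes that fusion and albumin are read from the same set of droplets, so that the droplet volume cancels in the ratio.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Mean target copies per droplet from the fraction of positive droplets.

    Uses the Poisson relation lambda = -ln(1 - positive / total), which
    accounts for droplets that contain more than one copy.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def fusion_per_10k_alb(fusion_positive: int, alb_positive: int, total: int) -> float:
    """NPM-ALK copies per 10^4 albumin (ALB) copies (illustrative helper)."""
    alb = copies_per_droplet(alb_positive, total)
    if alb == 0:
        raise ValueError("no albumin signal, sample cannot be normalized")
    return 1e4 * copies_per_droplet(fusion_positive, total) / alb


# Hypothetical droplet counts for one sample.
result = fusion_per_10k_alb(fusion_positive=12, alb_positive=9000, total=15000)
print(f"{result:.2f} NPM-ALK copies per 10^4 ALB copies")
```

Because albumin is detected in tumor and non-tumor cells alike, the ratio scales with the proportion of fusion-carrying genomes in the sample rather than with NPM-ALK expression.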

Comparative analysis of NPM-ALK fusion transcript and fusion gene levels after treatment of ALK+ ALCL cell lines with ALK kinase inhibitor
NPM-ALK-positive (ALK+) cell lines Karpas 299 (a cell line with lower sensitivity to the ALK kinase inhibitor crizotinib) and SR-786 (a cell line with higher sensitivity to the ALK kinase inhibitor crizotinib) and NPM-ALK-negative (ALK-) cell line DG-75 were obtained from the German Resource Centre for Biologic Material (DSMZ) and were cultured in RPMI medium supplemented with 10% fetal bovine serum, L-glutamine, and antibiotics at 37°C in 5% CO2. The ALK kinase inhibitor crizotinib was obtained from Cell Signaling Technology. Stock solutions (10 mM) were prepared with DMSO and stored at -80°C; working solutions were prepared with DMSO immediately before use. 50,000 ALK+ cells were incubated with increasing concentrations (3 nM, 30 nM, 300 nM and 1000 nM, respectively) of crizotinib for 72 h. To measure changes of NPM-ALK fusion genes and fusion gene transcripts under MRD conditions, 4,950,000 ALK- DG-75 cells were added for a dilution of 1:100.
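The arithmetic behind the 1:100 cell dilution and the overall fold dilutions from the 10 mM stock can be checked with a short Python sketch. The helper names are hypothetical, and the text does not describe the intermediate dilution steps, so only the overall factors are computed.

```python
from fractions import Fraction


def tumor_cell_fraction(alk_positive: int, alk_negative: int) -> Fraction:
    """Fraction of ALK+ cells in the mixed cell suspension."""
    total = alk_positive + alk_negative
    if alk_positive < 0 or alk_negative < 0 or total == 0:
        raise ValueError("cell counts must be non-negative and not both zero")
    return Fraction(alk_positive, total)


def fold_dilution(stock_nm: int, target_nm: int) -> Fraction:
    """Overall fold dilution needed to reach target_nm from stock_nm."""
    if stock_nm <= 0 or target_nm <= 0:
        raise ValueError("concentrations must be positive")
    return Fraction(stock_nm, target_nm)


# 50,000 ALK+ cells mixed with 4,950,000 ALK- cells.
print(tumor_cell_fraction(50_000, 4_950_000))  # 1/100

# 10 mM stock expressed in nM, diluted to each tested concentration.
STOCK_NM = 10_000_000
for target in (3, 30, 300, 1000):
    print(target, "nM:", float(fold_dilution(STOCK_NM, target)), "fold")
```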
For quantification of NPM-ALK fusion transcripts and fusion genes, RNA and DNA were isolated in parallel using the AllPrep DNA/RNA Mini Kit (Qiagen). cDNA synthesis was performed with 1 μg of RNA, random hexamer primers, and Superscript II reverse transcriptase (Invitrogen). NPM-ALK transcripts and fusion genes were quantified using fusion-sequence-spanning primers and probes (Supplementary Table 3). The NPM-ALK fusion transcript was normalized to the housekeeping gene ABL1 to exclude experimental variation during the cDNA synthesis process. To calculate the absolute number of ALK+ cells, the NPM-ALK fusion gene signal was normalized to a signal of the single-copy gene albumin, which is equally detectable in ALK+ and ALK- cells.
In addition, 100,000 ALK+ (Karpas 299 and SR-786) and ALK- cells (DG-75) were analyzed in two parallel experiments to assess the number of living and dead cells after 72 h of crizotinib treatment. To detect viable cells, the cell lines were incubated with 5 ng/ml fluorescein diacetate (FDA) (Sigma) for 20 minutes at 37°C. Cells were collected by centrifugation and resuspended in 200 μl PBS. To detect dead cells, 10 μl 7-AAD solution (BD Biosciences) was added and the cells were stained on ice for 20 minutes [43]. The number of viable and dead cells was measured on a FACS Calibur flow cytometer with Cell Quest Pro software (BD Biosciences). Analysis was performed with FlowJo 10 software (Miltenyi Biotech).

Statistical analysis
Co-localization of genomic breakpoints to repeat regions and DNA sequence motifs was statistically analyzed using Fisher's exact test. Differences between mean values of the in vitro measurements were assessed with a one-way ANOVA test. MDD data from the quantification of the NPM-ALK fusion at the RNA, DNA, and ctDNA levels were compared using Spearman correlation statistics.
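As an illustration of the Spearman correlation used to compare paired measurements, the following Python sketch computes the coefficient as the Pearson correlation of average ranks. It is not the software used for the analysis: the function names and the paired values are hypothetical, and no p-value is calculated.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired copy numbers (RNA-based vs DNA-based) for six samples.
rna = [0.0, 1.1, 3.5, 12.0, 40.0, 150.0]
dna = [0.2, 0.0, 4.7, 9.0, 55.0, 90.0]
print(round(spearman(rna, dna), 3))
```

Because the coefficient depends only on ranks, it is insensitive to the different normalization scales of the RNA-based (per ABL) and DNA-based (per ALB) measurements.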