Relapsed diffuse large B-cell lymphoma present different genomic profiles between early and late relapses

Despite major advances in first-line treatment, a significant proportion of patients with diffuse large B-cell lymphoma (DLBCL) will experience treatment failure. Prognosis is particularly poor for relapses occurring less than one year after the end of first-line treatment (early relapses/ER) compared to those occurring more than one year after (late relapses/LR). To better understand genomic alterations underlying the delay of relapse, we identified copy number variations (CNVs) on 39 tumor samples from a homogeneous series of patients included in the Collaborative Trial in Relapsed Aggressive Lymphoma (CORAL) prospective study. To identify CNVs associated with ER or LR, we devised an original method based on Significance Analysis of Microarrays, a permutation-based method which allows control of false positives due to multiple testing. Deletions of CDKN2A/B (28%) and IBTK (23%) were frequent events in relapsed DLBCLs. We identified 56 protein-coding genes and 25 long non-coding RNAs with significantly differential CNVs distribution between ER and LR DLBCLs, with a false discovery rate < 0.05. In ER DLBCLs, CNVs were related to transcription regulation, cell cycle and apoptosis, with duplications of histone H1T (31%), deletions of DIABLO (26%), PTMS (21%) and CK2B (15%). In LR DLBCLs, CNVs were related to immune response, with deletions of B2M (20%) and CD58 (10%), cell proliferation regulation, with duplications of HES1 (25%) and DVL3 (20%), and transcription regulation, with MTERF4 deletions (20%). This study provides new insights into the genetic aberrations in relapsed DLBCLs and suggest pathway-targeted therapies in ER and LR DLBCLs.


INTRODUCTION
Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin lymphoma in adults [1]. The addition of the anti-CD20 monoclonal antibody rituximab to the chemotherapy CHOP (cyclophosphamide, adriamycin, vincristine and prednisone) has dramatically improved the outcome of patients with DLBCL [2,3]. Despite this major advance a significant number of patients will experience treatment failure. At such a time, treatment is a real challenge and options are to be discussed regarding the eligibility or not for transplant and prognostic factors at relapse [4,5]. One of the most important prognostic parameters in this context is the time of relapse [6,7]. Two groups of patients are identified. The first group of early relapse (ER) is defined as relapse occurring within 1 year after the initial treatment. The second group of late relapse (LR) is defined as relapse occurring more than 1 year after the initial treatment. In the ER group, the overall survival (OS) is estimated at less than 17% at 3 years, while in the LR group, the OS is estimated at about 50% at 5 years [4].
The heterogeneous prognosis of patients with DLBCL reflects a deep biological diversity. DLBCL is composed of at least 2 molecular subtypes that differ in gene expression profile (GEP): germinal center B-cell like (GCB) and activated B-cell like (ABC), each with different prognosis [8]. The prognostic impact of this molecular classification has been reported in several independent series and different types of treatment at diagnosis as well as at relapse [9][10][11].
Recent advances in genomic technology have provided further understanding in the biology of DLBCL. Constitutive activation of the NF-κB pathway, owing to oncogenic mutations activating BCR (B-Cell Receptor) and toll-like receptor signaling, leads to cell proliferation and resistance to apoptosis. These constitutive activations, characterizing ABC DLBCLs, are related to somatic gain of function mutations and/or duplications affecting positive regulators of this pathway such as CD79a and CD79b [12], MYD88 [13], CARD11 [14], TRAF2, TRAF5, MAP3K7/TAK1 and TNFRSF11A/RANK [15] or somatic inactivating mutations and/or deletions of negative regulators such as TNFAIP3/A20 [15,16]. The GCB subtype is associated with abnormalities of epigenetic modifiers, with frequent bi-allelic point mutations leading to the loss of function of the histone lysine methyl-transferase MLL2 [17,18], high frequency of gain-of-function mutations in the histone lysine methyl-transferase EZH2 [19], inactivating mutations and deletions of the histone lysine acetyltransferases CREBBP and EP300 [18], and mutations of their cofactor, ME2FB [17]. Constitutive activation of the PI3K/AKT/mTOR signaling pathway through PTEN deletions, MiR17-92 locus amplifications and mutations of PIK3CD, PIK3R1 and mTOR is another oncogenic mechanism commonly observed in GCB DLBCLs [20,21]. Dysregulation of cell cycle is a common event in all DLBCLs, potentially owing to an increased expression of the anti-apoptotic factor BCL2 [22][23][24], mono and bi-allelic deletions affecting the CDKN2A/B locus [22,25], point mutations and allele deletions leading to the loss of function of TP53 [26], or dysfunction of the transcription factor MYC, which has been described in 5 to 11% of de novo DLBCLs [27] and 17% of relapsed DLBCLs [28]. Constitutive activation of the NOTCH pathway through NOTCH1 and NOTCH2 somatic gain-of-function mutations or gain of copy numbers [29][30][31] cause an increased cell proliferation in B-cell lymphoma. BCL6 expression beyond the germinal center reaction leads to increased tolerance to DNA breaks and blockade of plasmablastic differentiation [32,33]. Importantly, escape to immune response through biallelic inactivations of B2M leading to defects of surface HLA class I molecules assembly [34], downregulation of surface HLA class II molecules expression due to breakpoints and mutations of CIITA [35] or deletions and truncating mutations of the CD58 locus [34] represent another major player in lymphomagenesis in DLBCLs.
All these genomic studies have been conducted on de novo DLBCLs at diagnosis. Cellular functions and signaling pathway disruptions associated with relapsed DLBCLs remain largely unknown. Here we describe the number and the type of genetic alterations present in relapsed DLBCLs and make a comparison of Copy Number Variations (CNVs) between ER and LR DLBCLs to identify genetic alterations that are associated with the delay of relapse. To achieve this outcome, we designed an original method allowing the statistical assessment with the high level of confidence required when dealing with a very large set of candidates. Finally we evaluated the impact of relevant CNVs on gene expression.

Landscape of large copy number variations in relapsed DLBCL
CNVs ≥ 2 megabases (Mb) were counted to evaluate the complexity of structural aberrations of relapsed DLBCL ( Figure 1). For the whole group, average CNVs per sample was 15.50 (ranging from 0 to 67). We found an equivalent number of duplications and deletions with an average 8.30 duplications per sample (0-43) and 7.20 deletions per sample (0-34; p = 0.60, Student's t-test). Chromosomes 1, 2, 3, 6, 12 and 18 were the most frequently altered, with 1.38 CNV per chromosome per sample on average while other chromosomes presented with an average of 0.45 CNV per chromosome per sample (p = 6x10 -4 ).
In the ER group, the average CNVs per sample was

Identification of genes differentially altered among the ER and the LR subgroups
After smoothing all the copy number probes using the sliding window technique, we observed that 0.38% of the values were below 1 and 1.53% above 3. This means that almost all losses and gains of copy numbers corresponded to heterozygous deletions and duplications, respectively.
Among the subset of 36 genes selected for their implication in lymphomagenesis, the 10 most frequently When we compared CNVs distribution among ER and LR samples, we did not find any statistically significant difference. We also did not observe any correlation between CNVs distribution and the GC/ABC subtype (data not shown).

Evidence on recurrent events in relapsed DLBCLs
We performed a hierarchical clustering on 56 protein-coding genes and 25 LncRNA CNVs to identify events recurrently associated with ER and LR DLBCLs ( Figure 4). The resulting heatmap shows recurrent CNVs www.impactjournals.com/oncotarget    Remaining ER and LR samples constituted a third group without specific CNV profiles and no recurrent events.

Impact of CNVs on gene expression
We examined the impact of CNVs on average gene expression level (AGEL) ( Figure 5).
In ER DLBCLs, deletions of DIABLO were related to a decrease in its AGEL, with a fold change of 0.58 between deleted and normal samples (p = 5x10 -4 , Student's t-Test). We observed the same correlation between deletions of the genes and decrease in their expression level for PTMS, HNRPU, and CSNK2B. Deletions of these three genes resulted in a decrease in their respective AGELs, with a fold change of 0.47, 0.71, and 0.53, respectively (p = 0.01, 0.02 and 0.03).
In LR DLBCLs, deletions of MTERF4 were related to a decrease in its AGEL, with a fold change of 0.51 between deleted and normal samples (p = 8x10 -7 ). Accordingly, deletions of LGALS9C led to a decrease in AGEL compared to normal samples (fold change = 0.54; p = 0.05).
In the set of 36 genes, deletions of CDKN2A/B were related to a decrease in AGEL (fold change = 0.31; p = 0.07). Deletions of CD58 were also related to a decrease in AGEL (fold change=0.65; p = 0.08).

DISCUSSION
Understanding the relationship between tumor biology and outcome is important for identifying molecular targets and may yield more effective therapies for DLBCLs. Almost all genomic studies reported to date were performed on de novo DLBCLs at diagnosis and little is known about structural aberrations of refractory/ relapsed DLBCLs. Here we analyzed one of the few series of relapsed DLBCLs published to date [37,38], composed of 39 samples from the Collaborative Trial in Relapsed Aggressive Lymphoma (CORAL), a unique and well-documented prospective study on relapsed DLBCLs [7]. Studying this thirty-nine-patient cohort allowed us to make strong statements since we devised and used for the first time a robust, comprehensive, unsupervised analysis method to analyze CNVs. This method makes use of SAM, a statistical technique originally designed for gene expression data analysis [36]. This permutationbased method allowed the discovery of a list a gene containing less than 5% of false positives despite multiple testing, independent from data distribution. Applied to oligonucleotide microarrays covering the entire genome, the method allowed us to delineate statistically validated CNVs with great precision and locate structural aberrations linked with ER and LR DLBCLs. Other published algorithms such as GISTIC [39], GEDI [40], or ComFocal [25] are designed to identify recurrent regions of deletions or duplications, which is not the purpose of this work. Here we aimed at identifying CNVs strongly associated with ER or LR, which is the main reason motivating the development of this original method.
ER and LR DLBCLs display a comparable CNV landscape, with a high degree of heterogeneity within each group. In relapsed DLBCLs, the most frequently altered genes were related to cell cycle regulation and apoptosis, regulation of transcription, immune response and NF-κB pathway ( Figure 6). Among genes associated with lymphomagenesis, CNVs affecting CDKN2A/B locus (42% of samples) were the most frequent, with a majority of deletions (28% of samples). CDKN2A is a cyclinedependent kinase restraining cellular proliferation. These results are consistent with previous reports describing 30-35% of CDKN2A deletions in primary DLBCLs, predominantly in ABC subtype, with subsequent increase in cell proliferation and poorer outcome [22,41,42]. We also observed a high rate of IBTK aberrations (33% of samples), mainly deletions (23% of samples). IBTK (Inhibitor of Burton Tyrosine Kinase) is a BTK binding protein which inhibits BTK's kinase activity and disrupts BTK-mediated calcium mobilization leading to a decrease in BTK-induced activation of the anti-apoptotic NF-κB pathway [43]. A recent study reported a high rate of complete or partial responses and long-lasting remissions with ibrutinib treatment in relapsed/refractory DLBCLs, essentially in ABC subtype [44]. In this work, we did not observe any correlation between CNVs and the GC/ ABC subtype. These results are in keeping with a recently published study showing (i) an absence of correlation between known driver mutations in lymphomagenesis and response to therapy, and (ii) an equivalent distribution of genomic events predicting survival across GC and ABC subtypes [37].
When we compared ER and LR DLBCLs at the gene scale, aberrant genomic profiles were distinct on 56 protein-coding genes, with a subsequent effect on gene expression levels. In some cases, we were unable to demonstrate a statistically significant effect on gene expression. This can be explained by intronic localization of CNVs.
First, structural aberrations of cell cycle and apoptosis regulators were rather associated with ER DLBCLs, with deletions of DIABLO/Smac, a promoter of apoptosis which allows activation of caspases by binding to inhibitor of apoptosis proteins [45], and deletions of LGLAS9C have subsequent consequences on the expression levels of those genes. *: p < 5 x10 -2 ; **: p < 10 -2 ; ***: p < 10 -3 . www.impactjournals.com/oncotarget CSNK2B/CK2B, the regulatory subunit of protein kinase CK2, involved in a wide number of cellular processes, such as regulation of programmed cell death, cell division and proliferation, DNA damage repair, gene transcription and protein translation. CK2 increased protein expression and enzymatic activity has been described to positively regulate the PI3K pathway and STAT3 and NF-κB signaling in hematological malignancies [46]. We hypothesize that deletions of the CSNK2B/CK2B regulatory subunit with subsequent decrease in gene expression level result in increased enzymatic activity. LR DLBCLs were associated with duplications of DVL3, a member of the disheveled protein family which has been described to promote cell growth in ALK-positive anaplastic large cell lymphoma [47].
Second, CNVs encompassing genes related to immune response included duplications of LGALS9C identified in 47% of ER DLBCLs. LR DLBCLs were associated with partial deletions and duplications of TRAJ and TRAV, both implicated in TCR assembly. This suggests infiltration of the tumor by T-cell lymphocytes as part of the immune response. Deletions of B2M (20%) and CD58 (10%) were restricted to LR DLBCLs. Deletions and duplications of HLA related genes, were observed frequently in our cohort and are most likely related to HLA system polymorphism. Losses in IgHV and IgKV clusters are presumably a sign of BCR rearrangement and cannot be considered as recurrent lesion in tumor samples. We made the choice to consider all CNVs, including those that are described in databases of genomic variants. Limiting our analysis to acquired lesions may avoid constitutional abnormalities that are prone to foster lymphoma development.
Third, dysregulation of transcription through altering events in genes encoding regulators of chromatin compaction and DNA replication were associated with ER DLBCLs. Duplications of the linker histone H1 gene and/or deletions of its regulatory partner PTMS [48] were found in 21 to 31% of cases. This observation is consistent with a previous report, where mutations of the linker histone H1 gene have been described as a key evolutionary process governing transformation of follicular lymphoma into DLBCL [49]. LR DLBCLs were linked with deletions of MTERF4, a mitochondrial transcription regulator [50], in 20% of cases. LncRNA are nucleic acids often longer than 2 kb involved in the regulation of transcription through epigenetic modifications by recruiting chromatin remodeling complexes to various genomic loci [51]. AIRN (Air noncoding RNA) was of particular interest as it was deleted in 20% of ER DLBCLs. This LncRNA plays a role in epigenetic regulation of transcription by cooperating with the histone methyl transferase G9a [52]. Duplications of HES1, a downstream effector of the NOTCH pathway associated with Fanconi anemia [53] and blast crisis  transformation in chronic myelogenous leukemia [54] were restricted to LR samples (25%). Even though DLBCL is a heterogeneous disease, our method allowed the identification of structural aberrations that are statistically associated with ER and LR. Moreover, hierarchical clustering of CNVs identified three groups of patients. Two groups (the first one composed of ER samples and the second one of LR samples) were well separated in homogeneous clusters with common deletions and duplications. The remaining samples formed a mixed group of ER and LR samples with no recurrent CNV. This result gives evidence of a continuum between ER and LR DLBCLs. Identifying two groups of samples with correlated CNVs profiles demonstrates the statistical significance of our results, since these correlated groups could not be identified by chance.
Our original method unraveled genomic aberrations typical of the "LR signature", which includes genomic aberrations encompassing genes related to immune response, cell proliferation and regulation of transcription. The "ER signature" includes events leading to disruption of cell cycle, apoptosis and gene transcription and provides a structural basis for selective growth advantage and chemotherapy resistance leading to a worse prognosis. This is of great importance since it may be a starting point for further studies and supports the rationale for targeted therapy. Even if this study needs to be confirmed on a larger cohort of patients, it provides new insights into the genetic complexity and dysregulated pathways in relapsed DLBCLs, suggesting the possibility of individualized targeted therapy for those patients with poor prognosis.

Patient cohort
The phase III CORAL prospective study included 396 patients presenting with relapsed/refractory DLBCL after first-line treatment with R-CHOP (rituximab, cyclophosphamide, adriamycin, vincristine and prednisone). These patients were randomly assigned to receive a salvage chemotherapy regimen, either rituximab, ifosfamide, carboplatin and etoposide (R-ICE) or rituximab, dexamethasone, cytarabine, and cisplatin (R-DHAP). Patients who responded to the chemotherapy were submitted to high dose therapy and autologous stem cell transplant (ASCT). After ASCT, the trial compared rituximab treatment every 2 months for 1 year with observation alone. The initial results revealed no significant difference in outcome. Several factors did affect survival, including ER, the international prognostic index at relapse, and prior exposure to rituximab [7].
From these 396 patients with relapsed/refractory DLBCL, frozen lymph node biopsies were available from a homogeneous group of 39 patients. A part of tumors were sampled at diagnosis, before first-line treatment, and the others were sampled at relapse, before salvage chemotherapy. Thus, among the 19 tumor samples composing the ER group, 11 were sampled at diagnosis, and 8 at relapse. Among the 20 tumor samples from the LR group, 9 were sampled at diagnosis and 11 at relapse. Molecular characterization regarding GCB DLBCLs and ABC DLBCLs could be determined with GEP in 30 cases. Characteristics of the patients are listed in Table  2. Procedures used were in accordance with the Helsinki Declaration of 1975 (as revised in 1983).

Copy number analysis
CNVs of the 39 samples were determined using the Affymetrix SNP 6.0 platform (Affymetrix, Santa Clara, CA, USA), which interrogates 1 800 000 copy number probes distributed evenly along the human genome (mean interval: 1.8 kilobases) [55]. The 1 800 000 probes were annotated with home-made tools querying the Ensembl 77 th version with Human Genome HG37 MYSQL database available at ftp.ensembl.org/pub/release-77/mysql/. Total copy numbers were computed using the Bioconductor crlmm package [56]. A sliding window (length = 11 probes) representing the signal median value was used to smooth artifactual data.
This strategy allowed CNV discovery at high resolution and identification of genes frequently altered in relapsed DLBCLs. Duplications were defined by a copy number ≥2.5, deletions were defined by a copy number ≤1.5 and copy numbers between 1.5 and 2.5 were considered normal, according to an already published method [22]. We used Java Treeview 3.0 [57] to visualize the results at a genomic scale.

Identification of differential genomic alterations between ER and LR
We aimed at identifying genes affected by differentially distributed CNVs between ER and LR DLBCL patients, ie cases where deletions and duplications are associated preferentially with ER or LR. Given the large number of statistical tests, we used a permutation-based analysis to reduce falsepositive results due to multiple testing. Our method was based on the widely used technique called Significance Analysis of Microarrays (SAM) [36]. SAM is a nonparametric test that avoids unjustified assumptions over gene distribution and gene independence. SAM was originally designed to determine whether changes in gene expression are experimentally significant. As claimed by the authors, SAM can be adapted for other experimental types of data. For our purpose, one needs to define a measure of the strength of the relationship between the CNV distribution (loss of copy number, normal copy number and gain of copy number) and ER or LR. Many statistics such as Pearson's chi-squared test, and other measures such as the Kullback-Leibler divergence, are only defined in case of non-null frequencies. Therefore we preferred to select the Hellinger distance H(P,Q) which quantifies the difference between probability distributions P = (p1,…, pk) and Q = (q1,…, qk). The Hellinger distance is defined as: The lower bound of H(P,Q) = 0 corresponds to P = Q and its upper bound H(P,Q) = 1 implies that piqi = 0 " i Î [1,k]. In our method, P and Q represent the probability distributions over the ER and LR DLBCL copy numbers, expressed in three discrete states (deletion, normal, duplication).
Genes were considered positive when the FDR was inferior to 5%. For a FDR of 5%, our algorithm can be stated as: (i) compute the observed Hellinger distances for all the probes mapping to gene gi. The distance associated to gi is defined as the mean of the probe distances. Then, the aim of this method is not to provide an individual p-value for each gene of the genome but to identify of a list of genes which CNVs distribution is differential between ER and LR, with systematic correction of multiple testing and less than 5% of false positives.
SNP microarray data acquired on Affymetrix SNP 6.0 platform can be accessed online on Gene Expression Omnibus (GSE73791).

Selected set of genes implicated in lymphomagenesis
We also explored a subset of 36 genes selected from literature and already known to play a role in lymphomagenesis through the dysregulation of key cellular pathways [58]. CDKN2A, CDKN2B, PRDM1, TP53, BCL2, XPO1, MYC, GNA13, MFHAS1 and KLF4 were chosen for their implication in cell cycle and apoptosis; MLL2, CREBBP, EZH2 and EP300 for their role in epigenetic regulation; B2M, CD58, TNFSF14 and CIITA for their implication in dysregulation of immunity and escape to immune response; BCL6 for its role in germinal center reaction; IBTK, FOXO1, IRF4, ITPKB, CD79b, TCF3, ID3 and SYK for their implication in the BCR pathway; TNFAIP3, MYD88, CARD11 and PIM1, for their implication in constitutive activation of NF-κB pathway; NOTCH1 and NOTCH2 for their association with NOTCH pathway disruption; STAT6 and SOCS1 implicated in JAK-STAT pathway; BRAF for its role in MAP kinase pathway.

CNV impact on gene expression through integrative genomics
Gene expression profiles were available for 30 out of the 39 patients. These data were acquired on an Agilent Whole Human Genome microarray 4x44K platform (Agilent technologies, Santa Clara, CA, USA). Gene expression levels for each sample were integrated with CNVs. The impact of deletions and duplications of the selected genes on their own expression level was evaluated by comparing the average gene expression levels of deleted, duplicated and normal samples using Student's t-test.