Genetic rearrangements result in altered gene expression and novel fusion transcripts in Sézary syndrome

Sézary syndrome (SS) is an aggressive, leukemic cutaneous T-cell lymphoma variant. The molecular pathogenesis of SS remains unclear despite many studies on genetic alterations, gene expression and epigenetic regulation. Using whole genome and transcriptome next generation sequencing, nine Sézary syndrome patients were analyzed for copy number variations and rearrangements affecting gene expression. Recurrent copy number variations were detected within 8q (MYC, TOX), 17p (TP53, NCOR1), 10q (PTEN, FAS), 2p (DNMT3A), 11q (USP28) and 9p (CAAP1), but no recurrent rearrangements were identified. However, expression of five genes involved in rearrangements (TMEM244, EHD1, MTMR2, RNF123 and TOX) was altered in all patients. Fifteen rearrangements detected in Sézary syndrome patients and SeAx resulted in expression of new fusion transcripts; nine of them were in frame (EHD1-CAPN12, TMEM66-BAIAP2, MBD4-PTPRC, PTPRC-CPN2, MYB-MBNL1, TFG-GPR128, MAP4K3-FIGLA, DCP1A-CCL27, MBNL1-KIAA2018) and five resulted in ectopic expression of fragments of genes not expressed in normal T-cells (BAIAP2, CPN2, GPR128, CAPN12, FIGLA). Our results not only underscored the genomic complexity of the Sézary cancer cell genome but also showed an unprecedented variety of novel gene rearrangements resulting in fusion transcripts and ectopically expressed genes.


INTRODUCTION
Sézary syndrome (SS) is an aggressive, leukemic cutaneous T-cell lymphoma variant [1]. It is characterized by the presence of atypical, malignant Sézary cells (CD4+CD45RO+) with a central memory T-cell (TCM) phenotype [2] in blood, lymph nodes and skin, and by severe erythroderma, pruritus and lymphadenopathy. SS is a very rare disease (incidence rate 0.1/100 000) that accounts for 3% of cutaneous T cell lymphomas (CTCL) [1,3], but its incidence is increasing [4]. Median age at presentation is 60 years, and despite treatment the prognosis is poor [5]. Data from the largest cohort of advanced Mycosis fungoides (MF) and SS patients (1,275) revealed that the median overall survival (OS) is 63 months and that, depending on four risk factors (stage IV, age > 60 years, large-cell transformation, and increased lactate dehydrogenase), the 5-year survival differs markedly: 68% for the low risk group, 44% for the intermediate risk group and 28% for the high risk group [6].
The purpose of this study was to combine whole genome and transcriptome NGS in order to look globally at genetic alterations and their effect on the expression of the affected genes in SS patients. Our results not only underscore the genomic complexity of the Sézary cancer cell genome and make evident that key signaling pathways in SS are abrogated, but also show an unprecedented variety of novel gene rearrangements resulting in fusion transcripts and ectopically expressed genes.

Copy number variations (CNVs) within regions of oncogenes and suppressor genes and new "candidate" genes
Whole genomes of tumor cells from nine Sézary syndrome patients were examined by NGS, and the data were inspected using the Integrative Genomics Viewer (IGV, Broad Institute). This analysis revealed many large deletions and amplifications (Supplementary Table 1). Recurrent CNVs occurred in regions of known oncogenes, like MYC (8q amplifications in 4/9 patients), and suppressor genes, like TP53 (17p deletions in 6/9 patients) (Figure 1) and PTEN (10q deletions in 3 patients). MYC amplifications resulted in altered gene expression, especially in patient 5, who carried two extra copies of this gene (reads per kilobase per million mapped reads, RPKM: controls, C (n=10) = 59.54; P5 = 224.05). MYC amplifications are common in SS [10,11], but also in many other cancer types. Another gene in the 8q region, TOX, was shown to be highly expressed in SS and involved in abnormal cell proliferation [14]. Furthermore, in contrast to MYC, TOX has been proposed as a marker for the differential diagnosis between SS and erythrodermic dermatitis [22]. In view of these reports, TOX expression was checked in our nine SS patients (RPKM: C (n=10) = 5.1; SS (n=9) = 63.0) and its overexpression was confirmed (Table 1). The overexpression was most pronounced in three patients, P5 (RPKM = 102), P8 (RPKM = 155) and P9 (RPKM = 88), probably due to amplification of the 8q region. Strong upregulation of TOX in SS was confirmed by RQ-PCR on an additional cohort of SS patients. TOX showed the highest expression among the genes analyzed in SS and was upregulated 21.88-fold (p<0.0001) compared to CD4+ controls (Figure 2).
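For readers unfamiliar with the unit, RPKM normalizes a gene's read count by transcript length (in kilobases) and by sequencing depth (in millions of mapped reads). The sketch below is only an illustration of that definition and of a simple sample-to-control ratio; it is not the pipeline used in the study. The function names and the example read counts are hypothetical, while the two ratios are computed from the RPKM values reported above.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # reads / (length in kb) / (mapped reads in millions)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def fold_change(sample_rpkm: float, control_rpkm: float) -> float:
    """Ratio of sample expression to control expression."""
    if control_rpkm <= 0:
        raise ValueError("control RPKM must be positive")
    return sample_rpkm / control_rpkm


# Hypothetical counts: 500 reads on a 2,000 bp transcript, 10 million mapped reads.
example = rpkm(500, 2_000, 10_000_000)

# RPKM values reported in the text (patient or SS mean versus control mean).
myc_p5 = fold_change(224.05, 59.54)   # MYC in P5
tox_ss = fold_change(63.0, 5.1)       # TOX, mean of nine SS patients

print(f"example RPKM: {example:.1f}")
print(f"MYC P5 vs controls: {myc_p5:.2f}-fold")
print(f"TOX SS vs controls: {tox_ss:.1f}-fold")
```

The RNAseq-based TOX ratio obtained this way need not match the 21.88-fold RQ-PCR estimate, which was measured on a separate cohort with a different method.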
TP53 deletions and mutations were described to be frequent in SS, and as a result this gene was suggested to be a cancer driver gene in Sézary syndrome [7,18-21]. In recent papers alterations in TP53 were detected in 24-92.5% of patients [18-20]. In this study monoallelic [...] Mutations that may influence the function of TP53 were identified in P1 (TGT→TTT; C238F) and P5 (splicing site at the 5' end of intron 7; GT→TT).
The 10q region is recurrently deleted in SS, and the status of two genes in this region, PTEN and FAS, has already been investigated. PTEN was described to be downregulated in SS, mostly by deletions [8]. In this study heterozygous PTEN deletions were found in 3 patients, with no significant influence on its expression (RPKM: C (n=10) = 20.37; P1 = 14.6; P5 = 17.43; P7 = 29.33). The second gene, FAS, was deleted in 5 SS patients, but its expression was decreased in all of them (RPKM: C (n=10) = 49.5; SS (n=9) = 11.5). This is in agreement with previous studies showing that FAS expression was lost in the majority of SS patients [9], not only because of recurrent deletions in this region, but also due to hypermethylation of CpG islands in the FAS promoter. Furthermore, inactivating FAS mutations are frequent in SS (10%-42.5%) [18,19], confirming the putative role of different FAS alterations in malignant transformation and its possible role as a cancer driver. The 10p region is also frequently deleted in SS. Recent whole genome studies showed not only deletions, but also loss of function mutations (45.2%-65%) in Zinc Finger E-Box Binding Homeobox 1 (ZEB1), which encodes an important T cell transcription factor [18-20]. In our study we identified deletions of ZEB1 in 5 patients, including one biallelic deletion in P7, all of them having an impact on ZEB1 expression (RPKM: C (n=10) = 21.5; P1 = 11.7; P5 = 13.8; P6 = 14.2; P7 = 2.5; P9 = 7.7).
Interestingly, deletions in the 6q region, which were described to be quite common in SS (50% of patients) [13], were detected only in P1 in this study, and the deletion did not include the tumor necrosis factor, alpha-induced protein 3 (TNFAIP3) gene. However, in contrast to the previous study, which included pretreated patients, only tumor cells from newly diagnosed patients were studied here, suggesting that TNFAIP3 deletions are late events in SS progression. Late occurrence of TNFAIP3 deletions is further supported by their absence in newly diagnosed Mycosis fungoides patients (unpublished data).
Deletions within the 2p region resulted in a loss of the DNA (cytosine-5-)-methyltransferase 3 alpha gene (DNMT3A) in four patients, biallelic in patient P8 and monoallelic in patients P1, P5 and P9, with a significant influence on its expression (RPKM: C (n=10) = 15.32; P1 = 7.9; P5 = 6.41; P8 = 0.1; P9 = 7.13). In the other SS patients the expression of DNMT3A was, for other reasons, also lower than in controls (RPKM: SS (n=9) = 8.33). According to previous studies, DNMT3A and other DNMT genes were usually overexpressed in cancer [23], leading to hypermethylation, gene silencing and possible malignant transformation. However, in aging cells and age-related diseases global hypomethylation of the genome is detected [24], suggesting possibly decreased activity of DNA methyltransferases. Alterations in genes like DNMT3A, DNMT3B, TET1 and TET2 have recently been described to be very frequent in SS [18-21], and our results support a potential role of epigenetic modifiers in SS pathogenesis.
Two patients (P1, P5) had the caspase activity and apoptosis inhibitor 1 gene (CAAP1) (9p) deleted, the second patient on both alleles (RPKM: C (n=10) = 12.92; P1 = 6.36; P5 = 0.93). Knockdown of CAAP1 was shown to induce apoptosis in a caspase-10-dependent manner [27], which is not observed in SS samples with decreased (P1) or lost (P5) CAAP1 expression. One should keep in mind that p53 is most likely non-functional in those two samples (due to LOH, see above), and since caspase-10 is activated in a p53-induced manner [27], this pathway could be blocked.
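As a rough check of the gene-dosage reading above, the following sketch expresses each patient's DNMT3A RPKM as a fraction of the control mean. It rests on a simplifying assumption that is ours and not a claim of the study: that expression scales with the number of remaining gene copies, so a monoallelic deletion would give about 0.5 and a biallelic deletion about 0. The RPKM values and deletion status are taken from the text; all names are illustrative.

```python
CONTROL_RPKM = 15.32  # DNMT3A mean of 10 controls, as reported in the text

# patient -> (DNMT3A RPKM, number of deleted alleles), as reported in the text
dnmt3a = {
    "P1": (7.9, 1),
    "P5": (6.41, 1),
    "P8": (0.1, 2),
    "P9": (7.13, 1),
}


def relative_expression(value: float, control: float = CONTROL_RPKM) -> float:
    """Patient expression as a fraction of the control mean."""
    return value / control


def expected_fraction(deleted_alleles: int, ploidy: int = 2) -> float:
    """Naive dosage model (assumption): expression proportional to remaining copies."""
    return (ploidy - deleted_alleles) / ploidy


summary = {
    patient: (round(relative_expression(value), 2), expected_fraction(deleted))
    for patient, (value, deleted) in dnmt3a.items()
}

for patient, (observed, expected) in summary.items():
    print(f"{patient}: observed {observed:.2f} of control, naive expectation {expected:.2f}")
```

Under this naive model the three monoallelic deletions sit close to the expected half dose and the biallelic deletion in P8 is close to zero, which is consistent with the interpretation given in the paragraph.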

Genes involved in rearrangements have altered expression in SS patients
A variety of rearrangements (306; Supplementary Table 2) were detected in SS patients. Most of them were unique to each patient. We decided to focus on those changes that affect the same genes and their expression in more than one patient. NCOR1 (nuclear receptor corepressor 1) is situated in the 17p region, which is one of the most commonly deleted regions in SS [20]. NCOR1 deletions were detected previously in SS patients (38/80) [20]; in some cases they were associated with mutations leading to biallelic loss of function. In this study NCOR1 was fully deleted in four patients (P1, P3, P6, P9) (Figure 1), and in one patient (P5) most of the gene was deleted, except for exon 1, which was translocated to chromosome 15 and fused head-to-head with the SV2B gene [chr15:91,813k/chr17:16,099k]. CTBP1 (C-terminal binding protein 1) (4p) was fully deleted in P4, while in P7 it was involved in a complex rearrangement: translocation t(2;4) [chr4: 1,241k (CTBP1i1)/chr2: 9,994k (TAF1Bi4); chr2: 231,508k/chr4: 1,240k (CTBP1i1)] and t(4;11) [chr4: 1,240k/chr11: 76,788k (CAPN5i1)]. NCOR1 and CTBP1 have similar functions in cells: both encode proteins involved in histone deacetylase (HDAC) complexes [28], which regulate HDAC activity. Despite the genomic alterations, both genes seem to have higher mRNA expression in SS, yet only overexpression of CTBP1 was confirmed by PCR (Table 1; Figure 2). Destabilization of HDAC activity can lead to an imbalance in gene expression, which is observed in human diseases [29], including SS [30]. Since HDAC inhibitors revolutionized the treatment of CTCL patients [31], the status of HDACs in the pathogenesis of SS should be studied in detail, also in terms of the interactions of HDACs with proteins regulating their activity.
CD96 and MBNL1 (muscleblind-like splicing regulator 1), both located in the amplified region in patient P4 (3q), were overexpressed in this sample (Figure 3). This is yet another example of MYB involvement in rearrangements and deletions in SS, noticed by our team [32]. In SeAx MBNL1 was broken within intron 1, and the 5' end was transposed into close proximity to [...] mediates an important role in the regulation of pre-mRNA alternative splicing, and depletion of Mbnl protein was shown to misregulate this process [33]. Because both fusions with the MBNL1 gene downregulated its expression (Table 1), we decided to examine this gene in other SS patients. Although RNAseq analysis clearly showed reduced expression in most SS patients compared to controls, this was not confirmed by RQ-PCR (Table 1; Figure 2).
Several genes (TMEM244, RNF123, PARVB, EHD1, MTMR2, GSPT1, TAB2), each involved in a rearrangement in only one patient, were selected based on their expression pattern in all SS patients as determined by RNAseq (Table 1). To confirm the pattern, expression of those genes was studied using TaqMan probes on a different cohort of SS patients and controls (Table 1; Figure 2). Four of those genes (TMEM244, EHD1, MTMR2, RNF123) showed a statistically significant difference in expression level compared to controls. TMEM244 (transmembrane protein 244) was expressed only in SS; no expression was detected in healthy T-cells. Even in P1, where one copy of the TMEM244 gene was involved in a complex rearrangement, it still had higher expression compared to controls (Table 2). The function of TMEM244 is unknown; however, expression of other genes from the same family (TMEM176A, TMEM176B, TMEM16A) has already been linked to cancer [34,35].

Novel fusion transcripts created as a result of rearrangements
Fifteen rearrangements detected in SS patients and SeAx resulted in expression of new fusion transcripts (Table 2). Those transcripts, except for one (TFG-GPR128) [36], have never been detected before, either in SS or in any other malignancy. TFG-GPR128 was detected in patients with atypical myeloproliferative neoplasms, but also in a small percentage of healthy individuals [36]. This is in agreement with other studies showing that chimeric RNA can be expressed in healthy individuals [37], especially when the two genes involved lie close together in the genome. Moreover, even oncogenic fusion transcripts, like BCR-ABL, can be detected in normal cells [38,39]. For this reason, 30 healthy donors were checked for all fusion-driving rearrangements detected in SS patients in this study, but no fusion transcripts were detected.
Nine of the identified fusions (Table 2) were in frame, with no premature stop codons. Theoretically, they could give rise to new full-length proteins. However, for unknown reasons, fusion partners were not always fully expressed, as in EHD1-CAPN12, where exon 9 was the last to be expressed in the fusion transcript. Interestingly, sometimes the breakpoint was not within the partner gene, yet the fusion was still created. In DCP1A-CCL27, the breakpoint on DNA was within intron 3 of DCP1A, followed by a part of the SERF2 gene (2,767 bp), translocated approximately 1.6 kb upstream of the CCL27 gene. Similarly, in MBNL1-KIAA2018, the first exon of the MBNL1 gene was transposed 16,500 kb upstream of KIAA2018, creating a transcript with its 2nd exon. In this case the fusion occurred between parts of the 5'UTR, so the partner gene KIAA2018 was intact, but the regulatory elements of MBNL1 led to its overexpression compared to other samples (RPKM: C (n=10) = 12.1; SS (n=9) = 13.7; SeAx = 53.1). In TMEM66-BAIAP2, the two genes were separated on DNA by a 6 kb insertion that included a small part of the NPLOC4 gene. The expression of the BAIAP2 gene probably ended with exon 12, as there was another rearrangement starting within intron 12: t(17;21) [79,082k; 44,412k (PKNOX1i1)].
PTPRC was involved in two fusions (Figure 4). The breakpoint was within intron 2; the first two exons were translocated to chr 3 and forced the expression of a fragment of the second exon of the CPN2 gene with its 3'UTR. The second part of the PTPRC gene was also translocated to chr 3, but came under the influence of the MBD4 gene. PTPRC encodes the Protein Tyrosine Phosphatase, Receptor Type C, which is highly expressed in lymphocytes, where it is also known as the CD45 antigen [40,41]. By interaction with the LCK proto-oncogene, which belongs to the Src family of tyrosine kinases, PTPRC/CD45 is involved in TCR signaling pathways regulating T-cell growth, differentiation and function, and contributes to malignant transformation [42,43]. Many variants of this gene are expressed in T-cells, each with a unique role in signal transduction and apoptosis [44-46]; therefore destabilization of this molecule could lead to lymphomagenesis. Out-of-frame fusions (Table 2) had premature stop codons, usually in the first exon after the breakpoint, probably leading to Nonsense-Mediated Decay (NMD) [47], a well-known mechanism that prevents cells from accumulating non-functional RNA and producing abnormal peptides.
Expression of chimeric RNA with intronic elements has already been reported [37]. This whole transcriptome analysis revealed fusions with large intronic fragments: CEP57-NAT10-ABAT and ARMC6-DHX57-CNRIP. The first exon of the CEP57 gene was fused in frame with exon 22 of the NAT10 gene (Table 2; Figure 5). Exon 23 was included in the fusion, but due to another breakpoint within the following intron, expression continued within intron 14 of the ABAT gene (chr16). Twelve amino acid-encoding triplets were present before the stop codon appeared; however, at this point it could not be determined whether the transcript would be translated into a protein. In the second fusion, transcription started with the ARMC6 gene, but interestingly it stopped in the middle of the 3'UTR and then continued within the genes DHX57 and CNRIP. The implications of such expression remain unclear. FIGLA is expressed in ovaries as a central regulator of oocyte-specific genes that play roles in folliculogenesis, fertilization and early development [48]. In SS patient P8, FIGLA was ectopically expressed as a result of a fusion with MAP4K3, which has already been described as a fusion partner and a tumor suppressor in cancer [49,50] and, in general, as an important player in cell signal transduction.
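The notion of an in-frame fusion used above can be illustrated with a short sketch. A fusion preserves the reading frame when the coding bases contributed by the 5' partner leave an incomplete codon that is completed exactly by the first bases of the 3' partner, and when the joined sequence contains no stop codon before its final codon. This is a toy check under simplified assumptions (coding sequences supplied as plain strings, known codon phase of the 3' fragment); the sequences are invented and the function names are hypothetical, not part of the study's workflow.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(cds: str) -> Optional[int]:
    """Return the codon index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing incomplete codon
    for i in range(0, usable, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def is_in_frame(five_prime_cds: str, three_prime_cds: str, three_prime_phase: int = 0) -> bool:
    """Check frame preservation and absence of a premature stop codon.

    five_prime_cds: coding bases retained from the 5' partner, starting at ATG.
    three_prime_cds: coding bases retained from the 3' partner.
    three_prime_phase: number of leading bases of three_prime_cds (0, 1 or 2)
        that complete a codon begun upstream in the native 3' gene.
    """
    leftover = len(five_prime_cds) % 3  # bases of an unfinished codon at the junction
    if (leftover + three_prime_phase) % 3 != 0:
        return False  # junction shifts the reading frame of the 3' partner
    fused = five_prime_cds + three_prime_cds
    stop = first_stop_index(fused)
    last_codon = len(fused) // 3 - 1
    # A stop codon is acceptable only as the final codon of the fused sequence.
    return stop is None or stop == last_codon


# Invented sequences for illustration only.
in_frame = is_in_frame("ATGGCC", "GGATTCTAA")              # ATG GCC GGA TTC TAA
split_codon = is_in_frame("ATGGCCA", "GGTTCTAA", 2)         # ATG GCC AGG TTC TAA
frameshift = is_in_frame("ATGGCCA", "GGATTCTAA")            # junction out of phase
premature_stop = is_in_frame("ATGGCC", "TGAGGATTC")         # ATG GCC TGA ...
```

In this sketch the first two calls return True and the last two return False; the last case corresponds to the out-of-frame or prematurely terminated transcripts that would be candidates for nonsense-mediated decay.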

Conclusion
In conclusion, with this NGS analysis we showed that the variety of genetic events in Sézary cells is enormous, though common denominators could not be identified. It appears that malignant transformation is a result of alterations on many levels: genomic (mutations, CNVs, rearrangements), transcriptomic (elevated expression or downregulation, fusion transcripts, ectopic expression) and epigenomic (methylation, HDAC activity). Although it is still unclear which changes are the drivers that caused the disease and which are the passengers that occurred during disease progression, this study strongly confirms the role of known tumor suppressors (TP53, FAS), epigenetic modifiers (NCOR1, DNMT3A) and dysregulated signaling pathways (PTPRC in TCR signaling) in SS pathogenesis. Due to the huge variety of alterations, we hypothesize that in SS different pathways lead to the same clinical presentation. SS is a disease of the elderly, so different molecular changes could accumulate during a lifetime in skin-homing TCM cells, which persist long term in normal skin following resolution of an immune response [2,51]. In the future, it would be worthwhile to analyze SS patients individually and to classify them based on their molecular profile in order to choose the best treatment options.

Clinical samples
Nine SS blood samples were included in the study: P1 (23) [...] Table 3. Blood samples were collected once the diagnosis of SS was confirmed. The use of human material was approved by the Local Ethics Committees of both universities and performed in accordance with the Declaration of Helsinki. Samples were used to extract DNA and RNA; three of them were Ficoll-purified mononuclear cells (PBMCs) (P1-P3), while the others were sorted CD4+ T-cells (P4-P9). PBMCs were enriched for CD4+ T cells by depletion of non-CD4+ T cells using the CD4+ T Cell Isolation Kit I (Miltenyi Biotec, Bergisch Gladbach, Germany), resulting in a cell purity >95% as claimed by the manufacturer [52]. The purity of the samples was confirmed by quantitative NGS analysis of the T cell receptor alpha/delta locus (Supplementary Figure 1). In samples P4-P9 no germline TCRAD sequence was visible, indicating almost 100% purity of the samples, while in samples P1-P3 an admixture of nonmalignant cells without TCRAD rearrangements was present.
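One way to read the purity check quantitatively is sketched below. It rests on an assumption that is ours rather than a stated method of the study: that the clonal T cells have lost a germline segment of the TCRAD locus on both alleles through rearrangement, so that any residual read depth over that segment comes from admixed cells retaining germline sequence, and that flanking regions are diploid in all cells. The depth values and the function name are hypothetical.

```python
def germline_fraction(depth_in_deleted_segment: float, flanking_depth: float) -> float:
    """Estimate the fraction of cells that retain germline TCRAD sequence.

    Assumes (simplification) that tumor cells contribute no reads to the
    segment and that all cells contribute equally to the flanking regions.
    """
    if flanking_depth <= 0:
        raise ValueError("flanking depth must be positive")
    ratio = depth_in_deleted_segment / flanking_depth
    return min(1.0, max(0.0, ratio))  # clamp against sampling noise


def tumor_purity(depth_in_deleted_segment: float, flanking_depth: float) -> float:
    """Complement of the germline fraction."""
    return 1.0 - germline_fraction(depth_in_deleted_segment, flanking_depth)


# Hypothetical depths at roughly 15x genome coverage.
sorted_sample = tumor_purity(0.0, 15.0)   # no germline reads visible
pbmc_sample = tumor_purity(3.0, 15.0)     # some germline reads remain

print(f"sorted CD4+ sample purity: {sorted_sample:.2f}")
print(f"PBMC sample purity: {pbmc_sample:.2f}")
```

At low coverage such an estimate is coarse, since a handful of reads more or less changes the ratio noticeably; it is meant only to make the logic of the germline-read argument explicit.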

Paired-end next-generation sequencing
Paired-end 15x coverage whole genome NGS was performed by BGI-Hong Kong on the HiSeq2000 Illumina platform. At least two non-identical inserts with discordant 5' and 3' reads localized in different regions of the genome were considered to carry a rearrangement.
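The calling rule above can be sketched as follows. This is a simplified illustration, not the study's pipeline: it assumes read pairs have already been aligned and reduced to one (chromosome, position) per mate, treats a pair as discordant when its mates map to different chromosomes or farther apart than a hypothetical `max_insert` threshold, ignores read orientation, and groups pairs into coarse bins of a hypothetical `window` size. Pairs with identical coordinates are counted once, on the assumption that they represent the same insert. The coordinates in the example are invented.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def is_discordant(pair: ReadPair, max_insert: int = 1_000) -> bool:
    """Mates on different chromosomes, or too far apart on the same one."""
    return pair.chrom1 != pair.chrom2 or abs(pair.pos1 - pair.pos2) > max_insert


def call_rearrangements(pairs, max_insert=1_000, window=1_000, min_support=2):
    """Group discordant pairs by binned mate positions; keep well-supported groups."""
    clusters = defaultdict(set)
    for pair in pairs:
        if not is_discordant(pair, max_insert):
            continue
        # Coarse binning; breakpoints near a bin edge may be split (limitation).
        key = (pair.chrom1, pair.pos1 // window, pair.chrom2, pair.pos2 // window)
        clusters[key].add(pair)  # the set collapses identical inserts
    return {key: sorted(group) for key, group in clusters.items()
            if len(group) >= min_support}


# Invented read pairs for illustration.
pairs = [
    ReadPair("chr15", 91_813_120, "chr17", 16_099_450),
    ReadPair("chr15", 91_813_310, "chr17", 16_099_610),
    ReadPair("chr15", 91_813_310, "chr17", 16_099_610),  # identical insert, counted once
    ReadPair("chr1", 5_000_100, "chr1", 5_000_400),      # concordant pair
    ReadPair("chr2", 9_994_200, "chr4", 1_241_050),      # only one supporting insert
]
calls = call_rearrangements(pairs)
for key, group in calls.items():
    print(key, len(group), "supporting inserts")
```

Requiring at least two non-identical inserts, as in the text, filters out both chimeric artifacts supported by a single fragment and duplicated copies of one fragment.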

NGS data analysis
Sequence analysis was based on GRCh37/hg19. Sequences, as bam files and bam.tdf files, were analyzed using Integrative Genomics Viewer 2.3 (IGV 2.3, Broad Institute). RPKM values were used to measure mRNA abundance. Lifescope software was used to find fusion transcripts: paired-end reads whose two mates mapped to different genes were considered evidence of a fusion transcript.
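The fusion criterion in the last sentence can be illustrated with a toy example. The sketch does not reproduce Lifescope or its interface; it assumes each read pair has already been annotated with one gene name per mate (a hypothetical input format, with `None` for an unannotated mate) and simply counts pairs whose mates fall in different genes. Gene pairs are stored in sorted order, so the 5'/3' orientation of the fusion is not recovered here.

```python
from collections import Counter


def fusion_candidates(pair_genes, min_pairs=1):
    """Count read pairs whose mates map to two different genes.

    pair_genes: iterable of (gene_of_mate1, gene_of_mate2); None = unannotated.
    Returns {(gene_a, gene_b): supporting_pair_count} with gene names sorted.
    """
    counts = Counter()
    for gene_a, gene_b in pair_genes:
        if gene_a is None or gene_b is None or gene_a == gene_b:
            continue  # unannotated mate or ordinary same-gene pair
        counts[tuple(sorted((gene_a, gene_b)))] += 1
    return {genes: n for genes, n in counts.items() if n >= min_pairs}


# Invented annotations using gene names that appear in the text.
annotated_pairs = [
    ("EHD1", "CAPN12"),
    ("CAPN12", "EHD1"),
    ("EHD1", "EHD1"),
    ("PTPRC", None),
    ("TFG", "GPR128"),
]
candidates = fusion_candidates(annotated_pairs)
for genes, n in candidates.items():
    print(genes, n)
```

Raising `min_pairs` trades sensitivity for specificity; candidates found this way would still need confirmation, as done in the study by RT-PCR and Sanger sequencing.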

PCR, RT-PCR
Selected rearrangements and fusion transcripts were confirmed on DNA and cDNA using standard PCR and RT-PCR. Products were Sanger sequenced (Institute of Biochemistry and Biophysics PAS, Warsaw, Poland) to confirm the breakpoints. Thirty DNA samples from healthy donors were used as controls.