Molecular alterations and tumor suppressive function of the DUSP22 (Dual Specificity Phosphatase 22) gene in peripheral T-cell lymphoma subtypes

Monoallelic 6p25.3 rearrangements associated with DUSP22 (Dual Specificity Phosphatase 22) gene silencing have been reported in CD30+ peripheral T-cell lymphomas (PTCL), mostly with anaplastic morphology and of cutaneous origin. However, the mechanism of second allele silencing and the putative tumor suppressor function of DUSP22 have not been investigated so far. Here, we show that the presence, in most individuals, of an inactive paralog hampers genetic and epigenetic evaluation of the DUSP22 gene. Identification of DUSP22-specific single-nucleotide polymorphism haplotypes, together with fluorescence in situ hybridization and epigenetic characterization of the paralog status, led us to develop a comprehensive strategy enabling reliable identification of DUSP22 alterations. We showed that one cutaneous anaplastic large T-cell lymphoma (cALCL) case with monoallelic 6p25.3 rearrangement and DUSP22 silencing harbored exon 1 somatic mutations associated with second allele inactivation. Another cALCL case carried an intron 1 somatic splice site mutation with a predicted deleterious exon skipping effect. Other tested PTCL cases with 6p25.3 rearrangement exhibited neither mutation nor deletion nor methylation accounting for silencing of the non-rearranged DUSP22 allele, which is thus inactivated by an as yet unknown mechanism. We also characterized the expression status of four DUSP22 splice variants and found that they are all silenced in cALCL cases with 6p25.3 breakpoints. We finally showed that restoring expression of the physiologically predominant isoform in DUSP22-deficient malignant T cells inhibits cellular expansion by stimulating apoptosis and impairs soft agar clonogenicity and tumorigenicity. This study therefore shows that DUSP22 behaves as a tumor suppressor gene in PTCL.


Molecular characterization of the paralog of the DUSP22 gene
The occurrence of hybridization signals on both 6p25.3 and 16p11.2 loci with probes encompassing DUSP22 (Supplementary Figure S4, Figure 2A) was consistent with bioinformatics analyses predicting the existence of a closely related paralog of DUSP22 on 16p11.2 [31,32].
The high degree of sequence similarity predicted between DUSP22 and its paralog would inevitably complicate the interpretation of cytogenetic and molecular analyses of the DUSP22 gene. It was thus essential to determine the molecular features of the paralog in order to develop a comprehensive strategy to analyze unambiguously the status of DUSP22.
First, we developed a three-color FISH approach, with a probe encompassing DUSP22 (thus also hybridizing to the paralog when present) combined with 6p25.3 and 16p11.2 locus-specific probes (Figure 2A, Supplementary Figure S5A). In agreement with bioinformatics predictions [31,32], this analysis revealed that the paralog was subject to copy number variations (CNV), being absent in 10% of normal PBL and detected on one or both chromosomes 16 in 30% and 60% of the cases, respectively (Supplementary Table S6). The predictions of Genovese et al. about this paralog were based on the heterozygosity frequency of single-nucleotide polymorphisms (SNPs) and on haplotype segregation [31,32].
We thus genotyped 3 of the 6 SNPs analyzed in this previous study [31] (Supplementary Figure S2) on PBL from unrelated healthy donors. We observed an excess of heterozygosity (90% for rs11242812, 65% for rs1129085 and 90% for rs1046656), reminiscent of paralogous sequences [31], and allele co-segregation identified 3 haplotypes: G-G-C, G-A-C and A-G-T (Figure 2B, Table 1, Supplementary Figure S5B and Supplementary Table S6). In 10% of individuals, normal PBL showed only 6p25.3 DUSP22-specific FISH signals and carried exclusively the G-G-C and/or G-A-C haplotypes. Conversely, in 90% of individuals, the A-G-T haplotype was present and always associated with G-G-C and/or G-A-C haplotypes and with additional FISH signals on one or both 16p11.2 paralog alleles (Figure 2A, 2B, Table 1, Supplementary Figures S5A, S5B and Supplementary Table S6). We then analyzed the allelic expression status of the SNPs in normal PBL and various normal tissues. We found that the G-G-C and G-A-C alleles, which appear specific to DUSP22, were always transcribed in normal cells, while the A-G-T allele, which correlates with the presence of the 16p11.2 paralog, was silent or barely expressed in most carriers (Figure 2B, Table 1, Supplementary Table S6). Genovese et al. commented that the paralog was expressed similarly to the reference DUSP22 gene [31]. However, re-analysis of their RNA-sequencing read data for 4 of the investigated SNPs, including rs11242812 and rs1046656 also evaluated herein, clearly showed that the paralog was mostly silent (Figure 5 in [31]). Only the RNA-sequencing data for SNPs rs3778605 and rs1129085 could suggest expression of the paralog [31]. Our study of rs1129085 suggested that transcripts from the G allele, shared between DUSP22 and its paralog, could have been misinterpreted as expression of the paralog in the report of Genovese et al. [31], as 78% of individuals carry at least one expressed G allele of rs1129085 on DUSP22 (Supplementary Table S6). Alternatively, this could be due to misattribution of the A allele of rs1129085 (which is specific to DUSP22 and thus always expressed when present) to the paralog, as can be understood from Supplementary Table S9 in [31].
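The haplotype-based reading of the genotyping data described above can be summarized as a simple rule. The following is a minimal sketch, not the authors' pipeline: the function name `classify_sample` and the returned dictionary keys are hypothetical, and the rule only encodes the reported association (G-G-C and G-A-C on DUSP22, A-G-T tracking the 16p11.2 paralog). It is a heuristic, since rare carriers with an expressed A-G-T allele are also reported.

```python
# Haplotype labels follow the SNP order rs11242812 - rs1129085 - rs1046656.
# Sets below restate the associations reported in the text; names are hypothetical.
DUSP22_HAPLOTYPES = frozenset({"G-G-C", "G-A-C"})  # reported as DUSP22-specific
PARALOG_HAPLOTYPE = "A-G-T"  # reported to correlate with the 16p11.2 paralog


def classify_sample(haplotypes):
    """Summarize which haplotypes were seen and whether a paralog is predicted."""
    observed = set(haplotypes)
    unexpected = observed - DUSP22_HAPLOTYPES - {PARALOG_HAPLOTYPE}
    if unexpected:
        # Only the three haplotypes described in the text are handled here.
        raise ValueError(f"Unexpected haplotype(s): {sorted(unexpected)}")
    return {
        "dusp22_haplotypes": sorted(observed & DUSP22_HAPLOTYPES),
        "paralog_predicted": PARALOG_HAPLOTYPE in observed,
    }


if __name__ == "__main__":
    print(classify_sample(["G-G-C", "A-G-T"]))
    print(classify_sample(["G-G-C", "G-A-C"]))
```

Such a prediction would still need to be checked against FISH, as done in the study, because the haplotype alone does not give the paralog copy number.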
Our data indicating that the paralog was transcriptionally inactive or hypomorphic prompted us to examine the epigenetic status of DUSP22 and its paralog. We found that partial methylation of a 5'-CpG island was frequent in normal cells, including PBL (Figure 2C, 2D, Supplementary Figures S5D and S6B, S6D). Matching with FISH and SNP data revealed that cases without 16p11.2 paralog sequence showed no CpG island methylation, while methylation was detected in most carriers of the paralog and was proportional to its copy number (Figure 2, Supplementary Figures S5C, S5D and S6B, S6D and Supplementary Table S6). In normal cells, methylation was thus found on paralog but not DUSP22 sequences (Table 1). Nonetheless, rare cases with 16p11.2 paralog sequence detected by FISH and a silent A-G-T haplotype lacked CpG island methylation (e.g. PBL13, Figure 2B, 2D and Supplementary Table S6). Such cases likely harbor the paralog allele with a predicted 5' deletion (Supplementary Figure 9 in [31]), as indicated by quantitative PCR (Supplementary Figures S5C and S5D). As this deletion encompasses promoter and exon 1 sequences, it would account for the lack of paralog 5' methylation.
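The proportionality between methylation and paralog copy number can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the authors' analysis: it assumes two unmethylated DUSP22 copies per cell, full methylation of every intact paralog copy, and no contribution from 5'-deleted paralog copies, which lack the CpG island. The function name `expected_methylated_fraction` is hypothetical.

```python
def expected_methylated_fraction(intact_paralog_copies, dusp22_copies=2):
    """Expected share of methylated 5' CpG island alleles in a sample.

    Assumptions (not stated as a formula in the source): DUSP22 copies are
    unmethylated, each intact paralog copy is fully methylated, and paralog
    copies deleted in their 5' region are simply not counted.
    """
    if intact_paralog_copies < 0 or dusp22_copies <= 0:
        raise ValueError("copy numbers must be non-negative (DUSP22 > 0)")
    total = dusp22_copies + intact_paralog_copies
    return intact_paralog_copies / total


if __name__ == "__main__":
    for copies in (0, 1, 2):
        print(copies, round(expected_methylated_fraction(copies), 3))
```

Under these assumptions, one intact paralog copy gives one methylated allele out of three and two copies give two out of four, in line with the approximate 33% and 50% values reported in the supplementary qMSP legend. A sample with two 5'-deleted paralog copies would be entered as zero intact copies.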
Similar methylation and allelic expression patterns were observed in all other normal tissues tested (Supplementary Table S6 and Supplementary Figures S6C and S6D). The paralog is thus either methylated or deleted in its 5' region and transcriptionally inactive in most instances (Figure 2B, 2D, Table 1, Supplementary Figures S5, S6 and Supplementary Table S6). However, in rare samples (4/47, 10.6%), transcripts from the A-G-T allele were fully expressed as well (e.g. PBL12 in Figure 2B, Supplementary Table S6). Our hypothesis is that in such individuals an active A-G-T haplotype may be present on the 6p25.3 locus and correspond to a rare ancestral allele of DUSP22 from which the paralog was duplicated (Supplementary Figure S6E).
In addition to expression data, FISH, methylation and SNP analyses showed that IRF4 and EXOC2 have no paralog and do not exhibit significant alteration in the tested cutaneous T-cell lymphomas (Supplementary Figure S3).

Supplementary Figure S1: Description of DUSP22 isoforms. A. Structure of the human DUSP22 gene and alternative transcripts.
Numbered boxes indicate the exons, with 5' and 3' untranslated regions (UTR) in grey and the coding region in black. Nt: nucleotides. The black arrow indicates the position of the transcription initiation site. Black and red dotted lines indicate the alternative splicing events, concerning exon 4 and intron 7. Four alternatively spliced transcripts have been identified: 1) transcripts containing 8 exons with all introns spliced; 2) transcripts with the same 8 exons but retaining intron 7 (exon 8 then being in the 3' UTR); 3 and 4) the above-mentioned transcripts but with exon 4 spliced out (Δ exon 4 transcripts). The red box highlights the presence of a premature STOP codon which is in frame in Δ exon 4 transcripts. B. RT-PCR analyses of normal lymphocytes (PBL) and different lymphoid cell lines (1301, FE-PD) illustrating the existence of DUSP22 alternative transcripts. Left panel: RT-PCR was performed with a forward primer within exon 1 and a reverse primer at the junction between exons 6 and 7, revealing the presence of the expected fragment as well as a fragment 50 base pairs shorter which, by Sanger sequencing, was shown to have exon 4 spliced out (Δ exon 4 transcripts). Right panel: RT-PCR was performed with a forward primer within exon 6 and two reverse primers: one within the exon 8 coding sequence and one within the part of intron 7 which is present only in transcripts which were shown, by Sanger sequencing, to retain intron 7 (sequence framed in red in Supplementary Figure S2). This analysis revealed that transcripts with and without intron 7 splicing are present in all tested samples, in variable proportions depending on the cells.

[...] 2) transcripts containing all 8 exons and retaining intron 7 (exon 8 then being in the 3' UTR) are predicted to encode a 205-amino-acid protein (right panel); the two previous types of transcripts with exon 4 spliced out (Δ exon 4 transcripts) are predicted to encode a 54-amino-acid protein (bottom left panel).
Nucleotide sequences of exons are shown alternately in black and blue and underlined, in order to highlight the limits of each exon. Amino acids corresponding to each codon are indicated below the cDNA sequences. Amino-acid residues encoded by codons overlapping two exons are marked in grey. The sequence of the alternatively spliced exon 4 and the corresponding amino acids are indicated in red. The part of exon 7 which is present and coding only in transcripts retaining intron 7, as well as the sequence of exon 8 which is coding only in transcripts with intron 7 splicing, are indicated in italic. The ATG initiating codon is framed in red and the STOP codon for each isoform is framed in blue. Isoform-specific amino acids are in italic and red. These 3 protein isoforms differ in their C-terminal domains.

[...] and cutaneous T-cell lymphoma (CTCL) tumors and cell lines with and without 6p25.3 rearrangements. IRF4 and EXOC2 transcript levels were normalized for EEF1A1 gene expression. CTCL exhibit occasional overexpression of IRF4 or EXOC2 without correlation with 6p25.3 genomic status, indicating that these genes are not the targets of 6p25.3 rearrangements in ALCL. B. Schematic representation of the 5' region of the IRF4 and EXOC2 genes. Exons are indicated by black boxes, the transcription start sites by a curved arrow and the primers used for methylation specific PCR (MSP) analyses by blue arrowheads. Both IRF4 and EXOC2 genes harbor large CpG islands (vertical bars indicate CpG sites), spanning from the promoter through the beginning of intron 2 for IRF4 (left panel) and encompassing the promoter, exon 1, and beginning of intron 1 for EXOC2 (right panel). C. Quantitative methylation specific PCR (qMSP) analysis of IRF4 (top panel) and EXOC2 (bottom panel) in normal lymphocytes (PBL) and cutaneous T-cell lymphoma (CTCL) samples with and without 6p25.3 rearrangements. No DNA (-) and in vitro methylated DNA (+) were used as negative and positive controls.
Relative proportions (%) of unmethylated and methylated alleles (Mean ± SEM from independent measurements) are shown. Except for the My-La and FE-PD cell lines, both of which showed strong IRF4 5' CpG island methylation, none of the tested normal or tumor samples exhibited any significant 5' CpG island methylation of either the IRF4 or the EXOC2 gene. D. Representative examples of genotype and allelic expression status analysis of two SNPs (rs2316515, A or G, and rs11242914, G or A), respectively located within exon 11 (3' UTR) of IRF4 and exon 28 (3' UTR) of EXOC2. Fragments encompassing these SNPs were amplified by PCR from genomic DNA or complementary DNA (cDNA) from normal PBL. PCR products encompassing SNPs rs2316515 and rs11242914 were respectively digested by the HpyCH4V and HaeIII restriction endonucleases and analyzed by 4% low-melting agarose electrophoresis. Arrows indicate each SNP allele (digested or undigested). Electrophoresis profiles show that normal lymphocytes heterozygous for the SNPs, based on restriction polymorphism analysis of genomic DNA, exhibit bi-allelic expression of the IRF4 and EXOC2 genes (detection of transcripts from both alleles upon cDNA analysis). These data demonstrate the absence of allelic inactivation of these genes in normal lymphocytes, in agreement with the absence of CpG island methylation. Of the My-La and FE-PD cell lines, both of which showed strong IRF4 5' CpG island methylation, only FE-PD lacked expression of one IRF4 allele (allele A), although this allele was present at the DNA level. The fact that both IRF4 alleles were expressed in My-La cells and that one allele was expressed in FE-PD cells, despite complete IRF4 5' CpG island methylation, suggests that there might be alternative IRF4 transcription start sites insensitive to hypermethylation of this region. None of the tested EXOC2 SNPs was informative in either cell line (homozygous at the DNA level), preventing allelic expression status analysis of this gene in these cells.
[...] Green labeled), a probe encompassing the DUSP22 gene (CTD-3079O17, Spectrum Red labeled), and a 16p11.2-specific probe (RP11-488I20, Spectrum Gold labeled, here displayed in pink (pseudocolor) as colocalization of signals from the former probes may appear in yellow). DAPI (4′,6-diamidino-2-phenylindole dihydrochloride), here visualized in grey (pseudocolor), was used to stain nuclear DNA. On panels (A) and (B), cases exhibiting, in addition to 6p25.3, one or two additional CTD-3079O17 probe signals illustrate the predicted existence of a DUSP22 paralog subject to copy number variations at the 16p11.2 locus [31,32] (see also Figure 2A). This paralog was absent from PBL2 lymphocytes and present in 1 copy in PBL25 and 2 copies in PBL13 and PBL15. C. Top panel: Quantitative real-time PCR analysis of rs1046656 SNP allele-specific copy numbers in normal PBL. Data were normalized using KLK3 copy numbers. Mean ± SEM from independent measurements are shown. Bottom panel: Capillary electrophoresis-based analysis of SNP rs1046656 allele copy numbers in normal PBL. PCR was performed with fluorescently labeled primers and with a number of cycles falling within the linear range of the amplification. PCR products were digested by the BssSI restriction endonuclease and analyzed by capillary electrophoresis. The area under the curve was quantified for each SNP allele's peak and normalized using KLK3. Allelic ratios are indicated below each electropherogram. These allele-specific analyses (confirmed in larger series of normal samples, see Supplementary Table S6) indicated that detection of the T allele of SNP rs1046656 correlated with the presence and copy numbers of the 16p11.2 paralog (absent in PBL2, 1 copy in PBL25 and 2 copies in PBL13 and PBL15 lymphocytes). D. Quantitative real-time PCR analysis of DUSP22 exons 1 and 6 in normal PBL. Data were normalized using KLK3 copy numbers. Mean ± SEM from independent measurements are shown.
The expression status of DUSP22 (allele C of SNP rs1046656) and its paralog (allele T of SNP rs1046656) (see text and Supplementary Table S6) is indicated below the graph. Enhanced copy numbers (from 2 to 3 or 4 copies) correlated with the presence of 1 copy (in PBL25) or 2 copies (PBL13 and PBL15) of the 16p11.2 paralog. The difference between exon 6 and exon 1 copy numbers (PBL13) is consistent with the existence of paralog alleles deleted in their 5' region, including exon 1 and the 5' CpG island, as predicted by analysis of next generation sequencing read numbers (Supplementary Figure 9 in Ref. [31]). E. Quantitative methylation specific PCR (qMSP) analysis of the DUSP22/16p11.2 paralog 5' CpG island in normal PBL. No DNA (-) and in vitro methylated DNA (+) were used as negative and positive controls. Relative proportions (%) of unmethylated and methylated alleles (Mean ± SEM from independent measurements) are shown. Detection of methylation correlated with the presence and copy numbers of the 16p11.2 paralog (absent in PBL2, and representing ≈33% in PBL25, with 1 paralog copy, and ≈50% in PBL15, with 2 paralog copies), except in PBL13, for which FISH and molecular analysis (see Supplementary Figures S5A-S5D) were consistent with the presence of 2 paralog alleles deleted in the 5' region, such deletion encompassing the region methylated in other paralog alleles.

[...] Figures 1B, 1C). Nonetheless, the poor sensitivity and the abundance of a higher molecular weight nonspecific band hamper the use of this antibody for endogenous detection in most cell lines and also for in situ protein detection on cell slides and tissue sections. The other commercially available antibodies tested did not give better results (not shown). B. FE-PD cells were transduced with lentiviral vectors (a control vector encoding the ZsGreen reporter alone, and bicistronic vectors encoding DUSP22 isoforms together with ZsGreen). Living (DAPI-) and transduced (ZsGreen+) cells were sorted by flow cytometry using an ARIA II cell sorter.
Dot plots show cell fraction purity after sorting. C. DUSP22 isoform transcript levels were analyzed by quantitative real-time reverse transcription-PCR (qRT-PCR) on RNA isolated from the same cells transduced with either control or DUSP22 isoform expression vectors and sorted. DUSP22 isoform-specific primers were used (Figure 1C and Supplementary Table S2) and transcript levels were normalized for EEF1A1 gene expression and plotted on the y axis. Mean ± SEM from independent measurements are shown. D. Western blot analysis of DUSP22 expression in FE-PD cells infected with the control and DUSP22 isoform vectors and sorted. The anti-DUSP22 antibodies used were from SIGMA® (# HPA031394, as in (A)) and Santa Cruz® (# sc-47935), the latter being directed against an N-terminal peptide common to the 3 isoforms. Alpha-tubulin (α-Tubulin) was used as a control for protein loading.

EEF1A1, GAPDH, TBP and 18S rRNA were used as controls for normalization in quantitative reverse transcription (cDNA)-PCR analyses. KLK3 was used as a control for normalization in quantitative genomic DNA (gDNA)-PCR analyses.

Quantitative SNP alleles analysis
Real-time PCR was performed with a common forward primer and SNP allele-specific reverse primers (the 3' end nucleotide in bold is allele-specific, the underlined nucleotide being a deliberate mismatch introduced to enhance specificity). In parallel, PCRs were performed from the same DNA samples with KLK3 primers (located on chromosome band 19q13.41 and used as a control gene).
To validate this real-time PCR assay, PCRs were performed with primers encompassing the SNP of interest, one of them being labeled at the 5' position with the 6-FAM fluorescent dye. PCR products were then digested with the BssSI restriction enzyme. All PCRs were performed with a number of cycles falling within the linear range of the reactions, as ascertained by real-time qPCR assays. Digested PCR products and KLK3 control gene products were then pooled, denatured and analyzed by capillary electrophoresis on an Applied automated sequencer. Quantification was performed by measuring the area under the curve and calculating ratios between the peaks corresponding to the SNP alleles and the control gene.
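The peak-ratio quantification described above can be sketched as follows. This is an illustrative outline only: the function names `normalize_peak_areas` and `allelic_ratio` are hypothetical, the peak areas in the example are made-up numbers rather than study data, and the sketch assumes that peak areas have already been extracted from the electropherogram.

```python
def normalize_peak_areas(allele_areas, control_area):
    """Divide each SNP allele peak area by the control gene (e.g. KLK3) peak area."""
    if control_area <= 0:
        raise ValueError("control peak area must be positive")
    return {allele: area / control_area for allele, area in allele_areas.items()}


def allelic_ratio(allele_areas, numerator, denominator):
    """Ratio between two allele peak areas from the same electropherogram."""
    if allele_areas[denominator] == 0:
        raise ValueError("denominator allele has no signal")
    return allele_areas[numerator] / allele_areas[denominator]


if __name__ == "__main__":
    # Hypothetical peak areas for the C and T alleles of a SNP and the control.
    areas = {"C": 2000.0, "T": 1000.0}
    print(normalize_peak_areas(areas, control_area=2000.0))
    print(allelic_ratio(areas, "T", "C"))
```

Because both alleles and the control are read from the same pooled run, normalizing to the control peak makes samples comparable, while the allele-to-allele ratio is independent of loading.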