Immunoglobulin gene rearrangements in Chinese and Italian patients with chronic lymphocytic leukemia

Chronic lymphocytic leukemia (CLL) is the most common type of leukemia in the Western world, whereas in Asia the incidence is about 10 times lower. The basis for this ethnic and geographic variation is currently unknown. The aim of this study was to characterize IGHVDJ rearrangements and stereotypy of the HCDR3 region in a series of 623 Chinese CLL cases, in order to identify possible differences in immunoglobulin gene usage and their potential pathogenetic implications. Chinese CLL cases were compared to 789 Italian CLL cases. Chinese patients showed a higher proportion of mutated IGHV and more frequent usage of the IGHV3-7, IGHV3-74, IGHV4-39 and IGHV4-59 genes. A significantly lower usage of IGHV1-69 and IGHV1-2 was documented, with comparable IGHV3-21 frequency (3% Chinese vs 3.8% Italian CLL). The proportion of known stereotyped receptors was significantly lower in Chinese (19.7%) than in Italian CLL (25.8%), despite a significantly higher frequency of subset #8 (p= 0.0001). Moreover, new paired clusters were identified among Chinese cases. Overall, these data support a potentially different antigenic exposure between Eastern and Western CLL.

The heterogeneity of CLL is also reflected by the different incidence in populations with diverse ethnic and geographic distribution. CLL is the most common leukemia among adults in the Western hemisphere (~30% of all leukemias), but is rare in Asian countries [6,7]. The age-adjusted incidence rate (AAIR) is about 4.4 new cases per 100,000 individuals/year in the US (2007-2011) [8]. In contrast, the AAIR in Asians is about 10 times lower, with 0.2-0.3 new cases per 100,000 individuals/year [8,9]. The reasons for this epidemiologic heterogeneity remain to be defined. The risk of developing CLL does not change among Asians residing in the US and in their descendants [10,11]. Therefore, the genetic background is important in determining the risk of the disease, beyond potential environmental factors.
To date, little is known about IGHV gene representation and mutations in Asian compared to Caucasian CLL. Indeed, some differences have so far been reported only in small cohorts of patients [24-27]. The B-cell receptor (BCR) repertoire in CLL is influenced by genetic susceptibility and/or by the existence of a promoting pressure derived from different antigenic elements. The potential role of antigens in the pathogenesis of CLL is further supported by the non-stochastic pairing of IGHV, IGKV and IGLV genes, which leads to nearly identical BCR sequences in the heavy chain complementarity determining region 3 (HCDR3), described as "stereotyped" BCR, in about 30% of Caucasian CLL [28].
In this study, we investigated IGHV gene usage, mutational status and HCDR3 in the largest series of Chinese CLL so far reported, and compared them to those present in Italian CLL. Despite generally analogous features, Chinese and Italian CLL showed significant differences in IGHVDJ gene usage, mutations and stereotypy. These findings raise the hypothesis that diverse specific antigens are implicated in the pathogenesis of Asian CLL.

Italian CLL: IGHV gene usage, mutation analysis and HCDR3 features
A total of 792 IGHVDJ productive rearrangements from 789 Italian CLL cases were evaluated; 2 cases carried double and triple in-frame rearrangements, respectively. Using the 2% cut-off to discriminate between mutated and unmutated rearrangements, 387 (49%) of the 792 sequences analyzed had more than 2% difference from the most similar germline gene and were classified as mutated, with a median of 7.4% (range 2-17%) mutations/case. The remaining 405 (51%) were considered as unmutated, of which 278 (68%) displayed a 100% sequence identity to the corresponding germline IGHV gene sequences (Supplementary Table S1A). The IGHVDJ subgroup and gene usage were for the most part comparable to those reported in previous studies [28-31], showing no relevant differences in the frequency of the most representative genes. As expected, the analysis revealed that the most common IGHV subgroup was IGHV3 (364; 46.0%), followed by IGHV1 (199; 25.1%) and IGHV4 (163; 20.6%) (Supplementary Table S1B).
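The 2% cut-off rule and the proportions quoted above can be illustrated with a minimal Python sketch. The function names `classify_ighv` and `percent` are hypothetical, and the handling of the boundary value (a difference of exactly 2% is treated as unmutated, following the "more than 2%" wording) is an assumption made for illustration; this is not the pipeline used in the study.

```python
def classify_ighv(identity_pct, cutoff_pct=2.0):
    """Classify a rearrangement from its % identity to the closest germline IGHV gene.

    Assumption: a difference strictly greater than cutoff_pct counts as mutated.
    """
    difference = 100.0 - identity_pct
    return "mutated" if difference > cutoff_pct else "unmutated"


def percent(part, total):
    """Percentage of `part` over `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Counts reported in the text for the Italian series
total_sequences = 792
mutated = 387
unmutated = 405
germline_identical = 278  # unmutated sequences with 100% germline identity

print(classify_ighv(100.0))  # fully germline sequence
print(classify_ighv(92.6))   # 7.4% difference from germline
print(percent(mutated, total_sequences))
print(percent(unmutated, total_sequences))
print(percent(germline_identical, unmutated))
```

Recomputing the percentages from the raw counts is a quick way to check that the rounded figures in the text (49%, 51% and 68%) are mutually consistent.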
The median length of HCDR3 was 19 amino acids. Unmutated sequences had significantly longer HCDR3s than mutated sequences: median length 22 versus 17 amino acids (p= 0.0001).
Moreover, we identified 10 (4.9%) sequences that fell into 5 groups with identical HCDR3 and thus were defined as paired clusters.
Regarding the 16 novel clusters (corresponding to 33 sequences), two were potential new subsets (composed of 3 Chinese CLL and 2 Chinese plus 2 Italian CLL, respectively) and 14 were paired clusters (all Chinese), as shown in Supplementary Table S2F. In contrast to common subsets, the majority of novel clusters were mutated (19/33 sequences; 57.5%). Of interest, 15 of 19 novel mutated sequences expressed genes from the IGHV3 subgroup, of which IGHV3-23 (8/33; 24.2%) was the most represented gene.

Comparison between Italian and Chinese CLL reveals differences in IGHV mutational status and gene repertoire
Comparing the two series, Chinese CLL showed a higher proportion of cases with mutated IGHV than Italian CLL. The difference was statistically significant (mutated IGHV 66% and unmutated IGHV 34% vs mutated IGHV 49% and unmutated IGHV 51%, respectively; p= 0.0001) and remained significant after exclusion of pre-treated cases (p= 0.0001).
Whilst the IGHV3 subgroup was predominantly expressed in both Chinese and Italian patients, there was an inversion between IGHV1 and IGHV4 frequency. The frequency of IGHV1 was lower in Chinese CLL (p= 0.0001), while that of the IGHV4 subgroup was significantly higher (p= 0.0004) (Figure 1A).
The comparison of IGHD subgroups and genes identified additional differences between the two CLL cohorts. The IGHD6 subgroup, the second most frequent after the IGHD3 family in Chinese CLL, was significantly increased compared to Italian CLL (p= 0.004). Similarly, there was a significant over-usage of IGHD3-10 (p= 0.002), IGHD6-13 (p= 0.0001) and IGHD2-21 (p= 0.03) genes and under-usage of IGHD3-3 (p= 0.0001) and IGHD2-2 (p= 0.02) genes in Chinese CLL.
As for the IGHJ subgroup, although the distribution was similar between the two populations, IGHJ6 (p= 0.002) was significantly under-represented in the Chinese series.

Clinical impact of IGHV features in Chinese CLL
Prognosis was evaluated on 612 Chinese CLL patients, with a median follow-up of 28 months (range: 0-270.9). The median overall survival (OS) was 121.7 months for 566 evaluable cases. Treatment-free interval (TFI) from diagnosis was 18.4 months (range: 0-194.8) for 580 evaluable cases, 327 of whom required treatment. Richter's transformation occurred in 16 of 575 evaluable cases (2.8%).
CLL in Binet stages A, B or C had a median OS of 165.1, 98 and 82.2 months, respectively (p= 0.0001).
Chinese CLL with unmutated IGHV showed a poor prognosis in terms of OS (85.1 vs 165.1 months, p<0.0001) and TFI (4.0 vs 46.1 months, p<0.0001) compared to mutated IGHV cases, as expected.
As reported for Caucasian CLL, subset #1 Chinese CLL (n=14) showed a poor prognosis in terms of OS (median OS 28.0 months vs 85.1 for unmutated IGHV vs 165.1 for mutated IGHV, p<0.0001) and TFI (median TFI 0.8 months vs 4.8 for unmutated IGHV vs 46.2 for mutated IGHV, p<0.0001), even worse than that of the other IGHV unmutated CLL.
Subset #8 CLL (n=18) showed an OS (median 82.2 months) and TFI (median 1.7 months) as poor as the other IGHV unmutated CLL, although there was a significant association with the occurrence of Richter's syndrome (n=3 out of 16) (p= 0.0095), as reported for Caucasian CLL. The low number of events did not allow analysis of the prognostic impact of subset #4 (n=11).
Finally, the OS and TFI of Chinese patients with IGHV3-23 were not different from the other cases with mutated IGHV (p= 0.48 and p= 0.41, respectively). The low number of events did not allow detection of a different outcome between stereotyped IGHV3-23 (n=7) and heterogeneous IGHV3-23 (n=32) cases.

DISCUSSION
In Caucasian CLL, extensive research on IGHVDJ rearrangements has shown that immunoglobulin gene usage is not random, but biased by the recurrence of certain IGHV genes as well as by the existence of subsets with stereotyped BCR [31,32]. In contrast, little is known about the IGHV features of Asian CLL, although some differences have been reported in small cohorts of Chinese and Japanese patients [24,25]. In order to define geographic and ethnic immunoglobulin gene variations, we characterized IGHVDJ rearrangements and HCDR3 stereotypy in a large series of Chinese CLL and compared them with Italian CLL. Our Italian cohort is representative of Caucasian CLL, being comparable with other Italian [30] and Western series [28].
A mutated IGHV status was more frequent in Chinese than in Italian CLL, thus consolidating previous observations made in small Asian series [24,25]. This holds true also when considering patients in homogeneous phases of disease.
With regard to the IGHV subgroups, Chinese CLL showed an IGHV3>IGHV4>IGHV1 order of frequency, significantly different from Italian CLL patients, who showed an IGHV3>IGHV1>IGHV4 distribution. These results support those reported on both Chinese [24] and Japanese [25,33] CLL, confirming that this family usage is typical of Asian CLL.
Focusing on the IGHV gene distribution, Chinese CLL also showed a non-random use of immunoglobulin genes, with IGHV4-34 being the most represented. Moreover, IGHV1-69 and IGHV1-2 were significantly under-represented in Chinese CLL, whilst IGHV4-39 and IGHV3-7 were more recurrent in Chinese than in Italian CLL. Furthermore, the IGHV4-59, IGHV3-74 and IGHV1-3 genes were common in Chinese CLL, whilst rare in Italian CLL.
The known pattern of mutations for each specific IGHV gene was largely conserved; for example, the IGHV1-69 gene shows analogous molecular features in Caucasian and in Chinese CLL, being unmutated and representing the most frequent gene of the IGHV1 subgroup [28,34]. Thus, the low recurrence of the IGHV1-69 gene in Chinese CLL largely accounts for the lower overall frequency of the IGHV1 subgroup and of unmutated cases. On the other hand, the high incidence of the IGHV3-7, IGHV4-39 and IGHV3-74 genes could be responsible for the high representation of mutated IGHV, since these genes often show a mutated profile. Relevant exceptions are represented by the IGHV1-2 gene, which was significantly more mutated in Chinese than in Italian CLL, and the IGHV3-21 gene. The latter, equally represented in Chinese (3%) and Italian CLL (3.8%), was more frequently mutated in Chinese CLL.
Our results reinforce those reported in other small Asian CLL series [24,25], especially with regard to IGHV1-69 and IGHV3-7, strongly supporting the contribution of ethnic and geographic parameters in shaping the IGHV repertoire. Moreover, the same pattern is also reported in Iranian CLL [27], suggesting that this geographic-related variation is already established in the Middle East. Thus, this different pattern between Eastern and Western CLL adds to the other known geographic variation related to the IGHV3-21 gene, over-represented in Northern European CLL compared to Mediterranean CLL [29].
We also identified additional differences in the frequencies of IGHD and IGHJ genes. Among others, a significant under-usage of IGHD3-3 and IGHJ6 genes was observed in the Chinese cohort. Given the specific combination of IGHV1-69 with the IGHD3-3 and IGHJ6 genes in Caucasian CLL [28,34], their low occurrence in Chinese CLL could be related to the low frequency of IGHV1-69. To our knowledge, these results have never been previously reported.
The analysis of HCDR3 sequences proved the presence of stereotyped BCR in one-third of Caucasian CLL [28,30,31], suggesting the recognition of a common antigenic determinant [35]. We therefore characterized the HCDR3 sequences in Chinese CLL, which have not been investigated so far, for their potentially relevant pathogenetic implications. Chinese CLL showed a significantly lower representation of stereotyped sequences (19.7%) compared to Italian CLL (25.8%). The stereotyped subsets defined as major in Caucasian CLL were collectively over-represented also in the Chinese cohort, but with a significantly higher frequency of subset #8 (China vs Italy: 27.3% vs 4.3%) and a lower frequency of subset #2 (China vs Italy: 1.5% vs 10.4%). Subset #8, the most represented Chinese stereotype, is characterized by the use of the IGHV4-39 gene, which is indeed one of the most recurrent genes in Chinese CLL. In contrast, the low frequency of subset #2 is due to the heterogeneous HCDR3 of IGHV3-21 in Chinese CLL, whilst in Caucasian CLL the IGHV3-21 gene is mostly stereotyped [29]. The low frequency of subset #2 in Chinese CLL could parallel the recently reported low incidence of SF3B1 mutations [23], in line with their reported association in Caucasian CLL [36]. Of note, the clinical impact of CLL stereotypy is independent of the ethnic context, since in Chinese CLL subset #1 displayed a poor prognosis and subset #8 was related to an increased risk of transformation into Richter syndrome, similarly to Caucasian CLL [37,38].
With the exception of the IGHV3-21 gene, it is worth noting that all the other IGHV genes (e.g. IGHV1-69, IGHV4-39) maintain the same proportion of homologous/heterologous BCR in both Chinese and Italian CLL. Thus, the lower frequency of stereotyped BCR in Chinese CLL could be due to the prevalence of certain genes (e.g. IGHV3-7 and IGHV3-74) and the reduced frequency of others (e.g. IGHV1-69), each maintaining the same tendency to stereotypy.
By clustering analysis, we detected 16 new paired clusters among Chinese cases, characterized by the remarkable recurrence of the IGHV3-23 gene; in contrast, in Caucasian CLL this gene constantly exhibits a heterogeneous HCDR3 [39]. Moreover, IGHV3-23 gene usage is considered an independent negative prognosticator within Caucasian mutated CLL [39], but this was not the case among Chinese CLL.
In line with the proposal of Ghia et al. for Caucasian CLL [29], the IGHV pattern in Chinese CLL may be due to a specific genetic background and/or to the effects of potential environmental variables. The observation of new paired clusters in Chinese CLL, never reported in Caucasian CLL, supports the hypothesis of an Asian-specific antigenic selection at least in some cases. To further investigate the importance of the genetic background in determining the occurrence of different rearrangements in distinct geographic and/or ethnic groups, the normal IGHV repertoire of healthy subjects of the same ethnicity and from the same geographic areas should also be investigated. Although some evidence has been reported for Italians [40], at present no data on the normal repertoire are available for the Chinese population.
In conclusion, the results hereby described offer the most extensive catalogue of the BCR features of Chinese CLL, being based on the largest series of Chinese CLL so far reported. In comparison to Italian CLL, most of the Chinese IGHV genes show different frequencies, but maintain the same propensity towards mutations and stereotypy, with the same clinical impact. The distinctive molecular features of IGHV3-21 and IGHV3-23 Chinese CLL genes deserve further investigation.

Patients
A series of 623 patients diagnosed with CLL from 3 different institutions situated in China (Nanjing, 338 cases; Tianjin, 166 cases) and Hong Kong (119 cases) was included in the study. A comparison of the three CLL groups, all of Han Chinese ethnicity, showed a high degree of internal similarity with regard to the somatic hypermutation status and IGHVDJ rearrangements. Therefore, the whole cohort of Chinese CLL was compared to an Italian cohort of 789 CLL patients collected between 2001 and 2014 at the Hematology Institute of the "Sapienza" University of Rome. The Italian CLL cohort was comparable to the Mediterranean series [29,30], and to those reported by Agathangelidis et al. [28] (Supplementary Table S1).

Analysis of immunoglobulin rearrangements and sequence analysis
In all cases, the analysis of IGHVDJ genes was carried out on leukemic cells obtained from peripheral blood samples after isolation by Ficoll gradient or buffy-coat. PCR amplification and sequence analysis of IGHVDJ rearrangements were performed on either genomic DNA (gDNA) or cDNA using sense family-specific VH primers (framework region 1 [FR1] or VH leader primers), combined with consensus JH primers as previously described [42] or following the IGH Somatic Hypermutation Assay v2.0 protocol (InVivoScribe) [24]. PCR products were sequenced directly or after a cloning procedure, using a 3130 Genetic Analyzer (Life Technologies, Carlsbad, CA).
The following features were evaluated for all IGHVDJ rearrangements: IGHV gene and allele usage, percentage of identity to the closest germline IGHV allele, HCDR3 length and composition calculated between codons 107 and 117, and IGHD-IGHJ gene usage.
To identify clusters of sequences with common HCDR3 motifs, we evaluated the HCDR3 region with the multiple sequence alignment software ClustalW2 (http://www.ebi.ac.uk), followed by manual curation. Clustering was performed by comparing our HCDR3 sequences to those present in published databases [28,45,36].
To identify subsets of similar HCDR3, we used several criteria. First, for major subsets we followed the criteria proposed by Agathangelidis et al. [28]: i) sharing at least 50% amino acid identity and 70% similarity calculated through common sequence patterns; ii) having identical HCDR3 lengths and identical offsets of shared patterns between sequences; and iii) carrying IGHV genes of the same clan. Second, we identified homologous HCDR3 as those which shared a HCDR3 homology equal to or exceeding 60%, regardless of the usage of the IGHV gene: groups of 2 CLL cases with homologous HCDR3 were defined as paired clusters; clusters of 3 or more cases were defined as subsets. Known stereotyped HCDR3s were defined and named according to published criteria: for major subsets by Agathangelidis et al. [28]; for minor subsets by Murray et al. [45] and for novel subsets by Rossi et al. [36].
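The second criterion (HCDR3 homology of at least 60%, with groups of 2 cases called paired clusters and groups of 3 or more called subsets) can be sketched in Python as follows. This is a simplified illustration, not the ClustalW2-based procedure with manual curation used in the study: homology is approximated here as positional amino acid identity between sequences of equal length, groups are formed by single-linkage merging, and the sequences and case identifiers are hypothetical. All of these are assumptions.

```python
from itertools import combinations


def percent_identity(a, b):
    """Positional identity (%) between two HCDR3 sequences.

    Assumption: sequences of different length are not compared (returns 0.0).
    """
    if not a or len(a) != len(b):
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster_hcdr3(seqs, threshold=60.0):
    """Group cases whose HCDR3 identity meets the threshold (single linkage)."""
    parent = {case: case for case in seqs}

    def find(case):
        # Follow parent links to the group representative, compressing the path.
        while parent[case] != case:
            parent[case] = parent[parent[case]]
            case = parent[case]
        return case

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for case in seqs:
        groups.setdefault(find(case), []).append(case)
    # Singletons are heterogeneous (non-clustered) cases and are dropped.
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)


def label(group):
    """Name a group following the size rule stated in the text."""
    return "paired cluster" if len(group) == 2 else "subset"


# Hypothetical HCDR3 sequences, for illustration only
hcdr3 = {
    "case1": "ARDGYSSGWYFDY",
    "case2": "ARDGYSSGWFFDY",
    "case3": "ARDAYSSGWYFDV",
    "case4": "AKGLRWNTPQEHM",
    "case5": "ARVPTQEHLMNKC",
    "case6": "ARVPTQEHLMNKS",
}

for group in cluster_hcdr3(hcdr3):
    print(label(group), group)
```

Single linkage is the most permissive grouping rule: a case joins a group if it meets the threshold with any one member. A stricter rule (every pair in a group meeting the threshold) would be an equally defensible reading of the criterion.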

Statistical analysis
Characteristics of patients were summarized by means of cross-tabulations (categorical variables), quantiles (median, etc.; for ordinal factors) or by means of standard positional and variation parameters (mean, standard deviation; for continuous variables). Nonparametric tests were applied, in univariate analysis, for comparisons between groups (Chi-Squared and Fisher Exact test for difference in terms of categorical variables or response rate, Mann-Whitney and Kruskal-Wallis test for difference in terms of continuous variables).
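As an illustration of the Chi-Squared comparison of categorical variables, the following Python sketch applies a 2x2 test to IGHV mutational status in the two cohorts. Several points are assumptions: the Italian counts (387 mutated, 405 unmutated sequences) are those reported in the Results, whereas the Chinese counts are approximated from the reported percentages (66% and 34% of 623 cases) because the raw counts are not given in the text; no continuity correction is applied; and the function name `chi_square_2x2` is hypothetical. The sketch is therefore not expected to reproduce the published p value exactly, and the study itself used SAS.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-squared statistic and p value for a 2x2 table (1 degree of freedom).

    No continuity correction; all row and column totals must be non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# (mutated, unmutated)
chinese = (411, 212)  # ASSUMPTION: approximated from 66% / 34% of 623 cases
italian = (387, 405)  # counts reported for the 792 Italian sequences

chi2, p = chi_square_2x2([chinese, italian])
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

With these approximate counts the p value falls well below 0.0001, in line with the direction and strength of the difference reported in the text.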
Overall survival was estimated using the Kaplan-Meier Product Limit estimator. Differences were evaluated by means of Log-Rank test after assessment of proportionality of hazards.
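The Kaplan-Meier product-limit estimator can be sketched in a few lines of Python. The follow-up times and event indicators below are hypothetical, and the function names are illustrative; the analyses in the study were run in SAS, so this is only a minimal demonstration of the estimator.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        n_at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower (None if not reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up data in months (event=0 means censored)
months = [1, 2, 3, 4]
died = [1, 0, 1, 1]

curve = kaplan_meier(months, died)
print(curve)
print(median_survival(curve))
```

Censored patients contribute to the number at risk up to their last follow-up but do not lower the curve, which is why the estimate drops only at observed event times.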
TFI was estimated using the proper non-parametric method; the Gray test was applied for significance tests on cumulative incidence curves.
All the analyses were performed using the SAS system software (version 9.4); all tests were two-sided, at a significance level of 0.05 and confidence intervals were calculated at 95% level.

Figure 1: Comparison between Chinese and Italian CLL. A. Frequency of the IGHV subgroups. B. Frequency of the IGHV genes. * marks the significant differences (p values in the text). Data are represented as percentage.

Figure 2: Mutational patterns of IGHV genes. A. Chinese CLL. B. Italian CLL. Data are represented as percentage.

Figure 3: Stereotyped BCRs representation. A. Chinese and Italian stereotyped and non-stereotyped BCRs. B. Frequency of major subsets in Chinese and Italian series. * marks the significant differences (p values in the text). Data are represented as percentage.

Figure 4: Frequency of stereotyped and heterogeneous BCRs for selected IGHV genes. A. Chinese CLL. B. Italian CLL. Data are represented as percentage.