Improved survival outcomes and relative youthfulness of multiple myeloma patients with t(4;14) receiving novel agents are associated with poorer performance of the revised international staging system in a real aging society

The Revised International Staging System (R-ISS) was developed for a more accurate risk stratification of patients with symptomatic multiple myeloma (MM). However, original and subsequent validation studies of the R-ISS included relatively younger patients, many of whom were treated without bortezomib. Hence, we investigated the real-world prognostic performance of the R-ISS in 400 patients with MM treated with novel agents in Japan, an aging society. The patients had a median age of 72 years, and 96.0% were treated with bortezomib. Patients in R-ISS stage II were significantly older than those in R-ISS stage III (median age: 74 vs. 70 years; P = 0.001) and failed to show a significantly longer overall survival (OS) (median OS: 63.4 vs. 54.7 months; P = 0.32). In contrast, OS differed significantly among all conventional ISS stages. ISS stage III patients recategorized to R-ISS stage III were significantly younger than those recategorized to R-ISS stage II and had a relatively longer OS. As a reason for these findings, patients with the high-risk cytogenetic abnormality t(4;14) were significantly younger and had an improved OS compared to others, which can be attributed to a young age and bortezomib therapy, as previously suggested. In conclusion, the R-ISS was less successful than the ISS in discriminating between stages II and III among bortezomib-treated patients with MM in an aging society, which might be attributable to the inclusion of t(4;14) in the R-ISS categorization strategy.


INTRODUCTION
To date, numerous risk stratification systems have been developed and validated for multiple myeloma (MM). Of these, the International Staging System (ISS), which was introduced in 2005 [1], is among the most representative. The ability of this system to provide very simple but robust survival predictions in patients with MM has been validated in many independent cohorts [2][3][4]. Since the introduction of the ISS, several studies have elucidated the prognostic significance of some cytogenetic abnormalities (CA) detected using interphase fluorescence in situ hybridization (iFISH), including del(17p), t(4;14), and t(14;16), as well as elevated serum levels of lactate dehydrogenase (LDH), in patients with MM [5][6][7][8][9].
In 2015, the International Myeloma Working Group developed the Revised ISS (R-ISS), which combined ISS with the status of high-risk CAs (detected by iFISH) and serum levels of LDH, to identify three MM entities with clearly different outcomes [10]. However, although MM tends to affect older adults, the original and validation studies of the R-ISS included relatively younger patients (median age: ~65 years) [11][12][13][14][15]. This is of particular concern, given the rapidly aging populations of many developed Western countries. Additionally, several studies have reported that the prevalence of high-risk CAs tends to be higher among younger patients [16,17], suggesting that the inclusion of only these patients in prognostic studies might not reflect the intrinsic prognostic value of high-risk CAs, especially in terms of overall survival (OS). Furthermore, although studies have reported that bortezomib-containing therapies can reverse the unfavorable prognostic impact of the t(4;14) CA [18,19], most original and validation studies of the R-ISS included few patients who had been treated with bortezomib. Therefore, the results of these studies are less applicable to patients in the era of novel targeted agents.
Today, Japan is considered one of the most aged countries worldwide and a potential example of future situations for other developed countries. In addition, the majority of current Japanese patients with MM are treated with bortezomib. Therefore, we hypothesized that the R-ISS-based prognostication of patients with MM may be somewhat unreliable in Japan. Here, we analyzed the prognostic performance of the R-ISS using a real-world cohort of Japanese patients treated in the era of novel targeted agents.
As previously described [21], ISS stage I included younger patients (median age: 68 years) when compared to patients with stages II and III disease, whereas these latter two stages did not differ significantly regarding age (median: both 73 years; P = 0.58). Accordingly, more patients were treated with autologous stem cell transplantation (ASCT) in ISS and R-ISS stage I compared with other stages. R-ISS stages II and III differed significantly in terms of age, with the former including significantly older patients than the latter (median ages: 74 and 70 years, respectively; P = 0.001). No significant differences in the uses of bortezomib and lenalidomide as well as in the induction regimens (doublet vs. triplet) were observed across the R-ISS stages.

Comparison of the prognostic performances of ISS and R-ISS for OS
The Kaplan-Meier OS curves according to the ISS and R-ISS stages are shown in Figure 1A and 1B, respectively. The three groups of patients categorized by ISS stage differed significantly in terms of survival duration (median OS: 106.2, 67.1, and 49.7 months for ISS stages I, II, and III, respectively; P = 0.013, <0.001, and 0.009 for ISS stage I vs. II, stage I vs. III, and stage II vs. III, respectively). In contrast, no significant differences in OS were observed between R-ISS stages II and III (median OS: 63.4 and 54.7 months, respectively; P = 0.32). However, patients in R-ISS stage I had a significantly longer OS (median: not reached) than those in R-ISS stages II and III (P < 0.001 for both R-ISS stage I vs. II and stage I vs. III). There were 131 (32.8%) patients who were censored by the end of year 5.
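To illustrate how a median OS can be reported as "not reached", the Kaplan-Meier product-limit estimate can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the analysis code of the present study (which was performed in R); the function names and the toy follow-up data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct death time.

    times  : follow-up in months from diagnosis
    events : True if the patient died at that time, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which the survival estimate is <= 0.5, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical toy data (months, death indicator), not study data.
reached = kaplan_meier([5, 10, 15, 20], [True, False, True, False])
not_reached = kaplan_meier([5, 10, 15, 20], [True, False, False, False])
print(median_survival(reached), median_survival(not_reached))
```

In the second toy group the estimated survival never falls to 0.5 during follow-up, which corresponds to a median that is "not reached".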
Receiver operating characteristics (ROC) curves were developed to compare the prognostic performances of the ISS and R-ISS. Notably, the area under the curve (AUC) was significantly greater for the ISS than for the R-ISS (0.659 vs. 0.608, respectively; P = 0.029, Figure 2A). We performed multivariate analyses for each system adjusting for age to evaluate their capability to discriminate between stage II and III. The ISS predicted a significantly poorer OS for patients with stage III compared to stage II even after adjusting for advanced age (≥70 years), whereas R-ISS failed to show significant discrimination capability in similar settings (Supplementary Table 2A and 2B).
Next, we analyzed OS according to the ISS and R-ISS stages in different age groups (younger or older than 70 years; Supplementary Figure 1). The R-ISS performed poorly relative to the ISS in distinguishing OS between younger patients in stages II and III (Supplementary Figure 1A and 1B). However, the R-ISS did not perform poorly relative to the ISS in distinguishing OS between older patients in stages II and III (Supplementary Figure 1C and 1D). No significant difference in the AUC was detected between the ISS and R-ISS among younger (0.686 vs. 0.653, respectively; P = 0.34) or older patients (0.592 vs. 0.549, respectively; P = 0.29).

Clinical characteristics and OS among patients upgraded or downgraded during recategorization from the ISS to the R-ISS
To identify the cause of the above findings, we defined five groups of patients based on the following recategorizations: ISS stage I to R-ISS stage I (Group A, 66 patients), ISS stage I to R-ISS stage II (Group B, 32 patients), ISS stage II to R-ISS stage II (Group C, 121 patients), ISS stage III to R-ISS stage II (Group D, 90 patients), and ISS stage III to R-ISS stage III (Group E, 91 patients) (Supplementary Figure 2). Although OS was expected to shorten progressively from Group A to Group E, Group D actually had a shorter OS than Group E, although this difference was not significant (median OS: 41.6 and 54.7 months, respectively; P = 0.37, Figure 3). The clinical characteristics of patients in Groups D and E are shown in Supplementary Table 3. t(4;14) was the most frequently observed high-risk CA among patients in Group E, detected in approximately 30% of patients. Accordingly, patients in Group D were significantly older than those in Group E (median ages: 77 and 70 years, respectively; P < 0.001). However, no significant intergroup differences were observed in serum albumin, beta 2-microglobulin (B2M), and creatinine levels; hemoglobin concentration; prevalence of Durie-Salmon stage III; and therapeutic regimen.

Age distribution and survival in patients with or without high-risk CAs and elevated LDH levels
Supplementary Table 4 presents the age distributions according to the presence of high-risk CAs and elevated LDH levels. Patients with high-risk CAs were significantly younger than those without (median: 69 and 73 years, respectively; P < 0.001). Specifically, patients harboring t(4;14) and t(14;16) were significantly younger than those without either CA [median ages: 68 and 73 years, respectively; P = 0.021 for t(4;14), and 65 and 72 years, respectively; P = 0.009 for t(14;16)], whereas no significant difference in age was observed between patients with and without del(17p). Consistent with the younger age of the patients with t(4;14), more patients in this group were treated with ASCT compared with other patients [17/46 (37.0%) vs. 79/354 (22.3%) patients; P = 0.042]. Furthermore, patients with lower and higher LDH levels did not differ significantly in terms of age. Figure 4 summarizes the OS outcomes according to the presence of high-risk CAs and LDH levels. Notably, OS did not differ significantly between patients with any high-risk CA and those without (median OS: 60.3 and 72 months, respectively; P = 0.15, Figure 4A); in contrast, patients with del(17p) had a significantly shorter OS than those without (median OS: 41.8 and 75.0 months, respectively; P = 0.001, Figure 4B). OS did not differ significantly between patients with and without t(4;14) (median OS: 85.2 and 68.7 months, respectively; P = 0.48, Figure 4C). Patients with t(14;16) had a shorter OS than those without, although this difference was also not statistically significant (median OS: 37.3 and 71.9 months, respectively; P = 0.21, Figure 4D). Patients with a higher LDH level had a significantly shorter OS than those with a lower LDH level (median OS: 46.6 and 75.0 months, respectively; P = 0.005, Figure 4E).

Modification of the R-ISS improved the prognostication for OS
We again divided the patients into three groups using a modified R-ISS (mR-ISS) categorization in which only t(14;16) and del(17p) were included as high-risk CAs. Accordingly, the patients were categorized as follows: mR-ISS stage I included patients in ISS stage I with neither t(14;16) nor del(17p) detected by iFISH and without elevated LDH levels; mR-ISS stage III included patients in ISS stage III with either t(14;16) or del(17p) detected by iFISH, or elevated LDH levels; and mR-ISS stage II included patients who did not meet the criteria for mR-ISS stage I or III (Supplementary Figure 3). As a result, 71 (17.8%), 256 (64.0%), and 73 (18.3%) of our patients were classified as mR-ISS stages I, II, and III, respectively.
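The effect of removing t(4;14) from the high-risk set can be made concrete with a short sketch. The code below is an illustration under stated assumptions rather than the procedure used in this study: the function name, the string labels for the CAs, and the example patient are hypothetical, and the ISS stage is taken as an already-determined input.

```python
from typing import FrozenSet, Iterable

# High-risk CA sets as described in the text (labels are illustrative).
R_ISS_HIGH_RISK: FrozenSet[str] = frozenset({"del(17p)", "t(4;14)", "t(14;16)"})
MR_ISS_HIGH_RISK: FrozenSet[str] = frozenset({"del(17p)", "t(14;16)"})


def revised_stage(iss_stage: int,
                  detected_cas: Iterable[str],
                  ldh_elevated: bool,
                  high_risk_set: FrozenSet[str]) -> int:
    """Combine ISS stage, iFISH findings, and LDH into stage 1, 2, or 3.

    The same rule yields the R-ISS or the mR-ISS depending on which
    high-risk set is passed in.
    """
    has_high_risk = bool(set(detected_cas) & high_risk_set)
    if iss_stage == 1 and not has_high_risk and not ldh_elevated:
        return 1
    if iss_stage == 3 and (has_high_risk or ldh_elevated):
        return 3
    return 2  # every patient not meeting the stage I or III criteria


# Hypothetical patient: ISS stage III, t(4;14) only, normal LDH.
cas = ["t(4;14)"]
print(revised_stage(3, cas, False, R_ISS_HIGH_RISK))   # R-ISS stage
print(revised_stage(3, cas, False, MR_ISS_HIGH_RISK))  # mR-ISS stage
```

Such a patient is stage III under the R-ISS but stage II under the mR-ISS, which is the only kind of reclassification the modification can produce among ISS stage III patients.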
The three groups categorized by mR-ISS stage exhibited significant differences in survival (median OS: not reached, 63.6, and 38.1 months for mR-ISS stages I, II, and III, respectively; P < 0.001, <0.001, and 0.004 for stage I vs. II, stage I vs. III, and stage II vs. III, respectively, Figure 1C). On multivariate analysis, mR-ISS predicted a significantly poorer OS for patients with stage III compared to stage II even after adjusting for advanced age (≥70 years) (Supplementary Table 2C). We additionally developed an ROC curve for the mR-ISS, and the resulting AUC value was significantly greater than that obtained for the R-ISS (0.657 vs. 0.608, respectively; P = 0.001, Figure 2B). However, AUC values did not significantly differ between the ISS and mR-ISS (0.659 vs. 0.657, respectively; P = 0.92, Figure 2C).

DISCUSSION
In this study, we investigated the prognostic performance of the R-ISS in a real-world context, namely, patients in an aging society who were treated with novel agents for symptomatic MM. As in the original study of the R-ISS, we analyzed OS as the primary endpoint. Our analysis revealed a rather poor performance of the R-ISS relative to the ISS in distinguishing OS between patients classified as R-ISS stages II and III. These findings may be of interest because one of the primary objectives of establishing the R-ISS was to identify patients with an extremely poor prognosis as R-ISS stage III patients.
Several reliable studies have validated the utility of the R-ISS using cohorts independent from that used in the original study [11][12][13][14][15]. However, the patients included in these studies were uniformly younger than those in our cohort. This difference in age might partly explain why our study was the only one that failed to detect the superiority of the R-ISS relative to the ISS. Additionally, t(4;14), the most frequently occurring high-risk CA, is considered a primary genetic event in plasma cell disorders [20,22,23], and is known to correlate with disease progression and a younger age [16,17][24][25][26]. In our real-world cohort in an aging society, the association between the performance of the R-ISS and the relatively younger age of patients with t(4;14) might have been amplified by the inclusion of patients with a wide age range. Indeed, patients in R-ISS stage III more frequently presented with t(4;14) and were significantly younger than those in stage II, a result of the clear age difference between Groups D and E. However, no other component of R-ISS stage III, such as ISS stage III (vs. stage II) or an elevated LDH level, was associated with a significant age difference. Possibly, the younger age of patients with R-ISS stage III might allow them to tolerate more intensive treatment, including ASCT, which would offset the aggressiveness of the disease characterized by high-risk CAs and elevated LDH levels. Furthermore, being of a younger age per se might contribute to a favorable OS, as previously described [21,27]. Although Group B might also have theoretically affected the prognosis of R-ISS stage II, the number of patients in this group was too small (relative to Group D) to have a significant influence on OS among patients categorized as R-ISS stage II.
As we expected, a sub-analysis restricted to older patients failed to yield a trend similar to that observed in the entire cohort, probably because this subgroup included few patients with t(4;14), who were key subjects for our findings. This suggests that the phenomenon observed in this study requires the inclusion not only of older patients but of patients with a wide age range, including both younger and older patients, as observed in real-world settings.
We further noted that almost all patients in our cohort received treatment with bortezomib, in contrast to previous studies [11][12][13][14][15]. Consistent with previous reports [18,19], the present study demonstrated improved survival outcomes among patients with t(4;14), although those with other high-risk CAs had worse OS outcomes despite the use of novel agents. The increased frequency of bortezomib use and the relative youthfulness of patients harboring t(4;14) might synergistically improve OS in this population, which would undermine the performance of the R-ISS. In particular, the improved survival of these patients with the use of new therapeutic agents may support our findings, because a relatively extended OS in R-ISS stage III was observed even among the younger patients. Our additional analysis based on the mR-ISS, which was established by excluding t(4;14) as a high-risk CA, considerably improved the prognostication of OS, suggesting that t(4;14) might not be a reliable high-risk CA to be used in risk stratification systems [28]. The performance of the mR-ISS was not necessarily better than that of the ISS, probably due to the excessively increased number of patients with mR-ISS stage II. It also remains questionable whether t(14;16) should be considered a high-risk CA, given the lack of reliable data regarding this rare abnormality [29]. Although t(14;16) was also observed more frequently in younger patients, similar to a previous study [16], the prognostic impact of this CA in the present study was not favorable.
We further note that our cohort included more patients in ISS stage III. These patients were considered key subjects, given the possible mechanisms described above. The particular distribution of patients in our study, which could be explained by the advanced age in our cohort [21], might have intensified the effect of categorization on the performance of the R-ISS. Accordingly, our results do not necessarily question the usefulness of the R-ISS; rather, and more importantly, our work indicates that a careful interpretation of the R-ISS may be needed when applying this system individually to patients in real-world settings. As noted above, younger patients in R-ISS stage III who harbor t(4;14) may achieve a considerably longer OS when compared to relatively older patients in R-ISS stage II, even if both receive intensive treatment with bortezomib-containing regimens.
The present study is limited by its retrospective nature, heterogeneous treatment regimens, and relatively small sample size. In addition, it did not include details of progression-free survival along with the OS, mainly because of the unexpected discontinuation of chemotherapy in elderly patients, a frequent observation in real-world settings. Besides, the iFISH methods were not identical across the hospitals included in this study. Despite these limitations, our study findings highlight the limited usefulness of the R-ISS in the context of reasonable, naturally occurring mechanisms among patients with MM who are treated with novel agents in an aging society.
In conclusion, our study is the first to suggest that the performance of the R-ISS may be limited in discriminating OS between stage II and III when applied to real-world patients with MM who are treated with novel agents in aging populations. Furthermore, we suggest that this limitation may be attributed to the inclusion of t(4;14) as a high-risk CA in the R-ISS categorization strategy. This potential limitation suggests that the R-ISS should be carefully interpreted on an individual basis when applied to patients in a real-world setting. Our findings are of particular interest because many developed countries, including Western countries, are approaching a period of super-aging such as that observed currently in Japan. However, further studies are needed to validate our findings and develop more appropriate prognostic systems.

Study design and patients
This study retrospectively analyzed the data from 400 consecutive patients who were newly diagnosed with MM and received chemotherapy between January 2006 and December 2017 at Kameda Medical Center, Chiba, Japan; Keiju Kanazawa Hospital, Ishikawa, Japan; and National Hospital Organization Okayama Medical Center, Okayama, Japan. The patients' background and outcome data were obtained from electronic medical records, and the diagnoses and treatment responses were evaluated using the International Myeloma Working Group criteria. We included only patients who had been treated with novel agents (e.g., immunomodulatory agents or proteasome inhibitors) to reduce the prognostic impact of heterogeneity during chemotherapy. Written informed consent was obtained from all the patients or their families. The study was conducted according to the Declaration of Helsinki and was approved by the review boards of each institution.
ISS and R-ISS categorizations were performed as previously described [1,10]. Briefly, R-ISS stage I included patients categorized as ISS stage I (a serum B2M level <3.5 mg/L and serum albumin level ≥3.5 g/dL) with neither an iFISH-detected high-risk CA [del(17p), t(4;14), or t(14;16)] nor an elevated LDH level (above the upper limit of normal). R-ISS stage III included patients categorized as ISS stage III (a serum B2M level >5.5 mg/L) with either an iFISH-detected high-risk CA or an elevated LDH level. R-ISS stage II included all patients not classified as R-ISS stage I or III. Bone marrow samples were subjected to iFISH according to the standard methods for each institution with (Kameda Medical Center, n = 261) or without (Keiju Kanazawa Hospital and Okayama Medical Center, n = 139) CD138+ plasma cell enrichment using anti-CD138-coated magnetic MicroBeads (Miltenyi Biotec, San Diego, CA, USA). Patients were considered positive for a given CA when it was present in a percentage higher than the cutoff threshold defined by each local laboratory. When iFISH was performed with CD138 enrichment, the cutoff values for t(14;16), t(4;14), and del(17p) were ≥10%, ≥10%, and ≥20%, respectively [30]. When iFISH was performed without CD138 enrichment, the cutoff values were based on the upper limit of the 95% confidence interval for the expected false positive rate.
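The categorization rules described above can be expressed as a short sketch. This is an illustration rather than the code used in the study: the class and function names are hypothetical, the thresholds are those quoted in the text, and assigning ISS stage II to every patient who meets neither the stage I nor the stage III criteria follows the usual ISS convention, which the text does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    b2m_mg_per_l: float      # serum beta 2-microglobulin
    albumin_g_per_dl: float  # serum albumin
    high_risk_ca: bool       # del(17p), t(4;14), or t(14;16) by iFISH
    ldh_elevated: bool       # LDH above the upper limit of normal


def iss_stage(p: Patient) -> int:
    """ISS stage from B2M and albumin, using the cutoffs quoted in the text."""
    if p.b2m_mg_per_l < 3.5 and p.albumin_g_per_dl >= 3.5:
        return 1
    if p.b2m_mg_per_l > 5.5:
        return 3
    return 2  # assumption: everything else is stage II


def r_iss_stage(p: Patient) -> int:
    """R-ISS stage: ISS combined with high-risk CA status and LDH."""
    iss = iss_stage(p)
    if iss == 1 and not p.high_risk_ca and not p.ldh_elevated:
        return 1
    if iss == 3 and (p.high_risk_ca or p.ldh_elevated):
        return 3
    return 2  # all patients not classified as R-ISS stage I or III


# Hypothetical patients: ISS stage III without and with an adverse marker.
without_marker = Patient(6.0, 3.0, high_risk_ca=False, ldh_elevated=False)
with_marker = Patient(6.0, 3.0, high_risk_ca=True, ldh_elevated=False)
print(r_iss_stage(without_marker), r_iss_stage(with_marker))
```

The two example patients share the same ISS stage but are separated by the R-ISS, which corresponds to the distinction between Groups D and E in the recategorization analysis.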

Statistical analysis
For continuous variables, normally distributed data were presented as means and standard deviations, and non-normally distributed data were presented as medians and interquartile ranges (IQRs). The relationships of the baseline characteristics with the ISS stage, R-ISS stage, or high-risk CA status were compared using the one-way analysis of variance, Kruskal-Wallis, or chi-squared test as appropriate.
We additionally analyzed and compared the OS to elucidate the prognostic relevance of the R-ISS in an aging society. The OS durations were calculated from the date of the initial diagnosis to the date of death from any cause. The probability of OS was estimated using the Kaplan-Meier method and compared using the log-rank test. We further constructed an ROC curve to predict death within 5 years, according to each prognostic system. Patients who were alive at the last follow-up and had an observation period of <5 years were censored. Differences in the AUCs were compared using DeLong's approach [31]. Cox proportional-hazards analyses were used to adjust for possible confounding factors. A two-tailed P value <0.05 was considered statistically significant. All statistical analyses were performed using R version 3.1.2 (R Foundation for Statistical Computing, Vienna, Austria).
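The construction of a 5-year binary outcome and the corresponding AUC can be sketched as follows. This is a minimal illustration under several assumptions and not the R code used in the study: the function names and toy data are hypothetical, the stage (1 to 3) is treated as an ordinal score, patients alive with less than 5 years of follow-up are simply excluded from the AUC (the text does not specify how censored patients entered the ROC analysis), and the AUC is computed through its rank-based (Mann-Whitney) interpretation. DeLong's comparison of two AUCs is not implemented here.

```python
from typing import List, Optional, Sequence

HORIZON_MONTHS = 60.0  # 5 years


def five_year_label(months: float, died: bool) -> Optional[int]:
    """1 = died within 5 years, 0 = known to survive 5 years, None = unknown."""
    if died and months <= HORIZON_MONTHS:
        return 1
    if months >= HORIZON_MONTHS:
        return 0
    return None  # alive at last follow-up with <5 years of observation


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a death has a higher score than a survivor (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def staging_auc(stages: Sequence[int],
                months: Sequence[float],
                died: Sequence[bool]) -> float:
    """AUC of a staging system for death within 5 years (evaluable patients only)."""
    kept_scores: List[float] = []
    kept_labels: List[int] = []
    for stage, m, d in zip(stages, months, died):
        label = five_year_label(m, d)
        if label is not None:  # assumption: drop indeterminate patients
            kept_scores.append(float(stage))
            kept_labels.append(label)
    return auc(kept_scores, kept_labels)


# Hypothetical toy cohort: (stage, follow-up in months, died).
toy = [(1, 80, False), (1, 30, False), (2, 70, True),
       (2, 20, True), (3, 10, True), (3, 65, False)]
print(staging_auc([s for s, _, _ in toy],
                  [m for _, m, _ in toy],
                  [d for _, _, d in toy]))
```

Because a three-level stage produces many tied scores, the tie handling (0.5 per tied pair) has a visible effect on the AUC, which is one reason such values are modest even for a well-ordered staging system.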

Author contributions
Y.A. conceived, designed, and initiated the study, collected data, performed statistical analysis, wrote the manuscript, and provided patient care. K.S., T.Y., M.U., and H.T. collected data and provided patient care. K.N., H.K., A.K., and M.T. provided patient care. K.M. supervised the study, collected data, wrote the manuscript and provided patient care.