Development and validation of two prognostic nomograms for predicting survival in patients with non-small cell and small cell lung cancer

Purpose This study aimed to construct two prognostic nomograms to predict survival in patients with non-small-cell lung cancer (NSCLC) and small-cell lung cancer (SCLC) using a novel set of clinical parameters. Patients and Methods Two nomograms were developed using a retrospective analysis of 5384 NSCLC and 647 SCLC patients seen during a 10-year period at Xiang Ya Affiliated Cancer Hospital (Changsha, China). The patients were randomly divided into training and validation cohorts. Univariate and multivariate analyses were used to identify the prognostic factors needed to establish nomograms for the training cohort. Each model was internally validated via bootstrap resampling and externally validated using the validation cohort. Predictive accuracy and discriminatory capability were estimated using the concordance index (C-index), calibration curves, and risk group stratification. Results The largest contributors to overall survival (OS) prognosis were the therapeutic regimen and diagnostic method parameters in the NSCLC nomogram, and the therapeutic regimen and health insurance plan parameters in the SCLC nomogram. Calibration curves for the nomogram prediction and the actual observation were in optimal agreement for the 3-year OS and acceptable agreement for the 5-year OS in both training datasets. The C-index was higher for the NSCLC nomogram than for the TNM staging system (0.67 vs. 0.64, P = 0.01) and higher for the SCLC nomogram than for the clinical staging system (limited vs. extensive), although the latter difference was not significant (0.60 vs. 0.53, P = 0.12). Conclusion The treatment regimen parameter made the largest contribution to OS prognosis in both nomograms. These nomograms might provide clinicians and patients with a simple tool that improves their ability to estimate survival based on individual patient parameters rather than on an averaged predefined treatment regimen.


Oncotarget, 2017, Vol. 8, (No. 38), pp: 64303-64316. Research Paper. www.impactjournals.com/oncotarget/

INTRODUCTION
Cancer is the second most common cause of death globally, and lung cancer is the leading cause of cancer deaths worldwide [1]. Lung cancers can be divided into two categories, non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC), which account for 85% and 15% of cases, respectively [2]. For patients with early-stage NSCLC, radical resection is the most common and potentially curative treatment, whereas patients with late-stage NSCLC receive a combination of adjuvant chemotherapy with complete resection [3]. Adjuvant radiotherapy and chemotherapy are typically recommended for high-risk patients with lung cancer and more aggressive malignancies involving lymph nodes or residual cancer [4]. Some studies have used stereotactic ablative radiation therapy to treat early-stage NSCLC [5]. The treatment for SCLC is less effective than the treatment for NSCLC [6]. The prognosis for patients with lung cancer remains poor, with an overall 5-year survival rate of approximately 15% [7]. The discriminatory value of prognostic biological markers is insufficient to predict an individual's overall survival (OS) [8], and survival time has unfortunately improved little in recent decades [9].
The survival time of patients with the same cancer stage varies widely [10,11]. Insights into the causes of this variation could be applied to develop improved predictive models and new treatment strategies. Because of the inability to identify diagnostic biomarkers using traditional strategies, alternative strategies for identifying diagnostic parameters are of paramount importance. Efforts have focused on expanding the scope and number of parameters studied, and then using this expanded informational content to stratify patients into risk groups, based on the association of these parameters with disease trajectories and patient outcomes. Statistical modeling theory states that, as the number of parameters increases, the patient cohort size incrementally reduces until the cohort comprises only a few patients or one patient. Personalized patient care improves predictive prognostics and enhances clinicians' capacity to individualize healthcare. This increases therapeutic efficacy and facilitates early treatment, thereby improving outcomes, which highlights the long-term potential of the strategy to increase the overall quality of healthcare while reducing costs. Prognostic models are based on the statistical analysis of multiple parameters such as age, gender, histology, number of harvested lymph nodes, metastatic information, serum diagnostics, and treatment-related factors [4,11-13]. Multiparameter analysis with patient stratification has resulted in prognostic models that have improved our ability to predict lung cancer patient survival. To fulfill the full potential of the strategy, additional schemes for stratifying patient cohorts are urgently needed.
A nomogram is a graphical calculator that is based on a regression model, and it has become a popular tool for presenting predictive models. By presenting a statistical predictive model as an intuitive graph, nomograms offer a practical way to estimate risk by relating patient parameters to cancer outcomes such as metastatic probability, OS, and recurrence probability [14]. The development of personalized predictors of cancer patient survival is of vital importance for clinicians and patients, both of whom are involved in making treatment decisions. For several types of cancers, nomograms generate more precise predictions than the traditional tumor-node-metastasis (TNM) staging classification system [4,15,16].
Nomograms covering a wide range of parameters have been constructed for NSCLC and SCLC. Demographic and clinical parameters used to construct nomograms include age, gender, occupation, health insurance plan, and diagnostic method. However, few nomograms include the therapeutic regimen parameter as a prognostic factor for predicting OS [6,17-19]. Furthermore, few studies have specifically focused on OS prognostics among lung cancer patients. We therefore conducted a retrospective study to explore the prognostic value of the therapeutic regimen parameter for predicting OS among lung cancer patients. In this paper, we developed two nomograms that incorporated both the standard and nonstandard parameters of therapeutic regimen, health insurance plan, and diagnostic method to determine whether these parameters improve the ability of the nomogram to predict OS outcomes among NSCLC and SCLC patients.

The screening process and the clinicopathologic characteristics of the patients
In the primary NSCLC database (comprising 5384 patients), patients who had missing information on clinical stage (661 patients), smoking history (354 patients), clinical T stage (117 patients), clinical N stage (19 patients), and diagnostic method (13 patients) were excluded from the NSCLC database based on screening criteria. Finally, 4220 NSCLC patients and 643 SCLC patients were included.
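The screening arithmetic for the NSCLC database can be checked with a short sketch. It assumes that the reported exclusion counts are disjoint (each excluded patient is counted under one reason only), which the text does not state explicitly; the variable names are illustrative.

```python
# Exclusion counts reported in the text for the NSCLC database.
# Assumption: the counts are disjoint, i.e. no patient is counted twice.
initial_nsclc = 5384
excluded = {
    "clinical stage": 661,
    "smoking history": 354,
    "clinical T stage": 117,
    "clinical N stage": 19,
    "diagnostic method": 13,
}

total_excluded = sum(excluded.values())
remaining_nsclc = initial_nsclc - total_excluded

print(total_excluded, remaining_nsclc)
```

Under this assumption, the remaining count matches the 4220 NSCLC patients reported as included.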
In the NSCLC cohort, 3149 events (i.e., deaths) occurred during a median follow-up time of 6.5 years (range, 4 days-11.5 years). The median survival time was 2.3 years (95% confidence interval [CI], 2.2-2.4 years). In the SCLC group, 523 events (i.e., deaths) were identified and the median follow-up time was 6.3 years (range, 8 days-10.8 years). The median survival time for the SCLC group was 1.7 years (95% CI, 1.5-1.8 years). The demographic and clinicopathologic characteristics of the NSCLC and SCLC cohorts are listed in Table 1.

Independent prognostic factors in the training set of NSCLC and SCLC patients
The univariate analysis results of NSCLC and SCLC patients in the training set are listed in Table 1.
Among all occupations in the NSCLC training dataset, farmers had the lowest survival rates, followed by public sector employees, freelance or self-employed [...]. In the NSCLC group, higher education levels were associated with a better prognosis (P<0.001). In the SCLC group, patients who attended junior and senior high school had better survival rates than patients who attended primary school or below or patients who had an undergraduate education or higher (P=0.002). Health insurance plan also impacted survival (NRC > URB > Self > UEB > Other; P=0.002), whereas other factors were not correlated with survival (Table 1).
A backward (i.e., step-down) Cox regression analysis was used to model prognosis in the NSCLC training cohort; the parameters in the model were age, occupation, health insurance plan, clinical T, N, and M stage, central location, differentiation phase, diagnostic method, and therapeutic regimen. In the SCLC group, the model included gender, health insurance plan, clinical stage, and therapeutic regimen (Table 2).

Prognostic nomogram for survival
A nomogram that incorporated the selected prognostic factors was established for each cohort (Figure 1, Supplementary Table 1). The NSCLC nomogram shows that the therapeutic regimen and diagnostic method parameters made the largest contributions to OS prognosis. The contributions of the clinical N stage, health insurance plan, clinical T stage, clinical M stage, occupation, age, central location, and differentiation phase parameters were substantially smaller. The nomogram for patients with SCLC showed that the health insurance plan and therapeutic regimen parameters made the greatest contributions, followed by the gender and clinical stage parameters. Each variable was assigned a score on the point scale. Adding up the scores of all variables gives a total score, and drawing a straight line down from the total point scale to the survival scales gives the estimated probability of survival at each time point.
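The point-scoring logic of a nomogram can be illustrated with a minimal sketch. It follows a common convention in which the variable with the widest range of regression coefficients spans 0 to 100 points and all other variables are scaled relative to it. The coefficients, level names, and function names below are hypothetical and are not the fitted values from this study.

```python
def nomogram_points(coefs):
    """Convert per-level Cox log-hazard coefficients into point scales.

    coefs maps variable -> {level: coefficient}; reference levels have 0.0.
    The variable with the widest coefficient range spans the full 100 points.
    """
    ranges = {v: max(lv.values()) - min(lv.values()) for v, lv in coefs.items()}
    widest = max(ranges.values())
    return {
        v: {lvl: 100.0 * (b - min(lv.values())) / widest for lvl, b in lv.items()}
        for v, lv in coefs.items()
    }


def total_points(points, patient):
    """Sum the points of the levels observed for one patient."""
    return sum(points[v][patient[v]] for v in points)


# Hypothetical coefficients, for illustration only (not the study's estimates).
example_coefs = {
    "therapeutic_regimen": {"S": 0.0, "SC": 0.1, "C": 0.8},
    "clinical_stage": {"limited": 0.0, "extensive": 0.4},
}

points = nomogram_points(example_coefs)
patient = {"therapeutic_regimen": "C", "clinical_stage": "extensive"}
score = total_points(points, patient)
print(points, score)
```

In this sketch a larger coefficient (higher hazard) yields more points, so a higher total score corresponds to a worse predicted prognosis. Mapping the total score to a 3- or 5-year survival probability additionally requires the baseline survival function of the fitted model, which is not reproduced here.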

Calibration and validation of the nomogram
The calibration plots for the 3-year OS showed optimal agreement between the nomogram prediction and the actual observation, whereas the 5-year OS plots showed acceptable agreement, for both the NSCLC and SCLC training datasets. Calibration in the validation dataset showed nearly the same results as in the training dataset (Figure 2). In the NSCLC primary cohort, the C-index of our nomogram for predicting OS (0.67; 95% CI, 0.65-0.69) was higher than that of the TNM staging system (0.64; 95% CI, 0.62-0.66; P = 0.01). In the SCLC cohort, the C-index of our nomogram (0.60; 95% CI, 0.55-0.65) was higher than that of clinical stage at diagnosis (limited vs. extensive; 0.53; 95% CI, 0.47-0.59), although the difference was not significant (P = 0.12).

Performance of the nomogram in stratifying patient risk
For the NSCLC training cohort, patients were sorted by total score and grouped into four subgroups using three cutoff values (score: 0-13.9, 14.0-17.0, 17.1-20.2, and ≥20.2). Within each of the four clinical stages, the Kaplan-Meier curves of the subgroups showed distinct prognoses (P < 0.001 for all; Supplementary Table 1 and Figure 3A). When these cutoff values were applied to the validation cohort, the curves again showed significant separation beyond that provided by the TNM categories (P < 0.001 for all; Figure 3B).
We also grouped the SCLC patients into four subgroups in the training and validation cohorts (score: 0-10.7, 10.8-13.5, 13.6-16.6, and ≥16.7). In the training cohort, the Kaplan-Meier curves showed significantly distinct prognoses within the limited stage (P = 0.001) and the extensive stage (P < 0.001). In the validation cohort, only the extensive-stage curves differed significantly among the four subgroups (P = 0.0435; Figure 4).
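The assignment of a patient to a score subgroup can be sketched as a simple lookup against the reported cutoffs. The function and constant names are illustrative. Because the text lists 20.2 in both the third and fourth NSCLC ranges, the sketch assumes that a score of exactly 20.2 falls in the fourth subgroup.

```python
from bisect import bisect_right

# Lower bounds of the second, third, and fourth subgroups, taken from the text.
# Assumption: an NSCLC score of exactly 20.2 belongs to the fourth subgroup.
NSCLC_CUTOFFS = [14.0, 17.1, 20.2]
SCLC_CUTOFFS = [10.8, 13.6, 16.7]


def risk_group(total_score, cutoffs):
    """Return the subgroup index (1 = lowest scores, 4 = highest scores)."""
    return bisect_right(cutoffs, total_score) + 1


for s in (13.9, 14.0, 17.0, 17.1, 25.0):
    print(s, risk_group(s, NSCLC_CUTOFFS))
```

The subgroups are numbered by score order only; the sketch makes no claim about which end of the scale corresponds to the better prognosis.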

DISCUSSION
Nomograms quantify risk by incorporating and illustrating important factors for oncologic prognoses. In several types of cancers, nomograms generate more precise predictions than traditional TNM staging systems. The aim of the current study was to build on previous efforts to develop prognostic nomograms for predicting survival rates in patients with non-small cell and small cell lung cancers. The goal was to identify additional prognostic parameters that could be used to predict optimal treatments, disease trajectory, and OS. We specifically focused on identifying parameters that could improve the capability of nomograms to predict OS in these patients.
The entire cohort was from a single tertiary cancer hospital with advanced medical care facilities. The parameters of the patients with NSCLC included age, [...] that the survival rates of patients with standard health insurance plans were superior to those of patients who were self-financed (i.e., Self) or had other types of health insurance. We believe that standard health insurance plans increase survival by reducing the economic burden of healthcare because they provide more advanced treatments at no extra cost. Health authorities should consider this factor when instituting health insurance plan policies. The therapeutic regimen parameter showed a clear correlation with OS. The SC and S regimens positively impacted survival, which concurs with previous studies demonstrating that administering adjuvant chemotherapy in stage II and III cancers was beneficial [20,21]. The diagnostic method parameter revealed that surgery was associated with significantly better survival than biopsy and cytology. Biopsy is the preferred method to diagnose cancer. However, some patients could not be diagnosed from tissue obtained through bronchoscopy because of the poor location or poor visualization of an advanced tumor. The physician would then perform surgery or arrange a cytology examination for a diagnosis. On the other hand, cytology is commonly used for patients with advanced lung cancer who are unsuitable for surgery. The diagnostic method therefore probably predicts survival through its correlation with the therapeutic regimen.
In the SCLC cohort, the nomogram model included the parameters of gender, health insurance plan, clinical stage, and therapeutic regimen. Female gender and clinical limited stage were positively associated with survival. The health insurance plan parameter paralleled the results from the NSCLC cohort, which indicated that this parameter reflects financial status rather than the specific disease per se. The therapeutic regimen parameter affected survival. The "other" category focused on surgery, which included SRC, SC, and S. This category showed increased survival, which outperformed the RC, R, and C treatments. These results differed significantly from those of Xie et al. [6], and indicated that additional factors are involved. These differences often provide critical insights for refining treatments. Further analysis of these cohorts and datasets will be required to resolve this apparent anomaly.
Nomogram validation is essential to prevent a model from overestimating its predictive performance on the present data and to determine the generalizability of the nomogram to other patients [22]. In the present study, the calibration plots showed "optimal" agreement for the 3-year OS and "acceptable" agreement for the 5-year OS between the predicted and actual observed values for the NSCLC and SCLC training datasets. Thus, the predictive performance was more repeatable and reliable for the 3-year OS rate than for the 5-year OS rate. In the present study, the follow-up time of most patients with NSCLC and SCLC was longer than 5 years (median follow-up time: 6.5 years and 6.3 years, respectively). Over this long period, the survival status of most patients may have been affected by intervening factors such as psychological factors, behavioral factors, and other therapeutic methods that we could not follow. Therefore, the model we established, based on demographic and clinicopathologic parameters, was more suitable for predicting near-term survival: i.e., it was more precise for predicting the 3-year survival rate than the 5-year survival rate. Furthermore, the NSCLC and SCLC models fit the validation groups. Our nomogram for NSCLC patients outperformed the TNM staging system for predicting OS, as evaluated using the C-index (0.67 vs. 0.64, P = 0.01). However, in the validation cohort, the difference in the C-index between the nomogram and the TNM staging system (0.65 vs. 0.63, P = 0.39) was not significant. In the validation group, patients in clinical stages I and II accounted for only 26.5% of the whole sample, compared to those in clinical stages III and IV. This large imbalance may have caused prediction bias in the model. Therefore, a larger validation sample is required.
The cohorts were further stratified into four risk subgroups within their respective TNM staging categories, based on quartiles in the training set. The NSCLC plots showed distinct survival among the subgroups. For the SCLC cohort, the C-index of the nomogram for OS was higher than that of the clinical staging system (limited vs. extensive) in the primary dataset (0.60 vs. 0.53) and in the validation dataset (0.60 vs. 0.52). However, the P values indicated no significant differences in the primary dataset (P = 0.12) or the validation dataset (P = 0.25). The four risk groups showed significantly different prognoses within the limited and extensive stages in the primary and validation datasets, except for the limited stage in the validation cohort, in which the difference was not significant (P = 0.624). These three nonsignificant results (i.e., P values above 0.05) may be due to the small sample size. Future studies are needed to validate these findings.
Nomograms based on the novel combination of the prognostic parameters of occupation, health insurance plan, diagnostic method, and therapeutic regimen were constructed to predict NSCLC and SCLC survival. To the best of our knowledge, this is the first time NSCLC and SCLC survival nomograms have been constructed using this combination of parameters. The study used a cohort that was sufficiently large to allow statistical analysis and included long-term patient follow-up to improve data accuracy. Among the prognostic parameters used in this study, therapeutic regimen was the most significant prognostic factor for predicting OS in NSCLC and SCLC. It contributed the most to OS and is a crucial variable for physicians and patients, both of whom can easily use this scoring system to make a preliminary choice of therapeutic regimen. To our knowledge, it has not been identified in previous related studies. Using this tool, physicians can divide patients into different risk groups and care for them accordingly. More importantly, our nomograms provide more accurate prognostic models than the TNM staging system. Currently, oncologists generally apply standardized therapy regimens, which probably cause over-treatment or under-treatment and thereby influence quality of life and survival time to a large extent, because they do not account for tumor and individual heterogeneity. In contrast, the nomogram used to predict survival considers individual factors such as demographic, clinical, and serum parameters. In future studies, more advanced parameters such as molecular factors may be found to predict survival, and individualized and targeted therapy could then be applied more widely for patients with lung cancer [23].
Our study also has some limitations. First, the prognostic parameters used in the construction of our nomograms involved typical routine clinical data. More advanced clinical parameters that are prognostic of survival, such as tumor size [24], FEV1 [25], lymphatic permeation [26], lymphovascular invasion [27], and molecular factors [23], were not included in the design of our nomograms because the clinical data for these parameters were too incomplete. An advantage of using basic low-cost clinical parameters is that the data will be widely available, which simplifies performing multicenter studies. The simplest solution is always the best solution. A second possible limitation is that our nomograms were constructed based on clinical data from a single institute. However, it is becoming increasingly common to analyze multiple retrospective studies to increase the size of a dataset and to reduce center-specific effects. Single-center studies provide an excellent starting point; if current trends continue, then the clinical data on which this article is based will be used in future studies. An additional point is that the dataset for the SCLC nomogram was much smaller than the NSCLC dataset and needs further validation using a larger cohort. A third limitation is that, to prevent overfitting of our multivariate Cox model, the number of variables was limited. It is possible that the association was confounded or mediated by another variable that we did not consider. Our findings need to be confirmed using multiple models and larger datasets from multiple clinics.

Patient population
The patient population in this retrospective study comprised patients with NSCLC (n=5384) and SCLC (n=647) diagnosed from January 2000 to December 2009. The clinical data were collected from the tertiary cancer hospital affiliated with the Xiang Ya Medical School of Central South University in Changsha, China. This study ended on December 31, 2014. Ethical approval was acquired from the institutional review board at the hospital.
For patients with NSCLC and SCLC, inclusion criteria included histopathological examination, acceptance of all main treatments (surgery, chemotherapy, radiotherapy, and other therapy such as Chinese traditional therapy, biological therapy, etc.) in the hospital, and no history of other malignant tumors or previous anticancer therapy. The exclusion criteria were uncertain tumor origin, probable metastatic lung cancer, and mixed histopathological primary lung cancer. In the data screening process, variables with more than 10% missing values were not included in the study analysis. Moreover, data records with any missing value on the eligible variables were omitted from the data analysis. In total, 4220 NSCLC patients and 643 SCLC patients were enrolled in the study group. To test the generalizability of the model, we randomly divided the entire database into a training dataset (NSCLC, n=2954; SCLC, n=450) and a validation dataset (NSCLC, n=1266; SCLC, n=193) at a ratio of 7:3. Each patient signed an informed consent document in this study.
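A random 7:3 division of a cohort can be sketched as follows. The function name, the use of integer patient IDs, and the fixed seed are assumptions made for reproducibility of the illustration; the text does not describe the randomization procedure actually used.

```python
import random


def split_7_3(patient_ids, seed=0):
    """Randomly split IDs into a training set (about 70%) and a validation set."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle, for a reproducible sketch
    n_train = (len(ids) * 7 + 5) // 10  # 0.7 * n, rounded, in integer arithmetic
    return ids[:n_train], ids[n_train:]


nsclc_train, nsclc_valid = split_7_3(range(4220))
sclc_train, sclc_valid = split_7_3(range(643))
print(len(nsclc_train), len(nsclc_valid), len(sclc_train), len(sclc_valid))
```

With the cohort sizes given in the text, this rounding reproduces the reported training and validation counts for both NSCLC and SCLC.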

Data collection
The data collection form covered four areas: (1) [...] (4) the follow-up data (i.e., follow-up date, outcome, outcome date, source of acquired outcome, and survival time). The patients' follow-up data were obtained by reviewing their medical files. Any relevant missing data were acquired by contacting the patient directly. Patient survival time was calculated from the date of the first pathological diagnosis until the patient's death or the last registered contact. Clinical stage was determined via pathological diagnosis using the World Health Organization classification system, which is based on the seventh edition of the American Joint Committee on Cancer TNM staging system [28].
Two alternatives were possible regarding tumor location: central or peripheral. A tumor was classified as central if its center was in the medial third of the lung parenchyma, and as peripheral if its center was in the lateral two thirds.
The therapeutic regimen of the patient was normally determined by the responsible physician using the patient's diagnosis data, based on the latest version of the National Comprehensive Cancer Network (NCCN) guidelines. For some special and complicated cases, the therapeutic regimen was decided in a consultation meeting with a senior physician in the department. It was then administered within a month. The surgery types comprised wedge/segmentectomy, lobectomy, bi-lobectomy, and pneumonectomy, which included video-assisted thoracoscopic surgery and thoracotomy. Chemotherapy comprised platinum-based doublets such as cisplatinum/paclitaxel and cisplatinum/pemetrexed. Patients who received radiotherapy were administered a dose ranging from 45 to 64.8 Gy (i.e., 1.8-2.0 Gy/day) with 6 MV X-rays. Moreover, radiotherapy involved conventional radiotherapy, three-dimensional conformal radiotherapy, and intensity-modulated radiotherapy. All data used for this study were managed by an authorized biostatistician and were collected by qualified medical personnel.

Construction of the nomogram
In the training group, survival curves for all variables were generated using Kaplan-Meier estimates and were statistically compared using the log-rank test. Based on the univariate analysis, variables with P<0.1 were entered into a Cox regression analysis to create a prediction model, with a backward step-down process performed using the Akaike information criterion as the stopping rule [29].
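The Kaplan-Meier estimate used for the survival curves can be sketched in a few lines. This is a minimal illustration of the product-limit estimator, not the code used in the study (which relied on R and SPSS); the function name and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times: follow-up durations; events: 1 if death was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk  # product-limit step
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


# Toy data: five patients, two of them censored.
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(toy_curve)
```

Patients censored at a given time are counted as still at risk for deaths occurring at that same time, which is the usual convention for this estimator.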

Validation and calibration of the nomogram
Internal validation of the training dataset was performed using 1000 bootstrap resamples, and the nomogram was applied to the validation cohort for external validation. The prognostic performance of the model was evaluated using the concordance index (C-index), which ranges from 0.5 (random chance) to 1.0 (perfect discrimination). The nomogram for the 3- and 5-year OS was calibrated by comparing the predicted survival rates with the observed survival rates.
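The concordance index can be illustrated with a minimal sketch of Harrell's C for right-censored data. It assumes that a higher risk score means shorter expected survival and, for simplicity, ignores pairs with tied survival times; the function name and toy data are hypothetical.

```python
def c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair is usable when the patient with the shorter time had an observed
    death. Higher risk_scores are assumed to mean shorter expected survival.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 1, 0]
print(c_index(toy_times, toy_events, [4, 3, 2, 1]))
```

A model whose scores rank every usable pair correctly reaches 1.0, while identical scores for all patients give 0.5, consistent with the interpretation of the C-index given in the text.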

Risk group stratification, based on nomogram analysis
The sum score for each patient was calculated based on the established model. Within each clinical stage, patients were then divided into four risk groups. Cutoff values were determined using the sum scores from the training dataset (sorted from highest to lowest). The Kaplan-Meier survival curves of the four groups were plotted and compared using the log-rank test. All data management and statistical analyses were performed using SPSS (version 19.0, SPSS, Chicago, IL, USA) and R software (version 3.3.0; R Foundation for Statistical Computing, Vienna, Austria) with the survival and RMS packages (Regression Modeling Strategies, Inc., Newark, CA). All statistical tests were two-sided, and a P value <0.05 was considered statistically significant.
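The log-rank comparison of survival curves can be sketched for the simplest case of two groups. The study compared four risk groups, which requires a multi-group version of the test; the two-group form below is a simplification for illustration, and the function name and toy data are hypothetical.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers at risk just before t and deaths at t, per group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    # Note: variance is zero (division error) if one group is never at risk.
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Toy data: group A dies early, group B dies late (all deaths observed).
chi2, p = logrank_two_groups([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(chi2, p)
```

In line with the two-sided threshold stated in the text, a P value below 0.05 from such a test would be read as a significant difference between the two curves.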

CONCLUSIONS
The NSCLC and SCLC nomograms for predicting the survival of lung cancer patients were established and validated in an objective and precise manner. Our results showed that the treatment regimen parameter is predictive of survival in lung cancer patients. This information makes it possible for clinicians and patients to choose a treatment regimen that is based on patient-specific parameters rather than on average predefined treatment regimens. The nomograms will provide clinicians and patients with critical information needed to make informed treatment decisions. Furthermore, the clinical data collected during the course of this study could be used in future literature review-based studies.

CONFLICTS OF INTEREST
The authors disclose no potential conflicts of interest.