Exploiting machine learning for predicting skeletal-related events in cancer patients with bone metastases

The aim of bone metastasis (BM) treatment is to prevent skeletal-related events (SREs). In clinical practice, physicians can predict SREs only on the basis of subjective experience. Machine learning (ML) can be used to build predictive models in the medical field, but no published research has used ML to predict SREs in cancer patients with BM. The purpose of this study was to assess the associations of clinical variables with the occurrence of SREs and then to develop prediction models to help identify SRE risk groups. We analyzed 1143 cancer patients with BM, using SPSS and SPSS Modeler for data analysis and model development. We compared the performance of logistic regression (LR), decision tree (DT) and support vector machine (SVM) models. The Visual Analog Scale (VAS) score was a key predictor of SREs in the LR, DT and SVM models. Modifiable factors such as Frankel classification, Mirels score, Ca, aminoterminal propeptide of type I collagen (PINP) and bone-specific alkaline phosphatase (BALP) were also identified. The classification accuracies of LR, DT and SVM were 79.2%, 85.8% and 88.2%, using 9, 4 and 8 variables, respectively. In conclusion, DT and SVM achieved higher accuracies with fewer variables than LR, and ML techniques can be used to build models that predict SREs in cancer patients with BM.


INTRODUCTION
Bone is the most common site of metastasis in cancer. Bone metastases are most prevalent among patients with advanced breast (73%), prostate (68%) or lung (36%) cancer [1]. Bone metastases (BM) can lead to skeletal-related events (SREs), defined as pathologic fracture, spinal cord compression, the need for radiation or surgery to bone, and hypercalcemia [2-7]. Data from the untreated arms of clinical trials indicate that SREs are most common in patients with BM secondary to breast cancer (2-year cumulative incidence of 68%), followed by prostate cancer (2-year cumulative incidence of 49%) and non-small cell lung cancer (NSCLC) and other solid tumors (OST) (21-month cumulative incidence of 48%) [8-10]. Observational studies have shown similar patterns, with a 1-year cumulative incidence of SREs after BM diagnosis of 46% in patients with prostate cancer and 38% in female patients with breast cancer [11,12].
BM and subsequent SREs can place an important burden on cancer patients' quality of life (QOL) and overall health status [13,14]. SREs dramatically reduce patients' QOL and can even shorten survival [15]. The aim of BM treatment is to prevent SREs. Many risk factors play important roles in the incidence of SREs, but in clinical practice physicians can predict SREs only on the basis of subjective experience. Multidimensional analysis of SREs requires considerable effort and expertise, which calls for more sophisticated, preferably automatic, ways of performing such complex analysis [16].
Machine learning (ML) uses a variety of artificial intelligence and statistical models to learn from observed data in order to make reasonable generalizations, discover patterns, classify previously unseen data or predict new directions [17]. The medical field is rapidly adopting ML methods such as decision trees (DT) and support vector machines (SVM), as these approaches have proven useful for prediction and classification. Predictive models are used in many medical domains for diagnostic and prognostic tasks. An increasingly large amount of medical data is collected routinely, and often automatically, in many areas of medicine [18]. Such applications could help reduce the cost of medication, improve clinical studies and support better assessments by physicians [19]. This wealth of data is an opportunity for ML and statistics to extract useful information and knowledge [18].
ML has been used in medicine to diagnose lung cancer, breast cancer, asthma, heart disease, dementia and other diseases and conditions. However, no published research has used supervised ML to predict SREs in cancer patients with BM. The purposes of this study were to identify the factors influencing SREs, to compare the accuracy of logistic regression (LR)-, DT- and SVM-based models in predicting SREs, and to develop an effective and efficient model, based on laboratory tests commonly performed in clinical practice, to predict SREs requiring intervention in cancer patients with BM.

General characteristics of patients
Our study included 1143 patients with BM; the median follow-up time was approximately 7 months. Table 2 shows the socio-demographic and clinical characteristics of the patients.
Of these patients, 622 (54.4%) had SREs, 284 (24.8%) developed multiple SREs and 263 (23.0%) had prior SREs. The most common SRE was radiation to bone, followed by pathological fracture and surgery to bone (Table 3). Most patients had lung cancer, followed by breast cancer, cancer of unknown primary and prostate cancer (Table 4).

Development of the LR model
Using 9 variables, the LR model achieved an accuracy of 79.2%. VAS scale, Frankel classification, Mirels score, Gender, Cancer type, Ca, PINP, β-CTX and BALP were selected as significant variables in the LR model. The complete list of study variables in each variable set, along with p-values, is given in Table 5.

Development of the DT model
Figure 1 shows the DT classification of SREs in patients with BM. The DT consisted of 4 variables, which in order of importance were VAS scale, PINP, CA153 and BALP. In Figure 1, each node shows the probability of SREs for patients with BM who satisfy the conditions along the corresponding branch. We used CART analysis to explore high-order interactions between variables. CART analysis showed that VAS scale was the most important factor affecting SREs in patients with BM. In patients with VAS scale Grade 3, there was an interaction with CA153; in patients with VAS scale Grade 1 or 2, there was an interaction with PINP; and in patients with PINP ≥ 101.8 ng/ml, there was a further interaction with BALP. Figure 1 illustrates these interactions.

Development of the SVM model
To identify the variables giving the highest classification accuracy for SREs, we used an SVM with a radial basis function (RBF) kernel (parameters C = 1 and γ = 1/number of features) in a procedure that systematically searched the space of variable subsets and evaluated each subset by its prediction accuracy. The subset with the highest accuracy was taken as the predictor set. Parameter C controls the trade-off between training (empirical) error and model complexity, and hence generalization. Parameter γ sets the width of the RBF kernel and thus the flexibility of the decision boundary. This approach was similar to that of a previous study on predictors of medication adherence in elderly patients with chronic disease [24].
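To make the roles of the kernel parameters concrete, the RBF kernel can be written as K(x, z) = exp(-γ‖x - z‖²). The sketch below is a minimal NumPy illustration of that standard definition, not the SPSS Modeler implementation. The function name `rbf_kernel_matrix` is hypothetical. By default it uses γ = 1/number of features, as in the text, and it shows that the kernel value decays faster with distance as γ grows.

```python
import numpy as np

def rbf_kernel_matrix(X, Z, gamma=None):
    """Gram matrix K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2).

    If gamma is None, use 1 / n_features (the default described in the text).
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    if gamma is None:
        gamma = 1.0 / X.shape[1]
    # Squared Euclidean distances via ||x||^2 + ||z||^2 - 2 x.z
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    # Clip tiny negative values caused by floating-point round-off
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-gamma * sq_dists)

# Two hypothetical patients described by 2 standardized features
X = np.array([[0.0, 0.0], [1.0, 1.0]])
K = rbf_kernel_matrix(X, X)                   # gamma defaults to 1/2
K_sharp = rbf_kernel_matrix(X, X, gamma=2.0)  # larger gamma: faster decay
print(np.round(K, 4))
print(np.round(K_sharp, 4))
```

With a larger γ, the off-diagonal similarity between the two patients drops from exp(-1) to exp(-4). Each training point then influences a smaller neighbourhood, which yields a more flexible boundary.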
To examine these results in detail, Table 6 lists the top 8 variables ranked by SVM and the prediction accuracies obtained by combining the top-ranked variables. Using VAS scale alone, the accuracy was 55.1%. With two variables, VAS scale and Frankel classification, the accuracy reached 67.4%. The highest accuracy, 97.1%, was achieved with eight predictors: VAS scale, Frankel classification, Ca, Cancer type, Gender, Mirels score, PINP and Character of BM. Performance decreased markedly when more than 8 features were selected. Contrary to the intuition that more variables should yield higher predictive performance, a small number of variables achieved higher prediction accuracy. Table 7 compares the LR, DT and SVM models on 5 evaluation measures. SVM and DT performed better than LR across the overall scoring categories, which supports their use in identifying candidate predictors of SREs in individual patients.
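The procedure summarized in Table 6 evaluates the top-1, top-2, and further top-ranked variable sets up to eight, then keeps the most accurate subset. It can be sketched as follows. This is a minimal illustration under stated assumptions. `best_top_k` is a hypothetical name, and the `evaluate` callback stands in for training and validating an RBF SVM on a given variable subset. The toy accuracies in the usage lines are made up and are not the study's results.

```python
from typing import Callable, List, Sequence, Tuple

def best_top_k(
    ranked_vars: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[float, List[str], List[Tuple[float, List[str]]]]:
    """Evaluate nested subsets of ranked variables and return the best one.

    `evaluate` is a hypothetical callback that trains/validates a classifier
    on the given variables and returns its accuracy.
    """
    results = []
    for k in range(1, len(ranked_vars) + 1):
        subset = list(ranked_vars[:k])  # top-k ranked variables
        results.append((evaluate(subset), subset))
    # max() returns the first maximal entry, i.e. the smallest subset among ties
    best_acc, best_subset = max(results, key=lambda r: r[0])
    return best_acc, best_subset, results

# Toy evaluator with made-up accuracies, for illustration only
toy_accuracy = {1: 0.55, 2: 0.67, 3: 0.62}
acc, subset, _ = best_top_k(["VAS", "Frankel", "Ca"], lambda s: toy_accuracy[len(s)])
print(acc, subset)  # 0.67 ['VAS', 'Frankel']
```

Evaluating nested top-k sets needs only n model fits. An exhaustive search over all subsets would need 2^n - 1 fits.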

Comparison among prediction models
LR achieved 79.2% accuracy when all 9 variables were used to predict SREs (Tables 5 and 7). On the same patient samples, DT achieved significantly higher accuracy (85.8%) with only 4 variables, and SVM achieved significantly higher accuracy (88.2%) with 8 variables (Table 7). These results indicate that ML techniques (DT and SVM) can achieve greater accuracy with fewer variables than LR. Notably, the most important variable selected by DT and SVM (VAS scale) agrees with that selected by LR.
Tables 7 and 8 summarize the discriminatory power of the LR, DT and SVM models. The AUC indicates how well a prediction model discriminates between patients with and without the outcome, here patients who did and did not develop SREs. The following guidelines have been proposed for interpreting the AUC: 0.5-0.7, rather low accuracy; 0.7-0.9, moderate accuracy useful for some purposes; and > 0.9, rather high accuracy [25]. By these guidelines, the classification accuracies of our models were moderate.
Our results indicated that the DT and SVM models had better diagnostic capability than the LR model, and the AUCs indicated moderate discriminatory power.
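The AUC has a simple rank interpretation. It is the probability that a randomly chosen patient with SREs receives a higher predicted score than a randomly chosen patient without SREs, with ties counted as one half. The sketch below computes it that way and maps the result to the bands quoted above. The function names are hypothetical. Assigning the boundary values 0.7 and 0.9 to the "moderate" band, and adding a "worse than chance" band below 0.5, are assumptions, because the quoted guidelines leave those cases open.

```python
def auc_rank(scores_pos, scores_neg):
    """AUC as P(score of random positive > score of random negative),
    counting ties as 1/2 (the Mann-Whitney U statistic / (n_pos * n_neg))."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def interpret_auc(auc):
    """Map an AUC to the interpretation bands quoted in the text [25]."""
    if auc > 0.9:
        return "rather high"
    if auc >= 0.7:
        return "moderate"
    if auc >= 0.5:
        return "rather low"
    return "worse than chance"

# Toy predicted probabilities for patients with and without SREs
auc = auc_rank([0.9, 0.8, 0.4], [0.3, 0.5, 0.2])
print(round(auc, 4), interpret_auc(auc))
```

The pairwise loop is O(n_pos × n_neg). It is fine for a sketch, but a sort-based rank computation scales better for large samples.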

Table notes. Soloway grades of the extent of BM [19]: 0, no BM; 1, < 6 BM; 2, 6-20 BM; 3, > 20 BM but less than a "super scan"; 4, "super scan", defined as > 75% involvement of the ribs, vertebrae and pelvic bones. Visceral metastases were defined as distant metastases other than BM, including brain metastases. Frankel classification of spinal cord injury [20]: Class A, complete paralysis; Class B, sensory function only below the injury level; Class C, incomplete motor function below the injury level; Class D, fair to good motor function below the injury level; Class E, normal function. The Mirels scoring system [21] is based on pain intensity, site, lesion type (lytic, mixed or blastic) and amount of bony involvement.

DISCUSSION
Health and medical data are increasing exponentially, creating a need for ways to take advantage of these huge amounts of data, and big data technologies enable fast processing of massive data sets [26]. Among these technologies, artificial intelligence has regained prominence as an important tool for providing intelligent services for big data, and ML techniques have also been used extensively for such purposes [27]. Traditional statistical methods such as LR have become increasingly difficult to use for prediction modeling because of several constraints. These include low statistical power with small sample sizes and the need for complex polynomial interaction terms to capture curvilinear relationships among variables.
SVM and LR are similar in that both calculate a set of coefficients for variables based on a transformation of the feature space [28]. The major difference is that LR explicitly models the probability (via the odds) of outcomes, whereas SVM directly seeks the best dividing hyperplane (or hyperplanes, when there are more than two classes) regardless of the actual probability of class membership [27]. SVM has several advantages over LR. In LR, the analyst must explicitly choose to increase the dimensionality of the feature space by adding interaction or polynomial terms among predictors, whereas such transformations are standard practice in SVM classification [29]. In addition, SVMs handle high-dimensional data well and do not assume a parametric relationship between the model predictors and the outcome.
DTs are classification algorithms that specify a "tree" of cut points chosen to minimize some measure of diversity in the final nodes once the tree is complete. The final nodes then represent relatively homogeneous classes [28]. To the extent that all data points classified at a given end node have a similar probability of class membership (that is, probability of treatment), the output of DTs can be used to directly construct propensity categories [30]. Many DT methods (e.g. ID3, C4.5) do not provide a probability of class membership, although some variants, in particular CART, do. However, the performance of any DT depends on both its method of construction and the amount of pruning (removal of highly specific nodes) performed. The major advantage of DT analysis over LR is that its results are easy to understand. A patient is allocated to a subgroup simply by following the flowchart, and that subgroup gives the predicted probability of the outcome.
In this study, ML models based on routinely available clinical and laboratory parameters were constructed to predict SREs in cancer patients with BM. As expected, the ML techniques (DT and SVM) achieved greater accuracy with fewer variables than LR. For SVM, this may be because it constructs the classifier that maximizes the geometric margin between classes, which tends to reduce generalization error. VAS scale was the strongest predictor of SREs. According to the DT model, VAS scale, PINP, CA153 and BALP were the predictors of SREs. In the SVM model, VAS scale, Frankel classification, Ca, Cancer type, Gender, Mirels score, PINP and Character of BM were selected as predictors.
Most SREs were radiation to bone, which was given mainly to relieve pain, and bone surgery also has an analgesic effect. This may be why VAS scale emerged as the strongest predictor of SREs. These findings suggest that appropriate analgesic therapy may reduce the occurrence of SREs. The Frankel classification was designed to assess spinal cord compression, and the Mirels score to assess the risk of pathological fracture in the limbs. These clinical factors also emerged as predictors of SREs. The BTMs and tumor markers were not linearly related to SREs: in the DT, patients with moderately elevated PINP and BALP had the highest proportion of SREs. SREs are complex phenomena with many causes and correlates. They are related not only to bone formation and bone resorption, but also to the sites of BM, soft tissue masses and many other factors. Serum BTMs reflect only bone formation and bone resorption, which may be why they performed less well than clinical factors in predicting SREs.
The incidence of SREs in our study was higher than in some clinical trials because we classified percutaneous osteoplasty (PO) as bone surgery. PO can immediately restore the mechanical properties of the affected skeletal segment and increase the resistance of the treated bone to compressive stress. It also prevents further fractures and allows immediate weight-bearing. PO can be used not only for vertebral metastases but also for pelvic, iliac and femoral metastases, and would be effective as part of combined-modality therapy for BM [31]. We observed that appropriate bone surgery and bone radiotherapy did not reduce patients' quality of life, which is the opposite of the premise behind counting them as SREs. If results from a large clinical trial support this hypothesis, they will have a great impact on this model.
Our study is, to the best of our knowledge, the first attempt to use ML techniques to identify influencing factors and to apply prediction models for SREs in cancer patients with BM, as an alternative and complement to traditional statistical approaches. We used only SPSS and SPSS Modeler to construct the DT, SVM and LR models. SPSS is widely used in the medical field because it is user-friendly, so other physicians may find the models easier to use in SPSS than in other software. ML models may open new possibilities for finding health-related factors that would otherwise remain hidden in traditional analyses. We used ML techniques as a supplement to LR to develop prediction models for SRE risk groups. Our findings can inform the development of new clinical assessments and interventions for cancer patients with BM. In other words, it would be possible to develop an SRE measurement tool, specific to cancer patients with BM, that helps prioritize intervention for SRE risk groups.
Based on the identified influencing factors, this study could also provide guidelines for healthcare staff caring for cancer patients with BM and help fine-tune and improve healthcare interventions in practice.
Identifying the risk factors associated with SRE development in cancer patients with BM is essential for formulating personalized surveillance programs. Treatment of BM, which aims to prevent SREs, includes orthopedic management, radiation, surgery and systemic treatments (e.g. bone-targeting agents (BTAs), endocrine therapy and chemotherapy). Our network meta-analysis showed that denosumab, zoledronate and pamidronate were generally effective in preventing SREs in cancer patients with BM. Denosumab and zoledronate were also associated with reduced risks of pathologic fracture and radiation compared with placebo [32]. This research was still ongoing when these models were developed, and the models should be incorporated into a decision support system. Our models can predict SREs and then guide the timing and choice of treatment. Patients at low or medium risk would receive BTAs, and those at high risk would receive orthopedic management, radiation or even surgery. After PO, SREs, especially pathologic fractures, rarely occurred in the treated bone. PO is therefore highly recommended for high-risk patients because of its effectiveness and safety.
The current study has several limitations that should be addressed in prospective prediction-modeling studies. First, it examined only the impacts of individual variables. We did not examine how each variable affects the others, nor did we study the nature of direct or indirect influencing factors. Future studies should examine how these relationships affect predictability, and more detailed univariate analyses will be needed.
Second, the classification accuracies of these models were moderate. In our study, cross-validation was performed on the same data set used to develop the models, without a separate test set. With enough samples in the future, we will be able to obtain more accurate estimates by separating the test data from the training data in advance.
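The remedy described above is to fix a test set before any model fitting. It can be sketched in a few lines of standard-library Python. The 20% test fraction, the seed and the name `holdout_split` are illustrative assumptions, not choices made in the study. In practice a stratified split, which keeps the proportion of patients with SREs equal in both parts, would often be preferred.

```python
import random

def holdout_split(n_records, test_fraction=0.2, seed=42):
    """Shuffle record indices and reserve a fixed test set before model fitting.

    test_fraction and seed are illustrative, not values from the study.
    Returns (train_indices, test_indices).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    n_test = round(n_records * test_fraction)
    return idx[n_test:], idx[:n_test]

train_idx, test_idx = holdout_split(1143)
print(len(train_idx), len(test_idx))
```

Cross-validation for model and variable selection would then run only on `train_idx`, and the untouched `test_idx` records would give the final performance estimate.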
In this study, we assessed the capacity of LR, DT and SVM models to predict SREs, with the goal of developing a more predictive profile for identifying the important clinical risk factors that affect the occurrence of SREs. ML served as an effective alternative to conventional LR. It identified key variables that gave higher classification accuracy, thereby providing valuable diagnostic tools for SRE prediction.

Data collection
This cross-sectional retrospective study enrolled 1143 cancer patients with BM of both sexes, recruited from the Department of Internal Oncology, Shanghai Sixth People's Hospital, between June 2007 and June 2014. The study was approved by the ethics committee of the Sixth People's Hospital, Shanghai Jiao Tong University, and followed the principles of the Declaration of Helsinki. Written consent was obtained. Cancer was diagnosed using standard clinical criteria.

Feature selection and reduction
A subset of 19 features from the routine laboratory workup (categorical or numerical) was used for model building (Table 1). The dataset contained the following groups:
- 2 demographic variables: age and gender.
- 2 general condition variables: Karnofsky Performance Scale (KPS) and Visual Analog Scale (VAS).
- 3 metastasis variables: character of BM, extent of BM and visceral metastases.
- 2 injury variables: Frankel classification of spinal cord injury and Mirels score.
- 4 bone turnover markers (BTMs): bone-specific alkaline phosphatase (BALP), N-terminal midfragment of osteocalcin (N-MID), aminoterminal propeptide of type I collagen (PINP) and β-cross-linked carboxyterminal telopeptide of type I collagen (β-CTx).
- 2 biochemical variables: alkaline phosphatase (AKP) and serum calcium.
- 4 tumor markers: CEA, CA125, CA153 and CA199.

These variables were selected because a panel of experts considered them potentially clinically important. Several data transformation techniques were used to format and prepare the patient records for the learning algorithms (Table 1).
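The text does not say which transformations were applied. As an assumption for illustration only, the sketch below shows two common preparation steps for mixed clinical data. One-hot encoding turns a categorical field such as Frankel class into indicator columns. Min-max scaling puts numeric fields such as serum calcium on a common [0, 1] range, which matters for distance-based kernels like the RBF. The function names are hypothetical.

```python
def one_hot(value, categories):
    """Encode a categorical value (e.g. Frankel class) as 0/1 indicator columns."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

FRANKEL = ["A", "B", "C", "D", "E"]
print(one_hot("D", FRANKEL))           # [0, 0, 0, 1, 0]
print(min_max_scale([2.0, 2.5, 3.0]))  # [0.0, 0.5, 1.0]
```

In a validated pipeline, the minimum and maximum would be computed on the training folds only and then applied to the test fold, so that no test information leaks into preprocessing.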

Construction of the prediction models
In this study, SPSS 19® and SPSS Modeler 14.1® (IBM, Armonk, NY, USA) were used to construct the DT, SVM and LR models. A p-value ≤ 0.05 was considered significant for inclusion in the model. Each prediction model was validated by 10-fold cross-validation. The data set is divided into 10 folds of equal size. The model is trained on 9 folds and tested on the remaining fold, and the process is repeated until every fold has been used once for testing.
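The 10-fold procedure described above can be sketched in standard-library Python. This is an illustration, not the SPSS Modeler implementation. The names `k_fold_indices` and `cross_validated_accuracy` are hypothetical, and `fit`/`predict` are placeholder callbacks for any classifier. When the number of records is not divisible by k, as with 1143 patients, the folds differ in size by at most one record rather than being exactly equal.

```python
import random

def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def cross_validated_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean test-fold accuracy; `fit` and `predict` are placeholder callbacks."""
    accs = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        accs.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(accs) / len(accs)

# Usage with a trivial majority-class baseline
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, X_test):
    return [model] * len(X_test)

y = [1] * 7 + [0] * 3
print(cross_validated_accuracy(list(range(10)), y, fit_majority, predict_majority))
```

A majority-class baseline like this is a useful reference point. With 54.4% of patients having SREs, any model should clearly beat that level of accuracy.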
Binary LR was applied to the data set, associating each record (a patient) with a probability of SREs. Independent variables were entered by stepwise selection, and the corresponding coefficients were computed.
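Stepwise entry under the p ≤ 0.05 criterion can be illustrated with a small NumPy sketch. It fits binary LR by Newton-Raphson and then adds, one at a time, the candidate variable with the smallest Wald p-value. This is a simplified forward-only procedure, an assumption for illustration. It is not the exact SPSS stepwise algorithm, which may use different entry and removal statistics. The names `fit_logit` and `forward_stepwise` and the synthetic data are hypothetical.

```python
import math
import numpy as np

def fit_logit(X, y, n_iter=50, tol=1e-8):
    """Fit binary logistic regression by Newton-Raphson (IRLS).

    Returns coefficients (intercept first) and two-sided Wald p-values.
    """
    Xc = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(Xc.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xc @ beta))
        hessian = Xc.T @ (Xc * (p * (1 - p))[:, None])
        step = np.linalg.solve(hessian, Xc.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-Xc @ beta))
    cov = np.linalg.inv(Xc.T @ (Xc * (p * (1 - p))[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    return beta, pvals

def forward_stepwise(X, y, alpha=0.05):
    """Forward-only selection: repeatedly add the candidate column with the
    smallest Wald p-value, stopping when no candidate has p <= alpha."""
    remaining = list(range(X.shape[1]))
    selected = []
    while remaining:
        best_p, best_j = None, None
        for j in remaining:
            _, pvals = fit_logit(X[:, selected + [j]], y)
            if best_p is None or pvals[-1] < best_p:  # candidate is last column
                best_p, best_j = pvals[-1], j
        if best_p > alpha:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Synthetic data: only column 0 is truly associated with the outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_logit = -0.2 + 2.0 * X[:, 0]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
print(forward_stepwise(X, y))
```

Column 0 enters first. A pure-noise column can still occasionally pass a 0.05 entry threshold by chance, which is one reason stepwise models are usually checked by cross-validation.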
We constructed the DT as a classification and regression tree (CART). This approach builds a binary tree by splitting the records at each node according to a function of a single input field. The evaluation function used for splitting in CART is the Gini index [23]. One of the most critical problems in tree construction is determining the appropriate tree size, and standard methods use a "stopping rule" to do so.
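The Gini criterion can be illustrated for a single split. Gini impurity is 1 - Σ p_k², and a CART-style split on one numeric field chooses the threshold that minimizes the size-weighted impurity of the two child nodes. The sketch below shows one step only, without recursion, stopping rules or pruning. The function names and toy values are hypothetical and are not taken from the study's data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Best binary split 'value < t' on one numeric field, CART-style.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (threshold, weighted Gini impurity of the two child nodes).
    """
    n = len(values)
    distinct = sorted(set(values))
    best_t, best_imp = None, float("inf")
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v < t]
        right = [y for v, y in zip(values, labels) if v >= t]
        imp = len(left) / n * gini(left) + len(right) / n * gini(right)
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp

# Toy marker values with SRE labels (1 = SRE); made up for illustration
print(best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (6.5, 0.0)
```

A full tree applies this search to every field at every node and recurses into the children. A stopping rule such as a minimum node size or maximum depth then limits tree size.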
We used an SVM with a radial basis function (RBF) kernel, built with the SVM modeling node in SPSS Modeler.

Comparison between prediction models
The discrimination of the LR, DT and SVM models was compared. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy were used to evaluate model performance. The area under the receiver operating characteristic curve (AUC) was calculated to test each model's ability to distinguish patients with SREs from those without.
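All five measures listed above come from the same 2×2 confusion matrix of predicted versus observed SREs. The sketch below makes the definitions explicit. The function name is hypothetical, and the counts in the usage line are toy values, not the study's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV and accuracy from a 2x2 confusion matrix.

    'Positive' means the patient developed an SRE.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # precision
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

m = classification_metrics(tp=40, fp=20, tn=30, fn=10)  # toy counts
print(m)
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of SREs in the sample. Values from this cohort (54.4% with SREs) may therefore not transfer to populations with a different SRE rate.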

Statistical analysis
Patients were categorized into those with and those without SREs. Categorical variables were expressed as numbers and percentages and compared using the chi-square test or Fisher's exact test. Continuous variables were expressed as mean and standard deviation (SD) and compared using Student's t test. Sensitivity, specificity, PPV, NPV and accuracy were then calculated.
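For a binary characteristic, the chi-square comparison above reduces to a 2×2 table. The sketch below computes Pearson's statistic without continuity correction. It uses the exact one-degree-of-freedom tail formula P(χ²₁ > x) = erfc(√(x/2)), so no statistics library is needed. The function name and counts are hypothetical. It also returns the smallest expected count, because a common rule of thumb switches to Fisher's exact test when any expected count is below 5.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    e.g. rows = SREs yes/no, columns = a binary characteristic.
    Returns (chi2, p_value, min_expected).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    min_expected = float("inf")
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p, min_expected

chi2, p, min_exp = chi_square_2x2(10, 20, 20, 10)  # toy counts
print(round(chi2, 3), round(p, 4), min_exp)
```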

ACKNOWLEDGMENTS AND FUNDING
This study was supported by the National Natural Science Foundation of China (Grant 81201628).