A chemotherapy response classifier based on support vector machines for high-grade serous ovarian carcinoma

The long-term outcome of high-grade serous epithelial ovarian carcinoma (HGSOC) remains poor as a result of recurrence and the emergence of drug resistance. Almost all patients receive the same platinum-based chemotherapy after debulking surgery, even though some of them are intrinsically resistant to this first-line chemotherapy. Currently, no method can identify these patients immediately after surgery. In this study, we used 156 paraffin-embedded HGSOC specimens for immunohistochemical analysis with 37 immunological markers, and evaluated the association between the expression levels of these markers and chemoresponse. A support vector machine (SVM)-based HGSOC prognostic classifier was then established and validated in an independent cohort of 95 patients. The classifier was strongly predictive of chemotherapy resistance and divided patients into low- and high-risk groups with significant differences in progression-free survival (PFS) and overall survival (OS). This classifier may provide a way to predict the chemotherapy resistance of HGSOC immediately after surgery, allowing clinicians to make better-informed clinical decisions for potentially chemoresistant patients. Its potential clinical application may benefit patients with primary drug resistance.


INTRODUCTION
Epithelial ovarian carcinoma (EOC) is the fifth leading cause of cancer death in women [1]. The 5-year survival rates for late-stage EOC were < 10% between 2004 and 2008 [2]. Long-term outcome remains poor as a result of recurrence and the emergence of drug resistance after initial debulking surgery and chemotherapy, especially for high-grade serous epithelial ovarian carcinoma (HGSOC).
HGSOC is highly heterogeneous. Conventional prognostic features such as patient age, International Federation of Gynecology and Obstetrics (FIGO) stage, histological grade, and initial surgery results are insufficient to capture individual variations in chemoresponse and prognosis. All patients receive the same regimen regardless of their chemoresponse. Therefore, it is important to develop methods that can identify patients who may be resistant to traditional platinum-based chemotherapy and then redirect them to alternative, potentially more efficacious, chemotherapeutic agents (e.g., topotecan) or radiation therapy [3], which may help to improve their overall survival.
Molecular prognostic markers could potentially be represented by changes in gene copy number, mRNA, and protein expression levels. For example, large-scale RNA expression profiling has been used to screen molecular markers associated with response to chemotherapy and prognosis in ovarian carcinomas [4][5][6][7]. However, mRNA expression detection generally requires fresh or frozen tissue that must be microdissected to remove associated non-tumor tissue that may attenuate the gene expression signature of the tumor. Furthermore, mRNA expression is not always correlated with protein levels [8]. In the clinical setting, immunohistochemistry (IHC) remains the most robust and widespread means to evaluate protein abundance in neoplastic or stromal cells, since it only needs formalin-fixed, paraffin-embedded (FFPE) tumor tissues that are routinely prepared in medical practice and are easy to store. Many studies have reported the assessment of single IHC markers for prognosis in ovarian cancer [9][10][11][12][13], but no consistent results have been obtained. We hypothesised that the value of prognostic predictors would be greatly enhanced by using a cluster of specific features, including multiple IHC markers. The support vector machine (SVM) is a state-of-the-art classification algorithm that can use a small subset of highly discriminating genes to build reliable cancer classifiers. This approach has not yet been applied to HGSOC. In this study, we developed an SVM-based prognostic classifier to predict the chemotherapy response of HGSOC, which provides a more accurate measurement of chemotherapy response than is possible through traditional clinical means.

RESULTS
Table 1 shows the demographic, clinical, and tumor characteristics of the patients in the discovery and validation cohorts. There was no significant difference in clinical or tumor characteristics between the training and validation cohorts.
Moreover, we found that only residual tumor (> 1 cm) was associated with chemotherapy resistance in the validation cohort (Supplementary Table 1).

Patient characteristics and components of HGSOC-SVM classifier
On the basis of SVM-RFE analysis of the 100 cases in the training set, the final HGSOC-SVM classifier integrated one clinicopathologic feature (optimal debulking surgery) and the expression of six proteins (BRCA2, E-Cadherin, P53, BRCA1, p-AKT, and Dicer1) as critical factors. Representative IHC staining for these six SVM-RFE-selected markers in HGSOC tumor tissues is shown in Figure 1. Raw data are provided in Supplementary Table 2.

HGSOC-SVM classifier and chemotherapy resistance of HGSOC
ROC curves for traditional clinicopathological prognostic factors (age, clinical stage, grade, and residual tumor volume), each individual molecular marker, and the HGSOC-SVM classifier in both the testing and validation cohorts are illustrated in Figure 2. In the testing cohort (n = 56), the HGSOC-SVM classifier (AUC = 0.802) outperformed all the other individual prognostic factors (Figure 2A). The HGSOC-SVM classifier was strongly predictive of chemotherapy resistance (overall accuracy, 83.9%; sensitivity, 94.1%; specificity, 68.2%). These prognostic associations were also observed in the independent validation cohort (n = 95; AUC = 0.776) (Figure 2B), including prediction of chemotherapy resistance (overall accuracy, 80.0%; sensitivity, 86.7%; specificity, 68.6%).
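As a reminder of how the reported accuracy, sensitivity, and specificity relate to a table of predicted versus observed chemoresponse, the following minimal Python sketch computes the three quantities. It assumes that chemoresistance is coded as the positive class (1) and chemosensitivity as 0; this coding, the function name, and the toy labels are illustrative assumptions and are not the study data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels.

    Assumed coding (not stated in the text): 1 = chemoresistant, 0 = chemosensitive.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # resistant, predicted resistant
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # sensitive, predicted sensitive
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # sensitive, predicted resistant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # resistant, predicted sensitive
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```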
In univariate logistic regression analysis based on the testing cohort (Supplementary Table 3), the high-risk group defined by the HGSOC-SVM classifier was highly associated with chemotherapy resistance (OR = 34.3, 95% CI: 6.35 to 185.24, P < .001). By contrast, there was no significant difference in chemotherapy resistance by age, histological grade, clinical stage, or optimal surgery. Similarly, in the validation cohort, the high-risk group according to the HGSOC-SVM classifier was also the most important predictive factor for chemotherapy resistance (OR = 14.18, 95% CI: 5.05 to 39.77, P < .001) (Supplementary Table 3).
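For a single dichotomous predictor such as risk group, the odds ratio from univariate logistic regression equals the cross-product ratio of the 2 × 2 table, and its Wald confidence interval equals the log-scale interval sketched below. This is an illustrative Python sketch, not the SPSS procedure used in the study; the function name and the cell counts are hypothetical.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and 95% CI from a 2 x 2 table.

    a: high-risk and resistant    b: high-risk and sensitive
    c: low-risk and resistant     d: low-risk and sensitive
    All four cells must be non-zero for this formula.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not the study data).
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(12, 4, 6, 20)
```

With small cell counts the interval becomes very wide, which is consistent with the broad confidence intervals that accompany large odds ratios in modest cohorts.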

HGSOC-SVM classifier and HGSOC OS and PFS
In the 56 testing patients, the HGSOC-SVM classifier defined 39 patients as low risk and 17 patients as high risk. OS differed significantly between low- and high-risk patients (median OS: 50.0 months, 95% CI: 41.8 to 53.5 months vs. 27 months, 95% CI: 19.5 to 35.2 months, P < .001) (Figure 3A). In the validation cohort of 95 patients, the HGSOC-SVM classifier defined 63 patients as low risk and 32 patients as high risk. Again, OS differed significantly between low- and high-risk patients (P = .017) (Figure 3B). PFS differed significantly between low- and high-risk patients in both the 56 testing patients (median PFS: 24.0 months, 95% CI: 19.1 to 28.9 months vs. 11 months, 95% CI: 6.2 to 15.8 months, P < .001) (Figure 3C) and the 95 patients of the independent validation cohort (median PFS: 23.0 months, 95% CI: 16.4 to 29.6 months vs. 14 months, 95% CI: 11.8 to 16.2 months, P < .001) (Figure 3D). Univariate associations of the HGSOC-SVM classifier, clinicopathological parameters, and expression of each of the six immunological markers with OS and PFS in the 56 testing patients and in the 95 patients from the validation cohort are shown in Tables 2 and 3.

DISCUSSION
Identification of patients who are likely to be chemoresistant before the commencement of chemotherapy would greatly aid clinical management. Traditionally, clinical factors such as age and tumor grade have been used to assess prognosis; however, they have poor predictive power [14,15]. Immunological biomarkers may have superior predictive capacity. Many studies have reported the value of single prognostic immunomarkers in ovarian cancer, but no consistent results have been obtained [9][10][11][12][13].
To improve the prognostic predictive value of individual genes, supervised methods such as decision trees and artificial neural networks can be used to combine independently informative markers and improve predictive value [16][17][18]. SVM is one of the most classic supervised learning algorithms and is useful for recognizing subtle patterns in complex datasets. The algorithm performs discriminative classification, learning by example to predict the classifications of previously unclassified [14,19,20]. BRCA1/BRCA2, P53, the EMT phenotype (particularly E-Cadherin expression), and Dicer1 expression have independently been reported as prognostic factors for chemotherapy resistance and survival of women with ovarian carcinoma [19][20][21][22][23][24].
We have also reported that higher BRCA1 expression is associated with chemosensitivity in ovarian cancer patients [25], and ovarian cancer patients with BRCA dysfunction tend to have a better outcome [11]. Moreover, these markers hold considerable promise as therapeutic targets. Agents targeting p53 and p-AKT are currently under investigation in clinical trials. Many studies have shown that tumors with BRCA1/BRCA2 dysfunction are highly responsive to PARP inhibitors [26]. Our results indicate that, although each of these molecules individually was only weakly associated with chemoresistance, PFS, and OS in HGSOC, the HGSOC-SVM classifier was a substantially stronger predictor than any single component. Thus, the HGSOC-SVM classifier was able to select the most informative factors that contributed independently and collectively to the prediction of HGSOC prognosis. The study has a few limitations. Although we validated our findings in an independent cohort, larger multicenter cohorts will be needed to strengthen our results, which may also improve sensitivity and specificity. Additionally, we chose candidate proteins based on published studies rather than whole-genome analyses, and thus we cannot exclude the possibility of better models comprising other molecular biomarkers. Finally, our results are based on retrospective data; prospective validation is needed to determine whether this method can support clinical decision making and improve outcomes for potentially chemoresistant patients.
In summary, we have developed an SVM-based prognostic method to predict chemotherapy response in ovarian cancer patients; this method has high sensitivity and specificity, with high positive and negative predictive value. The HGSOC-SVM classifier predicted the chemoresponse, PFS, and OS of patients better than any individual clinical parameter. Although further validation is necessary, this prognostic strategy may allow clinicians to select in advance the most appropriate therapies, beyond the standard paclitaxel and cisplatin regimen, for individual ovarian cancer patients, especially those who are potentially chemoresistant.

MATERIALS AND METHODS

Patient selection
With the approval and support of the Ethics/Institutional Review Boards of Tongji Hospital and Hubei Cancer Hospital (Wuhan, Hubei Province, China), HGSOC patients with FIGO stage IIIC or IV disease were identified. All patients had primary, biopsy-confirmed HGSOC and underwent debulking surgery and subsequent platinum/taxane-based chemotherapy. Informed consent was obtained from all patients. We enrolled two independent sets of HGSOC patients: a discovery cohort (n = 156) and a validation cohort (n = 95). Their characteristics are summarized in Table 1.
Follow-up information was updated in June 2014 through the patients' medical records and telephone-based follow-up review. Chemotherapy resistance and sensitivity were defined as tumor relapse or progression within 6 months or more than 6 months after completion of prior platinum-based chemotherapy, respectively. Primary therapy response was assessed according to the Response Evaluation Criteria in Solid Tumors (RECIST). Progression-free survival (PFS) was calculated from the time of surgery to the time of progression or recurrence. Optimal debulking was defined as residual tumors ≤ 1 cm.

Tissue microarray construction and IHC
After review of HE-stained sections, three 1.5 mm cores were identified from the most representative areas of tumor tissue and re-embedded into tissue microarray blocks with a Manual Tissue Microarrayer (Beecher Instruments, Sun Prairie, WI, USA). Array blocks were sectioned to produce 4 μm sections.

IHC scoring
Tissue samples were scored manually using methods previously described [17]. Briefly, the aggregate score is the average of the score of tumor-cell staining multiplied by the score of staining intensity. Tumor-cell staining was assigned a score using a semi-quantitative six-category grading system: 0, no tumor-cell staining; 1, 1-10% tumor-cell staining; 2, 11-25% tumor-cell staining; 3, 26-50% tumor-cell staining; 4, 51-75% tumor-cell staining; and 5, > 75% tumor-cell staining. Staining intensity was assigned a score using a semi-quantitative four-category grading system: 0, no staining; 1, weak staining; 2, moderate staining; and 3, strong staining. Every core was assessed individually, and the mean of the three readings was calculated for every case. The pattern of staining (cytoplasmic, membranous, nuclear) was also described in each case. Two trained histopathologists (Dr Su and Dr Li), blinded to clinical data, scored all cases, and a concordant score was obtained for 85.2% of the cases. A consensus score was recorded for the 14.8% of cases with a discordant score.
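The scoring rule above can be sketched in a few lines of Python: each core receives the product of its tumor-cell staining score and its intensity score, and the case score is the mean over the cores. The function names and example readings are hypothetical, and treating any positive fraction of up to 10% as category 1 is an assumption, since the text does not say how values between 0% and 1% were handled.

```python
def proportion_score(percent_positive):
    """Map the percentage of stained tumor cells to the 0-5 category."""
    if percent_positive <= 0:
        return 0
    if percent_positive <= 10:  # assumption: any positive value up to 10% scores 1
        return 1
    if percent_positive <= 25:
        return 2
    if percent_positive <= 50:
        return 3
    if percent_positive <= 75:
        return 4
    return 5


def core_score(percent_positive, intensity):
    """Score of one core: tumor-cell staining score times intensity score (0-3)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    return proportion_score(percent_positive) * intensity


def case_score(cores):
    """Mean score over the cores of one case; cores = [(percent, intensity), ...]."""
    scores = [core_score(percent, intensity) for percent, intensity in cores]
    return sum(scores) / len(scores)


# Hypothetical readings for the three cores of one case.
example = case_score([(60, 2), (30, 3), (5, 1)])
```

Under this rule a core score ranges from 0 to 15, and so does the per-case mean.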

Introduction of SVM and details of experiments
SVM was used to predict whether a patient would have chemotherapy-resistant HGSOC based on clinicopathological features and immunomarkers. We performed a set of experiments in the discovery cohort of 156 HGSOC patients, which was randomly subdivided into 100 patients for SVM training and 56 for testing. To assess the model accurately, we enrolled an independent cohort of 95 patients in accordance with the modeling standards. These patients were input into the model as a double-blind test set to evaluate the sensitivity and specificity of the model (Supplementary Figure 1). Before training, all continuous data were preprocessed by standardizing to zero mean and unit variance. The same offset and scaling were applied to the test data. We adopted the SVM-recursive-feature-elimination (SVM-RFE) algorithm for feature selection. The radial basis function kernel was used, since our classification problem was nonlinear. During the training phase, 10-fold cross validation was used to determine the optimal values of the kernel parameter a and the regularization parameter C with a 10 × 10 grid search in the region −10 < log2 a < 10 and −10 < log2 C < 10, and step size of log2 1. The algorithm was performed 40 times, and each time one feature subset was excluded. During training, we evaluated the performance of the SVM using the 10-fold cross validation error. After the feature subset with the best 10-fold cross validation performance was selected, we predicted the labels of the test and validation samples and recorded the actual performance of the trained SVM models. The programs were coded with MATLAB software (MathWorks, Natick, MA, USA).
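The preprocessing step, in which the offset and scaling learned on the training data are reused for the test data, can be illustrated with the following Python sketch. It covers standardization only, not SVM-RFE or the grid search, and it is not the authors' MATLAB code. The function names and numbers are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def fit_scaler(train_rows):
    """Learn per-feature mean and standard deviation from the training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # Assumption: population standard deviation; constant features are left unscaled.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(rows, means, stds):
    """Apply a previously learned offset and scaling to any set of rows."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical feature values (two features per patient), for illustration only.
train = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
test = [[4.0, 20.0], [8.0, 40.0]]

means, stds = fit_scaler(train)
train_scaled = apply_scaler(train, means, stds)
test_scaled = apply_scaler(test, means, stds)  # reuses the training statistics
```

Computing the statistics on the training rows alone keeps information from the test and validation samples out of model fitting.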

Statistical analysis
Differences in patient characteristics between groups were tested with the Pearson χ² test. To identify the association between each feature and chemotherapy resistance, univariate logistic regression was performed. Survival curves were estimated by the Kaplan-Meier method. Differences between survival curves were compared by the log-rank test. Univariate analyses of prognostic factors were performed with Cox proportional hazards regression modeling. In the logistic and Cox proportional hazards regression analyses, all immunological features were dichotomized according to cutoff values based on receiver operating characteristic (ROC) curve analysis. A significant difference was defined as a P value of ≤ .05 from a two-tailed test. All statistical analyses were performed with SPSS 16.0 for Windows (SPSS, Chicago, IL, USA).
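For readers less familiar with the Kaplan-Meier method, the following Python sketch shows the product-limit estimate and how a median survival time is read from it. It is an illustration only and not the SPSS implementation used in the study; the function names and follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time per patient (e.g., months)
    events: 1 if the event was observed, 0 if the patient was censored
    Returns [(time, survival_probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed  # events and censored patients leave the risk set
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical follow-up data, for illustration only.
curve = kaplan_meier([5, 8, 8, 12, 15, 20], [1, 1, 0, 1, 0, 1])
median = median_survival(curve)
```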