A three-protein signature and clinical outcome in esophageal squamous cell carcinoma.

Current staging is inadequate to precisely predict clinical outcome of esophageal squamous cell carcinoma (ESCC) and determine treatment choices, which vary from operation alone to intensive multimodal regimens. The purpose of this study is to investigate the prognostic value of an immunohistochemistry-based three-protein signature model in patients with ESCC. We determined the protein expression of Annexin II, cofilin 1, ezrin, fascin, kindlin-2, moesin, MTSS1, myosin-9, profilin-1, Rac1, radixin, ROCK2, talin, tensin and villin 1 in a test cohort including 110 formalin-fixed, paraffin-embedded esophageal curative resection specimens by tissue microarrays (TMAs). A three-protein signature elicited from the protein cluster, Annexin II, kindlin-2, and myosin-9, was validated by TMAs on an independent cohort of 147 specimens. Expression of the three-protein signature was highly predictive of ESCC overall survival (OS) and disease-free survival (DFS) in both generation and validation datasets. Regression analysis shows that this three-protein signature is an independent predictor for OS and DFS. Furthermore, the predictive ability of these 3 biomarkers in combination is more robust than that of each individual biomarker. This study demonstrates a clinically applicable prognostic model that accurately predicts ESCC patient survival and/or tumor recurrence, and thus could serve as a complement to current risk stratification approaches.


INTRODUCTION
Esophageal squamous cell carcinoma (ESCC) remains the most common cause of cancer-induced mortality in China, particularly in areas near the Taihang Mountain range and coastal Chaoshan [1]. Since treatment choices vary from operation alone to intensive multimodal regimens, staging is critical for determination of therapeutic modality, but current markers remain inadequate to precisely predict clinical outcome of ESCC. Therefore, identifying patients at high risk and improving the overall prognosis of the disease are urgent needs for the current clinical management of ESCC [2].
Oncotarget 5436 www.impactjournals.com/oncotarget
Gene-expression profiling is useful for identifying sets of genes of prognostic importance in various types of cancer [3][4][5][6]. Nevertheless, the use of microarrays in clinical practice is limited by the overwhelming number of genes identified by gene profiling, lack of both reproducibility and independent validation, and need for complicated statistical analyses [7]. Different prognostic gene sets are identified when the microarray platform and the analytic strategy vary [8]. Moreover, obtaining suitable specimens from patients is still a technical challenge for gene-expression profiling, which requires frozen tissue for analysis [9]. To put these expression profiles into clinical practice, it is essential to identify the appropriate number of genes and develop a profile that can be run with a conventional assay. The aim of this study was to evaluate the expression of 15 cytoskeleton proteins and their correlation with the clinical outcome of ESCC patients. We also developed a technically simple immunohistochemistry-based three-protein signature model for current clinical risk stratification approaches and tested its performance using an independent validation dataset of ESCC tissue samples.

Immunohistochemical Characteristics of 15 Biomarkers
Initially, more than 15 biomarkers were examined by immunohistochemical staining. Some molecular hallmarks, however, were not optimized successfully or showed ambiguous or no immunostaining. All 15 biomarkers mainly stained the tumour cell cytoplasm and showed a variety of staining patterns of different staining intensity and positive cell percentage. Based on the staining intensity, all biomarkers displayed four immunostaining phenotypes in ESCC: negative staining, weakly positive staining, moderately positive staining and strongly positive staining. The staining patterns of the 15 biomarkers were focal, scattered and diffuse with different staining intensities. Cofilin 1, ezrin, fascin, moesin, myosin-9, radixin, ROCK2, talin, tensin and villin 1 were constitutively observed in the cytoplasm, and Annexin II was observed in the membrane and cytoplasm. Kindlin-2, MTSS1, profilin-1 and Rac1 showed not only positive cytoplasmic immunostaining, but also strong nuclear immunostaining. The staining pattern of the 15 biomarkers is summarized in Table 1. Representative pictures of the 15 biomarkers with low- and high-expression are shown in Figure 1.

Prognostic Significance of 15 Biomarkers and Clinicopathological Characteristics
The 1-, 3-, and 5-year OS and DFS percentages for the generation dataset were 87.3% and 88.0%, 59.3% and 57.9%, and 44.4% and 45.0%, respectively. According to the Kaplan-Meier analysis, Annexin II and myosin-9 were closely associated with OS among patients with ESCC, and kindlin-2 was of only borderline significance in the relatively small generation cohort. However, for the remaining 12 biomarkers, there was no statistically significant association between protein expression and clinical outcome of ESCC (Figure 2). Kaplan-Meier analysis for DFS showed that three biomarkers (Annexin II, kindlin-2 and myosin-9) correlated with recurrence (Figure 3A-3C). On univariate analyses, the three biomarkers and two clinicopathological factors (regional lymph nodes and TNM classification) were all confirmed as prognostic factors for OS and DFS, while other clinicopathological indexes (age, gender, tumor location, tumor size, primary tumor, histologic grade and therapy) had no prognostic significance for OS and DFS (Table 2).

The Predictive Model Based on the Three Biomarkers
A three-protein signature model, involving Annexin II, kindlin-2, and myosin-9, resulted from our analysis. The risk score of the 3-protein signature predictive model was calculated as described in the equation: Y = β1×X1 + β2×X2 + β3×X3. Here, Y is the final predictive risk score, Xn is the expression level of biomarker n (high-expression=1, low-expression=0), and βn is its corresponding regression coefficient from univariate Cox proportional hazards regression analysis. The corresponding coefficients were as follows: β1=0.808, β2=0.522 and β3=0.829. The median (50th percentile) of the final risk scores was 0.808 (range, 0 to 2.159). All patients were ranked and divided into high-risk (risk score >0.808) and low-risk groups (risk score ≤0.808).
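Because each biomarker contributes either its full coefficient or nothing, the risk score can take only eight values. The following minimal sketch enumerates them and applies the reported cut-off of 0.808. It assumes the weighted-sum form described in the methods (expression coded as 1 for high and 0 for low); the names `BETAS`, `CUTOFF`, `risk_score` and `table` are illustrative, and the code deliberately does not assume which biomarker corresponds to which coefficient.

```python
from itertools import product

# Univariate Cox coefficients reported in the text (beta1, beta2, beta3).
BETAS = (0.808, 0.522, 0.829)
CUTOFF = 0.808  # median risk score reported for the generation dataset


def risk_score(expression):
    """Sum of beta_n * X_n, where X_n is 1 (high-expression) or 0 (low-expression)."""
    return sum(b * x for b, x in zip(BETAS, expression))


# Enumerate all 2**3 expression patterns together with their risk group.
table = []
for pattern in product((0, 1), repeat=3):
    score = round(risk_score(pattern), 3)
    group = "high-risk" if score > CUTOFF else "low-risk"
    table.append((pattern, score, group))

for row in table:
    print(row)
```

Under these assumptions the scores span 0 (no biomarker highly expressed) to 2.159 (all three highly expressed), which matches the range reported above, and a patient whose only highly expressed biomarker is the one weighted 0.808 sits exactly on the cut-off and is therefore classed as low-risk.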
The 5-year OS and DFS rates in the high-risk group were significantly lower than those in the low-risk group (21.0% and 21.3% vs. 53.5% and 54.3%, P<0.001 and P<0.001, Figure 4A, 4B). Using univariate analyses, we found a significant correlation between the three-protein signature and prognosis in all the cases (P<0.001 for OS and DFS, Table 2). According to multivariable Cox proportional hazard regression analysis, both regional lymph nodes and the 3-protein signature were strong and independent prognostic factors for both OS and DFS (for regional lymph nodes, hazard ratio=2. 36

Predictive Power of the Predictive Model
In comparison with a single biomarker, the predictive power of the 3-protein signature was more robust than that of Annexin II, kindlin-2, or myosin-9 alone, as revealed by ROC curve analysis (Figure 5A, 5C). The specificity and sensitivity were 83% and 40.4%, respectively, and the area under the curve (AUC) with 95% CI for OS was 0.617. Moreover, the predictive power of the 3-protein signature, which was almost the same as that of regional lymph nodes and TNM classification, was higher than that of other clinicopathological indices (age, gender, tumor location, tumor size, primary tumor, histologic grade and therapy) (Figure 5B, 5D).
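As a rough consistency check on these figures, the sketch below computes the area under an ROC curve that has a single operating point. It assumes the signature is evaluated as one dichotomous (high-risk vs. low-risk) classifier, which the text does not state explicitly; the function name `auc_single_cutoff` is illustrative. With one cut-off, the ROC curve consists of two straight segments and the trapezoidal area reduces to (sensitivity + specificity)/2.

```python
def auc_single_cutoff(sensitivity, specificity):
    """Trapezoidal AUC of an ROC curve with one operating point.

    The curve runs (0, 0) -> (1 - specificity, sensitivity) -> (1, 1).
    """
    fpr = 1.0 - specificity
    left = 0.5 * fpr * sensitivity                    # triangle under the first segment
    right = 0.5 * (1.0 - fpr) * (sensitivity + 1.0)   # trapezoid under the second segment
    return left + right


# Sensitivity and specificity as reported in the text for OS.
auc = auc_single_cutoff(sensitivity=0.404, specificity=0.83)
print(round(auc, 3))
```

With the reported sensitivity of 40.4% and specificity of 83%, this gives (0.404 + 0.83)/2 = 0.617, which agrees with the AUC reported for OS.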

Validation of the Predictive Model
We validated the 3-protein signature predictive model by TMAs on another independent cohort of 147 specimens with ESCC. The results were similar to those in the generation dataset. Annexin II, kindlin-2, and myosin-9 were optimized successfully to stain tissue and were prognostic in the validation dataset, although Annexin II was only borderline significant (Figure 3D-3F). Only kindlin-2 and myosin-9 were closely related to tumor recurrence (Figure 3G-3I). ESCC patients in the high-risk group had a shorter 5-year OS and DFS than those in the low-risk group (33.7% and 32.2% vs. 50.8% and 47.7%, P=0.005 and P=0.016, Figure 4C, 4D). On univariate analyses, the three biomarkers and four clinicopathological factors (histologic grade, regional lymph nodes, TNM classification and therapy) were all confirmed as prognostic factors for OS, while only myosin-9 and three clinicopathological factors (histologic grade, regional lymph nodes and TNM classification) were risk factors for tumour recurrence of ESCC. Using univariate analyses, we also found a significant correlation between the three-protein signature and prognosis in all the cases (P=0.005 for OS and P=0.018 for DFS). In multivariate analysis, the 3-protein signature model was still a strong and independent prognostic factor, together with histologic grade and regional lymph nodes (Table 2). The P values for Annexin II indicated some difference between the generation dataset and the validation dataset, which may be due mainly to the sample size and limited follow-up time.
The ROC curve analysis revealed that the predictive ability of the 3-protein signature was higher than that of a single biomarker and other clinicopathological indices, and approximated that of regional lymph nodes and TNM classification (Figure 5E-5H).

Combination of the three-protein signature and TNM classification
Our results indicated that there is a significant correlation between the three-protein signature and prognosis in all the cases. In current clinical practice, TNM classification is considered the optimal prognostic indicator. Therefore, we next considered these characteristics together. Patients were subdivided into three subgroups: low-risk + stage I+II, high-risk + stage III+IV, and other (low-risk + stage III+IV and high-risk + stage I+II). The high-risk + stage III+IV patients had the poorest prognosis, while the low-risk + stage I+II subgroup had the best prognosis. Kaplan-Meier curves showed significant differences in OS among the three groups (P<0.001 in both the generation dataset and the validation dataset, Figure 6A, 6B). However, cumulative survival appeared to be the same between the high-risk + stage III+IV subgroup and the other subgroup at about 50 months in the validation dataset. This may be because the median follow-up time of the validation dataset was only 28.8 months. The ROC curve analysis indicated that the combination was the most robust among four indices (the combination of the three-protein signature and TNM classification, 3-protein signature, regional lymph nodes and TNM classification) (Figure 6C-6F).
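The three-way subdivision described above can be written as a small lookup. This is only a sketch of the grouping rule as stated in the text; the function name `combined_subgroup` and the string labels for risk group and stage are illustrative assumptions.

```python
def combined_subgroup(risk_group, stage):
    """Map a signature risk group and a TNM stage to one of three subgroups."""
    if risk_group not in ("low-risk", "high-risk"):
        raise ValueError(f"unexpected risk group: {risk_group!r}")
    early = stage in ("I", "II")
    late = stage in ("III", "IV")
    if not (early or late):
        raise ValueError(f"unexpected stage: {stage!r}")
    if risk_group == "low-risk" and early:
        return "low-risk + stage I+II"
    if risk_group == "high-risk" and late:
        return "high-risk + stage III+IV"
    # Discordant cases: low-risk + stage III+IV, or high-risk + stage I+II.
    return "other"


print(combined_subgroup("high-risk", "III"))
print(combined_subgroup("low-risk", "IV"))
```

Keeping the two discordant combinations in a single "other" subgroup mirrors the text; splitting them would give four groups and smaller numbers per group.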

DISCUSSION
This study generates and internally validates a 3-protein signature predictive model that predicts prognosis and tumor recurrence in two entirely independent cohorts of ESCC patients. The methodology performed in our predictive model is a cross-validation approach based on immunohistochemistry and statistical analysis. The reason for adopting this methodology is that formalin-fixed, paraffin-embedded tissue has far greater availability than other types of samples, such as fresh-frozen samples, and immunohistochemistry is rapid, convenient, robust, technically simple, easy to interpret, reproducible, and cost-effective for clinical practice, unlike gene expression at the mRNA level, for example.
Molecular signatures are reported frequently and have proven to be prognostic in various tumors. Molecular signatures, similar to ours, have been shown to predict survival and/or tumor recurrence in breast cancer [10,11], lung cancer [12], colorectal cancer [13], hepatocellular carcinoma [14,15], bladder cancer [16], classic Hodgkin's lymphoma [17], neuroblastoma [18] and even esophageal and gastroesophageal junctional adenocarcinoma [19]. In ESCC, however, the application of molecular signatures is less advanced. Therefore, this study provides a useful framework for future work on generating similar prognostic models.
The biomarkers involved in this study were carefully selected according to our previous studies and other reports. The cytoskeleton, which plays a prominent role in the cell cycle, morphogenesis and migration, is closely associated with human tumors [20]. Recent studies from our laboratory and others have identified more than 200 different cytoskeleton constituents or binding proteins. However, only about 45 kinds have been reported in ESCC [21]. Hence, the 15 biomarkers used here include biomarkers that are reported for the first time. In addition, our 15 biomarkers are all actin-binding proteins [22]. Annexin II is a member of the calcium-dependent phospholipid-binding protein family, plays a role in the regulation of cellular growth and in signal transduction pathways, is overexpressed in hepatocellular carcinoma [23], colorectal cancer [23,24], breast cancer [23,25] and ESCC [23,26], and has also been identified as a potential target for therapy [27,28]. Our results are consistent with those of Zhang X and Feng JG [23,26]. Kindlin-2 is a member of the kindlin family of focal adhesion proteins, and is involved in integrin signaling and linkage of the actin cytoskeleton to the extracellular matrix. Although few studies report on kindlin-2, it has been found to be up-regulated in breast and gastric cancer cell lines [29,30], but down-regulated in leiomyosarcomas [31] and mesenchymal cancer cells [32]. Targeting kindlin-2 may improve drug efficacy and reduce the dose of drug required to treat prostate cancer [33]. Our results reveal that over-expression of kindlin-2 is associated with poor prognosis of ESCC.

Figure 5: ...-protein signature model compared with single biomarkers and other clinical prognostic indices according to receiver operating characteristic (ROC) curves and areas under the curve (AUCs) with 95% CI in the generation dataset (A, B and C, D) and the validation dataset (E, F and G, H).

Figure 6: ... (A and B). Predictive ability of the combination compared with 3-protein signature, regional lymph nodes and TNM classification by receiver operating characteristic (ROC) curves (C and D) and areas under the curve (AUCs) with 95% CI are shown in E and F.
Myosin-9 is a myosin IIA heavy chain, and is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Myosin-9 has been reported to be a target for anti-invasive treatment in human MCF-7 breast cancer cells [34] and gastric cancer [35]. High expression of myosin-9 is correlated with short survival in lung adenocarcinoma and ESCC [36], similar to our results.
The 3 biomarkers are heterogeneously expressed in clinical samples, and the positive frequency of a single biomarker is commonly <10%. No patient expressed all 15 biomarkers and only 7.0% (18/257) of patients expressed the three biomarkers simultaneously. Therefore, it is reasonable to combine multiple biomarkers instead of relying on an individual marker, which will significantly increase both the predictive range and power of the predictive model.
In this study, the three markers (Annexin II, kindlin-2 and myosin-9) combined prove to be significant predictors of OS and DFS. ESCC patients predicted to be high-risk had a very poor prognosis and were more prone to experience tumour recurrence. The predictive power of the 3-protein signature closely approached the predictive power of clinical staging, and that of the combination of the 3-protein signature and TNM classification was stronger than clinical staging. The 3 biomarkers employed in our predictive model may provide a novel therapeutic candidate for ESCC. Certainly, this study is a retrospective study that is limited to patients with ESCC undergoing curative resection. Prospective studies involving larger populations will be required to further validate the usefulness of this system. In conclusion, we demonstrate a clinically applicable prognostic model that accurately predicts ESCC patient survival and/or tumor recurrence, and thus can serve as a complement to current clinical risk stratification approaches.

Patients and Tissue Specimens
Two independent datasets (ESCC tissues) were randomly collected from patients with ESCC undergoing curative resection at Shantou Central Hospital from 2000 to 2006 (generation dataset, n=110) and from 2007 to 2009 (validation dataset, n=147), and embedded in paraffin. Patients in the generation dataset were followed up for a maximum period of 148.7 months and a median of 36.5 months, while follow-up of patients in the validation dataset was terminated on 20 November 2013 at a median of 28.8 months. Overall survival (OS) was defined as the interval between surgery and death or the last observation taken. Disease-free survival (DFS) was defined as the interval between the date of surgery and the date of diagnosis of any type of relapse or death. All the tumors were confirmed as ESCC by the pathologists in the Clinical Pathology Department of the Hospital, and the cases were classified according to the seventh edition of the tumor-node-metastasis (TNM) classification of the International Union against Cancer. Evaluation of tumor differentiation was based on histological criteria of the guidelines of the WHO Pathological Classification of Tumors. Information on age, gender, stage of disease, therapy and histopathological factors (such as tumor location, tumor size, primary tumor, histologic grade, regional lymph nodes) was obtained from the medical records. Patient data is summarized in Table 3, and there is no statistical difference between the two datasets using the chi-squared test. The study was approved by the ethical committee of the Central Hospital of Shantou City and the ethical committee of the Medical College of Shantou University, and written informed consent was obtained from all surgical patients to use resected samples for research.

Tissue Microarrays (TMAs) and Immunohistochemistry
Markers that were used included Annexin II, cofilin 1, ezrin, fascin, kindlin-2, moesin, Metastasis suppressor 1 (MTSS1), myosin-9, profilin-1, Ras-related C3 botulinum toxin substrate 1 (Rac1), radixin, Rho-associated coiled-coil containing protein kinase 2 (ROCK2), talin, tensin and villin 1. These markers were analyzed in a test cohort of 110 formalin-fixed, paraffin-embedded esophageal curative resection specimens (generation dataset) using tissue microarrays (TMAs), and then Annexin II, kindlin-2 and myosin-9, which were significantly related to clinical outcome in TMA analysis of the generation dataset, were further validated by the validation dataset TMAs. TMA construction has been described in our previous studies [37][38][39][40]. Two tissue cores of 1.8 mm in diameter were taken from the donor blocks and transferred to the recipient paraffin block at defined array positions. These microarrays were serially sectioned (4 µm) and stained with hematoxylin and eosin to ensure tissue sampling and completeness. The unstained sections were baked overnight at 56°C in preparation for immunohistochemistry staining.
Primary antibodies used in this study are shown in Table 1. Immunohistochemistry was carried out by a two-step protocol (PV-9000 Polymer Detection System, ZSGB-BIO, Beijing, China) as previously described [40].

Evaluation of Immunohistochemical Variables
Tissue sections were blindly assessed by three independent histopathologists (Cao HH, Wang SH, and Shen JH). Discrepancies were resolved by consensus. Positive reactions were defined as those showing brown signals in the cell cytoplasm. Each separate tissue core was scored on the basis of the intensity and area of positive staining. The intensity of positive staining was scored as follows: 0, negative; 1, weak staining; 2, moderate staining; 3, strong staining. The rate of positive cells was scored on a 0-4 scale as follows: 0, 0-5%; 1, 6-25%; 2, 26-50%; 3, 51-75%; 4, >75%. If the positive staining was homogeneous, a final score was achieved by multiplication of the two scores, producing a total range of 0-12. When the staining was heterogeneous, each component was scored independently and the component scores were summed. For example, a specimen containing 25% tumor cells with moderate intensity (1×2=2), 25% tumor cells with weak intensity (1×1=1), and 50% tumor cells without immunoreactivity (2×0=0), received a final score of 2+1+0=3. The mean value of the two scores was considered representative of one tumour. For statistical analysis, each protein expression score was grouped into two subgroups, high-expression and low-expression, according to X-tile [41].
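The scoring rules above can be sketched in a few lines. This is an illustrative reading of the text, not the authors' software: the function names are hypothetical, percentages that fall between the stated integer bands (for example 5.5%) are assigned to the higher band's lower neighbour by assumption, and the "two scores" averaged per tumour are taken to be the scores of the two tissue cores.

```python
def percent_score(percent):
    """Score the share of positive cells on the 0-4 scale given in the text."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent out of range: {percent}")
    if percent <= 5:
        return 0
    if percent <= 25:
        return 1
    if percent <= 50:
        return 2
    if percent <= 75:
        return 3
    return 4


def core_score(components):
    """Score one tissue core.

    components: list of (percent of tumor cells, intensity 0-3) pairs.
    A homogeneous core has a single component; a heterogeneous core has
    several, whose products are summed.
    """
    return sum(percent_score(pct) * intensity for pct, intensity in components)


def tumor_score(core_a, core_b):
    """Mean of the two core scores, taken as representative of one tumour."""
    return (core_score(core_a) + core_score(core_b)) / 2


# Worked example from the text: 25% moderate, 25% weak, 50% negative.
example = [(25, 2), (25, 1), (50, 0)]
print(core_score(example))
```

The worked example from the text evaluates to 1×2 + 1×1 + 2×0 = 3, and a homogeneous core with more than 75% strongly stained cells reaches the maximum of 12.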

Construction of a Weighted OS Predictive Score Algorithm
We used univariate Cox proportional hazards regression analysis to evaluate the association between clinical outcome and the expression of each biomarker. Subsequently, we developed a model for estimation of prognosis risk as follows. A patient's poor-prognosis risk score was derived by the summation of each biomarker's expression level (high-expression=1, low-expression=0) multiplied by its corresponding regression coefficient [42][43][44]. All patients were then dichotomized into high-risk and low-risk groups using the 50th percentile (median) cut-off of the final risk score as the threshold value.
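A minimal sketch of this two-step procedure, weighted summation followed by a median split, is given below. The biomarker keys (`marker_1` to `marker_3`), the function names and the five-patient cohort are hypothetical and serve only to illustrate the mechanics; the coefficient values are those reported in the results, without assuming which biomarker each one belongs to.

```python
from statistics import median

# Hypothetical labels; the values are the coefficients reported in the results.
betas = {"marker_1": 0.808, "marker_2": 0.522, "marker_3": 0.829}


def risk_scores(cohort, betas):
    """cohort: list of dicts mapping biomarker name -> 1 (high) or 0 (low)."""
    return [sum(betas[m] * patient[m] for m in betas) for patient in cohort]


def dichotomize(scores):
    """Split at the median: strictly above is high-risk, otherwise low-risk."""
    cutoff = median(scores)
    groups = ["high-risk" if s > cutoff else "low-risk" for s in scores]
    return cutoff, groups


# Invented five-patient cohort, for illustration only.
cohort = [
    {"marker_1": 0, "marker_2": 0, "marker_3": 0},
    {"marker_1": 1, "marker_2": 0, "marker_3": 0},
    {"marker_1": 0, "marker_2": 1, "marker_3": 0},
    {"marker_1": 1, "marker_2": 1, "marker_3": 1},
    {"marker_1": 0, "marker_2": 0, "marker_3": 1},
]
scores = risk_scores(cohort, betas)
cutoff, groups = dichotomize(scores)
print(cutoff, groups)
```

Because the score takes only a handful of discrete values, many patients can tie at the median, so the two groups need not be equal in size; here ties are placed in the low-risk group, consistent with the "≤" rule stated in the results.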

Statistical Analysis
Statistical analyses were performed using SPSS 13.0 for Windows. Comparisons between groups were performed using the chi-squared test and one-sample t test. Cumulative survival time was calculated by the Kaplan-Meier method and analyzed by the log-rank test. Univariate and multivariate analyses were based on the Cox proportional hazards regression model. Receiver operating characteristic (ROC) curve analysis was used to determine the predictive value of the parameters, and the differences in the area under the curve (AUC) were detected by using GraphPad Prism 5. The Kendall tau-b rank correlation analysis was used to evaluate the association between our 3-protein signature expression and clinicopathological factors. A P value of less than 0.05 was considered statistically significant.
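For readers unfamiliar with the Kaplan-Meier method, the following sketch shows the product-limit estimate in plain Python. It is not the SPSS procedure used in this study; the function name `kaplan_meier` and the five follow-up times are invented for illustration, and censored patients are treated as still at risk at their censoring time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times: follow-up in months; events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Invented data: five patients, two of them censored.
curve = kaplan_meier(times=[5, 8, 8, 12, 20], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

In this toy cohort the estimate steps down to 0.8 at 5 months, 0.6 at 8 months and 0.3 at 12 months; the censored patient at 20 months leaves the curve unchanged. Comparing two such curves, as in the high-risk vs. low-risk analysis, additionally requires the log-rank test, which is not shown here.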