A four-gene signature predicts survival in clear-cell renal-cell carcinoma

Clear-cell renal-cell carcinoma (ccRCC) is the most common pathological subtype of renal cell carcinoma (RCC), accounting for about 80% of RCC. To find potential prognostic biomarkers in ccRCC, we propose a four-gene signature for evaluating the prognosis of ccRCC. SurvExpress and immunohistochemical (IHC) staining of tissue microarrays were used to analyze the association between the four genes and the prognosis of ccRCC. Data from the TCGA dataset revealed a prognostic value for the four genes (PTEN, PIK3C2A, ITPA and BCL3). Further analysis suggested that the four-gene signature predicted survival better than any of the four genes alone. Moreover, IHC staining gave results consistent with TCGA, indicating that the signature was an independent prognostic factor of survival in ccRCC. Univariate and multivariate Cox proportional hazard regression analyses were conducted to verify the association of clinicopathological variables and the four genes' expression levels with survival. The results further confirmed that the risk (four-gene signature) was an independent prognostic factor of both overall survival (OS) and disease-free survival (DFS) (P < 0.05). In conclusion, the four-gene signature was correlated with survival in ccRCC and may therefore have clinical value for predicting patient prognosis.

nephrectomy remains the main therapy. Researchers have found that only early-stage ccRCC (T1-2) can be treated with surgery and may have a good long-term prognosis [6]. For metastatic and late-stage ccRCC (T3-4), the curative effect of chemoradiotherapy and surgery is poor [7,8]. Molecular-targeted therapies, such as axitinib, sorafenib and temsirolimus, have an efficacy of only 10% to 40% [9]. Immunotherapy also has low efficacy. Regardless of the therapeutic strategy, the long-term outcome is poor for most ccRCC patients. Therefore, investigation of the molecular mechanism of ccRCC is necessary to better understand the behavior of the disease, predict the prognosis, inform rational treatment programs, and provide novel therapeutic targets.
Many genes have been reported to be involved in tumorigenesis and tumor progression, and have been found to be correlated with patient prognosis and survival. For example, phosphatase and tensin homologue deleted on chromosome 10 (PTEN) is one of the most frequently mutated human tumor suppressor genes [10]. PTEN is located on human chromosome 10q23.3 and encodes a protein of 403 amino acids. It functions as a dual protein and lipid phosphatase and has been reported to inhibit cell growth and survival, suggesting a critical tumor suppressor effect [10,11]. In recent years, many studies have shown that PTEN is frequently affected by deletions, genetic mutations or methylation in a variety of cancers, such as prostate cancer and renal cell carcinoma [11][12][13][14]. In addition, PTEN has been found to be closely related to tumor metastasis and invasion. Loss of PTEN can also result in abnormal activation of the phosphatidylinositol 3-kinase/protein kinase B (PI3K/Akt) pathway, which regulates proliferation, apoptosis, survival, translation, differentiation and cellular metabolism [15].
The PI3K/Akt pathway can also become activated by the upregulation of kinases in the pathway, such as phosphatidylinositol-4-phosphate 3-kinase catalytic subunit type 2 alpha (PIK3C2A), which belongs to the class II PI3Ks and plays an essential role in angiogenesis [16,17]. In fact, upregulation of PIK3C2A has been reported in several cancers [18], such as breast cancer, cervical cancer, lung cancer, stomach cancer, colon cancer, liver cancer and oral squamous cell carcinoma [19].
Inosine triphosphate pyrophosphohydrolase (ITPA) is an enzyme that is involved in the 6-mercaptopurine metabolic pathway and is responsible for converting inosine triphosphate (ITP) back to inosine monophosphate (IMP), thereby preventing the accumulation of the toxic metabolite ITP. In recent years, ITPA has been reported to be one of five mixed-lineage leukemia associated genes, and its upregulation may lead to amplification of the Mixed Lineage Leukemia (MLL) gene region of 11q23 [20]. ITPA expression is also associated with event-free survival and relapse rates in children with acute lymphoblastic leukemia who are undergoing maintenance therapy [21]. Further evidence shows that the absence of functional ITPA activity can result in elevated mutagenesis and accumulation of non-canonical nucleotides, which may cause DNA damage and cancer, indicating a significant role of ITPA in preventing base analog-induced apoptosis, DNA damage and mutagenesis in human cells [22]. Conversely, overexpression of ITPA has been reported in various cancer cell lines, such as colon, lung, liver, pancreatic, and brain [23]. In addition, the expression level of ITPA is higher in stage III melanoma patients with poor prognosis than in patients with a good prognosis [24].
B-cell lymphoma 3 (BCL3) is a proto-oncogene that belongs to the Iκ-B family. It has been reported to be upregulated in hematological malignancies, as well as in a wide range of solid tumors [25,26], including breast cancer, ovarian cancer, colorectal cancer and non-small-cell lung cancer, and it is also associated with survival and relapse frequency [27][28][29]. Furthermore, BCL3 has been reported to exhibit an anti-apoptotic effect in cancer [30,31], consistent with its role as a proto-oncogene.
However, no study has yet clarified the relationship between the four genes (PTEN, PIK3C2A, ITPA and BCL3) taken together and disease. Some reports have revealed pairwise links among the four genes. For example, PI3K/PTEN expression is frequently deregulated in many malignancies, contributing to the upregulation of the PI3K/Akt/mTOR pathway. Activation of the PI3K/PTEN/Akt/mTOR pathway has been implicated in both the pathogenesis of malignancies and resistance to anticancer therapies [32]. BCL-3 has been reported to be increased in human colorectal cancers and can promote cell survival in the tumor microenvironment. It may protect colorectal adenoma/carcinoma cells from apoptosis through activation of the AKT pathway, mediated by the PI3K/mTOR pathways [33].
However, analyses on the association between the four genes' expression and survival in ccRCC patients remain limited. In this study, SurvExpress analysis, tissue microarrays and IHC techniques were used to detect the expression of PTEN, PIK3C2A, ITPA and BCL3 in ccRCC, and to explore their relationship with survival. Our findings may provide valid indicators for clarifying the pathogenic mechanism of ccRCC and predicting the prognosis.

Survival analysis with SurvExpress
We analyzed PTEN, PIK3C2A, ITPA and BCL3 expression in the TCGA dataset (KIRC) from SurvExpress. The patients from TCGA (n = 468) were classified into predicted low- and high-risk groups according to the Prognostic Index (PI). The results demonstrated that low expression of PTEN and PIK3C2A was significantly correlated with high risk, poor prognosis and shorter OS time (Figure 1A, 1B), while high expression of ITPA and BCL3 indicated high risk, poor prognosis and shorter OS time (Figure 1C, 1D). Moreover, survival differences between the predicted low- and high-risk groups were evaluated with Kaplan-Meier survival curves. Patients with high risk had a significantly shorter OS time than those with low risk (Figure 1). P < 0.05 was considered to be statistically significant.
Figure 1 (caption fragment): Kaplan-Meier survival curves were also constructed to reveal the relationship between predicted risk of ccRCC patients and OS time. Patients with high risk had a significantly shorter OS time than those with low risk (A-D). Green and red lines indicate low- and high-risk groups, respectively. Cens: Censored; Event: Death; Prog. Idx.: Prognosis Index.

Expression of PTEN, PIK3C2A, ITPA and BCL3 in ccRCC and their relationship with survival
To verify the relationship of PTEN, PIK3C2A, ITPA and BCL3 expression with survival and risk, we first performed IHC analysis on tissue microarrays. The expression levels of the four proteins were each divided into two groups (negative and positive expression) based on the staining score. A total score of 0-4 points was defined as negative expression, whereas 5-6 points was considered positive expression. The positive expression rates of PTEN, PIK3C2A, ITPA and BCL3 in 174 cases of ccRCC were 48.9% (Figure 2A), 63.8% (Figure 2B), 34.5% (Figure 2C) and 23.6% (Figure 2D), respectively. Furthermore, Kaplan-Meier survival curves were constructed to analyze the relationship between PTEN, PIK3C2A, ITPA and BCL3 expression and OS as well as DFS. The results were consistent with those from the TCGA dataset. Negative expression of PTEN and PIK3C2A was correlated with shorter OS and DFS time and worse prognosis (

The four-gene signature predicted survival in ccRCC
We analyzed the ability of the four-gene signature to predict survival using TCGA data with SurvExpress. For the four-gene signature, the PI of the 468 patients ranged from -0.1001 to 4.1239, with an optimal cut-off value of 2.43. Patients with a PI lower than 2.43 were assigned to the low-risk group (n = 249), while those with a PI higher than 2.43 were assigned to the high-risk group (n = 219). The analysis demonstrated that high risk was correlated with low expression of PTEN and PIK3C2A, high expression of ITPA and BCL3, shorter survival time and worse prognosis, while low risk was correlated with high expression of PTEN and PIK3C2A, low expression of ITPA and BCL3, longer survival time and better prognosis (Figure 3A). Moreover, we compared the gene expression levels of PTEN, PIK3C2A, ITPA and BCL3 between the high-risk and low-risk groups. The expression of PTEN and PIK3C2A was lower in the high-risk group than in the low-risk group, while the expression of ITPA and BCL3 was higher in the high-risk group than in the low-risk group, and all four differences were significant (P = 4.37e-27, P = 1.00e-51, P = 3.83e-51 and P = 1.17e-69, respectively) (Figure 3B). Kaplan-Meier survival curves showed that patients with predicted high risk (n = 219) had a significantly shorter OS time than those with low risk (n = 249) (P < 0.05) (Figure 3C). To estimate the accuracy of the four-gene signature in predicting survival, we performed receiver operating characteristic (ROC) analysis to compare the sensitivity and specificity of the survival prediction between our models. In the TCGA dataset, the area under the ROC curve (AUC) of the four-gene signature was 0.701 (time = 60 months) (P < 0.05) (Figure 3D).
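The risk grouping described above can be written compactly. The following is a sketch, assuming that the PI reported by SurvExpress is the linear predictor of a Cox model fitted on the four genes. The coefficient symbols \beta and expression values x are notation introduced here for illustration; their fitted values are not reported in this section.

```latex
\[
\mathrm{PI}_i = \beta_{\mathrm{PTEN}}\,x_{i,\mathrm{PTEN}}
              + \beta_{\mathrm{PIK3C2A}}\,x_{i,\mathrm{PIK3C2A}}
              + \beta_{\mathrm{ITPA}}\,x_{i,\mathrm{ITPA}}
              + \beta_{\mathrm{BCL3}}\,x_{i,\mathrm{BCL3}},
\qquad
\mathrm{risk}_i =
\begin{cases}
\text{high} & \text{if } \mathrm{PI}_i > 2.43,\\
\text{low}  & \text{if } \mathrm{PI}_i < 2.43.
\end{cases}
\]
```

Under this assumption, the reported directions of effect would correspond to negative coefficients for PTEN and PIK3C2A (higher expression lowers the PI) and positive coefficients for ITPA and BCL3 (higher expression raises the PI).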
In addition, the 174 ccRCC patients were also divided into two groups (low risk and high risk) based on IHC of the four proteins. Patients meeting at least three of the criteria PTEN (+), PIK3C2A (+), ITPA (-) and BCL3 (-) were assigned to the low-risk group, while the remaining patients were assigned to the high-risk group. Consistent with the above results, Kaplan-Meier survival curves also indicated shorter OS and DFS time in the high-risk group (n = 74) than in the low-risk group (n = 100) (P < 0.05) (Figure 4A, 4C). ROC analysis was also performed to compare the sensitivity and specificity of the models. Our data showed an AUC of 0.719 in the OS model and 0.658 in the DFS model (P < 0.05) (Figure 4B, 4D). The specificity and sensitivity of the four-gene signature were 0.697 and 0.741, respectively, in the overall survival analysis, and 0.614 and 0.702, respectively, in the disease-free survival analysis. Table 1 lists the relationship between clinicopathological features and the expression of PTEN, PIK3C2A, ITPA and BCL3, as well as Risk, in the 174 ccRCC patients. The expression of BCL3 (P = 0.028) and the Risk (four-gene signature) of the patients (P = 0.040) were significantly correlated with the clinical grade of the tumors. Moreover, the classification of Risk based on the four-gene signature (P = 0.006) was significantly related to tumor size. Furthermore, our data suggested that the expression of BCL3 and ITPA may be associated with PTEN expression (P = 0.012 and P = 0.002, respectively).
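The sensitivity and specificity reported above can be illustrated with a short sketch. It assumes a simplified binary setting in which each patient is labelled by predicted risk group (high or low) and by whether the event (death or recurrence) was observed, ignoring censoring. The function name and the toy data are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(predicted_high, had_event):
    """Sensitivity and specificity of a binary high-risk call.

    predicted_high: list of bool, True if the patient was called high risk.
    had_event:      list of bool, True if the event was observed.
    Censoring is ignored here; this is a simplification for illustration.
    """
    pairs = list(zip(predicted_high, had_event))
    tp = sum(1 for p, e in pairs if p and e)          # high risk, event
    fn = sum(1 for p, e in pairs if not p and e)      # low risk, event
    tn = sum(1 for p, e in pairs if not p and not e)  # low risk, no event
    fp = sum(1 for p, e in pairs if p and not e)      # high risk, no event
    return tp / (tp + fn), tn / (tn + fp)


# Toy example with six hypothetical patients.
predicted = [True, True, False, False, True, False]
observed = [True, False, False, False, True, True]
sens, spec = sensitivity_specificity(predicted, observed)
```

Sensitivity is the fraction of patients with an event who were called high risk, and specificity is the fraction of event-free patients who were called low risk.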

Survival prediction effect of Risk at different grade in cancer patients
To verify whether risk can be used to predict the survival of cancer patients with different grades, we performed subgroup analysis by differentiation grade according to the SurvExpress and IHC results. Fuhrman grade was used for the evaluation of tumor grade. Subgroup analysis according to SurvExpress revealed that high risk was related to short OS time in ccRCC patients of each of grades 2-4, with a statistically significant difference between the high-risk and low-risk groups (P < 0.05) (Figure 5). Similarly, subgroup analysis according to the IHC results demonstrated that high risk was significantly associated with short OS and DFS time in patients of grades 1-3 (P < 0.05) (Figure 6).

Selection of independent prognostic factors for predicting survival in ccRCC
To identify independent factors associated with survival in ccRCC, a univariate Cox proportional hazard regression analysis was conducted to clarify the association of clinicopathological variables and the expression levels of the four genes with survival. Table 2 revealed that tumor size, grade and clinical stage, the expression levels of the four genes, and risk (four-gene signature) were all correlated with OS. Among these factors, only PTEN and PIK3C2A were shown to be protective factors, while the other statistically significant indicators were all risk factors (P < 0.05). These factors were then entered into multivariate Cox proportional hazard regression analysis. Because Risk was derived from the expression levels of PTEN, BCL3, ITPA and PIK3C2A, including Risk and the individual genes in one multivariate model could confound the results, so two separate multivariate analyses were performed. One multivariate analysis suggested that the expression levels of PTEN, ITPA, PIK3C2A and BCL3 were all independent predictors of OS (P < 0.05). The other showed that tumor grade and risk were independent prognostic factors (P < 0.05) (Table 2).
Table 3 revealed that tumor size, grade and clinical stage, the expression levels of the four genes, and risk were also correlated with DFS. These factors were likewise entered into two separate multivariate analyses. One multivariate analysis suggested that the expression levels of PTEN, ITPA and PIK3C2A, but not BCL3, were independent predictors of DFS (P < 0.05). The other showed that only risk (four-gene signature) was an independent prognostic factor (P < 0.05) (Table 3).

DISCUSSION
Although great progress has been made in understanding the pathogenesis of RCC and in therapeutic strategies, the long-term outcome is poor for most ccRCC patients [34,35]. Therefore, it is necessary to investigate the molecular mechanism of ccRCC in order to better understand the disease, predict the prognosis, formulate rational treatment programs, and provide novel therapeutic targets [36]. Wu et al. identified a 4-microRNA (miR-10b, miR-139-5p, miR-130b and miR-199b-5p) signature that was validated to be associated with ccRCC metastasis and prognosis [37]. Another tumor-specific miRNA signature consisting of 22 miRNAs was also demonstrated to be an independent prognostic factor, serving as a novel biomarker for prognosis and treatment outcome prediction in ccRCC [38]. Moreover, Wang et al. revealed that combined expression of chemokine (C-C motif) ligand 2 (CCL2) and its receptor CCR2 may act as an independent prognostic factor for non-metastatic ccRCC patients after surgical treatment [39].
In the present study, we have identified, for the first time, a four-gene signature (PTEN, ITPA, PIK3C2A and BCL3) that was able to predict ccRCC prognosis. Each of the four genes had previously been reported to be associated with other types of cancer, as well as with survival. However, little was known about the expression and function of these four genes in ccRCC. Moreover, as the efficacy of a single index is limited, a multi-biomarker-based model may provide more powerful prediction of patient prognosis.
In our study, we first analyzed the association of PTEN, PIK3C2A, ITPA, and BCL3 expression with the prognosis of ccRCC patients in TCGA dataset (KIRC) from SurvExpress. The data demonstrated that low expression of PTEN and PIK3C2A were significantly correlated with high risk, poor prognosis and a shorter OS time, while high expression of ITPA and BCL3 indicated high risk, poor prognosis and a shorter OS time (P <0.05). Moreover, Kaplan-Meier survival curves showed that patients with high risk had significantly shorter OS time than those with low risk (P <0.05).
To verify the relationship between PTEN, PIK3C2A, ITPA and BCL3 expression and survival, we performed IHC analysis on tissue microarrays. The results were consistent with those from the TCGA dataset, which also suggested a prognostic function of the four genes in ccRCC. Furthermore, we analyzed the association of the four-gene signature with survival time according to both the SurvExpress and IHC results. Our findings suggested that the four-gene signature predicted survival in ccRCC better than any of the four genes alone, indicating that the four-gene signature may be an independent predictor of prognosis in ccRCC. Univariate and multivariate Cox proportional hazard regression analyses were then conducted to verify the association of clinicopathological variables and the four genes' expression levels with survival. Our results further confirmed that the risk (four-gene signature) was an independent prognostic factor of both OS and DFS (P < 0.05).
However, our study has some limitations. For example, only the TCGA (KIRC) dataset was selected for this research, resulting in a limited number of samples for the four-gene prognostic model. Further validation of our model in larger independent cohorts is therefore required.
In conclusion, our results suggested that the four-gene signature was related to survival and was an independent predictor of prognosis in ccRCC. This may have significant clinical implications for prognosis prediction. However, the mechanisms by which these genes affect survival remain unknown. Therefore, further studies are needed to verify our findings and elucidate the molecular mechanisms, so as to provide a deeper understanding of the signature's function in predicting the prognosis of ccRCC.

Datasets
In our analysis, SurvExpress was used to provide survival analysis and risk assessment. SurvExpress (http://bioinformatica.mty.itesm.mx/SurvExpress) is a comprehensive gene expression database and online biomarker validation tool based on several datasets; it provides risk assessment and survival analysis in cancer datasets using a biomarker gene list as input [40]. Among the datasets provided by SurvExpress, TCGA (KIRC) and ZHAO contain larger samples (n > 100) and can provide more reliable survival analysis results. However, the ZHAO dataset covers RCC rather than ccRCC specifically, which does not match our research. As a result, the TCGA (KIRC) dataset was chosen. Using this bioinformatic tool, we analyzed the expression differences of PTEN, PIK3C2A, ITPA and BCL3 and their correlation with the survival of ccRCC patients in the TCGA dataset (KIRC), and then analyzed the prognostic significance of the four-gene signature for ccRCC. The PI, also known as the risk score, is commonly used for risk grouping [41,42]. SurvExpress can perform risk grouping by two methods. The first, the default, splits the ordered PI into the requested number of risk groups so that each group contains the same number of samples. The second uses an optimization algorithm on the ordered PI. For two risk groups, a log-rank test is performed at each candidate split point along the ordered PI values, and the algorithm selects the split point at which the P value is minimal. For more than two groups, this process is repeated, optimizing one risk group at a time, until no further change occurs. The procedure flow chart of SurvExpress is shown in Figure 7.
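The two-group optimization described above can be sketched as follows. This is a minimal illustration, not SurvExpress's own implementation: it assumes candidate cut-offs are the midpoints between consecutive distinct PI values, uses a standard two-group log-rank statistic, and obtains the P value from the chi-square distribution with one degree of freedom. The function names `logrank_p` and `best_cutoff` and the toy data are hypothetical.

```python
import math


def logrank_p(times, events, in_high):
    """Two-group log-rank P value (chi-square, 1 degree of freedom)."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)                         # at risk, all
        n1 = sum(1 for x, g in zip(times, in_high) if x >= t and g) # at risk, high
        d = sum(1 for x, e in zip(times, events) if x == t and e)   # events, all
        d1 = sum(1 for x, e, g in zip(times, events, in_high)
                 if x == t and e and g)                             # events, high
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def best_cutoff(pi, times, events, min_group=1):
    """Scan midpoints of the ordered PI and keep the split with minimal P."""
    best_cut, best_p = None, float("inf")
    ordered = sorted(set(pi))
    for lo, hi in zip(ordered, ordered[1:]):
        cut = (lo + hi) / 2
        in_high = [v > cut for v in pi]
        n_high = sum(in_high)
        if min(n_high, len(pi) - n_high) < min_group:
            continue  # skip splits that leave a group too small
        p = logrank_p(times, events, in_high)
        if p < best_p:
            best_cut, best_p = cut, p
    return best_cut, best_p


# Toy data: three low-PI patients with long survival, three high-PI with short.
pi = [0.1, 0.2, 0.3, 2.5, 2.6, 2.7]
times = [10, 11, 12, 1, 2, 3]
events = [1, 1, 1, 1, 1, 1]
cut, p = best_cutoff(pi, times, events)
```

Because the cut-off is chosen to minimize P, the resulting P value is optimistic; this is one reason validation in an independent cohort matters.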

Patients
In total, 174 cases of histopathologically diagnosed ccRCC tissue were collected from the Institute of Pathology of Tongji Hospital, Tongji Medical College of Huazhong University of Science and Technology. Each clinical characteristic was then grouped for subsequent analysis as follows: age (<55 years, ≥55 years); tumor size (<7 cm, ≥7 cm); Fuhrman grade (1, 2 vs. 3, 4) and clinical stage (I, II vs. III, IV) [43]. All tissues were collected under the highest ethical standards, and each patient provided written informed consent before randomization. Our research was a retrospective study with follow-up of patients for OS and DFS. OS was defined as the time from the first surgery to remove the tumor to death from any cause. DFS was defined as the time from the first surgery to remove the tumor to disease recurrence or metastasis. SPSS software was used for all survival analyses. The adjusted hazard ratio (HR) and 95% confidence interval (95% CI) were calculated with the Cox proportional hazards model. Variables with a P value less than 0.05 in univariate analysis were entered into multivariate analysis. We also calculated the sensitivity and specificity of the gene signature.

Evaluation of IHC
The IHC staining results were evaluated by two independent pathologists. Different criteria were used for proteins expressed in the cytoplasm and in the nucleus. For cytoplasmic proteins (PTEN, ITPA, PIK3C2A), the pathologists performed a semi-quantitative analysis according to the staining intensity and the percentage of positive cells under the microscope. We used IHC scoring criteria that have been used in many other studies [44,45]. Staining intensity was scored as follows: "−" as 0 points; "+" as 1 point; "++" as 2 points; "+++" as 3 points. The percentage of positive cells (tumor cell counts ≥ 200 cells) was graded as follows: less than 25% as 0 points; 25%-50% as 1 point; 51%-75% as 2 points; 75%-100% as 3 points. The staining score was obtained by adding the intensity score and the percentage score. A total score of 0-4 points was defined as negative expression, whereas 5-6 points was considered positive expression. For the nuclear protein (BCL3), we used a separate IHC criterion that disregards intensity, because the protein has been reported to be located predominantly within the nucleus [46]. Each sample comprised more than 200 tumor cells; samples in which more than 20% of cells were positive were considered BCL3 positive, and the rest were considered negative. Patients were divided into two groups based on the expression of PTEN, PIK3C2A, ITPA and BCL3. The low-risk group met at least three of the criteria PTEN (+), PIK3C2A (+), ITPA (-) and BCL3 (-), while the remaining patients were assigned to the high-risk group. Detailed information is shown in Supplementary Table S1.
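The scoring and grouping rules above can be expressed as a short sketch. The function names are hypothetical, and one boundary choice is an assumption: the text lists both "51%-75%" and "75%-100%", so the sketch assigns exactly 75% to the 2-point band.

```python
def percentage_points(pct_positive):
    """Map the percentage of positive tumor cells to 0-3 points.

    Assumption: exactly 75% falls in the 2-point band, since the
    published ranges overlap at that value.
    """
    if pct_positive < 25:
        return 0
    if pct_positive <= 50:
        return 1
    if pct_positive <= 75:
        return 2
    return 3


def cytoplasmic_positive(intensity, pct_positive):
    """PTEN, ITPA, PIK3C2A: intensity (0-3) plus percentage points (0-3).

    A total of 5-6 is positive expression; 0-4 is negative.
    """
    return intensity + percentage_points(pct_positive) >= 5


def bcl3_positive(pct_positive):
    """BCL3 (nuclear): positive if more than 20% of cells stain, any intensity."""
    return pct_positive > 20


def risk_group(pten, pik3c2a, itpa, bcl3):
    """Low risk if at least three of PTEN(+), PIK3C2A(+), ITPA(-), BCL3(-) hold."""
    favorable = sum([pten, pik3c2a, not itpa, not bcl3])
    return "low" if favorable >= 3 else "high"


# Hypothetical patient: strong PTEN and PIK3C2A staining, weak ITPA, BCL3 in 10% of cells.
group = risk_group(
    pten=cytoplasmic_positive(3, 80),
    pik3c2a=cytoplasmic_positive(2, 90),
    itpa=cytoplasmic_positive(1, 30),
    bcl3=bcl3_positive(10),
)
```

Note that a patient can be low risk with one unfavorable marker, for example ITPA (+) alongside PTEN (+), PIK3C2A (+) and BCL3 (-).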

Statistical analysis
The Chi-square test was used to compare between the two groups. To obtain and compare the survival curves, Kaplan-Meier method and log-rank test were performed [47]. The univariate and multivariate Cox proportional hazard regression analyses were conducted to evaluate independent prognostic factors associated with survival. All data were analyzed using SPSS 18.0 software, with the level of statistical significance at P < 0.05.
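For readers unfamiliar with the Kaplan-Meier method named above, the estimator can be sketched in a few lines. This is an illustration only; the analyses in this study were run in SPSS, and the function name and toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1 - deaths / at_risk  # conditional survival past time t
        curve.append((t, survival))
    return curve


# Toy cohort: four patients, the second one censored at time 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored patients contribute to the at-risk count up to their last follow-up but never trigger a step in the curve, which is what distinguishes this estimate from a simple proportion of survivors.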