Survival prediction of kidney renal papillary cell carcinoma by comprehensive LncRNA characterization

Kidney renal papillary cell carcinoma (KIRP) accounts for 10%–15% of renal cell carcinoma (RCC), patients with KIRP tend to have a poor prognosis, and there was a lack of effective prognostic indicators for this type of cancer. Currently, owing to the availability of The Cancer Genome Atlas (TCGA), long non-coding RNAs (LncRNAs) have been discovered to indicate a prognostic value in some tumors. In that regard, we analyzed lncRNA-sequencing data of KIRP in TCGA, and among 780 differentially-expressed lncRNAs, we selected 37 lncRNAs which were able to assist the prognosis. In addition, by using the multivariate cox regression analysis, the prognosis index (PI) that consisted of 7 lncRNAs (including AFAP1-AS1, GAS6-AS1, RP11-1C8.7, RP11-21L19.1, RP11-503C24.1, RP11-536I6.2, and RP11-63A11.1) could predict the progression and outcomes of KIRP with accuracy. More importantly, the PI was considered an independent indicator for prognostication of KIRP. Moreover, having categorized patients with KIRP into cohorts of high risk and low risk, according to the PI, we found that the key genes and pathways varied in these two groups. Overall, these LncRNAs, especially the PI, may be conceived as biomarkers and helpful for determining the different pathological stages for KIRP patients. However, their biological functions need to be further confirmed.

Approximately 30% of RCC patients have been found to exhibit distant metastasis when diagnosed, thus the prognosis of RCC remains poor. Clinically, patients with other RCC subtypes tend to have desirable outcomes; however, patients with KIRP were more likely to experience obviously worse clinical courses [1,6,7].
Currently, several biomarkers for KIRC that have been detected include von Hippel-Lindau (VHL) [8][9][10][11], vascular endothelial growth factor (VEGF) [12][13][14][15], carbonic anhydrase IX (CAIX) [16][17][18], and hypoxiainducible factor 1 alpha/2 alpha (HIF1a/2a) mutations [19][20][21], some of which were able to forecast the medical Research Paper effectiveness and outcomes. Despite that, there have been scarce studies into the molecular biomarkers of KIRP for the prediction of therapeutic efficacy and prognostication. Long non-coding RNAs (LncRNAs) were considered to play a vital role in tumorigenesis and progression, exhibited by their capability to predict the patients' outcomes [22][23][24][25][26]. Nonetheless, studies found that only a small number of lncRNAs could foretell the development and prognosis of KIRP. Hence, it is important to seek novel molecular biomarkers of lncRNAs with prognostic value for KIRP, which would facilitate the understanding of the pathogenesis of KIRP and be helpful in prognosis evaluation.
The Cancer Genome Atlas (TCGA) has rendered the results of KIRP available for researchers [27][28][29][30], which updated the knowledge towards the understanding of lncRNA-based diseases. Nevertheless, there is no report regarding the clinical significance of lncRNAs and the differences among lncRNAs based on the TCGA data so far. Consequently, mRNA-Seq data, including lncRNAs, of patient samples with KIRP collected from TCGA data were set to recognize some key lncRNAs that were related to clinical manifestations. We particularly focused on the ability of prognostic evaluation of these lncRNAs, and investigated the clinical significance by selecting seven lncRNAs with high prognostic potential to form a pool.
These key lncRNAs would show clinical significance and play a role in the initiation and development of KIRP.

Differentially-expressed lncRNAs (DELs)
Analysis of the DELs was conducted to compare the expressions of 13,198 lncRNAs between KIRP and normal kidney tissues. A total of 780 DELs ( Figures 1,2) were collected by EdegR. Then we eliminated cases without adequate data on survival duration, and finally, we obtained 271 DELs for further exploration.

Construction of DEL-based prognostic signature
It was demonstrated by the univariate Cox proportional hazards regression that 37 of these 271 differentiallyexpressed lncRNAs displayed remarkable prognostic value. Then multivariate Cox proportional hazards regression analysis confirmed that seven differentially-expressed lncRNAs showed prognostic significance for KIRP, containing actin filament associated protein 1 antisense RNA1(AFAP1-AS1), GAS6 antisense RNA 1 (GAS6-AS1), RP11-1C8.7, RP11-21L19.1, RP11-503C24.1, RP11-536I6.2, and RP11-63A11.1 (Table 1 and Figure 3). The relationship between the expression of major lncRNAs and clinicopathological features in KIRP are shown in Figure 4. The expression of the seven lncRNAs could also forecast the development of KIRP (Table 2 and Figure 5). The prognosis index (PI) formula for overall survival prediction was as follows: PI = 0.195 * exp AFAP1-AS1 -0.474 * exp GAS6-AS1 -0.243 *  (Figure 6). Also the PI, to a large extent, was able to foretell the 5-year survival of KIRP patients, with the area under the receiver operating characteristic curve (AUC) value being 0.824 ( Figure 7A). Moreover, Kaplan-Meier (K-M) curves showed that the average survival time of patients in the high risk group was 109.4 months, which was noticeably shorter than that of their low risk counterparts (117.3 months, P < 0.001, Figure 7B).
Meanwhile, we also measured the prognosis value of various clinical parameters with PI. By the univariate Cox proportional hazards regression, we discovered that a variety of parameters were closely related to undesirable prognosis of KIRP patients (Table 3). Yet, the multivariate Cox proportional hazards regression unveiled that only distant metastasis could be an independent prognosis indicator for KIRP ( Table 3). The K-M curves of the clinical features were depicted in Figure 8.
We also evaluated the correlations between PI and each of the clinical features. It was found that PI exhibited moderate ability to foretell the status of tumor stage, metastasis, and lymphatic invasion (Table 4 and Figure 9). The expression of the 7 differentially-expressed lncRNAs in the cohorts of high risk and low risk was illustrated in Figure 10.

Functional assessment of the differentiallyexpressed genes in high risk and low risk groups
Gene Set Enrichment Analysis (GSEA) was performed to distinguish relevant biological processes and signaling pathways [37]. We also investigated differences in the gene expression pattern between KIRP patients and KIRC patients. The gene set enrichment analysis was carried out on the gene sets that showed notably differential expression based on normalized enrichment score (NES) from high to low. It was detected that a total of 156 pathways were considerably enriched in the high risk group, including KEGG_VASCULAR_SMOOTH_MUSCLE_ CONTRACTION ( Figure 11A, Figure 12), KEGG_ TGF_BETA_SIGNALING_PATHWAY, KEGG_MAPK_ SIGNALING_PATHWAY. On the contrary, in the low risk group, the enrichment was seen in 21 pathways, including some cancer-related pathways like KEGG_OXIDATIVE_ PHOSPHORYLATION ( Figure 11B, Figure 13), KEGG_ REGULATION_OF_AUTOPHAGY. The relevant top 10 biological pathways were listed in Tables 5 and 6.  Figure 2B). Omitting those common DEGs between high-risk and low-risk groups, a total of 1081 DEGs were gathered from high-risk group, while 1387 DEGs were collected from low-risk group (Supplementary Figure 1C), which could help explain the different clinical outcomes of these two groups. The results of the KEGG analysis and of the top 50 pathways from GO analysis were presented in Supplementary Figure

Validation of the novel lncRNAs using Gene Expression Omnibus (GEO) DataSets
A total of 67 series were retrieved from GEO DataSets according to the search term. Finally, only GSE2748 and GSE48352 were found to contain AFAP1-AS1 or GAS6-AS1 expression. The expression level of AFAP1-AS1 from GSE48352 was higher in papillary renal cell carcinoma (PRCC) than in that of normal controls (P = 0.0318) ( Figure 14E). The GAS6-AS1 statistical significance differences were noted in the TNM stage (III/ IV vs. I/II) and distant metastasis (M0 vs. M1) (all P < 0.05) ( Figure 14A, 14B). AFAP1-AS1 could be used as a prognostic factor, and GAS6-AS1 might be a protective factor in PRCC ( Figure 14C, 14D). These results conformed to our previous findings based on TCGA.

DISCUSSION
In the current study, we analyzed lncRNAsequencing data of KIRP in TCGA, and used multivariate Cox regression analysis to obtain seven lncRNAs to form a pool, which included AFAP1-AS1, GAS6-AS1, RP11-1C8.7, RP11-21L19.1, RP11-503C24.1, RP11-536I6.2, and RP11-63A11.1. The PI was also calculated. Surprisingly, the PI was discovered to be an independent indicator for KIRP with high prognostic significance. Additionally, after categorizing the KIRP patients into high risk and low risk groups according to the PI, we observed alterations in key genes and pathways between these two cohorts. To our knowledge, this current study pioneered research into the prognostic value of lncRNAs in KIRP.
RCC represents over 90% of renal carcinomas, among which the subtype KIRP ranked second in the incidence rate, following KIRC [1][2][3][4][5]. Remarkable developments have been achieved in the therapy for metastatic KIRC over the past decades, and a number of targeted therapies have been proposed for it. On the other hand, studies aimed at controlling the metastatic non-clear cell RCC (nccRCC) still remains limited. In addition to KIRP, non-clear cell diseases entail chromophobe and sarcomatoid RCC as well [2]. Currently, some genes and microRNA have been demonstrated to play a role in the prognosis of KIRP. Furthermore, there is a growing awareness that lncRNAs may also exert prognostic capability. However, we know little of the clinical  significance of lncRNAs in KIRP despite their multiple functions as a type of non-coding RNA.
In recent years, researchers have had a great opportunity to identify a wide range of novel biomarkers for various malignancies thanks to TCGA, but the roles of lncRNAs in kidney cancers were rarely reported. In 2015, Malouf GG et al. [31] conducted an analysis of the RNA-sequencing investigation of 475 primary KIRC cases from TCGA. Four lncRNA subtypes in KIRC were investigated by unsupervised clustering, which correlated with different clinical, pathological, and genomic characteristics of KIRC. The most aggressive tumors were determined via Cluster C2 (23.4%), with the highest Fuhrman grade, the most advanced TNMstage, and the worst overall survival duration. Moreover, enrichment of cluster C2 was carried out for 9p deletion and chromatin remodeler BAP1 somatic mutations. It was interesting to note that cluster C4 (7.8%) was linked with the tumor subtype that was derived from the distal tubules of the nephron. Due to its distinguishable ontogeny, cluster C4 lacked typical alterations in KIRC, but had frequent 1p deletion and 17q gain, and was enriched for MiTF/TFE translocations. However, this was merely a research dealing with KIRC. Because of  the histological distinctions between KIRC and KIRP and their different molecular mechanisms, there was a pressing need for the expression profiling of lncRNAs in KIRP and a selection of lncRNAs that could predict progression and prognostication. However, studies on these issues have not been conducted yet. We singled out the differentially-expressed lncRNAs and then gathered 7 lncRNAs by univariate and multivariate Cox proportional hazards regression, each of which would bear clinical significance such as TNM stage, distant metastasis, and tumor stage, etc. to some extent. More interestingly, the PI composed of these seven lncRNAs displayed considerable predictive potential for disease progression, and could even qualify itself as an independent indicator for prognosis with its clinical significance. Furthermore, taking GEO dataset into validation, we collected 67 KIRP-related series. The clinical and prognostic value of lncRNAs AFAP1-AS1 and GAS6-AS1 could be partially verified by GSE2748 and GSE48352. However, the clinical efficacy of these lncRNAs and the PI required further confirmation with a larger sample size and joint effort.
Having classified the patients into high risk and low PI cohorts according to their PI scores, we examined the signaling pathways of differentially-expressed genes in the two cohorts and found that the major pathways of the high risk group included KEGG_VASCULAR_ SMOOTH_MUSCLE_CONTRACTION, KEGG_TGF_     BETA_SIGNALING_PATHWAY, and KEGG_MAPK_ SIGNALING_PATHWAY; these were utterly different from its low counterpart, of which the main pathways involved KEGG_OXIDATIVE_PHOSPHORYLATION and KEGG_REGULATION_OF_AUTOPHAGY. It was assumed that the pathway differences helped clarify the underlying molecular mechanisms which produced contrasting outcomes. Nonetheless, the relationships between these pathways and KIRP required experimental verification.
For the present, the clinical implications and mechanisms of lncRNAs in other diseases appeal to our research team. By literature research, of the seven lncRNAs, we only retrieved reports on AFAP1-AS1 and GAS6-AS1 in other diseases, whereas reports were lacking on RP11-1C8.7, RP11-21L19.1, RP11-503C24.1, RP11-536I6.2, and RP11-63A11.1. It was thus necessary to conduct in-depth investigation.
Owing to the extensive research on AFAP1-AS1, Liu et al. [50] performed meta-analysis on the clinical role of AFAP1-AS1. Web-based search in PubMed, EMBASE,   Cochrane Library, China National Knowledge Infrastructure (CNKI) and Wanfang database was conducted. After gathering papers concerned with AFAP1-AS1, Liu et al. [50] investigated the relationships between expression levels of AFAP1-AS1 and lymph node metastasis, distant metastasis, overall survival, relapse-free survival, and progression-free survival duration. Eventually, 1,017 patients from eight studies were involved, up to 2015. It was found that high expression of AFAP1-AS1 rendered cancer patients more vulnerable to lymph node metastasis and distant metastasis. Additionally, compared with low expressions of AFAP1-AS1, a significant relationship between highly-expressed AFAP1-AS1 and shorter overall survival, undesirable progression-free survival, and depressive recurrencefree survival was noted. Overall, increased expression of AFAP1-AS1 indicated adverse clinical outcomes. AFAP1-AS1 has high potential to function as a new biomarker for predicting clinical consequences in human malignancies. However, there were no reports on this issue available except our current study. The connections between AFAP1-AS1 and KIRP remained obscure, thus requiring more profound and comprehensive clinical research.
Studies on GAS6-AS1 were also rather scarce, and only relevant studies on lung cancers were retrieved. It was reported that down-regulation of GAS6-AS1 expression was observed in tumor tissues in 50 cases of non-small cell lung cancer (NSCLC) compared with adjacent normal tissues (P < 0.001). In addition, reduced expression of GAS6-AS1 was adversely linked with lymph node metastasis (P = 0.032) as well as advanced tumor node metastasis stage (P = 0.003). Via univariate and multivariate analysis, GAS6-AS1 was considered to act as an independent predictive factor for overall survival duration (P = 0.036). Moreover, GAS6-AS1 level showed a negative relationship with GAS6 mRNA level. Aberrant GAS6-AS1 expression could participate in the development of NSCLC via affecting of its host gene, rendering it possible in becoming a diagnosis target in NSCLC patients [51]. The RNA sequencing data analysis revealed that GAS6-AS1 showed a similar trend in KIRP as in NSCLC, as well as being strongly connected with prognosis. Hence, we supposed that GAS6-AS1 might play a vital part in the initiation and development of KIRP, but its molecular mechanism requires further investigation.
Overall, in this current research, we pioneered the deep RNA-sequencing data mining of KIRP lncRNA in TCGA and found that AFAP1-AS1, GAS6-AS1, RP11-1C8.7, RP11-21L19.1, RP11-503C24.1, RP11-536I6.2, and RP11-63A11.1 were differentially expressed in tumor tissues of KIRP, hence exhibiting their capacity to predict outcomes. More importantly, The PI, which consisted of the seven novel lncRNAs, could become a new independent prognostic indicator for KIRP, providing new targets for clinical therapy. Nevertheless, the data in the study were entirely retrieved from TCGA. A joint effort is necessary to clarify the clinical significance and molecular mechanisms of these lncRNAs.

Patients and data mining from TCGA
TCGA has been a major source of valuable data for 33 types of tumors, including for mRNA expression, protein expression, various mutations, amplifications, etc. In this study, we collected sequencing data of KIRP from the TCGA website (https://portal.gdc.cancer.gov/), which contained 321 KIRP tissues and 32 tumor-free adjacent normal tissues up to May 10, 2017. The clinical parameters, including age, tumor size, status of lymph node metastasis, status of distant metastasis, clinical TNM stage, and overall survival, etc. were collected and analyzed. Due to the public availability of the data from TCGA, additional approval by the ethics committee was not required. The data was used and processed according to TCGA Human Subjects Protection and Data Access Policies.

Evaluation of the differentially-expressed lncRNAs
The current KIPR TCGA dataset was comprised of gene counts for 60,244 mRNAs. In this study, we chose the lncRNAs with descriptions from GENCODE   (http://www.gencodegenes.org/) for further analysis, and a subset of expression profiles of 13,198 lncRNAs was eventually obtained. The differentially expressed lncRNAs were subsequently filtered by edgeR R package, with Padj < 0.05 and |log 2 FC| > 2 of expression level when comparing tumor and adjacent normal kidney tissue. The differentially expressed lncRNAs, which had been log2 transformed, were illustrated in a volcano plot and heatmap.

Construction of DEL-based prognosis index (PI)
In performing the prognosis analysis, we excluded differentially-expressed lncRNAs whose expression levels were below 1 in more than 10% of all subjects [52]. Patients whose key clinical statistics were available in the survival evaluation were involved. Additionally, the follow-up time or survival time of the patients had to exceed 0 days. Subjects without clinical information were removed. The end-point of the current study was overall survival (OS).
The univariate Cox proportional hazards regression was applied to obtain the differentially-expressed lncRNAs that were strongly associated with OS, with the significance level set at 0.05. The multivariate Cox regression model was also used to calculate the prognosis value of these differentially-expressed lncRNAs. Next, the clinical role of these DELs with prognostic value was assessed. Student's T-Test (SPSS Inc., Chicago, IL, USA) was used to examine the differential expression of these lncRNAs between KIRP and non-cancerous kidney tissues. The relationships between DEL expression and clinical progress were evaluated via Student's T-Test, Spearman correlation, and K-M Curve. The PI, which was used for OS prediction, was created on the basis of the linear combination of the expression level multiplied by the regression coefficient that was derived from the multivariate Cox regression model (β). The calculation formula is as follows: PI = βDEL1 × exprDEL1 + βDEL2 × exprDEL2+ ··· + βDELn × exprDELn. [53][54][55].
Patients with KIRP were categorized into high risk and low risk cohorts according to the cut-off of the individual infection point of PI. The impact of PI on the OS of KIRP cases was measured by univariate and multivariate Cox proportional hazards regression analysis. Several clinical parameters, such as age, tumor stage, and cancer status, etc. were adjusted. Hazard ratio (HR) and 95% confidence intervals (CI) were evaluated. In order to gauge the accuracy of the prognostic model for timedependent disease outcomes, we utilized the R package "survival ROC" to conduct ROC curve analysis within 5 years as the defining point [53]. Kaplan-Meier survival curves were employed to calculate OS duration for KIRP patients with predicted high or low PIs.

Different signaling pathways involved in high risk and low risk cohorts
GSEA, also known as functional enrichment analysis, could facilitate the identification of types of genes or proteins that are over-represented in a large set of genes or proteins which may be connected with disease phenotypes. GSEA distinguishes itself from other pathway analysis owing to its enrichment process-the gene expressions in each cohort are first calculated and then enriched according to their expressions. The calculation is as follows: calculate the enrichment score (ES) that embodies the amount to which the genes in the set are over-represented at either the top or bottom of the list. The Molecular Signatures Database (MsigDB) accommodates a large group of annotated gene sets that can be utilized with most GSEA Software. In the current study, a total of 60,483 genes were inputed for GSEA. Gene sets with a p-value less than 0.05 and a false discovery rate (FDR) value <0.25 were deemed significantly enriched. If no significant gene set could be obtained, then the gene sets were listed ascendingly according to the orders of p-value and FDR. The results were generated by GSEA.

Different signaling pathways obtained in highrisk and low-risk groups
DEGs were calculated between high-risk and low-risk groups and normal tissues, respectively. DEGs were identified using the edgeR package with Padj<0.05 and |log 2 FC| > 2. The inconsistent DEGs between highrisk and low-risk groups were sent for further signaling pathways analysis to unveil different possible molecular mechanisms. DAVID database was used for the annotation and visualization of DEGs in different risk groups. GO terms and KEGG pathway of DAVID were considered as significant with Padj < 0.05.

Validation using gene expression omnibus (GEO) datasets
The correlative microarrays from GEO DataSets were gathered to validate the clinical roles of the seven lncRNAs; the following search terms were adopted: (kidney OR nephridium OR renal) AND papillary AND (cancer OR carcinoma OR tumor OR neoplas * OR malignan * OR adenocarcinoma). The levels of lncRNA expression between different groups were analyzed by Student's t-test. Survival analysis was performed using Kaplan-Meier method.