Expression profiles analysis of long non-coding RNAs identified novel lncRNA biomarkers with predictive value in outcome of cutaneous melanoma

Recent advancements in cancer biology have identified a large number of lncRNAs that are dysregulated expression in the development and tumorigenesis of cancers, highlighting the importance of lncRNAs as a key player for human cancers. However, the prognostic value of lncRNAs still remains unclear and needs to be further investigated. In the present study, we aim to assess the prognostic value of lncRNAs in cutaneous melanoma by integrated lncRNA expression profiles from TCGA database and matched clinical information from a large cohort of patients with cutaneous melanoma. We finally identified a set of six lncRNAs that are significantly associated with survival of patients with cutaneous melanoma. A linear combination of six lncRNAs (LINC01260, HCP5, PIGBOS1, RP11-247L20.4, CTA-292E10.6 and CTB-113P19.5) was constructed as a six-lncRNA signature which classified patients of training cohort into the high-risk group and low-risk group with significantly different survival time. The prognostic value of the six-lncRNA signature was validated in both the validation cohort and entire TCGA cohort. Moreover, the six-lncRNA signature is independent of known clinic-pathological factors by multivariate Cox regression analysis and demonstrated good performance for predicting three- and five-year overall survival by time-dependent receiver operating characteristic (ROC) analysis. Our study provides novel insights into the molecular heterogeneity of cutaneous melanoma and also shows potentially important implications of lncRNAs for prognosis and therapy for cutaneous melanoma.


INTRODUCTION
Skin cancer is the most common form of cancer and can be divided into three histological subtypes: basal cell carcinoma (BCC), squamous cell carcinoma (SCC) and cutaneous melanoma (CM) [1]. Although BCC and SCC are more common kinds of skin cancer, CM accounting for less than 5% of all skin cancer is the most aggressive skin cancer and causes more than 75% of skin cancerrelated deaths worldwide [2]. The patients with localized melanoma often can be cured by surgical management alone and have good prognosis. However, 10%-40% of patients diagnosed with localized lesions still die from melanoma [3]. The current prognosis evaluation is mostly based on clinical and histological features, such as tumor thickness, mitotic rate, and ulceration which has been shown to be independent predictors of survival [4]. Recent large-scale genomic analyses have demonstrated the molecular heterogeneity of cutaneous melanoma, leading to a critical need to identify molecular markers for more accurate individualized prognosis for cutaneous melanoma patients and improve overall survival outcome [5]. www.impactjournals.com/oncotarget/

Oncotarget, Advance Publications 2017
With the application of whole genome and transcriptome sequencing technologies, it has been shown that only < 2% of the human genome can encode proteincoding genes and at least 90% of the genome is actively transcribed and give rise to a range of the non-coding RNA transcripts (ncRNAs) [6]. NcRNAs can be grouped into two major classes according to their size: small ncRNAs (such as microRNAs) and long non-coding RNAs (lncRNAs). Currently, lncRNAs were generally defined as mRNA-like non-coding transcripts ranging in length from 200 nt to ~100 kilobases [7]. Like mRNAs, lncRNAs are transcribed by RNA polymerase II (RNA pol II) and have a 5'terminal methylguanosine cap and are often spliced and polyadenylated [8]. Recent advancements in cancer biology have identified a large number of lncRNAs that are dysregulated expression in the development and tumorigenesis of cancers, highlighting the importance of lncRNAs as a key player for human cancers [9][10][11]. Several research groups have focused on expression change of lncRNAs and identified some differentially expressed lncRNAs implicated in the pathogenesis of cutaneous melanoma, revealing the potential of lncRNAs as biomarkers or therapeutic targets for cutaneous melanoma [12,13]. Although many efforts have been made to identify lncRNA signature in a wide range of human cancers [14][15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31], the prognostic value of lncRNAs still remains unclear and need to be further investigated.
In this study, we analyzed the lncRNA expression profiles and clinical data in 225 patients with cutaneous melanoma from The Cancer Genome Atlas (TCGA) project. The aim of our study was to assess the prognostic value of lncRNAs in cutaneous melanoma and tried to identify a lncRNA signature that could be used as a molecular prognostic marker for patients with cutaneous melanoma.

Identification of prognostic lncRNAs from the training cohort
All patients from TCGA were first randomly divided into the training cohort (n=113) and validation cohort (n=112) for the purpose of discovery and validation of prognostic RNAs. By subjecting lncRNA expression of patients in the training cohort to univariable Cox proportional hazards regression analysis, we identified 40 prognostic lncRNA candidates that were significantly correlated with survival time (p<0.005). Then we performed the multivariate analysis to evaluate the independent prognostic value of these 40 candidate prognostic lncRNAs and identified six lncRNAs as independent prognostic factors with the ability to predict the outcome. The overview of these six prognostic lncRNAs was listed in Table 1.

Development of prognostic signature based on the six prognostic lncRNAs in the training cohort
To construct a prognostic signature, the six prognostic lncRNAs were fitted in a multivariate Cox regression analysis in the training cohort. Then a prognostic lncRNA signature with six prognostic lncRNAs was constructed using a mathematical formula for survival prediction according to the expression of the six prognostic lncRNAs and using the multivariate Cox regression coefficient as the weight, as follows: Risk score= (-0.1779* expression value of LINC01260)+(-0.1522*expression of HCP5)+ (0.2537* expression value of PIGBOS1)+(-0.4409*expression of CTA-292E10.6)+ (-0.8444* expression value of RP11-247L20.4)+(-0.2056*expression of CTB-113P19.5). A patient with six-lncRNA signature risk score larger than the median risk score (0.562) was classified as high-risk, whereas patients with risk score six-lncRNA signature risk score smaller than the median risk score (0.562) was classified as low-risk. When the six-lncRNA signature was applied to the training cohort, all patients of the training cohort were divided into the high-risk group (n=57) and low-risk group (n=56) according to the threshold of the median risk score (0.562). Patients in the high-risk group had significantly shorter median survival time than those in the low-risk group (median survival 37.5 months vs. 164.3 months, p<0.001) ( Figure 1A). In details, the survival of patients in the low-risk group was 81.8% at 36 months and 77.7% at 60 months which compared with 50.9% and 26.6%, respectively, in the high-risk group. The univariable analysis also revealed a significant association between six-lncRNA signature risk score and overall survival ( Table  2). Time-dependent ROC curves were used to assess the prognostic power of the six-lncRNA signature. The AUC of the six-lncRNA signature for survival prediction was 0.726 at 36 months of OS and 0.825 at 60 months of OS.
Distribution of the six-lncRNA signature risk scores, the expression pattern of prognostic lncRNAs and the survival status was shown in Figure 1C. As shown in Figure 1C, the higher levels of expression of six lncRNAs in the signature were associated with shorter survival of patients.

Validation of the six-lncRNA signature for survival prediction in the validation cohort and entire TCGA cohort
To confirm the survival prediction power of the six-lncRNA signature, we tested the six-lncRNA signature in the validation cohort. Each patient of validation cohort was assigned a risk score by the six-lncRNA signature, and was classified into the high-risk or low-risk patient according to the threshold of the median risk score (0.562) derived from the training cohort. The patients of the validation cohort were classified as high-risk (n=59) or low-risk (n=53) with significantly different survival time. Kaplan-Meier curves for the high-risk and low-risk groups within the validation cohort are shown in Figure 2A. As in the training group, the overall survival time of the highrisk group patients was significantly shorter than that of low-risk group patients (median survival 72.8 months vs. 174.6 months, p=0.037) ( Figure 2A). In details, the survival of patients in the low-risk group was 86.3% at 36 months and 77.5% at 60 months which compared with 69.9% and 55.2%, respectively, in the high-risk group. The univariable analysis also revealed a significant association between six-lncRNA signature risk score and overall survival (HR=1.434, 95% CI=1.091-1.885, p=0.01) ( Table  2). The AUC of the six-lncRNA signature for survival prediction in the validation cohort was 0.621 at 36 months of OS and 0.623 at 60 months of OS. Distribution of the six-lncRNA signature risk scores, the expression pattern of prognostic lncRNAs and the survival status was shown in Supplementary Figure 1A, which is consistent with findings in the training cohort.
The six-lncRNA signature was then tested for its prognostic value in the 225 patients of entire TCGA cohort. The same six-lncRNA signature and threshold as those derived from the training cohort classified 116 and 109 patients of the entire TCGA cohort into the high-risk and low-risk groups, respectively. Consistent with the findings described above, patients in the high-risk group had significantly shorter overall survival than those in the low-risk group (median survival 48 months vs. 164 months, p<0.001) ( Figure 3A). In details, the survival of patients in the low-risk group was 84% at 36 months and 77.6% at 60 months which compared with 60.5% and 41%, respectively, in the high-risk group. The univariable analysis also revealed a significant association between six-lncRNA signature risk score and overall survival ( Table 2). The AUC of the six-lncRNA signature for survival prediction was 0.661 at 36 months of OS and 0.721 at 60 months of OS.
Distribution of the six-lncRNA signature risk scores, the expression pattern of prognostic lncRNAs and the survival status was shown in Supplementary Figure 1B, which is consistent with findings in the training cohort and validation cohort.

Independent prognostic value of the six-lncRNA signatures
To test whether the prognostic power of the six-lncRNA signature for survival prediction is independent of other clinicopathological factors, we performed We next performed data stratification analysis for age and stage. With the six-lncRNA signature, the younger patients can be further subdivided into the high-risk group and low-risk group with significantly different survival (p=0.002) ( Figure 4A). Similar results were observed when the six-lncRNA signature was applied to the elder patients (p<0.001) ( Figure 4B). Then we stratified the entire TCGA patients into early-stage patients and advanced-stage patients. The six-lncRNA signature could classify early-stage patients into the high-risk group and low-risk group with significantly different survival (p<0.001) ( Figure 4C). For advanced-stage patients, the six-lncRNA signature also showed similar prognostic value for survival prediction ( Figure 4D). The results of the multivariate Cox regression and stratification analysis thus indicated that the predictive ability of the six-lncRNA signature is independent of other clinical factors for survival prediction in patients with cutaneous melanoma.

DISCUSSION
Cutaneous melanoma is one of the most common malignancies and causes the greatest number of skin cancer-related deaths. Prediction of disease progression and prognosis is important for personalized risk assessment to improve treatment efficacy. With advances in the high-throughput omics study, large-scale genomic analyses of cutaneous melanoma have provided insights into the biological heterogeneity of cutaneous melanoma which has potentially important implications for improving prognosis [32]. Although traditional clinical and histological variables have been applied to guide treatment decisions, the early identification of patients at highest risk for disease progression still is unsatisfactory because of highly variable clinical behavior and molecular heterogeneity [4]. Increasing evidence in the molecular profiling analysis has found that molecular characterization can improve prognosis prediction compared with traditional clinical and histological variables. Timar reported a meta-analysis of the metastasis-gene signatures using seven melanoma cohorts from Gene Expression Omnibus (GEO) and identified a 350-gene signature [33]. Another study performed by Gerami et al. revealed a 28gene signature to predict the metastatic risk associated with cutaneous melanoma [34]. Furthermore, recent some efforts also have been made to access the clinical significance of miRNAs in cutaneous melanoma and successfully developed several miRNA-based signatures to improve risk stratification for patients with cutaneous melanoma. For example, analysis of miRNA microarray expression profiling in primary and metastatic melanoma specimens identified a six-miRNA signature significantly stratified stage III patients into "better" and "worse" prognostic patients group [35]. A subsequent study of 80 melanoma patients at primary diagnosis identified a five-miRNA signature which can be used to determine recurrence risk of primary melanoma patients [36].
Despite great improvements in developing molecular biomarkers in cutaneous melanoma, these existing molecular signatures were mainly based on expression of mRNAs or miRNAs. Recently, a novel class of ncRNAs, termed lncRNAs, has been discovered and indicated as one of cancer hallmarks [37]. Compare to mRNAs and miRNAs, expression and functions of lncRNAs tended to be typically more specific in terms of cell-, tissue-and tumor-type [38]. Moreover, many lncRNAs were found to be stable and easily detectable in plasma or other body fluids, highlighting the possibilities of lncRNAs for the diagnostics and treatment of cancer [39]. Although several lncRNAs have been reported to be involved in the pathogenesis of melanoma, such as BANCR, SLNCR1, CASC15, MALAT1 and so on [12], genome-wide systematic analysis for the predictive value of lncRNAs in prognosis prediction for cutaneous melanoma is lacking because of limited available lncRNA expression profiles in cutaneous melanoma. Fortunately, Li et al proposed a computational pipeline to obtained genome-wide lncRNA expression profiles in a large number of patients with various cancers by repurposing large-scale RNA-Seq cohorts from TCGA project [40], thus facilitating the discovery and validation of novel lncRNA biomarkers in some cancers and providing an  unprecedented opportunity to systematically evaluate clinical implication for diagnosis and prognosis in cutaneous melanoma.
In this study, we tried to assess the prognostic value of lncRNAs in cutaneous melanoma by integrated lncRNA expression profiles from TCGA database and matched clinical information from a large cohort of patients with cutaneous melanoma. We finally identified a set of six lncRNAs that are significantly associated with the survival of patients with cutaneous melanoma. A linear combination of six lncRNAs (LINC01260, HCP5, PIGBOS1, RP11-247L20.4, CTA-292E10.6 and CTB-113P19.5) was constructed as an indicator for the clinical outcome of patients with cutaneous melanoma. By applying the six-lncRNA signature to the training cohort, a clear separation was observed in the survival curves between high-risk group and the low-risk group. Patients with high-risk lncRNA signature had poor survival outcome than those with low-risk lncRNA signature.
Moreover, the six-lncRNA signature demonstrated robust and stable prognostic ability for survival prediction in both the validation cohort and entire TCGA cohort. Further analysis of univariable and multivariate Cox regression models showed that the six-lncRNA signature was an independent prognostic factor for patients with cutaneous melanoma. The six lncRNAs identified in this study provide novel insights into the molecular heterogeneity of cutaneous melanoma and also have potentially important implications for prognosis and therapy for cutaneous melanoma.
Although our results highlight the potential of lncRNA expression profiling to improve clinical prognosis in patients with cutaneous melanoma, some limitations should be recognized. Firstly, the six lncRNAs was discovered and validated in the patients from a single source (TCGA project) and their prognostic values should be tested in another independent patient cohort. Secondly, the biological function of these six lncRNAs has not been reported up until now and therefore needed to be studied using biological experiments in the future.

Clinical characteristics of patients with cutaneous melanoma
Clinical characteristics of patients with cutaneous melanoma were downloaded from The Cancer Genome Atlas (TCGA) data portal (https://cancergenome.nih.gov/). After removing patients without clinical information and lncRNA expression profiles, 225 patients with cutaneous melanoma were used in this study. Patients with cutaneous melanoma were randomly divided into a training cohort (n=113) and a validation cohort (n=112). The detailed clinical characteristics of patients in the training cohort and validation cohort were summarized in Table 3.

Genome-wide lncRNA expression profiles of patients with cutaneous melanoma
Genome-wide lncRNA expression profiles of patients with cutaneous melanoma were obtained from the TANRIC database (http://bioinformatics.mdanderson.org/) [40]. Briefly, the genomic coordinates of 13,870 human lncRNAs from the GENCODE Resource (version 19) were obtained and were further filtered by removing those lncRNAs whose exons overlapped with any known coding genes based on the gene annotations of GENCODE. Expression levels of the remaining lncRNAs were measured as reads per kilobase per million mapped reads (RPKM) [40]. Then we removed lncRNA with RPKM expression values of 0 in >10% tumor samples. Finally, 3100 lncRNAs were retained for further study.

Definition of prognostic lncRNA signature
Univariable Cox regression analysis was used to identify candidate lncRNAs that are significantly associated with survival. Then these candidate lncRNAs were subjected to the multivariate Cox regression analysis to access their interactive effect and identify independent prognostic lncRNAs to predict survival of patients. To develop a lncRNA signature, these independent prognostic lncRNAs were fitted in a multivariate Cox regression model in the training cohort to obtain estimated regression coefficients as weights to represent their relative power in predicting survival. Then a prognostic lncRNA signature was constructed by including expression values of each prognostic lncRNAs, weighted by their estimated regression coefficients in the multivariate Cox regression analysis as follows: Where n is the number of lncRNAs in this signature, exp i is the expression value of lncRNA i and Coef is the estimated regression coefficients of lncRNA i in the multivariate Cox regression analysis. The median risk score in the training cohort was used to as risk cutoff value to classify patients into the high-risk group and low-risk group.

Survival analysis
Kaplan-Meier survival curves and log-rank tests were used to assess the differences in survival time between the high-risk and low-risk patients using the R package "survival". Multivariate analyses were performed using Cox proportional hazards regression model to determine whether the lncRNA signature was independent of other clinical variables. Hazard ratios (HR) and 95% confidence intervals (CI) were calculated. The prognostic accuracy of lncRNA signature was also tested using time-dependent receiver operating characteristic (ROC) analysis using the R package"survival-ROC" [41]. We used the area under the curve at three and five years to measure prognostic accuracy. All analyses were performed using the R/Bio-Conductor (version 3.0.2).