RNA-sequencing investigation identifies an effective risk score generated by three novel lncRNAs for the survival of papillary thyroid cancer patients

Scholars are striving to apply molecular biology involving long non-coding RNA (lncRNA) in the prognostication of papillary thyroid cancer (PTC). However, the clinical role of lncRNAs in the prognostic setting of PTC is still unclear. Herein, a comprehensive inquiry was performed to screen lncRNA expression profiling with 507 PTC patients from The Cancer Genome Atlas RNA-sequencing datasets. A total of 734 lncRNAs were detected to be aberrantly expressed, among which three novel lncRNAs including AC079630.2, CRNDE and CTD-2171N6.1 were markedly related to the progression and survival of PTC. Furthermore, the aberrant expression of these lncRNAs could be verified by other cohorts from gene expression omnibus (GEO) as detected by microarrays. Next, we established a three-lncRNA signature and divided the PTC patients into two subgroups of high- and low-risk. Interestingly, patients with high-risk tended to gain obviously poorer outcome. Most importantly, this three-lncRNA signature was an independent biomarker to predict the patient survival of PTC. The accurate molecular roles of these three lncRNAs remains unclarified and warrants further characterization, but our current data propose that they might play pivotal roles in PTC tumorigenesis and more importantly, these novel lncRNAs are closely related to patients’ survival. These discoveries will have far-reaching consequences with respect to molecular prediction of PTC.


INTRODUCTION
Thyroid cancer is the most widespread class of endocrine malignancies. The occurrence of thyroid cancer has been gradually growing over the last 30 years [1,2]. It was estimated that 64,300 new cases of thyroid cancer occurred in the United States in 2016, and 1,980 of them deceased (https://www.cancer.org/). Women are more prone to be affected by thyroid cancer than men. People between the ages of 25 and 65 are the high-risk group to suffer thyroid cancer [3,4]. There are four subtypes of thyroid cancer according to the histopathological examination, including papillary, follicular, medullary and anaplastic thyroid cancer. Generally, thyroid cancer is divided into two classes: well-differentiated (papillary or follicular) and poorly-differentiated (medullary or anaplastic), which also lead to various clinical outcomes. For most of thyroid cancer, the therapeutic options including operation, radioactive iodine ablation, and thyroid stimulating hormone (TSH) suppressive therapy provide an overall survival (OS) rate of 97.7% at 5 years. Nonetheless, locoregional recurrence takes place in up to 20% of thyroid cancer patients. More seriously, distant metastases will occur in almost 10% of thyroid cancer patients in 10 years [5][6][7].
The molecular biology has been greatly improved in the prognostication of papillary thyroid cancer (PTC). Several molecular biomarkers have been commended for the risk stratification for PTC, along with traditional clinicopathological parameters. The typical examples are BRAF V600E and telomerase reverse transcriptase (TERT) [8][9][10][11][12][13][14]. Nonetheless, the clinical characteristics of molecular characterization in the prognostic setting of PTC still remain unclarified. A deeper understanding of the drivers and molecular mechanisms behind PTC tumorigenesis, as well as documentation of candidate prognostic markers, is essential to put forward better diagnostic and therapeutic strategies for PTC [15][16][17].
Long noncoding RNAs (lncRNAs) have been proved to play essential roles in cancer biology, including PTC. The detection of transcriptome profile of PTC is an ideal option to screen and discover new lncRNA candidates [18]. By far, several groups performed genomewide investigation into lncRNAs expression profile in PTC using microarray [19,20] or RNA sequencing [21][22][23]. However, these profilings were all based on a small number of cases and the results were varying with each other. The cancer genome atlas (TCGA) has made great efforts to unveil the molecular events of various cancers, including the role of lncRNAs. Ma et al. [24] mined the data from TCGA dataset and they reported 220 lncRNAs with altered expression from the annotated 2773 lncRNAs which were approved by the HUGO gene nomenclature committee via cBioPortal. They further reported that FAM41C, CTBP1-AS2, LINC00271, HAR1A, LINC00310 and HAS2-AS1 were related to recurrence of PTC patients. More importantly, only one lncRNA, LINC00271 was identified to be an independent risk factor for the progress and recurrence of PTC in multivariate analyses. In the current study, the genomewide lncRNA expression data from the high-throughput RNA-sequencing dataset was also achieved from TCGA; however, 7589 lncRNAs were annotated by Ensembl genome browser (www.ensembl.org/). More interestingly, we used DEseq R package to screen the differentially expressed lncRNAs and the results differed from Ma et al. [24]. We further selected those significant prognostic lncRNAs to form a signature. The three-lncRNA signature can be an effective independent indicator to predict the progression and survival of PTC.

Selection of differentially expressed lncRNAs
Initially, detailed information of 507 PTC patients and 59 normal thyroid tissues was obtained from TCGA dataset. A sum of 734 lncRNAs were selected as differentially expressed lncRNA in PTC compared to normal thyroid tissues, of which 309 lncRNAs were down-regulated and 425 lncRNAs were up-regulated. The overview of aberrant expression lncRNAs was shown in Figure 1A, 1B.

Specific-lncRNAs correlated with clinical parameters and overall survival (OS)
With respect to the correlation between lncRNAs and OS, 33 lncRNAs were assessed to be significantly associated with OS of PTC patients by univariate analysis (Table 1). In the light of multivariate analysis, a total of 3 lncRNAs (AC079630.2, CRNDE and CTD-2171N6.1) were remarkably correlated with OS and these lncRNAs were subjected to further analysis (Table 2). We also investigated the association between these lncRNAs and disease-free survival (DFS), but limited predicted significance was observed ( Figure 2). Fifty-three patients were excluded for the reason that they lacked specific lncRNAs (AC079630.2, CRNDE or CTD-2171N6.1) expression data. Eventually, 449 patients were included for further analysis in prognosis model. The available data of clinicopathological parameters of the involved patients were summarized in Table 3. Spearman rank correlation was performed to determine the association between these three lncRNAs and clinicopathological variables ( Table 4). The results indicated that AC079630.2 was prominently related to tumor size (P = 0.005), CRNDE was remarkably related to gender (P = 0.016) and CTD-2171N6.1 was significantly related to tumor subtypes (P = 0.001). Additionally, the three lncRNAs-based risk score was observed to be notably related to age (P = 0.004) and tumor subtypes (P = 0.013). www.impactjournals.com/oncotarget

Specific lncRNAs featured with diagnostic significance
The expression of these three lncRNAs was greatly higher in PTC tissues than that in normal thyroid tissues based on the data from TCGA. Receiver operating characteristic (ROC) curve was conducted to verify the diagnostic value of the three novel lncRNAs in PTC patients. The results of ROC analysis indicated that AC079630.2 exhibited high diagnostic ability to distinguish normal tissue and PTC tissue. The areas under curves (AUC) of these three lncRNAs (AC079630.2, CRNDE and CTD-2171N6.1) were 0.940 (95% CI: 0.917-0.958), 0.677 (95% CI: 0.636-0.715), 0.664 (95% CI: 0.623-0.703), respectively ( Figure 3). Moreover, a total of 10 microarrays from GEO were achieved in our study to heighten our finding from TCGA, of which two microarrays contained the data of AC079630.2, 10 for CRNDE, and two for CTD-2171N6.1. The expression level of these three lncRNAs was shown in Table 5. The expression level of AC079630.2 in PTC was evidently higher than that in normal thyroid tissues as shown in both GSE83520 and GSE64912. The AUCs were 1.000 (95% CI: 0.858-1.000) and 0.986 (95% CI: 0.821-1.000), which demonstrated favorable diagnostic value of AC079630.2 in PTC ( Figure 4). Among the 10 microarray data containing  Figure 6). But no significant distinction of CTD-2171N6.1 expression between PTC and normal thyroid was observed in GEO datasets ( Figure 5K, 5L). In addition, we conducted meta-analyses of AC079630.2 and CRNDE to explore their diagnostic significances. The results indicated that the pooled AUCs were 0.975 and 0.809 for these two lncRNAs respectively (Figure 7).

Prognostic significance of clinicopathological parameters and the three-lncRNA signature in PTC
For each patient, a specific risk score was calculated with 3 recruited lncRNAs (AC079630.2, CRNDE and  Table 2. Then, all patients were divided into low risk group (N = 364) and high risk group (N = 85) by optimal cut off value. The expression value of the 3 lncRNAs was performed in the heatmap ( Figure 8A). The risky lncRNAs (CRNDE, CTD-2171N6.1) feature high expression in high risk group, while the protective lncRNA (AC079630.2) expresses decreased level in high risk group. The Kaplan-Meier survival curve indicated that patients with high risk score were significantly associated with poor prognosis in PTC ( Figure 8B). Considering the correlation of risk score with clinicopathological features, the results of Spearman rank correlation revealed that 3 lncRNAs-based risk score was notably related to age (P = 0.004) and tumor subtypes (P = 0.013), which indicating that elder patients had a higher risk score. The univariate and multivariate Cox proportional hazards regression tests were carried out to explore the prognostic value of clinical factors and risk score. For univariate analysis, we found that age, history of other tumor, pathological stage, M stage and risk score were remarkably associated with worse survival status (P < 0.05). After adjusting concomitant variables, the results of multivariate analysis indicated that age (HR = 1.148, 95% CI: 1.070-1.232, P < 0.001) and risk score were independent prognostic indicators in PTC (HR = 5.650, 95% CI: 1.188-26.875, P = 0.030) ( Table 6). In addition, Kaplan-Meier survival curve and log-rank method were further used for the subgroup of clinical features which were significantly associated with OS in univariate analysis. For younger patients, survival analysis was not conducted as all patients were censored. The OS of patients with high risk score were remarkably lower than those with low risk score in elder patients ( Figure 9A, P < 0.001). When patients were divided into two groups of with and without the history of other tumors, the risk score model exhibited significant prognostic power in both two subgroups ( Figure 9B, P = 0.007; Figure 9C, P = 0.001). Similarly, survival analysis was performed for subgroups of early stage (I-II) and advanced stage (III-IV). The results indicated that the OS time of high risk score group were significant shorter than that of low risk score in patients with advanced stage ( Figure 9E, P < 0.001). However, no clear difference was observed in patients with early stage ( Figure 9D, P = 0.326). Furthermore, significant difference of OS time was detected between    high risk score group and low risk score group in patients with M0 stage ( Figure 9F, P < 0.001) but not M1 stage ( Figure 9G, P = 0.156). In subtypes of tall cell PTC, survival analysis failed to be performed because of lacking ending events. Interestingly, the OS time of classical/usual PTC patients with low risk score was significantly longer than those with high risk score ( Figure 9H, P < 0.001) and there was no statistical significance among follicular PTC patients ( Figure 9I, P = 0.0543). Unfortunately, no survival data of AC079630.2, CRNDE or CTD-2171N6.1 in PTC could be obtained from GEO datasets.

Functional and protein-protein interaction (PPI) analysis of correlative genes of these lncRNAs
A total of 4086 genes were dysregulated in PTC tissues, of which 409 were identified as lncRNAcorrelative genes. All the statistically significant results of functional and pathway enriched analysis were shown in Table 7. As for Gene Ontology (GO) terms, the top 5 significantly enriched terms were listed in Table 7. The results of Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis indicated that the related genes of

DISCUSSION
In this study, the genome-wide high-throughput RNA-sequencing data of lncRNA expression level of PTC were retrieved from TCGA dataset. Survival analysis revealed that three lncRNAs (AC079630.2, CRNDE and CTD-2171N6.1) were pronouncedly related to the patient survival. More importantly, a risk score calculated by the expression level of these three lncRNAs was an independent prognostic factor of PTC.
The clinical role of a certain lncRNAs has been well documented. For example, remarkably higher expression of HOX transcript antisense RNA (HOTAIR) was detected in PTC tissues versus normal tissues [28].The expression of antisense non-coding RNA in the INK4 locus (ANRIL) was also distinctly up-regulated in thyroid cancer tissues and cells, and closely related to histological grade and status of lymph node metastasis [29]. Similarly, the BRAF-activated non-coding RNA (BANCR) levels were prominently lower in PTC tumor tissues than in control tissues. Decreased BANCR levels were related to tumor size, the presence of multifocal lesions and advanced stage [30]. Furthermore, another lncRNA, PTCSC3, was strongly downregulated in PTC tissues [31]. Not many lncRNAs have been verified to be effective indicators for the progression or prognosis prediction. A functional lncRNA, SLC6A9-5:2, was found to be down-regulated in PTC and it was involved in the radioactive therapy resistance of PTC [32]. However, only clinical role of one single lncRNA was studied in aforementioned studies.
Profiling of lncRNAs based on the techniques of microarray or RNA-sequencing can greatly facilitate the discovery of novel lncRNA candidates in PTC. A genome-wide analysis of lncRNA expression profile with microarray in PTC with 62 cases revealed that  [20]. In addition to microarray, next-generation sequencing (NGS) in thyroid cancer allows concurrent high-throughput sequencing exploration of variable genetic changes and thus can offer a wide-ranging cognition of thyroid cancer biology [33]. RNA sequencing was performed to identify the changes to the transcriptome profile in 12 thyroid tissues compared to paired normal adjacent tissues. A total of 188 lncRNAs were detected to be abnormally expressed in 50% or more of the thyroid tissues as assessed by RNA-sequencing compared with paired normal adjacent tissues. Forty-seven lncRNAs were verified in 31 pairs of PTC specimens using RT-qPCR. The lncRNAs NONHSAT076747 and NONHSAT122730 were found to be related to lymph node metastasis, and NONHSAG051968 expression was adversely related to tumor size [21]. Whole exome sequencing was also performed by another group with 91 pairs of PTC tissues. They further performed Sanger sequencing with 311 pairs of samples. Interestingly, they found that lncRNA GAS8-AS1 is the second most commonly differentiated gene and functions as a new tumor suppressive lncRNA in thyroid [22]. However, most of the studies above were performed based on small number of patients. Due to different technique platforms, distinct setting for cut-off value, various patient sources, the profiling of differentially expressed lncRNAs was inconsistent. Furthermore, the prognostic role of these lncRNAs has not been explored.
In the light of the established molecular aberrations, the TCGA dataset has accomplished a comprehensive genetic and epigenetic investigation into a large cohort of thyroid cases by the most innovative approaches of next-generation sequencing [34][35][36]. To the best of our knowledge, there has been only one paper so far, which has mined the high-throughput data of lncRNAs of PTC and was published when we were preparing for our paper. Ma et al. [24] reported that 220 lncRNAs were altered expressed from the annotated 2773 lncRNAs which were approved by the HUGO gene nomenclature committee in TCGA dataset. Among these lncRNAs, FAM41C,    CTBP1-AS2, LINC00271, HAR1A, LINC00310 and HAS2-AS1 were found to be related to patient recurrence. After the classical clinicopathological factors and BRAF V600E mutation were adjusted, LINC00271 was proved to be an independent risk factor for a series of clinical characteristics, including extrathyroidal extension, lymph node metastasis, advanced tumor stage III/IV and recurrence in multivariate analyses. The results of our study were different with that published by Ma et al. The reasons might be that Ma et al. [24] carried out the expression alteration of the lncRNAs using The cBioPortal for Cancer Genomics, while in the current study, DEseq R package was used, which is widely applied for the analysis of RNA-seq data. Furthermore, we annotated 7589 lncRNAs, more than Ma et al. did, using Ensembl genome browser (www.ensembl.org/), which supports the result of our current study. Collectively, based on 507 thyroid patients and 59 normal thyroid tissues, we identified 734 differentially expressed lncRNAs, among which 33 lncRNAs were evaluated to be significantly associated with OS of thyroid patients by univariate analysis and eventually, the prognostic values of three lncRNAs (AC079630.2, CRNDE and CTD-2171N6.1) were confirmed by multivariate analysis. Generally, DFS is more appropriate to assess the outcome of PTC patients. Actually, we also investigated the association of these lncRNAs with DFS, but the results indicated that only CRNDE was significantly associated with DFS. A specific risk score was calculated with these three lncRNAs and this risk score and age were identified as independent prognostic indicators for PTC. In addition, the risk score was positively correlated with age, which indicated that the elder PTC patients had higher risk score. Furthermore, the aberrant expression of AC079630.2 and CRNDE could be confirmed with other PTC samples based on different detecting methods, namely, microarrays, which strengthens our current findings. We also attempted to validate the prognostic value of the three-lncRNA signature in PTC based on GEO datasets. Regrettably, no survival data of AC079630.2, CRNDE or CTD-2171N6.1 in PTC was available in GEO datasets. The genetic alteration of lncRNAs was observed in various tumors and its alteration can assist to predict patients' survival [37,38]. Nevertheless, no alteration was observed for CRNDE and no relevant information about another two lncRNAs could be achieved in cBioPortal for Cancer Genomics (http://www.cbioportal.org/). Other clinical experiments need to be carried out to verify the prognostic role of the novel three-lncRNA signature in PTC.
Concerning the biological function and clinical role of these lncRNAs, only CRNDE has been studied in cancers so far. Several studies indicated that CRNDE expression significantly increased in gliomas and was related to the tumor progression, recurrence and survival [39,40]. CRNDE can promote the malignant progress of glioma by lessening miR-384/PIWIL4/STAT3 Axis, miR-186 or mTOR signaling pathways [41][42][43]. CRNDE has been reported to be a new serum-based marker for diagnosis and prognosis of colorectal cancer [44], and it can stimulate cell growth and chemoresistance of colorectal cancer cells via correlating with IRX5 or miR-181a-5p-mediated regulation of Wnt/β-catenin signaling [45,46]. Moreover, CRNDE can promote tumor growth in medulloblastoma [47], hepatic carcinoma [48], gallbladder carcinoma [49], renal cell carcinoma [50] as well as in ovarian carcinoma patients [51] with various prospective molecular mechanisms. However, the clinical value and underlying mechanism of CRNDE in PTC remains largely unveiled and no possible pathway could be suggested by our signaling investigation. More exploration is acquired to understand the mechanism of CRNDE in the future. The pathway of "Transcriptional misregulation in cancer" is closely related to AC079630.2 and 5 pathways are linked to CTD-2171N6.1. Interestingly, published studies indicate that some genes can exert its oncogenic function by activating PI3K-Akt signaling pathway in PTC [52,53], which can be functioned via CTD-2171N6.1. However, relevant validated research needs to be carried out in the future.
One of the limitations of the current study is that most of the cases in the cohort were censored and there were only 14 end events, which might cause some bias to obtain accurate result. Furthermore, no data on the iodine therapy or other details of therapeutic strategies could be achieved. Finally, validation with FISH or qPCR is still needed afterwards.
To sum up, we have identified three novel lncRNAs (AC079630.2, CRNDE and CTD-2171N6.1) which are markedly related to the survival of PTC. The novelty of the current study lies in the special risk score generated by  the three lncRNAs, which is an independent prognostic indicator for PTC. The accurate molecular roles of these three lncRNAs identified warrants further characterization, but our current data propose that they are prospective to play pivotal roles in PTC tumorigenesis and more importantly, these novel lncRNAs are closely related to patients' survival. These results have far-reaching consequences with respect to molecular prediction PTC.

Patient cohort and selection of candidate lncRNAs
The RNA-sequencing and clinical data of 510 PTC patients and 59 normal thyroid tissues were provided by TCGA data center (https://portal.gdc.cancer.gov/). After choosing the maximum expression value when multiple detections were performed for the same patient, 507 PTC patients and 59 normal thyroid tissues were finally included. The type of surgery and whether these patients received radioactive iodine therapy were unclear and the follow-up data was available for 502 patients. To further analyze the association of genes level with clinical data, the expression of genes were log2 transformed and records were considered as censored when the expression level were 0 or 1 [54]. In addition, the genes were eliminated when the censored data were more than 10%. Dysregulated genes were screened by R software using the package "DEseq" and genes with significant differential expression between PTC tissues and normal thyroid tissues were selected by the standard of log2 fold change   (FC) and adjusted P value. The genes were noted as upregulated or down-regulated when logFC>1 or logFC<-1 with adjusted P < 0.05, respectively. Subsequently, the expression data of dysregulated lncRNAs and survival data were analyzed with univariate Cox proportional hazards regression model. To explore each lncRNA as an independent indicator for survival, multivariate stepwise regression analysis was utilized for lncRNAs which were significantly associated with OS (P < 0.05) in univariate analysis. Finally, the diagnostic and prognostic values of those selected lncRNAs were validated using other cohorts from Gene Expression Omnibus (GEO) databases and a meta-analysis was subsequently conducted via the calculation of summarized ROC (sROC).

Construction of lncRNA-related prognostic indicator
With the coefficient value of multivariate analysis, an lncRNA-related prognostic model was constructed by calculating a risk score for each subject. The risk score = β of lncRNA1 * level of lncRNA1+β of lncRNA2 * level of lncRNA2+…+β of lncRNAn * level of lncRNAn. Additionally, patients were divided into the groups of high risk score and low risk score with the optimal cut off value. And the risk score was regarded as a biomarker for further analysis. Moreover, multivariate Cox proportional hazards regression was performed to adjust classical clinicopathological parameters, such as pathological stage, T stage, N stage, M stage, focus types, extrathyroidal extension and BRAF V600E .

Functional and protein-protein interaction (PPI) analysis for the correlative genes of lncRNAs
Pearson correlation analysis was performed to explore the correlation between dysregulated genes and each lncRNA in prognostic model. According to the ranked coefficients, the top 1% genes were selected as lncRNAassociated genes for further analysis. Then, GO and KEGG enriched analyses, which are functional annotation analyses by bioinformatics enrichment tools using large gene lists [55], were conducted using the related genes of each lncRNA separately in DAVID website (http:// david.abcc.ncifcrf.gov/). PPI was performed within the Interacting Genes/Proteins (STRING) database (http:// string-db.org/).

Statistical analysis
HR and corresponding 95% CI were assessed by univariate/multivariate Cox proportional hazards regression model to explore the prognostic impact of clinicopathological parameters and biomarkers. Kaplan-Meier survival curve and log-rank test were conducted to investigate the association between risk score and OS in PTC patients. In addition, we performed a subgroup analysis with clinicopathological parameters which were correlated with OS and subtypes of PTC. ROC curve was implemented to estimate the optimal cut off value of risk score to predict survival status. Simultaneously, Spearman rank correlation was performed to assess the association between three lncRNAs and clinicopathological parameters. Statistical significance threshold was set at a two-side P < 0.05. The analyses above were conducted with R software and SPSS 22.0 (SPSS Inc., Chicago, IL, USA).