Comprehensive analysis of a novel four-lncRNA signature as a prognostic biomarker for human gastric cancer

Emerging evidence indicates that long non-coding RNAs (lncRNAs) play a crucial role in predicting survival in patients with gastric cancer (GC). This study aimed to identify a lncRNA-related signature for evaluating the overall survival (OS) of 379 GC patients from The Cancer Genome Atlas (TCGA) database. Associations between lncRNA expression and survival outcome were evaluated by univariate and multivariate Cox proportional hazards regression analyses. Four lncRNAs (LINC01018, LOC553137, MIR4435-2HG, and TTTY14) were significantly associated with OS and were combined into a single prognostic signature. Patients with low risk scores had significantly better OS than those with high risk scores (P = 0.001). Further analysis indicated that the prognostic value of this four-lncRNA signature was independent of other clinical features. Functional enrichment analysis of co-expressed genes showed that these four lncRNAs were associated with several molecular pathways. Based on this bioinformatics analysis, our study indicates that this novel lncRNA expression signature may be a useful prognostic biomarker for GC patients.


INTRODUCTION
Gastric cancer (GC) is one of the most frequently diagnosed cancers in the world, with both high incidence and high mortality. According to Global Cancer Statistics 2012, more than 720,000 GC-related deaths and about 950,000 newly diagnosed cases occurred worldwide [1]. Moreover, in China in 2015, GC ranked second both in incidence and as a cause of cancer death [2]. The poor prognosis of GC patients largely reflects the fact that most cases are diagnosed at advanced stages [3]. Detecting GC at an early stage, predicting outcomes effectively before treatment, and developing novel therapeutic targets are effective strategies to improve the prognosis of GC. Therefore, identifying new prognostic biomarkers is essential for improving outcomes in GC patients.
Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that have no protein-coding potential. LncRNAs have been implicated in various diseases, including cancers. Recent evidence indicates that lncRNAs can regulate different stages of gene expression, for example by binding and sequestering their targets [4]. LncRNAs play critical roles in a variety of processes, including cell development and differentiation [5], cell growth arrest and apoptosis [6], and X chromosome inactivation [7].
A series of lncRNAs have been discovered and confirmed as tumor suppressors or oncogenes. For example, MEG3 acts as a tumor suppressor through activation of p53 [8], whereas H19 functions as an oncogene in GC and colon cancer [9,10]. Because of their contributions to the development and progression of cancer, lncRNAs have been regarded as potential biomarkers for early diagnosis and prognosis. To date, many studies have reported lncRNAs as biomarkers for diagnosis in GC; however, few have reported the use of lncRNAs, especially lncRNA signatures, as biomarkers for overall survival (OS) in GC. This study aims to identify a novel lncRNA signature for GC prognosis by mining data from The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov). By performing a comprehensive lncRNA expression profile analysis, we identified a four-lncRNA signature in GC (LINC01018, LOC553137, MIR4435-2HG, and TTTY14) as a new candidate indicator with the potential to predict OS in GC patients.

Patient characteristics
There were 379 GC patients and 35 normal controls from the TCGA database included in the present study. According to TNM stage, the GC patients were divided into four groups: stage I, stage II, stage III, and stage IV. The clinical features are summarized in Table 1.

Identification of differentially expressed lncRNAs
A total of 1081 lncRNAs were identified in an initial differential expression analysis of the TCGA GC data. Fold change > 2 and P value < 0.05 were used as thresholds for identifying significantly differentially expressed lncRNAs. Compared with adjacent normal gastric tissue, we obtained 226 differentially expressed lncRNAs in stage I GC, 173 in stage II GC, 198 in stage III GC, and 206 in stage IV GC (fold change > 2, P value < 0.05). When these four groups were intersected, 131 lncRNAs were consistently differentially expressed across all four groups (Figure 1 and Figure 2).
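The thresholding and intersection steps described above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes a fold change and P value have already been computed for every lncRNA in each stage-versus-normal comparison (the statistical test that produced the P values is not reproduced), it assumes the "fold change > 2" cutoff applies in both directions (|log2 FC| > 1), and all lncRNA names and values are hypothetical.

```python
import math

# Thresholds stated in the text: fold change > 2 and P < 0.05.
# Assumption: the fold-change cutoff is applied in both directions (|log2 FC| > 1).
FC_THRESHOLD = 2.0
P_THRESHOLD = 0.05


def significant_lncrnas(results):
    """Return the lncRNAs passing both filters in one stage-vs-normal comparison.

    `results` maps lncRNA name -> (fold_change_tumor_over_normal, p_value).
    """
    passed = set()
    for name, (fold_change, p_value) in results.items():
        if abs(math.log2(fold_change)) > math.log2(FC_THRESHOLD) and p_value < P_THRESHOLD:
            passed.add(name)
    return passed


def consistent_lncrnas(results_by_stage):
    """Intersect the significant sets across all stage comparisons."""
    sets = [significant_lncrnas(r) for r in results_by_stage.values()]
    return set.intersection(*sets)


# Hypothetical toy values: (fold change, P value) for each stage vs. normal tissue.
toy = {
    "stage I":   {"lnc_a": (3.1, 0.01), "lnc_b": (0.30, 0.02), "lnc_c": (1.5, 0.001)},
    "stage II":  {"lnc_a": (2.6, 0.03), "lnc_b": (0.40, 0.04), "lnc_c": (2.4, 0.01)},
    "stage III": {"lnc_a": (4.0, 0.001), "lnc_b": (0.45, 0.01), "lnc_c": (2.2, 0.20)},
    "stage IV":  {"lnc_a": (2.2, 0.02), "lnc_b": (0.20, 0.001), "lnc_c": (3.0, 0.01)},
}
print(sorted(consistent_lncrnas(toy)))  # ['lnc_a', 'lnc_b']
```

Here lnc_c is dropped because it misses the fold-change cutoff in stage I, even though it passes in the other three stages; this is the behavior an intersection across stages is meant to enforce.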

Identification of lncRNA significantly associated with OS and prognostic signature construction
By subjecting the expression data of the differentially expressed lncRNAs in 379 patients from the TCGA database to univariate Cox regression, we identified a total of 23 lncRNAs as candidate biomarkers significantly associated with OS (P < 0.05) (Table 2). Multivariate Cox regression analysis was then performed to account for correlations among the 23 lncRNAs, and it identified four lncRNAs (LINC01018, LOC553137, MIR4435-2HG, and TTTY14) as independent biomarkers for OS in GC patients (P < 0.05) (Table 3 and Figure 3).
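To make the univariate screening step concrete, the sketch below fits a one-covariate Cox proportional hazards model by Newton-Raphson on the partial likelihood (Breslow handling of tied event times) and keeps lncRNAs whose Wald P value is below 0.05. It is a self-contained teaching sketch, not the software used in this study; the function names, the toy cohort, and the binary high/low expression coding are hypothetical, and a real analysis would normally rely on an established survival package.

```python
import math


def cox_partial_stats(beta, x, time, event):
    """Breslow partial log-likelihood, score, and information for one covariate."""
    loglik = score = info = 0.0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at this event time.
        risk = [j for j in range(n) if time[j] >= time[i]]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        loglik += beta * x[i] - math.log(s0)
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return loglik, score, info


def univariate_cox(x, time, event, tol=1e-9, max_iter=50):
    """Fit a one-covariate Cox model; return beta, hazard ratio, SE, and Wald P."""
    beta = 0.0
    loglik, score, info = cox_partial_stats(beta, x, time, event)
    for _ in range(max_iter):
        step = score / info  # Newton-Raphson step
        new = cox_partial_stats(beta + step, x, time, event)
        # Halve the step if it would lower the partial log-likelihood.
        while new[0] < loglik and abs(step) > tol:
            step /= 2
            new = cox_partial_stats(beta + step, x, time, event)
        beta += step
        loglik, score, info = new
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided Wald P value
    return {"beta": beta, "hr": math.exp(beta), "se": se, "p": p}


def screen_lncrnas(expr, time, event, alpha=0.05):
    """Keep lncRNAs whose univariate Cox P value is below alpha."""
    hits = {}
    for name, values in expr.items():
        fit = univariate_cox(values, time, event)
        if fit["p"] < alpha:
            hits[name] = fit
    return hits


# Hypothetical toy cohort: survival time (months) and event (1 = death, 0 = censored).
time = [5, 10, 12, 14, 2, 3, 11, 4]
event = [1, 1, 1, 1, 1, 1, 1, 1]
x = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = high expression, 0 = low expression
fit = univariate_cox(x, time, event)
print(round(fit["hr"], 2), round(fit["p"], 3))
```

The multivariate step differs in that all candidate lncRNAs enter a single model, so each coefficient is adjusted for the others; that is why it can reduce 23 univariate hits to four independent markers.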
We also performed univariate Cox regression analysis of the four lncRNAs within each subgroup of the following clinical features: TNM stage, T stage, M stage, and N stage. Table 4 presents the hazard ratio (HR) for the association of these four lncRNAs with OS in each category.
Based on the four-lncRNA risk score, GC patients were classified as low- or high-risk using the median risk score as the cutoff, yielding a low-risk group (n = 190) and a high-risk group (n = 189) (Figure 4). The risk score showed modest ability to predict 5-year survival of GC patients, with an area under the ROC curve (AUC) of 0.627 (Figure 5A). Meanwhile, Kaplan-Meier analysis showed that patients in the low-risk group had significantly better OS than those in the high-risk group (P = 0.001) (Figure 5B).
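A simplified version of a 5-year time-dependent AUC can be sketched as follows. It assumes the cumulative/dynamic definition (cases die on or before the horizon, controls are still event-free after it) and, unlike estimators that correct for censoring, simply drops patients censored before the horizon. The function name and data are hypothetical; this illustrates the idea rather than reproducing the method cited in this study.

```python
def naive_time_dependent_auc(scores, time, event, horizon):
    """Cumulative/dynamic AUC at `horizon`, ignoring censoring weights.

    Cases: died at or before the horizon. Controls: still event-free after it.
    Patients censored before the horizon are dropped (a simplification).
    """
    cases = [s for s, t, e in zip(scores, time, event) if t <= horizon and e == 1]
    controls = [s for s, t in zip(scores, time) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0  # higher risk score in the patient who died
            elif c == k:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))


# Hypothetical risk scores and follow-up (years); event 1 = death, 0 = censored.
scores = [2.0, 1.5, 0.3, 1.8, 1.0, 0.1]
time = [1.2, 3.0, 6.0, 7.5, 4.0, 2.0]
event = [1, 1, 0, 1, 0, 0]
print(naive_time_dependent_auc(scores, time, event, horizon=5))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, so a value such as 0.627 indicates that the score ranks patients who die within 5 years above survivors more often than chance, but far from perfectly.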

The prognostic value of four-lncRNA signature is independent of other clinical features
Furthermore, to examine whether the prognostic value of the four-lncRNA signature is independent of other clinical features, we performed univariate and multivariate Cox proportional hazards regression analyses in the TCGA dataset with the risk score and other clinical features, such as race, age, gender, tumor stage, and T stage, as covariates.
Univariate Cox proportional hazards regression showed that several features predicted poorer survival in GC, including age, tumor stage, T stage, N stage, M stage, primary therapy outcome, radiotherapy, and residual tumor (Table 5). However, in multivariate Cox proportional hazards regression, only residual tumor (P = 0.047) and the risk score (P = 0.004) remained independent prognostic indicators of GC (Table 5). Kaplan-Meier curves of these clinical features showed that tumor stage (P < 0.001), T stage (P = 0.005), N stage (P = 0.008), M stage (P = 0.004), residual tumor (P < 0.001), and radiotherapy (P = 0.001) were associated with OS (Figure 6).
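The Kaplan-Meier and log-rank computations behind these comparisons can be sketched as below. This is an illustration under stated assumptions, not a reproduction of the SPSS analysis used in the study: it handles two groups only, uses the standard hypergeometric variance with a correction for tied deaths, and the toy data and group coding are hypothetical.

```python
import math


def kaplan_meier(time, event):
    """Product-limit estimate: list of (event time, survival probability)."""
    surv, curve = 1.0, []
    for t in sorted(set(ti for ti, ei in zip(time, event) if ei)):
        at_risk = sum(1 for ti in time if ti >= t)
        deaths = sum(1 for ti, ei in zip(time, event) if ti == t and ei)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


def log_rank(time, event, group):
    """Two-group log-rank test; `group` holds 0/1 labels. Returns (chi2, P)."""
    obs_minus_exp = variance = 0.0
    for t in sorted(set(ti for ti, ei in zip(time, event) if ei)):
        n = sum(1 for ti in time if ti >= t)  # total at risk
        n1 = sum(1 for ti, g in zip(time, group) if ti >= t and g == 1)
        d = sum(1 for ti, ei in zip(time, event) if ti == t and ei)
        d1 = sum(1 for ti, ei, g in zip(time, event, group) if ti == t and ei and g == 1)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical data: months of follow-up, event (1 = death), group (1 = high risk).
time = [2, 3, 4, 5, 8, 10, 12, 15]
event = [1, 1, 1, 0, 1, 0, 1, 0]
group = [1, 1, 1, 1, 0, 0, 0, 0]
print(kaplan_meier(time, event))
print(log_rank(time, event, group))
```

The chi-square P value uses the identity that a 1-df chi-square tail equals the two-sided normal tail of its square root, which avoids any dependency beyond the standard library.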
We also assessed the relationship between the lncRNA signature-based risk score and various clinical features; the risk score showed predictive value for these clinical features (Figure 7). The expression patterns of the four differentially expressed lncRNAs in GC versus adjacent normal tissues and in the low- versus high-risk groups are shown in Figure 8.

Functional assessment of the four lncRNAs
A total of 434 genes in the TCGA database were co-expressed with these four lncRNAs (LINC01018, LOC553137, MIR4435-2HG, and TTTY14) (|R| > 0.5) (Supplementary Table 1). Enrichment analysis of these genes revealed 240 GO terms and 47 pathways (P < 0.05 and enrichment score > 1.5; Supplementary Table 2). The top GO biological processes of the co-expressed genes were synaptic transmission (GO:0007268) and transmembrane transport (GO:0055085) (Figure 9A). In the pathway analysis, the co-expressed genes were mainly enriched in neuroactive ligand-receptor interaction and glutamatergic synapse (Figure 9B). The protein-protein interaction (PPI) network contained 106 genes, which were regarded as hub genes (Figure 10).
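A common way to test whether a GO term or pathway is over-represented among co-expressed genes is a one-sided hypergeometric (Fisher's exact) test followed by multiple-testing correction. The sketch below shows both under stated assumptions: it is not DAVID's implementation, which reports a modified Fisher's exact test (the EASE score), and Benjamini-Hochberg adjustment is used here only as one example because the text does not state which correction was applied. All gene counts and term names are hypothetical.

```python
from math import comb


def hypergeometric_upper_tail(k, n, K, N):
    """One-sided P(X >= k): k of n list genes fall in a term that annotates K of N genes."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (list genes in term, list size, background genes in term, background size).
terms = {"term_x": (3, 4, 5, 20), "term_y": (1, 4, 5, 20)}
raw = {name: hypergeometric_upper_tail(*counts) for name, counts in terms.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
print(raw)
print(adjusted)
```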

DISCUSSION
Gastric cancer (GC) is one of the deadliest solid tumors, with high global morbidity and mortality [11]. Although GC morbidity and mortality have declined slightly over several decades [12], GC remains a significant clinical challenge owing to limited detection methods and poor prognosis [13]. Specific biomarkers for early diagnosis, treatment monitoring, and prognostic evaluation might increase the survival rate. Accumulating evidence suggests that lncRNAs may play a major role in the tumorigenesis, development, metastasis, and prognosis of GC [10, 14-17]. Recent large-scale genome analyses have revealed molecular characteristics associated with GC OS [18]. However, most studies have focused on miRNA, gene, and protein expression [19-24]. As knowledge grows, the functional roles of lncRNAs in tumorigenesis and tumor development also represent a significant untapped resource for cancer prognosis.
In the present study, to identify lncRNAs significantly related to GC OS, we grouped GC patients from the TCGA database by TNM stage. First, the 131 differentially expressed lncRNAs were subjected to univariate Cox proportional hazards regression with a significance level of 0.05, which identified a total of 23 OS-related lncRNAs.
Subsequently, multivariate Cox proportional hazards regression analysis showed that LINC01018, LOC553137, MIR4435-2HG, and TTTY14 all had significant prognostic value for GC survival. We then constructed a risk score combining these four lncRNAs and found that this four-lncRNA signature could independently predict OS in GC patients. The relationship between differentially expressed lncRNAs and GC survival has previously been studied in small cohorts using different approaches. Li et al. [16] analyzed the prognostic value of a single lncRNA, BANCR, by qRT-PCR array in 84 GC patients and found that a higher BANCR level predicted poor prognosis. Similarly, Fu et al. [25] studied lncRNA-NEAT1 by qRT-PCR in 140 fresh-frozen GC samples and 20 paired adjacent normal gastric tissue samples. In addition, Fan et al. [26] mined four GEO datasets (GSE63089, GSE50710, GSE38749, and GSE27342) and found that five lncRNAs (AK001094, AK024171, AK093735, NR003573, and BC003519) could be considered an independent risk factor for GC patients.
Although the TCGA database has previously been used to analyze lncRNA-related signatures for GC prognosis [27], the advantage of this study over previous studies is that it combines clinical features with TCGA data and assesses the survival of GC patients with a lncRNA-based risk score. On this basis, the four novel lncRNAs (LINC01018, LOC553137, MIR4435-2HG, and TTTY14) are candidate new risk factors, and the risk score constructed from them could serve as a prognostic indicator for GC patients.
However, no study has yet investigated the functions of these four lncRNAs. Here, we identified the genes strongly correlated with the expression of the four lncRNAs (Pearson |R| > 0.5) in the TCGA database, yielding 434 co-expressed genes. These genes were mainly enriched in synaptic transmission, transmembrane transport, neuroactive ligand-receptor interaction, and glutamatergic synapse. In the PPI network, 106 co-expressed genes emerged as hub genes that may be involved in the regulatory roles of the four lncRNAs in GC.
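The co-expression filter (Pearson |R| > 0.5) can be sketched as follows. Assumptions: expression values are already on a common scale, and a gene counts as co-expressed if its correlation with at least one of the four lncRNAs exceeds the threshold (the text does not say whether the union or the intersection across lncRNAs was used). The gene and lncRNA names are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def coexpressed_genes(lnc_expr, gene_expr, threshold=0.5):
    """Genes whose |R| with any lncRNA exceeds the threshold (union over lncRNAs)."""
    hits = set()
    for lnc_values in lnc_expr.values():
        for gene, values in gene_expr.items():
            if abs(pearson_r(lnc_values, values)) > threshold:
                hits.add(gene)
    return hits


# Hypothetical expression across five samples.
lncrnas = {"lnc_1": [1.0, 2.0, 3.0, 4.0, 5.0]}
genes = {
    "gene_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly positively correlated
    "gene_b": [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly negatively correlated
    "gene_c": [1.0, 3.0, 2.0, 3.0, 1.0],   # uncorrelated
}
print(sorted(coexpressed_genes(lncrnas, genes)))  # ['gene_a', 'gene_b']
```

Because the filter uses the absolute value of R, strongly anti-correlated genes such as gene_b are retained alongside positively correlated ones.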
The findings of this study may have substantial clinical significance; however, some limitations should be considered. First, we identified the target lncRNAs according to GC tumor stage, but tumor metastasis was not included. Second, the TCGA data were generated by RNA-Seq; other experimental methods are required to verify the results. Third, the roles of LINC01018, LOC553137, MIR4435-2HG, and TTTY14 in GC remain unknown; in vivo and in vitro experiments are needed to answer this question.
In conclusion, by analyzing GC lncRNA expression profiles from the large-scale TCGA database, we identified a four-lncRNA signature that could act as an indicator of GC patient outcome and a potential independent biomarker for prognosis prediction in GC. Future functional investigations are required to explore the mechanisms underlying the roles of these lncRNAs in GC.

TCGA database
The GC data (Level 3 RNA sequencing) of 443 individuals with clinical information, comprising 408 GC tissues and 35 adjacent normal gastric tissues, were extracted from the TCGA database on April 10, 2017. The exclusion criteria were as follows: (i) histologic diagnosis ruled out GC; (ii) a malignancy other than GC was present. After exclusion, 379 GC patients were included in this study. Because the data were downloaded from a public database, ethical approval was not applicable. Data processing followed the TCGA policies on data access and human subject protection (http://cancergenome.nih.gov/publications/publicationguidelines). Of these 379 GC patients, 54 had stage I, 120 stage II, 166 stage III, and 39 stage IV tumors.

Identification of dysregulated lncRNAs in GC
Only lncRNAs with a description in NCBI or Ensembl were selected for further study, yielding expression profiles for 1801 lncRNAs. The raw lncRNA sequencing data had been post-processed and normalized by the TCGA RNASeqV2 system; because Level 3 data are already normalized by TCGA, no further normalization was applied. To detect differentially expressed lncRNAs, GC tumor tissues from each stage (stage I, stage II, stage III, and stage IV) were compared with adjacent non-tumor gastric tissues, and the lncRNAs in the intersection of these comparisons were selected for further analysis. The flow chart of the bioinformatics analysis is presented in Figure 11.

Construction of the prognostic signature
The GC-specific lncRNAs were selected, and the expression level of each lncRNA was log2-transformed for further analysis. A univariate Cox proportional hazards regression model was used to identify GC-specific lncRNAs associated with OS, and a multivariate Cox regression model was then used to evaluate the prognostic value of these OS-related lncRNAs. The univariate Cox regression analyses followed a semi-supervised method that combines gene expression profiles with clinical information [28,29]. In each subgroup stratified by the TNM system, OS-related lncRNAs were identified with the multivariate Cox regression model.
The prognostic risk score for predicting OS was calculated as

$$\mathrm{Risk\ score} = \sum_{i=1}^{n} \mathrm{exp}_{\mathrm{lncRNA}_i} \times \beta_{\mathrm{lncRNA}_i}$$

where exp is the expression level and β is the regression coefficient derived from the multivariate Cox regression model [30]. Using the median risk score as the cutoff point, GC patients were divided into high- and low-risk groups [31]. Univariate and multivariate Cox proportional hazards regression analyses were then conducted to investigate the effects of various clinical features and the risk score on OS in GC patients, and hazard ratios (HRs) with 95% confidence intervals (CIs) were estimated. Five-year time-dependent receiver operating characteristic (ROC) curve analysis was used to evaluate the predictive value of the risk score for time-dependent outcomes [32]. Kaplan-Meier survival curves and the log-rank test, performed in IBM SPSS Statistics 21 (SPSS Inc., Chicago, IL, USA), were used to assess the equality of survival distributions between groups. ROC curve analysis was also used to assess the sensitivity and specificity of the GC-specific lncRNAs for GC detection.
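The risk score formula and the median split can be sketched as follows. Assumptions: expression is log2-transformed with a pseudocount of 1 (the text states only that values were log2 transformed), the coefficients come from an already fitted multivariate Cox model, and patients whose score equals the median are assigned to the low-risk group. The coefficients, lncRNA names, and expression values are hypothetical.

```python
import math
import statistics


def risk_scores(expr, coefficients):
    """Risk score per patient: sum over lncRNAs of log2(expression + 1) x Cox beta.

    `expr` maps lncRNA -> per-patient expression values;
    `coefficients` maps lncRNA -> multivariate Cox regression coefficient.
    """
    names = list(coefficients)
    n_patients = len(expr[names[0]])
    return [
        sum(math.log2(expr[name][p] + 1) * coefficients[name] for name in names)
        for p in range(n_patients)
    ]


def split_by_median(scores):
    """Label patients 'high' if above the median score, otherwise 'low'."""
    cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores], cutoff


# Hypothetical coefficients and expression for five patients.
expr = {"lnc_a": [0, 1, 3, 7, 15], "lnc_b": [3, 1, 0, 7, 0]}
coefficients = {"lnc_a": 0.4, "lnc_b": -0.3}
scores = risk_scores(expr, coefficients)
labels, cutoff = split_by_median(scores)
print(labels)  # ['low', 'low', 'high', 'low', 'high']
```

With 379 patients and no tied scores, this tie rule places 190 patients in the low-risk group and 189 in the high-risk group, which matches the reported group sizes, although the text does not state the rule explicitly.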

Integrative prediction analysis of lncRNA function
Expression of the four lncRNAs was heterogeneous across GC of different grades. To investigate the biological features of GC with different expression of the four lncRNAs, we identified the genes strongly correlated with the expression of these four lncRNAs (Pearson |R| > 0.5) in the TCGA database [33]. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment analyses of the co-expressed mRNAs were performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/), with results restricted to KEGG pathways and GO biological processes. An adjusted P value < 0.05 was considered significant. The co-expressed genes were then used to construct a protein-protein interaction (PPI) network with STRING (version 10.5) (https://string-db.org/).

preparation: Yan Miao; Final approval of manuscript: Yan Miao, Jing Sui, Si-Yi Xu, Ge-Yu Liang, Yue-Pu Pu, Li-Hong Yin.

Figure 1: Venn diagram analysis of differentially expressed lncRNAs in gastric cancer. Each oval represents a group. The brown intersection in the middle represents lncRNAs that are consistently and significantly differentially expressed in all four groups.

Figure 2: Differential expression of the intersected lncRNAs in gastric cancer. The heatmap shows the differentially expressed lncRNAs.

Figure 3: Four differentially expressed lncRNAs (LINC01018, LOC553137, MIR4435-2HG, and TTTY14). (A) Kaplan-Meier curves showing the relationship between the four lncRNAs and overall survival; cases were divided into under- and overexpression groups by the mean lncRNA level. (B) ROC curves of the four lncRNAs for distinguishing gastric cancer tissue from adjacent normal tissue.

Figure 4: Risk score analysis of the differentially expressed lncRNA signature in gastric cancer. Survival status and duration of cases (top); risk score of the lncRNA signature (middle); low- and high-score groups for the four lncRNAs (bottom).

Figure 5: Prognostic performance of the four-lncRNA signature in gastric cancer. (A) Time-dependent ROC curve of the risk score for predicting 5-year survival. (B) Kaplan-Meier analysis of overall survival according to the risk score.

Figure 6: Prognostic value of different clinical features for overall survival of gastric cancer patients. Kaplan-Meier curves of seven prognostic indicators. SD, stable disease; PD, progressive disease; CR, complete remission; PR, partial remission.

Figure 7: Predictive value of the risk score for clinical features. ROC curves of the risk score for predicting different clinical features.

Figure 8: Expression levels of the four lncRNAs (LINC01018, LOC553137, MIR4435-2HG, and TTTY14). (A) Expression levels in gastric cancer tissues versus adjacent normal tissues; (B) expression levels in the low-risk versus high-risk groups. *P < 0.05.

Figure 9: Top 20 enriched KEGG pathways and GO terms for co-expressed mRNAs.

Figure 10: Protein-protein interaction network of co-expressed genes.

Figure 11: Flow chart of the bioinformatics analysis.

Table 1: The predictive values of related clinical features and risk score