Integrated analysis of competing endogenous RNA network revealing lncRNAs as potential prognostic biomarkers in human lung squamous cell carcinoma

Accumulating evidence shows the important role of long non-coding RNAs (lncRNAs) in competing endogenous RNA (ceRNA) networks for predicting survival in tumor patients. However, prognostic biomarkers for lung squamous cell carcinoma (LUSC) are still lacking. The objective of this study is to identify a lncRNA signature for evaluation of overall survival (OS) in 474 LUSC patients from The Cancer Genome Atlas (TCGA) database. A total of 474 RNA sequencing profiles in LUSC patients with clinical data were obtained, providing a large sample of RNA sequencing data, and 83 LUSC-specific lncRNAs, 26 miRNAs, and 85 mRNAs were identified to construct the ceRNA network (fold change>2, P<0.05). Among these above 83 LUSC-specific lncRNAs, 22 were assessed as closely related to OS in LUSC patients using a univariate Cox proportional regression model. Meanwhile, two (FMO6P and PRR26) of the above 22 OS-related lncRNAs were identified using a multivariate Cox regression model to construct a risk score as an independent indicator of the prognostic value of the lncRNA signature in LUSC patients. LUSC patients with low-risk scores were more positively correlated with OS (P<0.001). The present study provides a deeper understanding of the lncRNA-related ceRNA network in LUSC and suggests that the two-lncRNA signature could serve as an independent biomarker for prognosis of LUSC.


INTRODUCTION
Lung cancer remains one of the most frequently diagnosed and fatal cancers globally. In 2012 nearly 1.8 million new cases were diagnosed, causing 1.6 million deaths worldwide, with a sharp rise from 2008 [1]. Nonsmall cell lung cancer, including lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD), is the most pathological type (approximate 80%) in lung cancer. Nearly 30% of NSCLC is LUSC, which causes approximately 400,000 deaths annually worldwide, with both high incidence and poor prognosis (5-year survival rate<15%) [2]. Based on tumor node metastasis (TNM) taxonomy, LUSC can be classified into stages I, II, III, and IV [3]. Recent studies show that LUSC is closely associated with smoking and is more common in men than in women [4]. It is important to distinguish between LUSC and LUAD in the management of NSCLC since their therapeutic regimens and targeted agents differ [5]. Thus, identify effective potential molecular biomarkers for distinguishing between LUSC and LUAD is urgent. In the present study, we aim to find effective potential molecular biomarkers for predicting survival in LUSC. Long non-coding RNAs (lncRNAs), ranging from 200 nucleotides to 100 kb in length, can modulate gene expression at the transcriptional, post-transcriptional, and epigenetic levels and are broadly distributed in the genome [6][7][8][9]. A growing body of evidence demonstrates that lncRNA expression profiles are different in tumors tissues compared to the adjacent non-tumor tissues in various cancers [10][11][12], including LUSC [13,14]. It has been proposed that the differentially expressed lncRNAs may correlate with progression and survival in various cancers, which have also been detected in LUSC [15][16][17][18][19].
In 2011, the ceRNA (competing endogenous RNA) hypothesis was presented as a novel regulatory mechanism between non-coding RNA and coding RNA [20]. The central concept is that RNA interacts with miRNA response elements (MREs); this kind of RNA competition crosstalk also exists between lncRNAs and mRNAs [21].
Although numerous lncRNAs have been determined to predict outcomes for lung cancer, the conclusions of previous studies are inconsistent, possibly due to small sample sizes. Recently, lncRNA expression profiles were obtained from The Cancer Genome Atlas (TCGA) database, an open-access and publicly available largescale database. In the present study, the TCGA database was first used to obtain lncRNA expression profiles and combined with clinical features to construct a lncRNA-miRNA-mRNA ceRNA network in LUSC. Through an integrated analysis of lncRNA expression patterns in the ceRNA network, we identified a lncRNA signature in LUSC with two lncRNAs (FMO6P and PRR26) as a new candidate indicator with the potential to predict overall survival (OS) in LUSC patients.

Identification of significantly differentially expressed lncRNAs
In 474 LUSC patients from TCGA database, we initially performed differential expression analysis by comparing the expression of 1801 lncRNAs in LUSC and adjacent normal lung tissue in the TCGA database. We set fold change>2 and P value>0.05 as cutoffs to identify significantly differentially expressed lncRNAs. Then we obtained 171 differentially expressed lncRNAs between stages I-II (non-lymphatic metastasis) LUSC and adjacentnormal lung tissue, 161 differentially expressed lncRNAs between stages III-IV (non-lymphatic metastasis) LUSC and adjacent-normal lung tissue, 184 differentially expressed lncRNAs between stages I-II (lymphatic metastasis) LUSC and adjacent-normal lung tissue, and 180 differentially expressed lncRNAs between stages III-IV (lymphatic metastasis) LUSC and adjacent-normal lung tissue (fold change>2, P value<0.05). When we combined these four groups of differentially expressed lncRNAs, 127 differentially expressed lncRNAs (55 up-regulated and 72 down-regulated) showed consistently differential expression (Figures 1A and 2A, Supplementary Table 1) and were thus selected to construct the ceRNA network.

miRNA predicted targets analysis and ceRNA network construction
A total of 1,030 miRNAs and 18,633 mRNAs were identified from the TCGA database, 101 miRNAs and 4,289 mRNAs of which were found to be differentially expressed between LUSC and adjacent normal lung lymphatic metastasis; nLym, non-lymphatic metastasis. Each oval represents a group. The brown intersection in the middle represents RNAs, which are consistently and significantly differentially expressed in four groups. www.impactjournals.com/oncotarget tissue (fold change>2, P<0.05). In total, 42 differentially expressed miRNAs (17 up-regulated and 25 downregulated) and 2950 differentially expressed mRNAs (1180 up-regulated and 1770 down-regulated) (shown in  Figures 1B, 1C and 2B, 2C; Supplementary Tables 2, 3) were selected to construct the ceRNA network.
The next step was to predict the mRNA and lncRNAs targeted by miRNA. We then focused on the targeting relationship between the 42 miRNAs and 2,950 mRNAs as mentioned above. miRTarBase and Targetscan were used to predict miRNA-targeted mRNAs. The result showed that 26 of 42-LUSC specific miRNAs might target to the 85 of 2,950 LUSC-specific mRNAs (Table 1). A few targets are tumor-associated genes such as CDC25A, COL11A1, FZD10, INHBE, ISL1, NAT8L, PAX7, PLAU, SIX4, and TFAP4.
Subsequently, the targeting relationships among 127 differentially expressed lncRNAs (Supplementary  Table 1) and 26 differentially expressed miRNAs (Table  1) were assessed. In the ceRNA network, lncRNA might interact with the miRNA through MREs. Using miRanda to identify the potential MREs, the result showed these 26 LUSC-specific miRNAs might target to 83 of the 127 LUSC-specific lncRNAs ( Table 2).
Based on the above data (Tables 1 and 2), we constructed the lncRNA-miRNA-mRNA ceRNA network with Cytoscape 3.0. To obtain the most reliable results, a maximal information coefficient (MIC) algorithm was   [22]. In all, 83 lncRNAs, 26 miRNAs, and 85 mRNAs were involved in the ceRNA network ( Figure 3).

Construction of the LUSC-specific lncRNAbased prognostic signature
The 83 specific lncRNAs in the ceRNA network were further analyzed according to clinical features including gender, tumor stage, TNM staging system, lymph node metastasis, and patient outcome assessment at diagnosis in TCGA database. There were 25 LUSCspecific lncRNAs, the expression levels of which were significantly differentially expressed in clinical feature comparisons (P<0.05; Table 4). Seventeen lncRNAs were differentially expressed in gender, three in tumor stage, three in TNM staging system, six in lymphatic metastasis, and four were differentially expressed in patient outcome assessment. Univariate Cox proportional hazards regression showed that 22 of the 83 differentially expressed lncRNAs in the ceRNA network were identified to have a significant prognostic value (Table 5). A multivariate Cox proportional hazards regression analysis indicated that only two lncRNAs showed a significant prognostic value for LUSC: KRTAP5-AS1 and SOX2-OT ( Figure 5). The risk score for predicting the OS was constructed with the formula: Risk score=exp FMO6P * (-0.401) + exp PRR26 * 0.399.
Meanwhile, based on the individual inflection point of the prognostic risk score, LUSC patients were divided into low-risk and high-risk groups ( Figure 6). The risk score widely predicted 5-year survival of LUSC patients, as the area under ROC curve (AUC) was 0.694 ( Figure  7A). Furthermore, K-M curves confirmed that low-risk was more positively correlated with OS (P<0.001, Figure 7B). Moreover, the prognostic value of different clinical features was also evaluated. The univariate Cox proportional hazards regression showed that some clinical features predicted poorer survival of patients with LUSC (Table 6).However, when analyzed by a multivariate Cox regression model, gender, cancer neoplasms, and risk score were the independent prognostic indictors of LUSC ( Table 6). The Kaplan-Meier curves of the above clinical features are shown in Figure 8. The results revealed that Tumor stage, T stage, M stage, residual tumor, cancer neoplasm, and primary therapy outcome were associated with OS (P=0.002, P=0.004, P=0.009, P=0.005, P=0.005, and P<0.001, respectively).
In addition, we assessed the relationship between the risk score based on the differentially expressed lncRNAs signature and various clinical features, and the risk score showed prognostic value for predicting the status of tumor stage, T stage, metastasis, and neoplasm ( Figure 9). The expression patterns of these two differentially expressed  lncRNAs in the low-score and high-score groups were given in Figure 10.

DISCUSSION
To define lncRNAs significantly related to OS, we constructed a lncRNA-miRNA-mRNA ceRNA network according to the specific criteria in a large sample of LUSC patients, based on information from the TCGA database. A univariate Cox proportional hazards regression with the significance level set at 0.05 was first performed on 83 differentially expressed lncRNAs from the ceRNA network. A total of 22 lncRNAs were identified as associated with OS. A multivariate Cox proportional hazards regression analysis indicated that FMO6P and PRR26 showed a significant prognostic value for LUSC patients'' survival. We then developed a risk score by combining the two lncRNAs and found that this two-  lncRNA signature independently predicted OS in LUSC patients, which was further validated in LUSC patients. To our knowledge, this is the first study combining a ceRNA network with TCGA data to assess the survival of LUSC patients by constructing a lncRNA-related risk score. Lung cancer produces the most lethal solid malignancies, and the role of lncRNAs in the genesis and development-and thus diagnosis and prognosis -of lung cancer has been widely studied [23][24][25][26]. In addition to the metastatic behavior of lung cancer, lack of precise and accurate biomarkers for diagnosis and prognosis may also contribute strongly to the low survival rate (< 15%). Therefore, there is a pressing need for potential and reliable prognostic biomarkers that predict lung cancer outcomes. In previous studies, expression of miRNA and mRNA have been examined to reveal characteristics associated with lung cancer outcome [27][28][29][30]. Differential expression of lncRNAs has rapidly emerged in various studies of tumorigenesis and cancer progression, and the dysregulated lncRNAs play key roles in cancer prognosis.
There is accumulating evidence that lncRNAs might be a pivotal component in the ceRNA network by modulating other RNA transcripts [31][32][33]. CHIAP2 was detected in the lung cancer-related ceRNA network [34]. Snhg1 promoted cell proliferation by acting as a sponge for the tumor suppressor miR-338 in esophageal cancer cells [35]. Wang et al. [36] affirmed that TUG1 might affect ROCK1 expression, which could mediate migration/ invasion to influence prognosis, by working as a ceRNA manner by targeting miR-335-5p. The up-regulation of AFAP1-AS1 was associated with poor prognosis in NSCLC patients [37]. Therefore, potential connections among lncRNA, miRNA, and mRNA might exist in the genesis and development of LUSC. In the present study, we first used the TCGA database to construct a ceRNA network to reveal a novel regulatory network in LUSC. Some cancer-specific lncRNAs, such as ABCC13, HOTAIR, LINC00472, and FER1L4 have also been reported to act as potential diagnostic and prognostic biomarkers in cancers [38][39][40][41]. In the present study, we sought to determine whether the LUSC-specific lncRNAs in the ceRNA network were indirectly related to mRNA signal pathways. The results of this pathway analysis showed that there were a few pathways related to cancer. Hence, our results suggest that specific lncRNAs may play crucial roles in the genesis and development of LUSC.
Based on public data from large-scale samples, the prognostic value of lncRNAs has been evaluated in various   cancers, including lung adenocarcinoma [42], gastric cancer [43], urothelial carcinoma [44], colorectal cancer [45], intrahepatic cholangiocarcinoma [46], and breast cancer [47]. One recent study evaluated the prognostic value of miRNAs and mRNAs in LUSC. Using the TCGA database, Gao et al. [48] found that 12 of the 133 most significantly altered miRNAs were associated with OS of LUSC After a comprehensive analysis, a seven-miRNA signature for prediction of OS in LUSC patients was established. Similarly, multivariable Cox regression analysis indicated that up-regulated expression of ASCL2 was an independent prognostic factor for OS (HR=2.764, P<0.05) in LUSC patients [49]. However, the relationship between differentially expressed lncRNAs and survival in LUSC patients has not been studied in large-scale samples using comprehensive analysis. Unlike previous studies, we combined the ceRNA network with TCGA data to analyze the lncRNAs in LUSC. Since then, we determined that significantly differential expression of two novel lncRNAs (FMO6P and PRR26) could be a novel independent risk factor for LUSC. Moreover, the risk score based on these two lncRNAs could be a new indicator for the prognosis of LUSC patients. However, there is no report on the association between these two lncRNAs and disease. Moreover, no study has investigated the function of the above two lncRNAs. Here, we constructed a lncRNA-miRNA-mRNA network to discover the relevant genes of these two lncRNAs. The FMO6P-related genes were enriched in the Fanconi anemia pathway and cancer transcriptional misregulation. Furthermore, the PRR26-related genes were enriched in Transcriptional misregulation in cancer, the MAPK signaling pathway, in tight-junction Proteoglycans in cancer, and in Protein digestion and absorption, most of which are classical signaling pathways closely related to the genesis and progression of cancer. For example, the MAPK signaling pathway has been proposed to be associated with the occurrence, invasion, and metastasis of LUSC [50].   Therefore, functional enrichment analysis may elucidate the role of FMO6P and PRR26 in carcinogenesis of LUSC.
Although the findings of the present study might have significant clinical implications, several limitations should be considered. First, a longer follow-up time is needed to validate our findings. Second, apart from data from the TCGA database, other experimental methods are required to verify the present findings. Third, the function of FMO6P and PRR26 in LUSC needs to be further studied.
In conclusion, the present study identified a two-lncRNA signature as a potential outcome predictor for LUSC patients via analyzing the genome-wide lncRNA expression profiles from TCGA using a ceRNA network. Future functional investigations are required to explore the mechanisms underlying the roles of these lncRNAs in LUSC.

TCGA dataset and patient information
RNA sequencing (RNA-Seq) data from 504 individuals with LUSC were obtained from the TCGA data portal (https://portal.gdc.cancer.gov/). Exclusion criteria were as follows: 1) histologic diagnosis ruled out LUSC; 2) another malignancy besides LUSC. Overall, 474 LUSC patients were included in this study, including data from 429 LUSC tissue samples and 45 non-tumorous adjacent-normal lung tissue samples up to February 12, 2017. Clinical data, including outcome and staging information on TCGA LUSC patients, were also downloaded from the Data Coordinating Center. Of those 474 patients, there were 159 LUSC patients with lymphatic metastasis and 270 LUSC patients with nonlymphatic metastasis. According to the TNM stage, 343 patients were identified as having well-or moderately differentiated LUSC (stage I-II), and the remaining 86 patients had poorly differentiated LUSC (stage III-IV). Since the data were obtained from the TCGA database, additional approval from the Ethics Committee was not needed. Data processing procedures met the requirements of the data access policies and NIH TCGA human subject protection (http://cancergenome.nih.gov/publications/ publicationguidelines).

Identification of differentially expressed RNAs in LUSC samples
The expression of human RNA (lncRNA, mRNA, and miRNA) in LUSC samples was analyzed using RNASeqV2 and Illumina HiSeq 2000 miRNA sequencing platforms. Level 3 RNA expression data were collected from TCGA Data Portal and normalized by TCGA [51]. To detect the differential expression of RNA, samples were divided into tumor tissues vs. adjacent non-tumor tissues, stage I-II vs. stage III-IV, and lymphatic metastasis vs. non-lymphatic metastasis, respectively. The flow chart for bioinformatics analysis was given in Figure 11. Since many RNAs were not expressed in particular tissue types or showed little variation among patients in the TCGA database, only RNAs expressed in at least two normal or tumor samples, with at least 100 counts per million were retained in the profile. Expression differences were characterized by fold change and associated P-values. Fold change indicates the difference in expression of each RNA from LUSC to adjacent non-tumor tissues. Up-regulated and down-regulated RNA were assigned fold changes >2 and <0.5, respectively, with FDR-adjusted P<0.05.

Construction of the ceRNA network and functional assessment
Based on the relationship among lncRNA, mRNA, and miRNA, the ceRNA network was constructed in three steps: i) LUSC-specific RNA (lncRNA, mRNA, and miRNA) filtration: Up-regulated and down-regulated LUSC-specific RNAs were assigned fold changes >2 and <0.5, respectively, with P<0.05. To maximize data reliability, LUSC-specific lncRNAs not registered in GENCODE (http://www.gencodegenes.org/) were abandoned; ii) miRTarBase (http://mirtarbase.mbc.nctu. edu.tw/) and Targetscan (http://www.targetscan.org/) were used to predict the mRNAs targeted by miRNAs; iii) lncRNA-miRNA interactions were predicted using miRanda tools (http://www.microrna.org/microrna/home. do). Furthermore, miRNAs that negatively regulated expression of both lncRNAs and mRNAs were selected for construction of the ceRNA network. Cytoscape v3.0 was used to construct and visualize the ceRNA network [52]. To understand the underlying pathways and biological processes of differentially expressed genes in the ceRNA network, the Database for Annotation, Visualization, and Integrated Discovery (DAVID) (http://david.abcc.ncifcrf. gov/) [53] was employed for functional enrichment analysis, in which KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways and GO (Gene Ontology) biological processes were interested at significance level of P<0.05 and an enrichment score>1.5.

Construction of the LUSC-specific lncRNAbased prognostic signature
Based on the ceRNA network, LUSC-specific lncRNAs were selected, and the expression level of each lncRNA was log2 transformed for further analysis. The univariate Cox proportional hazards regression model with a significance level set at 0.05 was used to analyze LUSC-specific lncRNAs associated with OS. The prognostic risk score for predicting OS was calculated as previously reported: Risk score=exp lncRNA1 *β lncRNA1 +exp lncRNA2 *β lncRNA2 +…exp lncRNA *β lncRNAn (exp: expression level, β: the regression coefficient derived from the multivariate Cox regression model) [54]. Utilizing the median risk score as the cutoff point, LUSC patients were divided into high-score and low-score groups [55]. Univariate Cox proportional hazards regression analyses were further conducted to investigate the effects of various clinical characteristics and the risk score on the OS of LUSC patients.
We further analyzed clinical features including gender, tumor stage, TNM staging system, lymphatic metastasis, and patient outcome assessment with lncRNAs in the ceRNA network. The relationship between the LUSC-specific lncRNAs and clinical features was examined using Student's t-test. Univariate Cox proportional hazards regression analyses were further performed to investigate the effects of various clinical features and risk score on the OS of LUSC. The hazard ratio (HR) and 95% confidence interval (CI) were assessed. HRs for a two-fold change in gene expression from univariate Cox regression analysis were used to identify LUSC-specific lncRNAs associated with OS. lncRNAs defined as having a protective signature showed HR<1 and those defined as high-risk had HR for death>1. A time-dependent receiver operating characteristic (ROC) curve analysis within five years as the defining point was performed with the R package "survival-ROC," to evaluate the predictive value of the risk score for timedependent outcomes [56]. Kaplan-Meier and log-rank methods (Mantel-Haenszel test) were used to test the equality of survival distributions in different groups subjected to comparison using IBM SPSS Statistics Version 21 software (SPSS Inc., Chicago, IL, USA). The operating characteristic curve was used to evaluate LUSCspecific lncRNAs for their sensitivity and specificity for detection of LUSC.

Author contributions
Conceived and designed the study: Jing Sui and Geyu Liang. Performed the study: Jing Sui. Analyzed the data: Jing Sui and Siyi Xu. Contributed analysis tools: Songru Yang, Lihong Yin and Yuepu Pu. Wrote the paper: Jing Sui.