The prognostic value of long non-coding RNAs in non-small cell lung cancer: A meta-analysis

Background Reports have demonstrated the prognostic function of long non-coding RNAs (lncRNAs) in patients with cancer. However, their prognostic functions in non-small cell lung cancer (NSCLC) remain controversial. We therefore performed a meta-analysis on six lncRNAs (PVT1, AFAP1-AS1, LINC01133, ANRIL, MEG3 and UCA1) to clarify their prognostic roles in NSCLC. Results Thirty-six studies involving 6267 patients with NSCLC and 34 lncRNAs were included. Of the listed lncRNAs, 20 were shown to negatively affect patients' overall survival, while high expression of 13 lncRNAs indicated better survival outcomes. Materials and Methods The log-rank P value and Kaplan–Meier survival curves were extracted for hazard ratio (HR) calculation. Survival outcomes were measured by overall survival (OS) and event-free survival (EFS), which were then analyzed by calculating pooled HRs. Heterogeneity was assessed with the Q statistic and the I-squared statistic. Conclusions The abnormal expression of lncRNAs may significantly affect the survival of NSCLC patients and may serve as a novel predictive factor for their prognosis.


INTRODUCTION
Lung cancer is one of the most common causes of cancer-related death worldwide, and non-small cell lung cancer (NSCLC) accounts for 80% of all cases [1]. GLOBOCAN 2012 reported approximately 14.1 million cancer patients worldwide, 8.2 million of whom died in 2012, most of them in less developed countries [2]. Patients with lung cancer are usually diagnosed at an advanced stage with a relatively poor prognosis. The estimated overall 5-year survival rate of advanced-stage lung cancer is 0-14% [3,4], while the 5-year survival rate of early-stage NSCLC can be as high as 83%. Early diagnosis and the identification of new molecular targets are therefore key to improving clinical strategies and outcomes in NSCLC [5]. Long non-coding RNAs (lncRNAs) are non-protein-coding RNA molecules longer than 200 nucleotides that are often expressed in a spatially, temporally and tissue-specific pattern [6,7]. In the past, lncRNAs were viewed merely as transcriptional "noise" [8]. Recently, a growing number of genome-wide transcriptome studies have identified about 3000 lncRNAs and indicated their diverse biological functions in both normal and degenerated tissues, including cell growth, differentiation and disease progression [9]. lncRNAs may act as primary regulators of molecular interactions with DNA-binding proteins and epigenetically regulate the expression of target genes [10].
So far, the prognostic role of lncRNAs in NSCLC remains controversial. Some studies reported statistically non-significant results [11,12], while others showed that lncRNAs could be important biomarkers for assessing overall survival and recurrence. Because of limited sample sizes and the small number of studies, a single study may not reflect the facts accurately. Therefore, we conducted a meta-analysis to clarify the role of lncRNAs in the prognosis of NSCLC patients. We also summarized the relation of different lncRNAs to patients' prognosis. The Kaplan-Meier survival analyses and log-rank tests reported in the enrolled studies were used to further evaluate the correlation between lncRNA expression and the prognosis of NSCLC patients. Pooled results indicated that lncRNAs play an important role in the overall survival of NSCLC patients, which provides new insights into therapeutic strategies for NSCLC.

Study selection
After full-text assessment of all included articles, we excluded studies that did not use EFS or OS as survival parameters. Studies that lacked the information needed for calculation with the methods developed by Parmar, Williamson, and Tierney (Parmar et al., 1998; Williamson et al., 2002; Tierney) were also excluded. The initial search returned 128 articles, from which 36 duplicate records were removed. Two authors independently read the abstracts of the remaining 92 articles and excluded 60 ineligible ones: laboratory studies (n = 12), review articles (n = 11), studies of other biomarkers (n = 2) and studies of other types of cancer (n = 35). We then went through the full texts of the remaining 32 studies, and 25 with adequate data for calculation were finally enrolled. The flow chart of the selection process is shown in Figure 1. The supplementary search returned 46 articles, 11 of which contained useful information.
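The counts in the selection process can be checked with simple arithmetic. The short sketch below only restates the numbers reported in this paragraph; the variable names are illustrative and are not taken from the original study.

```python
# Counts as reported in the study-selection paragraph.
initial_hits = 128
duplicates = 36
excluded_on_abstract = {
    "laboratory studies": 12,
    "review articles": 11,
    "other biomarkers": 2,
    "other types of cancer": 35,
}
enrolled_first_search = 25
enrolled_supplementary = 11

# Records left after removing duplicates.
screened = initial_hits - duplicates
# Records left for full-text review after abstract screening.
full_text = screened - sum(excluded_on_abstract.values())
# Studies included across both searches.
total_included = enrolled_first_search + enrolled_supplementary

print(screened, full_text, total_included)
```

The three printed values correspond to the 92 abstracts screened, the 32 full texts assessed and the 36 studies finally included.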

Characteristics of included studies
Among the 36 studies, one article [11] used progression-free survival (PFS) instead of disease-free survival (DFS); we therefore combined DFS and PFS and used event-free survival (EFS) as a prognostic parameter in our study. Twenty-eight studies used overall survival (OS) as the prognostic parameter, one study used EFS and four reported both OS and EFS. All 36 studies used quantitative real-time reverse transcription PCR (qRT-PCR) to measure the expression of lncRNAs in tumor samples. Together, the included studies analyzed the prognosis of 6267 patients with NSCLC and the correlation between the levels of 34 lncRNAs and survival outcomes. All necessary data from the included studies are listed in Table 1 and Table 2.
The number of patients enrolled in each study ranged from 20 to 1926, and the follow-up duration varied from 25 to 200 months. Thirty-three studies involved participants from China, and three involved patients from Japan [44], Germany [45] and the USA [46], respectively. All studies investigated patients with NSCLC, and qRT-PCR was used to detect lncRNA expression in tumor tissues.

Overall Analyses
Twenty lncRNAs were shown to negatively affect patients' overall survival, while 13 lncRNAs were associated with better survival outcomes. One study [11] on ANRIL (Nie et al.: OS HR = 2.23, 95% CI: 0.89-5.59, P = 0.09) showed no significant prognostic effect of lncRNA expression on patients' overall survival. Wang et al. [12] observed no correlation between the expression of TUSC7 and patients' DFS, but a significant correlation between TUSC7 expression and patients' OS. The BC087858 expression level was also associated with prognosis, but it reached only marginal statistical significance (P = 0.083) [38]. All HRs, 95% CIs and P values of the included studies are listed in Table 3.

Subgroup analysis
Among the 20 listed lncRNAs, eight (HOTAIR, PVT1, AFAP1-AS1, LINC01133, ANRIL, UCA1, MALAT-1 and MEG3) have been studied in two or more articles. We carried out meta-analyses on these and obtained the combined HRs. While the studies on the other lncRNAs provided sufficient information for pooled analysis, the studies on HOTAIR and MALAT-1 examined OS and EFS separately, and we were therefore unable to conduct the corresponding meta-analyses.

PVT1
We performed a meta-analysis on articles that evaluated lncRNA PVT1 as a prognostic marker. Both studies included in the meta-analysis [14,15] conducted multivariate Cox regression analysis, so data such as the HR were extracted directly and entered into the pooled analysis. The median follow-up periods were 41 months [14] and 32 months [15], and information on a total of 190 patients was collected. There was evidence of considerable heterogeneity between these two groups (P = 0.11, I2 = 62%), so the random effect model was selected. A combined HR of 2.34 (95% CI: 1.25-4.39, P = 0.008) was found for patients with high expression of PVT1, from which we concluded that high expression of the long non-coding RNA PVT1 is a predictor of poorer overall survival (Figure 2).

AFAP1-AS1
We then carried out a meta-analysis with the two articles on AFAP1-AS1, which contained three groups of data; the combined HR is shown in Figure 4. Significant heterogeneity among the selected studies was observed according to the Q-test (chi2 = 6.97) and the I-squared result (I2 = 71%, P = 0.03), so the random effect model was applied to calculate a pooled HR (HR = 3.22, 95% CI: 1.53-6.75, P = 0.002), which indicated that an elevated expression level of AFAP1-AS1 was a strong predictor of poorer OS.
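The reported heterogeneity figures can be reproduced from the Q statistic alone. The sketch below assumes that three groups of data give 2 degrees of freedom and uses the usual definition I2 = max(0, (Q - df) / Q); the function names `i_squared` and `chi2_upper_tail` are illustrative helpers, and the closed-form tail probabilities cover only df = 1 and df = 2.

```python
import math


def i_squared(q, df):
    """Higgins I-squared (in percent) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def chi2_upper_tail(q, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(q / 2.0))
    if df == 2:
        return math.exp(-q / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# AFAP1-AS1: three groups of data (assumed df = 2), chi2 = 6.97.
print(round(i_squared(6.97, 2)), round(chi2_upper_tail(6.97, 2), 2))
```

Under these assumptions the sketch returns an I2 of about 71% and a P value of about 0.03, in line with the values reported for AFAP1-AS1. The same two functions can be applied with df = 1 to the analyses that pool two studies, for example the chi2 of 0.14 reported for MEG3.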

LINC01133
We included two studies investigating the correlation of LINC01133 expression with patients' survival. Both studies conducted Kaplan-Meier survival analysis, and no significant heterogeneity was observed (I2 = 0%, P = 0.91). Further meta-analysis using the fixed effect model revealed that high expression of LINC01133 could serve as an independent factor for predicting the prognosis of NSCLC patients (HR = 2.29, 95% CI: 1.42-3.71, P = 0.0007) (Figure 5).

ANRIL
Two studies performed multivariate Cox regression analysis of prognostic parameters, including the expression of ANRIL, in NSCLC patients. Ling et al. (N = 87) and Nie et al. (N = 68) had clinical follow-ups of 60 months and 36 months, respectively. In Nie's study, ANRIL over-expression did not show a significant influence on OS (HR = 2.23, 95% CI: 0.89-5.59, P = 0.09). To clarify the impact of ANRIL expression on patients' survival, we performed a pooled analysis. We observed no heterogeneity between the studies (I2 = 0%, P = 0.82), and therefore the fixed effect model was applied to calculate the association between high tumoral ANRIL expression and OS (HR = 2.42, 95% CI: 1.40-4.19, P = 0.002). These results suggest that high expression of ANRIL could predict a worse prognosis for NSCLC patients in terms of overall survival and may be an independent prognostic marker (Figure 6).

UCA1
Two articles on lncRNA UCA1 studied OS and were therefore included in the meta-analysis [14,15]. Both studies conducted multivariate Cox regression analysis, so data such as the HR were extracted directly and entered into the pooled analysis. We observed no heterogeneity between the studies (I2 = 0%, P = 0.32), and therefore the fixed effect model was applied. A combined HR of 1.49 (95% CI: 1.17-1.91, P = 0.001) was observed for patients with high expression of UCA1. We could then conclude that high expression of lncRNA UCA1 can be used as a predictor of poorer overall survival (Figure 7).

MEG3
We carried out a meta-analysis with two articles describing the correlation between elevated expression of MEG3 and overall survival. No heterogeneity among the selected studies was observed according to the Q-test (chi2 = 0.14) and the I-squared result (I2 = 0%, P = 0.71), so the fixed effect model was applied to calculate a pooled HR (HR = 0.28, 95% CI: 0.15-0.53, P < 0.0001), which indicated that elevated expression of MEG3 could positively affect patients' overall survival (Figure 8).

DISCUSSION
The current meta-analysis, which investigated the correlation between lncRNAs and cancer prognosis, demonstrated that the over-expression of lncRNAs was an effective predictor of survival in a variety of cancers, in terms of both OS and EFS. For NSCLC, it is of great interest to identify prognostic biomarkers, which can help stratify patients and guide clinical decisions. In recent years, an increasing number of studies have demonstrated the aberrant expression of lncRNAs in human cancers, including NSCLC [49].
Our study included 36 recently published articles and a total of 6267 patients, which we consider sufficient to support the subgroup analyses. In this study, we listed 34 lncRNAs that are potential prognostic biomarkers (Table 3). Our meta-analysis looked into six lncRNAs (PVT1, AFAP1-AS1, LINC01133, ANRIL, UCA1 and MEG3) whose prognostic roles have been examined in two or more articles. The combined HRs suggested that elevated expression of PVT1, AFAP1-AS1, LINC01133, ANRIL and UCA1 was significantly correlated with poor prognosis, whereas elevated expression of MEG3 was associated with better overall survival (Figures 2, 4, 5, 6, 7, 8). Although one study on ANRIL alone showed no statistical significance (HR = 2.23, 95% CI: 0.89-5.59, P = 0.09), the pooled outcome of two studies added evidence that increased expression of ANRIL indicates shorter overall survival (HR = 2.42, 95% CI: 1.40-4.19, P = 0.002). Because of the limited number of studies, these conclusions need more clinical trials for verification. The heterogeneity was probably due to differences in the source of the population, the cut-off value for lncRNA expression and the duration of follow-up.
Distinct from earlier studies, this meta-analysis has summarized the prognostic role of all published lncRNAs in NSCLC and carried out pooled analyses on those lncRNAs with enough data. To the best of our knowledge, this is the first meta-analysis summarizing information about the prognostic value of all available lncRNAs in NSCLC patients. We strictly followed the literature inclusion criteria, and all enrolled articles were examined independently by two authors. Furthermore, we paid substantial attention to the details of study design and data reporting in quality assessment. We extracted only multivariate analysis data to limit the influence of heterogeneity among the included studies and to further explore the potential role of lncRNAs as prognostic biomarkers of NSCLC. For Kaplan-Meier survival curves, we carefully selected studies with valid information and strictly followed the methods developed by Parmar, Williamson, and Tierney. Blurred curves were retouched with Microsoft Paint to make them precise enough for calculation. Furthermore, all lncRNA data were based on frozen tissue samples of clear clinical origin. The type of sample has been shown to influence experimental outcomes in terms of RNA detection [50]. All enrolled studies used qRT-PCR to measure lncRNAs, which makes the pooled data from different studies more persuasive given the consistent measurement method. Last but not least, all studies returned by our search strategy have been covered in this study, which demonstrated the prognostic value of the expression of various lncRNAs in NSCLC.
However, some aspects of our study need further refinement. First, the number of eligible articles is relatively small, which led to a relative insufficiency of studies in the subgroup analyses. Possible causes are that studies reporting positive results were more likely to be published or that articles published in other languages were missed during our search. For the same reason, publication bias and sensitivity analyses were not performed, which might lead to a lack of statistical power. Second, most of the patients in our analysis were Asian. Thus, standardized analyses are needed before our results can be applied to other populations. Third, although all four sets of pooled outcomes of HR for OS in patients with high lncRNA expression were statistically significant (all HR > 2), some individual outcomes are not strong enough to have clinical value, because empirically a predictive HR of more than 2.0 is considered to be strong [51]. Although these results remain to be verified by larger numbers of clinical trials, they still possess statistical validity and reflect the general correlation of lncRNA expression with OS. The prognostic performance of lncRNAs in NSCLC has been demonstrated. However, further clinical studies are warranted to figure out the complicated molecular networks through which lncRNAs influence NSCLC patients.

Search strategy
A comprehensive search of the PubMed database was performed for articles that analyzed the prognostic value of lncRNAs in NSCLC patients. Studies were selected using varying combinations of the following keywords: long non-coding RNAs, prognosis, lung cancer or NSCLC. The last search update was performed on May 19th, 2016. A second search was done on September 13th, 2016, using the following words: long non-coding RNAs, survival, lung cancer or NSCLC. Additional studies mentioned in the retrieved review articles were manually added to our evaluation list.

Inclusion criteria
We referred to the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement issued in 2009 as well as the checklist of the Dutch Cochrane Centre represented by MOOSE [52]. We then defined criteria for studies considered eligible for full-text evaluation: (i) studies on the relation between lncRNA expression in tumor or blood samples and the prognosis of patients with NSCLC; (ii) survival outcomes measured as overall survival (OS) or event-free survival (EFS), including disease-free survival (DFS) and progression-free survival (PFS). The inclusion criteria are shown in Table 5.
Studies were excluded if they met any of the following conditions: (i) review articles, laboratory articles or letters; (ii) articles about the prognosis of other tumors or other markers. When two articles from the same medical center reported similar data, the article with the larger sample size was selected. Two authors independently selected studies, and disagreements were resolved by consulting a third author.

Data extraction
All data were extracted independently by two authors and any disagreements were resolved by consensus and consultation with a third investigator. We extracted the results of multivariate Cox hazard regression analysis provided in the articles. However, if these data were not directly available, we extracted the log-rank p value and Kaplan-Meier survival curves of survival outcomes with the number of patients at risk in each expression group for further calculation. The following data were extracted: name of first author, investigated lncRNAs, number of patients, HR with 95% CI, P value, population, sample site, assay and survival outcome parameter.
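When a study reports an HR with its 95% CI, the log HR and its standard error needed for pooling can be recovered with a standard back-calculation. The sketch below is an illustration under the assumption that the CI is symmetric on the log scale; it is not the authors' own script, and the helper names `log_hr_and_se` and `two_sided_p` are hypothetical. The example input is the ANRIL result reported by Nie et al. (HR = 2.23, 95% CI: 0.89-5.59).

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_and_se(hr, ci_low, ci_high):
    """Return log(HR) and its standard error from an HR and its 95% CI."""
    log_hr = math.log(hr)
    # The CI spans 2 * Z_975 standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_975)
    return log_hr, se


def two_sided_p(log_hr, se):
    """Two-sided P value of the Wald test for log(HR) = 0."""
    z = abs(log_hr / se)
    return math.erfc(z / math.sqrt(2.0))


log_hr, se = log_hr_and_se(2.23, 0.89, 5.59)
print(round(log_hr, 3), round(se, 3), round(two_sided_p(log_hr, se), 2))
```

The recovered P value rounds to 0.09, which agrees with the value reported for that study and suggests that the published HR and CI are internally consistent.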

Statistical methods
All HRs and 95% confidence intervals (CIs) were calculated with Tierney's method. The log HR and its standard error, SE(log HR), were recorded for aggregation of the survival outcomes of different long non-coding RNAs. Pooled analysis of the survival outcomes of specific lncRNAs was then performed. Heterogeneity of the combined HRs was tested using Cochran's Q test and the Higgins I-squared statistic. A P value < 0.05 or I2 > 50% was considered to indicate significant heterogeneity. A random effect model (DerSimonian and Laird method) was applied if heterogeneity was observed (P < 0.05 or I2 > 50%); otherwise the fixed effect model was used [53].
All P values were two-sided, and a P value of less than 0.05 was considered statistically significant.
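The pooling procedure described above can be sketched as inverse-variance weighting with a DerSimonian and Laird estimate of the between-study variance. This is a minimal illustration rather than the software actually used for the analysis. The function name `pool_hazard_ratios` is hypothetical, and for simplicity the model switch uses only the I2 > 50% rule and omits the P < 0.05 branch of the Q test. The example inputs are made-up values, not data from the included studies.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def pool_hazard_ratios(log_hrs, ses, i2_threshold=50.0):
    """Pool log hazard ratios; random effects (DerSimonian-Laird) if I2 exceeds the threshold."""
    k = len(log_hrs)
    # Fixed effect: inverse-variance weights.
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_hrs)) / sum(w)
    # Cochran's Q and Higgins I-squared.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_hrs))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    use_random = i2 > i2_threshold
    if use_random:
        w = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w, log_hrs)) / sum(w)
    se_pooled = math.sqrt(1.0 / sum(w))
    return {
        "model": "random" if use_random else "fixed",
        "hr": math.exp(pooled),
        "ci": (math.exp(pooled - Z_975 * se_pooled),
               math.exp(pooled + Z_975 * se_pooled)),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


# Illustrative inputs only (not taken from the included studies).
result = pool_hazard_ratios([math.log(1.2), math.log(4.0)], [0.2, 0.2])
print(result["model"], round(result["hr"], 2), round(result["i2"], 1))
```

With two equally precise but discordant studies, as in the illustrative call, the I2 value exceeds 50% and the random effect model is selected; with concordant studies the function falls back to the fixed effect estimate.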