LncRNA profile study reveals four-lncRNA signature associated with the prognosis of patients with anaplastic gliomas

Anaplastic glioma is Grade III and the median overall survival is about 37.6 months. However, there are still other factors that affect the prognosis for anaplastic glioma patients due to variable overall survival. So we screened four-lncRNA signature (AGAP2-AS1, TPT1-AS1, LINC01198 and MIR155HG) from the lncRNA expression profile from the GSE16011, CGGA and REMBRANDT datasets. The patients in low risk group had longer overall survival than high risk group (median OS 2208.25 vs. 591.30 days; P < 0.0001). Moreover, patients in the low risk group showed similar overall survival to Grade II patients (P = 0.1669), while the high risk group showed significant different to Grade IV (P = 0.0005) with similar trend. So based on the four-lncRNA, the anaplastic gliomas could be divided into grade II-like and grade IV-like groups. On the multivariate analysis, it showed the signature was an independent prognostic factor (P = 0.000). The expression of four lncRNAs in different grades showed that AGAP2-AS1, LINC01198 and MIR155HG were increased with tumor grade, while TPT1-AS1 was decreased. Knockdown of AGAP2-AS1 can inhibit the cell proliferation, migration and invasion, while increase the apoptosis cell rates in vitro. In conclusion, our results showed that the four-lncRNA signature has prognostic value for anaplastic glioma. Moreover, clinicians should conduct corresponding therapies to achieve best treatment with less side effects for two groups patients.


INTRODUCTION
Glioma is the most common brain tumor and it has high morality and recurrence rate [1]. According to the World Health Organization (WHO) classification, glioma is classified into four grades, as which the anaplastic glioma (AG) is Grade III [2], including anaplastic astrocytoma (AA), anaplastic oligodendroglioma (AO) and anaplastic oligoastrocytoma (AOA). AGs comprise 6-10% of all primary brain tumors [3] and the median overall survival (OS) is about 37.6 months [4]. Current evidence suggests that the progression of gliomas may involve the accumulation of multiple genetic alterations, such as isocitrate dehydrogenase (IDH) mutation, codeletion of chromosomal arms 1p and 19q, methylation of O(6)methylguanine-DNA methyl transferase (MGMT) promoter and alpha thalassemia/mental retardation syndrome X-linked (ATRX) mutations or loss, et al [5]. However, due to variable OS of patients with AGs, there are still other factors that affect the prognosis for AG patients.

Research Paper www.impactjournals.com/oncotarget
Long noncoding RNA (LncRNA) is defined as longer than 200nucleotides without protein-coding ability [6]. Many studies have revealed a wide range of functional activities of lncRNAs [7,8], including chromatin remodeling, transcriptional control and posttranscriptional processing, et al. The dysregulation of lncRNAs might contribute towards glioma pathogenesis, such as cellular proliferation and apoptosis [9][10][11][12]. Aberrant expressions of lncRNAs may have prognostic value for AG patients and can be exploited as potential therapeutic targets [13,14].
In our study, we obtained GSE16011 dataset as training set while the Chinese Glioma Genome Atlas (CGGA) and the Repository for Molecular Brain Neoplasia Data (REMBRANDT) datasets as validated sets. A total of 183 (GSE 80; CGGA 36; REMBRANDT 67) AG patients were included. Using Cox regression analysis and receiver operating characteristic (ROC) curve, we identified four-lncRNA signature (AGAP2-AS1, TPT1-AS1, LINC01198 and MIR155HG) which have prognostic value for AGs. Based on the median risk score of the signature, AGs could be divided into low risk and high risk groups. The patients in low risk group had longer OS than high risk group.

Identification and validation of four-lncRNA signature from the three datasets
The lncRNAs list and their expression profiles were extracted from each microarray dataset by using lncRNA expression profile mining [15]. A total of 572 lncRNAs were identified from the three datasets. Then, we collected 80 anaplastic glioma patients from GSE16011 dataset as training set. 45 probes (33 lncRNAs) were pinpointed on univariate Cox analysis and the top 10 prognostic probes were listed in Table 1 ranked ascendingly by their p value.
By applying time-dependent ROC curve, we could get a series of area under the curve (AUC) (0.909, 0.907, 0.899, 0.942, 0.913, 0.917, 0.918) by adding genes in the list from top to bottom to the signature [16]. The maximal AUC (0.942) was achieved from the top 4 probes (4 lncRNAs) ( Figure 3A). The four lncRNAs were AGAP2-AS1, TPT1-AS1, LINC01198 and MIR155HG. Then, we developed a four-lncRNA signature using a risk score method [17][18][19]. We divided two groups (low risk and high risk groups) based on the median risk score (0.2067). The patients in low risk group had longer OS than high risk group (median OS 2208.25 vs. 591.30 days; P < 0.0001; Figure 1A). We further validated the four-lncRNA signature in two additional datasets using the same β value. It showed similar results (median OS 1432 vs. 548.5 days; P = 0.0012) (median OS 1276 vs. 465.5 days; P = 0.0005; Figure 1A) between two groups, respectively.

Low risk and high risk groups showed similar overall survival to Grade II and IV gliomas respectively
We divided the AGs into two groups based on the median risk score (0.2067). Furthermore, we assessed the prognostic value of this signature compared with Grade II and IV gliomas.
22 Grade II glioma patients and 142 Grade IV glioma patients from GSE16011 were included in our analysis. The patients in the low risk group showed similar OS to Grade II patients (P = 0.1669), while the high risk group showed significant different to Grade IV (P = 0.0005) with similar trend ( Figure 1B). We further validated the findings in CGGA and REMBRANDT datasets ( Figure 1B). The low risk group showed similar OS to Grade II (P = 0.0740; P = 0.6768) and the high risk group also showed similar OS to Grade IV (P = 0.3001; P = 0.7172), respectively. It indicated that OS of two groups

Clinical and molecular features of low and high risk AG patients
The expression levels of the four lncRNAs showed significant difference between low risk and high risk groups (Figure 2A, C). TPT1-AS1 was higher expressed in low risk group, so we considered it as a protective lncRNA. The other three lncRNAs (AGAP2-AS1, LINC01198 and MIR155HG) were higher expressed in high risk group, so we considered them as risky genes. We observed that AG patients in the high risk group had shorter OS than low risk group ( Figure 2B). The related clinical information such as gender, histology, TCGA subtype, CGGA subtype, age, IDH1 mutation and KPS were obtained from GSE16011 database. Patients in high risk group tended to display older age (>47 years), classical and mesenchymal TCGA subtype, G3 CGGA subtype and lower KPS ( Figure 2D). Moreover, we further validated in CGGA and REMBRANDT databases ( Figure 2D).
We assessed the independence of the four-lncRNA signature in the GSE16011 dataset. In the univariate cox regression analysis, the signature was significant associated with the OS (P = 0.000) along with age and KPS status. Moreover, it showed the signature was an independent prognosis factor (P = 0.000) on the multivariate analysis (Table 2). In two additional datasets, the results indicated the similar results that the risk score was an independent prognostic factor (P = 0.030; P = 0) (Supplementary Table S1).

Function exploration of the signature for prognosis of AGs
To explain the different prognosis of AGs divided by the signature, we performed SAM analysis between two groups using R software (False Discovery Rate, FDR < 0.01). The top 500 probes were selected from positive and negative group, respectively. Then the expression of 1000 probes were showed in Figure 4A using a hierarchical clustering analysis.
Moreover, we performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analysis using DAVID (The Database for Annotation, Visualization and Integrated Discovery). The results showed that selected up-regulated genes were enriched in lysosome, complement and coagulation cascades, apoptosis and p53 signaling pathway, et, al. While the down-regulated genes were enriched in MAPK signaling pathway, ribosome, Wnt signaling pathway, Notch signaling pathway and TGF-beta signaling pathway, et, al ( Figure  3B). We further performed Gene Set Enrichment Analysis (GSEA) for functional annotation [20,21], the results showed that the differential expressed genes were enriched in hypoxia, apoptosis and p53 pathway ( Figure 3C). Moreover, the important genes related to apoptosis, p53signaling pathway, MAPK signaling Overall survival among AG patients in different groups stratified by the signature and Grades (Grade II and Grade IV) in three datasets. *P < 0.05, **P < 0.01, ***P < 0.001. www.impactjournals.com/oncotarget pathway and TGF-beta signaling pathway were indicated in Figure 4B.
Finally, we explored the interaction properties with proteins of the four lncRNAs using CLIPdb (http:// clipdb.ncrnalab.org), which was developed to predict lncRNA-binding proteins using the CLIP-seq data [22]. We entered the lncRNAs into the Binding Target Search form and identified lncRNA-binding proteins. However, we failed to predict the protein which is interacted with LINC01198. The results showed in Table 3. We further The black dotted lines in the middle of each graph represent the gene signature cutoff (median risk score). C. Heat map of the expression of four lncRNAs in low risk and high risk groups. Rows represent corresponding genes, and columns indicate corresponding patients. D. Clinical or molecular pathological features in three datasets. Rows represent corresponding items (gender, histology, TCGA subtype, CGGA subtype, age, IDH1 and KPS). validated these findings using starBase V2.0 [23,24]. The results were partly similar to CLIPdb', so the lncRNAbinding proteins we predicted in Table 3 may just provide clues for further study of the four lncRNAs. It partially explained the poor OS of patients in high risk group.

The expression of the four-lncRNA signature in different grades
We assessed the expression of four lncRNAs in different grades and the results showed that AGAP2-

MIR155HG
The lncRNA-binding proteins were predicted by CLIPdb.
AS1, LINC01198 and MIR155HG were increased with tumor grade, while TPT1-AS1 was decreased in GSE16011 dataset ( Figure 5A). The results were similar in two additional datasets (CGGA, REMBRANDT) (Figure 5 B, C). We further validated these findings using quantitative real time polymerase chain reaction (qRT-PCR) in an independent group (Grade II 9, Grade III 12, Grade IV 15) ( Figure 5D). The primers were listed in Supplementary Table S2.

Knockdown AGAP2-AS1 suppresses cell proliferation, migration and invasion, while increases apoptosis cell rates in vitro
To determine the functional role of AGAP2-AS1 in glioma, we assessed the effects of knockdown of AGAP2-AS1 with siRNAs on cell proliferation, migration and invasion. Three AGAP2-AS1 specific siRNAs were listed in Supplementary Table S2. We evaluated their knockdown efficiency in LN229 and U87MG ( Figure 6A), si-1 and si-2 were found to have a higher silencing efficiency.
The results showed that knockdown of AGAP2-AS1 can inhibit the cell proliferation in LN229 and U87MG using CCK8 assay ( Figure 6B). Moreover, downregulation of AGAP2-AS1 can also suppress cell migration and invasion ( Figure 6C, D). Annexin-V staining showed that apoptosis cell rates were increased with AGAP2-AS1 knockdown in LN229 and U87MG (Figure 7).

DISCUSSION
The classification of AGs based on mRNA expression profiling has been reported preciously [17]. However, with the functions of lncRNAs exploring, there are few studies focus on the classification of AGs based on lncRNAs expression profiling.
We mined lncRNAs data from three datasets and selected GSE16011 as a training dataset. After Cox regression analysis and time-dependent ROC curve, we screened four-lncRNA signature (AGAP2-AS1, TPT1-AS1, LINC01198 and MIR155HG) for the prognosis of AGs. Moreover, we validated the finding in CGGA and CGGA dataset. C. REMBRANDT dataset. D. qRT-PCR validation. *P < 0.05, **P < 0.01, ***P < 0.001, **** P < 0.0001. www.impactjournals.com/oncotarget The cell invasion was inhibited by AGAP2-AS1 knockdown. *P < 0.05, **P < 0.01, ***P < 0.001, **** P < 0.0001. www.impactjournals.com/oncotarget REMBRANDT datasets. We divided two groups based on the median risk score which is developed by a widely used approach [17][18][19]. We observed that AG patients in the high risk group had shorter OS than low risk group. Furthermore, we assessed the prognostic value of this signature compared with Grade II and IV gliomas. It indicated that the signature can specially divide grade IIlike and grade IV-like AGs.
The WHO classification based on morphological criteria (nuclear atypia, mitoses, vascular proliferation and necrosis) [25,26]. However, the four-lncRNA signature can specially divide the AGs into two groups. Clinicians should pay more attention to the treatment of AG patients to achieve best prognosis with less side effects.
Differential expressed genes of low risk and high risk groups were performed KEGG pathways analysis and the results showed that up-regulated genes were enriched in lysosome, complement and coagulation cascades, apoptosis and p53 signaling pathway, et, al. While the down-regulated genes were enriched in MAPK signaling pathway, ribosome, Wnt signaling pathway, Notch signaling pathway and TGF-beta signaling pathway, et, al ( Figure 3B). Moreover, GSEA showed that the differential expressed genes were enriched in hypoxia, apoptosis and p53 pathway ( Figure  3C). We further explored the interaction properties with proteins of the four lncRNAs using CLIPdb and starBase V2.0 (Table 3). AGO2 can play a role of RNA interference by encoding a member of argonaute family of proteins [27] and it predictively interacted with TPT1-AS1 and MIR155HG. It is reported that AGO2 can regulate tumor invasion, proliferation, apoptosis and cell cycle in glioma, cervical and prostate cancer [28][29][30]. CPSF7 can play a regulatory role in polyA site selection [31] and it is found to be significantly associated with tumor recurrence in breast cancer recently [32]. ELAVL1 (HuR) is an RNA-binding protein and the decreased of ELAVL1 can resist cell proliferation and invasion in ovarian and prostate cancer [33,34]. FUS belongs to the FET family and it encodes a multifunctional RNA-binding protein which is highly associated with tumor progression. The overexpression of FUS can promote growth inhibition and apoptosis of prostate cancer [35,36].
We also assessed the expression of four lncRNAs in different grades and the results showed that AGAP2-AS1, LINC01198 and MIR155HG were increased with tumor grade, while TPT1-AS1 was decreased ( Figure 5). Moreover, knockdown of AGAP2-AS1 can inhibit the cell proliferation, migration and invasion, while increase the apoptosis cell rates in LN229 and U87MG ( Figure 6, and 7).
There are limitations in our manuscript. Only a total of 183 primary AGs were enrolled, which was a small sample and only a part of lncRNAs included in our analysis from microarray data. Moreover, functions of the identified four lncRNAs were only predicted by DAVID, CLIPdb and starBase V2.0, so the RNA-binding proteins we predicted in Table 3 may just provide clues for further study of the four lncRNAs. However, identified lncRNAs may exert their functions through similar mechanisms to the presumed interacted proteins.
In conclusion, our results showed that the four-lncRNA signature has prognosis value for patients with AGs. Due to the different prognosis in two groups, clinicians should conduct corresponding therapies to achieve best treatment with less side effects. Moreover, knockdown of AGAP2-AS1 can inhibit the cell proliferation, migration and invasion, while increase the apoptosis cell rates in vitro.

LncRNA profile mining
The lncRNA profile was achieved by the established mining method [15]. Genes were identified as proteincoding genes or noncoding genes based on their Refseq IDs or Ensembl IDs. GSE16011 was settled on Affymetrix HG-U133 Plus 2.0 platform, so we further filtered them by removing pseudogenes, rRNAs, tRNAs, snRNAs, snoRNAs, and other short non-coding RNAs and retained only the long noncoding genes in NetAffx Annotation files. We only retained the 572 lncRNAs that were represented in all three datasets (GSE16011, CGGA and REMBRANDT) to ensure the validity of the gene signatures.

Signature development
The risk score was developed as previously reported [17][18][19]37], based on a linear combination of the lncRNA expression level (expr) weighted by the regression coefficient (β) derived from the univariate Cox regression analysis. The risk score for each patient was calculated as follows: Risk score = β gene1 × expr gene1 + β gene2 × expr gene2 + ··· + β genen × expr genen We divided anaplastic gliomas patients into low risk and high risk groups using the median risk score as cutoff point.

Quantitative real time polymerase chain reaction
Quantitative real time polymerase chain reaction (qRT-PCR) was performed to detect the expression levels of AGAP2-AS1, TPT1-AS1, LINC01198 and MIR155HG. Total RNAs were extracted from frozen tissues or cell lines using TRIzol reagent (Invitrogen, CA, USA) and then reversely transcribed using ReventAid First Strand cDNA Synthesis Kit (Thermo, MA, USA) in accordance with the manufacturer's instructions. Glyceralde-hyde 3-phosphate dehydrogenase (GAPDH) was used as an internal control and all the primer sequences are shown in Supplementary  Table S2. qRT-PCR was performed using SYBR Select Master Mix (Applied Biosystems, CA, USA).

Cell proliferation, migration and invasion assays
Cell proliferation was assayed using Cell Counting Kit-8 (Dojindo, Japan) according to the manufacturer's instructions. Cells were seeded at a density of 2000 cells per well in 96 well plates. After AGAP2-AS1 siRNAs and negative control (NC) transfection for 24 h, cells were evaluated by measuring the absorbance at 450 nm. Cells were transfected with siRNAs targeting AGAP2-AS1 for 48 h and then about 1000 cells were plated in each well of the 12-well plate and maintained for 2 weeks to form colony.
Cell migration assay were performed by using Transwell insert chambers (8μm pore size, Corning, USA). About 2 x 10 4 cells were seeded into the upper chamber in serum free medium in triplicate. The lower chamber was filled with 600μl medium containing 10% fetal bovine serum (FBS). After incubation for 4 h, cells migrating to the lower surface of membrane were fixed using paraformalclehyde and stained with 0.1% crystal violet. For invasion assay, Matrigel Invasion Chambers in the 24-well plates were used.

Flow cytometry analysis for apoptosis
For apoptosis assay, cells were harvested after transfection with AGAP2-AS1 siRNAs for 36 h, and then processed to stain with Annexin V-FITC/Propidium Iodide kit (Beijing 4A Biotech Co., Ltd, China) according to the manufacturer's instructions. The flow cytometry was performed by ImageStreamX Mark II instrument and analyzed with IDEA software.

Statistical analysis
We firstly excluded patients without survival data or ≤ 30 days because they may die of other reasons. Then we performed Cox analysis and the probes were ranked ascendingly by their p value. By applying time-dependent ROC curve, we could get a series of area under the curve (AUC) by adding genes in the list from top to bottom to the signature. www.impactjournals.com/oncotarget The significance analysis of microarray (SAM) and Cox regression analysis was calculated using R software (version 3.2.3) with the samr and survival packages. The univariate, multivariate cox regression analysis and ROC curve were performed by SPSS software (version 22; SPSS Inc., Chicago, IL, USA). The Kaplan-Meier curve was performed by GraphPad Prism 6 (GraphPad Software Inc., La Jolla, CA, USA). A two-sided P value of < 0.05 was regarded as statistically significant.