Systemic analysis of lncRNA and miRNA expression profiles with associated ceRNA network in glioblastoma

Glioblastoma (GBM) is the most common malignant brain tumor with high morbidity and mortality. Long and small non-coding RNA, i.e. lncRNA and miRNA in this study, play important roles in progression of complex diseases and competes with each other when regulate expression of mRNA. In this study, we conducted systemic analysis of high-throughput gene sequencing and microarray datasets to explore expression regulation patterns in GBM. Differential expression mRNAs, lncRNAs and miRNAs were obtained through comprehensively comparinged their expression profiles between GBM and adjacent non-tumor tissues. As a result, a total of 28 lncRNAs were found to be aberrantly expressed (|log2 (fold change)| > 1, corrected p-value < 0.01) and six out of them were significantly associated with GBM prognosis. The expression changes of the six lncRNAs were further validated through quantative PCR analysis. Finally, lncRNA mediated competition endogenous (ceRNA) network was constructed, which should be helpful for the understanding of lncRNA mediated ceRNA network and identification of prognosis biomarker for GBM.


INTRODUCTION
Despite grave progress of glioblastoma (GBM) care and treatment, its prognosis is still poor and large variability in patients' outcome is observed.In United States, the overall survival of untreated GBM diagnosed from 2000-2003 was only 8.1 months and it was still lower than one year even if take those underwent surgery and radiation into account [1].What's more, lots of novel methods only benefit for specific patients, such as the adjuvant temozolomide with radiotherapy could only affect GBM patients with specific MGMT methylation status compared with those radiotherapy only [2].It has been widely accepted that GBM is a genetic-driven disease and influenced by variations of multi genes [3].While, its mechanisms are still largely unknown.In this case, it is urgent to identify potential prognosis biomarkers for GBM.
Long non-coding RNAs (lncRNAs) are non-coding RNA first described in high-throughput sequencing of full-length cDNA [4] with length longer than 200 nucleotides.They were taken as transcriptional noise for their lacking of translation potential.Recent studies have revealed their roles in carcinogenesis and cancer metastasis [5,6] and more and more attentions have been paid on them.One of the most attractive and enigmatic points is how lncRNAs regulate gene expression.The most accepted mechanism of lncRNAs' gene regulation function is chromatin modification, i.e. mediates epigenetic changes by recruiting chromatin remodeling complexes to specific genomic loci [7].Besides, lncRNAs could act as microRNA (miRNA) sponge to reduce miRNA regulatory effect on mRNA [8], and thus up-regulate expression of miRNA target genes.Although the major mechanism of lncRNAs is the regulation of expression of neighboring genes, lncRNAs could also serve as scaffold for proteinprotein interactions or decoys to proteins [9].Additionally, lncRNAs can regulate kinase functions.LncRNA NBR2 engages a metabolic checkpoint by regulating AMP-activated protein kinase (AMPK) under energy stress [10].Accumulating evidence suggests that lncRNAs play a role in fundamental biological functions, and dysregulation of lncRNAs contributes to cancer development, progression, and metastasis in many malignancies [11].MiRNA, another type of well-studied non-coding RNA, could repress gene expression through binding complementary sequences in mRNAs, which known as miRNA response elements (MRE) [12].A lot of microRNA signatures have been reported and there is an increasing possibility of using microRNAs as novel biomarkers and tools of treatment for tumor [13].Previous studies have proposed and validated the hypothesis of competing endogenous RNA (ceRNA), i.e. lncRNAs, mRNAs and other types of RNAs act as miRNA sponges to suppress the function of miRNAs by using shared MRE [14,15].So, lncRNA could repress the function of miRNA, and miRNA could affect mRNA expression.And thus, the three types of RNA form the lncRNA-miRNA-m RNA ceRNA network, which should play important roles in many types of diseases.Liang and collages [16] illustrated that lncRNA H19 promotes epithelial to mesenchymal (ETM) transition by acting as miRNAs sponge in colorectal cancer.Endogenous miRNA sponge lincRNA-RoR regulates Oct4, Nanog, and Sox2 in human embryonic stem cell self-renewal [17].Zhang et al constructed lncRNA-miRNA-mRNA interaction network to explore breast cancer progression based on bioinformatics analysis [18].So, we believe that exploration of interaction between lncRNAs, miRNAs, and mRNAs should be helpful for the identification of prognosis biomarkers for cancer.
In this study, we conducted systemic analysis of high-throughput RNA sequencing and microarray datasets of GBM.Differential expression lncRNAs (DEL), miRNAs (DEMI), and mRNAs (DEM) were identified in 156 GBM samples compared with adjacent normal tissues.As a result, 28 DELs, 47 DEMIs, and 3,329 DEMs were identified.Eventually, 1 miRNA, 28 lncRNAs, and 2 mRNAs were used for the construction of lncRNA-miRNA-mRNA ceRNA network and 6 lncRNAs were found to be significantly associated with prognosis of GBM.Our study should be helpful for the understanding of lncRNA-miRNA-mRNA regulation patterns and the therapy of GBM.

Differential expression mRNAs
Principle component analysis (PCA) through prcomp command in R obtained two main sample clusters, i.e.GBM samples and adjacent normal brain samples (Figure 1A).Consistent with PCA result, there are 3357 DEM (1,489 (44.36%) down-regulated and 1,816 (55.64%) up-regulated genes) were identified in PG compared with NG samples, while, only 124 mRNAs (21 down-regulated (16.94%) and 103 (83.06%) up-regulated genes) were found to be differential expression in RG compared with PG samples.Figure 2B and 2C illustrated read count (log2 scale) mapped to each gene in PG samples versus NG samples and RG samples versus PG samples respectively.

Enriched KEGG pathways
Figure 2A left and 2B left show the heatmap of top 100 down-regulated and 100 up-regulated DEM in PG and all of the DEM in RG.A total of 30 and 5 KEGG pathways were found to be significantly enriched in DEM of PG and RG. Figure 2A right and 2B right illustrate the pathway map of the most significantly enriched pathway of PG and RG respectively visualized by pathview package [26].The top 10 most significantly enriched pathways of PG are listed in Table 1.

Differential expression lncRNAs
Based on the criteria of |log2(fold change)| > 1 and FDR adjusted p-value < 0.05, a total of 28 lncRNAs (Table 2) that with annotation in GENCODE lncRNA version26 were found to be differential expression in PG compared with NG.While, there is no DEL in RG samples compared with PG samples.So, the following analysis are mainly focused on primary GBM and adjacent normal brain samples.

Differential expression miRNAs
A total of 47 miRNAs (31 (65.96%) down-regulated and 16 (34.04)up-regulated genes) were found to be significantly differential expression in GBM samples compared with adjacent normal brain samples.Figure 3A illustrate miRNA levels in GBM samples versus adjacent normal brain tissues.Figure 3B is the two-way cluster of DEMI and samples visualized by pheatmap package.As our expected, two main clusters, i.e.GBM and normal brain tissues clusters, were obtained.
Besides, to obtain more comprehensive miRNA-mRNA regulation patterns in GBM, we searched miRNA-mRNA pairs from TargetScan database containing DEMI and DEM.As a result, we obtained a total of 590 miRNA-mRNA regulation pairs containing 11 DEMIs and 442 DEMs.The full list of miRNA-mRNA regulation pairs were provided in Supplementary Table 1.

qPCR analysis
For lncRNAs significantly associated with GBM survival, we further validated their expression changes in U251 and normal cell lines through qPCR analysis.As a      result, the results of all of the six lncRNAs were consistent with the microarray analysis, i.e. significantly up-regulated in GBM cell lines compared with normal cells (Figure 5).

DISCUSSION
More and more evidences indicate that lncRNAs play important roles in the progression of human cancer.Several studies have shown the association between aberrant lncRNA expression and different types of cancers [27,28].In recent years, some efforts have been made to uncover dysregulated lncRNAs in GBM through gene microarray or RNA-sequencing [29][30][31].However, no study has focused on the ceRNA network which reflecting lncRNA-miRNA-mRNA interaction or regulation pairs for GBM.
In previous studies, lncRNAs were considered as junk RNAs for their lack of translation potential.While, recently studies identified several lncRNAs as valuable diagnosis and prognosis biomarkers in cancers and other types of complex diseases, such as lncRNA-p53, H19, MALAT1 [16,32,33].In this study, besides differential expression analysis of lncRNA, miRNA and mRNA, we also drew our attention to the prognosis significance of DEL.As a result, 6 out of the 28 DEL were found to be significantly associated with the survival of GBM, i.e.RP11-273G15.2,RP3-439F8.1,RP11-286B14.1,RP11-161H23.9,LINC01579 and LY86-AS1.As to LY86-AS1, previous studies have proved its potential in several types of cancer, including myeloid cancer, colorectal cancer [34][35][36].Our study confirmed the novel function of LY86-AS1 in GBM.While, the other 5 DEL were firstly determined as cancer prognosis biomarkers, and further studies are still needed to confirm their roles in GBM.
MiRNA plays an important role in tumor immune response.For example, miR-9 is overexpressed in several types of malignancies, including lung cancer, which resulting in down-regulation of MHC class I and preventing cancer cells detection through the immune system [37].MiR-139-5p is the only miRNA in ceRNA Figure 5: qPCR analysis of the six lncRNAs significantly associated with GBM survival.www.impactjournals.com/oncotargetnetwork, which have been proved to associated with the progression of multi human cancers, such as breast cancer, liver cancer, as well as esophageal cancer [38][39][40].Dysregulated calcium signaling is a classical cancer hallmark and provides chance for cancer diagnosis and therapy [41][42][43].Calcium signaling pathway is the most significant pathway of DEM in PG compared with adjacent normal brain tissues and miR-139-5p was proved to be associated calcium metabolism by previous studies [38,44,45].Consistent with our study, several molecular biology and bioinformatics analysis also proved the association between miR-139-5p and GBM [46][47][48].
CXCR4 and NR5A2 are the two DEM that regulated by miR-139-5p in ceRNA network.CXCR4 and NR5A2 are all reported to be associated with the progression of human cancers [49][50][51][52].Besides, CXCR4 has been proved to be related to GBM [53,54].As to NR5A2, it encodes a DNA-binding zinc finger transcription factor protein and involved in the expression of genes for hepatitis B virus and cholesterol biosynthesis.It inhibit pancreatic cancer stem cell properties [55] and associated with pancreatic cancer risk [56].While, no study has focused on its role in GBM, our study should provide novel evidence of potential of NR5A2 on the survival of GBM.
In conclusions, our study presents overall ceRNA network pattern in GBM, and provides several known and novel diagnosis and prognosis biomarkers.This should be helpful for the treatment of GBM.

TCGA and GEO datasets
RNA sequencing datasets were obtained from TCGA which contained 156 primary GBM samples, 13 recurrent GBM samples and 5 adjacent normal brain tissues.MiRNA expression profiles were downloaded from GEO with the accession number of GSE65626.A total of 3 GBM and 3 adjacent normal brain tissues were used for the total RNA extraction and miRNA/mRNA profiling based on commercial affymetrix mRNA and miRNA microarray platform.

Differential expression analysis
The RNA sequencing data were classified into 3 cohorts, i.e. primary GBM (PG), recurrent GBM (RG) and adjacent normal brain tissues (NG).The single expression data was combined and filtered RNA and miRNA data with no expression values.The raw count data were normalized based on DESeq version2 [19], a free available R Bionconductor package, and differential expression RNAs in PG compared with NG, and in RG compared with PG, were identified using DESeq command by instructions of DESeq manual.MiRNA microarray datasets were firstly preprocessed by affy package [20] of R, mainly including background correction and normalization.Linear Models for Microarray and RNA data (limma) package [21] was used for the identification of DEMI in GBM samples compared with adjacent normal brain tissues.Genes with absolute fold change (log2 scale) > 1 and FDR adjusted p-value < 0.05 were considered as significant.

Pathway enrichment analysis
To uncover the biological processes involved in the progression of GBM, we conducted functional enrichment analysis for DEM in PG and RG in the Database for Annotation, Visualization and Integrated Discovery (DAVID) [22].KEGG pathway with p-value < 0.05 were considered as significantly enriched in DEM.

Identification of differential expression lncRNAs
To obtain DEL from the total differential expression RNAs, we intersected the GENCODE lncRNA annotation version26 [23] with our result.Non-coding RNAs not included in the annotation were excluded.

Construction of ceRNA network
LncRNA-miRNA interactions and miRNA targeted mRNAs were obtained through miRcode [24] and miRTarBase [25] respectively.For miRNA-mRNA pairs, only those validated by at least two of the following methods, i.e. qRT-PCR, western blot, microarray and ChIP-seq, luciferase reporter assay, were retained.
To obtain more comprehensive landscape of miRNA-mRNA regulation pairs, we further explore miRNA-mRNA pairs from TargetScan database (http:// www.targetscan.org/vert_71/).The miRNA-mRNA network containing DEMI and DEM was obtained.Supplementary Figure 1 illustrates the distribution of miRNA-mRNA regulation number that validated by different experimental methods number.

Statistical analysis
All of the statistical analysis, including differential expression test and survival analysis, were conducted based on R version3.2.2 programming software.Differences of expressions of qPCR analysis were tested through two tailed t-test with p-value < 0.01 as significance.

Figure 1 :
Figure 1: Analysis overview of RNA expression profile from TCGA.(A).Principle component analysis (PCA) of GBM and adjacent normal brain tissue samples.(B) and (C) is the scatter plot of relative read count (log2 scale) mapped to every gene in PG versus NG and RG versus PG respectively.

Figure 2 :
Figure 2: KEGG pathway analysis of DEM in PG and RG.(A).Heatmap of the top 100 down-regulated and 100 up-regulated genes in PG samples (left).The right panel represents the most significant KEGG pathway of PG DEM.(B).Heatmap of DEM in RG samples (left).The right panel represents the most significant KEGG pathway of RG DEM.Green, red and white box in right panel of (A) and (B) represents down-regulated, up-regulated and non-differential expression genes respectively.

Figure 3 :
Figure 3: Analysis overview of miRNA expression profiles.(A).Scatter plot of relative miRNA level (log2 scale) of GBM sample versus adjacent normal brain tissues.(B).Two-way cluster of DEMI and samples.X-axis and Y-axis represents DEMI and samples respectively.

Figure 4 :
Figure 4: CeRNA network and survival analysis.(A).CeRNA network.Diamonds, circle and triangles represent lncRNAs, miRNA and mRNAs respectively.Green and red color is down-regulated and up-regulated genes.(B).Kaplan meier plots of six lncRNAs that significantly associated with the survival of GBM. Green and red line is the GBM samples with lower and higher (based on the median of all GBM expression values) gene expression values respectively.