COL3A1 and SNAP91: novel glioblastoma markers with diagnostic and prognostic value

Although patients with glioblastoma (GBM) have grave prognosis, significant variability in patient outcome is observed. This study aims to identify novel targets for GBM diagnosis and therapy. Microarray data (GSE4290, GSE7696, and GSE4412) obtained from the Gene Expression Omnibus was used to identify the differentially expressed genes (DEGs) by significant analysis of microarray (SAM). Intersection of the identified DEGs for each profile revealed 46 DEGs in GBM. A subset of common DEGs were validated by real-time reverse transcription quantitative PCR (qPCR). The prognostic value of some of the markers was also studied. We determined that RRM2 and COL3A1 were increased and directly correlated with glioma grade, while SH3GL2 and SNAP91 were decreased in GBM and inversely correlated with glioma grade. Kaplan-Meir analysis of GSE7696 revealed that COL3A1 and SNAP91 correlated with survival, suggesting that COL3A1 and SNAP91 may be suitable biomarkers for diagnostic or therapeutic strategies for GBM.


INTRODUCTION
Glioma is the most common malignant primary brain tumor in humans, occurring in 6 of every 100,000 people. With a five-year survival rate of 20-30%, it is also one of the most aggressive [1]. These tumors are composed of a heterogeneous population of cells that contributes to the resiliency of the disease [2]. Generally speaking, gliomas are classified as either relatively slow growing low-grade (I or II) tumors or rapidly growing, highly metastatic high-grade (III or IV) tumors [3]. Overall, the disease is a fast-progressing fatal malignancy and the majority of patients with high-grade gliomas (III or IV) suffer from a poor quality of life [4,5]. Currently, the standard clinical treatment is surgical resection followed by radiotherapy and chemotherapy [6][7][8][9]. However, patients who receive these treatments may develop resistance to chemotherapy [10]. Thus, recent efforts have focused on identification of candidate biomarkers of glioblastoma (GBM; grade IV glioma) development for early detection and to produce more effective therapeutic strategies [11][12][13].
At present, diagnosis of GBMs is mainly based on histological detection. Although the importance of numerous genes, including EGFR, bFGF, VEGF, IGF-1, p53 and p16 [14][15][16][17][18][19] to glioma progression has been well established. These genes are neither predictive of survival of glioma patients nor able to guide therapeutic decisions. With the continuous development of biotechnology and the innovation of novel high-throughput technology, studies have begun to investigate diseases at the genome level, and gene chip technology has become more common. To date, microarray analysis has been successfully used to identify unknown glioma-associated oncogenes [20], and analyze gene expression within different biological networks [21][22][23].
In the present study, we analyzed three microarray gene expression profiles to examine changes in gene expression associated with glioma progression and identify novel targets for glioblastoma diagnosis and therapy. We identified 46 differentially expressed genes in GBM that were common among all three profiles. Differential expression of a subset of differentially expressed genes

Research Paper
(DEGs) was validated by real-time quantitative reverse transcription PCR (qPCR). We determined that RRM2 and COL3A1 were up-regulated and directly correlated with glioma grade, while SH3GL2 and SNAP91 were downregulated in GBM and inversely correlated with glioma grade. Kaplan-Meir analysis of GSE7696 revealed that COL3A1 and SNAP91 correlated with survival, suggesting that COL3A1 and SNAP91 may be suitable biomarkers for diagnostic or therapeutic strategies for GBM.

Identification of DEGs
Three gene expression profiles (GSE4290, GSE7696 and GSE4412) of non-tumor, low grade glioma and high grade glioma tissue samples were analyzed to identify genes differentially expressed during tissue progression. A total of 1,183 genes (343 up-regulated and 840 down-regulated genes) between normal and tumor tissues in GSE4290, 1,787 genes (821 up-regulated and 966 down-regulated genes) between normal and GBM tissues in GSE7696, and 138 genes (110 up-regulated and 28 down-regulated genes) between grade IV and III grade glioma samples in GSE4412 were filtered as differentially expressed genes (DEGs) ( Figure 1A-1C, Supplementary Table  S1). Intersection of the DEGs identified a total of 46 common DEGs, suggesting that these DEGs may play an important role during glioma progression ( Figure 1D, Table 1). Moreover, the results of cluster analysis are showed in Supplementary Figure S1.

Function enrichment of DEGs
To identify the functional differences of DEGs in different glioma grades, gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed. Enrichment analysis suggested that DEGs function differed between different glioma grades. DEGs in GBM tissue compared to non-tumor tissue were mainly involved in the regulation of the extracellular matrix (ECM) ( Table 2), an environment that is essential to tumor development and maintenance [24,25].

Independent validation of GBM DEGs
A subset of DEGs in GBM samples compared to non-tumor samples ( Table 1) was further validated by real-time quantitative reverse transcription-PCR (qPCR). It's worth noting that there were four genes specifically expressed among DEGs. We found that expression of up-regulated DEGs RRM2 and COL3A1 (>2-fold up-regulated as determined by the gene expression profile) was significantly higher in malignant gliomas compared to lower grade gliomas and non-tumor brain tissue and directly correlated with glioma grade (Figure 2A & 2B). Expression of down-regulated DEGs SH3GL2 and SNAP91, which were found to be >2-fold lower in GBM samples by microarray analysis, was lower in malignant gliomas compared to non-tumor tissue and lower grade gliomas. Expression of SH3GL2 and SNAP91 also inversely correlated with glioma grade ( Figure 2C & 2D). Additionally, the association between gene expression and clinicopathological parameters was analyzed in the present study. No significant association was observed between the expression of validated DEGs and patient age or gender (Table 3, Supplementary Table S2). Gene expression of the validate DEGs was consistent with the TCGA datasets ( Figure 3). These results suggest that expression of DEGs correlates with glioma tumor grade.

DISCUSSION
Recently, microarray-based expression profiling studies have revealed that genes differentially expressed among glioma grades could be of prognostic value [26]. Therefore, identification of genes differentially expressed in GBMs compared to lower grade gliomas could greatly facilitate prognostication and our ability to develop effective treatment protocols [27][28][29]. In order to identify potential biomarkers for glioblastoma prognosis and therapy, we used microarray data (GSE4290, GSE7696, and GSE4412) to identify the differentially expressed genes (DEGs) by significant analysis of microarray (SAM). 46 DEGS were found to be in common in all profiles. A subset of the DEGs identified in GBM-COL3A1, SNAP91, RRM2, and SH3GL2-were also validated by real-time reverse transcription quantitative PCR (qPCR).
A potential prognostic factor for GBM is COL3A1. We found that COL3A1 was up-regulated in GBM tissues. COL3A1 encodes a fibrillary collagen molecule that has been linked to myocardial infarction [30] and the risk of stroke recurrence and prognosis in Chinese patients [31]. PDGFRβ was found to significantly correlate with the reference COL1A1, COL1A2, and COL3A1 expression [32]. Another potential target for GBM therapies and diagnosis is SNAP91. We determined that SNAP91 was down-regulated in GMB tissues. SNAP91 encodes a synapse-associated protein that is expressed highest in the brain [33].
We also determined that RRM2 and SH3GL2 are differentially expressed in GBM and could potentially serve as GBM biomarkers. RRM2 codes an enzyme that's overexpression has been correlated with resistance to radiotherapy and chemotherapy, and enhanced malignant potential in multiple cancers [34][35][36]. SH3GL2 is a multifunctional gene that encodes a protein called endophilin-1. Endophilin-1 is primarily distributed in the central nervous system and functions as a tumor suppressor in many tumors. SH3GL2 has been shown to be decreased by miR-330 and associated with ERK and PI3K/AKT signaling pathways [37,38]. We also found that while RRM2 and SH3GL2 were differentially expressed in GBM, their expression had no prognostic value for GBM, suggesting that they are tumor-specific, but not GBM specific. Our results are in line with previous studies that showed that RRM2 was tumor-specific, but not associated with the survival in GBM [39].
Microarray datasets not only renders the analysis of large quantities of biological information quick and simple, but also facilitates the identification of potential biomarkers for various diseases and medical conditions. In this study, we have identified and validated a set of novel GBM biomarker candidate genes. We also provided evidence of the prognostic value of two of these potential markers, COL3A1 and SNAP91. However, further studies are needed to more precisely characterize the functional significance of these genes in glioma progression and their potential prognosis application for glioma needs to receive more attention.

Affymetrix microarray analysis
Three expression profiles (GSE4290, GSE7696, and GSE4412) were acquired from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) database. The platform of GSE4290 and GSE7696 is GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array. The platform of GSE4412 is GPL96 [HG-U133A] Affymetrix Human Genome U133A Array. In GSE4290 data, 23 brain tissue samples from epilepsy patients as nontumor (control) samples and 157 glioma samples, including astrocytoma, oligodendroglioma and glioblastoma samples (45 grade II, 31 grade III and 81grade IV) were used. Total of 80 glioblastoma specimen and 4 non-tumor brain samples were analysis in GSE7696. Only grade III (n = 24) and IV (n = 50) gliomas were included in GSE4412. The original CEL files and probe annotation of the platform was used.

Data pre-processing
The probe-level data in CEL files were converted into expression profiles. Background correction and quartile data normalization were extracted by the robust multi-array average (RMA) with affy package. For genes corresponding to multiple probe sets, the gene expression values of those sets were averaged [40,41].

Differentially expressed genes (DEGs) analysis
Significant analysis of microarray (SAM) using two classes (tumor vs normal) of unpaired measurements was performed for DEGs identification in GSE4290 and GSE7696. GSE4412 only analyzed the DEGs between grade III and grade IV samples. Genes with a |log fold change (FC)| ≥ 2 and p < 0.05 were considered differentially expressed. The results from each analysis were intersected to identify common DEGs that may play a significant role in tumor progression.

Gene ontology (GO) enrichment analysis
For preliminary investigation into the functional differences of DEGs in glioblastoma (GBM), Database for Annotation Visualization and Integrated Discovery (DAVID) online software (DAVID Bioinformatics Resources 6.7) was used to perform gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes(KEGG) enrichment analysis. DAVID utilizes Fisher's exact test to enrich functions of certain genes [42].

Patients and tissue samples
All patients undergoing surgical treatment in the Hunan cancer hospital (Changsha, Hunan, China) for primary brain cancers between 2007 and 2013 were invited to participate in this institutional review boardapproved study. This study compiled data for glioma tumor stages I through IV and normal brain controls. Tumors were histopathologically classified according to the WHO classification. The tissue samples were flash frozen in liquid nitrogen immediately after resection and stored at −80 °C for future processing.

RNA extraction and real-time quantitative PCR
Total RNA was extracted by trizol reagent according to the manufacturer's protocol. When the A260/A280 ratio was between 1.9 and 2.1, the extracted RNA was determined to be pure and was used in subsequent experiments. 2µg RNA was reverse-transcribed using the Primescript RT reagent Kit with gDNA Eraser (Takara Bio Inc, Japan). Real-time PCR was performed using the SYBR Premix DimerEraser kit (Takara Bio Inc, Japan). The reactions were cycled 40 times [95°C, 30 seconds, (95°C, 5 seconds; 55°C, 30 seconds; and 72°C, 30 seconds)] with fluorescence measurements. A melting curve was performed at the end of amplification cycles to verify the specificity of the PCR products. All the determinations were performed in duplicate. Primers used for real-time PCR are shown in Table 4. The relative expression of target gene mRNA was normalized to the expression level of GAPDH mRNA using the 2 −ΔCt method.

Statistical analysis
SPSS16.0 software was used for general statistical analyses. Comparisons between two experimental groups were performed using Student's t test. Survival rate was calculated using the Kaplan-Meier method with the logrank test applied for comparison. All tests performed were two-sided and the criterion for statistical significance was taken as p < 0.05.