The transcriptome and miRNome profiling of glioblastoma tissues and peritumoral regions highlights molecular pathways shared by tumors and surrounding areas and reveals differences between short-term and long-term survivors.

Glioblastoma multiforme (GBM) is the most common and deadliest primary brain tumor, driving patients to death within 15 months after diagnosis (short term survivors, ST), with the exception of a small fraction of patients (long term survivors, LT) surviving longer than 36 months. Here we present deep sequencing data showing that peritumoral (P) areas differ from healthy white matter, but share with their respective frankly tumoral (C) samples, a number of mRNAs and microRNAs representative of extracellular matrix remodeling, TGFβ and signaling, of the involvement of cell types different from tumor cells but contributing to tumor growth, such as microglia or reactive astrocytes. Moreover, we provide evidence about RNAs differentially expressed in ST vs LT samples, suggesting the contribution of TGF-β signaling in this distinction too. We also show that the edited form of miR-376c-3p is reduced in C vs P samples and in ST tumors compared to LT ones. As a whole, our study provides new insights into the still puzzling distinction between ST and LT tumors, and sheds new light onto that "grey" zone represented by the area surrounding the tumor, which we show to be characterized by the expression of several molecules shared with the proper tumor mass.


INTRODUCTION
Glioblastoma multiforme (GBM; World Health Organization [WHO] grade IV) is the most common and most malignant primary tumor of the brain. Excessive proliferation, diffuse infiltration into surrounding brain tissue and suppression of antitumor immune surveillance contribute to the malignant phenotype of glioblastomas. www.impactjournals.com/oncotarget Despite multimodal aggressive treatment, the prognosis of GBM patients is poor and the median survival time after diagnosis is still in the range of just 12 months [1], or even shorter [2]. An interesting exception to this rule is a small fraction (less than 10%) of GBM patients, referred to as long-term survivors (LTS), who survive longer than 36 months. Several attempts have been made to unravel the molecular differences at the basis of such a clinically relevant issue, addressing either the genomic aspects [3] or the mutation or methylation state of specific genes [4,5]. Some studies also performed genomewide gene expression analyses by using the microarray technology, resulting in the discovery of differentially expressed genes [6]. Nonetheless, we are still far from the comprehension of the reasons leading a glioblastoma patient to survive significantly longer than another one.
Targeted therapies have been introduced for GBM, based on information obtained from molecular studies of the tumor tissue [7]. Nevertheless, no clear survival benefit has been demonstrated, probably because tumor tissue represents the last step of tumorigenesis involving a number of alterations allowing tumor cells to survive. Since recurrence occurs in peritumoral tissue in about 95% of patients [8], getting a deeper insight into the biology of the peripheral areas immediately surrounding tumors is of great interest, as it may unveil molecular alterations representing the first signs of the future malignant evolution to glioblastoma. In this regard, only a few works have addressed the question of which molecular alterations affect the brain area surrounding the tumor [9,10]. These works have indicated that peritumoral areas are diverse and composed of several different cellular components, such as microglia, reactive astrocytes, T lymphocytes, and others [11]; at the same time, they have started showing that the peritumoral areas are deeply interested by molecular and metabolic changes rendering them very different from normal brain and much closer to transformed cells, or at least necessary to sustain the growth of the frankly tumor cells.
In this study, we report a broad analysis of central tumor samples, from both long term survivors (LT) and short term ones (ST), integrated by the same analysis performed on peritumoral areas from the same patients. We provide data from gene expression (SAGE) analysis and from microRNA deep sequencing that, as a whole, allow not only to draw a novel comprehensive picture of LT vs ST tumors, but also to shed light into the critical, tumor-supporting peritumor areas.

Gene expression (SAGE) analysis of GBM centers and peritumor areas reveals RNA molecules differentially expressed in all tumor centers vs their own peritumorareas
We collected frankly tumoral areas (C) as well as peritumoral areas from 4 long term (LT: survival longer than 36 months) and 9 short term (ST: survival shorter than 36 months) patients diagnosed with primary glioblastoma (Table 1a). The peritumoral samples (P) were collected at an average distance of 2 cm from the border of the enhanced tumor, and did not show any evidence of tumor presence at macroscopic evaluation performed by the surgeon. We submitted total RNAs extracted from these samples to SAGE profiling, and, for each C and P sample in our SAGE dataset, we determined the GBM subtype by calculating the single sample Gene Set Enrichment Analysis [12] for each SAGE profile relative to the classified gene lists from Verhaak et al. [13]. Single-sample GSEA (ssGSEA), an extension of Gene Set Enrichment Analysis (GSEA), calculates separate enrichment scores for each pairing of a sample and gene set. Each ssGSEA enrichment score represents the degree to which the genes in a particular gene set are coordinately up-or down-regulated within a sample. C samples were largely heterogeneous, showing correlation with neural (5/13 samples), mesenchymal (4/13 samples), classical (2/13 samples), or proneural (2/13 samples) subtypes. In contrast, the majority (8/13) of samples from P regions showed the highest correlation with the neural subtype. No specific enrichment in any subtype was observed when comparing the ST with the LT samples (Table 1b).
In order to investigate which RNA molecules can distinguish frank tumor areas (Cs) from the corresponding P regions, we analyzed our SAGE data for differentially expressed molecules. Filtering these results for those RNAs whose differential expression was characterized by a FDR < 0.05, yielded the transcripts listed in Suppl. Tables 1a and 1b. Among RNAs overexpressed in tumor centers vs peritumoral areas, only four were shared by STCs and LTCs (compare Suppl. Table 1a and 1b), while other molecules were specific of either STCs or LTCs. For example, STCs exhibited, compared to STPs, the overexpression of several markers of the mesenchymal subtype of glioblastoma [13], such as COL1A1, COL1A2, COL5A1, IGFBP6, DCBLD2, many of which are also typically expressed by microglia or reactive astrocytes in glioblastoma microenvironment [10]. Conversely, RNAs overexpressed in LTCs vs LTPs showed an enrichment in markers of the neural subtype of glioblastoma, such as BASP1, CDC42, and SH3GL2 [13].
We then asked a slightly different question, that is which RNAs are differentially expressed in each C vs P pair, so that their expression can distinguish each C sample when compared to its own P area. To this aim, we employed a ReliefF algorithm [14] in conjunction with leave-one-out cross validation to compute the average (over all validation folds) merit of every RNA molecule for the binary classification problem. Figures 1 and 2 show the heatmaps drawn based on average merit ranking for LT and ST comparisons, respectively. Among the 50 top ranking RNAs, both over-and under-expressed molecules (C vs P) were found in almost equal proportions. M 70 19 Yes 30% X X  Interestingly, almost no RNAs were shared by LT C/P and ST C/P comparisons, with the exception of one noncoding isoform of ELOVL fatty acid elongase 1, which was more expressed in both STPs and LTPs when compared to their own tumor centers; however, amongst the mRNAs found to be overexpressed in both LTCs and STCs with respect to their P regions, several RNAs were seen to be involved in the same processes, such as those encoding for two nuclear pore complex proteins, RANBP2 and RANBP9 (overexpressed in LTCs and STCs) respectively, or UBE3C and UBR1, both members of the ubiquitin protein ligase E3 complex (overexpressed in STCs and LTCs) or TMX4 and PRDX2, involved in cell redox homeostasis (overexpressed in the majority of STCs and LTCs). Among RNAs differentially expressed in the ST group, but not in the LT one, we found the prolylendopeptidase (PREP), highly expressed in each STC compared to its own STP. The enzyme encoded by this gene is a serine protease working at the digestion of ECM by fibroblasts upon fibroblast activation, in all those processes mediated by cancer associated fibroblasts and supporting tumor development [15], such as tissue remodeling, angiogenesis [16], and immune tolerance [17]. Another interesting example is ARHGAP29, encoding for Rho-GTPAse Activating Protein 29 (overexpressed in STCs vs STPs but not in LTCs vs LTPs): this mRNA is part of a previously described "mesenchymal" signature of glioblastoma [13], also found to be expressed by reactive astrocytes and microglia [10]. On the LT side, we found a consistent overexpression of neuritin 1 (NRN1), a neurotrophin promoting neuronal migration [18] and induced by hypoxia [19] (in all LTCs vs their respective LTPs). Two mRNAs whose expression characterizes LTCs vs LTPs, LRFN3 (encoding for Leucine Rich repeat and Fibronectin type III domain containing 3) and NR2F6 (Nuclear Receptor subfamily 2 group F member 6), were previously included in the "classical" signature of glioblastoma [13]. Taken together, these findings indicate that GBM centers show a different expression profile compared to peritumor areas, and that the mRNAs whose expression is relevant to distinguish C samples from P samples are different when comparing ST with LT patients.

All C and P samples, from both ST and LT patients, show a shared perturbed expression of a number of genes in comparison with healthy white matter
Even if C and P samples are obviously differentiated by the expression of many genes, one of the most important issues we wanted to address was to decipher the molecular signature, if any, shared by C and P samples when compared to normal controls of healthy white matter. To this aim, we compared our SAGE data from GBM patients to the RNA expression profile obtained through the same method on a healthy white matter sample.
We found a limited number of genes (i.e. 11, Table 2a, p value < 0.05) underexpressed in both Cs and Ps of ST and LT patients, among which CNP and ENPP2 (encoding for 2′, 3′-cyclic nucleotide 3′ phosphodiesterase and for ectonucleotidepyrophosphatase/phosphodiesterase 2, respectively), both oligodendrocyte progenitor cell markers [20], and EDIL3, expressed by oligodendrocytes [20] and a marker of the "neural" subgroup of glioblastoma [13]. The overexpressed RNA molecules were slightly more (i.e. 21, Table 2b, p value < 0.05) than the underexpressed ones, and included microvascular, proliferation and survival markers of GBM, such as COL4A1 and TOP2A [21,22], ECM molecules, as NID1 [23], markers of activated microglia, as CXCL14 [24], and RNAs characterizing the "classical" glioblastoma subtype, such as RGS12 and SOCS2 [13], also expressed in activated astrocytes [25], or even the "mesenchymal" subtype, such as TGFBI [13]. Notably, some of the overexpressed RNAs showed a comparable level of increase in both Cs and Ps compared to healthy white matter, likely indicating that some molecular events are taking place in the peritumor areas, which closely resemble those overtly occurring in frankly tumor regions. Several of these genes encode proteins playing their roles in the extracellular matrix (ECM), and/ or in the regulation of cell-cell or cell-substrate adhesion, as collagen IV, CXCL14, and TGFBI. This indicates that ECM remodeling is a key process taking place in our samples.
We validated, by real-time qPCR, the differential expression of 8 mRNAs and 2 lncRNAs in in both STCs and STPs compared to healthy white matter, notably extending our SAGE results also to some new samples that had not been submitted to deep sequencing (Suppl. Table 5 and Suppl. Fig. 1). We then chose to validate by WB if the differential expression of some mRNAs was paralleled by a consistent modulation of the corresponding protein products too. In fact, we could confirm that LMTK3, encoding for Lemur Tyrosine Kinase 3 known to promote invasion in breast cancer [26], was widely overexpressed in both Cs and Ps of LT and ST patients ( Figure 3). SOCS2 expression increase was confirmed at the protein level in the majority of tumor centers, while its overexpression in the peripheral areas was variable and in general less marked than in the centers (Fig. 3). Among the proteins whose increase we could validate, the strongest result was that of TGFBI, that we confirmed as overexpressed in nearly all LT, ST, P and C samples we assayed (Fig. 3).

The differential expression of some key RNAs characterizes samples from short term and long term patients
A further key question that needs to be addressed in the comprehension of glioblastoma is the difference existing between the most frequent short survival patients (ST) and those that survive longer than 36 months after diagnosis, called long term survivors (LT). We thus searched our SAGE data for RNAs differentially expressed in these two categories, either in the comparison between tumor centers, or between peripheral areas.We started by focusing on RNAs whose expression difference between categories was statistically significant (i. e. FDR < 0.05). As shown in Table 3, 8 RNAs were overexpressed in STCs vs LTC, and only one underexpressed molecule reached this high statistical significance in the comparison. These RNAs include some whose involvement in GBM is soundly documented, such as PDGFRA [13], and also others, previously described in other types of solid tumors, but never in glioblastoma. Among these, two homeobox genes, HOXC10 and HOXD10, and TMSB15A, encoding for thymosin-beta-15A, a tumor motility gene promoting metastases in prostate cancer [27], all overexpressed in STCs vs LTCs. A strong difference in expression (Log 2 FC STC vs LTC = 4.4) was found for the mRNA CA3, encoding for carbonic anhydrase III, known to play an anti-oxidant role by working as an oxygen radical scavenger [28], and whose expression correlates with poor survival in GBM [29]. BCAR3, on the contrary, encoding for the SH2-containing signal transducer Breast cancer anti-estrogen resistance 3 [30], was clearly underexpressed (Log 2 FC = -2.07) in STCs vs LTCs. We chose to validate the differential expression of 3 out of such 8 molecules, and we could confirm all of them (Suppl. Fig. 2). A few genes were differentially expressed (FDR < 0.05) also between STPs and LTPs (Table 3), among which lumican (LUM), a proteoglycan with an established role in the control of tumor progression [31] that was underexpressed in STPs vs LTPs (Log 2 FC = -3.25).
When we extended our view of transcripts differentiating ST from LT samples also with a FDR higher than 0.05, among the RNAs more expressed in STCs vs LTCs, we found again TGFBI (Log 2 FC STC vs LTC = 1.8, p = 0.034), in agreement with its higher expression in all GBM samples vs healthy controls and likely as a sign of the greater aggressiveness of STCs vs LTCs. We also observed the overexpression of two molecules, COL6A2 (Log 2 FC STC vs LTC = 2.99, p = 0.0077) and the cell surface receptor CD44 (Log 2 FC STC vs LTC = 2.65, p = 0.019), commonly expressed in mesenchymal tissues [32] and associated with ECM remodeling [33]. Among the relevant genes overexpressed in STCs vs LTCs, we also detected two key transcription factors, SOX4 and SOX11 (Log 2 FC STC vs LTC = 2.58, p = 0.000458, and 2.75, p = 0.00236, respectively), usually associated with neural stem cells [34]. In particular SOX4, a direct TGF-β target gene, was demonstrated to sustain tumorigenicity of glioma-initiating cells [35]. Interestingly, when we compared the peripheral areas of ST vs LT patients, we found the overexpression (Log 2 FC STP vs LTP = 1.6, p = 0.0178) of IGFBP5, encoding for insulin-like growth factor binding protein 5, previously shown to be expressed by activated astrocytes in retinoblastoma, where they cooperate with tumor cells to promote tumorigenesis [36]. We then submitted our differentially expressed RNAs to a category enrichment analysis by using DAVID bioinformatics resources [37,38], to understand if ST tumors are distinguished from LT ones by specific cellular components or molecular pathways. As shown in Table 4, when considering the RNAs overexpressed in STCs vs LTCs, the extracellular matrix is the cellular component clearly prevalent, whereas the categories highly enriched in RNAs underexpressed in STCs vs LTCs are those of "synapse", "cell junction", "cation channel complex", and similar ones related to neuronal development and differentiation. Also the molecular pathways enriched in STCs or LTCs are clearly different (Table 5): genes overexpressed in STCs vs LTCs are enriched in KEGG pathways such as "cell cycle", and, significantly, "TGFbeta signaling pathway". On the contrary, and in agreement with the cellular component analysis, genes underexpressed in STCs vs LTCs are enriched in pathways involving the function of several types of synapses, calcium signaling pathway, and also chemokine signaling pathway.
Altogether, these results indicate that ST tissues differ from LT ones, especially for the expression of markers of TGF-β pathway, of reactive astrocytes, extracellular matrix remodeling and tumor cell motility.

The miRNome profiling of GBM centers and peritumor areas reveals microRNA molecules differentially expressed in all tumor centers vs their own peritumor areas
In order to improve our molecular picture of glioblastoma tissues, we performed a microRNA deep The last four columns show the Log 2 FC of STC, STP, LTC and LTP, respectively, vs healthy white matter sequencing analysis on a cohort of samples almost completely overlapping (3 out of 4 LT and 7 out of 9 ST) with those already studied by SAGE. The data we obtained were initially employed to check whether the sole miRNome expression is able, per se, to separate our samples into clinically relevant classes. In fact, as shown in Figure 4a, total miRNA expression could correctly clusterize most of GBM C and P samples, and, as expected, placed the healthy white matter control at one edge of the cluster prevalently made by P samples, opposite to C tissues. We then asked the same question previously posed with the SAGE data, i.e. which microRNAs are differentially expressed in all C/P pairs. For this purpose we made a paired analysis comparing the expression level of each C and P sample in the 10 analyzed samples. The heatmap shown in Figure 4b represents the miRNAs with expression significantly different (FDR < 0.05) in this comparison. Among miRNAs overexpressed in C vs P samples, we found miR-21-3p, miR-196b-5p, miR-135b-5p, miR-183-3p, known as "oncomiRs" in several tumors, including glioblastoma [39][40][41][42]. In addition, we found a few other overexpressed miRNAs, such as miR-1246, miR-1290, miR-7641, and miR-503-5p, whose role in cancer or in glioblastoma hasn't been investigated yet. We also found a larger number of underexpressed miRNAs in C vs P samples, among which miR-219a and miR-338-3p and miR-338-5p, with an established role in oligodendrocyte maturation [43], miR-34b and miR-34c, widely recognized as tumor suppressor miRNAs in general and specifically in GBM [44], but also several others, whose role in cancer is not known. We further confirmed our findings by validating also by real time qPCR the expression of 5 microRNAs among those shown in Figure 4b, and of three other randomly chosen ones (Suppl. Fig. 3).

A common set of microRNAs distinguishes C and P samples from healthy white matter
As for the SAGE analysis, we were interested in the comparison of C and P samples vs the healthy control, in search of molecular signatures able to differentiate not only frankly tumoral regions, but also the peritumoral areas, and thus corroborating the involvement of such regions in tumor onset, growth and support. We selected those miRNAs with a Log 2 FC vs healthy control of at least 1.5 for overexpressed miRNAs, or with a Log 2 FC vs healthy control of at least −1.5 for the underexpressed ones. Among overexpressed miRNAs, listed in Table 6a, we report several known "oncomirs", such as miR-10b, miRNAs of the miR-17-92 cluster, miR-21, miR-148 and others, largely described for their roles in cancer and in glioblastoma specifically [45][46][47], even if not described in the peritumoral areas. In addition, we found a set of miRNAs, among which miR-503-5p, not only overexpressed in C and P samples vs healthy white matter, but also in Cs vs Ps (see Figure 4b), suggesting a direct correlation of its expression with the presence of tumor cells. Moreover, the average expression of two miRNAs, miR-18a-5p and miR-503-5p, resulted higher in ST tumors as compared to LT ones (Log 2 FC STC vs LTC = 2.52, and 2.25, respectively), suggesting a correlation between their expression and tumor aggressiveness. In this class of microRNAs correlated with the glioblastoma state, we inserted also three miRNAs that, apart from distinguishing P and C samples from healthy controls, showed also a differential expression between those peripheral areas with a detectable tumor cell infiltration and the noninfiltrated ones. These three miRNAs, miR-182-5p, miR-183-5p and miR-96-5p (Log 2 FC PNI vs PI = −3.96, −3.76, −3.79, respectively, with respective p-values of 8.96 × 10 −5 , 1.7 × 10 −4 , 1.64 × 10 −4 ), interestingly belong to the same genomic cluster located on chromosome 7, and own a recognized role as mediators of TGFβ signaling in glioblastoma [48]. In the comparisons with healthy controls too, as for the C/P comparisons, we found a larger number of underexpressed microRNAs in tumor samples (both P and C, LT and ST) compared to healthy control (Table 6c). Among these, several miRNAs previously described as downregulated in glioblastoma were present, such as miR-128, miR-124, miR-129 [49,50], but also many others, not yet described in this context, were found. Moreover, only four miRNAs, miR-146b-3p, miR-483-3p, miR-184, and miR-187-3p, were overexpressed in non infiltrated peritumoral areas vs infiltrated ones (Log 2 FC PNI vs PI = 3.17, p = 1.84 × 10 −4 , 3.18, p = 1.81 × 10 −4 , 3.66, p = 2.03 × 10 −5 , 3.83, p = 9.43 × 10 −6 , respectively).
In order to investigate which functions can be most likely affected by the modulated miRNAs, we performed a DIANA miRPath analysis [51], that identifies KEGG pathways involving the mRNAs predicted to be targeted by the modulated miRNAs. The highest ranking pathways predicted to be affected by either overexpressed or underexpressed miRNAs are listed in Table 6b and 6d, respectively: many pathways are expected to be influenced by both classes of miRNAs, while some others are preferentially related to one class. For example, overexpressed miRNAs are predicted to affect "Regulation of actin cytoskeleton" and "p53 signaling" pathways, clearly representing the morphological modifications typical of tumor cells, and the complex molecular interactions downstream of p53.

C samples show a reduced percentage of A to G editing of miR-376c-3p
The editing of RNA sequences by adenosine deaminases is an important mode of modulation of RNA function [52], and has been shown to affect microRNA sequences too, with several functional consequences, among which the change in targeted mRNAs [53]. We then studied our microRNA sequencing data in search of possible sequence isoforms that may originate from an A to G editing, and we found that miR-376c-3p, even if not differentially expressed in C vs P samples or in tumor samples vs healthy control, was less edited in C samples with respect to the P ones, that owned a fraction of edited variants comparable to that of the healthy white matter.
Specifically, the edited forms harbored a G in place of an A at nucleotide 6 starting from the 5′ of the mature miRNA, thus affecting its "seed" sequence and potentially its target mRNAs. As shown in Figure 5, most P samples showed a relative amount of edited miR-376c-3p which was comparable to that of the healthy control sample, while the majority of C samples owned a reduced percentage of the edited forms in favor of the unedited ones.
In order to verify the significance of these observation, we performed a chi-squared test by considering two independent groups, one made by the C samples group and the other made by the P samples counterpart. For each group, we summed all the normalized counts of the edited variants of the miRNA 376c-3p, separating them from the unedited ones. The factorization clustering of miRNA expression data distinguishes two clusters: one is prevalently made of P samples and includes the healthy control (SB), the other is enriched by C samples. B. A selected number of miRNAs can clearly distinguish each C sample from its own P region. Shown is the fold change heatmap of differentially expressed miRNAs in the paired comparison P/C cell types. Each box refers to a specific C/P cell type comparison, the color scale reflects the relative Log2(Fold Change) (green indicates over expression in C samples, red indicates underexpression).

Figure 5: Percent distribution of edited (in green) vs non edited (in purple) variants of miR-376c-3p in 7 C/P pairs of ST samples, 3 C/P pairs of LT samples, and in one healthy white matter sample (SB).
two sample test for equality of proportion with continuity correction gave a p-value = 3.6 × 10 −12 (X-squared of 48.3). Interestingly, we also noticed that LTC samples display, on average, a fraction of edited variants higher than the STC samples. For these two groups too, we obtained a p-value = 2.2 × 10 −16 (X-squared of 245.4).
To understand if the editing of miR-376c-3p might affect its mRNA target cohort, we performed a target prediction analysis of the two forms, edited vs unedited, and then submitted the two lists of predicted mRNAs to a functional annotation clustering analysis. We thus found that the edited forms are predicted to target mRNAs grouping in clusters such as "Hedgehog signaling pathway" and "Wnt signaling pathway" (Suppl.  . Table 2a) are enriched in mRNAs coding for proteins involved in sugar transmembrane transport and in apoptosis, completely absent among those predicted to be targets of edited miR-376c-3p. Of note, among the predicted targets of unedited miR-376c-3p, overexpressed in C samples, we found key cell control genes, such as CDKN1A (coding for p21) or the microRNA processing enzyme Drosha. All these data indicate that a certain loss of the physiological A to G editing of miR-376c-3p occurs in GBM samples compared to healthy tissues and also to the peripheral areas, and that this likely induces a shift in the mRNAs targeted by miR-376c-3p, in turn driving GBM cells towards cell proliferation and other typically tumoral functions.

DISCUSSION
We undertook a multifaceted molecular analysis of a group of tissue samples from glioblastoma patients, and specifically focused on two main issues: i) RNAs whose expression is perturbed not only in glioblastoma tumor samples (Cs), but also in the peritumoral areas (Ps), compared to white matter healthy controls; ii) molecules distinguishing the tissues from "long term" (LT) GBM patients (that survived longer than 36 months after diagnosis) from those from the most common "short term" (ST) ones. The rationale behind these two main goals of our work is the idea that the different survival of LT compared to ST patients might be explained by different molecular signatures, to which a significant contribute may be provided by the peritumoral areas, including multiple cell types. More generally, our view, supported by the data we are presenting, is that peritumoral areas are actively involved in glioblastoma even when a clear tumor phenotype is not detectable.

A mesenchymal signature, indicative of a possible infiltration of stromal cells, characterizes C and P samples compared to healthy white matter and specifically short-term tumors
The analysis of modulated RNAs affecting both tumor centers and peritumoral areas unveiled the overexpression of several molecules belonging to the "mesenchymal signature" of glioblastoma [54,13,10], that greatly overlaps with the infiltration of tumorsupporting "stromal cells", a role that in brain is played by reactive astrocytes and microglia. Among these RNAs, in our view the most notable ones are CXCL14 and TGFBI. We find CXCL14 overexpressed in C and P samples (both ST and LT) vs healthy white matter; however, while its expression levels are comparable in LTCs and in LTPs, it is overexpressed in the majority (7/9) of STC vs STP samples. Indeed, we confirmed CXCL14 expression by IHC (Suppl. Fig. 4) in the cytosol and extracellular region of reactive astrocytes in STC samples, whereas STP samples showed a lighter staining in the cytoplasm of some astrocytes, likely representing the activated ones. A few works have been published using only murine models, showing that CXCL14 is expressed by activated microglia after induction by GBM conditioned medium [24]; our finding of CXCL14 increased expression in STCs compared to STPs may reflect a higher level of microglial or more probably reactive astrocyte infiltration in ST tumor centers compared to peritumor areas, which, on the contrary, doesn't characterize LT tumor centers when compared to their own peritumor areas. This hypothesis could be supported by our data collected in the comparison of STCs vs STPs: among the RNAs overexpressed in STCs vs STPs, COL1A1, COL1A2 and COL5A1 are known markers of the mesenchymal glioblastoma subtype, as well as IGFBP6, DCBLD2 and OLFM1, all expressed by microglia or reactive astrocytes [13,10]. Notably, such an enrichment in genes expressed by stromal cells is absent among RNAs overexpressed in LTCs vs LTPs.
Another relevant example is TGFBI which, besides its recognized role as a marker of the mesenchymal class of glioblastoma [13], was described, in conjunction with SOX4, as a mediator of non-SMAD mediated TGF-beta signaling pathways in glioblastoma [55]. TGFBI is mainly secreted in the extracellular matrix, where it interacts with several other components, such as collagens, among which collagen IV alpha 2, whose mRNA is among those overexpressed in all C and P samples of our study. Our data thus not only confirm those reported by Lin and coworkers [55], but extend the significance of TGFBI overexpression also to peritumoral regions. Notably, when we searched for TGFBI protein expression by IHC, we found it in the extracellular space and in the basement membrane of vessels of C tissues, while it marked less intensely single cells in P samples (Suppl. Fig. 4), with a difference of expression reflecting that observed for TGFBI mRNA by SAGE. In P regions showing clear evidence of tumor cell infiltration, TGFBI staining was found extracellularly and in the basement membrane of vessels, reflecting the staining observed in C samples (not shown). Specific studies addressing the question of which cell types produce and secrete TGFBI in GBM microenvironment are surely needed, in order to understand its possible paracrine or autocrine roles in this context.
In general, we observed that, among overexpressed mRNAs shared throughout all C and P samples, the level Ubiquitin mediated proteolysis 7.57e-12 53 16 Focal adhesion 1.44e-11 71 16 Pathways in cancer 3.91e-11 114 18 p53 signaling pathway 6.66e-11 30 14 MAPK signaling pathway 6.66e-11 87 17 Wnt signaling pathway 1.02e-10 58 18 The number of genes targeted in each pathway and the number of miRNAs predicted to target those mRNAs are indicated in the third and fourth column, respectively of modulation (Log 2 FC, see Table 2b) vs healthy white matter was frequently comparable in LTCs and LTPs, while it was almost always higher in STCs compared to STPs. We propose that this may be explained by an enhanced transformation stage characterizing STCs, but at the same time may imply that several molecular pathways supporting tumor growth are already "switched on" in peritumor areas.

An "oncomir" expression pattern distinctive of TGFβ activation characterizes all C and P samples compared to healthy white matter
Also when we analyzed the microRNAs extensively overexpressed throughout all C and P samples compared to healthy white matter, we found many that may represent both signs of TGFβ active pathways and mediators of a general state of immune escape typically sustaining glioma growth. In fact, among the several known "oncomiRs" overexpressed in our samples, we found miR-106b and miR-93, recently shown to target NKG2DL, a ligand of the activating receptor of natural killer (NK) cells NKG2D, and thus suggested to contribute to the immune evasion typical of glioblastoma cells [56]. Mir-183, that we found to characterize most of our samples and also to distinguish the infiltrated vs the non-infiltrated peritumoral areas, is a TGFβ-induced miRNA previously reported to suppress tumor-associated natural killer cells, thus explaining one of the ways through which TGFβ plays its immunosuppressive roles in tumor microenvironment [57]. We can recognize several further microRNAs positively correlated with TGFβ role as a supporter of tumor growth: miR-21, shown to be induced by TGFβ1 and to target SMAD7, thus leading to the TGFβ-promoted formation of cancer associated fibroblasts [58]. Mir-106 too was described as deeply involved in TGFβ protumorigenic signaling through targeting of SMAD7, in turn inducing EMT in breast cancer [59], similarly to miR-10b, induced by TGFβ1 and promoting EMT in breast cancer [60]. Specifically in glioblastoma, miR-182 is induced by TGF-β, leading to prolonged NF-κB activation in a glioma subset [48]. Even among the underexpressed miRNAs we found possible indications of TGFβ action, such as  -769-5p  355  60  19  79  104   hsa-miR-770-5p  190  55  4  8  12   hsa-miR-874-3p  3834  1002  590  470  748   hsa-miR-99b-5p  1186  71  217  121  171 The numbers shown represent the cpm median values for each class of samples. Only miRNAs with a module fold change (log2) higher than 1.5 in all four comparisons (LTC vs HS, LTP vs HS, STC vs HS, STP vs HS) and with at least 50 cpm in one of the 5 samples are reported The number of genes targeted in each pathway and the number of miRNAs predicted to target those mRNAs are indicated in the third and fourth column, respectively. The significance level is indicated by a corrected p-value the reduced expression of miR-127, demonstrated to be under the negative control of TGFβ in hepatocellular carcinoma [61]. To our knowledge, ours is the first work where a systematic measurement of microRNA expression is performed in glioblastoma peritumoral areas, and the results yielded by such a study, together with the gene expression analysis of the same samples, indicate that peritumoral regions share with frankly tumor areas a mRNA and microRNA signature distinctive of TGFβ activation and possibly of the involvement of cell types mediating a general immunosuppressive condition.

ST tumor and peritumor areas, compared to LT ones, show an enrichment of mRNAs known to be expressed by activated astrocytes and microglia
When we compared tumor centers to peritumoral areas, in the ST group we observed several signs of a higher participation of reactive cell types, such as microglia and reactive astrocytes, that could be depicted also as a generally more "mesenchymal" feature compared to what we found in LTs. Suggestive of a such a view, in the STC vs STP comparison, but not in the LTC-LTP one, we found the overexpression of prolylendopeptidase, a serine protease digesting ECM and typically expressed by microglia [62], and that of ARHGAP29, expressed by reactive astrocytes and microglia [10]. An even stronger indication was provided by the direct comparison of STCs vs LTCs, that underscored a clear enrichment of RNAs known to be expressed by activated astrocytes and microglia, such as carbonic anhydrase III (CA3), CD44 and TGFBI. In particular, CD44 is the receptor for hyaluronate and osteopontin [63,64], critical for cell adhesion and invasion, and is a key marker of reactive astrocytes, whose infiltration in the tumor and in the peritumoral area correlates with a worse prognosis [11]. It has also been recently shown that CD44 expression, being a marker of TNFalpha/NFkB-induced mesenchymal differentiation of glioblastoma, correlates with poor radiation response and shorter survival [54]. Our IHC results show CD44 localized at the membranes of specific cells, most likely reactive astrocytes, in C samples, and throughout the intricate net of cell processes in P tissues (Suppl. Fig. 4). A gene that resulted strongly underexpressed in our STC samples vs the LTC ones was BCAR3, whose role in glioblastoma has never been revealed before. However, it has very recently reported that it is downregulated by TGFβ and it can also inhibit TGFβ/SMAD signaling in breast cancer, where its expression associates with favorable disease outcome [65]. Our finding of its reduced expression in STCs vs LTCs is intriguing as it suggests that in glioblastoma too it may play a role similar to that performed in breast cancer, in the context of TGFβ-modulated pathways. Notably, also in the comparison of the ST peritumoral areas as opposed to the LT ones, we found some indications of a possible higher contribution of tumor "stromal" cells, as reactive astrocytes. IGFBP5 mRNA in fact, overexpressed in STPs vs LTPs, even if not previously described in the context of GBM, had been reported as expressed by reactive astrocytes supporting retinoblastoma growth [36]. By IHC, we were able to detect for the first time IGFBP5 in the cytoplasm and extracellular matrix of reactive astrocytes in C tissues from short term patients, and also in the cytoplasm of astrocytes present in STP samples, though with a lower intensity (Suppl. Fig. 4), while we did not find it in LT peritumoral regions (not shown). In addition, we found a sound overexpression of the extracellular protein lumican in LTPs as compared to ST ones, still corroborating the view of a great involvement of tumor microenvironment in the definition of long-term vs shortterm survivors. This stromal protein has been very recently shown to positively correlate with prolonged survival after tumor resection in in pancreatic ductal adenocarcinomas, due to its limiting role in EGFR-expressing pancreatic cancer progression [66].

A reduced A to G editing of miR-376c-3p distinguishes tumor centers from peritumor areas and is more evident in STCs compared to LTCs
When we considered the results of microRNA expression profiling, we were not able to find miRNAs clearly distinguishing STCs from LTCs in a statistically significant way (i.e. FDR < 0.05). However, we found a significant difference in the percentage of the A to G edited variants of miR-376c-3p: LTCs harbored, on average, more edited mir-376c-3p than STCs. In addition, we also found that this same miRNA is more edited in all Ps vs Cs. The A to G editing of miR-376c-3p had previously been described, specifically in brain, where adenosine deaminase acting on RNA-1, ADAR1, was shown to be the enzyme responsible of this specific editing [67]. Intriguingly, our SAGE data revealed a reduced expression of a brain-specific member of the adenosine deaminase gene family, ADARB2, in all Cs and Ps, of both ST and LT patients. Even if the editing activity of this enzyme has never been formally demonstrated, and an inhibitory role was rather proposed for ADARB2 [68], the concomitant reduction of miR-376c-3p editing and ADARB2 expression suggest a functional correlation between these two findings. A reduced editing of miR-376a, another member of the same miRNA cluster located on chromosome 14, and evolutionarily related to miR-376c, was previously reported in glioblastoma compared to healthy control tissues [69], and a functional consequence of this was identified in the differential targeting of specific mRNAs by the edited vs the non-edited miR-376a isoforms. In our case, we did not find any difference in the levels of miR-376a editing when comparing Cs vs Ps, or www.impactjournals.com/oncotarget all samples vs healthy controls. However, the indication, obtained by target prediction, that some mRNAs highly relevant for tumor cell biology (e.g. CDKN1A and Drosha) could be differentially targeted by the non-edited, tumor enriched miR-376c isoform, is surely intriguing and deserves further investigation.
In conclusion, this comprehensive study of GBM tissues and peritumoral areas indicates some key molecules characterizing LT from ST tissues, and at the same time shows the commonality of some molecules and pathways (e.g. TGFβ) between tumor centers and peritumoral areas, strongly suggesting that such molecules/pathways should be targeted in both areas in order to fight, and defeat, glioblastoma.

Tissue procurement and RNA extraction
In this study we used tissue samples coming from 13 GBM patients; 9 patients were classified as short-term survival and 4 cases as long-term survival (survival ≥ 36) ( Table 1). All patients provided informed consent to use their tumor and peritumor material as well as clinical data for research purposes, none of them was identifiable. The study was approved by the local ethics committee (Catholic University Ethics Committee, Rome).
Tumor removal was achieved with resection margins that included the neighboring, apparently normal tissue (between 1 cm and 2 cm from the tumor border; larger resections were performed in tumors that grew far from eloquent areas) and the tumor, which were removed entirely en bloc.
Neuronavigation and intraoperative ultrasounds were used to maximize the extent of intracranial tumor resection. Thirty-five days after surgery (range, 30-40 days), patients received an external source limitedfield radiotherapy (with an average dose of 60 grays administered in fractions). Simultaneously, chemotherapy with temozolomide (Temodal) was administered at a dose of 75 mg/m 2 per day. After a 1-month break, patients received up to 6 cycles of temozolomide at a dose from 150 to 200 mg/m 2 per day on the standard schedule of 5 days per week every 28 days. All patients underwent the same adjuvant therapy protocol.
All patients were followed as outpatients 1 month after surgery and every 3 months thereafter.
We obtained 13 pair samples from different areas: Tumor and Peritumor white matter tissues close to the tumor, between 1 cm and 2 cm from the tumor border. One series (tumor and peritumor tissues) was immediately fixed in 10% neutral buffered formalin for histological analysis and the second one was immediately frozen on dry ice for molecular analysis. All histological samples were reviewed by a board-certified neuropathologist and all tumors were classified as glioblastoma (WHO IV). Multiple levels of each paraffin block of samples used for research purposes (tumor and peritumor) were thoroughly examined. Specimens were histologically assessed using H&E sections. Neoplastic cells were identified by their nuclear atypia and heteropyknotic staining. Glial fibrillary acidic protein (GFAP) immunolabeling of peritumor area samples was used routinely to identify reactive astrocytes, which were identified by their clear, large eccentric nuclei; eosinophilic cytoplasm; and long, thick, stellate, GFAP-positive processes, according to Hoelzinger et al. [70].
Specimens from surrounding tumor area were assessed histologically by 2 independent pathologists who identified the presence of neoplastic cells in each sample. Two sections from each area were examined by 2 different observers independently, and at least 1000 to 2500 cells from 4 to 10 randomly selected fields in each section were counted. By using a light microscope (Axioskop 2 plus; Zeiss), the presence of tumor cells was evaluated at 3400X magnification. Samples with discrepant scoring were re-evaluated jointly on a second occasion, and agreement was reached. The results were categorized as 'infiltrated' if tumor cells were detected in at least 1 specimen. In the peritumoral areas classified as "non-infiltrated" in this study, no tumor cells were seen, and the expression of Ki-67, marker of proliferation detected by immunohistochemistry, was less than 1%. On the contrary, the peritumoral areas classified as "infiltrated", showed a Ki-67 immunoreactivity of >> 1%. CTRL specimens were derived from patients at the same ages operated for deep cavernomas with radiological signs of recent bleeding.
Total RNA was isolated from all tissue samples by RNA/DNA/Protein Purification Kit (Norgen, CA) as described by manufacturer's instructions followed by clean-up DNase digestion on column by RNase-Free DNase Kit (Norgen, CA).

Western blot analysis
For Western blot analysis, total protein extract was isolated from tissue samples by RNA/DNA/Protein Purification Kit (Norgen, CA) as described by manufacturer's instructions. Equivalent amounts of protein extract were separated by electrophoresis on 12% SDS-PAGE gels and blotted onto nitrocellulose. The membranes were blocked with 5% nonfat dry milk and 0.1% Tween-20 in Phosphatebuffered saline and then incubated with antibodies followed by appropriate horseradish peroxidase-conjugated secondary antibody (Promega, USA). Rabbit polyclonal anti-LMTK3 (Abcam, UK) was used diluted 1:500, rabbit polyclonal anti-TGFBI (Thermo Scientific, USA) was diluted 1:1000, rabbit polyclonal anti-SOCS2 (Thermo Scientific, USA) was diluted 1:200. Rabbit polyclonal anti-β-actin (Sigma, USA) diluted 1:400 was used to reveal β-actin as a loading control.

SAGE sequencing and analysis
SAGE barcoded libraries were prepared from 27 total RNA samples extracted independently from the cores and www.impactjournals.com/oncotarget peritumoral regions of 13 glioblastoma tumors and 1 sample of healthy white matter. Each library was generated by using the SOLiD™ SAGE™ Kit with Barcoding Adaptor Module and a SOLiD™ RNA Barcoding Kit (Catalog nos. 4452811 and 4427046, Applied Biosystems, Foster City, CA), following the manufacturer's instructions. The barcoded libraries were combined and sequenced on 3 full slides on an Applied Biosystems SOLiD 4 System. Sequencing length was 35 bp. Library preparation, barcode addition, emulsion PCR, and SOLiD sequencing were performed at Genomnia Srl (Lainate, Milan, Italy).
Suppl. Table 3a summarizes all the samples and the target mapping percentages associated with this experiment. All the SAGE Color Space sequence files generated by the sequencing were corrected de novo for sequencing errors before mapping on the reference transcriptome, using the Life Technologies SAET (SOLiD Accuracy Enhancer Tool) version 2.2 program (https:// www.biostars.org/static/downloads/solid/solid-denovoassembly/saet.2.2/SAET.v2.2.pdf).
The error-corrected sequences were then trimmed to 27 nt from the original 35. These were mapped using the bowtie version 0.1.27 version 2 software against a reference 3′ UTR database derived from RefSeq and relative to the hg19 version of the human genome sequence. The alignments were parsed with ad-hoc created Perl scripts in order to extract and count the aligned SAGE tags beginning with the CATG sequence, with max 1 mismatch (globally) with the reference sequence and with a single mapping position on the reference sequence. The aligned reads counts have been then grouped by target RefSeq ID; analyzed for differential expression with the Bioconductor EdgeR library; annotated with a proprietary script.
The results consist in tables reporting the differentially expressed genes (by P value and False Discovery rate) for each of the comparisons which are summarized below. STP vs STC; LTP vs LTC; LTP vs STP; LTC vs STC; STPs vs healthy control; LTPs vs healthy control; STCs vs healthy control; LTCs vs healthy control.
Single Sample Gene Set Enrichmemnt Analysis was performed using the SGEA module from the Gene Pattern Public Server (http://www.broadinstitute.org/ cancer/software/genepattern) and the classified gene lists associated with the subtype calls from the expression values of the core TCGA samples (https://tcga-data.nci. nih.gov/docs/publications/gbm_exp/TCGA_unified_ CORE_ClaNC840.txt).

Feature selection with ReliefF
Weka 3.7 [71] was used to employ the the ReliefF algorithm [14]. ReliefF computes merit as the difference between the probability of observing a different value of a metric in a different class and the probability of observing a different value of a metric in the same class. Merit therefore simultaneously emphasized inter-class dishomogeneity and intra-class homogeneity (i.e. merit is high when an RNA is both different between different classes and equal between equal classes or in within the same class). The RNA molecules were then ranked according to merit, and the first 50 highest ranking molecules were selected for visual representation through a heatmap depicting the ratio of expression between classes for each RNA molecule. Mathematica 9 (R) was employed to generate the heatmaps.

MicroRNA sequencing and analysis
A second aliquot of the same total RNA samples analyzed by SAGE sequencing was used to explore the small RNA fraction, after enrichment using the Invitrogen PureLink ® miRNA Isolation Kit. The amount and quality of the enriched small RNA was examined using the Agilent Bioanalyzer. Barcoded sequencing libraries were generated from an average of 10 ng of purified small RNA using the Applied Biosystems SOLiD™ Small RNA library protocol (July 2011 Rev.B) with the Total RNA-Seq Kit (PN 4445374). The libraries were pooled and sequenced with the SOLiD 5500XL platform using 9 lanes. Sequence files in Colorspace format (.csfasta and .qual) included in the .xsq file were mapped and analysed using the 'small RNA' Lifetech Lifescope version 2.5.1 pipeline, using as targets the human genome GRCh37/hg19 and the dataset of mature sequences and precursors mirBase version 20 (June 2013). Matches with the genome repetitive elements (SINE, LINE..), snoRNAs, piRNAs, tRNAs and rRNAs were eliminated by the results through preliminary filtering. Sequence data are shown in Suppl. Table 3b. Filtered small RNA alignments in .bam format were converted in .fastq format and subjected to further analyses.
Read mapping and isomiRNA counts were made by using a stand-alone Perl program able to detect miRNA variants (isomiRNAs) in small RNA-Seq experiment [Grassi et. al. in preparation]. The program compares the sequenced reads with respect to canonical mature miRNA and relative premiRNA present in miRBase (version 20) [72]. For each expressed isomiRNA, the occurrences were counted and possibly grouped in order to define the corresponding miRNA counts. Subsequently, miRNA counts were normalized by using the TMM normalization procedure [73]. The samples were classified in two groups based on miRNA expression, according to the method described by Brunet et al. [74], by using the non-negative matrix factorization (NMF) R package [75]. In 10 patients, miRNA expression of C samples was compared with respect to that of the corresponding P samples. The differential expression analysis was made by using a generalized linear model (GLM) implemented in edgeR and able to handle the paired experimental design [76]. We considered for the analysis only the miRNAs with counts per million (cpm) > =10 in more than 4 (at least in one sample in the pair) of the 10 analyzed C-P pairs (469 in total). The derived p-values and the corresponding FDR values are reported in Suppl. Table 4.