Molecular profiling of colorectal tumors stratified by the histological tumor-stroma ratio - Increased expression of galectin-1 in tumors with high stromal content

The tumor microenvironment is a dominant determinant of cancer cell behavior. Reactive tumor stroma is associated with poor outcome. The tumor-stroma ratio (TSR) is a strong independent prognostic factor in colorectal cancer and is easily assessed using conventional hematoxylin and eosin (H&E) stained paraffin sections at the invasive margin of the tumor. We aim to understand the biology of the tumor stroma in colorectal cancer by investigating the transcriptomic profiles of tumors classified by the TSR method. The TSR was assessed in a cohort of 71 colorectal cancer patients undergoing surgery without (neo)adjuvant therapy. In the cohort, stroma-high tumors were distinguished from stroma-low tumors at the gene expression level by the upregulation of biological pathways related to extracellular matrix (ECM) remodeling and myogenesis. The activated microenvironment in stroma-high tumors overexpressed different types of collagen genes, THBS2 and THBS4, as well as INHBA, COX7A1 and LGALS1/galectin-1. The upregulation of THBS2, COX7A1 and LGALS1/galectin-1 in stroma-high tumors was validated in The Cancer Genome Atlas. In conclusion, the gene expression data reflect the high stromal content of tumors assessed by the histological TSR method. The composition of the microenvironment suggests altered proteolysis resulting in ECM remodeling and invasive capacity of tumor cells.


INTRODUCTION
The tumor microenvironment or tumor stroma is a dominant determinant of cancer cell behavior and disease progression. The tumor stroma consists of immune cells, cancer-associated fibroblasts (CAFs), endothelial cells and the extracellular matrix (ECM). During tumor evolution, changes occur in the composition of the tumor stroma. Fibroblasts become activated fibroblasts, called CAFs, and the overall content of the ECM is remodeled. The ECM of tumors is composed of a complex network of collagen, proteoglycans (such as lumican and versican) and glycoproteins (such as fibronectins, thrombospondins and laminins), locally secreted mainly by CAFs and assembled into a mesh [1]. This network of ECM constituents functions as a scaffold for epithelial and stromal cells and is involved in cell-matrix and cell-cell adhesions, which enable tumor cells to migrate. High stromal content, in particular collagen, was associated with a pro-metastatic capacity of cancer cells [2]. CAFs can induce stem cell-like properties and epithelial-to-mesenchymal transition (EMT) in cancer cells [3]. The composition of the tumor microenvironment is an essential aspect of tumor biology [2,4,5].

Research Paper www.oncotarget.com
The importance of the tumor microenvironment is also emphasized in the colorectal cancer (CRC) consensus molecular subtypes (CMS), a recent classification developed based on transcriptional profiles. The CMS describes four CRC subtypes, of which the poor-prognosis CMS4 is characterized by high stromal content. CMS4 shows high mesenchymal gene expression, which can be attributed to stromal cells as well as to cancer cells [2,4-7]. Reactive stroma in solid tumors is associated with poor outcome [5,8,9]. We and other research groups have demonstrated that the tumor-stroma ratio (TSR) is a strong independent prognostic factor. The TSR is easily assessed using conventional hematoxylin and eosin (H&E) stained paraffin sections at the invasive margin of the tumor [10-12]. The TSR has been reported in colon cancer as well as in other solid cancer types [10-18]. We aim to understand the biology of the tumor stroma in CRC by investigating the overall transcriptomic profiles of tumors classified by the TSR method using gene expression data. First, we compared the quantity of stromal and immune cells, estimated from gene expression, between the stroma-low and stroma-high groups defined by the TSR method. Secondly, we investigated biological pathways differentially activated between the stroma-low and stroma-high groups to identify genes of interest. Thirdly, we validated the genes of interest in a second cohort and at the protein level.

The prognostic value of the tumor-stroma ratio
A retrospective cohort consisted of 76 sporadic CRC patients undergoing surgery at the Leiden University Medical Centre (LUMC), who were part of a larger cohort [19]. Of these 76 patients, 71 were included in the study based on the availability of histological material and gene expression data. The TSR was scored on H&E sections at the invasive part of the tumor using a microscope (Figure 1A and 1B). Twenty (28.2%) patients belonged to the stroma-high group and 51 (71.8%) to the stroma-low group. The patient characteristics are shown in Table 1. As shown in Figure 1C and 1D, the 5-year overall survival (OS) and distant metastasis-free survival (DMFS) rates were 78.4% and 82.4% in the stroma-low group, and 25% and 35% in the stroma-high group, respectively. The stroma-high group had significantly worse OS and DMFS than the stroma-low group (OS p = 0.003, HR = 3.76 (1.99-7.09); DMFS p = 0.0001, HR = 5.35 (2.40-11.89)). In a multivariate analysis accounting for confounding variables including age, sex and TNM stage, the TSR was an independent predictor of survival (OS p = 0.0001, HR = 4.586 (1.96-10.75); DMFS p = 0.015, HR = 3.53 (1.273-9.81)) (Table 2). The mesenchymal properties of CMS4 have been shown to be attributable not only to the stromal compartment but also to the epithelial cells. We examined the association between the TSR and the CMS classification (epithelial (CMS2/3) versus mesenchymal (CMS4) subtypes) in the LUMC and The Cancer Genome Atlas (TCGA) cohorts. CMS1 patients were excluded. In the LUMC cohort, 17 patients were CMS1. 32/42 stroma-low patients were CMS2/3 and 8/20 stroma-high patients were CMS4 (Table 1). In TCGA, 87/123 stroma-low patients were CMS2/3 and 23/43 stroma-high patients were CMS4 (Supplementary Figure 1A).
In both cohorts, the TSR and the CMS classification were associated, although the agreement was only fair (LUMC: χ² test = 7.714; κ = 0.141; p = 0.005; TCGA: χ² test = 7.14, κ = 0.22; p = 0.008). A log-rank test was performed in the LUMC cohort categorizing patients by TSR and CMS classification. In the LUMC cohort, stroma-high patients stratified as CMS2/3 or CMS4 did not differ in DMFS or OS (Figure 1E and 1F). Stroma-low CMS4 patients showed no difference in OS but a worse DMFS compared to stroma-low CMS2/3 patients (OS log-rank test = 1.550, p > 0.05; DMFS log-rank test = 11.770, p = 0.001; Figure 1E and 1F). No Cox regression model could be fitted due to small numbers in the subgroups.
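The agreement statistic κ mentioned above can be illustrated with a minimal sketch, assuming a plain 2×2 cross-tabulation of TSR against CMS. The function name and the table layout are illustrative choices, not the authors' code. The counts are derived from the LUMC fractions quoted in the text (32/42 stroma-low patients CMS2/3, 8/20 stroma-high patients CMS4); because the exact table behind the reported statistics is not given here, the result need not coincide with the reported κ.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Cohen's kappa for a 2x2 table [[a, b], [c, d]].

    Assumed layout (illustrative):
      rows    = TSR (stroma-low, stroma-high)
      columns = CMS (CMS2/3, CMS4)
    The agreement cells are a (stroma-low and CMS2/3) and d (stroma-high and CMS4).
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)


# Counts derived from the fractions in the text: 32 of 42 stroma-low tumors
# were CMS2/3 (so 10 were CMS4) and 8 of 20 stroma-high tumors were CMS4
# (so 12 were CMS2/3).
kappa = cohen_kappa_2x2(32, 10, 12, 8)
print(f"kappa = {kappa:.3f}")
```

A κ close to 0 indicates agreement little better than chance, whereas a κ of 1 indicates perfect agreement between the two classifications.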

The transcriptomic composition of the microenvironment in stroma-high and stroma-low tumors
The gene expression profiles of tumors stratified by TSR were investigated. A key challenge in gene expression data analysis is that transcriptomic data are derived from a mixture of cell populations, including stromal and immune cells. We therefore deconvoluted the samples using three existing computational tools. The proportion of stromal and immune infiltration in the mRNA expression relative to epithelial cells was assessed based on the ESTIMATE gene signatures, consisting of 141 stromal and 141 immune genes, which were previously shown to be reliable tools [20,21]. Stroma-high tumors showed a significantly increased percentage of stromal infiltration in the mRNA data compared to stroma-low tumors (t-test = -4.76, p = 2.58 × 10⁻⁵), while there was no difference in immune infiltration between the stroma-high and -low groups (t-test = -1.88, p = 0.066; Figure 2A). We further investigated the cell composition of the tumor by assessing the ratio of CAFs, endothelial cells and different immune cell types using the Microenvironment Cell Populations (MCP)-counter [7]. The stroma-high group had a significantly increased number of CAFs (t-test = -3.91, p = 0.0005) and endothelial cells (t-test = -2.68, p = 0.010) compared to the stroma-low group (Figure 2B). As with the ESTIMATE genes, there was no significant difference in the quantity of immune cells between the stroma-high and -low groups, except for cells of the monocytic lineage such as macrophages (t-test = -2.477, p = 0.0203). Moffitt et al. developed a stromal signature by virtual microdissection in pancreatic cancer which discriminated between normal and activated stroma [22]. Using this signature, the LUMC cohort was divided into normal (N = 44) and activated (N = 27) stroma (Figure 2C). Next, we investigated the correlation between the TSR and the MCP-counter CAFs as well as the stromal signature by Moffitt, followed by survival analysis.
The LUMC cohort was divided into CAFs low (N = 54) and high (N = 17) based on MCP-counter CAF markers. The TSR correlated with the MCP-counter CAFs (χ² test

Transcriptomic profiling of the microenvironment in stroma-high and stroma-low tumors
Using the GlobalTest, the transcriptomes of patients stratified according to TSR were significantly different (p = 0.0002, Stat = 3.73, SD = 0.359, Covariates = 15923). The publicly available Hallmark collection on the MSigDB consists of 50 gene sets and was used to explore the difference in transcriptomic pathways between the stroma-low and -high groups [23]. The myogenesis (p = 0.0010) and apical junction (p = 0.0010) pathways differed most between the two TSR groups. The pathways differing between the TSR groups were mainly related to ECM remodeling, inflammation, metabolism and cell differentiation (Figure 3). To further explore the ECM, we selected transcriptomic pathways related to the ECM and to the function of CAFs (Supplementary Figure 3A). The ECM, focal adhesion and integrin pathways were significantly different between the two TSR groups. Stroma-high tumors expressed high levels of collagen, laminin and integrin subunits. Thorough investigation of the focal adhesion pathway identified two interesting related genes, THBS2 (p = 0.0130) and THBS4 (p = 0.0185), coding for thrombospondin-2 and -4, respectively. Following a network analysis of the two THBS genes, THBS4 was mainly associated with collagen genes and THBS2 was associated with both collagen genes and ADAM genes (cBioportal; Supplementary Figure (p = 0.0185) were highly upregulated in the stroma-high group. The COX7A1 gene is known to be expressed by stromal cells [2], and analysis of the myogenesis pathway showed that COX7A1, coding for a cytochrome c oxidase subunit, was highly expressed in stroma-high compared to stroma-low tumors (p = 0.0004). Based on co-expression analysis in the TCGA CRC database, COX7A1 was highly co-expressed with LGALS1 coding for galectin-1 (LGALS1/galectin-1), a lectin that is upregulated in the tumor stroma and can inhibit immune response through CD45 protein phosphatase activity (Spearman's correlation = 0.84).
LGALS1/galectin-1 was also highly expressed in the EMT pathway (p = 0.0143

Validation of increased galectin-1, cytochrome C7A1, thrombospondin-2 and -4 expression in the TCGA dataset
Subsequently, we validated the expression of the four genes of interest: THBS2, THBS4, COX7A1 and LGALS1/galectin-1. As the LUMC cohort was characterized by an increased number of MSI-H patients, we excluded MSI-H patients classified as CMS1 in order to avoid a potential effect of MSI-H status. In total, 166 CRC patients of TCGA had genomic data on cBioportal and H&E tumor sections of the invasive part available on the Cancer Digital Slide Archive website to score TSR. The TSR was scored and showed prognostic value in TCGA (Supplementary Figure 1B). When combining TSR and CMS classification, there was no statistically significant difference in OS and DMFS (Supplementary Figure 1C). While THBS4 was not significantly differentially expressed (p = 0.088), THBS2, COX7A1 and LGALS1/galectin-1 expression was higher in the stroma-high group than in the stroma-low group in the TCGA cohort (THBS2 p = 0.011; COX7A1 p = 0.030; LGALS1/galectin-1 p = 0.007).

Protein expression of galectin-1
Galectin-1 is expressed and released by different cell types and exerts biological functions at different levels of tumor progression [24]. This protein is likely involved in the functional interaction between cancer and stromal cells. Also, research on galectin-1 has mainly focused on its role in tumor and immune cells and not in fibroblasts. We therefore selected galectin-1 for further investigation. We next examined which cell types expressed galectin-1 and whether there was a correlation between galectin-1 expression at protein level and gene expression level. The tumor material of 43 patients of the LUMC cohort was available to perform galectin-1 immunohistochemistry staining.
As demonstrated in Figure 4, galectin-1 was observed in different cell types. Some tumor cells expressed galectin-1 in the cytoplasm at a low staining intensity and percentage (Figure 4A). Galectin-1 was mainly expressed in the stromal compartment (Figure 4B). The staining of galectin-1 was scored in tumor cells and in the stromal compartment. The nuclei and cytoplasm of stromal cells were scored for staining intensity in three categories (low (1), medium (2) and high (3)) and percentage, and the tumor cells were scored for absence (1) or presence (2) of staining (Table 3). In this complex pattern, we could not deduce a clear correlation between gene expression and protein level when looking at the scores independently and combined (Supplementary Figure 4A). The galectin-1 protein expression in the stromal compartment (including stromal and immune cells) correlated with the TSR. Galectin-1 medium protein expression was associated with high stromal content (χ² test = 10.226; p = 0.006; Supplementary Figure 4B). Strikingly, 14 tumors scored Figure 4C). This was in contrast to what was expected. Most of these 14 tumors were categorized as stroma-low (13/14) and 5 out of 14 were MSI-H. Based on the immunohistochemistry staining, the high intensity of galectin-1 protein expression was mainly on immune cells (Figure 4C).

DISCUSSION
The TSR showed prognostic value in the LUMC cohort. Patients classified as stroma-high had a worse 5-year DMFS rate (35%). Based on gene expression data, stroma-high tumors were characterized by an increased quantity of CAFs, which likely leads to altered proteolysis and results in ECM remodeling. Supporting this hypothesis is the increased expression of collagen, THBS and additional genes involved in extracellular matrix remodeling in stroma-high compared to stroma-low tumors. Both collagen and THBS have been shown to be involved in aggressive behavior of CRC cells [2,25]. They mediate cell-cell contact and cell-matrix interaction. Our results are in line with previous transcriptomic studies that identified metastasis-associated signatures across multiple cancer types. Key genes contributing to these signatures were related to the microenvironment, including THBS2, INHBA and several collagen genes [26,27].
In addition, stroma-high tumors were associated with an increased mRNA expression of LGALS1/galectin-1 in two cohorts. Galectin-1 is a galactoside-binding protein which localizes both intra- and extracellularly and has a wide range of biological functions. In tumors, intracellular galectin-1 modulates cell signalling Figure 2B) [28]. The carbohydrate-recognition domain of extracellular galectin-1 can bind to carbohydrates located at the cell surface of cancer and stromal cells. These interactions result in modulation of cell adhesion and migration of the target cell. Galectin-1 can also induce an apoptotic response in immune cells by binding, for instance, CD45. Furthermore, galectin-1 interacts with glycoproteins of the ECM such as laminin, thrombospondin, vitronectin, fibronectin and osteopontin, which were highly expressed in stroma-high tumors [29,30]. Previous studies demonstrated in different tumor types, including colon cancer, an association between high galectin-1 expression and poor prognosis [24,31-34].
Most studies investigated the expression of galectin-1 in cancer epithelial and immune cells, while we found an increased expression of galectin-1 in stroma-high tumors. Therefore, we further investigated the localization of galectin-1 in the tumor at the protein level. The present study showed that galectin-1 was expressed by CAFs, immune, endothelial and tumor cells at different intensities. In this complex pattern, we could not deduce a clear correlation between LGALS1/galectin-1 at the gene expression level and at the protein level, which was also observed in a previous study [35]. The immunohistochemical results of the present study identified tumors in which the immune cells showed particularly upregulated galectin-1 expression and which had a favorable DMFS rate. We hypothesize that high galectin-1 expression specifically on immune cells reflects an antitumor immune response. During tumor progression, stromal cells, in particular CAFs, increase galectin-1 secretion, which suppresses the immune response and is involved in tumor invasion [36]. This suggests that upregulated galectin-1 expression in stroma-high tumors provides a microenvironment characterized by an immunosuppressive response, resulting in invasive tumor cells. Furthermore, the question remains what drives the activation of CAFs. TGFβ is known to activate CAFs and it was shown that this growth factor induced galectin-1 expression in fibroblasts [37,38]. However, the biological mechanism of galectin-1 remains complex. The role of galectin-1 is likely a balance of different factors including the ECM composition, cellular localization and the cell type, among others. Further research is needed to investigate the role of galectin-1 in the complex interaction between cancer and stromal cells leading to the aggressive behavior of cancer cells.
The TSR was scored at the most invasive part of the tumor while the mRNA was isolated from the tumor bulk. Strikingly, the gene expression data nevertheless reflect a high stromal score, including an increased number of CAFs, endothelial cells and cells of the monocytic lineage. No difference in immune cells was found between the stroma-high and -low tumors. Previous studies showed that CAFs drive immune evasion through, for instance, TGFβ [39]. One study in invasive ductal breast cancer found an inverse relationship between high stromal content measured by the TSR and macrophage and T-cell infiltration [40]. Another study investigated the association between TSR and inflammatory response in CRC. The authors did not find any association between TSR and T-cell infiltration, and found a borderline significant inverse association between TSR and immune cell infiltration measured on H&E sections [16]. Most interestingly, they later found that combining immune infiltration and TSR added prognostic value [41].
A first limitation of this study is that the LUMC cohort comprised an increased proportion (29.5%) of MSI-H patients, which is not representative of the proportion seen in practice (15%) [42]. Secondly, galectin-1 immunohistochemistry was performed on perpendicular tumor punches in which the orientation of the tumor was unknown. It was therefore not possible to assess the level of galectin-1 expression in a standardized manner at the most invasive part of the tumor, which is the region expected to have an increased amount of CAFs and remodeled ECM [43].
Given the current high costs of transcriptomic data, standard pathological assessment relies heavily on microscopy. Therefore, it is of interest to use a microscopy-based method to select patients who will or will not benefit from (targeted) therapy. The TSR can be used to identify patients with increased stromal infiltration and a poor prognosis. Previous research has shown that the activation level of stromal cells is associated with prognosis [4,8]. Tumors classified as stroma-high and CMS4 overlapped to a certain extent. Both methods have their limitations, which are likely related to the methodology and the tumor heterogeneity. In the era of personalized medicine, a main goal is to increase the predictive value of subsets of patients. CMS4 patients are known to respond poorly to treatment [6,44,45]. Once beneficial treatment for colorectal tumors with high stromal content becomes available in the clinic, an easy-to-use stratification method such as the TSR will be necessary.

MATERIALS AND METHODS

Cohorts
A retrospective cohort consisted of 76 sporadic CRC patients treated at the LUMC between 1991 and 2005, and diagnosed as TNM stages I, II and III. The LUMC cohort was previously analyzed as part of a larger cohort [19]. Patients underwent surgery without any (neo)adjuvant therapy. MSI-H status had been determined for this cohort as described previously [46]. All samples were handled according to the National Ethical Guidelines. The gene expression data of CRC patients of TCGA were used as a validation cohort [47].

Tumor-stroma ratio score
Patient material was fixed in formalin and embedded in paraffin and consisted of 5 µm H&E-stained sections from the most invasive part of the primary tumor. On the same H&E section, two investigators independently selected and estimated the region with the highest stroma percentage in a blinded manner using a 2.5× or 10× microscopic field. A 10× objective microscopic field was scored where tumor cells were present at two opposite borders of the image field (example in Figure 1A and 1B). Stroma percentages were scored in increments of 10% per image field and the final score was taken from the field with the highest stroma percentage. Tumors with less than or equal to 50% stroma were considered stroma-low and tumors with greater than 50% stroma were considered stroma-high.
The TSR of patients of the TCGA was determined using H&E sections available online on the Cancer Digital Slide Archive (http://cancer.digitalslidearchive.net/). Only pathological sections estimated to represent the most invasive part of the tumor were used. A zoomed-in area was used to score the TSR in a similar manner as described above.
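The decision rule described in this section can be summarized in a short sketch. It assumes that each scored image field yields one stroma percentage and that the final TSR is taken from the field with the highest percentage; the function name `classify_tsr` and the input format are hypothetical and serve illustration only. How the scores of the two investigators are reconciled is not modeled.

```python
def classify_tsr(field_stroma_percentages):
    """Return (final_score, label) from per-field stroma estimates.

    field_stroma_percentages: stroma percentages (0-100) of the scored
    image fields of one tumor (hypothetical input format).
    """
    if not field_stroma_percentages:
        raise ValueError("at least one scored field is required")
    # The final score is the field with the highest stroma percentage.
    final_score = max(field_stroma_percentages)
    # A score of 50% or less is stroma-low; above 50% is stroma-high.
    label = "stroma-high" if final_score > 50 else "stroma-low"
    return final_score, label


low_example = classify_tsr([30, 50])
high_example = classify_tsr([30, 50, 70])
print(low_example, high_example)
```

Note that a tumor scored at exactly 50% falls in the stroma-low group under this rule.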

mRNA expression array and analysis
The mRNA of the LUMC cohort was previously isolated from fresh frozen tissue and hybridized to a customized Agendia 44 K oligonucleotide array as described elsewhere [19]. The quantity of stromal and immune cells was estimated using the online R package ESTIMATE [20], the MCP-counter v1.1.0 [48] and Moffitt's stromal signature [22]. The LUMC cohort was divided into high and low fibroblast expression based on the fibroblast markers of the MCP-counter using a cutoff at the 3rd quartile. Based on 46 out of the 48 genes of Moffitt's stromal signature, the LUMC cohort was clustered using correlation as a distance metric with average linkage.
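As an illustration of a third-quartile cutoff, the sketch below splits a list of per-sample scores into "high" and "low" groups. The scores are hypothetical, and two details are assumptions not stated in the text: the quartile is computed with the "inclusive" interpolation method, and samples exactly at the cutoff are assigned to the low group.

```python
import statistics


def split_by_third_quartile(scores):
    """Return (q3, labels), labelling samples above the 3rd quartile as 'high'."""
    # statistics.quantiles with n=4 returns the three quartile cut points;
    # index 2 is the 3rd quartile. The interpolation method is an assumption.
    q3 = statistics.quantiles(scores, n=4, method="inclusive")[2]
    labels = ["high" if s > q3 else "low" for s in scores]
    return q3, labels


# Hypothetical fibroblast scores for eight samples.
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
q3, labels = split_by_third_quartile(scores)
print(q3, labels.count("high"), labels.count("low"))
```

Different quantile definitions can move a sample or two across the cutoff in small cohorts, so the group sizes depend on the convention used.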
To identify gene sets differentially expressed between the two TSR groups, publicly available databases were selected from the MSigDB website. The Global test was used for the statistical analysis of the mRNA expression data at the gene set level as well as to further investigate differentially expressed genes within gene sets [49,50]. The global tests were followed by multiple testing correction, using the False Discovery Rate (FDR) when comparing gene sets and the inheritance procedure when comparing genes within a gene set [51]. Finally, patients of the LUMC cohort were classified for CMS using the R package described previously, and patients of the TCGA had previously been classified in CMS [6]. mRNA data of patients of the TCGA cohort were downloaded from http://www.cbioportal.org/, where a network analysis was performed for the genes of interest.
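The text does not specify which FDR procedure was applied to the gene set p-values. A minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure, is given below; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four gene sets.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(p, 3) for p in adjusted])
```

A gene set would then be called significant when its adjusted p-value falls below the chosen FDR threshold.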

Immunohistochemistry
Galectin-1 immunohistochemistry was performed on previously punched formalin-fixed, paraffin-embedded tumors of the LUMC cohort to investigate galectin-1 at the protein level. Punches were perpendicularly re-embedded in paraffin. Sections of 4 µm were cut and dried overnight at 37 °C. On the day of the immunohistochemistry, sections were deparaffinized, rehydrated and incubated for 20 minutes in a 0.3% hydrogen peroxide solution (Millipore). The sections underwent antigen retrieval by heating for 10 min at 95 °C in low-pH Target Retrieval Solution (Dako) and were allowed to cool down. Nonspecific binding sites were blocked with 5% goat serum (Dako) for 15 minutes. Monoclonal primary antibody against endogenous galectin-1 (1:400, D608T, Cell Signaling) was applied overnight. The following day, HRP-labelled anti-rabbit secondary antibody (Dako EnVision+) was applied for 30 minutes. Antigen-antibody complexes were visualized using the 3,3ʹ-diaminobenzidine (DAB)+ Substrate-Chromogen System (Dako). Finally, sections were counterstained with hematoxylin and mounted in Pertex. In addition to the galectin-1 staining, a sequential section was stained with H&E to identify the different cell types and tissue structures.

Survival and statistical analysis
Statistical analyses were performed using R version 3.3.0. OS time was defined as the time period between surgery and death or end of follow-up. DMFS time was defined as the time period between surgery and metastasis or end of follow-up [52]. Univariate and multivariate Cox regression analyses were performed to test the differences in OS and DMFS between patients stratified according to TSR. Covariates entered in the model were age, sex, TNM classification and location of the tumor (colon versus rectum). Kaplan-Meier curves and log-rank tests were performed to compare the survival probabilities between groups. Student t-tests were performed to test the transcriptomic