Comprehensive immune transcriptomic analysis in bladder cancer reveals subtype specific immune gene expression patterns of prognostic relevance

Recent efforts on genome wide profiling of muscle invasive bladder cancer (MIBC) have led to its classification into distinct genomic and transcriptomic molecular subtypes that exhibit variability in prognosis. Evolving evidence from recent immunotherapy trials has demonstrated the significance of pre-existing tumour immune profiles that could guide treatment decisions. To identify immune gene expression patterns associated with the molecular subtypes, we performed a comprehensive in silico immune transcriptomic profiling, utilizing transcriptomic data from 347 MIBC cases from The Cancer Genome Atlas (TCGA). To investigate subtype-associated immune gene expression patterns, we assembled 924 immune response genes and specifically those involved in T-cell cytotoxicity and the Type I/II interferon pathways. A set of 157 ranked genes was able to distinguish the four subtypes in an unsupervised analysis in an original training cohort (n=122) and an expanded, validation cohort (n=225). The most common overrepresented pathways distinguishing the four molecular subtypes, included JAK/STAT signaling, Toll-like receptor signaling, interleukin signaling, and T-cell activation. Some of the most enriched biological processes were responses to IFN-γ, antigen processing and presentation, cytokine mediated signaling, hemopoeisis, cell proliferation and cellular defense response in the TCGA cluster IV. Our novel findings provide further insights into the association between genomic subtypes and immune activation in MIBC and may open novel opportunities for their exploitation towards precise treatment with immunotherapy.


INTRODUCTION
Urothelial bladder cancer (UBC) is the fifth most common cancer worldwide [1] and is one of the most management intensive cancers in North America [2].Although the majority of incident cases of UBC are non-invasive at presentation, muscle invasive bladder cancer (MIBC) represents very aggressive disease with rapid progression to metastases [3] and poor overall survival despite intensive local and systemic therapy.Current standards for localized MIBC include radical cystectomy with or without perioperative cisplatin-based chemotherapy [3].Unfortunately, many suffer early disease recurrence and, despite palliative chemotherapy, median survival rates are generally less than one year [4].The optimal management of patients with higher risk UBC is ambiguous with a significant need for better prediction tools and enhanced therapeutics [5].
MIBCs are highly heterogeneous tumours.Recent investigations based on molecular profiling of specimens from large UBC cohorts have led to their classification into molecular subtypes that display distinct genomic and transcriptomic features, resembling those seen in breast cancer [3], [6][7][8].Interestingly, these subtypes may exhibit distinct associations with treatment response and survival [8,9].Although different groups have classified UBC into two [8], three [3], four [6], or five [7] subtypes, there is a consensus that the top separation occurs as the basal and luminal subtypes [10].Basal tumours, enriched with EGFR and hypoxia-inducible factor 1 expression, are often metastatic at presentation, possess squamous and sarcomatoid histological features, and have epithelial-to-mesenchymal transition cell biomarkers [11].In comparison, luminal cancers have papillary features and commonly FGFR3, ERBB2, and ERBB3 activating mutations [11].The Cancer Genome Atlas Network (TCGA) bladder analysis working group classified bladder tumours into four clusters named I, II, III, and IV [6].Clusters I and II correspond to the luminal subtype, while III and IV represent the basal subtype [12].Tumours in Cluster I are enriched in FGFR3 overexpression due to mutations and amplification and show better overall survival, whereas those in cluster II, designated "p53-like" tumours, express active p53 gene signatures and are resistant to neoadjuvant cisplatinbased combination chemotherapy [3].Cluster IV shares features with the claudin-low subtype of breast cancer, express immune checkpoint molecules, and were actively immunosuppressed, despite having an enriched immune gene signature [13].In particular, Kardos et al. [13] demonstrated that immune infiltration was not correlated with predicted neoantigen burden, but from unopposed NF-kB activity from downregulated PPARγ signaling.
Given the urgent need of alternative approaches in MIBC treatment, there has been a growing interest in immunotherapies, such as those targeting the immune checkpoints: CTLA-4, PD-L1, and PD-1 [14].Atezolizumab, a PD-L1 inhibitor, has been recently approved by the FDA for bladder cancer that progressed during or following chemotherapy [15].Evolving evidence based on the success of immune checkpoint blockade therapies in melanoma and non-small cell lung cancer has confirmed the significance of the pre-treatment tumour immune state as a strong prognostic and response predictive indicator [14,16].An important feature, key to the success of immunotherapy, is the spatial organization of cytotoxic CD8 + tumour infiltrating lymphocytes (TILs) in the epithelial and stromal compartments and their activation status [17].Higher density of CD3 + and CD8 + TILs have been associated with increased disease-free and overall survival in melanoma, head and neck, breast, bladder, urothelial, ovarian, colorectal, prostatic, and lung cancer; however, their activation status determines their prognostic significance in most cancers [18,19].In particular the expression of interferons (IFNs), which play a central role in anti-tumour immune responses, are emerging as prognostic and predictive biomarkers of both chemotherapy and immunotherapy [20].Higher infiltration with CD4 + and regulatory subsets of TILs and higher CD68 to CD3 ratios are associated with poor prognosis in bladder cancer [21][22][23].In particular PD-L1, IDO, FOXP3, TIM3, and LAG3 are expressed in T-cellinflamed, and β-catenin, PPAR-γ, and FGFR3 in non-Tcell-inflamed urothelial tumours [17].Although the pretreatment expression of PD-1/PD-L1 initially showed some predictive value, it has recently failed to perform as a good biomarker in the recent clinical trials due to their transient nature of expression [23][24][25].
As reported in other cancers sites, it is likely that the pre-existing tumour immune landscape in UBC could be an additive determinant of response to chemotherapy as well as immune-based therapies leading to more precise prognostication, patient stratification, and informed treatment decisions [26].To our knowledge there are no previous studies in MIBC that have evaluated the association between immune transcriptomic alterations, specifically those mediated by IFNs and cytotoxic pathway genes, and their potential associations with their distinct molecular sub-populations.In the current study, we performed a comprehensive in silico immune transcriptomic profiling of MIBC using the publicly available TCGA global transcriptomic datasets in order to determine whether the known molecular subtypes of MIBC are associated with specific immune gene signatures.The findings from our study may not only provide insights into the association between genomic subtypes and immune activation, but may also open novel opportunities for improving the management of MIBC.

RESULTS
We aimed to determine whether the previously defined four TCGA MIBC clusters exhibit differences in their immune gene expression patterns that could be of potential significance in informing treatment decisions for immunotherapies and other combinatorial treatment approaches.

Immune gene expression patterns across MIBC clusters
First, the 122 samples previously used to identify the four clusters by TCGA [6] were treated as a discovery group to determine immune gene expression profiles across clusters.A total of 377 genes derived from the NanoString TM panel discriminating among the clusters were identified using a feature selection technique (Supplementary Table 1).The performance of these genes to accurately distinguish the four TCGA clusters was evaluated by clustering of cohort 1 (Figure 1).This set of genes was then used to assign samples in the validation set to the four clusters.Similar to cohort 1, the genes were able to distinguish the four clusters in cohort 2 by supervised and unsupervised analysis (Figure 2A and  2B).Similar unsupervised analysis was done using the top 5% of genes derived from all immune panels (n=157) (Supplementary Table 2) on both cohorts (Figure 3A and 3B).A recent updated analysis of the current TCGA bladder tumour cohort shows that clusters I-IV remained stable [28], supporting our classification approach in cohorts 1 and 2.

Differential pre-existing expression patterns of interferon associated genes
The four cluster patterns were also noted for the top 20% ranking IFN-γ associated genes upon hierarchical clustering analysis in both training and validation cohorts (Figure 4A and 4B).A gradient of under-expression of IFN-γ associated genes in cluster I to overexpression in cluster IV is observed in both.A similar pattern was also noted in the top 20% ranking IFN-α (Figure 5A and 5B) and cytotoxic genes (Figure 6A and 6B).Most importantly, key IFN response genes and downstream T-cell recruiting target chemokine genes, CXCL9, CXCL10, and CXCL11, and their common receptor CXCR3, showed increased expression in clusters III and IV.Similarly, others in the list included the key players in IFN response such as IFITM2, CCL5, IRF4 and others.

Antigen processing pathways are overrepresented in T-cell inflamed MIBC clusters
We determined the Gene Ontology functional annotations of the differentially expressed genes that distinguished the four clusters, using both the 377 and 157 genes as input gene lists.Using the overrepresentation statistic in PANTHER, we calculated the probability of highly populated protein classes and gene ontology classes among the two gene lists (Table 2a and 2b).The most enriched GO biological processes in the 377-gene list were response to IFN-γ, antigen processing and presentation, cytokine mediated signaling, hemopoeisis, cell proliferation and cellular defense response (Supplementary Table 3).The top five overrepresented pathways included JAK/STAT signaling pathway, Toll-like receptor signaling pathway, interleukin signalling pathway, and T-cell activation (Figure 7).Interestingly, similar analysis using the top ranking 157 genes as input list, revealed only the B-cell activation, T-cell activation, and inflammation mediated by chemokine and cytokine signaling pathways as the only three overrepresented pathways.Response to IFN-γ, hemopoiesis, macrophage activation and cell proliferation were the most overrepresented GO biological processes in the top ranked 157 genes (Supplementary Table 4).

DISCUSSION
Evolving research from correlative studies as well as clinical trials, including those for UBC, have emphasized the value of the pre-existing tumour immune state as a predictor of response to treatment and survival [29,30].Furthermore, UBC is associated with a comparatively high mutational burden [31], which could potentially contribute to increased immunogenicity making them more susceptible to novel immunotherapy-based approaches.In order to gain a better understanding of the pre-existing tumour immune landscape in UBC, we conducted a comprehensive in silico immune transcriptomic profiling of the MIBC tumours from the TCGA database.
Four distinct molecular subtypes in MIBC were defined recently based on the TCGA MIBC genome wide profiling datasets [6].Indeed, variability in subtype nomenclature has been reported [10], which could be attributed to heterogeneity in tissue samples in addition to several other factors such as inclusion of NMIBC cases in classification schemes.However, since the TCGA bladder cohort is enriched for MIBC and gene expression based clusters have been well defined, we specifically used this cohort to address our question on immune gene expression patterns associated with genomic alterations.Based on the distinct immune signature between clusters in cohort 1 (n=122), we were able to assign cohort 2 (n=225) into the associated TCGA clusters using the top 20% of ranked immune genes from the training cohort.Further analysis on the IFN-γ, IFN-α, and cytotoxic genes were then compared in both cohorts.All of these analyses revealed an increased expression of immune-associated genes in Cluster IV and underactive immune environment in Cluster I. Given that specific genetic alterations associate with these molecular subtypes, it seems that anti-tumour immune responses could be partly driven by oncogenic drivers.Cluster I has been previously reported to show higher expression of FGFR3 via mutations, amplifications, and other mechanisms [3].Interestingly this cluster showed a distinct underactive tumour immune state with reduced expression of IFN genes and genes associated with T-helper type-1 response.It is indeed intriguing that tumours with FGFR3 mutations or overexpression as per previous classification [6], show an increased overall survival, which contradicts the underactive immune state observed here.In contrast, cases in cluster IV showed the most dominant immune response amongst all four clusters.Tumours in this cluster show decreased expression of PPAR-γ and GATA3, and significantly increased expression of IFN and antigen presentation pathway genes, in addition to MHC class II genes and those involved in T-cell cytolytic activity.Previous reports have shown that based on broader classifications, cases in cluster IV belong to the basal subtype, which shows poor overall survival [8], [9].One potential contributor to these associations is the increased expression of immune checkpoint molecules such as CD274 (PD-L1), IDO1 and the immunosuppressive IL6 in clusters III and IV that potentially lead to increased resistance to cytotoxic killing and poor response to treatment and ultimately poor survival.As previously shown, this cluster also shows higher levels of EMT related genes, indicating more aggressive tumour phenotype [33].Our recent report demonstrating that higher PD-L1 expression in cancer cells leads to increased drug resistance upon activation by IFN-γ or PD-1 [32] supports this notion.It could thus be speculated that the IFN-γ secreted by activated T-cells, reflected by the increased expression of GZMA in these clusters, could be inducing PD-L1 on the cancer cells with further interaction between these leading to T-cell dysfunction.However, the mechanistic basis of these significant associations needs to be explored further.In other cancers such as melanoma, colorectal, and ovarian, higher expression of IFN pathway genes and of those representing an active immune response is associated with a favourable treatment outcome and overall survival.Furthermore, it is also possible that factors other than antitumour immune responses contribute to increased survival rates in tumours with FGFR3 mutations in MIBC.
Increased expression of MHC class II genes CD74, HLA-DMB, and HLA-DQA1 indicate higher tumour antigen processing by the antigen presenting cells in clusters III and IV.This was also confirmed by gene ontology-based analysis, which reflected a dominance of response to IFN-γ, antigen processing and presentation, cytokine mediated signaling, and cell proliferation, NK cell and macrophage activation, and B cell mediated immunity.These enrichments not only confirm the increased active anti-tumour immune response in clusters IV and some of cluster III but also indicate the immunogenic nature of these clusters that could be potentially be driven by higher mutational burden and recognition of immune cells.Overall, our findings based on comprehensive immune transcriptomic analysis have significant implications in informing treatment decisions based on immune gene expression patterns.Specifically, since immune checkpoint blockade therapy has shown some promise in bladder cancer [15], near future biomarker driven clinical trials will benefit from these findings that emphasize appropriate patient stratification for treatment.Although recent reports have described the presence of T-cell inflamed and non-inflamed MIBC tumours [17,33], no previous reports have identified associations between immune response and IFN-associated genes with the four molecular MIBC subtypes.Our study is limited by the fact that the TCGA dataset is enriched for MIBC and thus further validations in other cohorts need to be performed; however, these associations are timely and complement the ongoing and future clinical trials based on immunebased therapies.Clinical translation of our findings will most appropriately be addressed by validation of the most significant differentially expressed genes at both transcriptional and proteomic levels in retrospective and prospective pre-treatment bladder tumour specimens.Future investigations by integration of genomic alterations determined by exome and transcriptome sequencing data are key to identifying the genomic determinants of variability in immune response.Finally our study provides an improved understanding of the bladder cancer molecular subtype associated immune gene expression patterns and will significantly impact the design of novel immune based therapies.Immune gene panel -IFN-γ associated genes Immune gene panel -IFN-α associated genes Immune gene panel -cytotoxic associated genes

Patient tumour samples
The publicly available global transcriptomic sequencing (RNA-Seq) data from 412 MIBC cases, with the corresponding clinical information was downloaded from TCGA data portal (https://gdc-portal.nci.nih.gov/),now part of the National Cancer Institute's Genetic Data Commons.The cohort was further divided into two cohorts for downstream analysis.For our training cohort (cohort 1) we used data from the previously defined 129 cases from TCGA that were divided into four clusters based on their integrated analysis of mRNA, miRNA, and protein data [6].Since our objective was to define immune gene expression patterns in treatment naïve tumours, we excluded patients with any previous therapy.Thus a cohort of 122 MIBC cases, divided into four molecular clusters, was used for in silico immune profiling.The remaining 283 cases constituted the validation cohort (cohort 2) of which 225 had no prior therapy.

Design of immune pathway gene panel for in silico immune profiling
To investigate the presence of subtype associated immune signatures, we assembled a defined set of 924 immune related genes.This curated list (Table 1) primarily consisted of genes involved in IFN-α (97 genes), IFN-γ (200 genes), and cytotoxic (115 genes) pathways as defined by Gene Set Enrichment Analysis (GSEA) in combination with other immune genes.The nCounter PanCancer Immune profiling panel (722 genes), (http:// www.nanostring.com/products/pancancer_immune/)was used to derive the immune response related genes.

Bioinformatics analysis of RNA-Seq data
We used the upper quartile-normalized RNA-seq data by expectation maximization (RSEM) available for all selected cases at the TCGA data portal.No additional normalization was performed and the expression data were log 2 transformed.All downstream data analysis was performed in MATLAB (Mathworks, Inc., Natick, Massachusetts, USA).Focusing on the first set of 122 samples, where clustering information is known, we performed separate analyses of the genes in each of the four immune panels (NanoString TM , IFN-α, IFN-γ, and T cell cytotoxicity associated genes).Using a feature selection algorithm, genes were ranked based on their ability to discriminate samples belonging to one cluster from the remaining.The feature selection algorithm uses an ensemble of five different machine-learning techniques (unpublished).The analysis resulted in 16 ranking tables, four tables for each immune panel, where each table ranked genes on their ability to discriminate samples in one cluster from the rest.
In the hierarchical clustering analysis, the top 20% of genes in each ranking group were assembled to represent each immune panel, resulting in 377 genes (Nanostring), 44 genes (IFN-α), 91 genes (IFN-γ), and 62 genes (T cell cytotoxicity).A final feature selection ranking was performed where the combined unique set of 924 genes was used.The top 5% of genes in each of the four ranking groups were then merged, resulting in 157 unique genes.

Gene ontology analysis using PANTHER
We used the Protein Analysis Through Evolutionary Relationships (PANTHER), version 11.0, classification system (http://www.pantherdb.org/)[27] to determine dominant and enriched pathways in the top ranking 377 genes (NanoString panel) that were ranked based on their ability to discriminate samples across the four clusters.We then applied the statistical binomial overrepresentation test, as previously described in PANTHER [27], to derive the most dominant enriched pathways and gene ontology (GO) biological processes in our lists compared to the reference human genome.We performed these tests using both the 377 top ranking NanoString TM genes and 157 top ranked genes from all immune panels combined.The p-values were corrected for multiple testing using Bonferroni correction.

Validation of immune gene signature
The remaining 298 cases, not included in cohort 1, were treated as a validation group.From this cohort, patients with previous BCG therapy were excluded, leaving 225 cases.In order to assign samples in this set to each of the four clusters, only the top ranked 377 genes from the NanoString panel were used.First, for each cluster, two cluster centroids were computed using the expression data from cohort 1 (n=122; total of 8 cluster centroids).The cluster centroids were computed by taking the mean expression of samples in a given cluster (main cluster) and the mean expression of samples that do not belong to that cluster (alternative cluster).Then, for each sample in cohort 2, the Euclidean distance was computed to each of the 8 cluster centroids.A sample was assigned to a cluster with the smallest distance to the main cluster, but only if the distance to the main cluster was smaller than the distance to the alternative cluster.Alternatively, the sample was assigned to the 'unclassified' cluster.Using the newly assigned clustering information and ranked list of genes, unsupervised hierarchical clustering was performed.

Figure 1 :
Figure 1: Distinct immune gene expression levels in cohort 1 (n=122) between the four TCGA bladder cancer subtypes based on the top 20% (377 NanosString panel genes) using the feature selection algorithm.Red indicates high expression, and blue indicates low expression.

Figure 2 :
Figure 2: Cohort 2 (n=225) assigned to clusters based on Euclidian distance to the cluster centroids generated from the cohort 1 (n=122).Supervised (A) and unsupervised (B) analysis based on the samples and 377 NanoString TM panel genes.

Figure 3 :
Figure 3: Unsupervised analysis of both the cohort 1 (A) and cohort 2 (B) using the top 5% (n=157) genes.Unsupervised grouping shows gradient of under expression in cluster I to overexpression in cluster IV.

Figure 4 :
Figure 4: Supervised heat map of top 20% of IFN-γ associated pathway genes in both cohort 1 (A) and cohort 2 (B).

Figure 5 :
Figure 5: Supervised heat map of top 20% of IFN-α associated pathway genes in both discovery (A) and validation (B) groups.

Figure 6 :
Figure 6: Supervised heat map of top 20% of cytotoxic associated pathway genes in both cohort 1 (A) and cohort 2 (B).

Figure 7 :
Figure 7: Bar graph depicting distribution of fold enrichment levels of biological pathways defined by PANTHER based analysis in the 377 genes that show differential expression patterns in the four TCGA MIBC clusters.The enriched categories were obtained upon analysis using the statistical overrepresentation test defined by PANTHER tool [27].