Mutanome and expression of immune response genes in microsatellite stable colon cancer

The aim of this study was to analyze the impact of the mutanome in the prognosis of microsatellite stable stage II CRC tumors. The exome of 42 stage II, microsatellite stable, colon tumors (21 of them relapse) and their paired mucosa were sequenced and analyzed. Although some pathways accumulated more mutations in patients exhibiting good or poor prognosis, no single somatic mutation was associated with prognosis. Exome sequencing data is also valuable to infer tumor neoantigens able to elicit a host immune response. Hence, putative neoantigens were identified by combining information about missense mutations in each tumor and HLAs genotypes of the patients. Under the hypothesis that neoantigens should be correctly presented in order to activate the immune response, expression levels of genes involved in the antigen presentation machinery were also assessed. In addition, CD8A level (as a marker of T-cell infiltration) was measured. We found that tumors with better prognosis showed a tendency to generate a higher number of immunogenic epitopes, and up-regulated genes involved in the antigen processing machinery. Moreover, tumors with higher T-cell infiltration also showed better prognosis. Stratifying by consensus molecular subtype, CMS4 tumors showed the highest association of expression levels of genes involved in the antigen presentation machinery with prognosis. Thus, we hypothesize that a subset of stage II microsatellite stable CRC tumors are able to generate an immune response in the host via MHC class I antigen presentation, directly related with a better prognosis.


INTRODUCTION
Colorectal cancer (CRC) is a complex disease in which various types of molecular alterations are implicated [1]. Gene expression is largely deregulated [2], following diverse patterns that allow molecular subtyping into groups with diverse prognosis [3]. Stage I and II CRC patients have a moderate risk of relapse after surgical resection, whereas stage III patients have a higher chance of recurrence. Taking advantage of genome-wide techniques, a large number of studies have performed gene expression profiling to identify molecular prognosis biomarkers useful to discriminate between good and poor prognosis CRC tumors [4,5], although no marker has been yet adopted in routine clinical practice [6]. In addition to gene expression, diverse aberrations at DNA level have been widely described to contribute to CRC progression such as copy number aberrations, point mutations or altered methylation [7,8]. Also, increasing data support the main role of tumor microenvironment in CRC progression and prognosis [9,10]. Tumor microenvironment is composed by a heterogeneous population of stromal cells such as fibroblasts, extracellular matrix components, and also immune cells [11]. Indeed, evasion of immune surveillance and/or suppression of immune system have been described as a hallmark of cancer [12].
The cellular machinery is able to recognize mutant proteins and cleave them via the proteasome. Then, the generated peptides bind to MHC molecules to be presented to T cells [13]. In cancer cells, somatic mutations generate cancer-specific antigens that could be targeted [14]. In fact, an association between HLA expression and prognosis has been described in various types of cancer, including CRC [15]. Immune system activation has been specifically recognized in CRC tumors with DNA mismatch repair deficiencies such as microsatellite instable (MSI) CRC, that accumulate an elevated number of point mutations [16]. The high mutational load in MSI tumors creates many tumor-specific neoantigens, typically 10 to 50 times more than microsatellite stable (MSS) tumors [17]. Indeed, it has been proposed that the higher level of neoantigens and lymphocytic reaction in MSI tumors may contribute to a better patient survival [18]. The MSI subset of colorectal tumors is therefore a good candidate for checkpoint immunotherapy [19], however not as much research has been devoted to study the role of immune system activation in response to mutations in the subset of MSS tumors. Even for MSS tumors, a high number of neoepitopes could be associated a better prognosis.
Exome sequencing has been revealed as a useful technique for mutation discovery in cancer [20,21]. This technique is also useful to predict, from DNA mutations, putative short peptides (neopeptides) capable to elicit immunoreactivity through their presentation at the cell surface as major histocompatibility complex (MHC) ligands. Recently, our group performed an exome sequencing analysis of a homogenous group of microsatellite stable (MSS) stage II colorectal tumors and described the landscape of somatic mutations [22]. Here we aim to compare the mutational profile of good and poor prognosis stage II MSS CRC tumors, and to explore whether the activation of the immune system in response to somatic mutations is related to prognosis.

No significant association was found between single nucleotide variants (SNVs) and prognosis
A total of 3,597 potentially functional single nucleotide variants (SNVs) were found across the 42 sequenced tumors. No differences on the total number of mutations between good and poor prognosis groups were detected ( Figure 1A). When SNVs were classified by functional effect, only "missense near splice site" mutations were marginally significant (P=0.04, Supplementary Figure 1). Good prognosis tumors accumulated 1961 SNVs (median 98, range 45 to 153) whereas relapsing tumors accumulated 1675 SNVs (median 86, range 2 to 146). The median difference was 12 (Mann-Whitney Test p-value=0.23). From these, only 23 were recurrent (found in at least 2 tumors): 16 were shared by good and poor prognosis groups, 4 were good-exclusive (2 located in WASL gene, 1 in ERP27 and 1 in C3orf27) and 3 were poor-exclusive (located in AGTRAP, SYNPO2, and APC genes). No single mutation was associated with prognosis, though the low frequency of recurrent mutations derived in a low power to detect specific differences.

Prognosis association of SNVs cumulated by genes, pathways, and their position into the human interactome
SNVs were mapped into genes harboring them. The average number of mutated genes per sample was 83 (range: 2-136). In line with SNV results, no differences between the total number of mutated genes between good and poor prognosis groups were detected ( Figure  1B). The good prognosis tumors had 1618 mutated genes (median 96, range 45 to 143) whereas the tumors with worse prognosis accumulated 1426 (median 85, range 2 to 142, median difference 11, Mann-Whitney test p-value=0.21). Both groups shared 270 genes (8%), when the expected was less than 1%, so they were probably related to common pathways in carcinogenesis. Regarding recurrently and exclusively mutated genes, 12 genes were found to be mutated in more than 3 poor-prognosis tumors including BRAF and POLE. On the other hand, good-prognosis tumors had 15 exclusively-mutated genes including NOTCH3 (Table 1). Not surprisingly, the wellknown APC, TP53, KRAS, along with FAT4, FBXW7, and RYR3 were equally mutated in both groups.
Next, a pathway enrichment analysis was performed, stratified by prognosis group. Both groups were enriched in mutated genes that mapped to pathways classically related to cancer, like p53 signaling or Wnt pathway. However, tumors in poor prognosis group had more mutations in pathways related to DNA repair and polymerase activity whereas good prognosis tumors had more mutations in genes related to cell cycle or apoptosis, among others (Table 2 and Supplementary Table 1).
Finally, genes harboring mutations were mapped into the human interactome and analyzed to test the hypothesis that relevant genes might have central positions in networks (i.e act as hubs). However, no significant differences were found in the centrality measures degree (P=0.65), betwenness (P=0.47), closeness (P=0.56), and eccentricity (P= 0.38), between the two groups of tumors.

Number of neoantigens and expression levels of genes involved in antigen presentation machinery is associated with prognosis, and this association changes within molecular subtypes
Somatic missense mutations can activate T cytotoxic responses. Thus, we used bioinformatics tools to predict putative neoantigens by combining information about HLA genotypes and missense mutations in each patient. Then, we assessed if a relationship existed between the number of hypothetical neoantigens generated by the tumor and the patients' outcome. No significant differences between good and poor prognosis groups of tumors were found ( Figure 2), even if the level of expression of genes harboring missense mutations were taken into account (Supplementary Figure 2). However, a trend was identified when extremes were compared: 4 out of 5 tumors with 0 predicted neoantigens had relapsed whereas 5 out of 6 tumors with more than 8 predicted neoantigens had not relapsed. We were intrigued by the relapse of the tumor with the highest number of predicted neoantigens (n=16). Taking advantage of gene expression data, we realized that this tumor had a marked under-expression of indispensable genes for the neoantigen presentation    Figure 4).
Recently, a group of experts has reported a robust classification of CRC for future clinical stratification that group tumors in four subtypes based on common molecular and clinical characteristics [3]. Under the hypothesis that our results could vary within subtypes, we assessed the prognosis value of genes related with antigen presentation stratifying by consensus molecular subtypes. We found that the expression of genes of immune response were not associated to prognosis for tumors classified as CMS2 "canonical" subtype. However, the association was very strong among the CMS4 "mesenchymal" and in CMS3 "metabolic" and subtypes. Tumors overexpressing genes involved in antigen presentation in these subtypes had better prognosis (Figure 4). Of note, because all our tumors were MSS, only 6 tumors were classified as CMS1 "MSI subtype" and were not analyzed, although 2 of them had relapsed.

Good-prognosis tumors showed higher levels of lymphocytic infiltration
Apart from neoantigen generation and presentation, a third player is necessary to activate an immune response against the tumor cell: T-cell receptors should bind to MHC-antigen complexes. Under this hypothesis, the level of expression of CD8A was used as a marker of T-cell infiltration and compared between the two groups of tumors [28]. CD8 is a specific marker for T-cells binding MHC Class I but not MHC Class II. As expected, good prognosis tumors showed higher levels of this marker than poor prognosis ones (p-value=0.026) suggesting that not only antigen presentation but also proper recognition by T-cells is necessary for the host immune system to attack the tumor ( Figure 5A). Indeed, a Kaplan-Meier curve plotting patients classified on the basis of CD8A expression (Supplementary Figure 5) clearly identified a group of 13 patients over-expressing the gene who did not experience relapse ( Figure 5B).
Based on this result, we stratified patients into "high" and "low" groups according to CD8A expression. Then, we looked for the prognosis value of HLAs and TAP genes in each group. As expected, survival curves were significantly different in those patients showing T-cell infiltration (high CD8A levels); whereas no differences were found in those patients without or with low T-cell infiltration ( Figure 5C). This result suggested that, in order to be recognized and eliminated by the immune system, T-cells needs to infiltrate the tumor (indicated by high CD8A levels).
Based on this, we hypothesized that colon tumors evade immune responses underexpressing genes involved in antigen presentation and/or preventing the infiltration of T-cells into the tumor bulk. Indeed, an analysis comparing the levels of expression of MHC class I and TAPs genes, as well as CD8A gene, showed that, compared with adjacent normal mucosa, tumors underexpressed HLA genes (P=4e-4 for HLA-A, P=3e-7 for HLA-B, and P=4e-5 for HLA-C). CD8A was also underexpressed (P=1e-9), but not TAP1 nor TAP2 genes ( Figure 6).

DISCUSSION
Our study has shown that there are no major differences between the exome mutational landscape of good and poor-prognosis group of tumors. However, our analyses revealed that tumors able to properly present neoantigens and showing T-cell infiltration have better prognosis. Indeed, increasing data support the idea that the endogenous T cell compartment is able to recognize peptide epitopes that are displayed by major histocompatibility complexes (MHCs) on the surface of the tumor cells [29]. We searched for patient-specific neoantigens that might be generated from both driver and passenger mutations acquired during tumorigenesis. Then, we tested the hypothesis that recognition of such neoantigens by the host immune cells could influence patients' survival. Certainly, our results pointed to a better outcome in those patients not only with a higher number of neoantigens (although not statistically significant, a trend was observed), but also over-expressing genes involved in antigen presentation machinery. In agreement with our  in tumors classified as "high CD8A" and "low CD8A", separately. results, an association between the number of predicted neoantigens in a tumor with increased patient survival has been recently described in a meta-analysis including six different types of tumors [30]. Thus, we hypothesize that poor-prognosis MSS stage II CRC tumors are unable to generate an immune response in the host via MHC class I antigen presentation.
It is well known that HLA class I functional abrogation represents a mechanism by which tumors circumvent immune surveillance [31]. In CRC, HLA class I loss has been described as a more frequent event in MSI tumors than in MSS tumors [32]. Regarding nonclassical HLA class I molecules, their over-expression is one of the immunosuppressive strategies used by tumors [33]. Zeestraten et al. postulated that patients whose CRC tumors showed loss of HLA-E and HLA-G had better overall survival [15]. However, we have not reproduced this result in our data. We have found that expression of CD8A, considered a marker of lymphocyte infiltration, is higher in patients with better prognosis, probably indicating an activation of immune response against the tumor. This result was concordant with other studies correlating CD8+ lymphocytic infiltration with better survival in colon cancer [34,35]. In addition, a stratification of our samples into "high CD8A" and "low CD8A" showed that levels of HLA and TAP1 genes are only relevant predicting survival in "high CD8A" tumors. This leads us to the conclusion that proper antigen presentation is only relevant for the inhibition of growth in those tumors showing T-cell infiltration. Indeed, it has been recently reported that loss of tapasin correlates with a reduction in CD8+ t-cell immunity and is associated with tumor progression in CRC [36]. It is important to note that patients in our series had not been treated with chemotherapy. In consequence, we consider that our results mirror the original response of the patients against the residual disease without treatment perturbing the host immune response.
Tumor-specific antigens that are generated by somatic mutation -neoantigens-can influence patient response to immunotherapy treatment and contribute to tumor reduction [37]. It is well known that microsatellite instable (MSI) CRC tumors are frequently characterized by inflammatory lymphocytic infiltration which is associated with a better outcome than microsatellite stable (MSS) CRC, probably reflecting a more effective immune response [38]. Although lymphocyte infiltration is characteristic of MSI tumors, it also has been shown to predict better prognosis in MSS tumors [39][40][41]. It is important to highlight that in this work all the analyzed CRC tumors are MSS, and a weak immune response should be expected, al least in comparison with MSI tumors. Nevertheless, a recent work assessing immunogenicity of somatic mutations in gastrointestinal cancers demonstrated that a clinically relevant antimutation T cell response could be also triggered against tumors with a low mutational load [42]. Thus, we postulate that MSS tumors with high HLA, mutational load and antigen processing pathways acquire a phenotype that helps cancer cells to evade the immune system of the host. Indeed, a comparison between tumor and adjacent normal mucosa revealed that tumors significantly underexpressed HLA class I genes and CD8A, suggesting a poorer ability for antigen presentation and recognition.
We were interested in analyzing our data according to the recent proposal of Consensus Molecular Subtypes that classifies CRC tumors into CMS1 "MSI immune", CMS2 "Canonical", CMS3 "Metabolic" and CMS4 "Mesenchymal" [3]. While tumors belonging to CMS1 (not included in our dataset) are more prone to induce a host immune response, our results showed that CMS3 and especially CMS4 tumors might also elicit an immune response. In agreement, recent studies conclude that CMS4 "mesenchymal" phenotype is characterized by a high stromal infiltration [43,44]. This subgroup usually has worse prognosis than others. So, this stratification inside subgroups might be useful to further selection of patients requiring more aggressive or specific therapies in CRC. It is also important to note that the CMS2 "canonical" subtype comprising almost 40% of CRC tumors seems not to be able to elicit an immune response.
Although no single SNV was found to be associated with prognosis, several genes have been found to be exclusively and recurrently mutated in the poor-prognosis group of tumors such as classically CRC related genes POLE and BRAF. Mutations in POLE have been related to shorter patient survival [45]. Also in agreement with our results, BRAF mutations have been associated with a decrease in survival rate in CRC patients [46][47][48] and specifically in MSS CRC tumors [49]. We have found CACNA1G missense mutations in three tumors, all of them exhibiting poor-prognosis. Interestingly, a recent paper by Berg et al. reported that losses of CACNA1G by copy number changes were associated with early recurrence [50]. Thus, we hypothesized that loss of this gene could be a poor prognosis biomarker. Regarding functions in which mutated genes are implicated, several pathways have been found exclusively mutated in good and poor group of tumors. This result suggests that diverse molecular alterations could converge in similar phenotypes necessary for cancer progression and invasion.
This study also has several limitations. Besides the limited sample size, we have not a direct evidence that the mutated proteins generate neoantigens, but have used instead bioinformatics prediction tools. Moreover, it has been reported that HLA inference program HLAMiner achieves a good performance with RNA-seq or whole genome data rather than with exome sequencing data [26]. Finally, it is well known that each gene could codify several protein isoforms. However, in this work, to predict neoantigens only those isoforms reported in UniProt repository as the most common ones have been taken into account. Also, tumor infiltrating lymphocytes have not been measured directly, but CD8A expression has been used as a surrogate variable.
In conclusion, this analysis of prognosis in relation to somatic mutations in CRC has shown that although no single mutation was significantly associated with prognosis, some genes were found to be exclusively mutated in poor-prognosis tumors. Moreover, neither the total number of mutations nor the number of predicted neoantigens were different between good and poorprognosis groups of tumors. However, those patients with higher levels of CD8A (as a marker of lymphocytic infiltration) and HLA-class I genes expression showed better prognosis. These results pave the way for future studies assessing the putative role of genes implicated in tumor-specific antigen presentation and recognition as prognostic biomarkers in pre-metastatic tumors, specifically in the subset of tumors classified as "metabolic" or "mesenchymal". In addition, they suggest that personalized immunotherapy could stand for a treatment opportunity in patients suffering MSS CRC tumors.

Patients and samples
This study included a subset of 42 paired adjacent normal and tumor tissues (84 samples) from a previously described set of 100 patients with colon cancer diagnosed at stage II and microsatellite stable tumors (colonomics project -CLX-: www.colonomics.org; NCBI BioProject PRJNA188510). The series of 42 patients was selected to include the 21 patients who experienced relapse and 21 without metastatic progression after a minimal followup of 3 years (Supplementary Table 2). None of them received chemotherapy. All patients were recruited at the Bellvitge University Hospital (Barcelona, Spain). Written informed consent was obtained from all patients and the Institution's Ethics Committee approved the protocol. DNA was extracted using a standard phenol-chloroform protocol. Moreover, gene expression profiles of these tumors have been performed [23] and is available in GEO repository (GSE44076).

Exome sequencing and somatic single nucleotide (SNV) variants selection
Exome sequencing pipeline was extensively described in a previous work [22]. Briefly, Genomic DNA from the set of 42 adjacent-tumor paired samples was sequenced in the National Center of Genomic Analysis, Barcelona, Spain (CNAG) using the Illumina HiSeq-2000 platform. Exome capture was performed with the commercial kit Sure Select XT Human All Exon 50MB (Agilent). After data alignment and processing, high quality single nucleotide variants (SNVs) were identified using GATK software and germline variants were filtered. Finally, somatic SNVs were annotated using the SeattleSeq Variant Annotation web tool. Only potentially functional single nucleotide variants (SNVs) were taken into account in this work. Those include variants annotated as: coding-synonymous-near-splice, missense, missensenear-splice, splice-3, splice-5, stop-gained, stop-gainednear-splice, stop-lost, utr-3 and utr-5.

Functional analysis
Gene sets containing function and pathway information from "KEGG", "Biocarta", "Reactome", and "GO" were downloaded from the Molecular Signatures Database [24]. For each gene set, an enrichment score was calculated for each tumor by dividing the number of mutations mapping into genes constituting the gene set by the number of genes in such gene set. The score was then compared for good and poor group of tumors. To assess if differences in scores were significant, a p-value was calculated by randomly permuting the tumor group label. The number of permutations was calculated in each case to ensure that the minimum p-value was at least as small as required by the Bonferroni correction at nominal 0.05 significance level.

Interactome analysis
Human pairs of protein-protein interactions were downloaded from Hippie database [25] (last updated: 09/05/14). Only highly reliable interactions were taken into account (score > 0.72) to construct a human proteinprotein interaction network containing 9,963 nodes and 49,847 edges. This network was used as a framework in which mutated genes were mapped. To each mutated gene in each tumor, centrality parameters degree, betweenness, closeness and eccentricity were calculated using igraph library in R.

HLA genotyping and neoantigens prediction
Exome sequencing data from normal samples was used to infer MHC class I alleles of HLA-A and HLA-B (genes that predominantly mediates anti-tumor response) using HLAminer software [26].
FASTA sequences extracted from UniProt were used to translate information about DNA missense mutations into an aminoacid change level (only isoforms 1 were taken into account). For each missense mutation, a sequence of 21-aminoacids centered on the mutation and the corresponding wild type peptide were analyzed for potential neoantigens. NetMHCpan 2.8 Server [27] was used to infer putative immunogenic peptides (8 to 11 aminoacids) combining information about HLAs genotypes and peptides harboring missense mutations. An immunogenic epitope was defined as a mutated peptide with high affinity for one HLA allele of the patient (IC50<50nM) and low affinity for the wild-type counterpart (IC50>500nM). Information about the level of expression of each gene generating an immunopeptide was also computed.

Survival analyses
Kaplan-Meier curves were plotted to represent the results, dividing the range of gene expression into quartiles and log-rank test were computed. Also, Cox models were fitted to consider multiple variables simultaneously and control for potential confounding.

Molecular subtyping
The CMSclassifier R package [3] was used to classify our samples into the four CRC consensus molecular subtypes CMS1, CMS2, CMS3 and CMS4, using a Random Forest approach. Also, a principal components analysis (PCA) was performed to explore dispersion in our data based on the number of predicted neoantigens and expression level of genes of interest.