Mutational analysis of genes coding for cell surface proteins in colorectal cancer cell lines reveals novel altered pathways, druggable mutations and mutated epitopes for targeted therapy.

We carried out a mutational analysis of 3,594 genes coding for cell surface proteins (Surfaceome) in 23 colorectal cancer cell lines, searching for new altered pathways, druggable mutations and mutated epitopes for targeted therapy in colorectal cancer. A total of 3,944 somatic non-synonymous substitutions and 595 InDels, occurring in 2,061 (57%) Surfaceome genes, were catalogued. We identified 48 genes not previously described as mutated in colorectal tumors in the TCGA database, including genes that are mutated and expressed in >10% of the cell lines (SEMA4C, FGFRL1, PKD1, FAM38A, WDR81, TMEM136, SLC36A1, SLC26A6, IGFLR1). Analysis of these genes uncovered important roles for FGF and SEMA4 signaling in colorectal cancer with possible therapeutic implications. We also found that cell lines express on average 11 druggable mutations, including frequent mutations (>20%) in the receptor tyrosine kinases AXL and EPHA2, which have not been previously considered as potential targets for colorectal cancer. Finally, we identified 82 cell surface mutated epitopes; however, expression of only 30% of these epitopes was detected in our cell lines. Notwithstanding, 92% of the expressed epitopes were found in cell lines with the mutator phenotype, opening new avenues for the use of "general" immune checkpoint drugs in this subset of patients.


INTRODUCTION
Colorectal cancer is the most common gastrointestinal cancer in the world, with approximately one million new cases being diagnosed and more than 500,000 deaths occurring yearly. Approximately one in five patients is diagnosed with metastatic disease, and an additional 30%-40% develop metastasis during the course of their disease. Unfortunately, only a minority of the patients with metastatic disease is amenable to curative resection and remains free of disease recurrence [1]. Even though survival for patients with unresectable metastatic colorectal cancer has improved over the past decade, due to the introduction of agents targeting the Epidermal Growth Factor Receptor (EGFR) and the Vascular Endothelial Growth Factor (VEGF), these treatments are often not curative, and intrinsic and acquired drug resistance is frequently observed in clinical practice [2]. Therefore, the identification of altered pathways and new therapeutic targets is critical to improve the management of a significant proportion of colorectal cancer patients.
Genetic analysis of colorectal tumors over the past 30 years allowed the characterization of distinct molecular pathways altered during the development and progression of this disease [3]. Initial whole-exome screenings using colorectal cancer cell lines detected an average of 80 point mutations in coding regions of the genome and a small number of frequently mutated cancer genes [4]. More recently, in a major effort to dissect the genetic basis of colorectal cancer, the TCGA released the results of a comprehensive and integrated genome-scale analysis of 276 tumors. No significant genetic differences were observed between rectal and colon tumors, and twenty-four genes were identified as frequently mutated in colorectal cancer, including several novel cancer genes such as SOX9, ARID1A, ATM, TCF7L2 and FAM123B. Most importantly, new potentially druggable targets were identified, including amplifications in the ERBB2 and IGF2 genes [5]. Despite this massive sequencing effort, a recent mutation saturation analysis of 4,742 tumors, across 21 cancer types, revealed that the cancer gene catalogue is far from complete, and that many more mutated genes with putative druggable mutations remain to be discovered [6].
Cell surface proteins are involved in a variety of cellular functions, including nutrient and ion transport, adhesion and signaling. These proteins also play important roles in pathological conditions such as diabetes, neurological disorders and cancer. They represent approximately 18% of all protein-coding genes in the human genome [7] and, due to their accessibility on the cell surface, they constitute optimal targets for directed therapies [8]. We have recently generated a catalog of genes coding for transmembrane proteins located at the surface of human cells (Surfaceome), and by integrating publicly available gene expression data from a variety of sources, we searched for altered pathways, new therapeutic targets and tumor antigens in gliomas, colorectal and breast tumors [9,10]. In the present work, we carried out a systematic mutational analysis of the Surfaceome in a panel of 23 representative colorectal cancer cell lines, searching for novel altered pathways, druggable mutations and mutated epitopes for targeted therapy in colorectal cancer. Collectively, our results point towards the potential use of FDA (U.S. Food and Drug Administration) approved RTK inhibitors and immune checkpoint target drugs in specific subsets of colorectal cancer patients.

Targeted sequencing of the Surfaceome in colorectal cancer
We have recently used a combined bioinformatics approach to generate a catalog of genes coding for transmembrane proteins located on the surface of human cells [9]. Briefly, we searched the complete set of protein-coding genes for an annotated and/or predicted transmembrane domain and eliminated false positive candidates containing a signal peptide or known to be located on the membrane of other intracellular compartments. An updated list of genes coding for cell surface proteins was generated for this study (Supplementary information Table S1).
To define the mutational profile of the Surfaceome in colorectal cancer, we target sequenced the coding regions of the 3,594 cell surface protein genes in a panel of 23 tumor cell lines (Supplementary information Table S2) that altogether are representative of the main subtypes of primary colorectal tumors at the genomic level [11]. A total of 33,405 exons, covering ~6Mb of the human genome, were screened for the presence of somatic point mutations (nucleotide nonsynonymous substitutions and InDels). For each cell line we analyzed approximately 1.2 Gb of on target sequences, with an average coverage of 30X (Table 1).

Somatic mutations in the colorectal cancer Surfaceome
Somatic point mutations were detected using an in-house computational pipeline based on SAMtools mpileup calling (Figure 1). As matched normal tissue for these cell lines was not available, putative somatic mutations were identified by annotation against databases of known human germline variants (Table 2). A total of 3,944 putative somatic non-synonymous substitutions and 595 InDels were catalogued, affecting 2,061 (57%) Surfaceome genes (Supplementary information Table S3). We identified an average of 174 putative non-synonymous substitutions and 28 InDels per cell line (Table 2). Mutation rates for genes coding for cell surface proteins varied significantly across cell lines and were similar to those previously reported for the entire set of protein-coding genes in colorectal tumors (Table 2) [5]. As expected, higher mutation rates (mutator phenotype) were observed in cell lines with microsatellite instability (MSI) and mutations in the DNA mismatch-repair genes or POLε (Supplementary information Table S2).
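The filtering logic described here can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Candidate` record, the `putative_somatic` function and the toy data are hypothetical, and the read-support rule reported in this paper (at least 3 high quality reads on both strands supporting the variant) is interpreted, as an assumption, as at least 3 supporting reads on each strand.

```python
from dataclasses import dataclass

# Assumption: "at least 3 high quality reads on both strands" is read here as
# >= 3 supporting reads on EACH strand; the paper does not spell this out.
MIN_READS_PER_STRAND = 3


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one variant call from a pileup-based caller."""
    chrom: str
    pos: int
    ref: str
    alt: str
    fwd_reads: int  # high-quality forward-strand reads supporting the variant
    rev_reads: int  # high-quality reverse-strand reads supporting the variant

    @property
    def key(self):
        return (self.chrom, self.pos, self.ref, self.alt)


def putative_somatic(candidates, known_germline):
    """Keep calls supported on both strands that are absent from the germline set."""
    return [
        c for c in candidates
        if c.fwd_reads >= MIN_READS_PER_STRAND
        and c.rev_reads >= MIN_READS_PER_STRAND
        and c.key not in known_germline
    ]


# Toy data; coordinates and read counts are invented for illustration.
calls = [
    Candidate("chr1", 1000, "C", "T", fwd_reads=5, rev_reads=4),  # kept
    Candidate("chr1", 2000, "G", "A", fwd_reads=6, rev_reads=0),  # one strand only
    Candidate("chr2", 3000, "A", "G", fwd_reads=9, rev_reads=7),  # known germline
]
germline = {("chr2", 3000, "A", "G")}
kept = putative_somatic(calls, germline)
print([c.key for c in kept])
```

Because no matched normal tissue is available, a filter of this kind can only remove variants already present in the germline databases; rare private germline variants would pass it.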
A total of 184 (5%) putative non-synonymous substitutions were nonsense, and 529 (89%) of the InDels introduced a frameshift alteration in the mutated protein (Table 2 and Supplementary information Table S4). To further identify substitutions that may impact protein function, we used three different algorithms (PolyPhen, SIFT and Mutation Assessor) to estimate the impact of amino acid substitutions using information from DNA sequence, evolutionary conservation and structural data. A total of 1,434 (36%) putative non-synonymous substitutions and 474 (80%) InDels were classified as having an impact on protein function, and colorectal cancer cell lines harbor on average 85 putative point mutations (non-synonymous substitutions and InDels) with a predicted impact on protein function (Table 3 and Supplementary information Table S5).

Novel mutated cell surface proteins and altered pathways in colorectal cancer
To further address the biological significance of the uncovered point mutations, we have incorporated gene expression data available for the cell lines (RNA-seq and microarray) and restricted our downstream analysis to mutated and expressed genes. A list of genes coding for cell surface proteins that are mutated and expressed in >10% of the 23 cell lines analyzed is provided in Supplementary information Table S6. Analysis of expressed mutated surface genes revealed recurrent mutations in genes belonging to pathways known to be involved in colorectal cancer, including the WNT (LRP5 and FZD10), TGFβ (TGFBR3 and ACVR1B) and RTK-Ras (EGFR and ERBB3) signaling pathways [5]. Our analysis also identified 48 expressed genes that were not previously described as mutated in primary colorectal tumors in the TCGA database [5] (Supplementary information Table S7). This list includes mutations in 9 genes (SEMA4C, FGFRL1, PKD1, FAM38A, WDR81, TMEM136, SLC36A1, SLC26A6, IGFLR1) that occur in >10% of the cell lines and were confirmed by Sanger sequencing.

Figure 1 (legend): Sequences were aligned against the human genome reference sequence (GRCh37/hg19) using Bioscope and NovoAlign CS for SOLiD4 sequences and BWA for HiSeq 2000 sequences. Variant calling was performed using samtools mpileup and requiring at least 3 high quality reads (Q≥25; q≥20) on both strands supporting the variant. Known germline polymorphisms were removed and recurrent mutations were manually inspected to remove alignment artifacts. SIFT, PolyPhen-2 and Mutation Assessor were used to predict the functional impact of non-synonymous substitutions on protein function. RANKPEP and NetMHC were used for epitope prediction. Gene expression data was obtained from RNA-seq (FPKM>3) or microarray (hybridization intensity≥5.5) experiments.
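As an illustration of the "mutated and expressed in >10% of the cell lines" criterion, the sketch below applies the expression cut-offs quoted in this paper (FPKM>3 for RNA-seq, hybridization intensity≥5.5 for microarray). The function names and the expression values are hypothetical, and treating a gene as expressed when either platform passes its cut-off is an assumption; the paper does not state how the two data types were combined.

```python
FPKM_CUTOFF = 3.0          # RNA-seq: expressed if FPKM > 3
INTENSITY_CUTOFF = 5.5     # microarray: expressed if intensity >= 5.5
N_CELL_LINES = 23
RECURRENCE_CUTOFF = 0.10   # mutated and expressed in > 10% of the cell lines


def is_expressed(fpkm=None, intensity=None):
    """Assumption: passing either platform's cut-off counts as expressed."""
    if fpkm is not None and fpkm > FPKM_CUTOFF:
        return True
    if intensity is not None and intensity >= INTENSITY_CUTOFF:
        return True
    return False


def recurrent_expressed_genes(calls, n_lines=N_CELL_LINES):
    """calls: iterable of (gene, cell_line, fpkm, intensity) for mutated genes.

    Returns {gene: fraction of cell lines where it is mutated and expressed},
    restricted to genes above the recurrence cut-off.
    """
    lines_by_gene = {}
    for gene, line, fpkm, intensity in calls:
        if is_expressed(fpkm, intensity):
            lines_by_gene.setdefault(gene, set()).add(line)
    return {
        gene: len(lines) / n_lines
        for gene, lines in lines_by_gene.items()
        if len(lines) / n_lines > RECURRENCE_CUTOFF
    }


# Cell lines for SEMA4C are those named in the text; expression values are invented.
calls = [
    ("SEMA4C", "HCT15", 12.0, None),
    ("SEMA4C", "KM12", 8.5, None),
    ("SEMA4C", "RW2982", None, 6.1),
    ("SEMA4C", "T84", 4.2, None),
    ("GENE_X", "HCT15", 0.4, None),   # hypothetical gene, mutated but not expressed
    ("GENE_Y", "KM12", 9.0, None),    # hypothetical gene, expressed in one line only
]
recurrent = recurrent_expressed_genes(calls)
print(recurrent)
```

With 23 cell lines, the >10% cut-off corresponds to a gene being mutated and expressed in at least 3 lines (3/23 ≈ 13%), since 2/23 is below 10%.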
Semaphorin 4C (SEMA4C) mutations were detected and validated by Sanger sequencing in 4 cell lines (HCT15, KM12, RW2982, T84). Two of these mutations occur in the SEMA domain, a highly conserved sequence of approximately 500 amino acids critical for inducing targets of Semaphorin signaling. A third mutation occurs in the plexin-semaphorin-integrin (PSI) domain, another highly conserved domain, enriched in cysteine residues (Figure 2). Recurrent mutations in other genes belonging to the Semaphorin signaling pathway were also observed, including frequent mutations (>20%) in SEMA4G and SEMA4D, some of which also occur in the SEMA and PSI domains (Figure 2). Semaphorins are an evolutionarily conserved family of proteins that were initially implicated in nervous system development and, more recently, in cancer progression and tumor angiogenesis [12,13]. SEMA4C expression is significantly downregulated during stem cell differentiation [14] and plays an important role in TGFβ-1 induced epithelial-mesenchymal transition [15]. To date, there is no published evidence of the direct involvement of SEMA4C in cancer, but somatic point mutations in SEMA4C were also reported by TCGA in 4% of the cutaneous melanomas. Conversely, an important role of the SEMA4D-PlexinB1 interaction in regulating different aspects of tumor progression and angiogenesis is well established [16]. In all, alterations in SEMA4 family members were detected in 56% (13/23) of the cell lines, indicating an important role of SEMA4 signaling in colorectal cancer.

[…] (Figure 2). Recurrent (>10%) FGFRL1 somatic mutations were also reported in bladder tumors [17]. FGFRL1 acts as a negative regulator of Fibroblast Growth Factor Receptor 1 (FGFR1) signaling either by interfering with FGFR1 dimerization and phosphorylation or by sequestering FGFR1 ligands [18]. FGFR1 amplification and overexpression have been reported in colorectal cancer and associated with the presence of liver metastasis [19].
Indeed, in our study we also detected and validated by Sanger sequencing somatic mutations in FGFR1 in 4 cell lines (HCT116, HCT15, RKO, SW48), including a non-synonymous substitution in the tyrosine kinase domain (Figure 2). Mutations in FGFR2 (LIM2405) and FGFR3 (LOVO, SW48) were also observed at a lower frequency. In all, alterations in FGFR family members were detected in 35% (8/23) of the cell lines, suggesting an important role of FGF signaling in colorectal cancer.
Although the remaining 7 genes (PKD1, FAM38A, WDR81, TMEM136, SLC36A1, SLC26A6, IGFLR1) are mutated in >10% of the colorectal cancer cell lines, literature searches did not reveal evidence of the functional role or therapeutic potential of these genes in colorectal cancer. Nevertheless, recurrent mutations (>3%) in FAM38A, SLC36A1 and WDR81 have been reported for other primary tumors in the TCGA database, and further functional studies will be necessary to address their involvement in colorectal tumorigenesis.

Druggable mutations in cell surface proteins for targeted therapy in colorectal cancer
In order to identify putative druggable mutations in cell surface proteins, we searched for mutated genes present in the Drug-Gene Interaction Database (DGIdb), which integrates drug-target information from 13 different sources, including the literature and previously established databases [20]. We generated a catalogue of point mutations in druggable genes, and we found that colorectal cell lines harbor on average 11 mutations in druggable expressed genes (Table 3 and Supplementary information Table S8).
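The cataloguing step amounts to intersecting the mutated, expressed genes of each cell line with a list of druggable genes and averaging the per-line counts. The sketch below is only a schematic of that idea: the druggable gene set is a small stand-in for a DGIdb export rather than a query against the real database, `GENE_X` is a hypothetical placeholder, and the function name is invented.

```python
from statistics import mean


def druggable_mutations_per_line(mutations, druggable_genes, cell_lines):
    """Count mutations falling in druggable genes for each cell line.

    mutations: iterable of (cell_line, gene), one entry per mutation in an
    expressed gene. druggable_genes: set of gene symbols (stand-in for DGIdb).
    """
    counts = {line: 0 for line in cell_lines}
    for line, gene in mutations:
        if gene in druggable_genes:
            counts[line] += 1
    return counts


# Stand-in druggable set; a real analysis would load the full DGIdb gene list.
druggable = {"AXL", "EPHA2"}
lines = ["HCT15", "KM12", "LOVO"]
mutations = [
    ("HCT15", "AXL"),
    ("HCT15", "EPHA2"),
    ("KM12", "AXL"),
    ("LOVO", "AXL"),
    ("LOVO", "GENE_X"),  # hypothetical non-druggable gene, ignored
]
counts = druggable_mutations_per_line(mutations, druggable, lines)
print(counts, mean(counts.values()))
```

Initializing every cell line to zero keeps lines without druggable mutations in the denominator, so the average is taken over the whole panel rather than only over lines with hits.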
A significant fraction (34%) of these mutations occurred in membrane transporters. Membrane transporters, including solute carriers (SLCs) and ABC transporters, control the uptake and efflux of amino acids, sugars, lipids and vitamins, and their expression and activity are frequently altered in cancer as a consequence of the higher energy and nutritional requirements of the tumor cells [21]. Membrane transporters represent potential targets for cancer therapy and blocking their activity could be one way to interfere with tumor progression. In addition, membrane transporters can also serve as chemo-sensitizing targets, since they actively participate in drug delivery and resistance [21,22]. Mutations in ABCA3, ABCA7, ABCC1, SLC23A2 and SLC9A1 were each observed in >20% of the cell lines.
We then focused on expressed genes with putative druggable mutations that were not previously considered as potential therapeutic targets for colorectal cancer, but for which specific inhibitors have previously been developed. We particularly focused on surface proteins with kinase activity, as they represent a significant fraction of the genes mutated in cancer and are highly amenable to targeting by rationally designed small molecule inhibitors. Two druggable RTKs (AXL and EPHA2) were found to be frequently (>20%) altered in our cell lines.
Five point mutations in the kinase domain and/or with predicted functional impact in the AXL Receptor Tyrosine Kinase (AXL) were detected and validated by Sanger sequencing in 22% (5/23) of our cell lines (COLO205, KM12, HCT116, HCT15 and LOVO) (Figure 2). One of these mutations (g.chr19:41726597 C>T), occurring in the GAS6 ligand-binding domain, was also observed in a uterine corpus endometrioid carcinoma and in a glioblastoma. AXL is a member of the TAM family of RTKs, which also includes Mer and Tyro-3 [23]. Mutations in Mer (SW48) and Tyro-3 (HCT15) were also observed. Point mutations in AXL have not been specifically described in the literature for colorectal cancer, and lower mutation frequencies (3.5%) were reported for primary colorectal cancers in the TCGA database [5]. Overexpression of AXL in colorectal tumors was reported in metastatic lesions [24] and AXL was recently characterized as a poor prognostic marker in early stage colorectal tumors, and as an important mediator of basal and 5-FU induced EMT and invasiveness [25].
Point mutations in the EPH Receptor A2 gene (EPHA2) were detected and validated by Sanger sequencing in 3 cell lines (HCT15, LIM1215, LIM2405). Three of these mutations are located in the tyrosine kinase domain and one in the Ephrin ligand-binding domain (Figure 2). Mutations with predicted functional impact in the Ephrin ligand-binding domain and in the tyrosine kinase domain of the EPHA1 gene were also observed in 3 cell lines (HCT15, LIM1215 and LOVO) (Figure 2). Point mutations in EPHA2 and EPHA1 have not been specifically described in the literature for primary colorectal tumors, and lower mutation frequencies for these genes (4.4% EPHA1 and 2.6% EPHA2) were reported for primary colorectal tumors in the TCGA database [5]. EPHA2 is overexpressed in tumor cells and in tumor blood vessels in different types of cancer [26]. In colorectal tumors, EPHA2 overexpression was detected in approximately half of the samples and higher expression was associated with advanced stage tumors, metastatic disease and higher microvessel counts [27,28]. Moreover, loss of EPHA2 reduced tumor formation in Apc Min/+ mice [29]. Conversely, elevated levels of EPHA1 were observed in early stage compared to late stage colorectal tumors. Reduced EPHA1 expression was associated with poorly differentiated and invasive tumors and poor overall survival, indicating that EPHA1 may play different roles during different stages of colorectal carcinoma progression [30,31].

Mutated epitopes exposed on the cell surface of colorectal cancer
Nonsynonymous and frameshift mutations in the Surfaceome of the 23 colorectal cancer cell lines were used to identify mutated epitopes with differential binding affinity to HLA when compared to epitopes generated by the corresponding non-mutated (reference) sequences. Our local pipeline for immunogenic epitope prediction was based on two algorithms, RANKPEP and NetMHC, as described in Materials and Methods. Mutated epitopes were required to have a binding affinity to the HLA*0201 molecule that was at least 20% higher than the reference epitope as predicted by both algorithms. A total of 82 putative mutated epitopes were identified (73 epitopes from nonsynonymous mutations and 9 epitopes from frameshift mutations). However, when we combined gene expression data with epitope prediction analysis, we found that only 30% (25/82) of the predicted epitopes are expressed, and that 92% (23/25) of these epitopes are expressed in a subset of the cell lines with the mutator phenotype. These results suggest that the use of potentially immunogenic mutations in cell surface proteins for personalized T-cell based immunotherapy in colorectal cancer is limited, as only 30% of the mutated epitopes are expressed and less than half (11/23) of the tumor cell lines express mutated epitopes.
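The selection rule for mutated epitopes (at least 20% higher predicted binding than the reference peptide, according to both algorithms) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function name and all scores are hypothetical, and each predictor's output is assumed to have been converted to a positive affinity score where higher means stronger binding, which is not necessarily the native output scale of RANKPEP or NetMHC.

```python
MIN_RELATIVE_GAIN = 0.20  # mutated peptide must bind at least 20% better


def is_candidate_epitope(scores, min_gain=MIN_RELATIVE_GAIN):
    """scores: {predictor: (mutant_affinity, reference_affinity)}.

    Assumption: affinities are positive and higher means stronger binding.
    The peptide is kept only if EVERY predictor reports the required gain.
    """
    return all(
        mutant >= reference * (1.0 + min_gain)
        for mutant, reference in scores.values()
    )


# Invented scores for two hypothetical peptides.
peptide_a = {"RANKPEP": (0.90, 0.50), "NetMHC": (0.80, 0.60)}  # both gain > 20%
peptide_b = {"RANKPEP": (0.90, 0.50), "NetMHC": (0.62, 0.60)}  # NetMHC gain ~3%

print(is_candidate_epitope(peptide_a), is_candidate_epitope(peptide_b))
```

Requiring agreement between both predictors trades sensitivity for specificity, which is consistent with the relatively small number of epitopes (82) reported from several thousand mutations.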

Discussion and Therapeutic Implications
One of the major objectives of cancer genome sequencing projects is to identify therapeutically targetable mutations. This objective has been achieved with repeated success in cancer therapy, resulting in the introduction of new treatment protocols in clinical practice. The use of Imatinib, for chronic myeloid leukemia and some solid tumors, of Trastuzumab and Lapatinib, for ERBB2 positive breast cancer, and of Vemurafenib, for BRAF mutant melanomas, are emblematic examples of how genomic alterations can be used to target cancer cells [32]. Over the past years, these sequencing projects have revealed many new cancer genes, most of which are mutated at intermediate frequencies (2-20%) or lower, uncovering an unprecedented level of genetic heterogeneity in human cancers and establishing the need for a continued effort to determine the functional significance of these mutations and to translate these findings to the bedside [33].
Cell surface proteins constitute optimal targets for directed therapies and represent two-thirds of the protein-based drug targets [34,35]. Surface proteins are also excellent targets for antibody-based therapies and vaccine development since they are exposed on the cell surface and, therefore, have the highest chances to be recognized as antigens [36]. In the present work, we carried out a systematic mutational analysis of human genes coding for cell surface proteins, aiming to uncover novel altered pathways, druggable mutations and mutated epitopes for targeted therapy in colorectal cancer. We target sequenced the coding regions of cell surface protein genes in a panel of 23 tumor cell lines that altogether are representative of the main subtypes of primary colorectal tumors at the genomic level [11]. We opted to use cell lines in this study, instead of primary tumors, to overcome limitations imposed by the high level of colorectal intratumoral genetic heterogeneity in the mutation detection efficiency and to have straightforward cell models to further address the therapeutic potential of the uncovered altered pathways and druggable mutations.
We found that a significant (57%) fraction of the Surfaceome is reshaped by somatic point mutations in colorectal cancer cell lines. Our analysis identified 48 genes coding for cell surface proteins that were not previously described as mutated in primary colorectal tumors in the TCGA database [5], including mutations in SEMA4C and FGFRL1 which have not been previously considered as potential therapeutic targets for colorectal cancer. Although we cannot exclude the possibility that some of these alterations correspond to mutations acquired during in vitro propagation of the cell lines, our results are in agreement with a recent mutation saturation analysis of 4,742 sequenced tumors, across 21 cancer types [6]. This study revealed that the discovery of cancer genes mutated at frequencies of 5-10% in colorectal tumors is increasing linearly in relation to the number of tumor genomes sequenced, and that the current collection of sequenced colorectal tumors lacks the desired power to detect genes mutated at frequencies of 5% above the background rate [6].
SEMA4C mutations were found in 17% of the cell lines and recurrent mutations in SEMA4G (17%) and SEMA4D (22%) were also observed. The effects of Semaphorins and their receptors in cancer are broad, context dependent and complex [37]. SEMA4C is expressed in neural stem cells and its expression is downregulated during stem cell differentiation [14]. SEMA4C expression is induced by TGFβ-1 in renal epithelial cells and plays an important role in TGFβ-1 induced epithelial-mesenchymal transition [15]. In addition, an important role of the SEMA4D-PlexinB1 interaction in regulating different aspects leading to tumor progression, including invasive growth and angiogenesis, is well established [16]. The proangiogenic effect of SEMA4D was demonstrated both in vitro and in vivo and is comparable to that elicited by other well-known angiogenic molecules, such as VEGFA, HGF and bFGF [38,39]. Our results suggest that SEMA4 signaling is activated by point mutations in a significant fraction of colorectal tumors, and although specific inhibitors targeting SEMA4 proteins are not currently available, several biological processes driven by SEMA4 signaling, such as angiogenesis and invasiveness, could be targeted with FDA approved drugs, including anti-angiogenic agents and MET inhibitors.
Inactivating mutations in FGFRL1, the most recently discovered member of the FGFR family, were detected in 17% of our cell lines. FGFRL1 binds with high affinity to heparin and FGF ligands, but it does not possess an intracellular protein kinase domain and, therefore, cannot signal by trans-auto-phosphorylation [18]. FGFRL1 thus acts as a negative regulator of FGFR1 signaling and loss of function mutations described here may represent a novel mechanism of FGF signaling activation in colorectal cancer. Alterations in FGFR1, FGFR2 and FGFR3 were also observed at a lower frequency, and 35% of the cell lines harbored somatic mutations in members of the FGF signaling pathway. Different FGFR specific inhibitors are currently under development [40], and further evaluation of their activity in the subset of colorectal cancer with FGFR/FGFRL1 alterations should be pursued. Moreover, Regorafenib, a multikinase inhibitor that targets FGFR1 among other RTKs, was recently approved by the FDA for the treatment of advanced colorectal cancer [41], but predictive biomarkers for this indication are not yet available.
Higher mutation frequencies in the RTKs AXL (22%) and EPHA2 (17%) were detected in our panel compared to those reported in the TCGA database for primary colorectal tumors (3.51% AXL and 2.63% EPHA2) [5]. Neither RTK has been considered as a potential therapeutic target for colorectal cancer; however, the availability of specific inhibitors and pre-clinical data support their potential use for therapeutic intervention. The oncogenic properties of AXL were initially described in patients with chronic myelogenous and lymphoblastic leukemia (CML), but overexpression of AXL has also been detected in many solid tumors and associated with poor prognosis [23]. AXL has a well established oncogenic role in survival, proliferation and migration of cancer cells in vitro, as well as in tumor angiogenesis and metastasis in vivo [23]. Moreover, recent studies have uncovered a major role of AXL in primary and acquired resistance to several anticancer therapies. AXL overexpression has been linked to Imatinib resistance in gastrointestinal stromal tumors [42], Nilotinib resistance in CML [43] and Lapatinib resistance in HER-2 positive breast tumor cells [44]. In lung cancer, AXL was identified as a potential target for overcoming EGFR inhibitor resistance and combination of an AXL specific inhibitor (SGI-7079) with Erlotinib reversed Erlotinib resistance in a xenograft model of mesenchymal non-small cell lung cancer [45].
In colorectal cancer, AXL expression is associated with increased invasiveness of tumor cell lines with overexpression of the chemokine receptors CXCR4 and CXCR7, and AXL knockdown in these cell lines significantly hampered tumor cell invasion [46]. Considering that many multikinase inhibitors under development have AXL as one of their targets, further exploration of the pharmacologic inhibition of this pathway in preclinical models, including tumor cell lines with resistance to anti-EGFR drugs, should be pursued. In addition, monoclonal antibodies and small-molecule tyrosine kinase inhibitors specifically targeting AXL are currently in development and their use in colorectal cancer patients should also be further explored [47]. Notably, some of the cell lines analyzed herein presented concomitant mutations in AXL and FGFR or FGFRL1 (HCT116, HCT15, LOVO, KM12), which suggests that these mutations are not mutually exclusive. In this setting, it will be important to explore the interdependence of both pathways, especially considering that some multikinase inhibitors under development are capable of blocking AXL and FGFR concomitantly [48]. Indeed, combination of these multi-kinase inhibitors with bevacizumab led to near total inhibition of tumor growth in colon carcinoma xenograft models and caused tumor growth arrest in bevacizumab-resistant tumors [48].
Somatic alterations in EPH receptors were also frequently observed in our cell lines, including frequent mutations in EPHA1 and EPHA2. Point mutations in EPHA2 and EPHA1 have not so far been described in the literature for colorectal cancer. Nevertheless, mutations in the kinase domain of EPHA3 were reported in 5% of colorectal cancer cell lines [49] and EPHA3 was listed among the top 3 cancer genes in a large-scale screening for somatic mutations in colorectal cancer [4]. EPH receptors play critical roles in embryonic development and their expression is frequently altered in a variety of cancers and tumor cell lines [50]. They comprise the largest family of RTKs and bind to ephrins (EFN) available on the surface of neighboring cells. Unlike other RTKs, EPH-EFN signaling is unique, since it triggers a bidirectional signal that affects both receptor and EFN expressing cells [50]. EPH receptors are thus important mediators of tumor cell interactions with the tumor stroma and tumor vasculature, and have been proposed as promising targets for cancer therapy, since targeting these receptors could simultaneously inhibit several aspects of tumor progression [26,50]. EPHA2 overexpression in colorectal cancer is associated with advanced stage tumors, metastatic disease and higher microvessel counts [27,28]. Moreover, loss of EPHA2 was shown to reduce Apc Min/+ tumorigenesis [29]. Confirmation of the activation of EPH signaling mediated by EPHA2 point mutations in colorectal cancer is of utmost importance considering the availability of FDA approved drugs targeting this receptor, such as Dasatinib [51]. In addition, EPHA2-Fc soluble receptors were shown to significantly reduce tumor volume and overall metastatic burden in pre-clinical models of breast [52] and pancreatic tumors [53], but have not been evaluated in colorectal cancer models.
Finally, receptor endocytosis promoted by anti-EPHA2 monoclonal antibodies has also been used to reduce EPHA2 activity and inhibit malignant cell behavior in vitro [54]. On the other hand, therapies targeting EPHA1 in colorectal cancer should be carefully evaluated since this gene seems to play different roles during disease progression [30,31].
Nonsynonymous and frameshift mutations in tumor cells can generate unique T-cell mutated epitopes and induce tumor antigen-specific immune response [55]. There is evidence supporting the efficacy of vaccination strategies using mutated epitopes [56] and the use of personalized peptide vaccines and adoptive T-cell transfer protocols based on patient-specific mutated epitopes holds great promise in cancer therapy [57]. Unfortunately, combining epitope prediction algorithms and gene expression data, we found that the use of potentially immunogenic mutations in cell surface proteins for personalized immunotherapy in colorectal cancer is limited, since the expression of approximately 70% of these epitopes was not detected in the tumor cells. However, additional studies including mutated epitopes present in intracellular proteins will be required to further address the applicability of personalized vaccines in colorectal patients.
Notwithstanding, we observed that mutated expressed epitopes are predominantly found in colorectal cell lines presenting a mutator phenotype and that this specific subset of cell lines expresses a total of 23 mutated epitopes. In this context, it was recently demonstrated that patients with tumors showing naturally occurring immunogenic mutations presented higher cytotoxic T-cell infiltration and improved overall survival and, based on these observations, the use of general immune modulators that block immune regulatory checkpoints, such as anti-CTLA4 and anti-PD1, was proposed as a treatment strategy for patients with immunogenic mutations [58]. Accordingly, tumors with a high level of mutations as revealed by the TCGA [59], such as melanoma and non-small cell lung cancer, are currently deriving striking benefits from immune checkpoint blockade drugs [60,61]. Although our results do not support the use of personalized T-cell based immunotherapy in colorectal cancer, they suggest that colorectal cancer patients harboring tumors with a mutator phenotype could be more responsive to immune checkpoint blockade. Indeed, increased counts of CD8+ T-cells were observed in colorectal cancer tumors with high mutational loads [58] and microsatellite instability [62]. Data on the use of immune checkpoint target drugs in colorectal cancer are still limited, but the results of the first long-term follow-up study from the first clinical trial based on the PD1-targeting monoclonal antibody have recently been reported. This study included a 71-year-old patient with colorectal cancer who attained a complete and durable (>4 years) response to anti-PD1 treatment [63].
To the best of our knowledge, this is the first systematic and focused screen of point mutations in genes coding for cell surface proteins in colorectal cancer. By combining high-throughput sequencing, bioinformatics tools, data integration and literature searches, we have successfully discovered novel altered pathways and druggable mutations for targeted therapy in colorectal cancer. We have also uncovered the potential use of existing RTK inhibitors and immune checkpoint-targeting drugs in specific subsets of colorectal cancer patients. The results presented here are encouraging; however, our study also has some limitations.
First, although we have described novel druggable mutations occurring in a representative panel of colorectal cancer cell lines, it will be important to confirm the prevalence of these alterations in clinical samples matched with normal tissue. At present, we cannot completely exclude the possibility that some of the alterations reported in this study correspond to mutations acquired during in vitro propagation of the cell lines or to very rare germline polymorphisms not represented in public databases or in individuals sequenced by the 1000 Genomes Project. However, we believe that these possibilities do not significantly affect our results, since we have previously shown that the rate of mutation accumulation during in vitro propagation is not significant [11] and stringent bioinformatics cut-offs were implemented to filter most, if not all, non-clonal mutations eventually introduced during in vitro growth. Second, the functional consequences of the uncovered genetic alterations were predicted primarily using computational tools, and confirmation with functional in vitro assays is still required. Similarly, additional experiments to evaluate the effects of pharmacologic inhibition of the altered pathways using preclinical models are required to translate our findings to the bedside. Finally, although we suggest potential molecular therapeutic targets in colon cancer, it is important to recognize that a recent study matching targeted therapy with specific molecular abnormalities for patients with advanced colorectal cancer failed to confer significant clinical benefit [64]. We believe that a diversification of potential targets, including those proposed by our study, could bring new opportunities to change this paradigm.

Colorectal cancer cell lines
The panel of 23 colorectal cancer cell lines used in this study was obtained from different sources (Supplementary information Table S2). CACO2, COLO205, COLO320DM, HCT116, HCT15, HT29, LOVO, RKO, SKCO-1, SW1116, SW403, SW48, SW480, SW620, SW837, SW948 and T84 were obtained from the American Type Culture Collection (Manassas, VA). LIM1215 and LIM2405 were generated by the Ludwig Institute for Cancer Research. HCC2998 and KM12 were obtained from the National Cancer Institute-Frederick Cancer DCT Tumor Repository. RW2982 and RW7213 were provided by Dr. P. Calabresi from Roger Williams General Hospital. Cells were cultured with Dulbecco's Modified Eagle Medium and 10% FBS at 37 °C and 5% CO2. Cell lines were authenticated and tested for Mycoplasma contamination as previously described [11].

Public Data and Databases
Exome-capture sequencing data on colorectal tumors were retrieved from TCGA and used to identify novel mutated genes and to determine mutation frequencies in colorectal cancer primary tumors. The DGIdb [20] was used to identify druggable mutated genes, and the gene list provided by the Human Kinome project [65] (kinase.com/human/kinome) was used to identify genes coding for cell surface proteins with kinase activity.

Somatic mutation detection, validation and functional analysis
For single nucleotide variation (SNV) detection, SOLiD 4.0 and Illumina reads were aligned to the human reference genome sequence (GRCh37/hg19) using BioScope (Life Technologies) and BWA [66], respectively. For InDel detection, alignments were performed using NovoAlignCS (www.novocraft.com). A local pipeline for point mutations was developed using Samtools mpileup and bcftools [67]. Duplicated reads were removed with rmdup (Samtools) to avoid potential PCR duplicates generated during library construction. Variants were filtered against known germline variations annotated in dbSNP (version #135), and variations present in more than 3 cell lines were manually inspected to distinguish recurrent mutations (e.g., EGFR mutations) from false positive mutations due to alignment artifacts. Somatic mutations were validated by PCR amplification and Sanger sequencing using standard protocols (Supplementary information Tables S9 and S10). SIFT [68], PolyPhen2 [69] and Mutation Assessor [70] were used to evaluate the impact of non-synonymous substitutions and InDels on protein function. Mutations were annotated as having an impact on protein function when predicted by at least two of these algorithms in the case of non-synonymous substitutions and by SIFT in the case of InDels.
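
The filtering and annotation rules described above can be illustrated with a minimal Python sketch. It is not the local pipeline used in this study: the function names, the variant keys and the dictionary of predictor calls are hypothetical, and each predictor's output is assumed to have already been reduced to a damaging/not damaging flag.

```python
from collections import defaultdict

# Predictor labels are illustrative keys, not identifiers taken from the tools.
PREDICTORS = ("SIFT", "PolyPhen2", "MutationAssessor")


def triage_variants(calls, dbsnp_keys, max_cell_lines=3):
    """Split variant calls into kept variants and variants needing manual review.

    calls: iterable of (cell_line, variant_key) pairs (hypothetical format).
    dbsnp_keys: set of variant keys annotated as known germline variation.
    Variants seen in more than `max_cell_lines` cell lines are only flagged
    for manual inspection; they are not discarded automatically.
    """
    cell_lines_by_variant = defaultdict(set)
    for cell_line, variant_key in calls:
        if variant_key in dbsnp_keys:
            continue  # drop known germline variation
        cell_lines_by_variant[variant_key].add(cell_line)

    kept, to_inspect = set(), set()
    for variant_key, cell_lines in cell_lines_by_variant.items():
        if len(cell_lines) > max_cell_lines:
            to_inspect.add(variant_key)
        else:
            kept.add(variant_key)
    return kept, to_inspect


def has_functional_impact(variant_type, damaging_calls):
    """Apply the consensus rule to one variant.

    variant_type: "snv" for non-synonymous substitutions or "indel".
    damaging_calls: dict mapping predictor label to True when that tool
    predicts an impact on protein function (missing tools count as False).
    """
    if variant_type == "indel":
        return bool(damaging_calls.get("SIFT", False))
    votes = sum(bool(damaging_calls.get(p, False)) for p in PREDICTORS)
    return votes >= 2
```

Keeping recurrent variants in a separate review set, rather than dropping them, mirrors the manual inspection step, in which true recurrent mutations are retained and alignment artifacts are discarded.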

Epitope prediction
Peptide sequences corresponding to non-synonymous mutations and InDels, flanked by 10 amino acids on either side, were used for epitope prediction by applying an approach similar to that described by Segal et al. 2008 [76]. The same process was performed for peptide sequences corresponding to the non-altered (reference) sequences. Concatamers of these peptides were analyzed by RANKPEP [77] and NetMHC [78] to identify 9 aa peptide sequences with binding affinity to the class I MHC molecule HLA-A*0201. RANKPEP predicts binding based on scoring matrices from known peptides that bind to MHC molecules. Peptides were considered immunogenic if the percentage optimum was ≥ 50%. RANKPEP also evaluates whether the tested peptide results from a known cleavage process, and therefore only peptides predicted to be cleaved were analyzed. NetMHC uses artificial neural networks to predict binding to the MHC molecule. Peptides were considered immunogenic if the IC50 was ≤ 500 nM. To check for predicted cleavage, sequences were then analyzed using the NetChop algorithm [79], and only peptides with predicted cleavage were selected. Results from both algorithms were processed using a local pipeline, and epitopes resulting from sequence concatenation artifacts were excluded. Mutated epitopes were defined as those predicted by both algorithms and that were either unique to the variant sequence or showed an increase in MHC binding affinity of >20% when compared to the reference peptide.
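
The peptide windowing and the two binding thresholds described above can be sketched in Python as follows. This is an illustration under stated assumptions and not the local pipeline used in this study: the function names are hypothetical, only single amino acid substitutions are handled, positions are 0-based, and the RANKPEP and NetMHC scores are assumed to have been parsed beforehand. Cleavage prediction and the comparison of binding affinity against the reference peptide are not covered.

```python
def flanked_peptide(protein, pos, alt=None, flank=10):
    """Return the residue at 0-based `pos` with up to `flank` residues per side.

    If `alt` is given, the residue at `pos` is first replaced by `alt`
    (variant peptide); otherwise the reference peptide is returned.
    """
    if alt is not None:
        protein = protein[:pos] + alt + protein[pos + 1:]
    start = max(0, pos - flank)  # clamp at the N-terminus
    return protein[start:pos + flank + 1]


def nine_mers(peptide, k=9):
    """Enumerate all overlapping k-residue peptides (k = 9 by default)."""
    return [peptide[i:i + k] for i in range(len(peptide) - k + 1)]


def variant_unique_nine_mers(protein, pos, alt, flank=10):
    """Return the 9-mers present in the variant window but not in the reference."""
    reference = set(nine_mers(flanked_peptide(protein, pos, None, flank)))
    variant = nine_mers(flanked_peptide(protein, pos, alt, flank))
    return [p for p in variant if p not in reference]


def is_predicted_binder(rankpep_percent_optimum, netmhc_ic50_nm):
    """Require both thresholds: percentage optimum >= 50% and IC50 <= 500 nM."""
    return rankpep_percent_optimum >= 50.0 and netmhc_ic50_nm <= 500.0
```

With a full 21-residue window, a single substitution yields 13 overlapping 9-mers, of which up to 9 span the altered residue and can therefore be unique to the variant sequence.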