Unique signalling connectivity of FGFR3-TACC3 oncoprotein revealed by quantitative phosphoproteomics and differential network analysis

The FGFR3-TACC3 fusion is an oncogenic driver in diverse malignancies, including bladder cancer, characterized by upregulated tyrosine kinase activity. To gain insights into distinct properties of FGFR3-TACC3 down-stream signalling, we utilised telomerase-immortalised normal human urothelial cell lines expressing either the fusion or wild-type FGFR3 (isoform IIIb) for subsequent quantitative proteomics and network analysis. Cellular lysates were chemically labelled with isobaric tandem mass tag reagents and, after phosphopeptide enrichment, liquid chromatography-high mass accuracy tandem mass spectrometry (LC-MS/MS) was used for peptide identification and quantification. Comparison of data from the two cell lines under non-stimulated and FGF1 stimulated conditions and of data representing physiological stimulation of FGFR3 identified about 200 regulated phosphosites. The identified phosphoproteins and quantified phosphosites were further analysed in the context of functional biological networks by inferring kinase-substrate interactions, mapping these to a comprehensive human signalling interaction network, filtering based on tissue-expression profiles and applying disease module detection and pathway enrichment methods. Analysis of our phosphoproteomics data using these bioinformatics methods combined into a new protocol—Disease Relevant Analysis of Genes On Networks (DRAGON)—allowed us to tease apart pathways differentially involved in FGFR3-TACC3 signalling in comparison to wild-type FGFR3 and to investigate their local phospho-signalling context. We highlight 9 pathways significantly regulated only in the cell line expressing FGFR3-TACC3 fusion and 5 pathways regulated only by stimulation of the wild-type FGFR3. Pathways differentially linked to FGFR3-TACC3 fusion include those related to chaperone activation and stress response and to regulation of TP53 expression and degradation that could contribute to development and maintenance of the cancer phenotype.

The labelled peptides isolated from the four samples were mixed in equal proportions and the workflow undertaken to enrich the fraction of phosphorylated peptides from the unphosphorylated ones is based on the "workflow 2" previously described [1] with minor changes. Briefly, for the anti-phosphotytrosine peptide immunoprecipitation, the antibody PT66 (Sigma-Aldrich, Poole, UK) was replaced with pY1000 (Cell Signaling Technology, Hitchin, Hertfordshire, UK), while 4G10 (Millipore) and pY100 (Cell Signaling Technology, Hitchin, Hertfordshire, UK) were kept. Elution was performed with 0.5% formic acid so that the sample could be directly injected into the mass spectrometer without the need for an additional clean-up step.
The TiO 2 chromatography was performed on the unbound fraction using the TiO 2 phosphopeptide enrichment kit (Thermo Fisher Scientific, UK) followed by a clean-up step with graphite columns (Thermo Fisher Scientific, UK), according to manufacturer's instructions, and samples were prepared for injection as previously described [1]. The flow through (FT) collected after the TiO 2 chromatographic step was kept, with an aliquot corresponding to 1/50 of the available volume desalted with the Zeba™ Spin Desalting Column (Thermo Fisher Scientific, UK) injected into the mass spectrometer for normalisation purposes and overall proteomic evaluation of the samples.

Extended methods for LC MS-MS and proteomic/ phosphoproteomic data identification and quantification
LC-MS/MS analysis was performed using an LTQ-Velos Orbitrap mass spectrometer (Thermo Fisher Scientific, UK). Peptide samples were loaded using nanoACQUITY UPLC (Waters, Elstree, U.K.) with Symmetry C18 180 μm × 20 mm (Waters, Elstree, UK, part number 186006527) trapping column for desalting and then introduced into the mass spectrometer via Stonearch fused silica capillary column: 100 μm i.d., 360 μm o.d., 15 cm length, 5 μm C18 particles (Nikkyo Technos CO, Tokyo, Japan part number NTCC 360/100-5-153) and a nanoelectrospray ion source at a flow rate of 0.40 μL/min. The mobile phase comprised H 2 O with 0.1% formic acid (buffer A) and 100% acetonitrile with 0.1% formic acid (buffer B). The gradient ranged from 1% to 50% buffer B in 136 min and a step gradient to 85% B for 20 min with a flow of 0.40 μL/min, finally a return to the initial conditions of 1% B for 20 min [1,2].
The full scan precursor survey MS spectra (400-2000 m/z) were acquired in full profile with the Velos-Orbitrap analyser at a resolution of r = 60,000.
This was followed by data dependent MS/MS fragmentation in centroid mode of the most intense ion from the survey scan: electrospray voltage 1.5 kV, capillary temperature 200°C and isolation width 2.00. First, using collision induced dissociation (CID) in the linear ion trap: normalized collision energy 35%, activation Q 0.25 and activation time 10 ms. For the second MS/MS event, the most intense ion from the survey scan was fragmented using higher energy collision dissociation (HCD) in the HCD collision cell: normalized collision energy 50%, resolution 7500 and activation time 0.1 ms. The two MS/ MS scan events were repeated for the top 10 peaks in the MS survey scan, the targeted ions were then dynamically excluded for 30 s. Singly charged ions were excluded from the MS/MS analysis and Xcalibur software version 2.1.0 SP1 build 1160 (Thermo Fisher Scientific, U.K.) used for data acquisition. Raw data were analysed using Proteome Discoverer (PD v1.3) with Mascot search engine and Swiss-Prot human proteome database (v11/16). Up to 2 trypsin missed cleavages were allowed, carbamidomethylation was set as a fixed modification, while methionine oxidation, phosphorylation of serine, threonine and tyrosine were set as variable modifications. Furthermore, TMT 6-Plex labelling was set as variable modification to evaluate the labelling efficiency of each sample, but set as fixed during the analysis of phosphopeptide-enriched samples and of the unbound fraction. Mass tolerance was set to 7 ppm for the precursors and to 0.8 Da for the fragments. FDRs were chosen at the peptide level to generate two different datasets: High Confidence Quantitative Phosphoproteomic dataset (HC) with an FDR ≤ 0.01, and Quantitative Phosphoproteomic dataset (QP) with FDR ≤ 0.05.
In order to identify and quantify phosphopeptides, a workflow was created in PD to sequence a peptide based on CID-derived spectra and assign a quantification based on the corresponding HCD-derived spectra. To statistically evaluate peptide identifications and obtain probabilistic scores for phosphorylation site(s) localisation, both the Peptide validator and the phosphoRS [3] nodes were added to the workflow. Phosphosite localisation was manually inspected if the PhosphoRS probability was lower than 0.75. The quantitative information obtained from analysis of the FT was used to normalise the samples and calculate a correction factor to account for systematic bias incorporated in peptide measurements -for example, due to variations in sample loading.
A total intensity normalisation approach was applied, which is based on the assumption that the total intensity of each reporter ion for each sample remains the same in the four conditions [4]. The correction factors, calculated from the unbound fractions, were used to normalise the intensities of the reporter ions collected in the phosphopeptide-enriched fractions (IP and TiO 2 ).
Phosphosite localisation was finalised by choosing the assignment corresponding to the highest probability score calculated by PhosphoRS. Phosphopeptides having the same phosphorylation pattern across one or more replicates were grouped together and relative ratios (FUS Hep: WT Hep; FUS FGF1: WT FGF1; WT FGF1: WT Hep) were obtained by averaging the quantitative information obtained from each data point in each replicate. Finally, a mean ratio was obtained, averaging the ratios obtained in each individual replicate, where available. Standard deviations and coefficients of variation were calculated when more than one data point (i.e. measurement per replicate) was available.
The reference proteome was obtained by analysing the sample prior to enrichment (TMT mix) and TiO 2 FT. The FDR at the peptide level was set at 0.01 and was required to have at least 1 unique quantified peptide per protein to include the identified protein into the dataset. For quantification, the intensities were normalised over the overall protein median, as PD workflow suggests in its settings.

Extended methods for Immunoprecipitation and blotting
15 or 30 µg of total protein lysates were loaded in a 4-15% or 7.5% (to visualize FGFR3 only) SDS polyacrylamide gel (Bio-rad, Watford, UK) and run under denaturing conditions. At the end of the run, separated proteins were transferred onto a nitrocellulose membrane using a semi-dry transfer system (Bio-rad, Watford, UK) for 120 minutes at 20 V. Membranes were blocked for 1 hour, under orbital agitation, either with 3% (w/v) NFDM (Bio-Rad) or 3% (w/v) BSA (Sigma, Aldrich, Poole, UK), depending on primary antibody requirements. Primary antibody incubations were performed overnight under orbital agitation at 4°C by diluting the antibodies accordingly to manufacturer's instructions; secondary antibodies were incubated for 1 hour under orbital agitation at room temperature. Immunoreactive bands were detected using the SuperSignal West Pico Chemiluminescent Substrate (Thermo Fisher Scientific, UK) and developed either by exposing the membrane to an X-ray film (Amersham) or by digital imaging. Densitometric analysis used ImageStudioLite (v. 5.2.5) for experiments run at least in triplicates on three independent biological replicates, while statistical evaluation of the results and graphing used Prism 7.
The PathScan® Intracellular Signaling Array Kit (Cell Signaling, Hitchin, Hertfordshire, UK) was used to explore the phosphorylation status of proteins fundamental to signal transduction. The assay was performed according to the manufacturer's instructions, with total lysates prepared in the lysis buffer provided in the kit and diluted to 0.35 mg/mL prior to incubation on the slides. Slide images were captured using Odyssey Fc System (Li-Cor Biosciences) and analysed with ImageStudioLite v.5.2.5. The experiment was performed on total protein lysates from two independent growths, with densitometric analysis performed using ImageStudioLite. Signals were normalised towards the positive control, followed by the relative comparisons across all four conditions. Graphing and statistical analyses used Prism 7, unpaired Student's t-test and p-value cut-off = 0.05.
Immunoprecipitation for both FGFR3 and RT112FUS were performed as in [5]. Immunoprecipitation of Topoisomerase IIα used 100 ug of starting material. Total lysates and the anti-TOPO IIα antibody (Cell Signaling, Hitchin, Hertfordshire, UK) were mixed in 100:1 ratio (total lysate : antibody) and incubated over night at 4°C with gentle agitation. Protein A sepharose beads (Invitrogen, Thermo Fisher Scientific, UK) were added to the mixture and incubated for three hours under slow agitation. After a brief, gentle centrifugation (1 minute at 2000 g), the unbound fraction was removed, beads washed 5 times with lysis buffer (RIPA) and eluted by boiling the sample for 5 minutes in 3X Lamely sample buffer containing the reducing agent. Eluted samples were divided into two aliquots of equal volumes, with western blots to assess levels of expression of TOPO IIα and pS 1469 TOPO IIα (detected with the Cell Signaling, Hitchin, Hertfordshire, UK antibody).

Effective networks for C1 and C3
We modelled the signalling context of the kinase and substrate seed proteins relevant to each of the conditions C1 and C3 by creating a comprehensive human signalling network (HSN) from the pathway information collected in the Pathway Commons database [6]. All human proteins and their regulatory and physical interactions were included in the human signalling network with special attention to protein phosphorylation -we considered the Pathway Commons binary relations: "controls-expressionof", "interacts-with", "controls-phosphorylation-of", "controls-state-change-of" and "in-complex-with" as explained in www.pathwaycommons.org/pc2/formats.
We derived an effective network from the HSN for a set of 527 seed proteins (comprising all distinct proteins in the QP dataset and all those predicted to interact with any of the QP dataset phospho-sites according to the NetworKIN database). This effective network is a subnetwork of the HSN that contains the seed proteins and those directly affected by them, which are identified by applying a commute-time kernel (CT kernel) on the HSN.
The CT kernel matrix of the HSN integrates the network topology on the distance measures between nodes. Therefore, the CT-kernel provides a richer measure of the network distance indicating how effective the connection is between any two nodes in the HSN [7]. We mapped protein seeds onto the CT kernel and gradually incorporated proteins effectively connected to the seeds. To define the threshold of the CT kernel at which two proteins are considered effectively connected, we used the articulation nodes. The articulation nodes of a connected network are the minimum set of nodes that need to be removed in order to make it disconnected. In terms of 'information flow', any two nodes on a connected network will have a path connecting them, and any information flowing between them can be disrupted by removing a (relatively small) subset of nodes on the path; these articulation nodes are the essential elements in keeping the network connected and maintaining the integrity of information flow through it [8].
Starting from the 527 protein seeds, more network neighbours were gradually included by relaxing the threshold of the CT kernel-based distance. For each of these CT-kernel cut-offs, a number of network neighbours that are articulation nodes, plus other neighbours that are not essential to the network's integrity were included. A CT kernel cut-off that maximises the ratio of articulation nodes to total nodes was used to define the effective network, as shown in Illustration 1.

Illustration 1. Choice of cut-off for the CT kernel.
The ratio of articulation to total nodes for each CT cut-off indicates the optimal choice in order to maximise the number of important articulation nodes, while minimising non-essential nodes.
From the overall CT kernel HSN containing 19,446 proteins and 755,299 interactions, the final effective network obtained using the 527 seeds and CT cut-off 10% contained 4,991 proteins and 308,564 edges. Further filtering was applied to the effective network to remove proteins not found in either urothelial cells or urothelial tumour cells (using Human Protein Atlas data -method described below) resulting in a final connected filtered effective network comprising 4,716 proteins and 281,403 interactions.

Generating detailed local networks
We identified Reactome pathways that were significantly differentially modulated when comparing FGFR3-TACC3 fusion cell signalling (i.e. condition C1) with FGF-induced signalling in FGFR3b-expressing cells (i.e. the wild-type model, condition C3). Detailed local networks were created by overlaying multiple sources of network information, including the Reactome database and a published functional interactome (Wu et al 2010) alongside other regulatory and physical interactions annotated in the effective network. Each of the detailed network figures (Supplementary Figures 5 and 6) was generated by expansion around the associated differential pathway proteins as outlined below.
Create an induced subgraph on the filtered effective network using the set of "seed" proteins connected to the relevant differential Reactome pathways, plus the proteins not found in Reactome during differential network analyses (as these are still part of the effective network) (Illustration 2).
Extend the network using the ReactomeFIViz tool [9]. This Cytoscape plugin uses a set of curated functional protein-protein interactions and a supplied list of proteins (here, all proteins in [1]). Proteins that link seeds to the non-Reactome proteins were found using the plugin's 'get linkers' option, alongside functional annotation of the interactions (Illustration 3).
Add predicted substrate-kinase and substrate-SH2 interactions from both the QP data and NetworKIN. All QP phospopeptide proteins and the predicted kinase/SH2 interactors were added to network [2] where either the substrate or the kinase is a member of network [2].
Filter interactions. The resulting network [3] was simplified by removing redundant or non-informative interactions where there are multiple edges between two proteins. Further filtering removed proteins that were weakly connected -i.e. if they had degree one and were not connected to any QP, HC or proteome proteins.
Annotate network at protein or interaction-level to indicate important features, e.g. by representing QP or HC proteins with bold outline, or by highlighting phospho-site interactions that significantly differ in their quantitation ratios between this condition (e.g. C3) and its comparator (i.e. C1) (Supplementary Figures 5 and 6).
Detailed networks generated for two differential pathways are presented in Supplementary Figures 5  and 6.

HPA processing
We obtained data from Human Protein Atlas (HPA) for normal tissue (downloaded 10th June 2016) and tumour (downloaded 17th June 2016). Protein expression levels for normal tissue data were used to filter out proteins from C3-based networks, by first filtering the HPA normal dataset by tissue/cell-type (tissue="urinary bladder" and cell type="urothelial cells") and then identifying proteins with undetectable expression levels from either annotated protein expression (APE) or staining experiments where the reliability was classed as 'supportive'. Tumour expression was used to filter out proteins from C1-based networks by first filtering HPA tumour data (tumour = "urothelial cancer"). HPA reporting for tumours uses antibody based profiling for the 15,297 genes with anitbodies available, reporting expression levels (not detected, low, medium and high), the number of patients with this protein/expression level and the total number of patients for each tumour type. To filter C1 networks, proteins having undetectable expression levels across all patients were removed.
Proteins that were not expressed according to HPA, but were found in the relevant C1 or C3 proteome datasets were not filtered out; this reflects that HPA protein expression in urothelial/tumour cells may not exactly match that of our NHUC cells.

NetworKIN data import and processing
NetworKIN data downloaded and imported via the full tab-separated file (5th Feb 2016), comprised 459,749 rows specifying individual phosphosites on substrates, predicted kinases, SH2 domains, confidence scores and Ensembl identifiers [10].
Ensembl identifiers were mapped to UniProt accessions using UniProt ID mapping file from ftp site (below) on 18th June 2016. ftp://ftp.uniprot.org/ pub/databases/uniprot/current_release/knowledgebase/ idmapping/by_organism/ All subsequent SQL-based queries filtered any NetworKIN confidence scores ≤1 to keep only the ~50% best predicted substrate interactors resulting in 145,547 kinase and 135,396 SH2 interactions in the database. Assigned UniProt accessions for each phosphosite (in either QP or HC datasets) were matched by NetworKIN substrate identifier and the sequence similarity score between observed phosphosite and the site's sequence annotated in NetworKIN (using Oracle SQL function UTL_MATCH.EDIT_DISTANCE_SIMILARITY). To reduce the number of irrelevant and/or non-informative matches returned, we further filtered here using HPA.

DIAMOnD runs
We used the script DIAMOnD.py (version 05/12/2014) with parameters n = 200 and alpha = 1 for differential analysis of C1 and C3 pathway enrichments.
These parameter values are consistent with those suggested by DIAMOnD authors as sensible choices over a range of benchmark tests [11].

Database and scripting
We created a custom Oracle 12c database to import, clean and query datasets, with supporting scripts developed in R version 3.3; final network rendering and annotation used Cytoscape v3.4 [12].
Illustration 2: Mapping of differential pathway Reactome seed proteins and non-Reactome proteins to the filtered effective network. (A) Seed proteins (example for C3 pathways 3 & 4) with interactions inferred from the filtered effective network.
(B) Seed proteins and other proteins found from differential pathway analysis but not in Reactome are mapped onto the filtered effective network [Green/blue text: C3/shared seeds for pathway; bold outline: observed in QP and/or HC]. Illustration 3. Functional Interactome (FI) returned by using all differential pathway seeds, non-Reactome proteins and the linkers that connect them. Seed proteins are now functionally connected with the non-Reactome singletons (e.g. SPRY3, SGTA & BCAR3). Weakly connected proteins (i.e. of degree 1 such as BCAR3), although not informative here, are used to query the phosphoproteomics dataset for kinase-substrate interactions [Green/blue text: C3/shared seeds; light grey text: non-Reactome protein; (heavy) bold outline: observed in HC and/or QP; yellow fill: observed C3 proteome; edge colours for HSN as before.]