Reverse phase protein array (RPPA) combined with computational analysis to unravel relevant prognostic factors in non- small cell lung cancer (NSCLC): a pilot study

In this work high throughput technology and computational analysis were used to study two stage IV lung adenocarcinoma patients treated with standard chemotherapy with markedly different survival (128 months vs 6 months, respectively) and whose tumor samples exhibit a dissimilar protein activation pattern of the signal transduction. Tumor samples of the two patients were subjected to Reverse Phase Protein Microarray (RPPA) analysis to explore the expression/activation levels of 51 signaling proteins. We selected the most divergent proteins based on the ratio of their RPPA values in the two patients with short (s-OS) and long (l-OS) overall survival (OS) and tested them against a EGFR-IGF1R mathematical model. The model with RPPA data showed that the activation levels of 19 proteins were different in the two patients. The four proteins that most distinguished the two patients were BADS155/136 and c-KITY703/719 having a higher activation level in the patient with short survival and p70S6S371/T389 and b-RAFS445 that had a lower activation level in the s-OS patient. The final model describes the interactions between the MAPK and PI3K-mTOR pathways, including 21 nodes. According to our model mTOR and ERK activation levels were predicted to be lower in the s-OS patient than the l-OS patient, while the AMPK activation level was higher in the s-OS patient. Moreover, KRAS activation was predicted to be higher in the l-OS KRAS-mutated patient. In accordance with their different biological properties, the Moment Independent Robustness Indicator in s-OS and l-OS predicted the interaction of MAPK and mTOR and the crosstalk AKT with p90RSK as candidates to be prognostic factors and drug targets. Case Report


INTRODUCTION
The identification of druggable driver mutations and rearrangements (e.g., EGFR mutations, ALK, ROS1 and RET-fusions) routinely detected in approximately 15% of NSCLCs has dramatically changed the therapeutic approach for lung adenocarcinoma in the last ten years. However, most lung adenocarcinomas are still treated with standard chemotherapy often showing an opposite behavior in terms of treatment efficacy and survival. Therefore, a better understanding of the multiple defects in signaling pathways, that are dysregulated in these nononcogene addicted NSCLC and play a critical role in their neoplastic phenotype, has the potential to improve outcomes. An important goal of the reverse phase protein array (RPPA), a high throughput proteomic technique, is to provide a map of the signaling pathways involved in cancer survival and progression that are dysregulated in tumor cells. These aberrations could identify novel predictors of response or identify novel targets for therapy [1,2].Since protein profiling can quantify posttranslational modifications (e.g., such as the receptor tyrosine kinases phosphorylation status) that are intimately linked with activation of signaling proteins, signaling network analyses are an important addition to other molecular profiling techniques such as gene expression analysis. The RPPA has been successfully used in NSCLC as well as in other cancers to identify signaling pathway abnormalities, pharmacodynamic markers and proteins associated with therapeutic resistance [3][4][5][6]. Cancer systems biology studies complex interaction networks to identify biomarkers of response and to understand drug resistance along with the effect of combination therapies. It integrates patient or laboratory-based -omics data into robust predictive models [7]. The data obtained with high throughput platforms can be used to create bioinformatics models that can then be used to analyze biological data of a different nature [8]. In this study, we combined RPPA and computational analysis of the EGFR-IGF1R pathways to study two stage-IV lung adenocarcinoma patients treated with standard chemotherapy with marked differences in survival (128 months vs 6 months from diagnosis) and whose tumor samples exhibit a distinctive signal transduction pattern. We integrated RPPA data with a mathematical model previously generated to describe key signaling pathways in NSCLC using robustness analysis as a quantitative measurement indicating the cell ability to maintain its functions against internal and external perturbations. Here we present results that might be of great value though they are derived from the analysis of two cases with divergent behavior.

Case presentation
Case 1. A 67-year-old never smoker woman with a KRAS-mutated (Gln61His), EGFR wild-type (WT), PI3KCA WT stage IV [T3N2M1a TNM v.7.0] lung adenocarcinoma, diagnosed in March 2006 and an OS of 128 months (long survival: l-OS) on standard chemotherapy treatment. Her clinical course is described in Figure 1A.
Case 2. A never smoker 72-year-old woman diagnosed in June 2012 with stage IV [cT3N2M1a TNM v.7.0] lung adenocarcinoma (video-thoracoscopic biopsies) with the molecular phenotype being EGFR WT, KRAS WT, BRAF MUT (V600E), ALK negative, ROS1 negative, RET negative. Her OS was 6 months from diagnosis (short survival; s-OS) with massive systemic disease progression after 3 cycles of first-line chemotherapy with cisplatin plus pemetrexed, sudden worsening of ECOG Performance Status (PS) and death due to disease progression in November 2012 ( Figure 1B).

RPPA and computational analysis
RPPA result analysis showed that 20 endpoints (active sites and/or total proteins) were differently activated in the two patients. Among these, BAD S155/136 and c-KIT Y703/719 proteins had higher activation levels in the s-OS patient while p70S6 S371/ T389 and b-RAF S445 proteins had lower activation levels in the s-OS patient (Table 1 and Figure 2A heat map). The computational model included 21 relevant proteins ( Figure  2B). Thirteen proteins were in our previous mathematical model [9] and 8 proteins were added in the pathways using both literature information and Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Database ( Table 2 and Section 1 in Supplementary Material) [10]. Following the flow diagram in Figure 3A, calibration algorithm, we used BAD, c-KIT, p70S6 and b-RAF to personalize the general model of the patient with s-OS and l-OS, respectively. The calibration procedure based on the Conditional Robustness Algorithm (CRA) [11] and Moment Independent Robustness Indicator (MIRI) ( Figure 3B) produced 10 parameters that had MIRI index higher than the cut-off threshold (Box plot in Figure 4, green threshold). In the validation process ( Figure 3A), we fixed the 10 parameters to fit the RPPA data using their specific induced probability density function (pdf) tails for s-OS and l-OS patients, respectively ( Figure  5 and Table S7 in Supplementary Material). The s-OS patient and the l-OS patient personalized models were simulated in silico and MIRIs for all 19 nodes versus all non-fixed parameters were generated. The heat map in  The records with * are the antibodies used in the model that are related to the EGFR-IGF1R model [9] according to KEGG Pathway Database. * are the antibodies included in the mathematical model ** ratio is in log10 scale www.impactjournals.com/oncotarget   Figure 4 shows the difference between the MIRI of the s-OS patient and l-OS patient. The higher MIRI indicates higher robustness of the network proteins. The overall robustness of the s-OS patient was also higher than the l-OS patient. The dendrogram in Figure 4 clustered the proteins that mainly contribute to the higher robustness of the s-OS patient. These proteins were represented with two main interactions between proteins included in the mTOR and MAPK cascades, namely AKT and p90RSK pathways ( Figure 5). We used the endpoints ERK, AMPK and mTOR, having higher closeness centrality (CC) and eccentricity (E) in the pathway to validate the personalized models (Green nodes in Figure 5 and Section 3 in Supplementary Material). We generated in silico measures of the proteins BAD, c-KIT, p70S6 and b-RAF and of the proteins ERK, AMPK and mTOR used for the calibration and validation, respectively. Figures 6 and 7 show the conditional pdfs for the in silico measures of the evaluation functions that are the area under the curve of the chosen proteins. These evaluation functions are a computational index of the protein activation levels. The model generated the expected distributions for the proteins BAD, c-KIT, p70S6 and b-RAF ( Figure 6). Furthermore, the model predicted their behavior according to RPPA data: mTOR and ERK protein levels were lower in the s-OS than in the l-OS patient, while AMPK protein level was higher in the s-OS than in the l-OS patient as seen in the original data (Figure 7 green boxes). Moreover, the model predicted higher expression levels of KRAS in the l-OS patient who was found to have a KRAS somatic mutation in her tumor sample (Figure 7 red box).

DISCUSSION
This work combined high throughput proteomic analysis, namely RPPA, and computational analysis to study the complex signaling network of two stage IV (cT3N2M1a) lung adenocarcinoma patients treated with standard chemotherapy with an opposite clinical behavior in term of OS. The molecular profiling of their tumors showed that the l-OS patient was KRAS-mutated (Q61H) and the s-OS patient was BRAF-mutated (V600E), but neither of them had access to targeted therapy. The RPPA was used to calibrate the model in both patients and to validate predictions of the model. The main predicted information was the higher overall robustness for the s-OS patient in all the proteins included in the model. This could explain the shorter survival of the s-OS patient whose tumor was more aggressive and the signaling network had a structurally higher probability to maintain the neoplastic phenotype and overcome the effects of standard chemotherapy.
The MIRI applied to each protein in the calibrated models were a measure of robustness of the model. The proteins in the pathway with higher MIRI were the ones that contributed to the overall robustness of the tumor. The s-OS had higher MIRIs than l-OS and this can be associated with a more adaptable resistance of the cancer cells to the therapeutic treatment. Ideally, in the s-OS patient only a multi-target treatment could have helped to obtain a better response. The higher robustness was within two main interactions: the MAPK cascade including KRAS, b-RAF, MEK, ERK and LKB1 that interacts with mTOR and AKT, BAD and Caspase proteins that crosstalk with p90RSK [12][13][14]. While the mTOR and MAPK cascade interactions are known to play a relevant role in lung cancer cell signal transductions [15], to our knowledge the interaction between p90RSK and BAD in the anti-apoptotic process has never been described and might represent a potential prognostic factor in lung cancer.
The computational model helps to interpret RPPA data even when single events are measured.
In the validation process, the s-OS and l-OS pdf of the active proteins were used. The relative values of RPPA data in s-OS and l-OS were comparable with the relative pdf of s-OS and l-OS generated in silico. If the RPPA value of a protein was lower in the s-OS rather than  in the l-OS, we expected the pdf-mean value of the s-OS evaluation function to be lower compared to that of l-OS.
The calibrated models for s-OS and l-OS had 10 parameters fixed in opposite conditions of the induced tails distributions to fit the 4 divergent measured proteins. The other 76 parameters were free to move in the validation process using 10000 samples for each of 100 realizations. Fifteen percent of the active proteins in the pathway influenced the tumor growth of the two patients in opposite ways. The simulations of the calibrated models were stable as can inferred from the small variations of the distributions among the 100 realizations. Our calibration  and validation algorithms were based on only two patient's RPPA data and this could be a critical point of the study. It could be solved by collecting data of similar patients and performing a statistical analysis to select the most divergent proteins. The model prediction of KRAS level being higher in the l-OS patient, together with the somatic mutation and with high ERK activation, could lead to the activation of the senescence program of the cancer cells as has been validated in human colon adenocarcinoma cancer models [16]. The mTOR and ERK protein expression that in RPPA data were lower in the s-OS patient than in the l-OS one was confirmed with simulations of the calibrated models.
The nonlinear interactions in the signaling network produced complex behaviors and the computational approach is crucial to discovering this complexity and transferring results to clinical application. The proposed method could be used to stratify patients with similar clinical pathological tumor classification according to the active protein profiles at diagnosis. Using the most divergent protein expression among the clusters obtained from RPPA data, we can calibrate the network and study the pathways associated with the clusters. The use of the CRA algorithm in the calibrated networks will predict the proteins playing the main roles in cancer robustness and these predictions could be used to increase patient survival with personalized drug treatments. Our approach could be applied to tumors that are genetically similar but have different protein profiles. This could be done by using RPPA technology that is a high sensitivity system and also by using the computational method that has an internal control of the distribution stability among independent realizations generated in silico. However, when the protein expressions of the samples are very similar, our methodology is not applicable.
In conclusion, coherently with their different biological properties, the Moment Independent Robustness Indicator in s-OS vs l-OS patients confirmed the MAPK cascade interaction with mTOR and predicted new crosstalk of AKT, BAD and Caspase proteins with p90RSK that could be a candidate for prognostic factors or drug targets.

Tumor and data collection
Two patients followed at the Medical Oncology Department of the S. Maria della Misericordia Hospital with a diagnosis of stage IV lung adenocarcinoma having a short (s-OS: 6 months) and long (l-OS: 128 months) overall survival, respectively, were identified from the institutional database. Pathologic data, tumor genotype, treatment type, and radiological parameters were gathered from retrospective chart extraction. The pilot study was approved by the local Ethics Committee and was conducted in accordance with ethical principles of the latest version of the Declaration of Helsinki. Written informed consent for RPPA analysis was obtained from the two patients included in the pilot study.

Mathematical model
We developed an ordinary differential equations (ODEs) model starting from the most relevant proteins involved in the signaling pathways studied. Michaelis-Menten kinetics, mass action law and conservation law were combined to write the equations (see Section 1 in Supplementary Material). We selected the most divergent proteins based on the ratio of RPPA values in the two patients (s-OS vs l-OS) and used them to build the ODEs model (Table 1, Figure 2A). We included the proteins inserted in our pre-existing EGFR-IGF1R mathematical model [9] and the pathways were extended using KEGG PATHWAY Database combined with a literature revision in lung cancer pathways ( Figure 2B). The isolated proteins were excluded because they did not contribute to the model dynamics (Table 2 and Section 1 and 2 in Supplementary Material) [10].

Model calibration based on conditional robustness
The CRA proposed in [11] was an iterative procedure to study the dynamical model robustness perturbing the model parameters. CRA is based on the probability distribution of the evaluation function and MIRI that measures the dissimilarity of pdf tails. The evaluation function tails induce for each parameter of the model the pdf tails and their dissimilarity is measured through the MIRI. As an example, Figure 3B shows the induced pdf tails and their MIRI for a node N (proteins) that interact with 3 other nodes through the 3 edges described in the ODEs with 4 parameters: p1, p2, p3 and p4.
The calibration and validation processes are presented in the flow chart in Figure 3B. We selected the 4 most divergent proteins of s-OS and l-OS RPPA data as evaluation functions (Figure 2A) and we performed in silico simulations from the hypercube of parameters space generating the pdf of the selected proteins. We evaluated the MIRI for all parameters using the intersection of the evaluation function tails. The calibration process was repeated for 10 realizations using 100000 samples for each of them (Box plot Figure 4).
After calibration, we fixed 10 parameters with opposite conditions for s-OS and l-OS simulations having higher MIRI (Green line in the box plot Figure  4). We performed a CRA algorithm using as evaluation functions all the proteins in the model and we clustered the difference of s-OS and l-OS MIRIs to explore the most relevant proteins in network robustness (heat map with row dendrograms in Figure 4 and Figure 5). We selected the network proteins with higher closeness centrality (CC) and eccentricity (E) to compare the simulations with RPPA data (Green nodes in Figure 5). We checked the calibration results comparing the conditional distributions in s-OS and l-OS ( Figure 6) and we validated the model predictions comparing the simulation results and RPPA data ( Figure 7 and Section 3 in Supplementary Material).
The mathematical model and the computational analysis were performed using Python programming language (Supplementary Data).

Author contributions
LV, CR, BF, CL: ideation and planning of the study; BL, KAH, PE, PM: Reverse Phase Protein Microarray analysis and interpretation; BG: histopathological evaluation; BS, SA, TFR: DNA extraction and Sanger sequencing; TL, AC, BF: computational model and analysis; LV, CR, BF: drafted the initial version of the manuscript, which was subsequently revised. All authors have read and approved the final manuscript.