EPH/ephrin profile and EPHB2 expression predicts patient survival in breast cancer

EPH receptors and their ligands, the ephrins, can signal both as receptors and as ligands, and the outcome of their complex signaling in cancer is currently under investigation. Previous work has shown that some EPH family members have clinical value in breast cancer, suggesting that this family could be a source of novel clinical targets. Here we quantified the mRNA expression levels of EPH receptors and their ligands, the ephrins, in 65 node-positive breast cancer samples by RT-PCR using TaqMan® Array Micro Fluidics Cards. Hierarchical clustering of the mRNA expression levels identified a subgroup of patients with high expression and poor clinical outcome. EPHA2, EPHA4, EFNB1, EFNB2, EPHB2 and EPHB6 were significantly correlated with the cluster groups; EPHB2 in particular was an independent prognostic factor in multivariate analysis, and its prognostic value was also observed in four public datasets. EPHB2 protein expression was also analyzed by immunohistochemistry in paraffin-embedded material (cohort 2). EPHB2 was detected in the membrane and cytoplasmic cell compartments, and membranous and cytoplasmic EPHB2 were inversely correlated. Membranous EPHB2 predicted longer breast cancer survival in both univariate and multivariate analysis, while cytoplasmic EPHB2 indicated shorter breast cancer survival in univariate analysis. In conclusion, EPH/EFN cluster analysis revealed that high EPH/EFN mRNA expression is an independent prognostic factor for poor survival. EPHB2 in particular predicted poor breast cancer survival in several materials, and EPHB2 protein expression also has prognostic value that depends on its cellular localization.


INTRODUCTION
Breast cancer prognosis and treatment rely mostly on a few markers, such as the estrogen receptor (ER), the progesterone receptor (PgR) and the human epidermal growth factor receptor 2 (HER2/neu), together with tumor stage. Positive ER and PgR expression helps identify patients more likely to benefit from endocrine treatment, while HER2 overexpression and amplification predict response to trastuzumab and lapatinib [1,2]. Despite advances in breast cancer prognosis and treatment [2,3], some patients have a short disease-free survival period, demonstrating the need for better clinical markers.
We therefore set out to screen the EPH receptor family and its ligands. The EPH receptors constitute the largest family of receptor tyrosine kinases and have been implicated in cancer [4][5][6]. The name EPH derives from the erythropoietin-producing hepatocellular carcinoma cell line from which the receptor was first cloned [7,8]. Together with their membrane-bound ligands (ephrins), EPH receptors play a crucial role not only in mammary gland development but also in carcinogenesis [9]. EPH receptors and ephrins influence cell adhesion, cell migration, intercellular junction formation, cell shape, cell motility, cell guidance and pattern formation [10,11].
The EPH family is divided into subclasses A and B based on sequence homology and structural features. Ephrin-A ligands are attached to the plasma membrane by a glycosylphosphatidylinositol (GPI) anchor and preferentially bind EPHA receptors, whereas ephrin-B ligands have a single transmembrane domain and a short cytoplasmic tail and usually bind EPHB receptors [12]. Upon cell-to-cell contact, EPH receptors and ephrins interact and transduce signals bidirectionally. Signals are defined as "forward signals" when they derive from EPH receptors present in epithelial cells and as "reverse signals" when they are transmitted by ephrin ligands expressed, for example, by endothelial cells. EPH-ephrin interactions are usually restricted to members of the same class, but hetero-dimerization between EPHA and EPHB members and ephrins also takes place [13].
EPH receptors can also "cross talk" with other signaling molecules [14] and receptor tyrosine kinases (RTKs) [15]. EPH-ephrin interactions are therefore believed to be complex and promiscuous, affecting both the normal and the malignant epithelium [9].
Among the family members, EPHA2 and EPHB4 are the most studied in breast cancer, and EPHA4, EPHA7 and EPHB6 additionally emerged as promising clinical candidates in an expression profile of the individual EPH and ephrin family members [16]. Here, we explored whether a cluster analysis of EPH/ephrin gene expression levels would reveal patient subgroups with different clinical outcomes. For this purpose, EPH/EFN gene expression was quantified using TaqMan® Array Micro Fluidics Cards containing 21 EPH/ephrin family members, and the patients were then grouped according to their gene expression levels. This approach, which differs from the one used in a previous study [16], allowed us to identify a subgroup of patients with higher expression levels of the EPH/EFN genes and more frequent relapse than the rest of the patients. Beyond the previous report, we also found that EFNB1, EFNB2 and EPHB2 were interesting candidates because of the strong correlation between these genes and the cluster groups. EPHB2 was identified as an independent prognostic factor in multivariate analysis, and we therefore also investigated EPHB2 expression at the protein level. EPHB2 was found in the cell membrane and the cytoplasm of the tumor cells. However, membranous and cytoplasmic EPHB2 were inversely correlated and indicated different patient prognoses: positive membranous EPHB2 was coupled to better prognosis, while cytoplasmic EPHB2 was associated with shorter disease-free survival. This finding suggests that the cellular localization of EPHB2 introduces another level of complexity.
In conclusion, we confirmed the clinical value of EPHA2, EPHB4 and EPHB6. We also suggest that EFNB1 and EFNB2 could be additional interesting candidates, and we reveal the clinical value of EPHB2 as a potential prognostic marker in breast cancer.

RESULTS
Expression of the EPH/EFN gene family (cohort 1)
Gene expression levels were quantified in the first patient cohort (Fig. 1). mRNA for all analyzed genes was detectable in the cell line pool used as the reference sample. More than 90% of the tumors expressed mRNA for EFNA1, EFNA2, EFNA3, EFNA4, EFNA5, EFNB1, EFNB2, EFNB3, EPHA1, EPHA2, EPHA3, EPHA4, EPHA7, EPHB1, EPHB2, EPHB3, EPHB4 and EPHB6. However, EPHA5, EPHA6 and EPHA8 mRNA was detected in <40% of the tumors, and although EFNA2 mRNA was present in most tumors, it was poorly expressed with high variance. Relative mRNA expression levels of the analyzed genes, except EPHA5, EPHA6, EPHA8 and EFNA2, are shown in Fig. 2A. EPHB1 showed the highest relative mRNA expression in the breast cancer samples and EPHA2 the lowest.

Cluster and statistical analyses (cohort 1)
Unsupervised hierarchical clustering was used to group the patients in cohort 1 according to their expression levels of the EPH/EFN gene family. To obtain a clinically homogeneous cohort, only the 65 patients with lymph node infiltration were included. The hierarchical clustering divided the patients into two main clusters. The patients in the smaller cluster 2 (n=22) generally expressed the EPHA2, EFNB1, EFNB2, EPHB2, EPHB1, EPHA4, EPHB6, EPHA1, EFNA4, EFNA1, EFNA3, EPHA7, EFNB3, EPHB4, EPHB3 and EFNA5 genes at higher levels than the patients in the larger cluster 1 (n=43) (Fig. 2B).
A categorical variable was assigned to each patient describing whether the patient belonged to the "high expression" cluster (cluster 2, 34% of the patients) or to the "low expression" cluster (cluster 1, 66% of the patients). Spearman rank correlation was then used to test the correlation between the cluster groups and the expression levels of individual EPH/EFN genes. The strongest correlations with the cluster groups were observed for EPHA2, EFNB1, EFNB2, EPHB2, EPHA4 and EPHB6 (P<0.000001), indicating that these genes are the most representative members of this cluster. However, no other known clinical variable was associated with the "cluster groups" categorical variable (Table 1). Among the EPH members, EPHB2 mRNA expression was positively associated with HER2 protein expression.
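Spearman's rank correlation is Pearson's correlation computed on ranks; because the cluster variable takes only two values, most of its values are tied and receive averaged ranks. The sketch below is a minimal pure-Python illustration of that textbook definition, not the R code used for the analysis. The function names and toy values are hypothetical, and it returns only ρ; the P values reported above would additionally require a significance test.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend `end` across the run of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy data: cluster label (1 = low, 2 = high) vs. relative mRNA level
cluster = [1, 1, 1, 2, 2]
expression = [0.5, 0.8, 0.6, 2.1, 1.9]
print(round(spearman_rho(cluster, expression), 3))  # 0.866
```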

Survival analysis (cohort 1)
Univariate Cox proportional hazards regression and Gehan's Wilcoxon test (included in the Kaplan-Meier plots) were used to assess whether recurrence-free survival time differed between the patients in cluster 2 and the patients in cluster 1. Four endpoints were analyzed: distant recurrence-free survival, breast cancer survival, local recurrence-free survival and total recurrence-free survival (time from surgery to development of local or distant recurrence) (Fig. 3A, 3C, 3E, 3G).
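Gehan's generalized Wilcoxon test compares two survival distributions by scoring each pair of patients as +1, -1 or 0, depending on whether one patient is known, given censoring, to have outlived the other. The sketch below assumes Mantel's permutation-variance formulation and a normal approximation for the P value; it is an illustration only (the original analysis was run in R), and the function name and toy data are hypothetical.

```python
import math


def gehan_wilcoxon(times, events, groups):
    """Two-sample Gehan generalized Wilcoxon test (Mantel's formulation).

    times  : follow-up time for each patient
    events : 1 if the endpoint occurred, 0 if the patient was censored
    groups : 1 for the group of interest (e.g. cluster 2), 0 for the other
    Returns (W, z, two_sided_p); W > 0 means group 1 tends to survive longer.
    """
    n = len(times)

    def outlived(i, j):
        # i is known to have outlived j only if j had the event first; a tie
        # counts when i was censored at the time of j's event.
        if not events[j]:
            return False
        return times[j] < times[i] or (times[j] == times[i] and not events[i])

    # U_i = (# patients i surely outlived) - (# patients surely outliving i)
    scores = []
    for i in range(n):
        u = sum(1 for j in range(n) if j != i and outlived(i, j))
        u -= sum(1 for j in range(n) if j != i and outlived(j, i))
        scores.append(u)

    n1 = sum(1 for g in groups if g)
    n2 = n - n1
    w = sum(u for u, g in zip(scores, groups) if g)
    variance = n1 * n2 * sum(u * u for u in scores) / (n * (n - 1))
    if variance == 0:
        raise ValueError("no informative pairs: the statistic is undefined")
    z = w / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal-approximation P value
    return w, z, p


# Hypothetical data: six uncensored patients, the last three in the group of interest
w, z, p = gehan_wilcoxon([1, 2, 3, 4, 5, 6], [1] * 6, [0, 0, 0, 1, 1, 1])
print(w, round(z, 3))  # 9 1.964
```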
Furthermore, a multivariate Cox analysis (Table 2) showed the independent prognostic value of the cluster groups and of EPHB2 with the covariates treatment, tumor size, HER2 protein expression and ERα. EPHB6, which also had a high impact on the cluster separation, had prognostic value in univariate analysis. However, in multivariate analysis with EPHB2 and the cluster groups as covariates, only EPHB2 remained significant (Table 3 and Fig. 3B, 3D, 3F, 3H).
We next explored the potential clinical value of EPHB2 in four public datasets (Fig. 4A-4D) and found that EPHB2 has prognostic value in other patient cohorts, even for patients without lymph node infiltration.
Therefore we continued exploring the role of EPHB2 at the protein level in a larger patient material (cohort 2, Fig. 1).

EPHB2 protein expression (cohort 2)
EPHB2 protein expression was determined by immunohistochemistry with a polyclonal rabbit anti-EPHB2 antibody raised against recombinant EPHB2. For the immunoblot, a commercial lysate of HEK293 cells expressing the extracellular domain of human EPHB2 (EPHB2-Fc) was used as positive control. Lysates from mouse brain and from human colorectal cancer cells with reported EPHB2 expression (HCT116, SW620) were also used, and we additionally detected EPHB2 in MDA-MB-231, MDA-MB-468 and T47D breast cancer cells. The immunoblot (Fig. 5A) shows that the antibody recognizes a single protein band in all samples, including the HEK293 positive control. In the brain lysate, the detected band matches the predicted molecular weight of slightly above 100 kDa. In the cancer cells, however, a 75 kDa band was visualized; LC-MS/MS analysis confirmed that this band was also EPHB2 (see Supplementary Methods, Supplementary Table S3 and Supplementary Fig. S1). To further validate the antibody, in addition to using a blocking peptide (result not shown), we knocked down EPHB2 expression in SW620 cells with siRNAs. Fig. 5B shows that EPHB2 (75 kDa) is detected by the rabbit polyclonal antibody in the control siRNA-treated cells but hardly at all in the EPHB2 siRNA-treated cells, owing to knockdown of the EPHB2 protein.
Figure 2: (A) Relative mRNA expression levels of the EPH/EFN genes in the breast tumors, relative to the expression levels in the cell line pool. The box plot shows expression levels for the EPH family members expressed at detectable levels in clinical samples; the red line (y=1) corresponds to the relative expression levels of the EPH/EFN genes in the cell line pool. (B) Hierarchical clustering showing that the patients were grouped into cluster 1 (n=43), with low to medium mRNA levels of the EPH family members, and cluster 2 (n=22), with high mRNA expression. Numbers below the heat map represent anonymous patient identification; red inserts in the upper blue bar indicate total recurrences; the color key shows low to medium expression in blue and high expression in red. Both graphs were built in R.
To confirm that the anti-EPHB2 antibody was specific and suitable for studies in paraffin-embedded material, we used paraffin-embedded HCT116 cells pre-treated with control or EPHB2 siRNAs. Strong membranous staining was observed in the control cells (Fig. 5C), compared with a negative or weak EPHB2 signal in the knockdown cells (Fig. 5D).
In the clinical material, some tumors were negative for EPHB2 (Fig. 6A), while others showed cytoplasmic staining (Fig. 6B) or membranous staining (Fig. 6C). For cytoplasmic staining, 60% of the tumors were classified as positive (scored as C>0), and 26% presented strong membranous staining (scored as M=2). Only 7% of the tumors presented nuclear staining (result not shown).

Survival analysis (cohort 2)
Positive cytoplasmic EPHB2 expression was associated with poor patient survival, whereas positive membranous EPHB2 was a good prognostic factor in univariate analysis. High cytoplasmic EPHB2 expression predicted shorter distant metastasis-free survival, H.R. Multivariate Cox proportional hazards regression, adjusted for the well-known clinical variables ER, HER2, lymph node status, tumor size and treatment, showed that membranous EPHB2 was an independent predictor of breast cancer-free survival, in addition to lymph node status and tumor size. Membranous EPHB2 also predicted a lower risk of developing metastasis, with borderline significance (Table 5). Cytoplasmic EPHB2, albeit indicating a higher risk of breast cancer death and metastasis, was not an independent prognostic factor in multivariate analysis.
EPHB2 did not have predictive value for patients randomized between radiation treatment (RT) and CMF. Nevertheless, patients with positive membranous staining or negative cytoplasmic staining in the tumor cells did not receive a clear benefit from RT compared with CMF in terms of local recurrence-free survival (Fig. 7).

DISCUSSION
In this study, unsupervised hierarchical clustering of EPH/EFN mRNA expression levels identified two patient clusters: the patients in cluster 2 were characterized by high EPH expression, were more likely to relapse with local and distant metastasis and had a shorter breast cancer-free survival time. EPHA2, EPHA4, EFNB1, EFNB2, EPHB2 and EPHB6 were the genes most strongly correlated with the cluster groups, and expression of EPHA2, EPHB2, EPHB6, EFNB1 and EFNB2 in particular was strongly associated with the cluster variable and coupled to poor outcome. EPHA2 has previously been coupled to poor patient survival [15,18,19] and to trastuzumab [15] and tamoxifen resistance [20]. Other receptors, such as EPHA7, failed to show clinical value in this study despite a previous association with shorter recurrence-free and overall survival [16]. EPHA7 can be affected by promoter methylation, which may explain its downregulation in human tumors [21]. Here, we also found low levels of EPHA7 expression, although it was detectable in more than 90% of the samples. In general, we detected most of the EPH family members, except EPHA5, EPHA6 and EPHA8, which were present in fewer than 40% of the tumors. EPHA5 promoter methylation has also been reported. Low EPHA5 expression is associated with high tumor grade, lymph node metastasis and PgR-negative status in breast cancer, indicating that downregulation of EPHA5 could be an important step in tumor progression [22]. Further studies are needed to unravel the significance of the lower expression of EPHA6 and EPHA8.
[Figure legend fragment: … with the software Aperio ImageScope v.12.2, and the panels were assembled in Adobe Photoshop CS5 Extended v.12.]
In our study, the low mRNA levels of EPHA5, EPHA6 and EPHA8 were not investigated further, as Ward's algorithm used in the cluster analysis did not take into account genes with more than 50% missing data. Multivariate analysis with the covariates treatment, tumor size, HER2 status and ERα revealed that the cluster variable was an independent prognostic marker. However, adding EPHB2 to the multivariate analysis showed that EPHB2 was the strongest prognostic factor for most of the survival endpoints. EPHB2 was one of the genes significantly associated with the cluster variable and was therefore chosen for confirmation in other patient cohorts.
The clinical role of EPHB2 in breast cancer is not well established. High EPHB2 protein expression has been associated with shorter overall survival [23]. Those authors reported high cytoplasmic EPHB2 protein and mRNA expression in 51% of the tumors, while we found high EPHB2 gene expression in 52% of the cases (cohort 1). EPHB2 was recently found to be a target of TGFβ3-mediated invasion and migration [24], which is in line with increased EPHB2 protein levels in invasive carcinomas. A recent model also suggests that EPHB2 could mediate invasion in cells with defective apoptotic machinery via the pro-survival role of autophagy [25]. The invasive properties of EPHB2 were kinase-dependent, suggesting interactions with an ephrin ligand or another receptor. Indeed, promiscuous interactions between EPH receptors and their ligands with opposite outcomes for tumor progression have been reported [11,26]. For instance, a recent study proposed that EPHB6 could decide the fate of the tumor by interacting with other receptors such as EPHB2 and EPHA2 [27]. EPHB6 is a kinase-dead receptor that may sequester kinase-active EPH receptors, turning off oncogenic signaling. Although EPHB2 was the most promising candidate in our breast cancer cohorts, EPHB2 together with EPHB6, EPHA2, EPHA4 and the ligands EFNB1 and EFNB2 were important for defining prognostically relevant clusters. This suggests that EPHB2 should be studied in combination with these other factors, especially EPHB6, which seems to be coupled to invasion upon re-expression in breast cancer cell lines [28,29] and to adverse prognosis in breast cancer [16], although the EPHB6 gene seems to be methylated in cancer. To our knowledge, data on the clinical value of EFNB1 and EFNB2 are scarce, although high ephrin-B1 protein expression seems to be involved in the development of brain metastasis from the primary breast tumor [30] and in shorter patient overall survival [31].
Furthermore, the prognostic value of EPHB2 mRNA levels was statistically demonstrated in the van de Vijver [32], Uppsala [33] and Karolinska [34] datasets, while a trend was seen in the Esserman-Perou cohort [35]. We also assessed EPHB2 protein expression in a larger, randomized patient material (cohort 2). EPHB2 was mainly located in the cytoplasm (60% of the tumors) and the cell membrane (26%), although 7% of the tumors presented nuclear staining. Cytoplasmic expression was inversely associated with membranous expression and positively correlated with HER2 protein expression, in agreement with the results from cohort 1. Cytoplasmic EPHB2 was also positively correlated with high Nottingham grade. Cytoplasmic EPHB2 predicted shorter breast cancer survival and tended to indicate shorter metastasis-free survival in univariate analysis. In contrast, membranous EPHB2 was not associated with any known clinical variable and was a good prognostic indicator for breast cancer survival in both univariate and multivariate analysis, and for metastasis-free survival in univariate analysis. These findings suggest that it might be important to distinguish between cytoplasmic and membranous EPHB2 before making clinical decisions.
Regarding nuclear localization, EPHB2 has not previously been reported as a nuclear protein. However, according to the online tool NLStradamus with the default prediction cutoff of 0.5, EPHB2 has a nuclear localization signal (NLS) between aa 1017 and 1033, corresponding to the sequence GKKKGMGKKKTDPGRGR. In addition, EPHB4 has been detected in the nucleus of prostate cancer cells [36], and other authors state that receptor tyrosine kinases can reach this cell compartment [37] through several mechanisms, including receptor internalization upon ligand binding and enzymatic cleavage. According to the freely available prediction algorithm PSORT II [38], EPHB2 could be present in the Golgi apparatus, endoplasmic reticulum and cell membrane, in line with our results. Specificity of the EPHB2 antibody might be an issue; however, the antibody used in this study was extensively validated using several techniques. Still, it is important to consider that EPHB2 has several transcript variants and could undergo posttranslational modifications affecting both protein function and cellular localization. These factors should not be underestimated.
Finally, we took advantage of the randomized design to test the predictive value of EPHB2 and found an inverse trend between membranous EPHB2 expression and response to radiotherapy.
In summary, we found that the EPH receptors and the ephrin ligands are potential clinical candidates, especially EPHA2, EPHA4, EFNB1, EFNB2, EPHB2 and EPHB6 and their co-expression in breast cancer. EPHB2, although poorly investigated, has shown promise as a prognostic marker in breast cancer, but further studies of its protein expression and localization are needed.

MATERIALS AND METHODS
Clinical materials
All tumor samples were collected during the Stockholm clinical trial (1976-1990) [39]. The trial included premenopausal and postmenopausal women with unilateral, operable breast cancer. The surgical procedure was modified radical mastectomy. Further inclusion criteria were either histologically verified lymph node metastasis or a tumor diameter exceeding 30 mm, measured on the surgical specimen. Patients received either adjuvant chemotherapy or radiotherapy, and both groups were randomized to tamoxifen or no endocrine treatment. Tamoxifen was administered postoperatively at a dose of 40 mg daily for 2 or 5 years. Patients in the chemotherapy group received 12 courses of cyclophosphamide, methotrexate and 5-fluorouracil (CMF) according to the original Milan protocol (100 mg/m² cyclophosphamide orally on days 1-14, and 40 mg/m² methotrexate and 600 mg/m² 5-fluorouracil intravenously on days 1 and 8). During the first 18 months of the trial, however, 10-15 mg chlorambucil was administered orally on days 1-8 instead of cyclophosphamide, and, to avoid dose reductions, a treatment time of up to 18 months was allowed for the 12 courses. Patients randomized to radiation treatment (RT) received a dose of 46 Gy in 2 Gy fractions, 5 days a week. Total treatment time was about 4.5 weeks, and the target volume included the chest wall, the axilla, the supraclavicular fossa and the internal mammary nodes.
In this study we included two patient cohorts from the Stockholm trial (Fig. 1). Cohort 1 originally comprised 679 postmenopausal patients; tumor tissue was available from 282 of them and RNA from 90. We included the 70 patients with good RNA quality, and of these, the 65 patients with lymph node infiltration were analyzed. Some of the clinical variables used here were described in previous studies: ER [39], ERBB2 gene amplification [40], S-phase fraction, HER2 protein levels [41] and pAKT [41].
Cohort 2 initially comprised 547 premenopausal patients, of whom the 216 patients with available tumors were included. These tumors were paraffin-embedded and arranged on tissue microarrays (TMA), allowing detection of protein expression by immunohistochemistry. The characteristics of the patients in cohorts 1 and 2 did not differ significantly from those of all patients included in the Stockholm trial (Supplementary Table S1). The retrospective studies on tumor tissues were approved, with amendments, by the Research Ethics Committee at the Karolinska Institute (dnr 97-451).
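As an arithmetic check of the radiotherapy schedule described above (our own calculation from the stated dose and fractionation), 46 Gy delivered in 2 Gy fractions on 5 days a week corresponds to:

$$
\frac{46\ \text{Gy}}{2\ \text{Gy per fraction}} = 23\ \text{fractions},
\qquad
\frac{23\ \text{fractions}}{5\ \text{fractions per week}} = 4.6\ \text{weeks},
$$

which agrees with the stated total treatment time of about 4.5 weeks.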

Cell lines
Breast cancer cell lines MDA-MB-231 (HTB-26), MDA-MB-468 (HTB-132) and T47D (HTB-133) and colorectal cancer cell lines HCT116 (CCL-247) and SW620 (CCL-227) were purchased from the American Type Culture Collection (ATCC) and tested for mycoplasma using the PCR Mycoplasma Test Kit I/C from PromoKine (PromoCell GmbH, Germany). Breast cancer cells were cultured in Dulbecco's Modified Eagle's Medium supplemented with 4% fetal bovine serum (FBS), penicillin and streptomycin. SW620 cells were cultured in Eagle's Minimum Essential Medium supplemented with 2 mM L-glutamine and 10% FBS, and HCT116 cells in McCoy's medium supplemented with 10% FBS. The mouse brain cell lysate was a kind gift from Ravi Kumar Dutta.

Antibody validation: siRNA and immunoblot
HCT116 and SW620 cells were transfected with a pool of EPHB2 siRNAs or a negative control siRNA using the Amaxa Nucleofector 2B and Nucleofection Mix Solution V (Lonza), following the manufacturer's instructions, and were harvested after 7 days. The EPHB2 siRNAs (Silencer Select s4740 and s4741, ThermoFisher Scientific) were pooled at 300 nM, and the AllStar Negative Control (SI03650318, Qiagen) was also used at 300 nM. After transfection, cell lysates were prepared in RIPA buffer containing protease inhibitors (Complete Mini, Roche), and protein concentration was measured with the bicinchoninic acid (BCA) assay (Pierce Biotechnology). Total protein (30 μg per well) was loaded on the gel. For the immunoblot, the primary antibodies rabbit anti-EPHB2 (1:500; Cat # AP7623d, Nordic BioSite) and anti-beta actin (1:1000; Cell Signaling) were diluted in blocking buffer (TBS-0.1% Tween 20/5% milk) and incubated at 4°C overnight. Horseradish peroxidase (HRP)-conjugated secondary antibodies (DAKO) were incubated for 1 h at room temperature. Proteins were visualized with HyGLO chemiluminescent HRP antibody detection reagent and developed on BioMax Light film (Carestream Health).
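Loading the same amount of total protein per lane requires converting each BCA concentration into a lysate volume. The sketch below is a hypothetical helper, assuming concentrations in µg/µL and that each sample is brought to a common volume with RIPA buffer (the text does not state how volumes were equalized); the sample names and concentrations are invented for illustration.

```python
def loading_plan(concentrations_ug_per_ul, target_ug=30.0):
    """Return {sample: (lysate_ul, buffer_ul)} so that every lane receives
    target_ug of total protein in the same volume."""
    for name, conc in concentrations_ug_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{name}: concentration must be positive")
    lysate = {name: target_ug / conc for name, conc in concentrations_ug_per_ul.items()}
    # the most dilute lysate sets the common volume
    common_volume = max(lysate.values())
    return {name: (vol, common_volume - vol) for name, vol in lysate.items()}


# Hypothetical BCA readings for control- and EPHB2-siRNA lysates
plan = loading_plan({"siControl": 2.0, "siEPHB2": 1.5})
print(plan)  # {'siControl': (15.0, 5.0), 'siEPHB2': (20.0, 0.0)}
```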

Paraffin embedding
HCT116 cells at 80-90% confluence were harvested, and the pellet was fixed with 4% formaldehyde at room temperature for 25 min. The cells were stained with hematoxylin and centrifuged at 1200 rpm for 2 min, followed by progressive dehydration in ethanol at 70% (overnight), 95% (1 h) and 99.5% (1 h). Finally, xylene was added to the pellet for 30 min, and after centrifugation the cell pellet was embedded in paraffin at 56°C overnight. The embedded cells were cut into 4 μm sections using a microtome, and the slides were stained following the immunohistochemistry protocol described below.

Gene expression profile
Quantitative real-time RT-PCR was performed using TaqMan® Array Micro Fluidics Cards (Applied Biosystems, Life Technologies, UK) that included 21 EPH family members (EPHA1-A8, EPHB1-B4, EPHB6, EFNA1-A5 and EFNB1-B3) and two endogenous controls, glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and hypoxanthine phosphoribosyltransferase 1 (HPRT1) (details of the array are given in Supplementary Table S2). Each card included 8 samples: 7 breast tumors and an internal standard consisting of a pool of 7 cell lines. Samples were run in duplicate.
A cDNA equivalent of 200 ng RNA was adjusted to 52 μL with RNase-free water and mixed with 51 μL of TaqMan® Universal Master Mix II with uracil-DNA glycosylase (UNG) (Applied Biosystems, Life Technologies, UK). The mixture was loaded into one slot of a TaqMan® Array Micro Fluidics Card. The PCR reaction was run in a 7900HT Fast Real-Time PCR System (Applied Biosystems, Life Technologies, UK).
Relative mRNA expression levels of target genes within a sample were calculated with the ΔΔCt method [42] using RQ Manager version 1.2 (Applied Biosystems, Life Technologies, UK). The cell line pool was used as the reference sample, and HPRT1 was chosen as the endogenous control because of its low expression variation, as confirmed with the geNorm algorithm embedded in StatMiner version 4.2 (Integromics, Spain). Non-amplified wells and duplicates with SD>0.5 were omitted in RQ Manager.
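In the ΔΔCt method, each target Ct is first normalized to the endogenous control in the same sample (ΔCt) and then to the same difference in the reference sample (ΔΔCt), giving a relative quantity of 2^−ΔΔCt. The sketch below assumes near-100% amplification efficiency (as the method itself does) and mirrors the two filters described above; it is not RQ Manager's implementation, the function names and Ct values are hypothetical, and computing the replicate SD with n−1 (as `statistics.stdev` does) is an assumption.

```python
from statistics import mean, stdev

MAX_REPLICATE_SD = 0.5  # duplicates with SD > 0.5 cycles are omitted


def replicate_ct(cts):
    """Mean Ct of replicate wells; None marks a non-amplified well.
    Returns None if no well amplified or the replicates disagree."""
    valid = [ct for ct in cts if ct is not None]
    if not valid:
        return None
    if len(valid) > 1 and stdev(valid) > MAX_REPLICATE_SD:
        return None
    return mean(valid)


def relative_quantity(target, control, target_ref, control_ref):
    """RQ = 2 ** -ddCt.

    target / control: replicate Ct values of the gene of interest and of the
    endogenous control (e.g. HPRT1) in the tumor sample; target_ref / control_ref:
    the same two genes in the reference sample (e.g. the cell line pool).
    """
    cts = [replicate_ct(c) for c in (target, control, target_ref, control_ref)]
    if any(ct is None for ct in cts):
        return None
    t, c, t_ref, c_ref = cts
    delta_delta_ct = (t - c) - (t_ref - c_ref)
    return 2 ** -delta_delta_ct


# Hypothetical Ct values: EPHB2 and HPRT1 in a tumor and in the cell line pool
rq = relative_quantity([26.0, 26.2], [24.0, 24.0], [25.1, 25.1], [24.0, 24.0])
print(round(rq, 3))  # 0.5
```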

Immunohistochemistry (cohort 2)
Tissue microarray (TMA) slides including 216 breast cancer patient samples from the Stockholm trial were incubated for 2 h at 60°C before deparaffinization and antigen retrieval in a PT Link system (DAKO, Denmark). Antigen retrieval was performed at pH 6.0 for 20 min at 97°C. Slides were washed in TBS-0.1% BSA before endogenous peroxidase was inactivated in 3% H₂O₂ for 10 min. Nonspecific binding was blocked with serum-free protein block (Background Sniper, Biocare Medical) for 10 min in a moisture chamber. The rabbit anti-EPHB2 antibody (1:300) was incubated overnight at 4°C. The HRP-conjugated secondary antibody (EnVision+ System-HRP Labelled Polymer Anti-Rabbit, DAKO, Ref#4002) was incubated for 30 min, and a DAB/H₂O₂ solution was used as chromogen and substrate. Cell nuclei were counterstained with Mayer's hematoxylin before stepwise dehydration in 40%, 70%, 95% and 99.5% ethanol and tissue clear. The TMA slides were mounted with Pertex, and images were acquired with an Aperio ScanScope AT Turbo (Leica Biosystems) using a 20x/0.75 NA Plan Apo objective at 20x magnification. Aperio ImageScope v.12 was used for image analysis.

IHC scoring
Staining was evaluated on three separate core biopsies by two independent observers blinded to the clinical data. Sections were re-evaluated in case of disagreement. EPHB2 was mainly visualized in the cell membrane and the cytoplasm; a few tumors also presented nuclear staining.
Cytoplasmic (C) and membranous (M) staining were scored by intensity (negative = 0, weak = 1, strong = 2). The cut-off for positive cytoplasmic staining was C>0, and for positive membranous staining M=2.
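The dichotomization rules above can be written as a small function. This is a hypothetical sketch: it assumes one consensus intensity score per compartment and tumor, because the text does not state how scores from the three cores and two observers were combined.

```python
VALID_SCORES = (0, 1, 2)  # 0 = negative, 1 = weak, 2 = strong


def classify_ephb2(cytoplasmic, membranous):
    """Apply the cut-offs: cytoplasmic positive if C > 0, membranous positive if M == 2."""
    for label, score in (("cytoplasmic", cytoplasmic), ("membranous", membranous)):
        if score not in VALID_SCORES:
            raise ValueError(f"{label} score must be 0, 1 or 2, got {score!r}")
    return {
        "cytoplasmic_positive": cytoplasmic > 0,
        "membranous_positive": membranous == 2,
    }


# A tumor with weak cytoplasmic and weak membranous staining
print(classify_ephb2(1, 1))  # {'cytoplasmic_positive': True, 'membranous_positive': False}
```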

Statistical analyses
The statistical analyses of relative mRNA expression levels in cohort 1 were performed in R version 3.0.2 (R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/). Only patients with lymph node infiltration were included in the statistical analyses, and before the analysis the data were filtered to include only genes with detected expression in more than 60% of the tumor samples (42/70).
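The filtering rule can be sketched as follows. The text gives both "more than 60%" and "42/70", which is exactly 60%, so this hypothetical helper treats the boundary as inclusive (at least 42 of 70 tumors); that choice, the data layout and the names are assumptions.

```python
def detected_genes(expression, min_percent=60):
    """Return genes whose mRNA was detected in at least min_percent of tumors.

    expression maps gene name -> list of relative quantities, one per tumor,
    with None where the gene was not detected.
    """
    kept = []
    for gene, values in expression.items():
        detected = sum(value is not None for value in values)
        # integer comparison avoids floating-point noise exactly at the cut-off
        if detected * 100 >= min_percent * len(values):
            kept.append(gene)
    return kept


# Hypothetical detection patterns in 70 tumors
example = {
    "EPHB2": [1.2] * 70,                 # detected in all tumors
    "EPHA5": [None] * 50 + [0.1] * 20,   # detected in 20/70 tumors
    "GENE_X": [None] * 28 + [1.0] * 42,  # detected in exactly 42/70 tumors
}
print(detected_genes(example))  # ['EPHB2', 'GENE_X']
```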
Hierarchical clustering of relative mRNA expression levels was performed on scaled data (mean = 0, standard deviation = 1) using the complete linkage method and Euclidean distance. Cox proportional hazards regression was used in univariate and multivariate analyses to test whether relative mRNA expression levels correlated with the endpoints breast cancer survival (period from surgery until reported death due to breast cancer), local recurrence-free survival (time from surgery until detection of local recurrence) and metastasis-free survival (time from surgery until detection of distant metastasis). Patient survival was represented with Kaplan-Meier plots.
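The clustering step can be sketched in pure Python: each gene is z-scored across patients (the mean-0, SD-1 scaling described above), patients are compared by Euclidean distance, and clusters are merged by complete linkage until two groups remain. This is an illustrative, unoptimized sketch that follows the complete-linkage description given in this paragraph, with hypothetical names and data; it is not the R code used for the analysis, where the equivalent steps would be `scale()`, `dist()`, `hclust(method = "complete")` and `cutree(k = 2)`.

```python
import math
from statistics import mean, stdev


def scale_columns(matrix):
    """Z-score each gene (column) across patients: mean 0, sample SD 1."""
    columns = list(zip(*matrix))
    scaled = []
    for column in columns:
        m, s = mean(column), stdev(column)
        scaled.append([(v - m) / s for v in column])  # fails if a gene is constant
    return [list(row) for row in zip(*scaled)]


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def complete_linkage(rows, k=2):
    """Agglomerative clustering with complete linkage, stopped at k clusters.
    Returns a list of clusters, each a sorted list of row (patient) indices."""
    dist = [[euclidean(a, b) for b in rows] for a in rows]
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # complete linkage: distance between the two farthest members
                d = max(dist[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Hypothetical relative mRNA levels: rows = patients, columns = genes
data = [[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [5.1, 4.8], [0.9, 1.0]]
print(complete_linkage(scale_columns(data), k=2))  # [[0, 1, 4], [2, 3]]
```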
The statistical analysis of EPHB2 protein expression in cohort 2 was performed with Statistica 64 version 12.0 (StatSoft Inc., USA). Relationships with known clinical variables in breast cancer were tested with the Spearman rank correlation test. Cox regression was used in univariate and multivariate analyses to test for an independent association between EPHB2 protein expression and distant metastasis, local metastasis or death due to breast cancer. Survival probabilities for metastasis-free survival (time from surgery until detection of distant metastasis), local recurrence-free survival (time from surgery until detection of local recurrence) and breast cancer-free survival (period from surgery until reported death due to breast cancer) were calculated by comparing survival in multiple samples and represented with Kaplan-Meier plots. Where needed, the significance level was set to P<0.01 to compensate for multiple comparisons.
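The Kaplan-Meier curves referred to above are product-limit estimates. The sketch below is a minimal, hypothetical implementation of the standard estimator (not Statistica's routine), assuming the usual convention that patients censored at a given time are still at risk at that time; names and data are invented for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit estimator: S(t) = product over event times t_i <= t of
    (1 - d_i / n_i), with d_i events among n_i patients still at risk.
    Returns (time, survival) at each distinct event time."""
    records = sorted(zip(times, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = censored = 0
        # group all patients sharing this follow-up time
        while i < len(records) and records[i][0] == t:
            if records[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # patients censored at t stay in the risk set for that time, then leave
        at_risk -= deaths + censored
    return curve


# Hypothetical follow-up (years) and events (1 = distant metastasis, 0 = censored)
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(t, round(s, 2))  # prints 1 0.8, then 2 0.6, then 3 0.3
```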

Public gene expression datasets
The EPHB2 results were validated in the following gene expression datasets: van de Vijver (n=295) [32], Uppsala (GSE3494, n=236) [33], Karolinska Institute (KI) (GSE1456, n=159) [34] and Esserman-Perou (GSE22226, n=147) [35]. For the statistical analysis, gene expression data were divided into quartiles (q), with q1-3 defined as low and q4 as high expression (van de Vijver and Uppsala), or q1 as low and q2-4 as high expression (Esserman-Perou and KI). When several probes detected EPHB2 mRNA (KI and Uppsala) and the probes were positively correlated, the average of the probe values was used for the analysis.
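The two dichotomization schemes and the probe averaging can be sketched as follows. This is a hypothetical helper, not the code used for the validation: quartile cut-points depend on the interpolation method (here `statistics.quantiles` with `method="inclusive"`), counting values exactly at a cut-off as low is an assumption, and the check that probes are positively correlated is left to the caller.

```python
from statistics import mean, quantiles


def average_probes(probe_values):
    """Average several probes per patient; probe_values holds one list per probe,
    each with one value per patient in the same patient order."""
    return [mean(patient) for patient in zip(*probe_values)]


def dichotomize(values, scheme):
    """Split patients into 'high'/'low' EPHB2 expression by quartile.

    scheme='q4':    q4 high vs. q1-3 low (van de Vijver, Uppsala)
    scheme='q2-q4': q2-4 high vs. q1 low (Esserman-Perou, KI)
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    cutoff = {"q4": q3, "q2-q4": q1}[scheme]
    # values exactly at the cut-off are counted as low (an assumption)
    return ["high" if v > cutoff else "low" for v in values]


# Hypothetical data: two correlated EPHB2 probes measured in eight patients
probe_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
probe_b = [1.2, 1.8, 3.1, 3.9, 5.2, 5.8, 7.1, 7.9]
ephb2 = average_probes([probe_a, probe_b])
print(dichotomize(ephb2, "q4"))
```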