Difference between observed and expected number of involved lymph nodes reflects the metastatic potential of breast cancer independent of intrinsic subtype.

Purpose The poor prognosis associated with metastasis in breast cancer patients highlights the critical need to develop an effective evaluation model for metastatic potential (MP). We hypothesized that MP could also be indicated by primary tumor size and the number of involved lymph nodes (LNs). Methods The expected number of involved LNs is defined as tumor size (cm) divided by 1.5. The effect of the surrogate for MP (defined as the difference between the observed and expected numbers of involved LNs) on breast cancer-specific survival (BCSS) was investigated in the first cohort from SEER (n = 108,814). Validation was performed in another SEER cohort (n = 50,414) and a third cohort (n = 3,755). Results MP is an independent predictor of BCSS in the overall population [hazard ratio (HR) for high MP: 2.92; 95% confidence interval (CI): 2.80–3.03] and in subgroups. The effect of the surrogate for MP on survival was independent of intrinsic subtype, with adjusted HRs of 3.46 (95% CI, 2.02–5.93), 2.30 (95% CI, 1.64–3.24), 4.05 (95% CI, 2.85–5.76), and 1.45 (95% CI, 1.04–2.03) in luminal-A, luminal-B, triple-negative, and HER2-positive subtypes, respectively. Conclusion The difference between the observed and expected numbers of involved LNs serves as an indicator of MP, which is independent of intrinsic subtype and could predict survival. Our findings need further validation.


INTRODUCTION
Breast cancer mortality is typically linked with distant metastasis, which is the most lethal type of recurrence and practically undetectable at the time of diagnosis. [1,2] The poor prognosis associated with metastasis highlights the critical need to better understand the biology of breast cancer metastasis and to develop an accurate evaluation model for metastatic potential (MP) after surgery for patients with operable disease.
With the development of intrinsic molecular subtyping, clinicians use surrogates of intrinsic subtype to estimate MP. Intrinsic subtype correlates well with differences in prognosis, tumor aggressiveness, and response to available therapies. [3][4][5] Compared with luminal-like (luminal-A and luminal-B) tumors, HER2-positive and triple-negative breast cancer (TNBC) are more likely to exhibit higher MP and are associated with markedly worse survival. Unfortunately, it is difficult for clinicians to discriminate different levels of MP within one subtype. Of note, emerging evidence demonstrates that very small tumors with extensive lymph node (LN) involvement can exhibit highly aggressive behavior compared with larger ones; [6] similarly, in the absence of LN involvement, large tumor size can be a surrogate for biologically indolent disease. [7] Consequently, an integrated understanding of LN involvement and tumor size might provide useful information on tumor biology independent of intrinsic subtype.
We hypothesized that, within a given subtype, the MP of breast cancer could be determined by the difference between the observed and expected numbers of involved LNs. First, we measured the quantitative ratio between tumor size and the number of involved LNs in the overall population. Using this ratio, we determined the expected number of involved LNs for a given tumor size. The difference between the observed and expected numbers of involved LNs might serve as a surrogate for MP, which proved to be closely associated with distant disease-free survival (DDFS) and breast cancer-specific survival (BCSS). To perform high-powered statistical analysis, we used the National Cancer Institute's Surveillance, Epidemiology and End Results (SEER) cancer database for testing and validation. We further validated our findings in another independent cohort from Fudan University Shanghai Cancer Center (FDUSCC).

The first cohort from SEER for model establishment and the survival test
In the first cohort, we selected female patients with invasive breast cancer from the SEER database (released in Nov 2012) from Jan-1 1997 to Dec-31 2006. Patients diagnosed after 2006 were excluded to ensure an adequate follow-up time.
Initially, we identified 111,321 patients according to the following inclusion criteria: female, age at diagnosis between 18 and 74 years, surgical treatment with either mastectomy or breast-conserving surgery, AJCC TNM stages I-III, pathologically confirmed invasive ductal carcinoma, at least four axillary LNs dissected, unilateral cancer, known time of diagnosis, breast cancer as the first and only cancer diagnosis, known number of involved LNs, and known tumor size. The following information was also obtained if available: estrogen receptor (ER) and progesterone receptor (PgR) status, histological grade, race, and use of radiotherapy. A few cases with borderline values of ER/PgR were treated as positive because, according to current standards, ER and PgR status is considered positive if there are at least 1% positive nuclei. [8] Although SEER has provided HER2 status since 2010, so that the subtype of each case can be determined from that time onward, the follow-up time for survival in those cases is not yet adequate. SEER did not provide information on chemotherapy and endocrine therapy. There were very few cases with extreme values of tumor size (1.3% of cases were larger than 8.0 cm) or numbers of positive LNs (1.0% of cases had more than 18). To minimize the influence of extreme values, we excluded these cases. In total, 108,814 cases comprised the first cohort.
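The numeric part of these selection rules can be sketched in a few lines of Python. This is only an illustration, not the procedure actually run against SEER: the `Case` record and its field names are hypothetical, and the categorical criteria (sex, histology, stage, surgery type, laterality) are assumed to have been applied beforehand.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical record holding only the numeric selection variables."""
    age: int               # age at diagnosis, years
    nodes_dissected: int   # axillary LNs dissected
    nodes_positive: int    # involved (positive) LNs
    tumor_size_cm: float   # primary tumor size, cm


MAX_TUMOR_SIZE_CM = 8.0    # larger tumors were excluded as extreme values
MAX_POSITIVE_NODES = 18    # cases with more positive LNs were excluded


def is_eligible(case: Case) -> bool:
    """Apply the age, dissection, and extreme-value rules described in the text."""
    return (
        18 <= case.age <= 74
        and case.nodes_dissected >= 4
        and case.tumor_size_cm <= MAX_TUMOR_SIZE_CM
        and case.nodes_positive <= MAX_POSITIVE_NODES
    )
```

For example, `is_eligible(Case(age=52, nodes_dissected=12, nodes_positive=3, tumor_size_cm=2.2))` returns `True`, whereas the same case with 19 positive LNs would be dropped.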
The primary study outcome was BCSS. The cause of death was categorized as breast cancer-specific or nonbreast cancer-related. BCSS was calculated from the date of diagnosis to the date of breast cancer death. Patients who died from other causes were censored at the date of death.
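As a minimal sketch of this endpoint definition, the function below turns one patient's dates into a (time, event) pair for survival analysis. The function and argument names are hypothetical. Censoring surviving patients at the last follow-up date is a standard convention assumed here rather than stated in the text, as is the conversion of days to months.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 30.4375  # average month length; an assumption for illustration


def bcss_outcome(
    diagnosis: date,
    death: Optional[date],
    died_of_breast_cancer: bool,
    last_follow_up: date,
) -> Tuple[float, int]:
    """Return (follow-up time in months, event indicator) for BCSS.

    event = 1 only for breast cancer death; deaths from other causes are
    censored at the date of death, and survivors at last follow-up (assumed).
    """
    end = death if death is not None else last_follow_up
    months = (end - diagnosis).days / DAYS_PER_MONTH
    event = 1 if (death is not None and died_of_breast_cancer) else 0
    return months, event
```

A patient diagnosed on 1 January 2000 who died of another cause on 1 January 2005 therefore contributes about 60 months of follow-up with `event = 0`.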

The second cohort from SEER for survival validation
Using the criteria above, we selected 50,414 patients with invasive ductal breast cancer from SEER between Jan-1 1990 and Dec-31 1997 to validate the preliminary findings in the first cohort. Patients diagnosed before 1990 were excluded because of the lack of hormone receptor data. Because early cases might exhibit different distributions in tumor stage compared with those at present, [9] we did not use this set for linear regression to calculate the ratio of tumor size to the number of involved LNs.

The third cohort from FDUSCC for survival validation
To further validate the findings from the SEER dataset, to determine a direct relationship between the MP category and distant metastasis, and especially to test the performance of the MP category within a given subtype, we used the data from 3,755 consecutive patients diagnosed with operable unilateral breast cancer between Jan-1 1998 and Dec-31 2006 at FDUSCC. This is a well-characterized series of patients whose clinicopathologic and follow-up information was maintained on a prospective basis. [10] Patients' treatments were based on the St. Gallen consensus. [11,12] The cut-off for ER or PgR positivity was ≥ 10% of tumor cells with nuclear staining. Pathologic HER2 status was defined according to ASCO/CAP guidelines. [13] The primary treatment for all of these patients was surgery. Intrinsic subtypes (luminal-A, luminal-B, TNBC, and HER2-positive) were determined according to the clinicopathologic criteria recommended by the St. Gallen panelists. [14] In brief, luminal-A is ER/PgR positive, HER2 negative, and Ki-67 low (< 14%); luminal-B is ER/PgR positive, and HER2 positive or Ki-67 high; TNBC is ER, PgR, and HER2 negative; HER2-positive is ER/PgR negative and HER2 positive. Because information on Ki-67 was not available in earlier cases, we used grade to capture cell proliferation, as suggested by von Minckwitz et al. [15] The outcome of interest was DDFS, which was calculated from the date of diagnosis to the date of first distant metastasis. To determine distant relapse events, patients with isolated local recurrence were followed further until a metastasis event. The research protocol of this part of our study was reviewed and approved by the Ethical Committee and Institutional Review Board of FDUSCC. All patients provided written informed consent.
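The clinicopathologic subtype rules summarized above amount to a small decision tree, sketched below. The function and argument names are hypothetical. `proliferation_high` stands for Ki-67 ≥ 14% or, where Ki-67 is unavailable, a high histological grade; the text does not say which grades count as high, so that mapping is left to the caller.

```python
def intrinsic_subtype(
    hr_positive: bool,         # ER and/or PgR positive
    her2_positive: bool,       # HER2 positive per pathology
    proliferation_high: bool,  # Ki-67 high, or grade used as a proxy (assumption)
) -> str:
    """Assign a surrogate intrinsic subtype from routine pathology markers."""
    if hr_positive:
        # Hormone receptor-positive disease: HER2 or high proliferation
        # moves a tumor from luminal-A to luminal-B.
        if her2_positive or proliferation_high:
            return "luminal-B"
        return "luminal-A"
    # Hormone receptor-negative disease splits on HER2 alone.
    return "HER2-positive" if her2_positive else "TNBC"
```

Proliferation only affects the result for hormone receptor-positive, HER2-negative tumors, which is why it is ignored in the receptor-negative branch.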
The basic characteristics of the patients in the three cohorts are presented in Table 1.

Statistics
The median follow-up times were 92, 174, and 79 months for the first, second, and third cohort, respectively; we therefore reported the 8-year, 15-year, and 6-year rates of survival, respectively.
In the first set, we performed an exploratory analysis of the relationship between tumor size and the number of involved LNs (both as continuous variables). Because we could not rule out a nonlinear relationship, we regressed the number of involved LNs on tumor size using nonparametric regression based on either locally weighted scatterplot smoothing (LOWESS; the command "lowess" in Stata) [16] or the kernel-weighted local polynomial smoothing method (the command "lpoly" in Stata). [17] If a linear relationship was revealed, linear regression was employed to determine the quantitative relationship between tumor size and the observed number of involved LNs. Linear regression was performed with or without adjustments for other clinicopathologic factors. By this procedure, the expected (predicted) number of positive LNs for each case could be calculated. The difference between the observed and expected numbers of involved LNs, treated as a continuous variable, was used as the surrogate for MP. The nonlinear effect of continuous values of the surrogate for MP on BCSS was assessed using a B-spline transformation with evenly spaced knots. High MP was defined as a difference between the observed number and expected number (the former minus the latter) greater than or equal to 1; otherwise, the MP was considered to be low/normal. Survival curves were constructed using the Kaplan-Meier method, and the univariate survival difference was determined by the log-rank test. Survival time was estimated using the life-table method. Adjusted hazard ratios (HRs) with 95% confidence intervals (CIs) were calculated using Cox proportional hazards models. All of the statistical analyses were performed using Stata v.10.0 (Stata Corporation, College Station, TX). Two-sided P < 0.05 was considered statistically significant.
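The definition of the MP surrogate can be illustrated with a short Python sketch. It is not the Stata workflow used in the study. It takes the ratio of about 1.5 cm of tumor size per involved LN reported in the abstract as a fixed constant instead of refitting the regression, and the function names are hypothetical.

```python
SIZE_PER_NODE_CM = 1.5  # ratio reported in the abstract; treated here as fixed
HIGH_MP_CUTOFF = 1.0    # observed minus expected >= 1 defines high MP


def expected_nodes(tumor_size_cm: float) -> float:
    """Expected number of involved LNs for a given tumor size."""
    return tumor_size_cm / SIZE_PER_NODE_CM


def mp_surrogate(observed_nodes: int, tumor_size_cm: float) -> float:
    """Continuous MP surrogate: observed minus expected involved LNs."""
    return observed_nodes - expected_nodes(tumor_size_cm)


def mp_category(observed_nodes: int, tumor_size_cm: float) -> str:
    """Dichotomize the surrogate at the cutoff used in the text."""
    if mp_surrogate(observed_nodes, tumor_size_cm) >= HIGH_MP_CUTOFF:
        return "high"
    return "low/normal"
```

Under these assumptions a 3.0-cm tumor has 2 expected involved LNs, so three observed positive LNs give a difference of 1 and place the patient in the high MP group, while two observed positive LNs give a difference of 0 and a low/normal classification.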

A linear relationship between tumor size and the number of involved axillary LNs in the first cohort from SEER
Clinicians use the TNM staging system, which provides a description of the extent and spread of a tumor, to determine the severity of disease. Specifically, the TNM [...] (Figure 1A) or the local polynomial smoothing method (Figure 1B) [...] each case. For instance, a patient with a 1.5-cm breast tumor theoretically has 1 (1.5/1.5) positive node; if the tumor progresses to 3.0 cm without treatment, 2 (3.0/1.5) regional LNs should theoretically be involved.

The difference between the observed number and the expected number of involved LNs serves as a significant surrogate for MP
Because the ratio of tumor size to the number of involved LNs was approximately 1.5 in the overall population, we calculated the expected number of involved LNs from the tumor size of each case. The relationship between the continuous value of the numerical difference (observed value minus expected value) and 8-year BCSS is illustrated in Figure 2A, which reveals comparable 8-year BCSS when the numerical difference is less than 1, with a peak at a value of -1; when the difference is greater than 1, increasing values are associated with markedly decreasing BCSS. Accordingly, a patient whose numerical difference (observed number minus expected number) was greater than or equal to 1 was assigned to the high MP group; otherwise, the patient was assigned to the low/normal MP group.
We then studied the prognostic value of categorical MP. In univariate analysis, a survival difference was noted between the high MP and low/normal MP groups, with 8-year BCSS rates of 74.1% (95% CI, 73.4-74.7%) and 91.1% (95% CI, 90.9-91.3%), respectively (Figure 2B). The unadjusted HR for high MP was 3.20 (95% CI, 3.09-3.32) relative to low/normal MP (Table 2). In multivariate analysis using the Cox model, after adjustment for other prognostic indicators, the HR of high MP was 2.92 (95% CI, 2.80-3.03; Table 2). Moreover, the prognostic significance of this surrogate of MP persisted in each subgroup stratified by other prognostic factors (Figure 2C), even in each subgroup stratified by tumor size (Table 2).
Table notes. Abbreviations: FDUSCC, Fudan University Shanghai Cancer Center; HER2, human epidermal growth factor receptor-2; HR, hazard ratio; MP, metastatic potential; N.A., not applicable; SEER, Surveillance, Epidemiology and End-Results registry; TNBC, triple negative breast cancer. * Log-rank test. # In SEER sets, HR was adjusted for age at diagnosis, race, grade, ER, and radiotherapy. A Cox regression model (method: backward, likelihood ratio) was employed to calculate HR.
www.impactjournals.com/oncotarget

Validation of MP in the third cohort from FDUSCC
In this cohort, the endpoint of survival analysis was distant metastasis. Consistent with observations in SEER, higher MP was associated with an increased risk of distant metastasis (Kaplan-Meier curves shown in Figure 3B; adjusted HR for DDFS was 2.60; 95% CI, 2.11-3.18). Because information concerning the intrinsic subtype was available in our dataset, we analyzed each subtype separately and found that MP was a prognostic factor independent of the subtype (Table 2). In luminal-A, luminal-B, TNBC, and HER2-positive subtypes, the high MP group had adjusted HRs of 3.46 (95% CI, 2.02-5.93), 2.30 (95% CI, 1.64-3.24), 4.05 (95% CI, 2.85-5.76), and 1.45 (95% CI, 1.04-2.03), respectively, when compared with low MP. The adjusted HR for the HER2-positive group was relatively lower than those in the other subtypes, which might be because about 50% of HER2-positive cases received adjuvant trastuzumab.

DISCUSSION
In the present study, we sought to determine whether there was a clinicopathologic surrogate for the MP of breast cancer. We hypothesized that the difference between the expected number of involved LNs according to tumor size and the actual number of involved LNs might serve as an indicator for MP and thus could predict survival. Using a large-population cohort from SEER, we identified a linear relationship between tumor size and the number of involved LNs, and we subsequently established a prediction model for the expected number of involved LNs at a given tumor size. By this procedure, we classified patients into a high MP or low/normal MP group according to the difference between the observed and expected numbers. After adjustment for other prognostic factors, patients with high MP had a higher likelihood of death from breast cancer compared with those with low/normal MP. The prognostic effect was subsequently validated in two other cohorts and proved to be independent of intrinsic subtype. To the best of our knowledge, the influence of differences between the observed and expected numbers of involved LNs on MP has not yet been proposed.
The conventional view of cancer spread is that cancer gains metastatic ability through the accumulation of mutations as the tumor grows to a large size. [3] In the overall population, it is well established that the size of the primary tumor is positively related to LN involvement, suggesting that MP evolves as the tumor grows. [19] However, there is increasing awareness of the role of tumor biology in predicting patient outcome. A growing body of literature demonstrates that distant metastasis could be, to some extent, determined by the intrinsic biology of breast cancer rather than the local disease severity. [20,21] Clinically, an abnormal relationship between tumor size and involved LNs suggests varied tumor biology. [6,7] Currently, there are few clinicopathologic markers to assess the MP of breast cancer. Conventional TNM staging does not work well to determine MP, and the intrinsic subtype cannot further discern MP subgroups within one subtype.
We developed a quantitative marker, the numerical difference between the observed and expected numbers of involved LNs, to reflect different levels of MP. Patients with values less than -3 (mainly T3 tumors with negative nodes) had slightly lower BCSS compared with those with values of -3 to 1. In contrast, once the value exceeded 1, survival decreased monotonically. For a feasible evaluation, we arbitrarily divided patients into two classes, high or low/normal MP, with a cutoff at a numerical difference of 1. It should be noted that our model might have limited predictive capability in larger tumors, as tumors with numerical differences of -5 to -3 exhibit comparably poor survival relative to tumors with values of 1 to 3. In contrast, our model performs well for the personalized evaluation of MP in T1-2 tumors with extensive LN involvement. For instance, a patient with a 1-cm tumor and 10 positive LNs and another patient with a 2.5-cm tumor and 10 positive LNs share the same pathological TNM stage. However, the value of the numerical difference (observed number minus expected number) differs. Our model predicts better survival for the patient with the 2.5-cm tumor and 10 positive LNs. In the SEER set between 1998 and 2006, we identified 12 cases with 1-cm tumors and 10 involved LNs and 70 cases with 2.5-cm tumors and 10 involved LNs. The actual survival outcomes show that 7 of 12 (58%) and 21 of 70 (30%) died from breast cancer in the former and latter groups, respectively, in accordance with the predicted MP levels.
Taken together, the surrogate marker of MP derived from tumor size and the number of involved LNs provides a simple but effective tool to determine the potential for distant metastasis. Notably, the linear relationship between tumor size and the number of involved LNs is independent of race, implying that our findings, originally from a Western population, could be extrapolated to Asian and other populations. Indeed, successful validation in a Chinese population supports this assumption and warrants the worldwide use of this model.
Our study had several limitations. First, the SEER database lacks several important variables such as HER2 status, adjuvant chemotherapy, and recurrence type. We could not adjust for additional confounding factors, nor could we directly investigate the effect of the surrogate for MP on DDFS in SEER. Second, our study was limited to invasive ductal histology; thus, our findings cannot be extrapolated to other histologic types. Moreover, our model should be used with caution for cases with large tumor sizes. The current model, based on tumor size and the number of involved nodes, seems to have limited capability to predict the survival of patients with very large, node-negative tumors. Although the potential for distant metastasis may be limited in large tumors with negative nodes, [7] the local tumor burden is heavy, and the likelihood of recurrence would be much higher, which could negatively affect survival. [22] Despite these limitations, our data represent the most robust evaluation of the effect of tumor size and the number of involved LNs on MP in breast cancer.
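The arithmetic behind this comparison can be written out as follows. The symbols $\Delta$ (numerical difference), $N_{\mathrm{obs}}$ (observed number of involved LNs), and $S$ (tumor size in cm) are notation introduced here for illustration only, and the divisor 1.5 is the size-to-node ratio reported in the abstract.

```latex
\[
  \Delta = N_{\mathrm{obs}} - \frac{S}{1.5}
\]
\[
  \Delta_{1\,\mathrm{cm}} = 10 - \frac{1.0}{1.5} \approx 9.33,
  \qquad
  \Delta_{2.5\,\mathrm{cm}} = 10 - \frac{2.5}{1.5} \approx 8.33
\]
\[
  \frac{7}{12} \approx 58\%,
  \qquad
  \frac{21}{70} = 30\%
\]
```

Both patients exceed the cutoff of 1 and so fall into the high MP group; the ordering comes from the continuous value, which is larger for the 1-cm tumor and matches the higher observed proportion of breast cancer deaths in that group.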
In conclusion, our study reveals that differences between the expected number of involved LNs according to tumor size and the observed number of involved LNs might serve as an indicator for MP, and this surrogate for MP could predict survival. We introduced a simple but effective tool to determine breast cancer MP by comprehensively understanding the relationship between tumor size and the number of involved LNs. A deeper understanding of the biology of breast cancer using common clinicopathologic factor-based tools would certainly help clinicians to predict distant metastasis and provide personalized systemic therapies.