A nomogram to predict the probability of axillary lymph node metastasis in female patients with breast cancer in China: A nationwide, multicenter, 10-year epidemiological study

Axillary lymph node dissection (ALND) or sentinel lymph node biopsy (SLNB) alone may lead to postoperative complications. Among patients with positive ALN in the preoperative examination, approximately 40% do not have SLN metastasis. Herein, we aimed to develop a model to predict the probability of ALN metastasis as a preoperative tool to support clinical decision-making. We retrospectively analyzed the clinicopathological features of 4211 female patients with breast cancer who were diagnosed over 10 years (1999-2008) in seven breast cancer centers representing all regions of China. The patients were randomly assigned to a training cohort or a validation cohort (3:1 ratio). Multivariate logistic regression analysis was performed for 1869 patients in the training cohort with complete information on the study variables. Age at diagnosis, tumor size, tumor quadrant, clinical nodal status, local invasion status, pathological type, and molecular subtype were independent predictors of ALN metastasis. The nomogram was then developed using these seven variables and subsequently validated in 642 patients in the validation cohort with complete data on the variables. The coefficient of determination (R²) and the area under the receiver-operating characteristic (ROC) curve (AUC) were 0.979 and 0.7007, showing good calibration and discrimination of the model, respectively. The false-negative rates of the nomogram were 0 and 6.9% for predicted risk cut-off values of 14.03% and 20%, respectively. Therefore, when the predicted risk is less than 20%, SLNB may be avoided. After further validation in various patient populations, this model may support increasingly limited axillary surgery in breast cancer.


INTRODUCTION
Breast cancer is the most common malignancy in women, accounting for 25% of all female cancer cases and 15% of all cancer-related deaths [1]. Recently, breast cancer incidence has plateaued [2]. The metastasis status of axillary lymph nodes (ALN) is an important factor affecting the prognosis of patients with breast cancer, a major component of breast cancer staging, and an important basis for designing treatment programs [3][4][5]. Sentinel lymph node biopsy (SLNB) has been rapidly replacing ALN dissection (ALND) as the standard surgical procedure for patients with early breast cancer and clinically negative axillary lymph nodes [6][7][8].
Although SLNB is a landmark advance in surgery and can spare patients unnecessary ALND, its disadvantages should not be overlooked. Because the pathological state of the SLN must be assessed during surgery, the procedure is time consuming and expensive. Moreover, although it causes less damage than ALND [9,10], SLNB still involves a certain degree of collateral injury, including upper limb edema, shoulder and back pain, arm numbness [11], shoulder weakness, and reduced arm strength [12]. Therefore, postoperative complications may occur with either ALND or SLNB alone. In addition, studies have reported that among patients without positive ALN on preoperative clinical examination, more than 60% do not have SLN metastasis. Further, even among patients with positive ALN on preoperative clinical examination, approximately 40% do not have SLN metastasis [13]. Therefore, it is important to screen patients with positive ALN and identify those without SLN metastasis before surgery in order to avoid unnecessary SLNB. To this end, researchers are attempting to develop methods to avoid unnecessary SLNB or ALND.
Medical centers outside China have published models to predict the ALN status of patients. For example, the Memorial Sloan Kettering Cancer Center (MSKCC) developed a nomogram for preoperative prediction of SLN status and for prediction of non-SLN status when the SLN was positive. This model was verified in many medical centers and widely accepted, as it helped clinicians decide on the surgical procedure required for regional lymph nodes before surgery. However, the verification results were inconsistent between populations, which could be due to differences in race, social and cultural background, level of economic development, level of medical care, and many other factors [14][15][16][17][18][19].
Prediction models are often built using clinical and pathological data of a specific population. Therefore, their predictive value is limited when they are used to predict disease in another population. To our knowledge, few reports on prediction models of ALN metastasis in China exist at present, and these models were based on data from single-center studies that were not fully representative of the entire population of China.
Therefore, in this study, we aimed to (1) represent the entire population of China by retrospectively analyzing relevant medical records of female patients with breast cancer who were diagnosed over a period of 10 years, (2) determine the risk factors of ALN metastasis in breast cancer, and (3) build a prediction model of ALN metastasis in breast cancer in order to help clinicians in the decision-making process.

Clinicopathologic features and grouping of patients
Of the 4211 patients, 3158 were assigned to the training cohort and 1053 to the validation cohort (3:1 ratio). The clinical and pathological data of the two cohorts did not differ significantly (p > 0.05), which was consistent with the randomization. Among patients who underwent SLNB and ALND, 48.74% (1426/2926) had ALN metastasis in the training cohort and 49.59% (483/973) had ALN metastasis in the validation cohort (Table 1).
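The random 3:1 allocation can be illustrated with a minimal sketch. The function name `split_cohorts`, the fixed seed, and the use of integer patient IDs are assumptions for illustration; the study does not describe the software procedure used for randomization.

```python
import random

def split_cohorts(patient_ids, train_fraction=0.75, seed=0):
    """Shuffle patient IDs and split them into training and validation cohorts."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, used only for reproducibility
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # floor of 4211 * 0.75 is 3158
    return ids[:n_train], ids[n_train:]

# Hypothetical IDs 0..4210 standing in for the 4211 patients.
train, validation = split_cohorts(range(4211))
print(len(train), len(validation))  # 3158 1053
```

With a 0.75 training fraction, the cohort sizes match the 3158 and 1053 patients reported above.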

Univariate logistic regression analysis of ALN metastasis in the training cohort
Univariate logistic regression analysis was used to explore ALN metastasis-related variables (Table 2) and showed that age, tumor size, primary tumor quadrant, clinical nodal status, local invasion status, pathological type, ER status, and molecular subtypes were related to breast cancer ALN metastasis (p < 0.05).

Processing of missing data and multivariate logistic regression analysis of ALN metastasis in the modeling group
Because of the long duration of data collection and the large volume of information collected, some data were missing. We found no significant difference in the clinical and pathological features of patients with missing data between the two groups (p > 0.05; Table 3). In total, 1869 and 642 patients with complete data on the study variables, including molecular subtypes, were included in the training cohort and validation cohort, respectively (Figure 2). Multivariate analysis confirmed that age, tumor size, primary tumor quadrant, clinical nodal status, invasion of the chest wall and skin, pathological type, and molecular subtype were independent predictors of ALN metastasis (Table 4).

Establishment of a prediction model for ALN metastasis
According to the results of multivariate analysis, the following seven variables were included in the prediction model of ALN metastasis: age, tumor size, primary tumor quadrant, clinical nodal status, local invasion status, pathological type, and molecular subtype. Each variable was assigned points according to its weight in the model (Figure 3). The points for all factors were summed to give the total points, which corresponded to the linear predictor and the predicted risk of ALN metastasis. In the model equation, "p" represents the risk of ALN metastasis, "a" represents age at diagnosis, "b" represents tumor size (b2.T2; b3.T3), "c" represents tumor site (c1.UIQ; c2.UOQ; c3.LIQ; c4.LOQ; c5.others), "d" represents local invasion, "e" represents clinical lymph node status, "f" represents pathological type (f1.IDC; f2.ILC; f3.others), and "g" represents the molecular subtype (g1.LM; g2.HER2+). This model was applied retrospectively to patients in the training cohort (n = 1869), yielding an AUC of 0.7157 (Figure 4), which suggests good predictive ability.
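As a rough illustration of how a logistic regression model turns a linear predictor into a predicted risk, the sketch below applies the standard logistic function. The intercept, coefficient values, and variable names are placeholders invented for this example; they are not the fitted coefficients of the nomogram, which are not given in this text.

```python
import math

def logistic(linear_predictor):
    """Map a linear predictor (log-odds) to a probability, avoiding overflow."""
    if linear_predictor >= 0:
        return 1.0 / (1.0 + math.exp(-linear_predictor))
    z = math.exp(linear_predictor)
    return z / (1.0 + z)

def predicted_risk(intercept, coefficients, patient):
    """Return the risk for one patient: logistic(intercept + sum of coefficient * value)."""
    lp = intercept + sum(coefficients[name] * value for name, value in patient.items())
    return logistic(lp)

# PLACEHOLDER numbers for illustration only; NOT the coefficients of the published model.
example_coefficients = {"age_years": -0.01, "size_T2": 0.5, "clinical_node_positive": 1.0}
example_patient = {"age_years": 50, "size_T2": 1, "clinical_node_positive": 0}

risk = predicted_risk(0.0, example_coefficients, example_patient)
print(f"predicted risk: {risk:.3f}")
```

In a nomogram, the points assigned to each variable are a rescaled version of these coefficient contributions, so the total points map monotonically to the linear predictor and hence to the predicted risk.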

Prospective applications of the prediction model of ALN metastasis
This prediction model of ALN metastasis was prospectively applied to patients in the validation cohort. The ROC curve was plotted, and the AUC was 0.7007 (Figure 5), indicating good predictive ability. As seen in Figure 6, the predicted and observed values followed the same trend without significant deviation, indicating that the predicted risk of ALN metastasis was consistent with the actual metastasis risk. The coefficient of determination, which was used to represent the accuracy of the model, was R² = 0.979, suggesting good calibration. On further evaluation of the clinical value of the model, we found that at predicted-risk cutoff values of 14.03% and 20%, the false-negative rates of the model were 0 and 6.9%, respectively (Table 5).
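The cutoff-based evaluation can be sketched as follows. This example assumes the conventional definition of the false-negative rate, namely the share of truly node-positive patients whose predicted risk falls below the cutoff; the exact definition used in Table 5 may differ. The function name and the toy data are hypothetical.

```python
def false_negative_rate(predicted_risks, has_metastasis, cutoff):
    """FN / (FN + TP): fraction of node-positive patients predicted below the cutoff."""
    positive_risks = [risk for risk, y in zip(predicted_risks, has_metastasis) if y]
    if not positive_risks:
        raise ValueError("no node-positive patients in the sample")
    missed = sum(1 for risk in positive_risks if risk < cutoff)
    return missed / len(positive_risks)

# Toy data (invented): predicted risks and true ALN status (1 = metastasis).
risks = [0.10, 0.15, 0.18, 0.25, 0.40, 0.60]
status = [0, 0, 1, 1, 0, 1]

fnr_low = false_negative_rate(risks, status, cutoff=0.1403)
fnr_high = false_negative_rate(risks, status, cutoff=0.20)
print(fnr_low, fnr_high)
```

Raising the cutoff classifies more patients as low risk, which can only keep the false-negative rate the same or increase it; this is the trade-off behind comparing the 14.03% and 20% cutoffs.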

DISCUSSION
Breast cancer ALN status is a key factor in deciding the therapeutic options for patients and affects their prognosis [3][4][5]. With more in-depth research on breast cancer, researchers have come to consider ALND important for lymph node staging but of limited significance for treatment [28]. Therefore, if the ALN status of patients with breast cancer can be assessed in a noninvasive and accurate manner, clinicians can avoid  [14][15][16][17][18][19]. Therefore, to improve the accuracy of prediction, some clinicians used preoperative breast ultrasound, mammography, and breast magnetic resonance imaging (MRI) to predict the risk of ALN metastasis. However, the false-negative rate of ultrasound prediction of ALN status was 16.7-22.9%. After ultrasound was combined with mammography, breast MRI, and positron emission tomography/computed tomography (PET/CT), the false-negative rate decreased to 14-16.9% [29,30]. However, thus far, there is no international consensus on the routine preoperative use of MRI [31]. In China, some clinicians used ultrasound in combination with clinical data of patients to build a prediction model of ALN metastasis and obtained an AUC value of 0.864, indicating a good predictive value [32]. However, the number of patients included in that model (n = 322 for the modeling group and n = 234 for the validation group) was relatively small, and the patients were from a single medical center; therefore, they represented only a small proportion of all patients in China. Thus, the development and application of this model in China were limited.
Table footnote: A total of 642 patients had complete data in the validation group: 319 patients had actual positive axillary lymph nodes (49.69%) and 323 patients had actual negative axillary lymph nodes (50.31%). *Patients: the patients whose predicted risk is lower than the cutoff value. Abbreviations: FNR, false-negative rate; ALN, axillary lymph nodes.
In our current model, the seven breast cancer treatment centers included appropriately reflect the incidence, diagnosis, and treatment of breast cancer in women in China and represent the entire population of China [20]. In many cancer risk predictions, the nomogram is considered an effective tool for quantitative assessment of risk factors to maximize the accuracy of prediction. It can reflect the contribution of predictive variables to the outcome visually and directly [33,34]. In this study, we successfully established a nomogram for prediction of breast cancer ALN metastasis that is suitable for the Chinese population. Our results showed that the histopathological type played a crucial role in ALN metastasis, followed by tumor location, clinical lymph node status, age at diagnosis, invasion of the chest wall and skin, tumor size, and molecular subtype. When the model was applied to patients in the training cohort and the validation cohort, the performance of the nomogram in the two groups was similar (AUC = 0.7157 versus 0.7007), and the nomogram showed good predictive value in both. These results indicate that the nomogram performed consistently in different patient groups.
Several studies demonstrated that age of patients with breast cancer at diagnosis, BMI, tumor size, primary tumor quadrant, presence of multiple tumors, clinical lymph node status, local invasion status, pathological type, ER/PR, HER2 status, molecular subtypes, and other factors were related to ALN metastasis status [35][36][37]. Our results showed that the significance of these variables was similar to that of the results previously reported in the literature, including age at diagnosis, tumor size, clinical lymph node status, local invasion status, and pathological type.
In our study, tumors in the central region of the breast were more prone to ALN metastasis than those in other quadrants, which is consistent with the findings of other studies showing abundant lymphatic drainage in the central region of the breast [38]. Some studies reported that tumors in the UIQ of the breast were the least likely to metastasize to the axilla [38][39][40]. However, we found that, compared with the other locations, the risk of ALN metastasis was lowest in the LIQ. We speculate that this difference between our study and previous studies is related to differences in tumor heterogeneity, ethnicity, lifestyle factors, and so on. The exact reason for this phenomenon remains unclear. Further, some studies showed that molecular subtypes had no predictive value for ALN status [41], while others showed that patients with triple-negative breast cancer had the lowest incidence of ALN metastasis and those with the HER-2 subtype had the highest incidence [42,43]. In contrast to these findings, we observed that luminal-like breast cancer was associated with a higher probability of ALN metastasis. One possible explanation for this finding could be that luminal-like tumors had more lymphatic metastasis than the triple-negative phenotype [32,38,44]. Nevertheless, these inconsistencies between our study and previous studies need to be investigated further.
To further assess the clinical application of the prediction model of ALN metastasis, we selected certain cutoff values of predicted risk for use in patients in the validation cohort. Patients with a predicted metastasis risk below the cutoff value were considered to be at low risk of ALN metastasis. Therefore, as per our model, they could be spared SLNB and ALND. According to a report of the American Society of Clinical Oncology (ASCO), the false-negative rate of SLNB ranged from 0 to 29%, with an average of 8.4% [45]. In our model, the false-negative rate was only 6.9% at a cutoff value of 20%, which is less than the average false-negative rate of SLNB. Therefore, our nomogram should be acceptable in medical practice. SLNB may not be necessary when the predicted risk is less than 20%, especially for elderly patients with other internal diseases and a lower surgical tolerance, who would have a low probability of ALN metastasis but might be more likely to suffer from postoperative complications [46,47]. Thus, application of this model can reduce the surgical risk and postoperative complications in breast cancer patients with a low risk of ALN metastasis.
This study has several strengths: (1) To the best of our knowledge, this is the first nomogram prediction model of breast cancer ALN status that is based on multi-center data and represents the entire Chinese population with breast cancer. (2) Our prediction model uses seven variables that can be obtained by conventional preoperative examination. This greatly improves the clinical applicability of the prediction model, as no additional examinations or costs are required, which has important implications for patients in developing countries and economically less-developed regions. (3) The AUC value obtained from the prospective data is 0.7007, suggesting a good predictive ability. Therefore, this model can help clinicians weigh the risks and benefits of SLNB before surgery in order to avoid unnecessary SLNB and ALND for patients.
Despite these strengths, our study had a few limitations. First, the model was based on clinical data of patients with breast cancer from multiple centers over a period of 10 years. Owing to the long duration of data collection and regional differences in culture, education, and medical care, some bias may have been unavoidable. For example, some data were missing. Although 4211 patients were included in the analysis, only 2511 patients with complete data on the study variables were entered in the final model (1869 patients in the training cohort and 642 patients in the validation cohort). Second, the predictive value of a model is generally considered good when the AUC value is 0.7-0.8 and very good when the AUC value is 0.8-0.9. When we prospectively applied the model to patients in the validation cohort (n = 642), the AUC value was 0.7007. The AUC value in our study is not perfect, which may be due to the large amount of data from multiple centers; more clinical centers and a larger sample size are needed to further evaluate and improve the predictive ability of the model. In the future, we will validate the predictive ability of the model in larger clinical studies.
In conclusion, age at diagnosis, tumor size, primary tumor quadrant, clinical nodal status, local invasion status, pathological type, and molecular subtype were independent predictors of ALN metastasis. The nomogram established in this study could provide an accurate and objective tool to predict the risk of breast cancer ALN metastasis using quantitative indicators. The model is easy to use and has a good predictive ability in the Chinese population. For patients at low risk of ALN metastasis, it can help avoid the trauma and postoperative complications associated with axillary surgery, thereby improving their quality of life.

Study design
Data were obtained from the Nationwide Multicenter 10-year (1999-2008) Retrospective Clinical Epidemiological Study of Breast Cancer in China, which was led by the Cancer Hospital/Institute, Chinese Academy of Medical Sciences (CICAMS) and included seven Grade Three A hospitals nationwide.

Selection of regions and hospitals
To ensure that the samples were representative of the total population with breast cancer in China, we selected seven geographic regions across China: the North, North-East, Central, South, East, North-West, and South-West regions (Figure 1). These regions encompassed most of the country and represented different breast cancer burdens. A representative Grade Three A hospital was selected from each region based on the criteria used in our previous study [20]. Briefly, the criteria were as follows: (1) the city where the hospital is located must be an important city in the region; (2) participating hospitals must be leading public cancer hospitals and regional referral centers providing pathology diagnosis, surgery, radiotherapy, medical oncology, and routine follow-up care for patients with breast cancer; and (3) the hospital's patient sources must cover the corresponding research area in order to represent the region.

Data collection and quality control
Staff who had uniformly received professional and systematic training in Beijing were responsible for recording patient information. The patient information was collected using standard case report forms (CRF) designed by the CICAMS and included data on general information, risk factors, diagnostic imaging tests, therapy models, and pathologic characteristics. The reliability and validity of the CRF were assessed in a preceding pilot study. The data were transmitted to the CICAMS and verified using EpiData (http://www.epidata.dk/). Specific details of this process are described in our previous study [20].
In 1999-2008, each hospital randomly selected a month every year, and data of at least 50 female breast cancer patients were collected in this month (January and February were excluded from the random selection to eliminate any confounding effects of China's largest annual holiday). If the number of patients included was less than 50 in the selected month, the patients from the immediately preceding month and the immediately following month were included until the total number of patients in that year reached 50. If, in the selected month, the number of patients exceeded 50, they were all included in the study. As such, a total of 4211 patients with breast cancer were included in the study.
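The yearly sampling rule can be sketched in a few lines. This is a simplified illustration under several assumptions that the text does not specify: the preceding month is searched before the following month, recruitment stops as soon as 50 patients are reached, and only the two adjacent months are used. The function names and the dictionary layout are hypothetical.

```python
import random

def pick_month(rng):
    """Randomly choose a collection month, excluding January (1) and February (2)."""
    return rng.choice(range(3, 13))

def select_patients(patients_by_month, selected_month, minimum=50):
    """Return patients for one hospital-year.

    patients_by_month maps a month number to a list of patient IDs.
    All patients of the selected month are kept, even if there are more than
    `minimum`; otherwise adjacent months top the sample up to `minimum`.
    """
    selected = list(patients_by_month.get(selected_month, []))
    # Assumption: preceding month first, then following month.
    for month in (selected_month - 1, selected_month + 1):
        for patient_id in patients_by_month.get(month, []):
            if len(selected) >= minimum:
                return selected
            selected.append(patient_id)
    return selected

# Toy example (invented counts): 30 patients in May, 15 in April, 20 in June.
year_data = {
    4: [f"apr-{i}" for i in range(15)],
    5: [f"may-{i}" for i in range(30)],
    6: [f"jun-{i}" for i in range(20)],
}
sample = select_patients(year_data, selected_month=5)
print(len(sample))  # 50
```

Because months with more than 50 patients contribute all of their patients, the total across seven hospitals and 10 years can exceed 7 × 10 × 50 = 3500, which is consistent with the 4211 patients reported.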

Patients and variables
The 4211 patients included in this study were randomly categorized into a training cohort or validation cohort in a 3:1 ratio. ; the patients whose regional lymph nodes could not be assessed [Nx] were excluded); invasion of the chest wall and skin (categorized as invasion and non-invasion); multifocality (categorized as multifocal and unifocal); pathological type (categorized as ductal carcinoma in situ with micro-invasion (DCIS-Mi), invasive ductal carcinoma (IDC), invasive lobular carcinoma (ILC), and other types of invasive carcinoma (tubular carcinoma, mucinous carcinoma, medullary carcinoma)); and expression of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor-2 (HER-2) (categorized as positive and negative). The outcome variable, postoperative ALN status, was categorized as positive (presence of one or more ALN metastases) and negative (no metastasis). Molecular subtypes were divided into three categories: luminal-like subtype (ER+ and/or PR+, any HER2 status), HER-2+ subtype (ER-, PR-, HER2+), and triple-negative subtype (ER-, PR-, HER2-) [21,22]. All patients included in the study were female patients with breast cancer who were diagnosed by histopathology and underwent successful SLNB and ALND.
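The three-category molecular subtype definition above amounts to a simple decision rule, sketched here. The function name and the boolean inputs are illustrative assumptions; handling of missing receptor data is not covered.

```python
def molecular_subtype(er_positive, pr_positive, her2_positive):
    """Classify a tumor by receptor status using the three categories in the text."""
    if er_positive or pr_positive:
        # ER+ and/or PR+, regardless of HER2 status
        return "luminal-like"
    if her2_positive:
        # ER-, PR-, HER2+
        return "HER-2+"
    # ER-, PR-, HER2-
    return "triple-negative"

print(molecular_subtype(True, False, True))    # luminal-like
print(molecular_subtype(False, False, True))   # HER-2+
print(molecular_subtype(False, False, False))  # triple-negative
```

Checking hormone receptor status first reflects the "any HER2 status" clause: a tumor that is ER+ or PR+ is luminal-like even when HER2 is positive.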

Pathologic processing
All nodes were examined postoperatively with serial-section H&E staining. When no cancer cells were identified on H&E staining, IHC staining was performed to determine whether micrometastasis (0.2-2 mm cancer foci) was present. ER and PR were considered positive if immunostaining was positive in more than 1% of tumor cells. HER-2 positivity was defined as a score of 3+ on IHC or amplification on FISH [23][24][25]. The histological subtype categorization was based on the 1981 and 2003 histological classification criteria of the World Health Organization [26]. Specific details are described in our previous studies [20,27].
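The positivity thresholds described above can be written as two small predicates. This is only a sketch: the function and parameter names are hypothetical, and real pathology reporting involves more detail than a percentage and a score.

```python
def hormone_receptor_positive(percent_stained_cells):
    """ER or PR is positive if more than 1% of tumor cells show immunostaining."""
    return percent_stained_cells > 1.0

def her2_positive(ihc_score, fish_amplified=False):
    """HER-2 is positive with an IHC score of 3+ or with amplification on FISH."""
    return ihc_score == 3 or bool(fish_amplified)

print(hormone_receptor_positive(0.5))         # False
print(hormone_receptor_positive(10.0))        # True
print(her2_positive(2, fish_amplified=True))  # True
print(her2_positive(2))                       # False
```

Note that exactly 1% staining counts as negative under the "more than 1%" rule used in the text.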

Statistical analysis
The mean, SD, and median were calculated to describe continuous variables, and constituent ratios were used to describe categorical variables. The t-test was used for the comparison of continuous variables, and the Chi-square test or Fisher's exact test was used for the comparison of categorical variables. Using the clinical and pathological data of the training cohort, univariate logistic regression analysis was performed to explore ALN metastasis-related variables. Subsequently, multivariate logistic regression analysis was used to determine the variables that independently influenced ALN metastasis and to establish the nomogram of the prediction model for breast cancer ALN metastasis. Receiver-operating characteristic (ROC) curves, areas under the ROC curves (AUC), sensitivity, and specificity were used to evaluate the predictive ability of the model. The ROC curve was first prepared by retrospectively applying the model to the data of the training cohort, and the AUC value was calculated. The prediction model was then prospectively applied to patients in the validation cohort by plotting the ROC curve and re-evaluating the accuracy of the prediction through the AUC value. To test the accuracy and stability of the model, the predicted values of metastasis risk in the validation cohort were divided into deciles, and the average actual metastasis risk was calculated in each decile. The predicted value was taken as the abscissa, and the average actual metastasis risk was taken as the ordinate to draw the calibration curve. To further evaluate the clinical value of the model, we considered certain cutoff values of predicted risk in patients in the validation cohort and calculated the corresponding accuracy and false-negative rate at each cutoff value in order to assess its use in screening patients at low risk of ALN metastasis.
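The discrimination and calibration steps described above can be sketched with the standard library alone. The AUC is computed here through its rank interpretation (the probability that a randomly chosen node-positive patient has a higher predicted risk than a randomly chosen node-negative patient), and the calibration points are formed by splitting the sorted predictions into equal-sized groups. Function names and toy data are assumptions; the study itself used SPSS, Stata, and R.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def calibration_points(scores, labels, n_bins=10):
    """Return (mean predicted risk, observed event rate) for each quantile bin."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    points = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # skip empty bins when there are fewer cases than bins
        mean_predicted = sum(s for s, _ in chunk) / len(chunk)
        observed_rate = sum(y for _, y in chunk) / len(chunk)
        points.append((mean_predicted, observed_rate))
    return points

# Toy data (invented): predicted risks and true ALN status (1 = metastasis).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)
toy_points = calibration_points(toy_scores, toy_labels, n_bins=2)
print(toy_auc)     # 0.75
print(toy_points)
```

Plotting the first element of each point on the abscissa and the second on the ordinate gives a calibration curve of the kind described; with `n_bins=10` the bins correspond to deciles of predicted risk.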
All statistical analyses were performed using SPSS (version 19.0, Chicago, IL, USA), Stata (version 11.0, College Station, TX), and R software (version 3.1.0, Institute for Statistics and Mathematics, Vienna, Austria). A two-tailed p value < 0.05 was considered statistically significant. In this study, all data are reported in aggregates.

Ethics statement
The research was approved by the Institutional Review Board of the Cancer Foundation of China. Because of the retrospective nature of the study, we were unable to contact all patients or their families. In addition, because the study posed no risk to the patients included, informed consent was not obtained. All patient identifiers were removed, as per the approved procedures. Deidentified data were maintained in a secure database, to which only research team members had access.