Quantitative ultrasound assessment of breast tumor response to chemotherapy using a multi-parameter approach

Purpose: This study demonstrated the ability of quantitative ultrasound (QUS) parameters to provide an early prediction of tumor response to neoadjuvant chemotherapy (NAC) in patients with locally advanced breast cancer (LABC).
Methods: Using a 6-MHz array transducer, ultrasound radiofrequency (RF) data were collected from 58 LABC patients prior to NAC treatment, at weeks 1, 4, and 8 of their treatment, and prior to surgery. QUS parameters including midband fit (MBF), spectral slope (SS), spectral intercept (SI), spacing among scatterers (SAS), attenuation coefficient estimate (ACE), average scatterer diameter (ASD), and average acoustic concentration (AAC) were determined from the tumor region of interest. Ultrasound data were compared with the ultimate clinical and pathological response of the patient's tumor to treatment and with patient recurrence-free survival.
Results: Multi-parameter discriminant analysis using the k-nearest-neighbor (KNN) classifier demonstrated that the best response classification could be achieved using the combination of MBF, SS, and SAS, with an accuracy of 60 ± 10% at week 1, 77 ± 8% at week 4, and 75 ± 6% at week 8. Furthermore, when the QUS measurements at each time (week) were combined with pre-treatment (week 0) QUS values, the classification accuracies improved (70 ± 9% at week 1, 80 ± 5% at week 4, and 81 ± 6% at week 8). Finally, the multi-parameter QUS model demonstrated a significant difference in survival rates of responding and non-responding patients at weeks 1 and 4 (p = 0.035 and 0.027, respectively).
Conclusion: This study demonstrated for the first time, using new parameters tested on a relatively large patient cohort and leave-one-out classifier evaluation, that a hybrid QUS biomarker including MBF, SS, and SAS could, with relatively high sensitivity and specificity, detect the response of LABC tumors to NAC as early as after 4 weeks of therapy. The findings of this study also suggested that incorporating pre-treatment QUS parameters of a tumor improved the classification results. This work demonstrated the potential of QUS and machine learning methods for the early assessment of breast tumor response to NAC and for supporting personalized medicine with regard to the treatment planning of refractory patients.


INTRODUCTION
Conventional methods of clinical tumor response assessment involve tracking changes in tumor size, using the guidelines provided by Response Evaluation Criteria in Solid Tumors (RECIST) [1]. Such measurements are ascertained using anatomical-based imaging modalities such as X-ray imaging, magnetic resonance imaging (MRI), or conventional diagnostic ultrasound. However, tumor size typically provides late indications of response as measurable changes in tumor size do not occur until several weeks to months after the initiation of the NAC treatment, despite positive response [2]. Currently, no routine clinical imaging is carried out to assess tumor size or response during breast NAC administration in a neoadjuvant setting. Thus, the introduction of a noninvasive functional imaging system that can be used to monitor the early response of a tumor to anticancer therapy can potentially help facilitate personalized treatment for cancer patients, thereby optimizing their therapeutic outcome and recurrence-free survival.
Several imaging methods have been developed in research to assess early therapeutic responses of breast tumors, including diffuse optical spectroscopy (DOS) [3], fluoro-deoxyglucose positron emission tomography (FDG-PET) [4], and diffusion-weighted magnetic resonance imaging (DW-MRI) [5]. Despite a favorable sensitivity in detecting breast tumor response at 4 weeks, DOS has limited tissue penetration depth, which restricts its application to superficial tumors. DW-MRI requires substantial capital investment, and PET requires the injection of radioactive tracer isotopes, which limits repeated use and carries potential long-term health complications. Ultrasound, on the other hand, is relatively inexpensive and safe, and QUS imaging relies on inherent changes in tissue microstructure to generate tissue contrast, requiring no external contrast agents.
Quantitative ultrasound (QUS) is a tissue characterization technique which examines the frequency content of the radiofrequency (RF) backscatter ultrasound signals from tissues. According to the theory of ultrasound scattering, the power spectrum of the tissue backscatter signal is affected by parameters such as the size and number density of the ultrasound scatterers. In 1987, Lizzi et al. [6] demonstrated that parameters related to the linear regression of the tissue power spectrum are directly linked to the tissue microstructure. These parameters include spectral slope (SS), spectral intercept (SI), and midband fit (MBF). The parameter SS is inversely related to the scatterer size [6], SI is related to scatterer size, scatterer concentration, and the acoustic impedance difference between the scatterer and the background [6], and MBF is related to ultrasound integrated backscatter [7], a measure of the energy efficiency of the acoustic backscatter from a tissue sample [8]. By taking into account differences in tissue microstructure, the aforementioned parameters have enabled the characterization of abnormalities of different tissues such as those in breast, prostate, liver, eye, myocardium, and lymph nodes [9][10][11][12][13][14][15]. Alternatively, some studies have found higher-order model derived backscatter coefficient (BSC) parameters such as average scatterer diameter (ASD) and average acoustic concentration (AAC) to be useful in studying tissues, including differentiating mouse models of breast cancer from benign breast masses, grading clinical breast cancer, and detecting cancerous human lymph nodes [10,15,16]. Recent preclinical studies have demonstrated at high-(>20 MHz) and conventional-(<10MHz) frequency ranges that QUS can be used to detect and quantify tumor cell death in vivo, in response to various treatments including photodynamic therapy, radiation therapy, chemotherapy, and antivascular therapy [17][18][19][20]. 
Furthermore, pilot clinical studies by Sadeghi-Naini et al. [21,22] have demonstrated the effectiveness of QUS and texture analysis methods in the assessment of patients' breast tumor responses to NAC as early as 1 week into their several-month treatment. In those studies, Sadeghi-Naini et al. posited that at a clinically relevant frequency range (<10 MHz), spectral parameters such as MBF, SI, and SS are sensitive to changes in tumor microstructure which occur as a result of therapeutic effects, and can therefore correlate with early signatures of tumor response. In addition, statistical texture analysis of the QUS images using gray-level co-occurrence matrices (GLCM), which accounts for the heterogeneity of the tumor response, was suggested to improve the discrimination of responsive patients from non-responsive ones [22]. However, those studies were limited to statistical significance tests and a simple classifier (Fisher linear discriminant (FLD)) applied to a small patient database (N = 25), and performance measures were obtained without cross-validation (training and testing sets were identical), resulting in over-optimistic values. More recently, Tadayyon et al. [23] compared the sensitivity and specificity of MBF for tumor response prediction with and without attenuation correction of the power spectrum. They demonstrated that estimating the acoustic attenuation of the patient's tumor and correcting the power spectra accordingly, which had not been done previously [21,22], increased the sensitivity of MBF to response detection by 12% and the specificity by 17%.
Here, we propose a new and improved approach for the QUS prediction of breast tumor response to neoadjuvant chemotherapy. Specifically, we have made the following five improvements compared to similar previous works [22,24]:
1. We have used the largest population size (N = 58) studied to date on QUS characterization of LABC tumor response to NAC, which is double the size of the most recent study [24]. This represents over 200 volumetric ultrasound scans of patient tumors.
2. We have included new QUS features not investigated previously for this application, including attenuation coefficient estimate (ACE), attenuation-corrected QUS features, as well as spacing among scatterers (SAS).
3. We have performed leave-one-patient-out cross-validation in order to evaluate the performance of the classifier when subjected to unseen data.
4. We have used a new classifier, the KNN classifier, to perform discriminant analysis. Since the search radius can be tuned, the KNN classifier can learn the local structure of a feature space more effectively than a linear classifier. This is especially important in a tumor response classification task such as this, since tumor response is not discrete but rather heterogeneous, with many variations in degrees of response.
5. The study here found, for the first time, that including pre-treatment QUS features in the QUS model improves the discrimination of response.

RESULTS
Patient characteristics, including age, initial tumor size, tumor subtypes, and bulk tumor shrinkage for responders and non-responders are summarized in Table  1. All patients were females aged between 29 and 67 years with a mean age of 49 years. Tumor size ranged from 2 to 13 cm, with a mean size of 6.3 cm. The tumors were mainly of the invasive ductal carcinoma type not otherwise specified (90% of cases). The remaining 10% of cases were comprised of invasive lobular carcinoma (5%) and other types of breast cancer (5%). The ultimate clinical response rate to NAC in the sample population was 72% and responders demonstrated a mean tumor shrinkage of 68 ± 47% whereas the non-responders demonstrated mean bulk shrinkage of -16 ± 57%. Bulk tumor shrinkage was defined as the relative reduction in the sum of tumor diameters from pre-treatment to pre-operation. Size measurements were ascertained using breast DCE-MRI obtained at these two times. Detailed individual patient characteristics and responses are provided in Tables A.1 and A.2.
Representative images of a responding breast tumor and a non-responding breast tumor before treatment initiation and 4 weeks after treatment initiation (1-2 cycles of NAC) are presented in Figures 1 and 2. For each tumor, B-mode images, MBF images overlaid on the B-mode images, power spectra before and 4 weeks after the start of treatment, and magnified hematoxylin and eosin (H&E) stained histology sections of whole-mount breast specimens obtained post-surgery (mastectomy/lumpectomy) are shown. MBF was selected for illustration because it was a parameter that demonstrated statistically significant changes at the early weeks (1 and 4). Whereas B-mode images showed no appreciable changes in the tumor 4 weeks into treatment, a marked increase in MBF could be observed in the responding tumor region as a result of 4 weeks of NAC (1-2 cycles). The non-responding tumor, on the other hand, demonstrated no change or a decrease in MBF. The superimposed before/after power spectra demonstrated the same concept graphically, where MBF is marked by a circle in the middle of the regression line (Figures 1 and 2, C left). The histology image of the responding tumor indicates a stroma-filled tissue (pink staining) with small isolated patches of glands (purple staining), demonstrating therapeutic effects. On the other hand, the histology of the non-responding tumor shows a gland-dominated tumor with low stromal collagen density, indicating little to no therapeutic effect.
Figure 3 compares QUS parameters with the RECIST metric for tracking changes in the tumor during NAC. Average QUS data obtained from responding and non-responding groups are plotted versus treatment time in Figure 3A-3G. Patients were grouped based on their ultimate clinical/pathological responses. The vertical axes represent the difference in each QUS parameter relative to week 0 (pre-treatment), which is denoted by a ∆ prefix. For instance, ∆MBF at week 4 is computed as MBF(week 4) - MBF(week 0).
Parameters related to the intensity of the frequency-dependent backscatter (i.e. ∆MBF, ∆SI, ∆AAC) demonstrated, on average, an increase with treatment time for responders. Based on an unpaired t-test comparison of responder and non-responder groups (right-tailed, 95% confidence), this increase was statistically significant at weeks 1, 4, and 8 for ∆MBF (p = 0.042, p < 0.005, and p < 0.005, respectively), and at weeks 1, 4, and 8 for ∆SI (p = 0.034, p = 0.010, and p < 0.005, respectively). Patients in the responding group demonstrated a greater increase in ∆ACE compared to non-responders, which was statistically significant at weeks 1 and 4 (p < 0.005 and p = 0.042, respectively). On the other hand, ∆SS, ∆ASD, and ∆SAS values did not show any significant differences between responders and non-responders at any time during the treatment. As expected, the mean tumor size reduction shown in Figure 3H was not significantly different in responders versus non-responders at any time (p = 0.89, 0.53, and 0.42 at weeks 1, 4, and 8, respectively) except at the end of the several-month treatment (p = 0.0011 at pre-op). Whereas a 30% mean size reduction occurred in responders at week 4 (Figure 3H), non-responders also had a mean reduction of almost 30% at week 4, and the difference between the groups was not statistically significant. After week 4, whereas responding tumors continued shrinking, non-responding tumors grew to 20% larger than their original size between week 8 and pre-op, an interval of approximately 8 to 10 weeks.
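The ∆ features and the group comparison described above can be illustrated with a minimal sketch. The MBF values below are hypothetical, and the function names `delta` and `student_t` are illustrative rather than taken from the study. The sketch computes only the pooled-variance t statistic and its degrees of freedom; the right-tailed p-value would then be read from the upper tail of the t distribution (for example with a statistics package).

```python
from math import sqrt
from statistics import mean, variance


def delta(values_week_n, values_week_0):
    """Per-patient change relative to baseline, e.g. dMBF = MBF(week n) - MBF(week 0)."""
    return [vn - v0 for vn, v0 in zip(values_week_n, values_week_0)]


def student_t(group_a, group_b):
    """Pooled-variance (Student) t statistic and degrees of freedom
    for two independent groups; a positive t means mean(a) > mean(b)."""
    na, nb = len(group_a), len(group_b)
    dof = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / dof
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, dof


# Hypothetical MBF values (dB) at week 0 and week 4 for two small groups.
resp_wk0, resp_wk4 = [-12.0, -10.5, -11.0, -13.0], [-7.0, -6.5, -8.0, -9.0]
non_wk0, non_wk4 = [-11.0, -12.0, -10.0], [-11.5, -11.0, -10.5]

d_resp = delta(resp_wk4, resp_wk0)  # change in responders
d_non = delta(non_wk4, non_wk0)     # change in non-responders

# Right-tailed question: is the mean change larger in responders?
t_stat, dof = student_t(d_resp, d_non)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

A large positive t in this setup corresponds to the reported pattern of a greater increase in responders.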
In order to compare the effectiveness of different QUS parameters in differentiating responding tumors from non-responding ones, the KNN algorithm was run for each QUS parameter separately and classification accuracy was computed. Table 2 summarizes the performance of individual QUS parameters in predicting response in terms of classification accuracy and statistical significance (p-value) for weeks 1, 4, and 8 (± indicates standard deviation). The classification results are based on a 2-neighbor search area and the Euclidean distance metric, which provided the optimal classification. The results demonstrated that the MBF parameter was most effective in response detection at all weeks (accuracy = 61 ± 8%, 65 ± 5%, and 85 ± 5% for weeks 1, 4, and 8, respectively), followed by SI (accuracy = 55 ± 8%, 65 ± 11%, and 74 ± 6%, respectively). Overall, performances improved at week 8 compared to those of weeks 4 and 1.
Table 3 presents the RECIST-based versus multiparameter-QUS-based patient response classification results. Sensitivity was defined as the number of correctly identified responders divided by the total number of responders, expressed as a percentage. Specificity was defined as the number of correctly identified non-responders divided by the total number of non-responders, expressed as a percentage. Accuracy was defined as the number of correctly classified patients divided by the total number of patients, expressed as a percentage. The first row presents the RECIST-based response classification, which was performed by classifying each patient based on a 30% reduction in tumor size at each follow-up visit and comparing the prediction with their "true" response, assumed to be the ultimate clinical/pathological response.
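As an illustration of the definitions above, the following sketch applies a size-reduction threshold to hypothetical shrinkage values and computes sensitivity, specificity, and accuracy against hypothetical true responses. The function names and the strict "greater than 30%" comparison are assumptions made for the example.

```python
def classify_by_shrinkage(shrinkage_pct, threshold=30.0):
    """Predict 'responder' (True) when tumor size reduction exceeds the threshold (%)."""
    return shrinkage_pct > threshold


def response_metrics(predicted, actual):
    """Sensitivity, specificity and accuracy (all in %) for boolean
    predictions, where True means responder."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    true_neg = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    n_resp = sum(1 for a in actual if a)
    n_non = len(actual) - n_resp
    return {
        "sensitivity": 100.0 * true_pos / n_resp,
        "specificity": 100.0 * true_neg / n_non,
        "accuracy": 100.0 * (true_pos + true_neg) / len(actual),
    }


# Hypothetical size reductions (%) at one follow-up visit and the
# corresponding ultimate ("true") responses.
shrinkage = [45.0, 10.0, 35.0, 40.0, 5.0, 50.0]
actual = [True, True, True, True, False, False]

predicted = [classify_by_shrinkage(s) for s in shrinkage]
metrics = response_metrics(predicted, actual)
print(metrics)
```

In this toy case one responder has not yet shrunk and one non-responder has, which is the kind of disagreement that lowers the size-based figures.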
The second row presents the classification performance obtained using only changes in QUS features relative to pre-treatment, whereas the third row presents the classification performance obtained using pre-treatment QUS features and the combination of pre-treatment QUS features with changes in QUS features during treatment. The fourth row presents the p-values for the significance of the difference between the accuracies of the second and third rows. An asterisk indicates a significant difference. Leave-one-patient-out cross-validation was performed on the KNN classifier to obtain the overall sensitivity, specificity, and accuracy values. All possible combinations of the 7 QUS parameters (∆MBF, ∆SS, ∆SI, ∆SAS, ∆ACE, ∆ASD, ∆AAC) were investigated for feature selection. As expected, the RECIST method showed the poorest discrimination between responders and non-responders at all times during the treatment (accuracies of 30%, 52%, and 68% at weeks 1, 4, and 8, respectively, as presented in Table 3, row 1). On the other hand, the QUS-based model including the optimal parameter combination of [∆MBF, ∆SS, ∆SAS] displayed promising accuracies (60 ± 10%, 77 ± 8%, and 75 ± 6% at weeks 1, 4, and 8, respectively, as presented in Table 3, row 2). Combinations of 4 or more parameters have not been reported since no improvement was observed beyond 3 parameters. A separate feature selection was performed for the case when the pre-treatment values were included. In that case, the QUS biomarker consisting of [MBF_wk0, ∆MBF, SS_wk0, ∆SS, SAS_wk0, ∆SAS] differentiated responders from non-responders with improved accuracies of 70 ± 9%, 80 ± 5%, and 81 ± 6% at weeks 1, 4, and 8, respectively, as presented in Table 3, row 3. The inclusion of pre-treatment information yielded a 10% improvement in the accuracy at week 1, which was also statistically significant (p = 0.03). Even at baseline (pre-treatment), the response of the patients could be predicted with 65 ± 9% accuracy using the set [MBF_wk0, SS_wk0, SAS_wk0].
In order to compare the predictions of QUS and histopathology on the recurrence-free survival (RFS) of the patients, Kaplan-Meier survival analysis was performed and the results are presented in Figure 4. The median follow-up time was 25 months. The RFS curves were divided into responder and non-responder groups. A log-rank test was performed to compare the RFS rates between the responders and non-responders [25]. The RFS curves obtained from QUS biomarkers demonstrated statistically significant differences between the response groups at weeks 1 and 4 (log-rank p = 0.035 and 0.027, respectively), as did the RFS curves obtained from histopathology information (log-rank p = 0.0002). However, RFS curves obtained from QUS biomarkers at week 8 did not show a significant difference between the response groups (log-rank p = 0.26).
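For readers unfamiliar with the survival analysis used above, the following is a minimal sketch of a Kaplan-Meier estimator and a two-group log-rank test. It is a textbook formulation, not the study's implementation; the follow-up times and event flags are hypothetical, and the p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.
    events[i] is True for a recurrence and False for a censored patient."""
    curve, surv = [], 1.0
    for t in sorted({x for x, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        n_events = sum(1 for x, e in zip(times, events) if e and x == t)
        surv *= 1.0 - n_events / at_risk
        curve.append((t, surv))
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    all_times, all_events = times_a + times_b, events_a + events_b
    observed_a = expected_a = var = 0.0
    for t in sorted({x for x, e in zip(all_times, all_events) if e}):
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / var
    return chi2, erfc(sqrt(chi2 / 2.0))


# Hypothetical follow-up (months) for predicted responders and non-responders.
resp_t, resp_e = [30, 28, 25, 32, 20], [False, False, False, False, True]
non_t, non_e = [6, 10, 14, 22, 26], [True, True, True, False, False]

km_non = kaplan_meier(non_t, non_e)
chi2, p_value = log_rank(resp_t, resp_e, non_t, non_e)
print(km_non, round(chi2, 2), round(p_value, 3))
```

With only five hypothetical patients per group the test has little power, so a real analysis would rely on the full cohort and an established survival-analysis package.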

DISCUSSION
This study demonstrated, for the first time, using a relatively large patient database and a leave-one-patient-out classifier evaluation, that multi-parametric QUS applied at a clinically relevant frequency range (<10 MHz) can be used to non-invasively predict breast tumor response to NAC as early as after 1-2 cycles (1-4 weeks) with reasonable accuracy (80%), whereas RECIST-based tumor size change is only 52% accurate in predicting response at week 4 with a 30% threshold. Additionally, the RFS analyses demonstrated that when the ultrasound biomarkers [MBF, SS, SAS], which include pre-treatment values along with the change at a specified time during treatment, were used to predict the RFS, responder and non-responder RFS rates were statistically significantly different when classifying patients based on data at weeks 1 and 4. Although the results of this study were not used to modify the treatments of the patients, the findings suggest that ultrasound biomarkers can predict the RFS rates of responding and non-responding patients within weeks almost as accurately as patient ultimate clinical response based on clinical and histopathology information obtained many months later. The reason for the poor separation of the RFS rates of the groups predicted from week 8 QUS data was likely that the QUS biomarker is sensitive only to early microstructural changes in the tumor during treatment. Post-surgical histology images demonstrated a considerable extent of fibrosis, potentially mixed with cell death, in the responding tumor bed. Thus, it is posited that at 8 weeks, the beginning of fibrotic changes contributed to altering the QUS measurement of cell death.

Figure 1: Representative data for a responding patient. B-mode images (A), MBF images (B), and power spectra (C, left) before and 4 weeks after the start of chemotherapy treatment. Hematoxylin and eosin histology image post-surgery (C, right). Data in the left column represent pre-treatment data, obtained prior to treatment initiation, and data in the right column represent week 4 data. US scale bar represents 1 cm; histology scale bar represents 100 μm.
Previously developed theories about ultrasound detection of cell death support the findings in this study. Just as the parameters related to the backscatter intensity and acoustic concentration (i.e. MBF and SI) increased in tumors undergoing cell death in vivo in previous studies [19], MBF, SI, and AAC values increased in clinically responding tumors in this study. Classification results determined that MBF was the most effective parameter in the discrimination of responders from non-responders at weeks 4 and 8 (Table 2). This suggests that response-related changes in a tumor are linked to the energy efficiency of acoustic backscatter from the tumor tissue. Since changes in ACE obtained at weeks 1 and 4 were statistically significant in responders compared to those of non-responders (p = 0.004 and 0.039, respectively), it is likely that the attenuation correction of the tumor spectra helped increase the sensitivity of the MBF parameter to response. Furthermore, the MBF changes in the responding-tumor patient population became highly statistically different from those of non-responding tumors at week 8 (p < 0.005). The increase in the ACE observed in responding tumors over treatment time was concordant with the increase in attenuation coefficient with cell death extent observed in previous high-frequency QUS cell treatment characterization studies [26].
Classification results using the multiparametric QUS model demonstrated that increasing the number of QUS parameters submitted to the classification system improved the discrimination power, but not beyond three parameters. Whereas previous studies found classical parameters (MBF and SI) to be sensitive to detecting breast tumor response at 4 weeks, we found that MBF, SI, ACE, and AAC all have comparable accuracies in predicting tumor response (65%, 65%, 64%, and 64%, respectively, at week 4). Combining the changes and pre-treatment values of MBF, SS, and SAS provided the best prediction of response (70 ± 9% at week 1, 80 ± 5% at week 4, and 81 ± 6% at week 8). This may be due to
In terms of statistical analysis, the ∆MBF, ∆SI, and ∆ACE parameters in our study demonstrated a significant change in responders at week 1 (p < 0.05), just as tracer uptake change did after one cycle in the PET study (p < 0.05) [4], just as total diffusion change did after one cycle of NAC in the DW-MRI study (p < 0.05) [5], and just as changes in deoxygenated hemoglobin, oxygenated hemoglobin, total hemoglobin concentration, water percentage, and tissue optical index did at week 1 in the DOS study (p < 0.05) [3]. Recently, a genetic method of monitoring metastatic breast cancer has been proposed, demonstrating circulating tumor DNA as an effective biomarker for this purpose [27]. However, this method is invasive in nature and time consuming, as it involves many steps including blood sample centrifugation, DNA extraction, polymerase chain reaction to detect genomic mutations, and assay of circulating tumor cells. On the other hand, the ultrasound-based method here permits breast ultrasound imaging and response assessment to be performed using one system and in one session and does not rely on tumor-specific genetic markers. It is sensitive to the biophysical changes which accompany cell death, the induction of which is the goal of cancer chemotherapy.
Evidence demonstrates that patients who respond well to chemotherapy may benefit from longer regimens of efficacious chemotherapy and suggests that ineffective treatments should be changed [28]. Currently, the standard of care for patients receiving NAC only includes pre-treatment and post-treatment imaging, typically using DCE-MRI, but does not routinely include intra-treatment imaging for tumor size assessment. Furthermore, ultrasound imaging is not reliable for tumor size measurement due to attenuation artefacts which cast shadows on the distal end of deep-set tumors. However, this had minimal effect on QUS assessment in this study, since the ROIs were selected in the center of the tumor (~90% coverage), avoiding regions of artefacts. Although intra-treatment tumor size was recorded in this study as measured by the physician during follow-up physical examinations, this method has limited reproducibility since measurements were made by different physicians. Thus, measurements reported here should be assumed approximate.

Parameters are sorted from highest to lowest accuracy at week 4. The bold entry indicates the best performance. Reported values are mean and standard deviation of the accuracies in percentages. Results were obtained by running the classification 10 times using 10 random samples from the responders group.

As demonstrated by the results, whereas ∆MBF was the most effective single QUS parameter for classifying patient response, combining ∆MBF with ∆SS and ∆SAS improved the classification accuracy at week 4 from 65 ± 5% to 77 ± 8%. A point of note is that ∆SS (week 0 and 4) and ∆SAS (week 0 and 4) were less accurate than other parameters investigated such as ∆SI, ∆ACE, ∆AAC, and ∆ASD, according to the results in Table 2. However, since ∆SS and ∆SAS are features that are independent from ∆MBF, the discriminating power increased when these parameters were combined.
Particularly, MBF describes the acoustic concentration, SS describes the size of the scatterers, and SAS describes the distance between regularly-spaced scatterers.
The QUS results obtained indicated poor separation between responders and non-responders at the preoperative scan time. This is expected and likely due to the large time gap between the end of neoadjuvant treatment and surgery (usually several weeks), where minimal or no cell death had occurred at the time of data acquisition. Additionally, tumor ROI selection in pre-operative images was difficult in complete pathologic responders who had no residual tumor, and were therefore excluded from the analysis. As expected the early investigated times at weeks 1 and 4 indicated the best separation between responders and non-responders. These were selected to span cycles of NAC and it remains unknown if other times sooner or later would be useful for analyses. Despite this, the sensitivity and specificity and consequent accuracy were significant for predicting ultimate patient clinical response.

CONCLUSIONS
In summary, this study demonstrated for the first time, using a relatively large patient cohort and leave-one-out classifier evaluation, that the hybrid QUS biomarker [∆MBF, ∆SS, ∆SAS] can, with good sensitivity and specificity, detect the response of LABC tumors to NAC as early as after 1 cycle (1 week) of administration. Extending efficacious treatments and switching ineffective ones early based on indications of QUS biomarkers may result in improved RFS. The findings of this study also provided insight into pre-treatment ultrasonic scattering properties of a tumor potentially contributing to a prediction about its therapeutic resistance before the initiation of therapy.

Ultrasound data acquisition and processing
This prospective study was reviewed and approved by the institution's research ethics board. After obtaining informed consent, ultrasound RF data were collected from the affected breast of patients (N = 58) with locally advanced breast cancer (LABC) prior to NAC treatment initiation and at four times during the course of the treatment: weeks 1, 4, 8, and prior to surgery (mastectomy/lumpectomy). Patients recently diagnosed (within one week) with locally advanced invasive breast cancer of any grade, including invasive ductal carcinoma, invasive lobular carcinoma, and other forms of invasive cancer, were referred from the diagnostic clinic to our study. This included patients with tumors larger than 5 cm and/or tumors with locoregional lymph node, skin, and chest wall involvement as per guidelines reported in [29]. All clinical and ultrasound data obtained for this study were from patients treated between January 2009 and August 2013. Treatment regimens varied from 5-fluorouracil, epirubicin and cyclophosphamide followed by docetaxel (FEC-D), to Adriamycin followed by paclitaxel (AC-T), or taxol followed by herceptin, with cycles varying from weekly to tri-weekly. Individual patient treatment regimens are provided in Table A.1.
Breast ultrasound data were collected by an experienced sonographer using a clinical scanner (Sonix RP, Ultrasonix, Vancouver, Canada) employing a 6 MHz center frequency linear array transducer (L14-5-60), sampling at a rate of 40 MHz, with the focus set at the midline of the tumor and maximum imaging depth set to 4-6 cm, depending on tumor size and location. Standard B-mode imaging was used for anatomical navigation, and acquisition volume was determined based on the tumor location reported in biopsy findings. Approximately 3-5 image planes were acquired from the tumor, depending on the tumor size. A region of interest (ROI) was contoured around the tumor within each image plane and divided into smaller blocks, called RF blocks, of 2 by 2 mm, with an 80% overlap between adjacent blocks in both axial and lateral directions. A 2 by 2 mm RF block corresponds to 10 spatial pulse lengths axially and 5.5 beamwidths laterally, which meets the minimum ROI size requirements for obtaining reliable scatterer property estimates [30,31]. A normalized power spectrum was then computed for each RF block using a phantom reference, and was corrected for the total attenuation, A(f), from the top of the image down to the center of the RF block, as illustrated in Figure 5. The total attenuation consisted of two components: attenuation of the intervening tissue, α0, assumed to be 1 dB/cm-MHz based on reported ultrasound tomography measurements of the breast [32], and the local attenuation estimate of the ROI, α1, which is also referred to here as ACE and is estimated using the spectral difference method [33]. After obtaining attenuation-corrected normalized power spectra from all RF blocks within the ROI, a parametric image was computed for each QUS parameter. The QUS parameters investigated were MBF, SS, SI, SAS, ACE, ASD, and AAC. More details about QUS analysis are provided in the appendix.
Since spectral normalization was performed using a homogeneous tissue-mimicking phantom prior to parameter estimation, effects of tumor depth and size were minimized. In addition to acquiring ultrasound data, tumor sizes reported by oncologists by physical examination in the follow-up visits were also examined. Size reports were corroborated by ultrasound imaging results but clinical physical examination documentation was used for tumor size measures.
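The attenuation correction and linear spectral fit described in this section can be sketched as follows. This is an illustrative simplification rather than the study's processing chain: it assumes the normalized power spectrum is already available in dB, that the round-trip loss in dB is 2(α0·d0 + α1·d1)·f for path lengths d0 (intervening tissue) and d1 (within the ROI), and that MBF is the value of the regression line at the center of the analysis bandwidth. All numeric values other than α0 = 1 dB/cm-MHz are hypothetical.

```python
def compensate_attenuation(freqs_mhz, spectrum_db, alpha0, depth0_cm, alpha1, depth1_cm):
    """Add back the assumed round-trip attenuation loss (dB) at each frequency.
    alpha0/alpha1 are in dB/cm-MHz; depths are one-way path lengths in cm."""
    loss_per_mhz = 2.0 * (alpha0 * depth0_cm + alpha1 * depth1_cm)
    return [s + loss_per_mhz * f for f, s in zip(freqs_mhz, spectrum_db)]


def spectral_fit(freqs_mhz, spectrum_db):
    """Least-squares line through the spectrum (dB versus MHz).
    SS = slope, SI = intercept at 0 MHz, MBF = fit value at mid-band."""
    n = len(freqs_mhz)
    f_mean = sum(freqs_mhz) / n
    s_mean = sum(spectrum_db) / n
    sxx = sum((f - f_mean) ** 2 for f in freqs_mhz)
    sxy = sum((f - f_mean) * (s - s_mean) for f, s in zip(freqs_mhz, spectrum_db))
    slope = sxy / sxx
    intercept = s_mean - slope * f_mean
    f_mid = (min(freqs_mhz) + max(freqs_mhz)) / 2.0
    return {"SS": slope, "SI": intercept, "MBF": intercept + slope * f_mid}


# Hypothetical normalized spectrum of one RF block over a 4-8 MHz band.
freqs = [4.0, 5.0, 6.0, 7.0, 8.0]
measured_db = [-30.0, -35.0, -40.0, -45.0, -50.0]

corrected_db = compensate_attenuation(
    freqs, measured_db,
    alpha0=1.0, depth0_cm=1.0,   # intervening tissue, 1 dB/cm-MHz as in the text
    alpha1=0.5, depth1_cm=1.0,   # hypothetical local ACE and path length
)
params = spectral_fit(freqs, corrected_db)
print(params)
```

Repeating the fit for every RF block in the ROI would give one value per block, which is how a parametric image of MBF, SS, or SI could be assembled.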

Classification and statistical analyses
All QUS results were compared with the clinical standard response of each patient, determined based on the RECIST guideline. This was determined at the end of the patient's several-month treatment by measuring the reduction in gross tumor size based on dynamic contrast enhanced magnetic resonance images (DCE-MRI) cross-verified with whole-mount breast histopathology obtained post-operatively. Since the focus of this study was a binary classification of response, the standard four categories of response defined in the RECIST guideline were merged into two categories by grouping complete and partial responses into "response" and grouping stable and progressive disease responses into "non-response". A recent study demonstrated that residual tumor cellularity is an important prognostic factor in breast cancer neoadjuvant treatment, which should be taken into account in conjunction with the RECIST metric of bulk tumor shrinkage (BTS) [34]. Accordingly in this study, a patient was deemed to be a clinical responder if the sum of the lengths of the tumor foci was reduced by more than 30% or if in the non-mass enhancing area, the pathologically determined residual tumor cellularity was low. Conversely, a patient was considered a clinical non-responder if the sum of the lengths of their tumor foci was reduced by less than 30% or the residual tumor cellularity remained high. In cases (infrequent) where the RECIST-based response conflicted with the pathological response, the pathological response was used to determine the true response.
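The labelling rule described above can be summarized in a short sketch. The category strings, the function name `true_response`, and the representation of the pathological finding as an optional boolean are assumptions made for illustration; the logic follows the text in merging the four RECIST categories into two and letting the pathological response decide when the two conflict.

```python
# Merge the four RECIST categories into a binary response label.
RECIST_TO_BINARY = {
    "complete response": True,
    "partial response": True,
    "stable disease": False,
    "progressive disease": False,
}


def true_response(recist_category, pathological_response=None):
    """Return True for a clinical responder, False for a non-responder.
    pathological_response: True/False from histopathology (e.g. low/high
    residual tumor cellularity), or None when it is not available."""
    recist_label = RECIST_TO_BINARY[recist_category]
    if pathological_response is None:
        return recist_label
    # When the two sources disagree, the pathological response is used;
    # when they agree, both give the same label.
    return pathological_response


print(true_response("partial response"))                                # RECIST only
print(true_response("stable disease", pathological_response=True))      # pathology overrides
```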
The mean changes in each QUS parameter were compared between the clinical responder and non-responder groups at each time point. Initially, a Shapiro-Wilk test was used to test each data set for normality. Since all data sets passed the normality test, a Student's unpaired t-test (right-tailed, α = 0.05) was used to test the statistical significance of the difference between group means. To determine the clinical feasibility of using QUS as a cancer therapy monitoring system, multi-feature response classification was performed using a KNN classifier based on Euclidean distances (see Appendix). Rather than the absolute values, the changes in the QUS parameters relative to their pre-treatment values (week 0) were used as classification features, denoted here by the prefix ∆ (e.g. ∆MBF). This baseline normalization was necessary to account for differences in breast tissue echogenicity among the patients, owing to differences in breast density. The imbalance in the data set was compensated for by randomly sampling (with replacement) from the responder group so as to have an equal number of responders and non-responders (N = 16). Classification was performed 10 times (10 different responder group samples) with leave-one-patient-out evaluation. Because the number of features was small, an exhaustive search was used to obtain the optimal set of features for classification. The exhaustive search covered all possible combinations of 2, 3, 4, 5, 6, and 7 parameters (120 combinations) in order to determine the minimal feature set giving the best classification accuracy, thereby removing any irrelevant or redundant parameters. Classification accuracy (the number of correctly classified patients over the total number of patients) was used as the objective function to maximize. The metrics used for measuring classification performance were sensitivity, specificity, and accuracy.
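The feature preparation and exhaustive search described above can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: all function names are hypothetical, the ∆ features are taken to be plain differences from the week-0 value, and the scoring function is supplied by the caller.

```python
from itertools import combinations
import random

FEATURES = ["MBF", "SS", "SI", "SAS", "ACE", "ASD", "AAC"]


def delta_features(week_values, baseline_values):
    """Change of each QUS parameter relative to week 0 (difference assumed)."""
    return {name: week_values[name] - baseline_values[name]
            for name in week_values}


def candidate_feature_sets(features, min_size=2):
    """All subsets with at least `min_size` features, smallest subsets first."""
    subsets = []
    for size in range(min_size, len(features) + 1):
        subsets.extend(combinations(features, size))
    return subsets


def balance_by_resampling(responders, non_responders, rng):
    """Draw responders with replacement to match the non-responder count."""
    sampled = [rng.choice(responders) for _ in non_responders]
    return sampled, list(non_responders)


def best_feature_set(subsets, score):
    """Return the highest-scoring subset.

    Subsets are visited smallest first and only a strictly better score
    replaces the current best, so ties favor the smaller feature set.
    """
    best, best_score = None, float("-inf")
    for subset in subsets:
        value = score(subset)
        if value > best_score:
            best, best_score = subset, value
    return best, best_score
```

With seven parameters, `candidate_feature_sets(FEATURES)` enumerates 21 + 35 + 35 + 21 + 7 + 1 = 120 subsets, matching the count quoted above. In practice `score` would be the leave-one-patient-out accuracy of the KNN classifier on a balanced sample.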

Quantitative ultrasound data analysis
All spectral analyses were carried out using the data from the -6 dB system transducer bandwidth, which was 3-8 MHz. The first step in the QUS analysis was computation of the attenuation coefficient estimate (ACE) of the tumor, which was used for attenuation correction of the tumor power spectrum. The ACE was computed using the reference phantom method (RPM) by estimating the rate of change in the spectral magnitude with depth and frequency relative to a reference medium with a known attenuation coefficient [33]. The reference material was an in-house constructed tissue-mimicking phantom with a measured attenuation coefficient of 0.15 dB/cm-MHz and a sound speed of 1515 m/s. The phantom was constructed based on [35] and contained randomly dispersed glass microspheres with a diameter of 18 (SD = 3) μm at a concentration of 2.2 g/L in a 2% agar medium. Any phantom can be used for the RPM and for the QUS analysis performed here as long as it meets two requirements: it must produce a homogeneous speckle image, and its BSC, speed of sound, and attenuation coefficient must be known (accurately measured). In order for system-dependent factors to be accurately corrected for using the RPM, the speed of sound of the reference should be matched to that of the sample [36]. Based on reported values [37], the speed of sound averaged over the fatty tissue, parenchyma, benign, and malignant lesion of the breast is 1490 m/s. Thus, the speed of sound of the phantom used here is within 1.5% difference of this value and is considered reasonable for use of the RPM.
To estimate the ACE of a tumor ROI using the RPM, plots of phantom-normalized power spectrum amplitude versus depth were obtained by averaging the power spectra across laterally adjacent blocks and then plotting the average amplitude at each frequency against the depth of the blocks in the ROI. The mean ACE of the tumor was estimated by averaging the slopes of the linear fits to the amplitude versus depth data at all frequency points in the bandwidth. The resulting ACE was used to correct the tumor power spectrum for attenuation using the point attenuation compensation method [38]. The phantom power spectrum was likewise corrected using the point attenuation compensation method with the known attenuation coefficient of the phantom (0.15 dB/cm-MHz). Afterwards, the spectral parameters MBF, SI, and SS were determined from linear regression of the attenuation-corrected power spectrum within the usable (-6 dB) bandwidth. SI and SS are the intercept and slope of the line of best fit, and MBF is the magnitude of the spectral fit at the center of the frequency bandwidth.
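The two regression steps above can be sketched in a few lines. This is an illustration under stated assumptions rather than the processing chain used in the study: the function names are hypothetical, spectra are assumed to be in dB, attenuation is assumed linear in frequency, the factor of 2 accounts for the round-trip path, and the per-frequency slopes are converted to dB/cm-MHz before averaging.

```python
def fit_line(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_ace(freqs_mhz, depths_cm, norm_power_db, ref_ace=0.15):
    """Reference-phantom style attenuation estimate in dB/cm-MHz.

    norm_power_db[i][j] is the phantom-normalized power (dB) at
    freqs_mhz[i] and depths_cm[j]. At each frequency the decay with depth
    is fitted by a line; its slope is assumed to equal
    -2 * (ACE_tumor - ref_ace) * f.
    """
    estimates = []
    for f, row in zip(freqs_mhz, norm_power_db):
        slope_db_per_cm, _ = fit_line(depths_cm, row)
        estimates.append(ref_ace - slope_db_per_cm / (2.0 * f))
    return sum(estimates) / len(estimates)


def spectral_parameters(freqs_mhz, corrected_power_db):
    """SS, SI and MBF from a line fitted over the usable bandwidth."""
    ss, si = fit_line(freqs_mhz, corrected_power_db)
    # Band center taken as the midpoint of the analysed range (assumption).
    f_center = (min(freqs_mhz) + max(freqs_mhz)) / 2.0
    return {"SS": ss, "SI": si, "MBF": ss * f_center + si}
```

For the 3-8 MHz band used here, the midpoint is 5.5 MHz, so MBF is the fitted line evaluated at that frequency.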

Classification system
After computing all 7 QUS features for all patients, classification was performed using the KNN classifier with all possible combinations of QUS features. The KNN classifier assigns a point in the feature space to the class that forms the majority among its neighboring points, taking into account the distances between those neighbors and the point of interest [44].
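A minimal sketch of a KNN classifier with Euclidean distances and leave-one-patient-out evaluation is given below. The names, the default k = 3, and the tie-breaking rule (the closest neighbor among the tied classes decides) are assumptions made for illustration; the study's own choices are described in its Appendix.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    ranked = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    tied = {label for label, count in votes.items() if count == top}
    # Ties are resolved in favor of the closest neighbor (assumption).
    for _, label in nearest:
        if label in tied:
            return label


def leave_one_patient_out_accuracy(xs, ys, patient_ids, k=3):
    """Hold out every row of one patient at a time and classify it.

    Using patient identifiers keeps resampled duplicates of the held-out
    patient from leaking into the training set.
    """
    correct = 0
    total = 0
    for held_out in sorted(set(patient_ids)):
        train_x = [x for x, p in zip(xs, patient_ids) if p != held_out]
        train_y = [y for y, p in zip(ys, patient_ids) if p != held_out]
        for x, y, p in zip(xs, ys, patient_ids):
            if p == held_out:
                total += 1
                correct += knn_predict(train_x, train_y, x, k) == y
    return correct / total
```

Each row of `xs` would hold the ∆ values of the selected QUS features for one patient, for example (∆MBF, ∆SS, ∆SAS), with `ys` holding the corresponding response labels.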