Risk stratification of thyroid nodules with Bethesda category III results on fine-needle aspiration cytology: The additional value of acoustic radiation force impulse elastography

This study assessed the value of conventional ultrasound, conventional strain elastography (CSE) and acoustic radiation force impulse (ARFI) elastography in stratifying the likelihood of malignancy for Bethesda category III thyroid nodules. A total of 103 thyroid nodules with Bethesda category III results on fine-needle aspiration cytology (FNAC) in 103 patients were included and all were pathologically confirmed after surgery. Conventional ultrasound, CSE and ARFI elastography, including ARFI imaging and point shear wave speed (SWS) measurement, were performed. Univariate and multivariate analyses were performed to identify the independent factors associated with malignancy. The area under the receiver operating characteristic curve (Az) was calculated to assess diagnostic performance. Pathologically, 65 nodules were benign and 38 were malignant. Significant differences were found between benign and malignant nodules on ARFI elastography. The cut-off points were ARFI imaging grade ≥ 4, SWS > 2.94 m/s and SWS ratio > 1.09, respectively. ARFI imaging (Az: 0.861) had the highest diagnostic performance in differentiating malignant from benign nodules, followed by conventional ultrasound (Az: 0.606-0.744), CSE (Az: 0.660) and point SWS measurement (Az: 0.725-0.735). Multivariate logistic regression analysis showed that ARFI imaging grade ≥ 4 was the most significant independent predictor. The combination of ARFI imaging with point SWS measurement significantly improved the specificity (100% vs. 80.0%) and positive predictive value (100% vs. 72.9%) in comparison with ARFI imaging alone. ARFI elastography is a useful tool in differentiating malignant from benign thyroid nodules with Bethesda category III results on FNAC.


INTRODUCTION
Patients with suspicious thyroid nodules on ultrasound (US) are usually advised to undergo fine-needle aspiration cytology (FNAC). US-guided FNAC is a cost-effective and widely used method to differentiate thyroid nodules with a diagnostic accuracy of 62% to 85%, which reduces the risk of unnecessary surgery for benign nodules [1][2][3][4][5]. The Bethesda System for Reporting Thyroid Cytopathology (BSRTC) has standardized the FNAC results and facilitated effective communication among clinicians, radiologists and pathologists [6]. However, cytological examination cannot replace the final pathological diagnosis on account of sampling errors [7]. The Bethesda category III classification, that is, atypia of undetermined significance (AUS) / follicular lesion of undetermined significance (FLUS), has remained ambiguous concerning the risk of malignancy and guidelines for management [3,8]. Bethesda category III nodules (i.e. AUS/FLUS nodules) usually account for less than 7% of FNAC results and the malignancy rate is 5%-15% [9]. However, reported incidence rates and malignancy risks vary, ranging from 3% to 20% [10,11] and from 5% to 48% [9,12,13,14], respectively.
The uncertain malignancy risk of AUS/FLUS nodules leads to uncertainty in subsequent treatment planning. To address this issue, many studies have investigated risk factors associated with malignancy in AUS/FLUS nodules, including clinical factors (sex, age, history of radiation and family history) [13], US findings [4,15], elastography findings [7,16], cytology subclassifications [3,17], molecular mutational analyses [3], repeated FNAC [8,18], core-needle biopsy (CNB) [17] and intraoperative frozen sections (FS) [2]. The recommended management for nodules with a Bethesda category III result is repeat FNAC 3 months after the first FNAC [6]. Surveillance or diagnostic surgery may also be advised, depending on clinical risk factors, US pattern, and patient preference [9]. In a recent prospective study, 48.6% of initial Bethesda category III nodules persisted as category III on repeat FNAC [8,11], arguing against the role of repeat FNAC. Meanwhile, patients are often reluctant to undergo repeat FNAC.
Recently, US elastography has gained increasing attention for the diagnosis of thyroid nodules. Conventional strain elastography (CSE) is helpful in differentiating malignant from benign thyroid nodules by enabling measurement of tissue deformation in response to compression and displaying tissue stiffness [19], but it is limited by factors such as lack of quantitative information and low reproducibility [20]. Acoustic radiation force impulse (ARFI) elastography has been introduced in recent years, in which the tissue is mechanically excited by short-duration acoustic pulses from the transducer, which propagate in a perpendicular direction. Qualitative assessment of stiffness is achieved by estimating tissue displacement (i.e. ARFI imaging) under the acoustic pulses, and quantitative assessment is achieved by measuring transverse shear wave propagation speed (i.e. point shear wave speed [SWS] measurement) [21]. ARFI elastography has shown improved diagnostic accuracy in comparison with conventional US and CSE [22,23], and is also more reproducible and less operator-dependent [24].
To date, no study has evaluated the usefulness of ARFI elastography for the diagnosis of AUS/FLUS thyroid nodules. It was hypothesized that ARFI elastography might be a useful tool for malignancy stratification of AUS/FLUS thyroid nodules. To test this hypothesis, the diagnostic performance of ARFI elastography in the diagnosis of AUS/FLUS nodules was prospectively evaluated and the possible predictors for malignancy were analyzed.

Basic characteristics, US, CSE, and ARFI elastography
In univariate analysis, nodule size was significantly associated with malignancy, with malignant nodules being smaller than benign ones (9.2 ± 3.9 mm vs. 11.8 ± 6.1 mm; P = 0.024). Conversely, patient sex, age, nodule position and thyroid background did not differ significantly (all P > 0.05). On conventional US, halo sign, echogenicity, nodule component, shape, height and width, margin, and calcification were associated with malignancy (all P < 0.05) (Figure 2 and Figure 3). In the sub-analysis for nodules 5-10 mm, echogenicity, margin, height and width, and calcification were statistically significant (all P < 0.05). For nodules > 10 mm, nodule size, echogenicity, calcification and height and width differed significantly (all P < 0.05) (Table 1). The differences were significant between malignant and benign nodules for CSE score, ARFI imaging grade, SWS and SWS ratio (all P < 0.05) (Table 2, Figure 2 and

DISCUSSION
The malignancy risk of AUS/FLUS nodules varies and is uncertain; therefore, the recommended treatment of AUS/FLUS nodules is usually diagnostic thyroid lobectomy or repeat FNAC [26,27]. However, most of these nodules are benign at final pathological examination after surgery or remain AUS/FLUS nodules on repeat FNAC [28]. Risk stratification of these nodules may reduce unnecessary invasive procedures or avoid possible complications.
In the present study, AUS/FLUS nodules accounted for 11.8% of nodules after FNAC and their malignancy rate was 36.9%. Clinical features such as larger nodule size, male sex and age > 40 years were reported to increase the probability of malignancy for AUS/FLUS nodules [2,13,29]. However, Gweon et al. [4] found that sex, nodule size and age were not associated with malignancy. In the current study, malignant nodules were significantly smaller, which was related to the high proportion of microcarcinomas. Many other reports have emphasized the importance of US features in evaluating AUS/FLUS nodules [2,12,13,18,30]. Mendez et al. [30] suggested that hypoechogenicity, irregular margin, microcalcification, and taller-than-wide shape were significantly associated with malignancy, whereas Samir et al. [31] reported that no B-mode US or Doppler characteristics displayed significant differences between benign and malignant indeterminate nodules. Our results showed that only marked hypoechogenicity on conventional US was an independent factor for malignancy. The differences might be due to the different sample sizes of AUS/FLUS nodules. In addition, it was reported that specimens diagnosed as AUS/FLUS were associated with the highest discordance rates among different centers [9]. Relatively high intra-observer variability in this difficult diagnostic category was also documented [9].
Recent studies focused on the ability of US elastography for differentiation between benign and malignant AUS/FLUS nodules, whereas the results were inconsistent [16,20,32]. In the current study, the best cut-off values for CSE score, ARFI imaging grade, SWS and SWS ratio were score 3, grade 4, 2.94 m/s and 1.09, respectively. The sensitivity, specificity and Az for CSE score ≥ 3 were 84.2%, 47.7% and 0.660 (95% CIs: 0.560-0.750), respectively, and CSE was not identified as a predictor. For point SWS measurement including SWS and SWS ratio, the sensitivities, specificities and Azs were 52.6%-72.3%, 70.8%-92.3% and 0.725-0.735, respectively. In another study, Samir et al. [31] indicated that SWS imaging may be a useful tool in preoperative malignancy risk assessment of follicular-patterned thyroid nodules, with a cut-off median value of 22.30 kPa for Young modulus, which had a higher sensitivity (82%), specificity (88%) and Az (0.81) than our results. The underlying reason may be that our findings are specific to a patient cohort with AUS/FLUS results, while the previous study focused on follicular-patterned thyroid nodules. In addition, in the present study ARFI elastography was performed in the longitudinal plane to avoid possible influencing factors such as pulsation of the carotid artery and the trachea, while SWS imaging in Samir's study was obtained in the transverse plane as the optional plane [31]. Woo et al. [33] suggested that SWS measured with ARFI and with SWS imaging may differ systematically, which may complicate the direct comparison between them.
[Figure caption fragment: SWS of "2.85" m/s are assigned. F, G. FNA cytology (hematoxylin-eosin stain, original magnification, ×200 (F), ×400 (G)) shows part follicle cells with nuclear atypia and the nodule is diagnosed as AUS/FLUS. H. Histologic specimen (hematoxylin-eosin stain; original magnification, ×50) shows that this thyroid nodule is finally confirmed to be a papillary microcarcinoma.]
ARFI imaging is by nature a form of strain imaging, and it reduces operator-dependent interference and improves the reproducibility of the examination. Previous studies confirmed the usefulness of ARFI imaging in diagnosing thyroid nodules and focal liver lesions [34][35][36]. In the current study, ARFI imaging was the most significant independent predictor for malignancy by multivariate analysis. ARFI imaging grade (Az: 0.861) had higher diagnostic performance compared with other single risk features, including point SWS measurement (Azs: 0.606-0.744). The result might be ascribed to the fact that ARFI imaging reflects the stiffness of the entire target nodule, whereas point SWS measurement reveals tissue elasticity in a selected region of the lesion. In addition, it was difficult to avoid microcalcifications and cystic areas when the microcalcifications were diffuse or the cystic areas were small and indistinguishable, which might lead to an inaccurate SWS measurement. Therefore, ARFI imaging may play a more promising role in clinical practice. However, ARFI imaging could also be confounded by several factors, including inflammation and calcification, both of which may increase estimated tissue stiffness [37,38]. Sporea et al. [39] reported that combining two elastographic methods could obtain a high specificity (93.3%) and PPV (96.8%) for predicting significant liver fibrosis. Therefore, we also tried to use the combined elastography features to evaluate Bethesda category III nodules. With the combination of ARFI imaging and point SWS measurement (SWS or SWS ratio), both the specificity and PPV significantly increased. The increase in specificity and PPV is meaningful and indicates that combining ARFI techniques helps to make a definite diagnosis for both benign and malignant nodules. However, the value of elastography in evaluating thyroid nodules remains to be determined. Russ [40] speculated about the classification of thyroid carcinomas into two categories.
Irregular infiltrative carcinomas harbor a fibrous component with low elasticity, and elastography can be applied to detect this. Non-infiltrative carcinomas with a regular shape and borders have high elasticity, so elastography contributes little to their evaluation.
There were several limitations in the study. Only patients who underwent surgery were enrolled, which might lead to selection bias. However, at the current stage only pathological examination can be used as the reference standard. Repeat FNAC or follow-up is inadequate to exclude or confirm malignancy for Bethesda category III nodules. Furthermore, 36.9% of the nodules in this series were malignant at surgery, which exceeds the recommended malignancy rate. It is possible that high-risk Bethesda III nodules are more likely to be triaged to surgery, and the reproducibility of interpreting AUS/FLUS is also limited [9]. In the literature, the malignancy risk of Bethesda III nodules varied from 5% to 48% [9,12,13,14], which indicates that the Bethesda category should be independently defined at each center to guide clinicians in risk estimation. On the other hand, the current study involved some nodules that were less than 10 mm in diameter. According to the 2015 American Thyroid Association (ATA) guideline, those nodules < 10 mm require evaluation because of associated lymphadenopathy, suspicious US findings, location close to the recurrent laryngeal nerve or trachea, or other high-risk clinical factors such as a family history of thyroid cancer or a childhood history of head and neck irradiation. A small percentage of PTCs < 10 mm present with clinically significant regional or distant metastases and signs of progression during follow-up [9]. For point SWS measurement in evaluating nodules smaller than 10 mm, it is possible that the peripheral thyroid parenchyma is also included in the ROI box. In addition, the number of cases was relatively small; thus, future studies with large case series are needed. Finally, this was a single-center experience and future multi-center studies are mandatory.
In summary, the present study demonstrates that ARFI elastography is a promising tool for preoperative malignancy risk stratification of patients with AUS/FLUS nodules. Specifically, the combination of ARFI imaging with point SWS measurement could significantly improve specificity and PPV. ARFI elastography may provide an easy and highly efficient way to help clinicians make the correct decision for Bethesda category III nodules with regard to subsequent treatment planning.
www.impactjournals.com/oncotarget

MATERIALS AND METHODS
This study was approved by the Ethics Committee of the Shanghai Tenth People's Hospital and informed consent was obtained from all the patients. The study was performed in accordance with relevant guidelines and regulations.

Study population
From June 2013 to August 2015, we prospectively examined 4650 consecutive patients with 5260 thyroid nodules with conventional US, CSE, and ARFI elastography in a tertiary hospital. Of them, a total of 3300 consecutive patients with 3940 thyroid nodules were subject to US-guided FNAC to rule out malignancy.

Conventional US, CSE and ARFI elastography examinations
All the examinations were performed with the same S2000 US scanner (Siemens Medical Solutions, Mountain View, Calif, USA) with the 7-17 MHz and/or 4-9 MHz linear transducer for conventional US and the 4-9 MHz linear transducer for elastography and were performed by 1 of 2 board-certified radiologists with more than 9 years' experience in thyroid US and 5 years of experience in thyroid elastography.
The patients were scanned in the supine position with dorsal flexion of the head. Conventional transverse and longitudinal US images were first obtained for each target nodule. To obtain optimal images, the target nodule was placed at the center of the screen and the machine settings were constantly adjusted. The US features were evaluated and recorded. The precise location of the nodule within the thyroid lobe was also recorded, such as the upper, middle, or lower portion of the lobe. The distance from the thyroid capsule, carotid artery or trachea was also recorded, which facilitated correlation between US and pathology results for heterogeneous glands containing multiple coalescent nodules. Afterwards, CSE and ARFI elastography were performed in the longitudinal direction of the thyroid nodule to avoid possible influencing factors such as pulsation of the carotid artery. The sampling box for CSE and ARFI imaging was selected to contain the target nodule and sufficient surrounding thyroid tissue. CSE was performed with light pressure, with the quality indicator value above 60 to ensure sufficient image quality. ARFI elastography was thereafter initiated, with the patient holding the breath for a few seconds. ARFI involves ARFI imaging (i.e. virtual touch tissue imaging [VTI]) and point SWS measurement (i.e. virtual touch tissue quantification [VTQ]) for targeting an anatomic region to interrogate the elastic properties. Tissue within the sampling box or shear wave (SW) ROI is mechanically excited by using short-duration acoustic pulses to generate localized displacements, which results in the propagation of transverse shear waves. ARFI imaging shows the elasticity of tissue (i.e. the longitudinal displacement) as a grayscale image in which brightness indicates decreased tissue stiffness whereas darkness indicates increased tissue stiffness. Point SWS measurement can assess the tissue elasticity quantitatively.
The SWS is obtained by scaling the time to peak displacement at each lateral location and is shown on the screen automatically. The SW ROI size is fixed at 6 mm × 5 mm. The basic principles for SW ROI selection are as follows: (1) the SW ROI is placed on the solid portion of the nodule; (2) the calcified or cystic portions are avoided; (3) the SW ROI is placed on the peripheral portion of the nodule to avoid possible necrotic tissue; (4) adjacent thyroid tissue is not included. The SWS measurement was repeated 7 times without movement of the transducer. Afterward, the SW ROI was moved to the relatively homogeneous surrounding thyroid tissue (usually more than 5 mm from the nodule) at the same depth and the surrounding shear wave speeds (SSWSs) were also measured 7 times at the same site.

Image interpretation
All images of US, CSE and ARFI imaging were scored and recorded independently by two experienced investigators with more than 10 years of experience in thyroid US and 5 years of experience in thyroid elastography, who were blinded to the patients' identities and pathological diagnoses. All investigators were trained before the analyses with the same standard. A training process was carried out in 30 extra patients before the study until high observer consistency (Kappa values > 0.6) was obtained. When the two investigators disagreed in their evaluation, another senior investigator with more than 20 years of experience in thyroid US and 7 years of experience in thyroid elastography reviewed the images and made the final decision.
On gray-scale US, the target nodule was evaluated for size (largest diameter, subgrouped as 5-10 mm or > 10 mm), position (left, isthmus or right lobe), thyroid background (homogenous or coarse echogenicity background), halo sign (present or absent), capsule contact (no contact, less than 25%, 26%-50% or more than 50% of the perimeter in contact with the capsule), echogenicity (hyper-, iso-, hypo-, marked hypo-, or mixed echogenicity, compared with surrounding thyroid tissue or nearby strap muscle), internal nodule component (four categories: completely solid, cystic portion ≤ 25%, cystic portion 26%-50%, or cystic portion > 50%), shape (regular or irregular), margin (well or poorly defined), calcification (microcalcification ≤ 1.5 mm in diameter, macrocalcification > 1.5 mm in diameter with acoustic shadow, or no calcification), and height and width (taller than wide or wider than tall). Color Doppler US patterns were defined as absence of visible blood flow (type I), peripheral blood flow and absent or slight internal blood flow (type II), and rich internal blood flow and absent or slight peripheral blood flow (type III) [24].
The CSE score was based on a color scale ranging from red (soft component) to blue (hard component) displayed over the B-mode image. The CSE was classified into four patterns according to Asteria et al. [19]: score 1, prevalence of red and green color in the nodule; score 2, predominantly green with few blue areas/spots in the nodule; score 3, predominantly blue with few green areas/spots in the nodule; score 4, the nodule is displayed entirely in blue.
The ARFI images were thereafter divided into grade I to grade VI as follows [23]: Grade I, predominantly white for the whole nodule; Grade II, predominantly white with few black portions; Grade III, equal black and white portions; Grade IV, predominantly black with a few white spots; Grade V, almost completely black; and Grade VI, entirely black. Higher grades mean stiffer tissue.
For point SWS measurement, the investigators simply read the SWS measurement results in the static images retrieved from the hard disk. The highest and the lowest values of SWS and SSWS were eliminated and the mean of the remaining 5 measurements was calculated. The SWS ratio was calculated from the mean intra-nodular and extra-nodular SWS values. After excluding other factors such as movement or breathing of the patient, inappropriate ROI placement or improper precompression, measurement results of "X.XX m/s" (either extremely soft or extremely hard) were replaced by 0 m/s (the cystic portion) or 8.4 m/s (the solid portion) as suggested by previous studies [22,25].
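The averaging rule described above can be illustrated with a minimal Python sketch. This is not the authors' software: the function names, the example readings, and the use of `None` to stand for an "X.XX m/s" display are hypothetical, and the direction of the ratio (nodule divided by surrounding tissue) is an assumption that is consistent with the reported cut-off of SWS ratio > 1.09.

```python
from statistics import mean

# Substitution values for an "X.XX m/s" display, as stated in the text.
CYSTIC_SUBSTITUTE = 0.0   # m/s, cystic portion
SOLID_SUBSTITUTE = 8.4    # m/s, solid portion


def resolve_reading(reading, is_cystic):
    """Map an unreadable display (represented here as None) to 0 or 8.4 m/s."""
    if reading is None:
        return CYSTIC_SUBSTITUTE if is_cystic else SOLID_SUBSTITUTE
    return reading


def trimmed_mean(readings):
    """Discard the single highest and lowest reading, then average the rest."""
    if len(readings) < 3:
        raise ValueError("need at least 3 readings to trim both extremes")
    ordered = sorted(readings)
    return mean(ordered[1:-1])


def sws_ratio(nodule_readings, surrounding_readings):
    """Assumed definition: trimmed-mean nodule SWS over trimmed-mean SSWS."""
    return trimmed_mean(nodule_readings) / trimmed_mean(surrounding_readings)


# Hypothetical example: 7 readings in the nodule and 7 in surrounding tissue.
nodule = [3.10, 2.95, 3.40, 2.80, 3.05, 3.20, 2.90]
surrounding = [2.40, 2.55, 2.30, 2.60, 2.45, 2.50, 2.35]

nodule_sws = trimmed_mean(nodule)          # mean of the middle 5 readings
ratio = sws_ratio(nodule, surrounding)
print(f"SWS = {nodule_sws:.2f} m/s, SWS ratio = {ratio:.2f}")
```

With 7 readings per site, trimming one value at each end leaves the 5 measurements that the protocol averages.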

US-guided FNAC procedure and cytopathological classification
US-guided FNAC was performed using a 22-gauge PTC needle (Hakko, Nagano, Japan) under local anesthesia.
Three to five smears were obtained for each target nodule, which were collected in 95% alcohol and were submitted for hematoxylin-eosin stain. All cases were reported using a six-tiered diagnostic system with BSRTC as follows: (1) nondiagnostic or unsatisfactory (Bethesda category I), (2) benign (Bethesda category II), (3) AUS/FLUS (Bethesda category III), (4) follicular neoplasm or suspicious for follicular neoplasm (Bethesda category IV), (5) suspicious for malignancy (Bethesda category V), and (6) malignant (Bethesda category VI) [6]. All FNAC were reported by one of three cytopathologists with more than 3 years of experience in thyroid cytopathology. When the FNAC report was Bethesda category III, another 2 senior cytopathologists, with more than 20 years of experience in thyroid pathology and 6 years of experience in thyroid cytopathology, reviewed the slides and made the final decision by consensus. Each patient was precisely positioned for the target nodule in US before surgery. AUS/FLUS nodules were identified by correlating cytology reports with US and pathology reports.

Statistical analysis
All the statistical analyses were performed using the SPSS software (version 19.0, Chicago, IL, USA) and MedCalc software (version 15.2, Mariakerke, Belgium). Continuous variables were compared by the independent two-sample t test, while the Chi-square test or Fisher's exact test was used to analyze the categorical variables. A step-wise multivariate logistic regression analysis was performed to find the independent predictors for malignancy. Receiver operating characteristic (ROC) curve analysis was performed to assess the diagnostic performance. The comparisons of the areas under the ROC curves (Azs) were performed by Z test. The optimal cut-off value for each variable, as well as the corresponding sensitivity and specificity, was obtained from ROC analysis when the Youden index was at its maximum. The PPV, negative predictive value (NPV) and accuracy were calculated from the diagnostic test 2×2 contingency tables. Sub-analysis was performed according to nodule size (5-10 mm and > 10 mm). For the combined methods, "Or" means that a nodule was considered malignant if either of the two methods diagnosed it as malignant; "And" means that a nodule was considered malignant only if both methods diagnosed it as malignant. A two-tailed P value < 0.05 was considered to be statistically significant. Confidence intervals (CIs) were recorded as two-sided exact binomial 95% CIs.
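The cut-off selection and the "And"/"Or" combination rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SPSS or MedCalc procedure used in the study: all function names and the ten-nodule data set are hypothetical, a test is taken as positive when the value is strictly greater than the threshold, and both benign and malignant cases are assumed to be present.

```python
def diagnostic_metrics(predicted, truth):
    """Compute 2x2-table metrics from boolean predictions and reference labels."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    fn = sum((not p) and t for p, t in zip(predicted, truth))
    tn = sum((not p) and (not t) for p, t in zip(predicted, truth))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
        "accuracy": (tp + tn) / len(truth),
    }


def youden_cutoff(values, truth):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(values)):
        m = diagnostic_metrics([v > cutoff for v in values], truth)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def combine(first, second, rule):
    """'and': positive only if both tests are positive; 'or': if either is."""
    if rule == "and":
        return [a and b for a, b in zip(first, second)]
    if rule == "or":
        return [a or b for a, b in zip(first, second)]
    raise ValueError("rule must be 'and' or 'or'")


# Hypothetical data for 10 nodules (True = malignant at pathology).
truth = [True, True, True, True, False, False, False, False, False, False]
arfi_grade = [5, 4, 4, 3, 4, 2, 3, 1, 2, 4]
sws = [3.5, 3.1, 2.6, 3.0, 2.5, 2.2, 3.2, 2.0, 2.4, 2.7]

# Apply the cut-offs reported in the text to the hypothetical data.
arfi_positive = [g >= 4 for g in arfi_grade]
sws_positive = [v > 2.94 for v in sws]

both = diagnostic_metrics(combine(arfi_positive, sws_positive, "and"), truth)
either = diagnostic_metrics(combine(arfi_positive, sws_positive, "or"), truth)
cutoff, j = youden_cutoff(sws, truth)
```

In this toy data set the "And" rule trades sensitivity for specificity and PPV, while the "Or" rule does the opposite, which is the general behavior expected when two tests are combined in this way.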