A new computational model for human thyroid cancer enhances the preoperative diagnostic efficacy.

Given the high rate of missed diagnosis and delayed treatment of thyroid cancer, an effective systematic model for differential diagnosis is needed. We therefore analyzed the clinicopathological characteristics, routine laboratory tests and imaging examinations of a cohort of 13,980 patients with thyroid tumors to establish a new diagnostic model for differentiating thyroid cancer in clinical practice. We randomly selected two-thirds of the population to develop the thyroid malignancy risk scoring system (TMRS) for preoperative differentiation between thyroid cancer and benign thyroid diseases, and then validated its differential diagnostic power in the remaining one-third. The 18 predictors finally included in the TMRS were male gender, clinical manifestations (fever, neck sore, neck lump, palpitations or sweating), laboratory findings (TSH > 1.56 mIU/L, FT3 > 5.85 pmol/L, TPOAb > 14.97 IU/ml, TgAb > 48.00 IU/ml, Tg > 34.59 μg/L, Ct > 64.00 ng/L, and CEA > 0.41 μg/L), and ultrasound features (tumor number, site, size ≤ 23 mm, echo texture, margins, and shape of neck lymph nodes). The TMRS was well calibrated (P = 0.437) and showed excellent discrimination (AUC = 0.93, 95% CI [0.92, 0.94]), with an accuracy of 83.2%, a sensitivity of 89.3%, a specificity of 81.5%, positive and negative predictive values of 56.8% and 96.6%, and positive and negative likelihood ratios of 4.83 and 0.13, respectively, in the development cohort. These results suggest that the TMRS could help provide accurate preoperative risk stratification for thyroid cancer and avoid unnecessary over- and under-treatment of such patients.


INTRODUCTION
Thyroid neoplasm is one of the most common endocrine tumors worldwide, with an overall malignancy risk of 5~10%, and mostly presents as thyroid nodules with different pathological forms [1]. Malignant types include papillary thyroid carcinoma (PTC, 88.0%), follicular thyroid carcinoma (FTC, 5.5%), Hürthle cell carcinoma (2.3%), medullary thyroid carcinoma (MTC, 1.8%), and anaplastic thyroid carcinoma (ATC, 0.9%) [2][3]. Researchers have observed a rapid global rise in thyroid cancer incidence over the past few decades [4][5][6]. In developed countries, the number of newly diagnosed patients with thyroid cancer gradually increased from 4.9 per 100,000 in 1975 to 12.0 per 100,000 in 2011 (9.1 per 100,000 females and 2.9 per 100,000 males, respectively) [7]. The sharp rise in the incidence of thyroid cancer also parallels the increasing detection rate of malignant thyroid nodules [8][9]. However, the mortality of thyroid cancer has remained the same [10][11][12]. Therefore, some researchers propose that excessive attention to thyroid cancer may give rise to misdiagnosis and overtreatment, which discourages efforts at early detection [13][14].
Thyroid cancers comprise various pathological types with large differences in prognosis. As the National Comprehensive Cancer Network (NCCN) reported in 2014, ATC is almost uniformly lethal, but most deaths from thyroid carcinoma occur in patients with differentiated carcinoma (e.g., PTC, FTC, or Hürthle cell carcinoma), which accounts for over 90% of all cases of thyroid malignancy [5]. When properly treated, most patients, especially those with differentiated types, can be cured or at least have their life expectancy extended, with a 5-year survival rate of 97.8% [15]. Early detection and accurate differential diagnosis are therefore critical.
Individualized or appropriate treatment depends on the nature of the lesion. The current focus of diagnosis is to distinguish malignancies from benign growths. Fine-needle aspiration biopsy (FNAB) is the best first-line procedure for differential diagnosis of a thyroid nodule, and pathological examination is considered the gold standard. However, up to one-third of FNAB results are inconclusive [16][17][18]. Sonography is another option for screening unknown thyroid nodules and lymph node structure, but this procedure has a relatively low capacity for differential diagnosis [19]. Conventional diagnostic methods, including sonography and FNAB, cannot provide definitive diagnoses in many cases [20][21][22][23]. Therefore, there is an urgent need for highly accurate tests and differential diagnostic approaches to identify thyroid malignancies.
In the present study, we used a different computational approach to identify thyroid cancer. We collected and analyzed the clinical information of nearly 14,000 patients and established a database including demographic characteristics, preoperative clinical manifestations, serological results, ultrasound results, and pathologic examination. We also investigated the preoperative predictors of the nature of the nodules. In addition, we established and validated a risk prediction model, the thyroid malignancy risk scoring system (TMRS), for the preoperative differential diagnosis of thyroid cancers (Figure 1). Our results showed that the TMRS was a highly reliable and discriminative panel of predictors for thyroid cancer, and could provide a new means of differentiating this common type of endocrine cancer.

Characterization of the patients with thyroid tumor
A total of 13,980 thyroid tumor patients with complete medical records of the preoperative examination and thyroid surgery were included.
The mean age in the study (n = 13,980) was 48.28 years, 77.30% were women, and 2,966 (21.22%) patients were diagnosed with thyroid carcinoma after surgery. All participants were randomly divided into development (n = 9,195) and validation (n = 4,785) cohorts. In the development and validation cohorts, the mean age was 48.75 and 48.32 years, the proportion of women was 77.35% and 77.26%, and 1,967 (21.39%) and 999 (20.88%) patients were diagnosed with thyroid cancer, respectively. No significant differences in characteristics were found between the two cohorts. Table 1 gives the detailed baseline characteristics of the patients.

A panel of 28 candidate predictors for a differential diagnostic model of thyroid malignancy
Twenty-eight of the 46 candidate predictors met the selection criteria (P < 0.10) for both prevalence and incidence of thyroid malignancies. The 28 selected candidate predictors were all significantly associated with diagnosis of thyroid malignancy in multiple logistic regression analysis (P < 0.01, Table 2), and were included in the second selection step. The area under the receiver operating characteristic (ROC) curve (AUC) was 0.997 (Figure 2A), demonstrating that the 28 predictors had excellent diagnostic performance. However, the Hosmer-Lemeshow χ² test gave a calibration statistic of 20.639 (df = 8, P = 0.008), indicating a significant difference between the actual and predicted malignancy diagnoses.
Oncotarget 28467 www.impactjournals.com/oncotarget
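The calibration P value can be checked from the upper tail of the χ² distribution. The following is a minimal Python sketch, assuming the usual χ² reference distribution for the Hosmer-Lemeshow statistic; the function name `chi2_sf_even_df` is ours and is not taken from the study. It uses the closed form that exists for an even number of degrees of freedom and reproduces P = 0.008 for χ² = 20.639 with df = 8.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    half = x / 2.0
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


# Hosmer-Lemeshow statistic reported for the 28-predictor model.
p_value = chi2_sf_even_df(20.639, 8)
print(f"P = {p_value:.3f}")  # P = 0.008
```

A small P value here means that predicted and observed malignancy rates disagree across risk groups, which is why the 28-predictor model was judged poorly calibrated despite its high AUC.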

Selection of the prediction TMRS model with 18 differential diagnostic predictors
To improve the discrimination of the previous logistic regression model and make it convenient to use, we reduced the number of predictors in our model as much as possible without compromising the diagnostic accuracy. Nine of the 28 candidate predictors were excluded for any of the following reasons: (1) P value > 0.05, (2) a wide 95% CI of the OR, or (3) difficulty of use or unclear definition. We also converted all continuous variables into binary variables using their median values as cutoffs (Table 3, footnote a). Some candidate predictors that did not meet the criteria but were associated with thyroid malignancy were integrated into a single predictor (Table 3, footnote b; e.g., left lobe, right lobe and isthmus were merged into one lobe). These procedures were repeated in the second selection and logistic regression analysis to rebuild and adjust the new prediction model.
The new model used 18 selected candidate predictors, which were all significantly associated with differential diagnosis of thyroid malignancy in multiple logistic regression analysis. The prediction model included male gender, fever, neck sore, neck lump, palpitations or sweating, laboratory findings (TSH, FT3, TPOAb, TgAb, Tg, Ct, CEA), and sonographic appearances (tumor number, site, size, margin, nodular echo texture, and shape of cervical lymph nodes). This model was strongly predictive of an individual's thyroid malignancy risk, with high discrimination (Figure 2B) and good calibration (Hosmer-Lemeshow χ² = 7.961, df = 8, P = 0.437). Table 3 shows the β coefficients, odds ratios (OR), and 95% CIs for the final model. Gender was used as the standard reference for assigning points in the TMRS, with the β coefficient for male gender (0.197) equaling one point. The points for all other predictors were calculated relative to this β coefficient (Table 3). Finally, we established the multivariable TMRS model by assigning each predictor its point value (Table 3, footnote c), yielding a scoring system with a scale from 0 to 99.
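The conversion from β coefficients to points can be illustrated with a short Python sketch. Only the male-gender coefficient (0.197, equal to one point) comes from the text; the other coefficients below are hypothetical placeholders, and rounding to the nearest integer is our assumption, since the text states only that points are relative to the reference coefficient.

```python
BETA_MALE = 0.197  # reference coefficient reported for male gender (= 1 point)


def beta_to_points(beta: float, reference: float = BETA_MALE) -> int:
    """Scale a logistic-regression coefficient to integer points.

    Rounding to the nearest integer is an assumption for illustration;
    the source only says points are relative to the reference coefficient.
    """
    if reference <= 0:
        raise ValueError("reference coefficient must be positive")
    return round(beta / reference)


# Hypothetical coefficients; only 'male gender' is taken from the text.
example_betas = {"male gender": 0.197, "predictor A": 0.60, "predictor B": 1.58}
points = {name: beta_to_points(beta) for name, beta in example_betas.items()}
print(points)  # {'male gender': 1, 'predictor A': 3, 'predictor B': 8}

# A patient's summed score is the total of the points of the predictors present.
summed_score = sum(points.values())
print(summed_score)  # 12
```

Using integer points trades a small amount of discrimination for a score that can be summed by hand at the bedside.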

Internal and external validation of the TMRS
As shown in Figure 2C, substituting points for the β coefficients in the TMRS only slightly decreased the AUC, to 0.928. Figure 3A shows the risk of malignancy for each summed score. The malignancy risk increased linearly with the scores.
The detailed data are shown in Supplementary Table 1.
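An AUC for a point-based score can be computed directly from the summed scores, as the probability that a randomly chosen malignant case scores higher than a randomly chosen benign case, with ties counted as one half. The sketch below uses made-up scores, not study data, and the function name `auc_from_scores` is ours.

```python
from itertools import product


def auc_from_scores(malignant: list[int], benign: list[int]) -> float:
    """AUC as the Mann-Whitney probability P(malignant score > benign score).

    Ties contribute 0.5. The pairwise loop is O(n*m), which is adequate
    for a sketch but slow for cohorts of thousands of patients.
    """
    if not malignant or not benign:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for m, b in product(malignant, benign):
        if m > b:
            wins += 1.0
        elif m == b:
            wins += 0.5
    return wins / (len(malignant) * len(benign))


# Hypothetical summed scores for illustration only.
malignant_scores = [70, 82, 65, 91]
benign_scores = [20, 65, 40, 55, 72]
print(auc_from_scores(malignant_scores, benign_scores))
```

Because the AUC depends only on the ranking of cases, replacing β coefficients with rounded points changes it only where rounding reorders patients, which is consistent with the small decrease reported above.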
In the external validation cohort, all predictors in the TMRS had high differential diagnostic power (P < 0.001, Table 4). The Hosmer-Lemeshow χ² test showed stable calibration (χ² = 5.047, df = 8, P = 0.753), and the AUC was 0.931 (Figure 2D). The TMRS had similar discrimination in the two cohorts (P = 0.622, Figure 2C vs. Figure 2D). Figure 3B shows the risk of malignancy for each summed score in the validation cohort (the detailed data are given in the Supplementary material).
In the two cohorts, summed scores below 65 (i.e., 0~64) occurred more frequently (Figure 3A and 3B). Consequently, risk estimates were more stable for these lower scores. Patients with scores of 65 or higher were considered a high-risk population, with a higher malignancy detection rate (59.4% and 60.2% in the development and validation cohorts, respectively), while the others (0~64) were considered a low-risk population, with a lower malignancy detection rate (4.0% and 3.7%). With the cutoff value of 65 points, the χ² test showed that, compared with patients with low summed scores (0~64), those with high summed scores (65~99) were 15~17 times more likely to be diagnosed with thyroid malignancy (P < 0.001, Table 5). The accuracy evaluations of the TMRS in the development and validation cohorts are listed in Table 5. The sensitivity (SEN), specificity (SPE), accuracy, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), and negative likelihood ratio (NLR) of the TMRS were 87.0%, 83.5%, 84.5%, 59.4%, 96.0%, 5.27, and 0.16 in the development cohort, and 87.5%, 84.8%, 85.3%, 60.2%, 96.3%, 5.76, and 0.15 in the validation cohort.
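The likelihood ratios and predictive values follow from sensitivity, specificity and prevalence by standard formulas. The Python sketch below (function names are ours) recomputes them from the rounded figures reported for the development cohort, taking the prevalence as the 21.39% malignancy rate of that cohort. The likelihood ratios match the reported values; the predictive values agree only approximately because the inputs are rounded.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (PLR, NLR) from sensitivity and specificity given as fractions."""
    plr = sensitivity / (1.0 - specificity)
    nlr = (1.0 - sensitivity) / specificity
    return plr, nlr


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) via Bayes' rule for a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Development cohort: SEN 87.0%, SPE 83.5%, malignancy rate 21.39%.
plr, nlr = likelihood_ratios(0.870, 0.835)
ppv, npv = predictive_values(0.870, 0.835, 0.2139)
print(f"PLR = {plr:.2f}, NLR = {nlr:.2f}")  # PLR = 5.27, NLR = 0.16
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

The high NPV relative to the PPV reflects the roughly 21% prevalence of malignancy: a score below the cutoff rules out cancer more reliably than a score above it rules cancer in.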

DISCUSSION
Among human malignancies, thyroid cancer is rare, accounting for approximately 1% of all cancers. However, it is the commonest endocrine malignancy, comprising over 90% of all endocrine cancers [24]. Early accurate detection of thyroid cancer and appropriate treatment for this disease are very important in clinical practice.
Interestingly, the Republic of Korea experienced a fifteen-fold increase in the rate of diagnosed thyroid cancer from 1993 to 2011, although the thyroid cancer mortality rate remained stable [14]. Some believe that excessive attention to thyroid cancer gives rise to overtreatment, while other researchers suggest that the problem is not thyroid malignancy itself but over-diagnosis attributable to over-screening for this type of cancer [13][14]. Welch in particular suggests that attention from the popular mass media encourages over-diagnosis and inappropriately aggressive treatment [25]. The current issue is how best to weigh the benefits of diagnosis and treatment against their harms [26].
To identify the optimal strategy for a well-standardized differential diagnosis of thyroid carcinoma, we evaluated the clinical significance of the specific characteristics of thyroid nodules. The TMRS presented in this study is a comprehensive analysis of an individual's absolute risk of thyroid malignancy based on a panel of predictors including thyroid-related examinations and other clinical information. The diagnostic accuracy of the TMRS was similar in the development and validation cohorts (Figure 2C vs. Figure 2D, Table 5). The TMRS stratifies individuals by scores from 0 to 99. With a cut-off score of 65, the malignancy risk differs nearly 17-fold between the lower (0~64) and the higher (65~99) summed scores. It performs well for all age categories and genders. Patients in the higher-risk population should receive further targeted FNAB with histological diagnosis or thyroid surgery. Individuals with scores < 65 appear to have a lower risk of malignancy.
All markers included in the TMRS are easy to obtain and to interpret. We selected the most direct, simple and objective measures, including demographic characteristics, clinical symptoms and serological examinations. We also included thyroid sonography because it is widely used, but excluded some potentially confounding subjective evaluations such as internal architecture, echo pattern, calcification pattern, A/T, posterior echo, neck lymph node structure, and intranodular and peripheral blood-flow signals. In addition, some commonly used predictors were not included in the final scoring system because of their low differential diagnostic value. In this study, the identification of new predictors specific to thyroid malignancies focuses on the importance of creating a current risk score for patients with suspected thyroid nodules. More importantly, no additional technically intensive, expensive or invasive examinations were required, because the TMRS was mainly based on physical examinations, ultrasound imaging, FNA or other histological biopsy. A successful physical examination reveals the clinical manifestations of thyroid growths, and is a promising first step as an effective screening method for thyroid cancer in primary care settings [1], [27]. Sonography is regarded as another optimal thyroid screening method, with lower costs and easier operation than other imaging exams [9], [19].
Notes to Table 3: a Laboratory findings and tumor size were converted into binary variables by their median values; b for 'Neck lump', aggressive enlargement was merged into Yes; 'Tumor site' was divided into one or both lobes according to the locations of thyroid cancer in the patient; 'Echo texture' was combined into two values, No or low and Equal or high; c risk scores of each predictor were calculated from the matching β coefficient (i.e., predictor 'male gender' equals 1 point).
FNAB is the most reliable and important diagnostic procedure for thyroid diseases worldwide, providing specimens for pathological diagnosis, the widely acknowledged gold standard [28]. Our TMRS model was established by analyzing and summarizing detailed data on preoperative clinical information, including socio-demographics, clinical manifestations, serum findings and ultrasound features, together with the postoperative pathological diagnosis, in a large, diverse cohort of patients with thyroid tumors. We had the opportunity to validate the detection efficacy of this model in a large external cohort of patients, and the diagnostic accuracy of the TMRS there was similar to that in the development cohort.
In addition, the TMRS may help clinicians administer targeted invasive examinations or further operations to patients with appropriate indications, which could significantly decrease the chance of over-treatment together with the associated mental and economic costs. Applying the TMRS before FNAB could prevent 66.3% of unnecessary procedure-related trauma compared with using FNAB alone. Thirty-one percent of the population was at high risk of malignancy. In the remaining two-thirds of the population, with lower malignancy risk, about 96.5% were shown to have benign nodules after surgical treatment. The TMRS could spare as many as 84.1% of patients with benign growths from excessive diagnostic procedures or treatments. Conversely, 3.9% of patients in the low-risk cohort (TMRS < 65) had false-negative malignant nodules. A cost-benefit analysis for these patients will be conducted soon to evaluate their quality of life and economic burden.
Compared with previous studies, including the thyroid nodule ultrasound forecasting model set up by Domínguez [29][30][31], the TMRS is the only risk score system specifically designed for the differential diagnosis of thyroid cancer based on comprehensive and common indicators, with relatively high sensitivity and specificity. The TMRS may be useful for selecting patients at high risk of malignancy for early intervention, especially individualized therapy, in the future. Clinicians could apply this new model and scoring system to quantify the risk of malignancy, which might guide their decisions on clinical strategy and follow-up screening for patients with suspected thyroid cancer.

Patients and study design
We identified patients with suspected thyroid tumor who were diagnosed and had surgery in Changzheng Hospital from June 1997 to December 2013, and collected their clinical information. Overall, we enrolled 13,980 cases meeting inclusion criteria. Patients who had received radiation therapy were excluded from the analysis, since radiation therapy usually interferes with the laboratory results and physical examinations. A total of 10,934 subjects were excluded from the study for any of the following reasons: (1) incomplete or missing medical records (n = 7,012); (2) treated thyroid tumor including local injection (n = 1,103), radiation therapy (n = 955), or

Grouping and definition
Among all patients in the database, two-thirds (n = 9,195) were randomly selected for the development of the prediction model. The rest (n = 4,785) were used as the validation cohort. All cases were pathologically confirmed after thyroidectomy. For statistical purposes, we categorized patients with 'Benignancy' into thyroid adenoma (TA), simple nodular goiter (SNG), chronic lymphocytic thyroiditis (CLT), painless thyroiditis (PPT), toxic nodular goiter (TNG), and thyroid cyst (TC). Patients with 'Malignancy' were categorized into PTC, FTC, MTC, ATC, uncertain malignant potential (UMP), thyroid lymphoma and other metastatic tumors.
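A random partition into development and validation cohorts of the stated sizes can be sketched in Python as follows. The randomization procedure actually used is not described, so the shuffle-and-slice approach, the fixed seed, and the function name `split_cohort` are assumptions made for illustration.

```python
import random


def split_cohort(patient_ids, n_development: int, seed: int = 0):
    """Randomly partition patient IDs into development and validation cohorts.

    The fixed seed is an assumption that makes the split reproducible;
    the source does not describe its randomization procedure.
    """
    ids = list(patient_ids)
    if not 0 <= n_development <= len(ids):
        raise ValueError("n_development must lie between 0 and the cohort size")
    rng = random.Random(seed)
    rng.shuffle(ids)
    return ids[:n_development], ids[n_development:]


# Cohort sizes from the text: 13,980 patients, 9,195 for development.
development, validation = split_cohort(range(13_980), n_development=9_195)
print(len(development), len(validation))  # 9195 4785
```

Splitting before any predictor selection keeps the validation cohort untouched by model building, which is what allows it to serve as a check on the final score.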
We retrospectively reviewed the sonographic features of all cases. Real-time sonography of the thyroid was performed with Acuson Sequoia and 128XP sonographic scanners (Siemens Medical Solutions, Mountain View, CA), equipped with commercially available 7-MHz to 14-MHz linear probes. Color Doppler imaging and power Doppler imaging were performed with the linear array transducers. Each case was evaluated for 16 sonographic characteristics: tumor number, tumor site, tumor size, aspect ratio, calcification pattern, internal architecture, echo texture, echo pattern, margin, halo, posterior echo, shape and structure of neck lymph nodes (LN), intranodular and peripheral blood-flow signals, and vascularity. In each case, 'tumor number' was categorized as unifocal or multifocal, and 'tumor site' was categorized as left lobe, right lobe, isthmus, or both lobes. 'Tumor size' was recorded as the maximum of the three diameters in the anteroposterior, transverse, and vertical sections. 'Aspect ratio' (the ratio of anteroposterior to transverse diameter, A/T) was noted as ≤ 1 or > 1. 'Calcification pattern' was documented according to presence and size, for instance, null, < 1 mm (microcalcification), 1-2 mm, or > 2 mm. 'Internal architecture' was defined as solid (solid components > 75% of the lesion), predominantly cystic (cystic components > 75%), or solid with cystic elements. The 'echo texture' of each lesion was classified as anechoic, hypoechoic, hyperechoic, or isoechoic in comparison with the background thyroid tissue. 'Echo pattern' was divided into homogeneous or heterogeneous. 'Margins' of lesions were categorized as well-defined when the lesion had clear demarcation from the surrounding normal thyroid over more than 50% of the nodule, or ill-defined when > 50% of the nodular border was unclearly demarcated. The hypoechoic 'halo' around each lesion was recorded as present (complete) or absent (incomplete).
'Posterior echo' was grouped into normal, attenuation, or enhancement. Furthermore, the overall 'shape of neck lymph node' was classified as either smooth and round or irregular or enlarged, and 'structure of neck lymph node' was classified as clear or unclear. The predominant pattern of blood flow was classified as 'intranodular blood flow' (intrinsic to the lesion) or 'peripheral blood flow'. Blood flow seen on color Doppler within a lesion was defined as 'intranodular', while flow surrounding the immediate margins of the lesion was considered 'peripheral'. These categories were further classified as hypovascular or hypervascular with respect to lateral thyroid tissue. Additionally, 'vascularity' was defined as diffuse, striped, or linear.

Statistical analysis
We selected predictors for the prediction scoring system in three sequential steps (Figure 1). The 46 candidate predictors available in the registry and records were identified from the results of previous epidemiological and etiological studies. These predictors were evaluated against two main criteria: (1) the predictor had to be significantly associated with thyroid malignancy risk in univariate analyses (Student's t test and Mann-Whitney U test for continuous variables, χ² test for categorical variables; P < 0.10); and (2) the remaining candidate predictors were evaluated in multivariable logistic regression models with the OR and its 95% CI (P < 0.05) [24]. The continuous variables in the initial model were then converted into categorical variables and the analysis was repeated.
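The univariate screening of a binary predictor can be illustrated with a Pearson χ² test on a 2×2 table. The Python sketch below uses hypothetical counts and applies no continuity correction; whether the original analysis used a correction is not stated, and the function name `chi2_2x2` is ours. For one degree of freedom the P value equals erfc(√(χ²/2)).

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square statistic and P value for a 2x2 table (df = 1).

    Table layout:          malignant  benign
      predictor present        a         b
      predictor absent         c         d
    No continuity correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain observations")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, df = 1
    return chi2, p_value


# Hypothetical counts for one candidate predictor, not study data.
chi2, p_value = chi2_2x2(30, 70, 20, 180)
keep_for_step_two = p_value < 0.10  # screening threshold stated in the text
print(f"chi2 = {chi2:.1f}, keep = {keep_for_step_two}")  # chi2 = 19.2, keep = True
```

The lenient 0.10 threshold at this stage avoids discarding predictors that are weak alone but informative in the multivariable model, where the stricter P < 0.05 criterion applies.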
Discrimination and calibration were used to assess the predictive accuracy of the models. Discrimination refers to the model's ability to distinguish between