HEAD AND NECK
Accuracy of thyroid imaging reporting and data system category
4 or 5 for diagnosing malignancy: a systematic review
and meta-analysis
Dong Hwan Kim 1 & Sae Rom Chung 2 & Sang Hyun Choi 2 & Kyung Won Kim 2
Received: 25 February 2020 / Revised: 1 April 2020 / Accepted: 8 April 2020
© European Society of Radiology 2020
Abstract
Objectives To determine the accuracies of the American College of Radiology (ACR) thyroid imaging reporting and data systems (TIRADS), Korean (K)-TIRADS, and European (EU)-TIRADS for diagnosing malignancy in thyroid nodules.
Methods Original studies reporting the diagnostic accuracy of TIRADS for determining malignancy on ultrasound were identified in MEDLINE and EMBASE up to June 23, 2019. The meta-analytic summary sensitivity and specificity were obtained for TIRADS category 5 (TR-5) and category 4 or 5 (TR-4/5), using a bivariate random effects model. To explore study heterogeneity, meta-regression analyses were performed.
Results Of the 34 eligible articles (37,585 nodules), 25 used ACR-TIRADS, 12 used K-TIRADS, and seven used EU-TIRADS. For TR-5, the meta-analytic sensitivity was highest for EU-TIRADS (78% [95% confidence interval, 64–88%]), followed by ACR-TIRADS (70% [61–79%]) and K-TIRADS (64% [58–70%]), although the differences were not significant. K-TIRADS showed the highest meta-analytic specificity (93% [91–95%]), which was similar to ACR-TIRADS (89% [85–92%]) and EU-TIRADS (89% [77–95%]). For TR-4/5, all three TIRADS systems had sensitivities higher than 90%. K-TIRADS had the highest specificity (61% [50–72%]), followed by ACR-TIRADS (49% [43–56%]) and EU-TIRADS (48% [35–62%]), although the differences were not significant. Considerable threshold effects were noted with ACR- and K-TIRADS (p ≤ 0.01), with subject enrollment, country of origin, experience level of reviewer, number of patients, and clarity of blinding in review being the main causes of heterogeneity (p ≤ 0.05).
Conclusions There was no significant difference among these three international TIRADS, but there was a trend toward higher sensitivity with EU-TIRADS and higher specificity with K-TIRADS.
Key Points
• For TIRADS category 5, the meta-analytic sensitivity was highest for the EU-TIRADS, followed by the ACR-TIRADS and the K-TIRADS, although the differences were not significant.
• For TIRADS category 5, K-TIRADS showed the highest meta-analytic specificity, which was similar to ACR-TIRADS and EU-TIRADS.
• Considerable threshold effects were noted with ACR- and K-TIRADS, with subject enrollment, country of origin, experience level of reviewer, number of patients, and clarity of blinding in review being the main causes of heterogeneity.
Keywords Thyroid neoplasms · Diagnostic imaging · Ultrasonography · Systematic review · Meta-analysis
Dong Hwan Kim and Sae Rom Chung contributed equally to this study as
co-first authors.
Electronic supplementary material The online version of this article
(https://doi.org/10.1007/s00330-020-06875-w) contains supplementary
material, which is available to authorized users.
* Sang Hyun Choi
edwardchoi83@gmail.com
1 Department of Radiology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, 222 Banpo-daero, Seocho-gu, Seoul 06591, Republic of Korea
2 Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-Ro 43-Gil, Songpa-Gu, Seoul 05505, Republic of Korea
European Radiology
https://doi.org/10.1007/s00330-020-06875-w
Abbreviations
ACR American College of Radiology
CI Confidence interval
ETA European Thyroid Association
EU European
FNA Fine-needle aspiration
HSROC Hierarchical summary receiver operating characteristic
K Korean
PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses
QUADAS Quality Assessment of Diagnostic Accuracy Studies
TIRADS Thyroid imaging reporting and data system
US Ultrasound
Introduction
Ultrasound (US) is the primary diagnostic tool for evaluating
thyroid nodules, which may be found in up to 68% of the
healthy population [1, 2]. Thyroid nodules can show specific
features on US that are consistently predictive of malignancy,
and these features are used as criteria to determine the need to
perform fine-needle aspiration (FNA) [2, 3]. However, no US
features can be used alone for the reliable discrimination of
malignancy from benign nodules [4, 5]. The main disadvan-
tages of US examinations are the relatively low specificity and
considerable inter-observer variability [6, 7]. To overcome
these limitations, much work is underway to develop a US-based standardized risk stratification system.
A thyroid imaging reporting and data system (TIRADS) was first introduced by Horvath et al [8] in 2009, to provide a quantitative scoring system for the risk stratification of thyroid nodules. In 2016, the Korean (K)-TIRADS was proposed by the Korean Society of Thyroid Radiology for the US diagnosis and management of thyroid nodules on the basis of echogenicity and solidity [9]. In 2017, the American College of Radiology (ACR) developed the ACR-TIRADS, which sums the points of every US feature to produce a total score that determines a nodule's TIRADS level, i.e., TR-1 (benign) to TR-5 (high suspicion of malignancy) [10]. Also in 2017, the European Thyroid Association (ETA) developed the European (EU)-TIRADS, which consists of five categories based on different patterns and US features [11].
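The additive logic of ACR-TIRADS can be illustrated with a short Python sketch. The point cutoffs used here (0 points for TR-1, 2 for TR-2, 3 for TR-3, 4 to 6 for TR-4, and 7 or more for TR-5) are those commonly cited from the ACR white paper [10]; they are not restated in this article and should be checked against that source. The function name, the example feature points, and the grouping of a 1-point total with TR-1 are assumptions made for illustration only.

```python
def acr_tirads_level(total_points: int) -> str:
    """Map a summed ACR-TIRADS point total to a TR level (illustrative cutoffs)."""
    if total_points < 0:
        raise ValueError("total_points must be non-negative")
    if total_points <= 1:  # assumption: a 1-point total is grouped with TR-1
        return "TR-1"
    if total_points == 2:
        return "TR-2"
    if total_points == 3:
        return "TR-3"
    if total_points <= 6:
        return "TR-4"
    return "TR-5"


# Hypothetical nodule: points per US feature category (values are illustrative).
feature_points = {
    "composition": 2,
    "echogenicity": 2,
    "shape": 0,
    "margin": 0,
    "echogenic_foci": 3,
}
total = sum(feature_points.values())
print(total, acr_tirads_level(total))
```

A pattern-based system such as K-TIRADS or EU-TIRADS would instead be expressed as a set of rules over feature combinations rather than a sum.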
A previous meta-analysis reported the diagnostic performance of TIRADS, showing pooled sensitivity and specificity of 79% and 71%, respectively [12]. However, since that first meta-analysis in 2013, many further studies of TIRADS have been published, and slightly different approaches to TIRADS, represented by ACR-, K-, and EU-TIRADS, have been developed [13–16]. In addition, the reported results are variable (i.e., sensitivity of 16–94% and specificity of 66–100% for the TR-5 category) [17–20]. Therefore, a comprehensive comparative review of all three systems is needed to determine whether there are differences between the systems and to obtain updated estimates of sensitivity and specificity.
In this study, we aimed to perform a meta-analysis to determine and compare the diagnostic performance of these three international TIRADS for diagnosing thyroid malignancy on US.
Materials and methods
This study followed the Preferred Reporting Items for
Systematic Reviews and Meta-Analyses (PRISMA) guide-
lines for conduct and reporting [21].
Literature search strategy
A search of PubMed MEDLINE and EMBASE databases was
conducted to find original research articles investigating the
diagnostic accuracy of TIRADS for the dichotomous diagnosis of thyroid malignancy on US. The search query was designed to be sensitive so as not to miss relevant articles. The search terms used were: (Thyroid Imag*[TW] AND Report*[TW] AND Data*[TW] AND System*[TW]) OR (TI-RADS[TW] OR TIRADS[TW] OR K-TIRADS[TW] OR KTIRADS[TW] OR ACR TI-RADS[TW] OR EU TI-RADS[TW]). Among the various TIRADS systems available, ACR-TIRADS, K-TIRADS,
and EU-TIRADS were chosen for this study, as they are wide-
ly used classification schemes and represent the USA, Asia,
and Europe, respectively. The literature search was updated
until June 23, 2019. The search was restricted to human sub-
jects and English-language studies. To expand the search, the
bibliographies of articles surviving the selection process were
screened for other potentially suitable articles.
Eligibility criteria
After removing duplicate articles, articles were reviewed with
respect to eligibility: (i) population, patients with thyroid nod-
ules; (ii) index test, gray-scale US; (iii) reference standard,
cytopathological diagnosis or US follow-up; (iv) outcomes,
sensitivity and specificity of TIRADS in the dichotomous di-
agnosis of malignancy; (v) study design, including observa-
tional studies (prospective or retrospective) and clinical trials.
The exclusion criteria were (i) animal studies, case reports,
review articles, scientific abstracts, and meta-analyses; (ii) ar-
ticles that were not within the field of interest of this study; (iii)
articles without sufficient details to construct a diagnostic two-
by-two table of the imaging results and reference standard
findings, or articles with inappropriate statistical methods for
diagnostic test accuracy; (iv) articles using TIRADS other
than ACR-, K-, or EU-TIRADS; (v) articles with overlapping
patient cohorts and data; and (vi) articles that analyzed fewer than 100 thyroid nodules. Articles were first screened by their titles and abstracts, and full-text reviews were then performed after selecting those abstracts that were potentially eligible. Both steps were performed by two independent reviewers (D.H.K. and S.H.C., with 6 and 7 years of experience in thyroid imaging, respectively), who eliminated only those articles that were clearly ineligible. Articles with any degree of ambiguity or that generated differences in opinion between the two independent reviewers were re-evaluated at a consensus meeting, to which a third reviewer (S.R.C., with 9 years of experience in thyroid imaging) was invited.
Data extraction
The data were extracted onto a predefined data form, includ-
ing study characteristics, study population characteristics, le-
sion characteristics, TIRADS system, image review method,
method for obtaining the reference standards, and study outcomes. Details of the data form are given in the Supplementary Method. The reviewers evaluated the use of
statistical methods for diagnostic test accuracy, and the exact
numbers for true-positive, true-negative, false-positive, and
false-negative results were extracted. When not reported ex-
plicitly, data were extracted manually from the text, tables, and
figures. Two reviewers independently performed data extrac-
tion, and all discrepancies were resolved at a consensus meet-
ing in the presence of a third reviewer.
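As a minimal sketch of how the extracted two-by-two counts translate into the accuracy measures pooled later, the following Python example computes sensitivity and specificity for a single study. The class name and the counts are hypothetical and are not taken from any included article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Diagnostic two-by-two table for one study (index test vs. reference standard)."""
    tp: int  # malignant nodules classified as test-positive
    fp: int  # benign nodules classified as test-positive
    fn: int  # malignant nodules classified as test-negative
    tn: int  # benign nodules classified as test-negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only.
table = TwoByTwo(tp=80, fp=30, fn=20, tn=170)
print(f"sensitivity = {table.sensitivity():.2f}")
print(f"specificity = {table.specificity():.2f}")
```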
Assessment of study quality
The Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) criteria [22] were used to assess the quality of the selected articles. The QUADAS-2 tool assesses study quality in four domains: patient selection, index test, reference standard, and flow and timing (the flow of patients through the study and the timing of the index test and reference standard). The assessments were performed independently by two reviewers, and all discrepancies were resolved by consensus between the two reviewers and a third reviewer.
Data synthesis and statistical analysis
Meta-analytic summary estimates of the diagnostic accuracy
of TIRADS Regarding the diagnostic accuracy of each
TIRADS, i.e., ACR-, K-, or EU-TIRADS, the sensitivity
and specificity of the two criteria, TR-5 and TR-4/5, were
calculated. When the diagnostic outcomes of more than one
TIRADS system were reported within a single study, the
meta-analytic summary sensitivity and specificity of each
TIRADS system were calculated, separating the data accord-
ing to the different TIRADS systems. Hierarchical modeling
methods were used to calculate the meta-analytic summary
values. The summary sensitivity and specificity and their 95% confidence intervals (CIs) were obtained using a bivariate random effects model. Summary receiver operating characteristic curves were obtained using a hierarchical summary receiver operating characteristic (HSROC) model. In addition, subgroup analyses were performed on prospective studies and on studies that used only cytopathology results as the reference standard.
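To convey the idea of random-effects pooling, the sketch below pools study proportions on the logit scale using the univariate DerSimonian-Laird estimator. This is a deliberately simpler stand-in and is not the bivariate model used in this study, which pools sensitivity and specificity jointly and accounts for their correlation. The function name and the example counts are hypothetical.

```python
import math


def pool_logit_dl(events, totals):
    """Pool proportions events/totals with a DerSimonian-Laird random-effects
    model on the logit scale. Returns (pooled proportion, tau squared)."""
    ys, vs = [], []
    for e, n in zip(events, totals):
        a, b = float(e), float(n - e)
        if a == 0 or b == 0:  # continuity correction only for zero cells
            a, b = a + 0.5, b + 0.5
        ys.append(math.log(a / b))   # logit of the study proportion
        vs.append(1 / a + 1 / b)     # approximate within-study variance

    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0

    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return 1 / (1 + math.exp(-y_re)), tau2


# Hypothetical per-study true positives and numbers of malignant nodules.
pooled_sens, tau2 = pool_logit_dl(events=[45, 120, 30], totals=[60, 150, 50])
print(f"pooled sensitivity = {pooled_sens:.3f}, tau^2 = {tau2:.3f}")
```

Applying the same function to true negatives and numbers of benign nodules would give a pooled specificity, but treating the two measures separately ignores the trade-off between them, which is why hierarchical models are preferred for diagnostic accuracy data.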
The presence of heterogeneity among studies with respect to sensitivity and specificity was assessed using Higgins I² statistics, with I² > 50% being considered to indicate substantial heterogeneity. When heterogeneity was noted, the presence of a threshold effect was analyzed by visual assessment of the coupled forest plots of sensitivity and specificity, as well as by calculating the Spearman correlation coefficient between sensitivity and false-positive rate (1 − specificity). A correlation coefficient > 0.6 was considered to indicate a considerable threshold effect.
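The two heterogeneity checks can be sketched in a few lines of Python. The example below computes Higgins I² from a Cochran Q statistic and the Spearman correlation between sensitivity and false-positive rate. The K-TIRADS category 5 values are transcribed from the forest plot (Fig. 2) and are rounded to two decimals, so the resulting coefficient only approximates the value reported in the Results; the function names and the Q value passed to `higgins_i2` are illustrative assumptions.

```python
import math


def higgins_i2(q: float, df: int) -> float:
    """Higgins I^2 (%) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# K-TIRADS, category 5: per-study values read from Fig. 2 (rounded).
sens = [0.36, 0.51, 0.56, 0.57, 0.59, 0.60, 0.66, 0.69, 0.71, 0.79, 0.80]
spec = [0.95, 0.96, 0.96, 0.94, 1.00, 0.89, 0.94, 0.91, 0.87, 0.88, 0.91]
fpr = [round(1 - s, 2) for s in spec]

rho = spearman(sens, fpr)
print(f"Spearman rho = {rho:.3f}; threshold effect: {rho > 0.6}")
print(f"I^2 for a hypothetical Q of 100 on 10 df = {higgins_i2(100.0, 10):.0f}%")
```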
Meta-regression analysis Meta-regression analysis was performed to further explore the causes of study heterogeneity by including covariates in a bivariate model. The following covariates were considered: (i) subject enrollment (consecutive versus selective); (ii) region (Asian versus Western countries); (iii) image reviewer (single reviewer or not available versus multiple reviewers); (iv) clarity of blinding review (yes versus unclear); (v) experience level of the reviewers (senior versus junior or others); (vi) number of patients (< 200 versus ≥ 200); and (vii) number of thyroid nodules (< 1000 versus ≥ 1000).
Analysis of publication bias Publication bias was assessed visually using Deeks' funnel plot, and its statistical significance was tested using Deeks' asymmetry test.
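The regression underlying Deeks' funnel plot can be sketched as follows: the log diagnostic odds ratio of each study is regressed on the inverse square root of its effective sample size, with weights equal to the effective sample size, and a non-zero slope suggests small-study effects. The sketch below returns only the fitted intercept and slope; the significance test of the slope is omitted. The function names, the 0.5 continuity correction applied to every cell, and the study counts are assumptions for illustration.

```python
import math


def weighted_linreg(x, y, w):
    """Weighted least-squares fit of y = a + b * x. Returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def deeks_inputs(tp, fp, fn, tn):
    """Return (ln DOR, effective sample size) for one two-by-two table."""
    n_mal, n_ben = tp + fn, fp + tn
    ess = 4 * n_mal * n_ben / (n_mal + n_ben)
    # Assumed convention: add 0.5 to every cell so the odds ratio is defined.
    a, b, c, d = (v + 0.5 for v in (tp, fp, fn, tn))
    return math.log((a * d) / (b * c)), ess


def deeks_regression(tables):
    """Regress ln DOR on 1/sqrt(ESS), weighted by ESS."""
    pairs = [deeks_inputs(*t) for t in tables]
    x = [1 / math.sqrt(ess) for _, ess in pairs]
    y = [ln_dor for ln_dor, _ in pairs]
    w = [ess for _, ess in pairs]
    return weighted_linreg(x, y, w)


# Hypothetical studies as (tp, fp, fn, tn).
studies = [(80, 30, 20, 170), (40, 10, 25, 90), (150, 60, 30, 400), (20, 5, 10, 45)]
intercept, slope = deeks_regression(studies)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```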
Stata version 15.0 (StataCorp LLC) was used for statistical
analysis, with p < 0.05 being considered statistically
significant.
Results
Literature search
A total of 225 articles were screened after removal of duplicates (Fig. 1). Of these, 112 articles were excluded on the basis of their titles and abstracts, and 79 articles were further excluded after a full-text review. Of the remaining 34 eligible articles, which included a total of 37,585 thyroid nodules (Table 1), 33 [14, 16–20, 23–49] and 32 [14–20, 24–26, 28–49] reported the accuracy of TR-5 and TR-4/5, respectively (31 articles reported both) [14, 16–20, 24–26, 28–49].
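The selection counts are internally consistent, as the following short arithmetic check illustrates (variable names are chosen here for illustration):

```python
screened = 225
excluded_on_title_abstract = 112
excluded_on_full_text = 79
eligible = screened - excluded_on_title_abstract - excluded_on_full_text

reported_tr5 = 33
reported_tr45 = 32
reported_both = 31
# Inclusion-exclusion: articles reporting at least one of the two criteria.
reported_any = reported_tr5 + reported_tr45 - reported_both

print(eligible, reported_any)
```

Both quantities equal the 34 eligible articles, so every included article contributed to at least one of the two analyses.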
Characteristics of the included articles
The characteristics of the included articles are summarized in Table 1. Of the 34 included articles, four were prospective studies [15–17, 34]. Five articles selectively enrolled study subjects with specific diagnoses or US features detected during clinical practice, i.e., they only included solid nodules on US, or Bethesda category III or IV nodules on FNA [16, 29, 33, 40, 44]. Eleven articles originated from Western countries [14, 15, 18, 30, 32–34, 36, 44, 46, 47]. Multiple reviewers performed the image review in 17 articles [14, 15, 19, 20, 24, 26, 29, 30, 33–36, 38, 39, 42, 43, 45], and the reviewers were blinded to the final diagnosis in 25 articles [14, 17, 19, 25–28, 30–32, 35–49]. Nine articles used only pathological diagnosis through surgery as the reference standard [20, 27, 29, 34, 36, 39, 44–46]. Details of the reference standard of the included articles are summarized in Supplementary Table 1.
Of the studies reporting the accuracy of TR-5, 23 used ACR-TIRADS [14, 16, 18–20, 24, 27–30, 32, 33, 35, 36, 38, 39, 41, 43–45, 47–49], 11 used K-TIRADS [17, 23–26, 28, 31, 37, 40, 48, 49], and six used EU-TIRADS [34, 42, 45, 46, 48, 49]. For TR-4/5, 24 studies used ACR-TIRADS [14–16, 18–20, 24, 28–30, 32, 33, 35, 36, 38, 39, 41–45, 47–49], 10 used K-TIRADS [15, 17, 25, 26, 28, 31, 37, 40, 48, 49], and six used EU-TIRADS [15, 34, 45, 46, 48, 49].
Study quality according to QUADAS-2
The qualities of the included articles are summarized in
Supplementary Fig. 1. All the studies reported results on a
per-lesion basis. Of the four domains, the patient selection
domain had notable quality concerns, with 15% (5/34) of
studies having a high risk of bias, and 9% (3/34) having high
concerns regarding applicability because they did not enroll
patients consecutively, or the mean size of the thyroid nodules
included in the analysis was relatively large. The interval be-
tween the index test and reference standard was unclear or not
noted in 88% (30/34) of studies, resulting in a risk of bias in
the flow and timing domain. In addition, the presence of
blinding during the image review was unclear or not noted
in 26% (9/34) of studies, resulting in a risk of bias in the index
test domain.
Accuracy of TIRADS category 5 for diagnosing
malignancy
The results of the meta-analysis on the studies evaluating TR-5 for the diagnosis of malignancy are summarized in Table 2 and Fig. 2. For TR-5, 37,083 nodules in 33 studies were analyzed. Among the three TIRADS systems, EU-TIRADS showed the highest sensitivity (78% [95% CI, 64–88%]), and K-TIRADS had the highest specificity (93% [95% CI, 91–95%]). However, there was no significant difference among the three systems.
Fig. 1 PRISMA flow diagram of the article selection process. TIRADS thyroid imaging reporting and data system, ACR American College of Radiology, K Korean, EU European. *31 articles were included in both analyses
Table 1 Characteristics of the included studies
| Author (year of publication) | Study design | Subject enrollment* | No. of patients | No. of nodules | Age, years | Country | Tumor size, mm | TIRADS | Image reviewers | No. of readers (years of experience) | Blinding | Reference standard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ha SM (1) (2017) [23] | Retrospective | Consecutive | 711 | 829 | 48.7 (6–98) | Korea | 23.0 (3–41.6) | K | NA | NA | Unclear | Surgery, FNAB, CNB, or US follow-up |
| Ha SM (2) (2017) [24] | Retrospective | Consecutive | 954 | 1112 | 50.8 (13–86) | Korea | 14.1 (5–70) | ACR, K | Multiple reviewers with consensus | 2 (8, 10) | Unclear | Surgery, FNAB, CNB, or US follow-up |
| Hong MJ (2017) [25] | Retrospective | Consecutive | 1457 | 1651 | 51.0, 12.1 (mean, SD) | Korea | 19.1, 10.7 (mean, SD) | K | Single reviewer | 3 (12, 16, 19) | Yes | Surgery, FNAB, CNB, or US follow-up |
| Middleton WD (2017) [14] | Retrospective | Consecutive | 3315 | 3422 | 54.4 (18–97) | USA | NA | ACR | Multiple reviewers with consensus | 2 (NA) | Yes | Surgery or FNAB |
| Bae JM (2018) [17] | Prospective | Consecutive | 190 | 201 | 49.7, 11.3 (mean, SD) | Korea | 22.0 (10–60) | K | Single reviewer | 1 (> 8) | Yes | Surgery, FNAB, or CNB |
| Chung SR (2018) [26] | Retrospective | Consecutive | 877 | 907 | NA | Korea | NA | K | Multiple reviewers with consensus | 2 (NA) | Yes | Surgery, FNAB, or CNB |
| Gao L (2018) [27] | Retrospective | Consecutive | 262 | 342 | NA | China | 12.1 | ACR | Single reviewer | 1 (20) | Yes | Surgery |
| Ha EJ (2018) [28] | Retrospective | Consecutive | 750 | 902 | 49.2 (9–81) | Korea | 15.0 (5–100) | ACR, K | Single reviewer | 1 (NA) | Yes | Surgery, FNAB, or CNB |
| Hang J (2018) [29] | Retrospective | Selective | 262 | 298 | 45.6 (21–76) | China | 12.8 (5.0–27.8) | ACR | Multiple reviewers with consensus | 2 (> 3) | Unclear | Surgery |
| Hoang JK (2018) [30] | Retrospective | Consecutive | 92 | 100 | 52.0 (19–92) | USA | 27.0 (7.0–59.0) | ACR | Multiple reviewers with consensus | 3 (26–34) | Yes | Surgery or FNAB |
| Hong MJ (2018) [31] | Retrospective | Consecutive | 1802 | 2000 | 51.2, 12.2 (mean, SD) | Korea | 20.0, 11.4 (mean, SD) | K | Single reviewer | 3 (12, 16, 19) | Yes | Surgery, FNAB, CNB, or US follow-up |
| Koseoglu Atilla FD (2018) [18] | Retrospective | Consecutive | 2614 | 2614 | 51.0 (NA) | Turkey | 20.3 (NA) | ACR | NA | NA | Unclear | Surgery or FNAB |
| Lauria Pantano A (2018) [32] | Retrospective | Consecutive | 946 | 1077 | 56.0, 13.3 (mean, SD) | Italy | 14 (4–56) | ACR | NA | NA | Yes | FNAB |
| Rosario PW (2018) [33] | Retrospective | Selective | 1106 | 1490 | 48 (9–83) | Brazil | NA | ACR | Multiple reviewers with consensus | 2 (NA) | Unclear | Surgery or FNAB |
| Skowronska A (2018) [34] | Prospective | Consecutive | 52 | 140 | 55.0, 14.0 (mean, SD) | Poland | 16.1 (NA) | EU | Multiple reviewers with consensus | 2 (2, 15) | Unclear | Surgery |
| Zheng Y (2018) [35] | Retrospective | Consecutive | 1013 | 1033 | 45.3 (15–81) | China | 13.1 (5–75) | ACR | Multiple reviewers with consensus | 2 (> 5) | Yes | Surgery or FNAB |
| Ahmadi S (2019) [36] | Retrospective | Consecutive | 213 | 323 | 55 (42–65), median (IQR) | USA | 19 (12–34), median (IQR) | ACR | Multiple reviewers with consensus | 3 (NA) | Yes | Surgery |
| Ahn HS (2019) [37] | Retrospective | Consecutive | 384 | 432 | 50.6, 12.5 (mean, SD) | Korea | 17.9 (10–87) | K | Single reviewer | 3 (12, 16, 19) | Yes | Surgery, FNAB, or CNB |
| Chen L (2019) [38] | Retrospective | Consecutive | 195 | 203 | NA | China | 16.9 (10–45) | ACR | Multiple reviewers with consensus | 3 (15) | Yes | Surgery, FNAB, or US follow-up |
| Gao L (2019) [39] | Retrospective | Consecutive | 1758 | 2544 | 44.8 (NA) | China | 15.1 (NA) | ACR | Multiple reviewers with consensus | 2 (8, 9) | Yes | Surgery |
| Grani G (2019) [15] | Prospective | Consecutive | 477 | 502 | 55.9, 13.9 (mean, SD) | Italy | NA | ACR, K, EU | Multiple reviewers with consensus | 2 (NA) | Unclear | Surgery, FNAB, or US follow-up |
| Hong HS (2019) [40] | Retrospective | Selective | 683 | 683 | 49.7 (20–77) | Korea | 13.2, 10.2 (mean, SD) | K | Single reviewer | 1 (4 residents, 5, 8, 25) | Yes | Surgery, FNAB, CNB, or US follow-up |
| Jin ZQ (2019) [16] | Prospective | Selective | 316 | 332 | 42.2 (16–72) | China | Benign: 20.4 (NA), Malignant: 19.6 (NA) | ACR | NA | NA | Unclear | Surgery or FNAB |
| Li X (2019) [41] | Retrospective | Consecutive | 128 | 130 | 47.8 (17–68) | China | 14.6 (5–49) | ACR | Single reviewer | 1 (> 5) | Yes | Surgery or FNAB |
| Phuttharak W (2019) [42] | Retrospective | Consecutive | 94 | 108 | 51.6 (10–75) | Thailand | 21.2 (4.6–80) | ACR, EU | Multiple independent reviewers | 2 (2, > 10) | Yes | FNAB or US follow-up |
| Ruan JL (2019) [43] | Retrospective | Consecutive | 918 | 1001 | 45.7 (14–78) | China | 18.1 (5–79) | ACR | Multiple reviewers with consensus | 4 (NA) | Yes | Surgery or FNAB |
| Sahli ZT (2019) [44] | Retrospective | Selective | 131 | 133 | 52.2 (17–80) | USA | 24 (5–80) | ACR | Single reviewer | 1 (NA) | Yes | Surgery |
| Shen Y (2019) [45] | Retrospective | Consecutive | 1568 | 1612 | 51 (18–80) | China | 16.8 (6–120) | ACR, EU | Multiple reviewers with consensus | 2 (11, 15) | Yes | Surgery |
| Trimboli P (2019) [46] | Retrospective | Consecutive | 495 | 1058 | 53 (NA) | Switzerland, France, UK | 18.0 (NA) | EU | Single reviewer | 1 (> 15) | Yes | Surgery |
| Wildman-Tobriner B (2019) [47] | Retrospective | Consecutive | 1264 | 1425 | 53.0 (18–93) | USA | 26.3 (NA) | ACR | Single reviewer | 1 (20) | Yes | Surgery or FNAB |
| Wu XL (2019) [19] | Retrospective | Consecutive | 894 | 1000 | NA | China | 12.7 (NA) | ACR | Multiple reviewers with consensus | 3 (> 15) | Yes | Surgery, FNAB, CNB, or US follow-up |
| Xu T (2019) [48] | Retrospective | Consecutive | 2031 | 2465 | 47.7, 13.4 (mean, SD) | China | 16.6, 11.8 (mean, SD) | ACR, K, EU | Single reviewer | 1 (NA) | Yes | Surgery, FNAB, or US follow-up |
| Yoon SJ (2019) [49] | Retrospective | Consecutive | 1836 | 2274 | 55.1 (9–92) | Korea | 19.4, 9.9 (mean, SD) | ACR, K, EU | Single reviewer | 1 (20) | Yes | Surgery or FNAB |
| Zhu J (2019) [20] | Retrospective | Consecutive | 3242 | 3242 | 45.6 (NA) | China | NA | ACR | Multiple reviewers with consensus | 3 (NA) | Unclear | Surgery |
Articles are listed alphabetically according to the order of the names of the first authors
TIRADS thyroid imaging reporting and data system, K Korean, ACR American College of Radiology, EU European, NA not available, FNAB fine-needle aspiration biopsy, CNB core needle biopsy, US ultrasonography, SD standard deviation, IQR interquartile range
* The two methods were consecutive enrollment of eligible patients with thyroid nodules on ultrasonography, and selective enrollment according to specific diagnoses or imaging features
Age and tumor size: data are mean or median values, and data in parentheses are ranges
Blinding: determined according to whether the reviewers were blinded to the final diagnosis of the analyzed lesions or not
All three TIRADS demonstrated substantial study heterogeneity in both sensitivity (I² = 93–98%) and specificity (I² = 95–98%), with the HSROC curves showing a large difference between the 95% confidence and prediction regions (Supplementary Fig. 2). For ACR- and K-TIRADS, a threshold effect was noted, with correlation coefficients of 0.736 and 0.755 between sensitivity and false-positive rate (p ≤ 0.01; Fig. 2).
In the subgroup analysis of the prospective studies, EU-TIRADS had the highest sensitivity (75% [95% CI, 36–96%]) and K-TIRADS had the highest specificity (100% [95% CI, 96–100%]). In the subgroup analysis of studies that used only cytopathologic results as the reference standard, EU-TIRADS had the highest sensitivity (81% [95% CI, 66–90%]) and K-TIRADS had the highest specificity (94% [95% CI, 91–96%]) (Table 2).
Accuracy of TIRADS category 4 or 5 for diagnosing
malignancy
The results of the meta-analysis on the studies evaluating TR-4/5 for the diagnosis of malignancy are summarized in Table 2 and Fig. 3. For TR-4/5, 36,414 nodules in 32 studies were analyzed. All three of the TIRADS systems had sensitivities higher than 90% for diagnosing thyroid malignancy. Regarding specificity, K-TIRADS had the highest value (61% [95% CI, 50–72%]). There was no significant difference in either sensitivity or specificity among the three systems. All three TIRADS demonstrated substantial study heterogeneity in both sensitivity (I² = 93–97%) and specificity (I² = 98–99%), with the HSROC curves showing a large difference between the 95% confidence and prediction regions (Supplementary Fig. 3). A considerable threshold effect was noted in K-TIRADS, with a correlation coefficient of 0.754 between sensitivity and false-positive rate (p = 0.01; Fig. 3).
In the subgroup analysis of the prospective studies, EU-TIRADS had the highest sensitivity (90% [95% CI, 76–100%]) and ACR-TIRADS had the highest specificity (57% [95% CI, 34–80%]). In the subgroup analysis of studies that used only cytopathologic results as the reference standard, both ACR- and EU-TIRADS had a sensitivity of 95% and K-TIRADS had the highest specificity (66% [95% CI, 60–72%]) (Table 2).
Table 2 Accuracy of TIRADS classifications for the diagnosis of malignancy (meta-analytic summary estimates)
| Criterion and system | No. of studies | Sensitivity (95% CI) | Higgins I² (%) | Specificity (95% CI) | Higgins I² (%) |
|---|---|---|---|---|---|
| TIRADS category 5 | | | | | |
| ACR-TIRADS | 23 | 70% (61–79) | 98 | 89% (85–92) | 98 |
| Prospective design* | 1 | 52% (42–62) | | 93% (88–96) | |
| Cytopathology only | 19 | 70% (58–79) | | 89% (86–93) | |
| K-TIRADS | 11 | 64% (58–70) | 93 | 93% (91–95) | 95 |
| Prospective design* | 1 | 59% (48–69) | | 100% (96–100) | |
| Cytopathology only | 5 | 63% (53–72) | | 94% (91–96) | |
| EU-TIRADS | 6 | 78% (64–88) | 97 | 89% (77–95) | 98 |
| Prospective design* | 1 | 75% (36–96) | | 99% (94–100) | |
| Cytopathology only | 4 | 81% (66–90) | | 92% (78–98) | |
| TIRADS category 4 or 5 | | | | | |
| ACR-TIRADS | 24 | 95% (92–97) | 97 | 49% (43–56) | 98 |
| Prospective design* | 2 | 86% (66–100) | | 57% (34–80) | |
| Cytopathology only | 18 | 95% (91–97) | | 51% (44–58) | |
| K-TIRADS | 10 | 92% (87–96) | 95 | 61% (50–72) | 98 |
| Prospective design* | 2 | 89% (77–100) | | 47% (23–70) | |
| Cytopathology only | 5 | 91% (85–94) | | 66% (60–72) | |
| EU-TIRADS | 6 | 96% (92–98) | 93 | 48% (35–62) | 99 |
| Prospective design* | 2 | 90% (76–100) | | 52% (29–75) | |
| Cytopathology only | 4 | 95% (90–98) | | 53% (36–70) | |
TIRADS thyroid imaging reporting and data system, CI confidence interval, ACR American College of Radiology, K Korean, EU European, NA not applicable
* Subgroup analysis for prospective studies
Cytopathology only: subgroup analysis for studies that used only cytopathologic results as a reference standard
No significant publication bias was noted across the studies for either TR-5 or TR-4/5 (p ≥ 0.06; Supplementary Fig. 4 and Supplementary Fig. 5).
Meta-regression analysis
The results of the meta-regression analysis for TR-5 and TR-4/5 are summarized in Tables 3 and 4, respectively. For both the TR-5 and TR-4/5 criteria, study heterogeneity was significantly associated with the country of origin (ACR- and EU-TIRADS in TR-5 and ACR- and K-TIRADS in TR-4/5) and the experience levels of the reviewers (ACR- and K-TIRADS; p ≤ 0.04). In K-TIRADS only, the number of patients (TR-5), the method of subject enrollment (both TR-5 and TR-4/5), and the clarity of blinding review (TR-4/5) were significantly associated with study heterogeneity (p ≤ 0.05).
Discussion
In this study, we evaluated the diagnostic performance of three international TIRADS for diagnosing thyroid cancer on US. The meta-analytic summary sensitivity and specificity for TR-5 were 64–78% and 89–93%, respectively, while the corresponding values for TR-4/5 were 92–96% and 48–61%. There was no significant difference among the three international TIRADS, but there was a trend toward higher sensitivity with EU-TIRADS and higher specificity with K-TIRADS.
The use of high-resolution US for thyroid disease has markedly increased the detection of thyroid nodules [13]. To improve the communication between referring physicians and cytopathologists, as well as to increase the efficacy of FNA and avoid unnecessary procedures, several TIRADS have been applied to the US features of thyroid nodules [2, 3, 9–11, 50, 51]. ACR-TIRADS is a scoring system that integrates all US characteristics, each scored from 0 to 3 on the basis of its malignant potential [10]. By contrast, K-TIRADS and EU-TIRADS are pattern-based systems: K-TIRADS uses solidity, echogenicity, and suspicious features (nonparallel orientation, spiculated/microlobulated margin, and microcalcifications) to stratify nodules [9], while EU-TIRADS uses four US characteristics (non-oval shape, irregular margins, microcalcifications, and marked hypoechogenicity) [11, 49]. Pattern-based systems have the advantage that they are intuitive and feasible for clinical application, whereas a scoring system might enable each nodule to be evaluated objectively. In other words, the scoring system and the pattern-based system have their own advantages and
| Study | Sensitivity (95% CI) | Specificity (95% CI) |
|---|---|---|
| ACR [18] | 0.16 [0.10–0.23] | 0.98 [0.98–0.99] |
| ACR [44] | 0.30 [0.15–0.49] | 0.91 [0.84–0.96] |
| ACR [36] | 0.33 [0.23–0.44] | 0.99 [0.96–1.00] |
| ACR [14] | 0.52 [0.47–0.57] | 0.89 [0.87–0.90] |
| ACR [16] | 0.52 [0.42–0.62] | 0.93 [0.89–0.96] |
| ACR [32] | 0.54 [0.42–0.65] | 0.92 [0.90–0.93] |
| ACR [49] | 0.57 [0.52–0.63] | 0.92 [0.91–0.93] |
| ACR [47] | 0.60 [0.51–0.68] | 0.86 [0.84–0.88] |
| ACR [48] | 0.60 [0.57–0.63] | 0.91 [0.89–0.92] |
| ACR [33] | 0.61 [0.53–0.67] | 0.93 [0.91–0.94] |
| ACR [38] | 0.65 [0.49–0.79] | 0.93 [0.88–0.97] |
| ACR [30] | 0.73 [0.45–0.92] | 0.86 [0.77–0.92] |
| ACR [29] | 0.75 [0.67–0.81] | 0.86 [0.78–0.92] |
| ACR [28] | 0.76 [0.70–0.81] | 0.87 [0.84–0.89] |
| ACR [43] | 0.78 [0.74–0.82] | 0.95 [0.93–0.96] |
| ACR [39] | 0.82 [0.80–0.83] | 0.79 [0.76–0.82] |
| ACR [24] | 0.82 [0.78–0.85] | 0.78 [0.74–0.81] |
| ACR [19] | 0.84 [0.81–0.87] | 0.66 [0.61–0.70] |
| ACR [35] | 0.85 [0.81–0.89] | 0.82 [0.79–0.85] |
| ACR [45] | 0.88 [0.86–0.90] | 0.87 [0.85–0.90] |
| ACR [27] | 0.90 [0.85–0.93] | 0.77 [0.67–0.84] |
| ACR [41] | 0.92 [0.83–0.97] | 0.67 [0.53–0.79] |
| ACR [20] | 0.93 [0.92–0.95] | 0.78 [0.76–0.80] |
| K [37] | 0.36 [0.13–0.65] | 0.95 [0.93–0.97] |
| K [31] | 0.51 [0.47–0.56] | 0.96 [0.95–0.97] |
| K [25] | 0.56 [0.51–0.62] | 0.96 [0.95–0.97] |
| K [49] | 0.57 [0.51–0.63] | 0.94 [0.93–0.95] |
| K [17] | 0.59 [0.48–0.69] | 1.00 [0.97–1.00] |
| K [23] | 0.60 [0.55–0.66] | 0.89 [0.86–0.92] |
| K [26] | 0.66 [0.62–0.70] | 0.94 [0.92–0.96] |
| K [24] | 0.69 [0.64–0.74] | 0.91 [0.88–0.93] |
| K [48] | 0.71 [0.69–0.74] | 0.87 [0.86–0.89] |
| K [28] | 0.79 [0.74–0.84] | 0.88 [0.85–0.90] |
| K [40] | 0.80 [0.75–0.84] | 0.91 [0.88–0.94] |
| EU [42] | 0.46 [0.26–0.67] | 0.80 [0.70–0.88] |
| EU [49] | 0.73 [0.68–0.78] | 0.78 [0.76–0.80] |
| EU [46] | 0.75 [0.69–0.80] | 0.97 [0.95–0.98] |
| EU [34] | 0.75 [0.35–0.97] | 0.98 [0.95–1.00] |
| EU [48] | 0.83 [0.81–0.85] | 0.79 [0.77–0.81] |
| EU [45] | 0.93 [0.91–0.95] | 0.81 [0.78–0.84] |
Fig. 2 Coupled forest plots of the sensitivity and specificity of ACR-TIRADS, K-TIRADS, and EU-TIRADS for category 5. Each study was labeled with the TIRADS used and its reference number. TIRADS thyroid imaging reporting and data system, ACR American College of Radiology, K Korean, EU European
disadvantages. This might be related to our results that there
was no significant difference in the diagnostic performance
among three international TIRADS.
In our study, no significant difference was noted among the
three international TIRADS, but EU-TIRADS showed a tendency
toward higher sensitivity for TR-5 than ACR- and K-TIRADS.
In EU-TIRADS, a nodule with any highly suspicious feature is
classified as TR-5, irrespective of its composition and
echogenicity [11]. In contrast, ACR-TIRADS uses different
weightings for the composition and echogenicity of thyroid
nodules [10], and K-TIRADS categorizes thyroid nodules with
suspicious features and mixed solid and cystic composition, or
solid composition with iso- or hyperechogenicity, as TR-4 [9].
As a result, a substantial proportion of nodules categorized as
TR-4 using K-TIRADS and ACR-TIRADS are categorized as TR-5
using EU-TIRADS. For this reason, EU-TIRADS showed the highest
sensitivity for TR-5, whereas the specificity was the lowest of
the three systems [49]. However, since there were only seven
studies using EU-TIRADS, more prospective data are needed to
validate this finding. In addition, the sonographic features of
macrocalcification, peripheral rim calcification, and
extrathyroidal extension are counted as features that increase
the malignancy risk in ACR-TIRADS, whereas these are not
malignant features in EU-TIRADS and K-TIRADS [9–11, 49]. These
features may lead to ACR-TIRADS having a higher sensitivity for
TR-5 than K-TIRADS.
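The reclassification argument above can be made concrete with a deliberately simplified sketch. It encodes only the rules stated in this paragraph and is not a complete implementation of either guideline. The `Nodule` type and both function names are hypothetical, and the treatment of solid hypoechoic nodules with suspicious features as TR-5 in K-TIRADS is an assumption added to complete the example.

```python
from dataclasses import dataclass


@dataclass
class Nodule:
    composition: str   # "solid" or "mixed" (mixed solid and cystic)
    echogenicity: str  # "hypo", "iso", or "hyper"
    suspicious: bool   # at least one highly suspicious feature is present


def eu_is_tr5(n: Nodule) -> bool:
    # As described in the text: any highly suspicious feature gives TR-5,
    # irrespective of composition and echogenicity.
    return n.suspicious


def k_is_tr5(n: Nodule) -> bool:
    # As described in the text: suspicious nodules that are mixed, or solid
    # with iso- or hyperechogenicity, are TR-4 rather than TR-5.
    if not n.suspicious:
        return False
    is_tr4 = n.composition == "mixed" or (
        n.composition == "solid" and n.echogenicity in ("iso", "hyper")
    )
    # Assumption (not stated in this section): the remaining suspicious
    # nodules, i.e. solid hypoechoic ones, are TR-5.
    return not is_tr4


example = Nodule(composition="solid", echogenicity="iso", suspicious=True)
print(eu_is_tr5(example), k_is_tr5(example))
```

Under these assumptions, a solid isoechoic nodule with a suspicious feature counts as TR-5 in EU-TIRADS but not in K-TIRADS, which is the mechanism the paragraph proposes for the higher TR-5 sensitivity of EU-TIRADS.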
Substantial heterogeneity across the studies was found. For
ACR- and K-TIRADS, a significant positive correlation between
sensitivity and false-positive rate (correlation coefficients =
0.736–0.755) was noted, indicating a threshold effect, which
might have arisen from the use of different thresholds to
determine a positive test result. In addition, meta-regression
analysis revealed that subject enrollment, country of origin,
experience level of reviewer, clarity of blinding in review, and
number of patients were the main causes of heterogeneity.
Selection bias may lead to a higher sensitivity in studies with
a retrospective design or selectively enrolled subjects than in
studies with a prospective design or consecutively enrolled
subjects. To avoid the selection bias that retrospective or
case-control designs can introduce, further prospective cohort
studies are needed to confirm whether the TIRADS systems provide
the kind of sensitivity and specificity that allow clinicians to
balance cancer detection and resource utilization in a setting
where the diagnosis of thyroid nodules is increasing rapidly.
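As a rough illustration of how a threshold effect shows up in study-level data, the sketch below correlates logit sensitivity with logit false-positive rate (1 − specificity). It is not the bivariate model used in this study and is not expected to reproduce the coefficients reported above. The input pairs are the ten K-TIRADS category 4 or 5 point estimates read from Fig. 3, paired by row order, which is an assumption about the figure layout. The function names are hypothetical.

```python
import math

# (sensitivity, specificity) point estimates, K-TIRADS, category 4 or 5.
# Assumption: rows of the two panels in Fig. 3 are in the same study order.
K_TR45 = [
    (0.79, 0.70), (0.81, 0.70), (0.81, 0.81), (0.84, 0.73), (0.90, 0.64),
    (0.92, 0.18), (0.94, 0.64), (0.95, 0.59), (0.96, 0.54), (0.99, 0.59),
]


def logit(p: float) -> float:
    # Log-odds transform; requires 0 < p < 1.
    return math.log(p / (1.0 - p))


def pearson(xs, ys) -> float:
    # Plain Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def threshold_correlation(pairs) -> float:
    # Correlation between logit(sensitivity) and logit(false-positive rate).
    logit_sens = [logit(se) for se, _ in pairs]
    logit_fpr = [logit(1.0 - sp) for _, sp in pairs]
    return pearson(logit_sens, logit_fpr)


r = threshold_correlation(K_TR45)
print(f"correlation between logit sensitivity and logit FPR: {r:.2f}")
```

A positive value means that studies with higher sensitivity also tend to have more false positives, which is the pattern expected when studies apply different positivity thresholds.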
The basic role of sonographic risk stratification is as a
rule-out test to minimize the number of cancers that are
[Figure 3: coupled forest plots. Panels: SENSITIVITY (95% CI), axis 0.5 to 1.0; SPECIFICITY (95% CI), axis 0.1 to 0.9. Rows are 24 ACR, 10 K, and 6 EU studies, each identified by its reference number.]
Fig. 3 Coupled forest plots of the sensitivity and specificity of ACR-TIRADS, K-TIRADS, and EU-TIRADS for category 4 or 5. Each study was labeled with the TIRADS used and its reference number. TIRADS, thyroid imaging reporting and data system; ACR, American College of Radiology; K, Korean; EU, European
Table 3 Results of the meta-regression analysis of the accuracy of TIRADS category 5 for diagnosing malignancy

| Covariate | Subgroup | ACR (n = 23) sensitivity (95% CI) | ACR specificity (95% CI) | ACR p value | K (n = 11) sensitivity (95% CI) | K specificity (95% CI) | K p value | EU (n = 6) sensitivity (95% CI) | EU specificity (95% CI) | EU p value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Subject enrollment | Consecutive | 73% (64, 82) | 88% (84, 92) | 0.41 | 63% (57, 68) | 94% (92, 96) | 0.05 | NA* | NA* | NA* |
| | Selective | 56% (31, 81) | 91% (85, 98) | | 80% (68, 92) | 91% (83, 100) | | | | |
| Country | Asian | 81% (76, 87) | 85% (80, 89) | < .001 | NA* | NA* | NA* | 79% (67, 92) | 79% (77, 81) | < .001 |
| | Western | 47% (35, 59) | 93% (90, 96) | | | | | 76% (53, 99) | 97% (96, 98) | |
| Image reviewer | Single reviewers or NA | 62% (47, 76) | 90% (85, 95) | 0.17 | 63% (57, 70) | 94% (91, 96) | 0.84 | 79% (61, 96) | 90% (79, 100) | 0.95 |
| | Multiple reviewers | 76% (66, 86) | 88% (83, 93) | | 68% (55, 81) | 93% (88, 98) | | 77% (62, 93) | 88% (75, 100) | |
| Clarity of blinding in review | Yes | 72% (61, 82) | 88% (84, 92) | 0.79 | 64% (57, 71) | 94% (92, 96) | 0.13 | 78% (66, 91) | 85% (77, 94) | 0.09 |
| | Unclear | 67% (48, 85) | 90% (85, 96) | | 65% (51, 79) | 90% (84, 96) | | 77% (36, 100) | 99% (96, 100) | |
| Experience level of reviewers | Senior | 80% (71, 89) | 83% (77, 89) | 0.01 | 57% (51, 64) | 95% (94, 97) | 0.01 | 83% (71, 95) | 88% (76, 100) | 0.57 |
| | Junior or others | 59% (47, 72) | 92% (90, 95) | | 72% (66, 77) | 90% (87, 93) | | 70% (50, 91) | 90% (78, 100) | |
| Number of patients | < 200 | 69% (46, 93) | 87% (77, 97) | 0.80 | 59% (48, 69) | 100% (96, 100) | < .001 | 57% (29, 85) | 93% (84, 100) | 0.16 |
| | ≥ 200 | 70% (60, 80) | 89% (85, 93) | | 65% (58, 71) | 93% (91, 95) | | 83% (75, 91) | 86% (75, 98) | |
| Number of nodules | < 1000 | 69% (54, 84) | 89% (84, 95) | 0.97 | 67% (60, 75) | 93% (90, 96) | 0.50 | 57% (29, 85) | 93% (84, 100) | 0.16 |
| | ≥ 1000 | 71% (60, 82) | 88% (84, 93) | | 61% (53, 70) | 94% (91, 97) | | 83% (75, 91) | 86% (75, 98) | |

The results were obtained using meta-regression analyses with the bivariate model
TIRADS, thyroid imaging reporting and data system; ACR, American College of Radiology; K, Korean; EU, European; CI, confidence interval; NA, not available
* If either subgroup was 0
Senior: the level of experience of all included reviewers was more than 5 years of experience in the thyroid imaging field
Junior or others: in the case that at least one reviewer had less than 5 years of experience in the thyroid imaging field