
than ACR-, K-, or EU-TIRADS; (v) articles with overlapping
patient cohorts and data; and (vi) articles that analyzed fewer
than 100 thyroid nodules. Articles were first screened by their
titles and abstracts, and full-text reviews were then performed
after selecting those abstracts that were potentially eligible.
Both steps were performed by two independent reviewers
(D.H.K. and S.H.C., with 6 and 7 years of experience in
thyroid imaging, respectively), who eliminated only those articles
that were clearly ineligible. Articles with any degree of
ambiguity or that generated differences in opinion between the
two independent reviewers were re-evaluated at a consensus
meeting, to which a third reviewer (S.R.C., with 9 years of
experience in thyroid imaging) was invited.
Data extraction
The data were extracted onto a predefined data form, includ-
ing study characteristics, study population characteristics, le-
sion characteristics, TIRADS system, image review method,
method for obtaining the reference standards, and study outcomes.
Details of the data form are provided in the
Supplementary Method. The reviewers evaluated the use of
statistical methods for diagnostic test accuracy, and the exact
numbers for true-positive, true-negative, false-positive, and
false-negative results were extracted. When not reported ex-
plicitly, data were extracted manually from the text, tables, and
figures. Two reviewers independently performed data extrac-
tion, and all discrepancies were resolved at a consensus meet-
ing in the presence of a third reviewer.
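As a minimal sketch of how the extracted 2×2 counts translate into per-study accuracy estimates, the following computes sensitivity and specificity with Wilson score intervals. The class name `TwoByTwo`, the function `wilson_interval`, and the example counts are hypothetical and are not taken from any included study; the choice of the Wilson interval is likewise an assumption made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Extracted counts for one study and one TIRADS criterion (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def sensitivity(self) -> float:
        # Proportion of malignant nodules classified as positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of benign nodules classified as negative.
        return self.tn / (self.tn + self.fp)


def wilson_interval(successes: int, total: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Illustrative counts only, not data from the meta-analysis.
example = TwoByTwo(tp=80, fp=30, fn=20, tn=170)
sens = example.sensitivity()
spec = example.specificity()
sens_ci = wilson_interval(example.tp, example.tp + example.fn)
spec_ci = wilson_interval(example.tn, example.tn + example.fp)
```

Keeping the four counts together in one record mirrors the extraction form, so each study contributes one such record per TIRADS criterion.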
Assessment of study quality
The Quality Assessment of Diagnostic Accuracy Studies
(QUADAS-2) criteria [22] were used to assess the quality of
the selected articles. The QUADAS-2 tool assesses study
quality in four domains: patient selection, index test,
reference standard, and flow and timing (the flow of patients
through the study and the timing of the index test and reference
standard). The assessments were performed independently by two
reviewers, and all discrepancies were resolved by arriving at a
consensus between the two reviewers and a third reviewer.
Data synthesis and statistical analysis
Meta-analytic summary estimates of the diagnostic accuracy of TIRADS
Regarding the diagnostic accuracy of each
TIRADS, i.e., ACR-, K-, or EU-TIRADS, the sensitivity
and specificity of the two criteria, TR-5 and TR-4/5, were
calculated. When the diagnostic outcomes of more than one
TIRADS system were reported within a single study, the
meta-analytic summary sensitivity and specificity of each
TIRADS system were calculated, separating the data accord-
ing to the different TIRADS systems. Hierarchical modeling
methods were used to calculate the meta-analytic summary
values. The summary sensitivity and specificity and their
95% confidence intervals (CIs) were obtained using a bivari-
ate random effects model. Summary receiver operating char-
acteristic curves were obtained using a hierarchical summary
receiver operating characteristic (HSROC) model. In addition,
subgroup analyses were performed on prospective studies and on
studies that used only cytopathology results as the reference
standard.
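Fitting the bivariate random effects model requires maximum likelihood estimation of correlated random effects and is not reproduced here. As a deliberately simplified stand-in, the sketch below pools logit-transformed proportions (for example, per-study sensitivities) with a univariate DerSimonian-Laird random effects estimator. This is not the bivariate model used in the study; the function name, the 0.5 continuity correction, and the example counts are all assumptions made for illustration.

```python
import math


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_proportion(events, totals):
    """Univariate DerSimonian-Laird pooling on the logit scale.

    events[i] / totals[i] is the proportion from study i (e.g. TP / (TP + FN)).
    Returns (pooled proportion, lower 95% limit, upper 95% limit).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        a = e + 0.5          # continuity correction (assumed, applied to every study)
        b = n - e + 0.5
        y.append(math.log(a / b))      # logit of the corrected proportion
        v.append(1.0 / a + 1.0 / b)    # approximate within-study variance

    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance

    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se)


# Illustrative counts only, not data from the meta-analysis.
pooled, lower, upper = pool_logit_proportion([80, 45, 150], [100, 60, 200])
```

Unlike this univariate sketch, the bivariate model estimates sensitivity and specificity jointly and accounts for their correlation across studies.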
The presence of heterogeneity among studies with respect
to sensitivity and specificity was assessed using Higgins I²
statistics, with an I² > 50% being considered to indicate substantial
heterogeneity. When heterogeneity was noted, the
presence of a threshold effect was analyzed by visual assess-
ment of the coupled forest plots of sensitivity and specificity,
as well as by calculating the Spearman correlation coefficient
between sensitivity and false-positive rate (1-specificity). A
correlation coefficient > 0.6 was considered to indicate a con-
siderable threshold effect.
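The two checks described above can be sketched as follows: an I² value derived from Cochran's Q, and a Spearman rank correlation between sensitivity and false-positive rate. The function names and input values are hypothetical, and the assumption that I² is computed from study-level estimates with known variances (for example on the logit scale) is ours rather than stated in the text.

```python
import math


def higgins_i2(estimates, variances):
    """I² (in percent) from Cochran's Q using inverse-variance weights."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def _ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Illustrative values only, not data from the meta-analysis.
sensitivities = [0.60, 0.70, 0.80, 0.90]
false_positive_rates = [0.10, 0.20, 0.30, 0.40]
rho = spearman(sensitivities, false_positive_rates)
substantial_heterogeneity = higgins_i2([0.0, 1.0], [0.01, 0.01]) > 50
considerable_threshold_effect = rho > 0.6
```

A strongly positive correlation means that studies with higher sensitivity also have higher false-positive rates, which is the pattern expected when studies differ mainly in their positivity threshold.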
Meta-regression analysis
Meta-regression analysis was performed
to further explore the causes of study heterogeneity
by including covariates in a bivariate model. The following
covariates were considered: (i) subject enrollment (consecutive
versus selective); (ii) region (Asian versus Western countries);
(iii) image reviewer (single reviewer or not available
versus multiple reviewers); (iv) clarity of blinding review
(yes versus unclear); (v) experience level of the reviewers
(senior versus junior or others); (vi) number of patients (< 200
versus ≥ 200); and (vii) number of thyroid nodules (< 1000
versus ≥ 1000).
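The seven covariates listed above are all dichotomous, so each study can be encoded as a row of binary indicators before it enters a meta-regression. The sketch below shows one possible encoding; the class `StudyCovariates`, its field names, and the choice of which level is coded 1 are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class StudyCovariates:
    """Study-level characteristics used as meta-regression covariates (hypothetical fields)."""
    consecutive_enrollment: bool   # (i) consecutive versus selective
    asian_region: bool             # (ii) Asian versus Western countries
    multiple_reviewers: bool       # (iii) multiple versus single/not available
    blinding_clear: bool           # (iv) yes versus unclear
    senior_reviewers: bool         # (v) senior versus junior or others
    n_patients: int                # (vi) dichotomized at 200
    n_nodules: int                 # (vii) dichotomized at 1000


def design_row(s: StudyCovariates) -> list:
    """Return the seven 0/1 indicators for one study, in the order (i) to (vii)."""
    return [
        int(s.consecutive_enrollment),
        int(s.asian_region),
        int(s.multiple_reviewers),
        int(s.blinding_clear),
        int(s.senior_reviewers),
        int(s.n_patients >= 200),
        int(s.n_nodules >= 1000),
    ]


# Illustrative study only, not one of the included articles.
row = design_row(StudyCovariates(True, False, True, False, True, 199, 1000))
```

Each indicator would then be entered as a covariate in the bivariate model to test whether it explains part of the between-study heterogeneity.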
Analysis of publication bias
Publication bias was assessed
visually using Deeks’ funnel plot, and its statistical signifi-
cance was tested using Deeks’ asymmetry test.
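Deeks' asymmetry test regresses the log diagnostic odds ratio on the inverse square root of the effective sample size (ESS), weighting each study by its ESS; a slope that differs from zero suggests funnel plot asymmetry. The following is a minimal sketch of that regression. The function name, the 0.5 continuity correction applied to every cell, and the example counts are assumptions, and the p value (from a t distribution with n − 2 degrees of freedom) is not computed here.

```python
import math


def deeks_regression(tables):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), weights = ESS.

    tables: list of (tp, fp, fn, tn) tuples, at least three studies.
    Returns (slope, intercept, t statistic of the slope, degrees of freedom).
    """
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        a, b, c, d = (v + 0.5 for v in (tp, fp, fn, tn))  # assumed correction
        ln_dor = math.log((a * d) / (b * c))
        n_diseased = tp + fn
        n_healthy = fp + tn
        ess = 4.0 * n_diseased * n_healthy / (n_diseased + n_healthy)
        x.append(1.0 / math.sqrt(ess))
        y.append(ln_dor)
        w.append(ess)

    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    df = len(x) - 2
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_slope = math.sqrt(rss / df / sxx)
    return slope, intercept, slope / se_slope, df


# Illustrative counts only, not data from the meta-analysis.
tables = [(80, 30, 20, 170), (45, 10, 15, 130), (150, 60, 50, 340), (30, 5, 10, 55)]
slope, intercept, t_stat, df = deeks_regression(tables)
```

Plotting each study's ln(DOR) against 1/sqrt(ESS) together with the fitted line gives the corresponding funnel plot.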
Stata version 15.0 (StataCorp LLC) was used for statistical
analysis, with p < 0.05 being considered statistically
significant.
Results
Literature search
A total of 225 articles were screened after removal of dupli-
cates (Fig. 1). Of these, 112 articles were excluded on the basis
of their titles and abstracts, and 79 articles were further ex-
cluded after a full-text review. Of the remaining 34 eligible
articles, which included a total of 37,585 thyroid nodules
(Table 1), 33 [14, 16–20, 23–49] and 32 [14–20, 24–26,
28–49] reported the accuracy of TR-5 and TR-4/5, respectively
(31 articles reported both) [14, 16–20, 24–26, 28–49].
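The counts in this paragraph can be checked for internal consistency with simple arithmetic. The short sketch below uses only the numbers reported above; the variable names are ours.

```python
# Study selection flow as reported in the text.
screened = 225
excluded_on_title_abstract = 112
excluded_on_full_text = 79
eligible = screened - excluded_on_title_abstract - excluded_on_full_text

# Articles reporting each criterion, combined by inclusion-exclusion.
reported_tr5 = 33
reported_tr45 = 32
reported_both = 31
reported_either = reported_tr5 + reported_tr45 - reported_both
```

Both calculations give 34, which matches the number of eligible articles and is consistent with every eligible article reporting at least one of the two criteria.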
Eur Radiol