Background: In systematic reviews of diagnostic accuracy studies, summary estimates of sensitivity and specificity and the summary ROC curve are the preferred measures of test performance. Some recent systematic reviews also report the area under the summary ROC curve (AUSROC) as an overall performance measure.
Methods: We investigated the performance of AUSROC estimates using simulated test results in primary studies and 2-by-2 tables derived at different thresholds. The area under the ROC curve (AUC) was estimated in three ways: a summary AUC from the HSROC/bivariate model; a summary AUC from a meta-analysis of reported AUCs; and an overall AUC from an individual participant data (IPD) meta-analysis. Four scenarios were considered, with the true AUC fixed at 0.64, 0.76, 0.81 and 0.91, respectively. The true AUC was calculated parametrically from the known distribution (mean and SD) of the test results. Performance of the estimates was assessed by bias and root-mean-square error (RMSE).
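As an illustration of the parametric calculation and the two performance measures, the following is a minimal sketch. It assumes that test results are normally distributed in the diseased and non-diseased groups (a binormal model), which the abstract does not state explicitly; the function names `binormal_auc` and `bias_and_rmse` are hypothetical and not taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def binormal_auc(mean_neg, sd_neg, mean_pos, sd_pos):
    """AUC under an ASSUMED binormal model: test results are normal in both groups.

    AUC = Phi((mean_pos - mean_neg) / sqrt(sd_neg^2 + sd_pos^2)),
    where Phi is the standard normal CDF.
    """
    z = (mean_pos - mean_neg) / sqrt(sd_neg ** 2 + sd_pos ** 2)
    return NormalDist().cdf(z)


def bias_and_rmse(estimates, true_value):
    """Bias (mean error) and root-mean-square error of simulated estimates."""
    errors = [estimate - true_value for estimate in estimates]
    bias = sum(errors) / len(errors)
    rmse = sqrt(sum(error ** 2 for error in errors) / len(errors))
    return bias, rmse
```

Under this assumption, identical distributions in the two groups give an AUC of 0.5, and the AUC rises towards 1 as the group means move apart relative to the pooled spread.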
Results: In all four scenarios, the bivariate model using the pre-defined threshold always underestimated the AUC, whereas using the optimal threshold overestimated it. Both approaches resulted in a high RMSE. Meta-analysis of AUCs, whether estimated empirically or from the distribution of the test results, performed fairly well. The AUC calculated from pooled IPD was not superior to the meta-analysis of AUCs, but was more accurate than the AUC estimated from the bivariate model. When the number of primary studies included in the meta-analysis increased from 5 to 20, all approaches returned a lower RMSE.
Conclusions: This simulation study provides empirical evidence that the area under the HSROC curve (AUHSROC) does not precisely estimate the performance of a test in a meta-analysis, and it should therefore not be reported as an overall accuracy measure. Directly meta-analysing the AUCs and their standard errors (SEs) reported in primary studies gives a better summary estimate of the AUC. In those cases where the AUC is a relevant measure of test accuracy, hierarchical models may not be the most accurate way to estimate it.
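Direct meta-analysis of reported AUCs and their SEs can be sketched with inverse-variance weighting. The example below assumes a fixed-effect model on the untransformed AUC scale; the abstract does not say which pooling model or scale the study used, and the function name `pool_auc_fixed_effect` is hypothetical.

```python
from math import sqrt


def pool_auc_fixed_effect(aucs, ses):
    """Inverse-variance pooled AUC (ASSUMED fixed-effect model, raw AUC scale).

    Each study is weighted by 1 / SE^2; the pooled SE is sqrt(1 / sum of weights).
    """
    if len(aucs) != len(ses) or not aucs:
        raise ValueError("aucs and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * auc for w, auc in zip(weights, aucs)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Illustrative values only, not data from the study.
pooled_auc, pooled_se = pool_auc_fixed_effect([0.70, 0.80], [0.10, 0.10])
```

With equal SEs the pooled value is the simple mean of the study AUCs. A random-effects model, or pooling on a logit scale, would be reasonable alternatives when between-study heterogeneity is expected or AUCs are close to 1.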