Background: Previous studies on the diagnostic test accuracy of depression-screening tools have treated both clinician-administered semi-structured diagnostic interviews and lay-administered fully structured interviews as gold standards for assessing major depressive disorder (MDD). Fully structured interviews do not involve clinical judgement; they are considered potentially more reliable but less valid than semi-structured interviews and are thought to overdiagnose MDD among patients with low-level symptoms. No studies have assessed the impact of using fully structured interviews as the reference standard in meta-analyses of diagnostic test accuracy.
Objectives: To compare the sensitivity and specificity of the Patient Health Questionnaire-9 (PHQ-9) depression screening tool using semi-structured vs. fully structured diagnostic interviews as the reference standard.
Methods: We conducted an individual patient data meta-analysis of the diagnostic accuracy of the PHQ-9. Electronic databases were searched from January 2000 to December 2014 for datasets that compared PHQ-9 scores to MDD diagnosis based on a validated interview. For PHQ-9 cutoffs 5-15, we estimated pooled sensitivity and specificity among studies using semi-structured and fully structured diagnostic interviews separately.
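The per-cutoff calculation can be illustrated with a minimal sketch. It assumes each patient record is a tuple of PHQ-9 score, MDD status from the reference interview, and interview type, and that a screen is positive when the score is at or above the cutoff. The function name, record layout, and the simple pooling of counts across studies are illustrative assumptions only; the abstract does not specify the statistical model used, and this sketch ignores between-study heterogeneity.

```python
from collections import defaultdict


def sens_spec_by_cutoff(records, cutoffs=range(5, 16)):
    """Naively pooled sensitivity and specificity per interview type and cutoff.

    records: iterable of (phq9_score, has_mdd, interview_type) tuples
             (hypothetical layout, not the authors' data format).
    Returns {interview_type: {cutoff: (sensitivity, specificity)}}.
    """
    # counts[interview_type][cutoff] = [TP, FN, TN, FP]
    counts = defaultdict(lambda: {c: [0, 0, 0, 0] for c in cutoffs})

    for score, has_mdd, interview in records:
        for c in cutoffs:
            positive = score >= c  # assumed rule: positive screen if score >= cutoff
            cell = counts[interview][c]
            if has_mdd:
                cell[0 if positive else 1] += 1  # true positive or false negative
            else:
                cell[3 if positive else 2] += 1  # false positive or true negative

    results = {}
    for interview, by_cutoff in counts.items():
        results[interview] = {}
        for c, (tp, fn, tn, fp) in by_cutoff.items():
            # Guard against groups with no cases or no non-cases.
            sens = tp / (tp + fn) if (tp + fn) else float("nan")
            spec = tn / (tn + fp) if (tn + fp) else float("nan")
            results[interview][c] = (sens, spec)
    return results
```

Running the function separately on records labelled, for example, "semi" and "full" would give one sensitivity and specificity pair per cutoff for each reference standard, which is the comparison the analysis is built around.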
Results: Data were obtained from 43 of 53 eligible studies (81%), for a total of 14,405 patients (1,763 MDD cases). Specificity estimates were similar with either reference standard (within 2%); however, sensitivity estimates were 5-22% lower (median = 18%) with fully structured than with semi-structured diagnostic interviews (Table 1).
Conclusions: PHQ-9 sensitivity is consistently underestimated when fully structured diagnostic interviews are used as the reference standard. Due to their lack of validity, fully structured interviews lead to an artificially inflated number of MDD cases, which the PHQ-9 is unable to capture. The poor validity of fully structured interviews should be considered when deciding which interviews to conduct in research settings.