Background: In the evidence-based practice literature (EBPL) in general, and in the widely used Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach in particular, frequentist confidence intervals (CIs) are recommended for reporting the quantitative results of systematic reviews of the effects of interventions. Post hoc power analyses (PHPAs), defined here as statistical power analyses performed after the results are known in order to interpret them, are also recommended in the EBPL; the optimal information size proposed in the GRADE approach is one such analysis.
Objectives: To provide a principled, constructive and pragmatic critique of the use of CIs and PHPAs in systematic reviews, as recommended in the EBPL in general and in the GRADE approach to imprecision in particular, and to offer suggestions for improvement.
Methods: Assessment of the use of CIs and PHPAs in frequentist statistics, informed by a critical review of the mathematical and applied statistics literature on CIs, Neyman-Pearson hypothesis testing and power analysis, and augmented by insights from the statistical literature on fiducial inference, likelihood inference and confidence distributions. Statistical simulations and examples are discussed.
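As a minimal illustration of the kind of simulation discussed (a sketch, not the paper's own code), the 95% coverage of a CI is a pre-data, long-run property: over many repeated samples, about 95% of the intervals cover the true mean, but any single realized interval either covers it or does not.

```python
import math
import random

random.seed(1)

def ci_covers(mu=0.0, sigma=1.0, n=30, z=1.96):
    """Draw one sample and report whether the 95% z-interval
    for the mean (known sigma) covers the true value mu."""
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    half = z * sigma / math.sqrt(n)  # half-width of the interval
    return xbar - half <= mu <= xbar + half

trials = 20_000
coverage = sum(ci_covers() for _ in range(trials)) / trials
print(f"empirical coverage over {trials} intervals: {coverage:.3f}")
```

The empirical coverage settles near the nominal 0.95, a statement about the procedure before the data are seen, not about the one interval actually computed from a given review.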
Results: The precision of a CI is an initial (pre-data) precision, not a final (post-data) precision; evaluation criteria for CIs other than coverage and interval length should be considered; the support provided by the observed data for different proposed values of the unknown parameter is appropriately explored with likelihood inference, including likelihood functions and likelihood intervals, and with confidence distributions; PHPAs are misleading, not least because observed power is a deterministic function of the p-value and thus adds no information beyond it.
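The last claim can be made concrete with a short sketch (illustrative, not from the paper): for a two-sided z-test, post hoc "observed" power is computable from the p-value alone, so it restates the p-value rather than adding independent evidence.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal

def observed_power(p, alpha=0.05):
    """Post hoc 'observed power' of a two-sided z-test, recovered
    solely from the p-value p at significance level alpha."""
    z_obs = N.inv_cdf(1 - p / 2)       # |z| statistic implied by p
    z_crit = N.inv_cdf(1 - alpha / 2)  # critical value (about 1.96)
    # Probability of rejecting if the true effect equals the observed one
    return N.cdf(z_obs - z_crit) + N.cdf(-z_obs - z_crit)

for p in (0.01, 0.05, 0.20, 0.50):
    print(f"p = {p:.2f} -> observed power = {observed_power(p):.3f}")
```

In particular, a result with p exactly equal to alpha always yields observed power of about 0.5, regardless of the study: the quantity is a relabelling of the p-value, which is why interpreting it as new information about the evidence is misleading.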
Conclusions: The use of CIs and PHPAs in systematic reviews as advocated in the EBPL and in the GRADE approach to imprecision contradicts the mathematical and applied statistics literature and should be revised. Suggestions for correcting the identified issues are provided.