The Radpeer system is central to the quality assurance process in many radiology practices. Previous studies have shown poor agreement between physicians in the evaluation of their peers. The purpose of this study was to assess the reliability of the Radpeer scoring system.
A sample of 25 discrepant cases was extracted from our quality assurance database. Images were made anonymous; associated reports and identities of interpreting radiologists were removed. Indications for the studies and descriptions of the discrepancies were provided. Twenty-one subspecialist attending radiologists rated the cases using the Radpeer scoring system. Multirater kappa statistics were used to assess interrater agreement, both with the standard scoring system and with dichotomized scores to reflect the practice of further review for cases rated 3 and 4. Subgroup analyses were conducted to assess subspecialist evaluation of cases.
Interrater agreement was slight to fair compared with that expected by chance. For the group of 21 raters, the kappa values were 0.11 (95% CI, 0.06-0.16) with the standard scoring system and 0.20 (95% CI, 0.13-0.27) with dichotomized scores. There was disagreement about whether a discrepancy had occurred in 20 cases. Subgroup analyses did not reveal significant differences in the degree of interrater agreement.
The identification of discrepant interpretations is valuable for the education of individual radiologists and for larger-scale quality assurance and quality improvement efforts. Our results show that a ratings-based peer review system is unreliable and subjective for the evaluation of discrepant interpretations. Resources should be devoted to developing more robust and objective assessment procedures, particularly those with clear quality improvement goals.