What is it about?
When two or more subjective observers, for example human radiologists, independently assess disease severity in an x-ray, their individual assessments are often compared using the kappa inter-observer statistic. In theory kappa ranges from -1 to +1; in practice values usually fall between 0 and 1. High inter-observer agreement is desirable; otherwise, problems exist with the method or the observers. It is well known that kappas from studies with different populations cannot be compared, for mathematical reasons: the statistic depends on how the outcomes are distributed across categories. We extend this caution to clinical trials in which an intervention changes the distribution of outcome measurements: at the start of the trial most patients are in the severe categories, and at the conclusion most are in the mild categories. Kappa will therefore change even if nothing about the observers has changed.
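The effect can be illustrated with a small calculation. The sketch below is not the paper's analysis; it assumes a simplified setting with only two categories (severe and mild), two observers with the same fixed sensitivity and specificity, and errors that are independent given the true category. The function name `expected_kappa` and the default accuracy values are hypothetical, chosen only for illustration. It computes the expected Cohen's kappa, (observed agreement - chance agreement) / (1 - chance agreement), as the share of truly severe patients changes.

```python
def expected_kappa(prevalence, sensitivity=0.9, specificity=0.9):
    """Expected Cohen's kappa for two observers with identical, fixed accuracy.

    Assumptions (illustrative, not taken from the paper): only two
    categories, and the observers err independently given the true category.
    """
    p, se, sp = prevalence, sensitivity, specificity

    # Probability that both observers report "severe" / both report "mild".
    both_severe = p * se**2 + (1 - p) * (1 - sp) ** 2
    both_mild = p * (1 - se) ** 2 + (1 - p) * sp**2
    observed = both_severe + both_mild

    # Each observer's marginal probability of reporting "severe".
    m = p * se + (1 - p) * (1 - sp)
    chance = m**2 + (1 - m) ** 2

    return (observed - chance) / (1 - chance)


# Share of truly severe patients: start, midpoint, and end of a trial.
for prevalence in (0.9, 0.5, 0.1):
    print(f"severe share {prevalence:.1f}: kappa = {expected_kappa(prevalence):.2f}")
```

Under these assumptions the observers agree 82% of the time at every stage, yet kappa is about 0.39 when 90% of patients are severe, 0.64 when half are, and about 0.39 again when only 10% are. The observers never changed; only the mix of patients did.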
Why is it important?
Inter-observer statistics are increasingly used to compare the performance of AI (artificial intelligence) with that of human readers. The more widely kappa is used, the higher the risk that its limitations are not understood. Machine learning scientists need to pay attention.
Read the Original
This page is a summary of: Sequentially Determined Measures of Interobserver Agreement (Kappa) in Clinical Trials May Vary Independent of Changes in Observer Performance, Therapeutic Innovation & Regulatory Science, January 2020, Springer Science + Business Media,
DOI: 10.1007/s43441-019-00102-5.