Arch Public Health. 2025 May 7;83(1):124. doi: 10.1186/s13690-025-01606-3.
ABSTRACT
BACKGROUND: Self-reporting is a common approach in observational epidemiological studies. However, self-reported information can be biased for several reasons and can therefore affect the outcomes of an investigation. This analysis aimed to evaluate the agreement between self-reported data from a population-based cohort study and data from two large German health insurance companies.
METHODS: Participants of the prospective population-based LIFE-Adult study with available self-reported diagnoses of a history of stroke, atrial fibrillation (AF), heart failure (HF), and myocardial infarction (MI) from the baseline and follow-up (after six years) surveys were included in this study. Two health insurance companies provided ICD-10-GM codes. Agreement between the self-reports and health insurance data (HID) was examined by calculating sensitivity, specificity, Cohen's kappa, and positive and negative predictive values. We used multivariable logistic regression models to examine whether the odds ratios (OR) for the association between risk factors and each disease changed depending on whether self-reports or HID were used as the dependent variable.
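The agreement measures named above all derive from a 2×2 table that cross-classifies each participant's self-report against the HID. The following is a minimal sketch, assuming (as the comparisons in the results do) that HID serves as the reference standard; the class name `TwoByTwo`, the function `agreement`, and the counts in the usage example are hypothetical illustrations, not data or code from the study. Confidence intervals are omitted because the abstract does not state which interval method was used.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cross-tabulation of self-report vs. HID (HID assumed as reference)."""
    tp: int  # self-report yes, HID yes
    fp: int  # self-report yes, HID no
    fn: int  # self-report no,  HID yes
    tn: int  # self-report no,  HID no


def agreement(t: TwoByTwo) -> dict:
    """Sensitivity, specificity, PPV, NPV and Cohen's kappa from a 2x2 table."""
    n = t.tp + t.fp + t.fn + t.tn
    sensitivity = t.tp / (t.tp + t.fn)  # share of HID-positive cases self-reported
    specificity = t.tn / (t.tn + t.fp)  # share of HID-negative cases not self-reported
    ppv = t.tp / (t.tp + t.fp)          # share of self-reports confirmed by HID
    npv = t.tn / (t.tn + t.fn)          # share of non-reports confirmed by HID

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement is computed from the row and column marginals.
    p_obs = (t.tp + t.tn) / n
    p_yes = ((t.tp + t.fp) / n) * ((t.tp + t.fn) / n)
    p_no = ((t.fn + t.tn) / n) * ((t.fp + t.tn) / n)
    p_exp = p_yes + p_no
    kappa = (p_obs - p_exp) / (1 - p_exp)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts for one diagnosis, chosen only for illustration.
    metrics = agreement(TwoByTwo(tp=30, fp=10, fn=20, tn=940))
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With a rare condition, specificity and NPV stay high almost by construction because true negatives dominate the table, whereas sensitivity depends on a small number of HID-positive cases. This pattern is consistent with the high specificity and variable sensitivity reported in the results.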
RESULTS: One thousand seven hundred eighty-four individuals with complete data were included in this interim analysis. Mean age was 58 (SD ±12) years, and 984 (55%) were female. A history of stroke was reported by 52 (2.9%) subjects, AF by 99 (5.6%), HF by 63 (3.5%), and MI by 46 (2.6%). Compared with the HID, specificity was high for all four diagnoses (stroke: 99% [95% CI 99.3-99.9]; AF: 99% [95% CI 98.1-99.2]; HF: 98% [95% CI 97.6-98.9]; MI: 99% [95% CI 98.9-99.7]). Sensitivity ranged from 58% (95% CI 47.4-69.5) for stroke through 61% (95% CI 48.8-74.0) for MI to 65% (95% CI 56.6-73.9) for AF. Sensitivity was lowest for HF (20% [95% CI 14.4-26.5]).
CONCLUSION: The use of German health insurance data is a feasible method for verifying population-based self-reported diagnoses. Compared with the health insurance data, sensitivity varied among the self-reported diseases, whereas specificity was consistently high. Future population-based assessments may consider verifying self-reported diagnoses against health insurance data as an additional data source to reduce misclassification error in self-reported data.
PMID:40336119 | DOI:10.1186/s13690-025-01606-3