Importance of Diagnostic Accuracy in Big Data: False-Positive Diagnoses of Type 2 Diabetes in Health Insurance Claims Data of 70 Million Germans.
Epidemiology
aggregated data
chronic diseases - epidemiology
illness-death model
incidence
mortality
non-communicable chronic disease (NCD)
prevalence
Journal
Frontiers in epidemiology
ISSN: 2674-1199
Abbreviated title: Front Epidemiol
Country: Switzerland
NLM ID: 9918419158106676
Publication information
Publication date: 2022
History:
received: 2022-03-01
accepted: 2022-03-30
medline: 2022-05-23
pubmed: 2022-05-23
entrez: 2024-03-08
Status: epublish
Abstract
Large datasets comprising diagnoses of chronic conditions are becoming increasingly available for research purposes. In Germany, aggregated claims data from the statutory health insurance, including medical diagnoses and covering roughly 70 million insured persons, are planned to be published regularly. The validity of the diagnoses in such big datasets can hardly be assessed directly. If the dataset comprises prevalence, incidence, and mortality, the proportion of false-positive diagnoses can be estimated using mathematical relations from the illness-death model. We apply the method to age-specific aggregated claims data on type 2 diabetes from 70 million Germans, stratified by sex, and report the findings in terms of the age-specific ratio of false-positive diagnoses of type 2 diabetes (FPR) in the dataset. The FPR changes with age in both men and women. In men, the FPR increases linearly from 1 to 3 per 1,000 between 30 and 50 years of age. Between 50 and 80 years, the FPR remains below 4 per 1,000. After 80 years of age, it increases to approximately 5 per 1,000. In women, the FPR rises steeply from age 30 to 60 years and peaks at approximately 12 per 1,000 between 60 and 70 years of age. After age 70 years, the FPR in women drops sharply. In all age groups, the FPR is higher in women than in men. In absolute numbers, we find 217,000 people with a false-positive diagnosis in the dataset (95% confidence interval, CI: 204,000-229,000), the vast majority being women (172,000, 95% CI: 162,000-180,000). Our work indicates that possible false-positive (and false-negative) diagnoses in claims data should be handled appropriately, for example, by including age- and sex-specific error terms in statistical models, to avoid potentially biased or wrong conclusions.
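The abstract does not state the estimator itself. As a rough illustration of the kind of relation the illness-death model provides, the sketch below uses the standard identity for a chronic condition without remission and without calendar-time trends: the age derivative of the prevalence p equals (1 - p) * (i - p * (m1 - m0)), where i is the incidence rate and m0, m1 are the mortality rates of people without and with the disease. The function names, the split of mortality into m0 and m1, and all numbers are assumptions made for this example; they are not taken from the paper and this is not the authors' method.

```python
def prevalence_slope(p, i, m0, m1):
    """Age derivative of prevalence implied by the illness-death model.

    Assumes no remission and no calendar-time trend (illustrative only).
    p  : prevalence (proportion, 0..1)
    i  : incidence rate per person-year
    m0 : mortality rate of people without the disease
    m1 : mortality rate of people with the disease
    """
    return (1.0 - p) * (i - p * (m1 - m0))


def consistency_gap(observed_slope, p, i, m0, m1):
    """Observed minus model-implied prevalence slope.

    A gap far from zero suggests the reported prevalence, incidence and
    mortality are not mutually consistent, e.g. due to diagnostic error.
    """
    return observed_slope - prevalence_slope(p, i, m0, m1)


# Hypothetical inputs for a single age group (not values from the paper).
p = 0.10     # prevalence
i = 0.010    # incidence rate per person-year
m0 = 0.010   # mortality rate without the disease
m1 = 0.020   # mortality rate with the disease
observed = 0.0090  # hypothetical observed change in prevalence per year of age

slope = prevalence_slope(p, i, m0, m1)
gap = consistency_gap(observed, p, i, m0, m1)
print(f"model-implied slope: {slope:.4f}, gap: {gap:.4f}")
```

In this reading, a systematic gap between the observed and the model-implied slope is the signal that an error term such as the FPR would have to absorb; how the authors translate that signal into an age-specific FPR is not described in the abstract.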
Identifiers
pubmed: 38455330
doi: 10.3389/fepid.2022.887335
pmc: PMC10911003
Publication types
Journal Article
Languages
eng
Pagination
887335
Copyright information
Copyright © 2022 Brinks, Tönnies and Hoyer.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.