Communicating exploratory unsupervised machine learning analysis in age clustering for paediatric disease.


Journal

BMJ health & care informatics
ISSN: 2632-1009
Abbreviated title: BMJ Health Care Inform
Country: England
NLM ID: 101745500

Publication information

Publication date:
29 Jul 2024
History:
received: 10 Nov 2023
accepted: 1 Jul 2024
medline: 30 Jul 2024
pubmed: 30 Jul 2024
entrez: 29 Jul 2024
Status: epublish

Abstract

BACKGROUND
Despite the increasing availability of electronic healthcare record (EHR) data and the wide availability of plug-and-play machine learning (ML) Application Programming Interfaces, the adoption of data-driven decision-making within routine hospital workflows has thus far remained limited. Through the lens of deriving clusters of diagnoses by age, this study investigated the type of ML analysis that can be performed using EHR data and how the results could be communicated to lay stakeholders.
METHODS
Observational EHR data from a tertiary paediatric hospital were used; after preprocessing, the dataset contained 61 522 unique patients and 3315 unique ICD-10 diagnosis codes. K-means clustering was applied to identify age distributions of patient diagnoses. The final model was selected using quantitative metrics and expert assessment of the clinical validity of the clusters. Additionally, the uncertainty arising from preprocessing decisions was analysed.
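The abstract does not specify how diagnoses were represented for clustering or which quantitative metrics were used. The sketch below is a minimal illustration under stated assumptions: each ICD-10 code is described by the normalised histogram of patient ages (1-year bins, 0 to 18 years) at which it was recorded; clustering uses a hand-written Lloyd's k-means with deterministic farthest-first initialisation; and the silhouette coefficient stands in for the quantitative metric used to compare values of k. The study's actual features, initialisation and metrics may differ, and the expert review of clinical validity is not represented. The toy records and code names (`infant-0` and so on) are synthetic and hypothetical.

```python
import numpy as np

# Assumption: 1-year age bins spanning 0-18 years (ages outside this range are ignored).
AGE_EDGES = np.arange(0, 19)


def age_profiles(records):
    """Turn (code, age_in_years) pairs into one normalised age histogram per code."""
    ages_by_code = {}
    for code, age in records:
        ages_by_code.setdefault(code, []).append(age)
    codes = sorted(ages_by_code)
    counts = np.array(
        [np.histogram(ages_by_code[c], bins=AGE_EDGES)[0] for c in codes], dtype=float
    )
    # Normalise so each code is described by the shape of its age distribution,
    # not by how frequently it occurs.
    return codes, counts / counts.sum(axis=1, keepdims=True)


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [X[0]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest already-chosen centroid.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[int(d.argmax())])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new

    # Final assignment against the final centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = float(dists[np.arange(len(X)), labels].sum())
    return labels, centroids, inertia


def silhouette(X, labels):
    """Mean Euclidean silhouette coefficient; requires at least two non-empty clusters."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    present = np.unique(labels)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = D[i, same].sum() / (n_same - 1)  # D[i, i] is 0, so it drops out
        b = min(D[i, labels == j].mean() for j in present if j != labels[i])
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(scores))


# Synthetic toy data (not from the study): four hypothetical age patterns, two codes each.
toy_ages = {
    "infant": [0], "preschool": [2, 3, 4], "school": [7, 8, 9, 10], "adolescent": [14, 15, 16],
}
records = [
    (f"{group}-{n}", age)
    for group, ages in toy_ages.items()
    for n in range(2)
    for age in ages
    for _ in range(5)
]
codes, X = age_profiles(records)
scores = {k: silhouette(X, kmeans(X, k)[0]) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
```

On this toy data the silhouette score peaks at k = 4, recovering the four planted age patterns. Farthest-first initialisation is used only to make the sketch deterministic; a library implementation with k-means++ and several restarts would be the more usual choice in practice.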
FINDINGS
Four age clusters of diseases were identified, broadly aligning to ages between: 0 and 1; 1 and 5; 5 and 13; 13 and 18. Diagnoses within the clusters aligned with existing knowledge of the ages at which conditions tend to present, and sequential clusters reflected known disease progressions. The results validated similar methodologies within the literature. The impact of uncertainty induced by preprocessing decisions was large at the level of individual diagnoses but not at the population level. Strategies for mitigating or communicating this uncertainty were successfully demonstrated.
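The abstract contrasts individual-level and population-level effects of preprocessing uncertainty without naming the measures used. One hypothetical way to make that distinction concrete, assuming each preprocessing variant yields a mapping from diagnosis code to cluster and that clusters are first renumbered in order of increasing mean age so that indices are comparable across runs: the fraction of diagnoses that switch cluster captures individual-level instability, while the largest shift in any cluster's share of diagnoses captures population-level change. The function names, the placeholder codes `D1` to `D4` and both measures are illustrative assumptions, not the authors' method.

```python
def relabel_by_age(labels, mean_age_of_cluster):
    """Renumber clusters 0..k-1 in order of increasing mean age (hypothetical helper)."""
    order = sorted(mean_age_of_cluster, key=mean_age_of_cluster.get)
    rank = {old: new for new, old in enumerate(order)}
    return {code: rank[cluster] for code, cluster in labels.items()}


def assignment_stability(run_a, run_b):
    """Compare two age-ordered clusterings of the same diagnoses.

    Returns (fraction of shared codes whose cluster changed,
             largest absolute change in any cluster's share of codes).
    """
    shared = sorted(set(run_a) & set(run_b))
    if not shared:
        raise ValueError("the two runs share no diagnosis codes")
    n = len(shared)
    changed = sum(run_a[c] != run_b[c] for c in shared) / n
    clusters = sorted({run_a[c] for c in shared} | {run_b[c] for c in shared})
    share_a = {j: sum(run_a[c] == j for c in shared) / n for j in clusters}
    share_b = {j: sum(run_b[c] == j for c in shared) / n for j in clusters}
    max_share_shift = max(abs(share_a[j] - share_b[j]) for j in clusters)
    return changed, max_share_shift


# Toy comparison: two diagnoses swap clusters between preprocessing variants.
baseline = {"D1": 0, "D2": 1, "D3": 2, "D4": 3}
variant = {"D1": 1, "D2": 0, "D3": 2, "D4": 3}
changed, shift = assignment_stability(baseline, variant)
```

Here half of the diagnoses change cluster (`changed` is 0.5) while every cluster keeps the same share of diagnoses (`shift` is 0.0). This is the pattern of large individual-level but small population-level change that the findings describe.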
CONCLUSION
Unsupervised ML applied to EHR data identifies clinically relevant age distributions of diagnoses, which can augment existing decision-making. However, biases within healthcare datasets dramatically affect results if they are not appropriately mitigated or communicated.

Identifiers

pubmed: 39074912
pii: bmjhci-2023-100963
doi: 10.1136/bmjhci-2023-100963

Publication types

Journal Article; Observational Study

Languages

eng

Citation subsets

IM

Copyright information

© Author(s) (or their employer(s)) 2024. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

Competing interests statement

Competing interests: None declared.

Authors

Joshua William Spear (JW)

DRIVE, Great Ormond Street Hospital for Children, London, UK.
NIHR GOSH BRC, London, UK.

Eleni Pissaridou (E)

DRIVE, Great Ormond Street Hospital for Children, London, UK.
NIHR GOSH BRC, London, UK.

Stuart Bowyer (S)

DRIVE, Great Ormond Street Hospital for Children, London, UK.
NIHR GOSH BRC, London, UK.

William A Bryant (WA)

DRIVE, Great Ormond Street Hospital for Children, London, UK.
NIHR GOSH BRC, London, UK.

Daniel Key (D)

DRIVE, Great Ormond Street Hospital for Children, London, UK.
NIHR GOSH BRC, London, UK.

John Booth (J)

DRIVE, Great Ormond Street Hospital for Children, London, UK.
NIHR GOSH BRC, London, UK.

Anastasia Spiridou (A)

DRIVE, Great Ormond Street Hospital for Children, London, UK.
NIHR GOSH BRC, London, UK.

Spiros Denaxas (S)

Institute of Health Informatics, University College London, London, UK.
BHF Data Science Centre, London, UK.

Rebecca Pope (R)

Institute of Child Health, University College London, London, UK.

Andrew M Taylor (AM)

DRIVE, Great Ormond Street Hospital for Children, London, UK.
Institute of Cardiovascular Science, University College London, London, UK.

Harry Hemingway (H)

DRIVE, Great Ormond Street Hospital for Children, London, UK.
Institute of Health Informatics, University College London, London, UK.

Neil J Sebire (NJ)

DRIVE, Great Ormond Street Hospital for Children, London, UK. neil.sebire@gosh.nhs.uk
Institute of Child Health, University College London, London, UK.

Similar articles

[Redispensing of expensive oral anticancer medicines: a practical application].

Lisanne N van Merendonk, Kübra Akgöl, Bastiaan Nuijen
1.00
Humans Antineoplastic Agents Administration, Oral Drug Costs Counterfeit Drugs

Smoking Cessation and Incident Cardiovascular Disease.

Jun Hwan Cho, Seung Yong Shin, Hoseob Kim et al.
1.00
Humans Male Smoking Cessation Cardiovascular Diseases Female
Humans United States Aged Cross-Sectional Studies Medicare Part C
1.00
Humans Yoga Low Back Pain Female Male

MeSH classifications