Automated prioritization of sick newborns for whole genome sequencing using clinical natural language processing and machine learning.


Journal

Genome medicine
ISSN: 1756-994X
Abbreviated title: Genome Med
Country: England
NLM ID: 101475844

Publication information

Publication date:
16 03 2023
History:
received: 25 08 2022
accepted: 27 02 2023
entrez: 17 3 2023
pubmed: 18 3 2023
medline: 21 3 2023
Status: epublish

Abstract

Rapidly and efficiently identifying critically ill infants for whole genome sequencing (WGS) is a costly and challenging task. It is currently performed by scarce, highly trained experts and is a major bottleneck for the application of WGS in the neonatal intensive care unit (NICU). There is a dire need for automated means to prioritize patients for WGS. Institutional databases of electronic health records (EHRs) are logical starting points for identifying patients with undiagnosed Mendelian diseases. We have developed automated means to prioritize patients for rapid whole genome sequencing (rWGS) and WGS directly from clinical notes. Our approach combines a clinical natural language processing (CNLP) workflow with a machine learning-based prioritization tool named Mendelian Phenotype Search Engine (MPSE). MPSE accurately and robustly identified NICU patients selected for WGS by clinical experts from Rady Children's Hospital in San Diego (AUC 0.86) and the University of Utah (AUC 0.85). In addition to effectively identifying patients for WGS, MPSE scores also strongly prioritize diagnostic cases over non-diagnostic cases, with projected diagnostic yields exceeding 50% throughout the first and second quartiles of score-ranked patients. Our results indicate that an automated pipeline for selecting acutely ill infants in the NICU for WGS can meet or exceed diagnostic yields obtained through current selection procedures, which require time-consuming manual review of clinical notes and histories by specialized personnel.
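
The two kinds of figures reported in the abstract, an AUC for agreement with expert selection and a diagnostic yield per quartile of score-ranked patients, can be illustrated with a minimal sketch. This is not the MPSE implementation: the scores, labels, and the function names `auc` and `yield_by_quartile` are hypothetical, and the data are made up solely to show how such metrics are typically computed.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def yield_by_quartile(scores: Sequence[float], diagnosed: Sequence[int]) -> List[float]:
    """Fraction of diagnosed cases per quartile of the score ranking.

    Quartiles are ordered from highest-scoring to lowest-scoring patients.
    Assumes at least four patients so that no quartile is empty.
    """
    ranked = [d for _, d in sorted(zip(scores, diagnosed), key=lambda t: -t[0])]
    n = len(ranked)
    bounds = [round(n * k / 4) for k in range(5)]
    return [sum(ranked[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy data (NOT from the study): one prioritization score per patient.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
selected = [1, 1, 0, 1, 0, 1, 0, 0]   # 1 = chosen for WGS by an expert
diagnosed = [1, 1, 1, 0, 0, 1, 0, 0]  # 1 = sequencing produced a diagnosis

print(auc(scores, selected))                 # 0.8125
print(yield_by_quartile(scores, diagnosed))  # [1.0, 0.5, 0.5, 0.0]
```

In this toy ranking the top two quartiles have yields of 100% and 50%, which mirrors the shape of the claim in the abstract (higher-scoring patients are more likely to receive a diagnosis) without reproducing any of its numbers.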

Identifiers

pubmed: 36927505
doi: 10.1186/s13073-023-01166-7
pii: 10.1186/s13073-023-01166-7
pmc: PMC10018992

Publication types

Journal Article
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

18

Grants

Agency: NCATS NIH HHS
ID: UL1 TR002550
Country: United States
Agency: NIH HHS
ID: S10 OD021644
Country: United States

Copyright information

© 2023. The Author(s).

Authors

Bennet Peterson (B)

Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA.

Edgar Javier Hernandez (EJ)

Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA.

Charlotte Hobbs (C)

Rady Children's Institute for Genomic Medicine, San Diego, CA, USA.

Sabrina Malone Jenkins (S)

Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA.

Barry Moore (B)

Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA.

Edwin Rosales (E)

Rady Children's Institute for Genomic Medicine, San Diego, CA, USA.

Samuel Zoucha (S)

Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA.

Erica Sanford (E)

Rady Children's Institute for Genomic Medicine, San Diego, CA, USA.
Department of Pediatrics, Cedars-Sinai Medical Center, Los Angeles, CA, USA.

Matthew N Bainbridge (MN)

Rady Children's Institute for Genomic Medicine, San Diego, CA, USA.

Erwin Frise (E)

Fabric Genomics Inc., Oakland, CA, USA.

Albert Oriol (A)

Rady Children's Hospital, San Diego, CA, USA.

Luca Brunelli (L)

Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA.

Stephen F Kingsmore (SF)

Rady Children's Institute for Genomic Medicine, San Diego, CA, USA.

Mark Yandell (M)

Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA. myandell@genetics.utah.edu.
