Automated prioritization of sick newborns for whole genome sequencing using clinical natural language processing and machine learning.
Journal
Genome medicine
ISSN: 1756-994X
Abbreviated title: Genome Med
Country: England
NLM ID: 101475844
Publication information
Publication date: 2023-03-16
History:
received: 2022-08-25
accepted: 2023-02-27
entrez: 2023-03-17
pubmed: 2023-03-18
medline: 2023-03-21
Status: epublish
Abstract
BACKGROUND
Rapidly and efficiently identifying critically ill infants for whole genome sequencing (WGS) is a costly and challenging task. It is currently performed by scarce, highly trained experts and is a major bottleneck for the application of WGS in the neonatal intensive care unit (NICU). There is a dire need for automated means to prioritize patients for WGS.
METHODS
Institutional databases of electronic health records (EHRs) are logical starting points for identifying patients with undiagnosed Mendelian diseases. We have developed automated means to prioritize patients for rapid whole genome sequencing (rWGS) and WGS directly from clinical notes. Our approach combines a clinical natural language processing (CNLP) workflow with a machine learning-based prioritization tool named Mendelian Phenotype Search Engine (MPSE).
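As an illustration of how phenotype terms extracted from notes might be turned into a prioritization score, here is a minimal sketch. The abstract does not specify the model behind MPSE, so the naive Bayes-style log-likelihood ratio used here is an assumption, simplified to score only the terms that are present. The function names and the toy phenotype terms are hypothetical.

```python
import math
from collections import Counter

def train_term_weights(cases, controls, alpha=1.0):
    """Return a log-likelihood ratio per phenotype term (Laplace-smoothed).

    cases / controls: lists of sets of phenotype terms, one set per patient.
    Assumption: terms are treated as independent, as in naive Bayes.
    """
    vocab = set().union(*cases, *controls)
    case_counts = Counter(t for patient in cases for t in patient)
    ctrl_counts = Counter(t for patient in controls for t in patient)
    weights = {}
    for term in vocab:
        p_case = (case_counts[term] + alpha) / (len(cases) + 2 * alpha)
        p_ctrl = (ctrl_counts[term] + alpha) / (len(controls) + 2 * alpha)
        weights[term] = math.log(p_case / p_ctrl)
    return weights

def score_patient(terms, weights):
    """Sum the weights of a patient's observed terms; unseen terms add 0."""
    return sum(weights.get(t, 0.0) for t in terms)

# Hypothetical toy data: patients previously selected for WGS vs. not selected.
cases = [{"seizure", "hypotonia"}, {"hypotonia", "cardiomyopathy"}]
controls = [{"prematurity"}, {"prematurity", "jaundice"}]
weights = train_term_weights(cases, controls)
high = score_patient({"hypotonia", "seizure"}, weights)
low = score_patient({"prematurity"}, weights)
```

In this toy setup, a patient whose notes mention terms common among previously sequenced patients receives a higher score than one whose notes mention only terms common among the others.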
RESULTS
MPSE accurately and robustly identified NICU patients selected for WGS by clinical experts from Rady Children's Hospital in San Diego (AUC 0.86) and the University of Utah (AUC 0.85). In addition to effectively identifying patients for WGS, MPSE scores also strongly prioritize diagnostic cases over non-diagnostic cases, with projected diagnostic yields exceeding 50% throughout the first and second quartiles of score-ranked patients.
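The two quantities reported above, AUC and diagnostic yield by score quartile, can be computed as in the following sketch. It is an illustration on made-up toy values, not the authors' evaluation code; the function names are hypothetical, and the quartile boundaries use simple rounding.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as 0.5. labels: 1 = selected for WGS, 0 = not selected.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def yield_by_quartile(scores, diagnosed):
    """Rank patients by descending score, split the ranking into four
    near-equal groups, and return the diagnosed fraction in each group."""
    ranked = [d for _, d in sorted(zip(scores, diagnosed), key=lambda x: -x[0])]
    n = len(ranked)
    bounds = [round(i * n / 4) for i in range(5)]
    return [sum(ranked[a:b]) / (b - a) if b > a else float("nan")
            for a, b in zip(bounds, bounds[1:])]

# Toy values for illustration only (not data from the study).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_selected = [1, 1, 0, 1, 0, 0, 1, 0]
toy_diagnosed = [1, 1, 1, 0, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_selected)
toy_yield = yield_by_quartile(toy_scores, toy_diagnosed)
```

The pairwise definition of AUC is quadratic in the number of patients, which is fine for a cohort-sized example; a rank-based formula would be preferable for large datasets.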
CONCLUSIONS
Our results indicate that an automated pipeline for selecting acutely ill infants in neonatal intensive care units (NICUs) for WGS can meet or exceed diagnostic yields obtained through current selection procedures, which require time-consuming manual review of clinical notes and histories by specialized personnel.
Identifiers
pubmed: 36927505
doi: 10.1186/s13073-023-01166-7
pii: 10.1186/s13073-023-01166-7
pmc: PMC10018992
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
18
Grants
Agency: NCATS NIH HHS
ID: UL1 TR002550
Country: United States
Agency: NIH HHS
ID: S10 OD021644
Country: United States
Copyright information
© 2023. The Author(s).