Development of a phenotype ontology for autism spectrum disorder by natural language processing on electronic health records.

Autism; Autism spectrum disorder; Electronic health record; Natural language processing; Phenotype ontology; Terminology set

Journal

Journal of neurodevelopmental disorders
ISSN: 1866-1955
Abbreviated title: J Neurodev Disord
Country: England
NLM ID: 101483832

Publication information

Publication date:
2022-05-23
History:
received: 2021-10-22
accepted: 2022-04-26
entrez: 2022-05-23
pubmed: 2022-05-24
medline: 2022-05-26
Status: epublish

Abstract

Autism spectrum disorder (ASD) is a complex neurodevelopmental condition characterized by restricted, repetitive behavior and impaired social communication and interaction. However, significant challenges remain in diagnosing and subtyping ASD, due in part to the lack of a validated, standardized vocabulary to characterize the clinical phenotypic presentation of ASD. Although the human phenotype ontology (HPO) plays an important role in delineating nuanced phenotypes for rare genetic diseases, it is inadequate for capturing the characteristic behavioral and psychiatric phenotypes of individuals with ASD. There is a clear need, therefore, for a well-established phenotype terminology set that can assist in the characterization of ASD phenotypes from patients' clinical narratives. To address this challenge, we used natural language processing (NLP) techniques to identify and curate ASD phenotypic terms from high-quality unstructured clinical notes in the electronic health record (EHR) for 8499 individuals with ASD, 8177 individuals with non-ASD psychiatric disorders, and 8482 individuals without a documented psychiatric disorder. We further performed a dimensionality-reduction clustering analysis to subgroup individuals with ASD, using nonnegative matrix factorization. Through a note-processing pipeline that includes several steps of state-of-the-art NLP approaches, we identified 3336 ASD terms linking to 1943 unique medical concepts, which represents one of the largest ASD terminology sets to date. The extracted ASD terms were further organized in a formal ontology structure similar to the HPO. Clustering analysis showed that these terms could be used in a diagnostic pipeline to differentiate individuals with ASD from individuals with other psychiatric disorders.
Our ASD phenotype ontology can assist clinicians and researchers in characterizing individuals with ASD, facilitating automated diagnosis, and subtyping individuals with ASD to facilitate personalized therapeutic decision-making.
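
The abstract names nonnegative matrix factorization (NMF) as the clustering method but gives no implementation details. The following is a minimal sketch of the general technique only, not the authors' pipeline: it factorizes a made-up patient-by-term count matrix with the standard multiplicative-update rules and assigns each patient to the component with the largest weight. The matrix values, the choice of two components, and all function names are assumptions introduced for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=500, seed=0, eps=1e-9):
    """Factor nonnegative V (n x m) into W (n x k) and H (k x m).

    Uses multiplicative updates, which keep every entry nonnegative.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: rows are patients, columns are phenotype-term counts.
# Patients 0-2 mention only the first two terms, patients 3-5 only the last two.
V = [
    [2, 1, 0, 0],
    [4, 2, 0, 0],
    [2, 1, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 2, 6],
    [0, 0, 1, 3],
]

W, H = nmf(V, k=2)

# Assign each patient to the component with the largest weight in its row of W.
labels = [max(range(len(row)), key=lambda a: row[a]) for row in W]
print(labels)
```

In this sketch the rows of H can be read as term profiles for each subgroup and the rows of W as each patient's loading on those profiles. Which component index ends up attached to which subgroup depends on the random initialization.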

Identifiers

pubmed: 35606697
doi: 10.1186/s11689-022-09442-0
pii: 10.1186/s11689-022-09442-0
pmc: PMC9128253

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

32

Grants

Agency: NICHD NIH HHS
ID: P50 HD105354
Country: United States

Copyright information

© 2022. The Author(s).

Authors

Mengge Zhao (M)

Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.

James Havrilla (J)

Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.

Jacqueline Peng (J)

Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, 19104, USA.

Madison Drye (M)

Center for Autism Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.

Maddie Fecher (M)

Center for Autism Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.

Whitney Guthrie (W)

Center for Autism Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
Departments of Pediatrics and Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.

Birkan Tunc (B)

Center for Autism Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
Departments of Pediatrics and Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.

Robert Schultz (R)

Center for Autism Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
Departments of Pediatrics and Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.

Kai Wang (K)

Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.

Yunyun Zhou (Y)

Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA. zhouy6@chop.edu.
