Development of a phenotype ontology for autism spectrum disorder by natural language processing on electronic health records.
Autism
Autism spectrum disorder
Electronic health record
Natural language processing
Phenotype ontology
Terminology set
Journal
Journal of neurodevelopmental disorders
ISSN: 1866-1955
Abbreviated title: J Neurodev Disord
Country: England
NLM ID: 101483832
Publication information
Publication date:
23 05 2022
23 05 2022
History:
received: 22 10 2021
accepted: 26 04 2022
entrez: 23 05 2022
pubmed: 24 05 2022
medline: 26 05 2022
Status:
epublish
Abstract
Abstract sections
BACKGROUND
Autism spectrum disorder (ASD) is a complex neurodevelopmental condition characterized by restricted, repetitive behavior and impaired social communication and interaction. However, significant challenges remain in diagnosing and subtyping ASD, due in part to the lack of a validated, standardized vocabulary for characterizing the clinical phenotypic presentation of ASD. Although the Human Phenotype Ontology (HPO) plays an important role in delineating nuanced phenotypes for rare genetic diseases, it does not adequately capture the characteristic behavioral and psychiatric phenotypes of individuals with ASD. There is therefore a clear need for a well-established phenotype terminology set that can assist in characterizing ASD phenotypes from patients' clinical narratives.
METHODS
To address this challenge, we used natural language processing (NLP) techniques to identify and curate ASD phenotypic terms from high-quality unstructured clinical notes in the electronic health record (EHR) for 8499 individuals with ASD, 8177 individuals with non-ASD psychiatric disorders, and 8482 individuals without a documented psychiatric disorder. We further performed a dimensionality-reduction clustering analysis, using nonnegative matrix factorization, to subgroup individuals with ASD.
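As an illustration of the clustering step, the following is a minimal sketch of nonnegative matrix factorization with the standard Lee-Seung multiplicative updates. It is not the authors' pipeline: the toy patient-by-term count matrix `counts`, the rank of 2, the iteration count, and the rule of assigning each individual to the largest factor loading are all assumptions made for the example.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V (individuals x terms) into W (individuals x rank) and
    H (rank x terms); multiplicative updates keep all entries non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

def assign_clusters(w):
    """Assign each individual to the factor with the largest loading."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]

# Hypothetical data: 6 individuals x 4 phenotype terms (term mention counts).
counts = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [6, 5, 0, 0],
    [0, 0, 5, 6],
    [0, 0, 4, 5],
    [0, 0, 6, 4],
]

w, h = nmf(counts, rank=2)
clusters = assign_clusters(w)
print(clusters)
```

Each row of `h` can be read as a weighted group of co-occurring terms, and each row of `w` as how strongly an individual expresses each group. A real analysis would use a tested library implementation and would choose the rank by a model-selection criterion rather than fixing it in advance.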
RESULTS
Through a note-processing pipeline comprising several state-of-the-art NLP steps, we identified 3336 ASD terms linked to 1943 unique medical concepts, which is among the largest ASD terminology sets to date. The extracted ASD terms were further organized in a formal ontology structure similar to that of the HPO. Clustering analysis showed that these terms could be used in a diagnostic pipeline to differentiate individuals with ASD from individuals with other psychiatric disorders.
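To make the relationship between terms, concepts, and an HPO-like hierarchy concrete, here is a small sketch of one way such a structure could be represented. The class names, the `EX:` identifiers, and the example labels and terms are all hypothetical and are not taken from the published ontology; the sketch only shows several surface terms mapping to one concept, with concepts arranged by is-a links.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    concept_id: str
    label: str
    parents: list = field(default_factory=list)  # is-a links to broader concepts
    terms: set = field(default_factory=set)      # surface terms seen in notes

class Ontology:
    def __init__(self):
        self.concepts = {}    # concept_id -> Concept
        self.term_index = {}  # lower-cased term -> concept_id

    def add_concept(self, concept_id, label, parents=()):
        # Parents must already exist, which keeps the hierarchy acyclic.
        for parent in parents:
            if parent not in self.concepts:
                raise KeyError(f"unknown parent concept: {parent}")
        self.concepts[concept_id] = Concept(concept_id, label, list(parents))

    def add_term(self, term, concept_id):
        key = term.lower()
        self.concepts[concept_id].terms.add(key)
        self.term_index[key] = concept_id

    def lookup(self, term):
        """Return the concept ID for a term, or None if the term is unknown."""
        return self.term_index.get(term.lower())

    def ancestors(self, concept_id):
        """Return all broader concepts reachable through is-a links."""
        seen = set()
        stack = list(self.concepts[concept_id].parents)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.concepts[current].parents)
        return seen

# Hypothetical example entries, for illustration only.
onto = Ontology()
onto.add_concept("EX:0001", "Behavioral abnormality")
onto.add_concept("EX:0002", "Repetitive behavior", parents=["EX:0001"])
onto.add_concept("EX:0003", "Motor stereotypy", parents=["EX:0002"])
onto.add_term("hand flapping", "EX:0003")
onto.add_term("body rocking", "EX:0003")

print(onto.lookup("Hand Flapping"))
print(sorted(onto.ancestors("EX:0003")))
```

Keeping terms separate from concepts lets many note-level phrasings resolve to one concept, and the ancestor lookup lets an analysis aggregate specific findings under broader categories.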
CONCLUSION
Our ASD phenotype ontology can assist clinicians and researchers in characterizing individuals with ASD, facilitating automated diagnosis, and subtyping individuals with ASD to support personalized therapeutic decision-making.
Identifiers
pubmed: 35606697
doi: 10.1186/s11689-022-09442-0
pii: 10.1186/s11689-022-09442-0
pmc: PMC9128253
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
32
Grants
Agency: NICHD NIH HHS
ID: P50 HD105354
Country: United States
Copyright information
© 2022. The Author(s).