Deep Learning Transformer Models for Building a Comprehensive and Real-time Trauma Observatory: Development and Validation Study.

deep learning; emergencies; natural language processing; public health; transformers; trauma

Journal

JMIR AI
ISSN: 2817-1705
Abbreviated title: JMIR AI
Country: Canada
NLM ID: 9918645789006676

Publication information

Publication date:
12 Jan 2023
History:
received: 07 07 2022
accepted: 29 10 2022
revised: 14 10 2022
medline: 12 1 2023
pubmed: 12 1 2023
entrez: 14 6 2024
Status: epublish

Abstract

BACKGROUND
Public health surveillance relies on the collection of data, often in near-real time. Recent advances in natural language processing make it possible to envisage an automated system for extracting information from electronic health records.
OBJECTIVE
To study the feasibility of setting up a national trauma observatory in France, we compared the performance of several automatic language processing methods in a multiclass classification task of unstructured clinical notes.
METHODS
A total of 69,110 free-text clinical notes related to visits to the emergency departments of the University Hospital of Bordeaux, France, between 2012 and 2019 were manually annotated. Among these clinical notes, 32.5% (22,481/69,110) were traumas. We trained 4 transformer models (deep learning models built on attention mechanisms) and compared them with a term frequency-inverse document frequency (TF-IDF) representation combined with a support vector machine (SVM) classifier.
RESULTS
The transformer models consistently performed better than the TF-IDF representation combined with an SVM. Among the transformers, the GPTanam model, pretrained on a French corpus with an additional self-supervised learning step on 306,368 unlabeled clinical notes, showed the best performance with a micro F
CONCLUSIONS
The transformers proved efficient at the multiclass classification of narrative and medical data. Further steps for improvement should focus on the expansion of abbreviations and multioutput multiclass classification.
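
The Results report model performance as a micro-averaged F score. As a minimal sketch of how such a score is usually computed, assuming the standard micro-averaged F1 definition (true positives, false positives, and false negatives pooled over all classes), the following may help. The function name `micro_f1` and the class labels are hypothetical illustrations, not the categories or code used in the study.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # correctly predicted as this class
            elif pred == label:
                fp += 1  # predicted as this class, but belongs to another
            elif truth == label:
                fn += 1  # belongs to this class, but predicted as another
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical labels for five clinical notes (illustrative only).
y_true = ["fall", "traffic", "non_trauma", "fall", "assault"]
y_pred = ["fall", "non_trauma", "non_trauma", "fall", "fall"]
score = micro_f1(y_true, y_pred)
print(f"micro F1 = {score:.2f}")
```

When each note receives exactly one label, as in a single-label multiclass task, every misclassification counts once as a false positive and once as a false negative, so the micro-averaged F1 coincides with plain accuracy.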

Identifiers

pubmed: 38875539
pii: v2i1e40843
doi: 10.2196/40843

Publication types

Journal Article

Languages

eng

Pagination

e40843

Copyright information

©Gabrielle Chenais, Cédric Gil-Jardiné, Hélène Touchais, Marta Avalos Fernandez, Benjamin Contrand, Eric Tellier, Xavier Combes, Loick Bourdois, Philippe Revel, Emmanuel Lagarde. Originally published in JMIR AI (https://ai.jmir.org), 12.01.2023.

Authors

Gabrielle Chenais (G)

Unit 1219, Bordeaux Public Health Center, Institut National de la Santé et de la Recherche Médicale, Bordeaux, France.

Cédric Gil-Jardiné (C)

Unit 1219, Bordeaux Public Health Center, Institut National de la Santé et de la Recherche Médicale, Bordeaux, France.
Emergency Department, Bordeaux University Hospital, Bordeaux, France.

Hélène Touchais (H)

Unit 1219, Bordeaux Public Health Center, Institut National de la Santé et de la Recherche Médicale, Bordeaux, France.

Marta Avalos Fernandez (M)

Unit 1219, Bordeaux Public Health Center, Institut National de la Santé et de la Recherche Médicale, Bordeaux, France.
Statistics in Systems Biology and Translational Medicine Team, University of Bordeaux, Institut National de Recherche en Sciences et Technologies du Numérique, Talence, France.

Benjamin Contrand (B)

Unit 1219, Bordeaux Public Health Center, Institut National de la Santé et de la Recherche Médicale, Bordeaux, France.

Eric Tellier (E)

Unit 1219, Bordeaux Public Health Center, Institut National de la Santé et de la Recherche Médicale, Bordeaux, France.
Emergency Department, Bordeaux University Hospital, Bordeaux, France.

Xavier Combes (X)

Emergency Department, Bordeaux University Hospital, Bordeaux, France.

Loick Bourdois (L)

Unit 1219, Bordeaux Public Health Center, Institut National de la Santé et de la Recherche Médicale, Bordeaux, France.

Philippe Revel (P)

Emergency Department, Bordeaux University Hospital, Bordeaux, France.

Emmanuel Lagarde (E)

Unit 1219, Bordeaux Public Health Center, Institut National de la Santé et de la Recherche Médicale, Bordeaux, France.

MeSH classifications