Classifying Free Texts Into Predefined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents.


Journal

Chemical research in toxicology
ISSN: 1520-5010
Abbreviated title: Chem Res Toxicol
Country: United States
NLM ID: 8807448

Publication information

Publication date:
21 08 2023
History:
medline: 22 8 2023
pubmed: 24 7 2023
entrez: 24 7 2023
Status: ppublish

Abstract

The US Food and Drug Administration (FDA) regulatory process often involves several reviewers who focus on sets of information related to their respective areas of review. Accordingly, manufacturers that provide submission packages to regulatory agencies are instructed to organize the contents using a structure that enables the information to be easily allocated, retrieved, and reviewed. However, this practice is not always followed correctly; as such, some documents are not well structured, with similar information spreading across different sections, hindering the efficient access and review of all of the relevant data as a whole. To improve this common situation, we evaluated an artificial intelligence (AI)-based natural language processing (NLP) methodology, called Bidirectional Encoder Representations from Transformers (BERT), to automatically classify free-text information into standardized sections, supporting a holistic review of drug safety and efficacy. Specifically, FDA labeling documents were used in this study as a proof of concept, where the labeling section structure defined by the Physician Labeling Rule (PLR) was used to classify labels in the development of the model. The model was subsequently evaluated on texts from both well-structured labeling documents (i.e., PLR-based labeling) and less- or differently structured documents (i.e., non-PLR and Summary of Product Characteristics [SmPC] labeling). In the training process, the model yielded 96% and 88% accuracy for binary and multiclass tasks, respectively. The testing accuracies observed for the PLR, non-PLR, and SmPC testing data sets for the binary model were 95%, 88%, and 88%, and for the multiclass model were 82%, 73%, and 68%, respectively. Our study demonstrated that automatically classifying free texts into standardized sections with AI language models could be an advanced regulatory science approach for supporting the review process by effectively processing unformatted documents.

Identifiers

pubmed: 37487037
doi: 10.1021/acs.chemrestox.3c00028
pmc: PMC10445280

Publication types

Journal Article
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, U.S. Gov't, P.H.S.

Languages

eng

Citation subsets

IM

Pagination

1290-1299

Authors

Magnus Gray (M)

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, United States Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079 United States.

Joshua Xu (J)

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, United States Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079 United States.

Weida Tong (W)

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, United States Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079 United States.

Leihong Wu (L)

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, United States Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079 United States.
