Extractive summarization of clinical trial descriptions.

Clinical trials; NLP; Text mining; Text summarization

Journal

International journal of medical informatics
ISSN: 1872-8243
Abbreviated title: Int J Med Inform
Country: Ireland
NLM ID: 9711057

Publication information

Publication date:
09 2019
History:
received: 19 08 2018
revised: 06 04 2019
accepted: 21 05 2019
pubmed: 25 8 2019
medline: 28 11 2019
entrez: 25 8 2019
Status: ppublish

Abstract

Text summarization of clinical trial descriptions has the potential to reduce the time required to familiarize oneself with the subject of studies by condensing long-form detailed descriptions to concise, meaning-preserving synopses. This work describes the process and quality of automatically generated summaries of clinical trial descriptions using extractive text summarization methods.

We generated a novel dataset from the detailed descriptions and brief summaries of trials registered on clinicaltrials.gov. We executed several text summarization algorithms on the detailed descriptions in this corpus and calculated the standard ROUGE metrics using the brief summaries included in the record as a reference. To investigate the correlation of these metrics with human sentiments, four reviewers assessed the content-completeness of the generated summaries and the helpfulness of both the generated and reference summaries via a Likert scale questionnaire.

The filtering stages of the dataset generation process reduce the 277,228 trials registered on clinicaltrials.gov to 101,016 records usable for the summarization task. On average, the summaries in this corpus are 25% the length of the detailed descriptions. Of the evaluated text summarization methods, the TextRank algorithm exhibits the overall best performance with a ROUGE-1 F1 score of 0.3531, ROUGE-2 F1 score of 0.1723, and ROUGE-L F1 score of 0.3003. These scores correlate with the assessment of the helpfulness and content similarity by the human reviewers. Inter-rater agreement for the helpfulness and content similarity was slight and fair respectively (Fleiss' kappa of 0.12 and 0.22).

Extractive summarization is a viable tool for generating meaning-preserving synopses of detailed clinical trial descriptions. Further, the human evaluation has shown that the ROUGE-L F1 score is useful for rating the general quality of generated summaries of clinical trial descriptions in an automated way.
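The ROUGE-1, ROUGE-2, and ROUGE-L F1 scores reported in the abstract compare a generated summary against a reference summary by n-gram overlap and by longest common subsequence. The following is a minimal sketch of those definitions, not the authors' evaluation code: the whitespace tokenizer, the function names, and the toy sentences are assumptions made for illustration, and the ROUGE-L variant here treats each text as a single token sequence rather than scoring sentence by sentence.

```python
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only
    # (no stemming or stop-word removal).
    return text.lower().split()


def f1(matches, n_candidate, n_reference):
    # Harmonic mean of precision (matches / candidate size) and
    # recall (matches / reference size); 0.0 when nothing matches.
    if matches == 0:
        return 0.0
    precision = matches / n_candidate
    recall = matches / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n=1):
    # ROUGE-N: clipped n-gram overlap between candidate and reference.
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(tokenize(candidate))
    ref = ngrams(tokenize(reference))
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    # Length of the longest common subsequence, computed row by row
    # with dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    # ROUGE-L (single-sequence variant): F1 based on LCS length.
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


# Toy example (hypothetical sentences, not taken from the dataset).
generated = "the trial tests a new drug"
brief_summary = "the trial evaluates a new drug"
print(rouge_n_f1(generated, brief_summary, n=1))
print(rouge_n_f1(generated, brief_summary, n=2))
print(rouge_l_f1(generated, brief_summary))
```

For this toy pair, five of six unigrams and three of five bigrams overlap, so ROUGE-1 and ROUGE-L F1 both come to 5/6 and ROUGE-2 F1 to 3/5. Published ROUGE implementations typically add stemming and sentence-level handling, so scores from this sketch are not directly comparable to the figures in the abstract.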

Identifiers

pubmed: 31445245
pii: S1386-5056(18)30933-X
doi: 10.1016/j.ijmedinf.2019.05.019

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

114-121

Copyright information

Copyright © 2019 Elsevier B.V. All rights reserved.

Authors

Christian Gulden (C)

Medical Informatics, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Erlangen, Germany. Electronic address: Christian.Gulden@fau.de.

Melanie Kirchner (M)

Medical Center for Information and Communication Technology, University Hospital Erlangen, Erlangen, Germany.

Christina Schüttler (C)

Medical Informatics, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Erlangen, Germany.

Marc Hinderer (M)

Medical Informatics, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Erlangen, Germany.

Marvin Kampf (M)

Medical Center for Information and Communication Technology, University Hospital Erlangen, Erlangen, Germany.

Hans-Ulrich Prokosch (HU)

Medical Informatics, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Erlangen, Germany.

Dennis Toddenroth (D)

Medical Informatics, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Erlangen, Germany.
