Extractive summarization of clinical trial descriptions.

Algorithms; Clinical Trials as Topic; Natural Language Processing

Clinical trials; NLP; Text mining; Text summarization

Journal

International Journal of Medical Informatics

ISSN: 1872-8243

Abbreviated title: Int J Med Inform

Country: Ireland

NLM ID: 9711057

Publication information

Publication date:
09 2019

History:

received: 19 08 2018

revised: 06 04 2019

accepted: 21 05 2019

pubmed: 25 8 2019

medline: 28 11 2019

entrez: 25 8 2019

Status: ppublish

Abstract

Text summarization of clinical trial descriptions has the potential to reduce the time required to become familiar with the subject of a study by condensing long-form detailed descriptions into concise, meaning-preserving synopses. This work describes the process and quality of automatically generated summaries of clinical trial descriptions using extractive text summarization methods.

We generated a novel dataset from the detailed descriptions and brief summaries of trials registered on clinicaltrials.gov. We executed several text summarization algorithms on the detailed descriptions in this corpus and calculated the standard ROUGE metrics, using the brief summary included in each record as the reference. To investigate how well these metrics correlate with human judgment, four reviewers assessed the content-completeness of the generated summaries and the helpfulness of both the generated and reference summaries via a Likert scale questionnaire.

The filtering stages of the dataset generation process reduce the 277,228 trials registered on clinicaltrials.gov to 101,016 records usable for the summarization task. On average, the summaries in this corpus are 25% the length of the detailed descriptions. Of the evaluated text summarization methods, the TextRank algorithm exhibits the best overall performance, with a ROUGE-1 F1 score of 0.3531, a ROUGE-2 F1 score of 0.1723, and a ROUGE-L F1 score of 0.3003. These scores correlate with the human reviewers' assessment of helpfulness and content similarity. Inter-rater agreement for helpfulness and content similarity was slight and fair, respectively (Fleiss' kappa of 0.12 and 0.22).

Extractive summarization is a viable tool for generating meaning-preserving synopses of detailed clinical trial descriptions. Further, the human evaluation has shown that the ROUGE-L F1 score is useful for rating the general quality of generated summaries of clinical trial descriptions in an automated way.
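As an illustration of the ROUGE-1, ROUGE-2, and ROUGE-L F1 scores reported in the abstract, the following is a minimal Python sketch of how such scores can be computed for one generated summary against one reference. It is not the authors' implementation: the function names are hypothetical, and the lowercase whitespace tokenization (no stemming or stop-word handling) is a simplifying assumption, so values from established ROUGE tooling may differ.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the multiset (Counter) of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, candidate_total, reference_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or candidate_total == 0 or reference_total == 0:
        return 0.0
    precision = overlap / candidate_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 based on clipped n-gram overlap.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 from the longest common subsequence of the two texts."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Made-up example sentences, not taken from the corpus.
    generated = "the trial tests a new drug"
    reference = "the trial evaluates a new drug"
    print(rouge_n_f1(generated, reference, n=1))
    print(rouge_n_f1(generated, reference, n=2))
    print(rouge_l_f1(generated, reference))
```

In a setting like the one described, the generated summary would be the output of an extractive method such as TextRank and the reference would be the brief summary from the trial record; corpus-level figures are then typically averages over all records.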

Identifiers

DOI: 10.1016/j.ijmedinf.2019.05.019 PMID: 31445245

pubmed: 31445245

pii: S1386-5056(18)30933-X

doi: 10.1016/j.ijmedinf.2019.05.019

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Pagination

114-121

Authors

Christian Gulden (C)

Melanie Kirchner (M)

Christina Schüttler (C)

Marc Hinderer (M)

Marvin Kampf (M)

Hans-Ulrich Prokosch (HU)

Dennis Toddenroth (D)
