Summarization of biomedical articles using domain-specific word embeddings and graph ranking.

Deep learning; Graph ranking; Medical text mining; Natural language processing; Text summarization; Word embedding

Journal

Journal of biomedical informatics
ISSN: 1532-0480
Abbreviated title: J Biomed Inform
Country: United States
NLM ID: 100970413

Publication information

Publication date:
07 2020
History:
received: 13 01 2020
revised: 06 05 2020
accepted: 09 05 2020
pubmed: 23 5 2020
medline: 29 7 2021
entrez: 23 5 2020
Status: ppublish

Abstract

Text summarization tools can help biomedical researchers and clinicians reduce the time and effort needed to acquire important information from numerous documents. It has been shown that the input text can be modeled as a graph, and that important sentences can be selected by identifying central nodes within the graph. However, effectively representing documents, quantifying the relatedness of sentences, and selecting the most informative sentences are the main challenges that need to be addressed in graph-based summarization. In this paper, we address these challenges in the context of biomedical text summarization. We evaluate the efficacy of a graph-based summarizer using different types of context-free and contextualized embeddings. The word representations are produced by pre-training neural language models on large corpora of biomedical texts. The summarizer models the input text as a graph in which the strength of relations between sentences is measured using the domain-specific vector representations. We also assess the usefulness of different graph ranking techniques in the sentence selection step of our summarization method. Using the common Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, we evaluate the performance of our summarizer against various comparison methods. The results show that when the summarizer utilizes proper combinations of context-free and contextualized embeddings, along with an effective ranking method, it can outperform the other methods. We demonstrate that the best settings of our graph-based summarizer can efficiently improve the informative content of summaries and decrease redundancy.
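
The general idea described in the abstract (sentences as graph nodes, embedding-based similarity as edge weights, a ranking step to pick central sentences) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the embedding models, the sentence-vector construction, or the ranking algorithms used. The sketch below assumes averaged word vectors, cosine similarity, and a weighted PageRank computed by power iteration. `TOY_VECTORS` and all function names are hypothetical, and the three-dimensional vectors stand in for embeddings that would in practice be pre-trained on biomedical text.

```python
import math

# Hypothetical toy word vectors (assumption): a real system would load
# embeddings pre-trained on large biomedical corpora instead.
TOY_VECTORS = {
    "aspirin": [0.9, 0.1, 0.0],
    "reduces": [0.2, 0.7, 0.1],
    "fever":   [0.8, 0.2, 0.1],
    "pain":    [0.7, 0.3, 0.1],
    "treats":  [0.3, 0.8, 0.1],
    "weather": [0.0, 0.1, 0.9],
    "sunny":   [0.1, 0.0, 0.8],
}
DIM = 3


def sentence_vector(sentence):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(sentences):
    """Weighted adjacency matrix: edge weight = cosine similarity, no self-loops."""
    vecs = [sentence_vector(s) for s in sentences]
    n = len(vecs)
    return [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (one possible ranking method)."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sums[j] * scores[j]
                            for j in range(n) if out_sums[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=1):
    """Return the k highest-ranked sentences in their original order."""
    scores = pagerank(build_graph(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


sentences = ["Aspirin reduces fever", "Aspirin treats pain", "Weather sunny"]
scores = pagerank(build_graph(sentences))
summary = summarize(sentences, k=2)
```

In this toy input the two aspirin sentences are strongly connected to each other and only weakly connected to the off-topic sentence, so the ranking step places the off-topic sentence last. Swapping in a different similarity measure or ranking function only requires replacing `cosine` or `pagerank`, which mirrors the kind of comparison the abstract describes.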

Identifiers

pubmed: 32439479
pii: S1532-0464(20)30080-0
doi: 10.1016/j.jbi.2020.103452

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

103452

Copyright information

Copyright © 2020 Elsevier Inc. All rights reserved.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Milad Moradi (M)

Institute for Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria. Electronic address: milad.moradivastegani@meduniwien.ac.at.

Maedeh Dashti (M)

Department of Computer Science, Islamic Azad University, Isfahan, Iran.

Matthias Samwald (M)

Institute for Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria. Electronic address: matthias.samwald@meduniwien.ac.at.
