Deep neural model with self-training for scientific keyphrase extraction.
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date:
2020
History:
received: 2019-12-27
accepted: 2020-04-16
entrez: 2020-05-16
pubmed: 2020-05-16
medline: 2020-07-29
Status:
epublish
Abstract
Scientific information extraction is a crucial step for understanding scientific publications. In this paper, we focus on scientific keyphrase extraction, which aims to identify keyphrases in scientific articles and classify them into predefined categories. We present a neural network based approach for this task, which employs a bidirectional long short-term memory (LSTM) network to represent the sentences in the article. On top of the bidirectional LSTM layer in our neural model, a conditional random field (CRF) is used to predict the label sequence for the whole sentence. Because annotated data for supervised learning methods are expensive to obtain, we introduce a self-training method into our neural model to leverage unlabeled articles. Experimental results on the ScienceIE corpus and the ACL keyphrase corpus show that our neural model achieves promising performance without any hand-designed features or external knowledge resources. Furthermore, it efficiently incorporates the unlabeled data and achieves competitive performance compared with previous state-of-the-art systems.
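The self-training idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how pseudo-labeled examples are selected, so the confidence threshold used here is an assumption, and a toy one-dimensional nearest-centroid classifier stands in for the bidirectional LSTM-CRF tagger. All names (`self_train`, `fit_centroids`, `predict_centroid`, `threshold`, `max_rounds`) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[float, str]      # (feature, label); a stand-in for a tagged sentence
Model = Dict[str, float]         # label -> centroid; a stand-in for the neural tagger


def fit_centroids(data: Sequence[Example]) -> Model:
    """Toy 'training': compute the mean feature value for each label."""
    sums: Dict[str, Tuple[float, int]] = {}
    for x, y in data:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict_centroid(model: Model, x: float) -> Tuple[str, float]:
    """Return the nearest label and a confidence in [0.5, 1] (needs >= 2 labels)."""
    dists = sorted((abs(x - c), y) for y, c in model.items())
    (d_best, label), (d_next, _) = dists[0], dists[1]
    conf = 1.0 if d_best + d_next == 0 else d_next / (d_best + d_next)
    return label, conf


def self_train(
    labeled: Sequence[Example],
    unlabeled: Sequence[float],
    fit: Callable[[Sequence[Example]], Model] = fit_centroids,
    predict: Callable[[Model, float], Tuple[str, float]] = predict_centroid,
    threshold: float = 0.8,      # assumed selection rule, not taken from the paper
    max_rounds: int = 5,
) -> Tuple[Model, List[Example]]:
    """Repeatedly pseudo-label confident unlabeled items and retrain on them."""
    train = list(labeled)
    pool = list(unlabeled)
    model = fit(train)
    for _ in range(max_rounds):
        confident: List[Example] = []
        remaining: List[float] = []
        for x in pool:
            label, conf = predict(model, x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:        # nothing new to learn from; stop early
            break
        train.extend(confident)  # pseudo-labeled items join the training set
        pool = remaining
        model = fit(train)       # retrain on gold plus pseudo-labeled data
    return model, train
```

In a setting like the one described in the abstract, `fit` would train the sequence tagger and `predict` would return a predicted label sequence with some sentence-level confidence score; the loop structure stays the same.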
Identifiers
pubmed: 32413094
doi: 10.1371/journal.pone.0232547
pii: PONE-D-19-35793
pmc: PMC7228065
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e0232547
Conflict of interest statement
The authors have declared that no competing interests exist.
References
BMC Bioinformatics. 2017 Oct 30;18(1):462
pubmed: 29084508
PLoS One. 2019 May 2;14(5):e0216046
pubmed: 31048840
Neural Comput. 1997 Nov 15;9(8):1735-80
pubmed: 9377276