Deep neural model with self-training for scientific keyphrase extraction.


Journal

PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2020
History:
received: 27 Dec 2019
accepted: 16 Apr 2020
entrez: 16 May 2020
pubmed: 16 May 2020
medline: 29 Jul 2020
Status: epublish

Abstract

Scientific information extraction is a crucial step in understanding scientific publications. In this paper, we focus on scientific keyphrase extraction, which aims to identify keyphrases in scientific articles and classify them into predefined categories. We present a neural network-based approach for this task that employs a bidirectional long short-term memory (LSTM) network to represent the sentences in an article. On top of the bidirectional LSTM layer, a conditional random field (CRF) layer predicts the label sequence for the whole sentence. Because annotated data for supervised learning are expensive to obtain, we introduce a self-training method into our neural model to leverage unlabeled articles. Experimental results on the ScienceIE corpus and the ACL keyphrase corpus show that our neural model achieves promising performance without any hand-designed features or external knowledge resources. Furthermore, it efficiently incorporates the unlabeled data and achieves competitive performance compared with previous state-of-the-art systems.
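The abstract says that a CRF layer on top of the bidirectional LSTM predicts the label sequence for the whole sentence rather than tagging each token independently. The Python sketch below illustrates that idea with Viterbi decoding over per-token emission scores (standing in for BiLSTM outputs) and tag-transition scores, followed by grouping BIO labels into keyphrase spans. It is not the authors' implementation: the BIO tag set, the hand-picked scores, the -10 penalty on "O" followed by "I-Task", the omission of start/end transitions, and the function names `viterbi` and `bio_to_spans` are all hypothetical. The category "Task" is borrowed from the ScienceIE keyphrase categories (Process, Task, Material).

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]


def viterbi(emissions: List[Scores],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[i][t] is the score of tag t at token i (hypothetically produced
    by the BiLSTM); transitions[(p, t)] is the score of moving from tag p to t.
    Start/end transitions are omitted for brevity (an assumption).
    """
    best = {t: emissions[0][t] for t in tags}  # best path score ending in t
    backpointers: List[Dict[str, str]] = []
    for em in emissions[1:]:
        new_best, ptr = {}, {}
        for cur in tags:
            # Pick the predecessor tag that maximises path score + transition.
            prev = max(tags, key=lambda p: best[p] + transitions[(p, cur)])
            new_best[cur] = best[prev] + transitions[(prev, cur)] + em[cur]
            ptr[cur] = prev
        best = new_best
        backpointers.append(ptr)
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[t])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1], best[last]


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO labels into (keyphrase, category) spans."""
    spans, start, cat = [], None, None
    for i, lab in enumerate(labels + ["O"]):  # trailing "O" flushes the last span
        continues = lab.startswith("I-") and lab[2:] == cat
        if not continues:
            if start is not None:
                spans.append((" ".join(tokens[start:i]), cat))
            start, cat = (None, None) if lab == "O" else (i, lab[2:])
    return spans


# Hypothetical example: greedy per-token choice yields an invalid "O I-Task".
tags = ["O", "B-Task", "I-Task"]
tokens = ["we", "extract", "keyphrases"]
emissions = [
    {"O": 2.0, "B-Task": 0.0, "I-Task": 0.0},
    {"O": 1.2, "B-Task": 1.0, "I-Task": 0.0},
    {"O": 0.0, "B-Task": 0.2, "I-Task": 1.5},
]
transitions = {(p, c): 0.0 for p in tags for c in tags}
transitions[("O", "I-Task")] = -10.0  # "I-" may not follow "O" in BIO

greedy = [max(tags, key=e.get) for e in emissions]
path, score = viterbi(emissions, transitions, tags)
print(greedy)                      # ['O', 'O', 'I-Task']
print(path, score)                 # ['O', 'B-Task', 'I-Task'] 4.5
print(bio_to_spans(tokens, path))  # [('extract keyphrases', 'Task')]
```

Decoding the sequence jointly lets the transition scores veto label combinations that a token-by-token classifier would happily produce, which is the usual motivation for placing a CRF on top of a BiLSTM.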
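The abstract does not say how unlabeled articles are selected during self-training. The Python sketch below shows the generic self-training loop under stated assumptions: a model is trained on gold data, used to tag unlabeled sentences, and sentences whose confidence reaches a threshold are added back as pseudo-labeled training data before retraining. A toy per-token majority tagger stands in for the BiLSTM-CRF; the confidence measure (minimum per-token vote share), the 0.9 threshold, the round limit, and the names `train`, `predict`, and `self_train` are hypothetical, not the authors' settings.

```python
from collections import Counter, defaultdict


def train(labeled):
    """Hypothetical toy tagger: remember each token's tag counts."""
    counts = defaultdict(Counter)
    for tokens, tags in labeled:
        for tok, tag in zip(tokens, tags):
            counts[tok.lower()][tag] += 1
    return counts


def predict(model, tokens):
    """Return predicted tags and a sentence confidence (min over tokens)."""
    tags, confs = [], []
    for tok in tokens:
        c = model.get(tok.lower())  # .get does not insert into the defaultdict
        if c:
            tag, n = c.most_common(1)[0]
            tags.append(tag)
            confs.append(n / sum(c.values()))
        else:  # unseen token: fall back to "O" with zero confidence
            tags.append("O")
            confs.append(0.0)
    return tags, min(confs, default=0.0)


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Generic self-training: add confidently tagged sentences as pseudo-labels."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, remaining = [], []
        for tokens in pool:
            tags, conf = predict(model, tokens)
            if conf >= threshold:
                confident.append((tokens, tags))
            else:
                remaining.append(tokens)
        if not confident:  # nothing new to learn from; stop early
            break
        labeled.extend(confident)
        pool = remaining
    return train(labeled), labeled


gold = [(["We", "study", "keyphrase", "extraction"], ["O", "O", "B-Task", "I-Task"])]
raw = [["keyphrase", "extraction"], ["novel", "corpus"]]
model, augmented = self_train(gold, raw)
print(len(augmented))   # 2
print(augmented[1][1])  # ['B-Task', 'I-Task']
```

Sentences the model is unsure about (here, ones containing unseen tokens) stay in the unlabeled pool; confidence filtering of this kind is a common way to limit the model reinforcing its own errors.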

Identifiers

pubmed: 32413094
doi: 10.1371/journal.pone.0232547
pii: PONE-D-19-35793
pmc: PMC7228065

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e0232547

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Xun Zhu

Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, Hubei, China.
School of Mathematics and Computer Science, Jianghan University, Wuhan, Hubei, China.

Chen Lyu

Laboratory of Language and Artificial Intelligence, Guangdong University of Foreign Studies, Guangzhou, Guangdong, China.
Collaborative Innovation Center for Language Research and Services, Guangdong University of Foreign Studies, Guangzhou, Guangdong, China.

Donghong Ji

Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, Hubei, China.

Han Liao

School of Mathematics and Computer Science, Jianghan University, Wuhan, Hubei, China.

Fei Li

Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, Hubei, China.

Similar articles


Unsupervised learning for real-time and continuous gait phase detection.

Dollaporn Anopas, Yodchanan Wongsawat, Jetsada Arnin
Humans; Gait; Neural Networks, Computer; Unsupervised Machine Learning; Walking

MeSH classifications