Dependency parsing of biomedical text with BERT.


Journal

BMC bioinformatics
ISSN: 1471-2105
Title abbreviation: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
29 Dec 2020
History:
received: 18 Nov 2020
accepted: 24 Nov 2020
entrez: 29 Dec 2020
pubmed: 30 Dec 2020
medline: 16 Jan 2021
Status: epublish

Abstract

Background: Syntactic analysis, or parsing, is a key task in natural language processing and a required component of many text mining approaches. In recent years, Universal Dependencies (UD) has emerged as the leading formalism for dependency parsing. While a number of recent shared tasks centering on UD have substantially advanced the state of the art in multilingual parsing, there has been little study of parsing texts from specialized domains such as biomedicine.

Methods: We explore the application of state-of-the-art neural dependency parsing methods to biomedical text using the recently introduced CRAFT-SA shared task dataset. Because the CRAFT-SA task broadly follows the UD representation and recent UD shared task conventions, we can fine-tune the UD-compatible Turku Neural Parser and UDify neural parsers to the task. We further evaluate the effect of transfer learning using a broad selection of BERT models, including several models pre-trained specifically for biomedical text processing.

Results: We find that recently introduced neural parsing technology can generate highly accurate analyses of biomedical text, substantially improving on the best performance reported in the original CRAFT-SA shared task. We also find that initializing with a deep transfer learning model pre-trained on in-domain texts is key to maximizing parsing performance.
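UD-style dependency analyses are usually exchanged in the CoNLL-U format, and parser accuracy is commonly reported as unlabeled and labeled attachment scores (UAS/LAS): the share of words whose predicted head is correct, and whose head and relation label are both correct. The following is a minimal illustrative sketch, not the authors' evaluation code and not part of the Turku Neural Parser or UDify APIs. The function names, the `row` helper and the example sentence are hypothetical. It assumes the gold and predicted analyses share identical tokenization and compares full DEPREL labels; official shared-task evaluation scripts also align differing tokenizations, which this sketch does not attempt.

```python
from typing import List, Tuple

# One (HEAD, DEPREL) pair per syntactic word, in sentence order.
Arc = Tuple[int, str]


def read_conllu_arcs(text: str) -> List[Arc]:
    """Extract (HEAD, DEPREL) pairs from CoNLL-U text.

    Skips comment lines, blank sentence separators, multiword-token
    ranges (IDs such as "3-4") and empty nodes (IDs such as "5.1").
    """
    arcs: List[Arc] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue
        # Column 7 is HEAD (0 = root), column 8 is DEPREL.
        arcs.append((int(cols[6]), cols[7]))
    return arcs


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS); assumes gold and pred use the same tokenization."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses differ in length")
    if not gold:
        raise ValueError("no words to evaluate")
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    both_ok = sum(g == p for g, p in zip(gold, pred))
    return head_ok / len(gold), both_ok / len(gold)


def row(i: int, form: str, upos: str, head: int, deprel: str) -> str:
    # Hypothetical helper: builds one CoNLL-U line, filling unused columns with "_".
    return "\t".join([str(i), form, "_", upos, "_", "_", str(head), deprel, "_", "_"])


GOLD = "\n".join([
    "# text = IL-2 activates T cells.",
    row(1, "IL-2", "PROPN", 2, "nsubj"),
    row(2, "activates", "VERB", 0, "root"),
    row(3, "T", "NOUN", 4, "compound"),
    row(4, "cells", "NOUN", 2, "obj"),
    row(5, ".", "PUNCT", 2, "punct"),
])
PRED = "\n".join([
    row(1, "IL-2", "PROPN", 2, "nsubj"),
    row(2, "activates", "VERB", 0, "root"),
    row(3, "T", "NOUN", 2, "obj"),       # wrong head and wrong label
    row(4, "cells", "NOUN", 2, "nmod"),  # right head, wrong label
    row(5, ".", "PUNCT", 2, "punct"),
])

uas, las = attachment_scores(read_conllu_arcs(GOLD), read_conllu_arcs(PRED))
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=0.80 LAS=0.60
```

The gap between the two scores isolates labeling errors: "cells" is attached to the right head but with the wrong relation, so it counts toward UAS but not LAS.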


Identifiers

pubmed: 33372589
doi: 10.1186/s12859-020-03905-8
pii: 10.1186/s12859-020-03905-8
pmc: PMC7771067

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

580

Authors

Jenna Kanerva (J)

TurkuNLP Group, University of Turku, Turku, Finland. jmnybl@utu.fi.

Filip Ginter (F)

TurkuNLP Group, University of Turku, Turku, Finland.

Sampo Pyysalo (S)

TurkuNLP Group, University of Turku, Turku, Finland.

Similar articles

[Redispensing of expensive oral anticancer medicines: a practical application].

Lisanne N van Merendonk, Kübra Akgöl, Bastiaan Nuijen
1.00
Humans Antineoplastic Agents Administration, Oral Drug Costs Counterfeit Drugs

Smoking Cessation and Incident Cardiovascular Disease.

Jun Hwan Cho, Seung Yong Shin, Hoseob Kim et al.
1.00
Humans Male Smoking Cessation Cardiovascular Diseases Female
Humans United States Aged Cross-Sectional Studies Medicare Part C
1.00
Humans Yoga Low Back Pain Female Male

MeSH classifications