ExTRI: Extraction of transcription regulation interactions from literature.

Gene regulation; Systems biology; Text-mining; Transcription factors

Journal

Biochimica et biophysica acta. Gene regulatory mechanisms
ISSN: 1876-4320
Abbreviated title: Biochim Biophys Acta Gene Regul Mech
Country: Netherlands
NLM ID: 101731723

Publication information

Publication date:
01 2022
History:
received: 07 05 2021
revised: 22 11 2021
accepted: 29 11 2021
pubmed: 8 12 2021
medline: 22 3 2022
entrez: 7 12 2021
Status: ppublish

Abstract

The regulation of gene transcription by transcription factors is a fundamental biological process, yet the relations between transcription factors (TFs) and their target genes (TGs) are still only sparsely covered in databases. Text-mining tools can offer broad and complementary solutions to help locate and extract mentions of these biological relationships in articles. We have generated ExTRI, a knowledge graph of TF-TG relationships, by applying a high-recall text-mining pipeline to MEDLINE abstracts, identifying over 100,000 candidate sentences with TF-TG relations. Validation procedures indicated that about half of the candidate sentences contain true TF-TG relationships. Post-processing identified 53,000 high-confidence sentences containing TF-TG relationships, with a cross-validation F1-score close to 75%. The resulting collection of TF-TG relationships covers 80% of the relations annotated in existing databases. It adds 11,000 other potential interactions, including relationships for ~100 TFs currently not in public TF-TG relation databases. The high-confidence abstract sentences contribute 25,000 literature references not available from other resources and offer a wealth of direct pointers to functional aspects of the TF-TG interactions. Our compiled resource, encompassing ExTRI together with publicly available resources, delivers literature-derived TF-TG interactions for more than 900 of the 1500-1600 proteins considered to function as specific DNA-binding TFs. The resulting resource can be used by curators, for network analysis and modelling, for causal reasoning or knowledge graph mining approaches, or to benchmark text-mining strategies.

Identifiers

pubmed: 34875418
pii: S1874-9399(21)00096-1
doi: 10.1016/j.bbagrm.2021.194778

Chemical substances

Transcription Factors 0

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

194778

Copyright information

Copyright © 2021 The Authors. Published by Elsevier B.V. All rights reserved.

Authors

Miguel Vazquez (M)

Barcelona Supercomputing Center, Barcelona, Spain. Electronic address: miguel.vazquez.g@bsc.es.

Martin Krallinger (M)

Barcelona Supercomputing Center, Barcelona, Spain.

Florian Leitner (F)

Data Catalytics, Madrid, Spain.

Martin Kuiper (M)

Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway.

Alfonso Valencia (A)

Barcelona Supercomputing Center, Barcelona, Spain; ICREA, Barcelona, Spain.

Astrid Laegreid (A)

Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim 7491, Norway.
