PALMER: improving pathway annotation based on the biomedical literature mining with a constrained latent block model.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
02 Oct 2020
History:
received: 16 Apr 2019
accepted: 16 Sep 2020
entrez: 3 Oct 2020
pubmed: 4 Oct 2020
medline: 31 Oct 2020
Status: epublish

Abstract


BACKGROUND
In systems biology, it is of great interest to identify previously unreported associations between genes. Recently, the biomedical literature has come to be regarded as a valuable resource for this purpose. Classical clustering algorithms have been widely used to investigate associations among genes, but they are not tailored to literature-mining data and rely on strong assumptions that this type of data often violates. For example, these approaches often assume that observations are homogeneous and independent, yet both assumptions are frequently broken by redundancies in functional descriptions and by biological functions shared among genes. Latent block models offer an alternative, but they also often perform suboptimally, especially when signals are weak. In addition, they cannot incorporate valuable prior biological knowledge, such as that available in existing databases.
RESULTS
To address these limitations, we propose PALMER, a constrained latent block model that identifies indirect relationships among genes from biomedical literature-mining data. By automatically associating relevant Gene Ontology terms, PALMER facilitates the biological interpretation of novel findings without laborious downstream analyses. PALMER also lets researchers use prior biological knowledge about known gene-pathway relationships to guide the identification of gene-gene associations. We evaluated PALMER with simulation studies and with applications to pathway-modulating genes relevant to cancer signaling pathways, using the pathway annotations available in the KEGG database as prior knowledge.
CONCLUSIONS
We showed that PALMER outperforms traditional latent block models and, by using prior biological knowledge, reliably identifies novel gene-gene associations, especially when signals in the biomedical literature-mining dataset are weak. We believe that PALMER and its user-friendly software will be powerful tools for improving existing pathway annotations and identifying novel pathway-modulating genes.
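The abstract does not describe PALMER's estimation procedure, so the following is only a generic, hypothetical sketch of one way a latent block model can be "constrained" by prior gene-pathway knowledge. It fits a Bernoulli latent block model to a binary gene × GO-term matrix by classification EM and simply keeps the row-cluster label of every gene with a known pathway membership fixed. The function name `fit_constrained_lbm`, the `fixed_rows` argument, the 0/1 input layout, the round-robin initialization, and the hard-constraint scheme are all assumptions for illustration, not the authors' implementation.

```python
import math


def fit_constrained_lbm(X, K, L, fixed_rows=None, n_iter=50, eps=1e-6):
    """Hypothetical sketch: Bernoulli latent block model fitted by classification EM.

    X          -- list of 0/1 rows (assumed layout: genes x GO terms)
    K, L       -- numbers of row (gene) and column (term) clusters
    fixed_rows -- {row index: cluster} prior memberships that are never updated
    Returns (row_labels, col_labels).
    """
    n, m = len(X), len(X[0])
    fixed_rows = fixed_rows or {}
    # Deterministic round-robin start; constrained rows start at their prior label.
    z = [fixed_rows.get(i, i % K) for i in range(n)]
    w = [j % L for j in range(m)]

    def estimate():
        # M-step: block densities alpha[k][l] and cluster proportions pi, rho,
        # smoothed by eps so every log below is finite.
        ones = [[0.0] * L for _ in range(K)]
        size = [[0.0] * L for _ in range(K)]
        for i in range(n):
            for j in range(m):
                ones[z[i]][w[j]] += X[i][j]
                size[z[i]][w[j]] += 1
        alpha = [[(ones[k][l] + eps) / (size[k][l] + 2 * eps) for l in range(L)]
                 for k in range(K)]
        pi = [(z.count(k) + eps) / (n + K * eps) for k in range(K)]
        rho = [(w.count(l) + eps) / (m + L * eps) for l in range(L)]
        return alpha, pi, rho

    def bern(x, a):
        # Log-likelihood of a single 0/1 cell under block density a.
        return math.log(a) if x else math.log(1.0 - a)

    def argmax(scores):
        return max(range(len(scores)), key=scores.__getitem__)

    for _ in range(n_iter):
        # Row step: reassign each unconstrained gene to its best row cluster.
        alpha, pi, _ = estimate()
        new_z = [
            fixed_rows[i] if i in fixed_rows else argmax([
                math.log(pi[k]) + sum(bern(X[i][j], alpha[k][w[j]]) for j in range(m))
                for k in range(K)])
            for i in range(n)
        ]
        changed = new_z != z
        z = new_z
        # Column step: reassign each term given the updated gene labels.
        alpha, _, rho = estimate()
        new_w = [
            argmax([
                math.log(rho[l]) + sum(bern(X[i][j], alpha[z[i]][l]) for i in range(n))
                for l in range(L)])
            for j in range(m)
        ]
        changed = changed or new_w != w
        w = new_w
        if not changed:
            break  # labels are stable: a (local) fixed point of classification EM
    return z, w
```

On a toy 6 × 6 matrix with two clean blocks (genes 0 to 2 annotated with terms 0 to 2, genes 3 to 5 with terms 3 to 5) and genes 0 and 3 pinned to different clusters, this sketch returns row labels `[0, 0, 0, 1, 1, 1]` and column labels `[0, 0, 0, 1, 1, 1]`. A hard constraint is the simplest possible choice; a softer prior on cluster membership would be another plausible design.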

Identifiers

pubmed: 33008309
doi: 10.1186/s12859-020-03756-3
pii: 10.1186/s12859-020-03756-3
pmc: PMC7532116

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

432

Grants

Agency: NIGMS NIH HHS
ID: R01 GM122078
Country: United States
Agency: NIAMS NIH HHS
ID: P30-AR072582
Country: United States
Agency: NCI NIH HHS
ID: R21 CA209848
Country: United States
Agency: NIDA NIH HHS
ID: U01-DA045300
Country: United States

Authors

Jin Hyun Nam (JH)

Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA.
School of Pharmacy, Sungkyunkwan University, Suwon, Republic of Korea.

Daniel Couch (D)

Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA.

Willian A da Silveira (WA)

School of Biological Sciences, Queen's University Belfast, Belfast, UK.

Zhenning Yu (Z)

Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA.

Dongjun Chung (D)

Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA. chung.911@osu.edu.
