Network inference with ensembles of bi-clustering trees.

MeSH terms: Algorithms; Cluster Analysis; Databases, Factual; Gene Regulatory Networks; Machine Learning; Protein Interaction Maps; Proteins / metabolism

Keywords: Biomedical networks; Interaction prediction; Multi-label classification; Network inference; Tree-ensembles

Journal

BMC bioinformatics

ISSN: 1471-2105

Abbreviated title: BMC Bioinformatics

Country: England

NLM ID: 100965194

Publication information

Publication date:
28 Oct 2019

History:

received: 28 Feb 2019

accepted: 20 Sep 2019

entrez: 30 Oct 2019

pubmed: 30 Oct 2019

medline: 18 Dec 2019

Status: epublish

Abstract

BACKGROUND: Network inference is crucial for biomedicine and systems biology. Biological entities and their associations are often modeled as interaction networks, such as drug-protein interaction networks or gene regulatory networks. Studying and elucidating such networks can lead to an understanding of complex biological processes. However, we usually have only partial knowledge of these networks, and experimentally identifying all existing associations between biological entities is time-consuming and expensive. Many computational approaches to network inference have been proposed over the years; nonetheless, efficiency and accuracy remain open problems. Here, we propose bi-clustering tree ensembles as a new machine learning method for network inference, extending traditional tree-ensemble models to the global network setting. The proposed approach addresses network inference as a multi-label classification task. More specifically, the nodes of a network (e.g., drugs or proteins in a drug-protein interaction network) are modeled as samples described by features (e.g., chemical structure similarities or protein sequence similarities). The labels represent the presence or absence of links connecting the nodes of the interaction network (e.g., drug-protein interactions in a drug-protein interaction network).

RESULTS: We extended traditional tree-ensemble methods, such as extremely randomized trees (ERT) and random forests (RF), to ensembles of bi-clustering trees, integrating background information from both node sets of a heterogeneous network into the same learning framework. We performed an empirical evaluation, comparing the proposed approach to currently used tree-ensemble-based approaches as well as other approaches from the literature, and demonstrated its effectiveness in different interaction prediction (network inference) settings. For evaluation, we used several benchmark datasets that represent drug-protein and gene regulatory networks. We also applied the proposed method to two versions of a chemical-protein association network extracted from the STITCH database, demonstrating the potential of our model for predicting non-reported interactions.

CONCLUSIONS: Bi-clustering trees outperform existing tree-based strategies as well as machine learning methods based on other algorithms. Because our approach is based on tree ensembles, it inherits the advantages of tree-ensemble learning, such as handling of missing values, scalability, and interpretability.
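
The problem formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation of bi-clustering trees. It only contrasts two generic ways of posing interaction prediction with an off-the-shelf extremely randomized trees model from scikit-learn: a multi-label view, where each drug is a sample and each protein is a label, and a pairwise view, where the features of both node sets are concatenated for every drug-protein pair. The data are synthetic, and all names (X_drug, X_prot, Y, pair_features) are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not data from the paper):
# 30 "drugs" and 20 "proteins", each described by a small feature vector,
# and a binary interaction matrix Y of shape (drugs, proteins).
n_drugs, n_proteins = 30, 20
X_drug = rng.random((n_drugs, 5))
X_prot = rng.random((n_proteins, 4))
Y = (X_drug[:, [0]] + X_prot[:, 0] > 1.0).astype(int)

train, test = np.arange(20), np.arange(20, 30)  # hold out the last 10 drugs

# Multi-label view: drugs are samples, each protein column of Y is one label.
# Only the drug features are used here.
local_model = ExtraTreesClassifier(n_estimators=50, random_state=0)
local_model.fit(X_drug[train], Y[train])
local_pred = local_model.predict(X_drug[test])  # shape (10, 20)


def pair_features(X_rows, X_cols):
    """Concatenate row-node and column-node features for every (row, column) pair.

    Pairs are ordered row-major, so they line up with Y.ravel().
    """
    n_r, n_c = len(X_rows), len(X_cols)
    left = np.repeat(X_rows, n_c, axis=0)  # each row node repeated per column node
    right = np.tile(X_cols, (n_r, 1))      # column-node block repeated per row node
    return np.hstack([left, right])


# Pairwise view: one sample per drug-protein pair, using features of both node sets.
X_pairs = pair_features(X_drug[train], X_prot)
y_pairs = Y[train].ravel()
global_model = ExtraTreesClassifier(n_estimators=50, random_state=0)
global_model.fit(X_pairs, y_pairs)

# Interaction scores for the held-out drugs against all proteins.
scores = global_model.predict_proba(pair_features(X_drug[test], X_prot))[:, 1]
score_matrix = scores.reshape(len(test), n_proteins)
```

In the pairwise view a split can use a feature from either node set, which is one generic way to bring information from both sides of the network into a single tree model. The method described in the abstract should be consulted in the paper itself for its actual tree construction.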

Identifiers

DOI: 10.1186/s12859-019-3104-y PMID: 31660848 PMC: PMC6819564

Chemical substances

Proteins 0

Publication types

Journal Article

Languages

eng

Pagination

525

Authors

Konstantinos Pliakos (K)

Celine Vens (C)
