Network inference with ensembles of bi-clustering trees.
Biomedical networks
Interaction prediction
Multi-label classification
Network inference
Tree-ensembles
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date: 28 Oct 2019
History:
received: 28 Feb 2019
accepted: 20 Sep 2019
entrez: 30 Oct 2019
pubmed: 30 Oct 2019
medline: 18 Dec 2019
Status: epublish
Abstract
BACKGROUND: Network inference is crucial for biomedicine and systems biology. Biological entities and their associations are often modeled as interaction networks, such as drug-protein interaction networks or gene regulatory networks. Studying and elucidating such networks can lead to an understanding of complex biological processes. However, we usually have only partial knowledge of those networks, and experimentally identifying all existing associations between biological entities is time-consuming and expensive. Many computational approaches to network inference have been proposed over the years; nonetheless, efficiency and accuracy remain open problems. Here, we propose bi-clustering tree ensembles as a new machine learning method for network inference, extending traditional tree-ensemble models to the global network setting. The proposed approach addresses network inference as a multi-label classification task. More specifically, the nodes of a network (e.g., drugs or proteins in a drug-protein interaction network) are modeled as samples described by features (e.g., chemical structure similarities or protein sequence similarities). The labels represent the presence or absence of links connecting the nodes of the interaction network (e.g., drug-protein interactions in a drug-protein interaction network).
RESULTS: We extended traditional tree-ensemble methods, such as extremely randomized trees (ERT) and random forests (RF), to ensembles of bi-clustering trees, integrating background information from both node sets of a heterogeneous network into the same learning framework. We performed an empirical evaluation, comparing the proposed approach to currently used tree-ensemble-based approaches as well as other approaches from the literature, and demonstrated its effectiveness in different interaction prediction (network inference) settings. For evaluation, we used several benchmark datasets that represent drug-protein and gene regulatory networks. We also applied the proposed method to two versions of a chemical-protein association network extracted from the STITCH database, demonstrating the potential of our model for predicting non-reported interactions.
CONCLUSIONS: Bi-clustering trees outperform existing tree-based strategies as well as machine learning methods based on other algorithms. Because our approach is based on tree ensembles, it inherits the advantages of tree-ensemble learning, such as handling of missing values, scalability, and interpretability.
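As an illustration of the formulation described in the abstract, the sketch below represents a toy drug-protein network as an interaction matrix with one feature table per node set, and searches for a single tree split that may fall on either side. It is a minimal sketch and not the authors' implementation: the toy data, the function names `impurity` and `best_split`, and the variance-reduction criterion with an exhaustive threshold search are all assumptions made for the example, since the abstract does not specify the splitting rule.

```python
# Toy interaction matrix Y: rows are drugs, columns are proteins.
# Y[i][j] == 1 means a known link; 0 means no reported link. (Hypothetical data.)
Y = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

# One hypothetical feature per node; real features could be similarity scores.
drug_features = [[0.9], [0.8], [0.1]]
protein_features = [[0.2], [0.3], [0.7], [0.9]]


def impurity(rows, cols):
    """Sum of squared deviations from the mean over the sub-matrix Y[rows, cols]."""
    cells = [Y[r][c] for r in rows for c in cols]
    if not cells:
        return 0.0
    mean = sum(cells) / len(cells)
    return sum((v - mean) ** 2 for v in cells)


def best_split(rows, cols):
    """Return (gain, axis, feature_index, threshold) for the best single split.

    A "row" split partitions the drugs using a drug feature; a "col" split
    partitions the proteins using a protein feature. Both kinds of split are
    scored on the same sub-matrix, so both node sets share one tree.
    (Assumed criterion: reduction in summed squared deviation.)
    """
    parent = impurity(rows, cols)
    best = (0.0, None, None, None)
    candidates = (("row", drug_features, rows), ("col", protein_features, cols))
    for axis, feats, ids in candidates:
        for f in range(len(feats[0])):
            for t in sorted({feats[i][f] for i in ids}):
                left = [i for i in ids if feats[i][f] <= t]
                right = [i for i in ids if feats[i][f] > t]
                if not left or not right:
                    continue  # a split must leave nodes on both sides
                if axis == "row":
                    child = impurity(left, cols) + impurity(right, cols)
                else:
                    child = impurity(rows, left) + impurity(rows, right)
                gain = parent - child
                if gain > best[0]:
                    best = (gain, axis, f, t)
    return best


all_rows = list(range(len(Y)))
all_cols = list(range(len(Y[0])))

first = best_split(all_rows, all_cols)   # split chosen at the root
second = best_split(all_rows, [0, 1])    # split chosen inside one resulting partition
print(first, second)
```

On this toy data the root split falls on the protein side and the next split, inside the left partition, falls on the drug side, which is the sense in which a single tree can draw on background information from both node sets. An ensemble in the spirit of ERT or RF would repeat this with randomized thresholds or resampled data and average the leaf predictions; that part is omitted here.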
Identifiers
pubmed: 31660848
doi: 10.1186/s12859-019-3104-y
pii: 10.1186/s12859-019-3104-y
pmc: PMC6819564
Chemical substances
Proteins
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
525