CausNet: generational orderings based search for optimal Bayesian networks via dynamic programming with parent set constraints.

Keywords: Dynamic programming; Generational orderings; Optimal Bayesian network

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
ID NLM: 100965194

Publication information

Publication date:
14 Feb 2023
History:
received: 18 Jul 2022
accepted: 24 Jan 2023
entrez: 15 Feb 2023
pubmed: 16 Feb 2023
medline: 17 Feb 2023
Status: epublish

Abstract

Background: Finding a globally optimal Bayesian network by exhaustive search is a problem of super-exponential complexity, which severely restricts the number of variables that can feasibly be included. We implement a dynamic programming based algorithm with built-in dimensionality reduction and parent set identification. This substantially reduces the search space and allows the algorithm to be applied to high-dimensional data. We use what we call a 'generational orderings'-based search for optimal networks, a novel way to efficiently search the space of possible networks given the possible parent sets. The algorithm supports both continuous and categorical data, as well as continuous, binary and survival outcomes.

Results: We demonstrate the efficacy of our algorithm on both synthetic and real data. In simulations, our algorithm outperforms three widely used state-of-the-art algorithms. We then apply it to an ovarian cancer gene expression dataset with 513 genes and a survival outcome. Our algorithm finds an optimal network describing the disease pathway, consisting of 6 genes leading to the outcome node, in just 3.4 min on a personal computer with a 2.3 GHz Intel Core i9 processor and 16 GB of RAM.

Conclusions: Our generational orderings based search for optimal networks is an efficient and highly scalable approach for finding optimal Bayesian networks and can be applied to thousands of variables. Using the specifiable parameters (correlation and FDR cutoffs, and in-degree), one can increase or decrease the number of nodes and the density of the networks. The availability of two scoring options (BIC and Bge) and the support for survival outcomes and mixed data types make our algorithm well suited to many types of high-dimensional data in a variety of fields.
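The abstract only outlines the search. For orientation, here is a minimal sketch of the standard exact dynamic program over variable subsets for score-based Bayesian network structure learning, with each node's parents restricted to a precomputed list of candidate parent sets. It is not CausNet's generational-orderings search, whose details this record does not give, and it omits the dimensionality-reduction step. The input format, the function names `best_parents` and `optimal_network`, and the toy scores are hypothetical; the scores stand in for any decomposable score (such as BIC) where higher is better.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

# Hypothetical input: for each node, the candidate parent sets allowed by the
# parent-set constraints, each mapped to a decomposable local score (higher is better).
LocalScores = Dict[str, Dict[FrozenSet[str], float]]


def best_parents(scores: LocalScores, node: str,
                 allowed: FrozenSet[str]) -> Tuple[float, FrozenSet[str]]:
    """Best-scoring candidate parent set of `node` that uses only nodes in `allowed`."""
    feasible = [(s, ps) for ps, s in scores[node].items() if ps <= allowed]
    return max(feasible, key=lambda t: t[0])


def optimal_network(scores: LocalScores) -> Tuple[float, Dict[str, FrozenSet[str]]]:
    """Exact DP over subsets: best[S] is the best score of a DAG on the nodes in S."""
    nodes = sorted(scores)
    for v in nodes:
        # The empty set guarantees every node has a feasible parent choice.
        if frozenset() not in scores[v]:
            raise ValueError(f"node {v!r} needs the empty set as a candidate")
    # best[S] = (best total score over S, the sink: the node placed last in S)
    best: Dict[FrozenSet[str], Tuple[float, str]] = {frozenset(): (0.0, "")}
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            S = frozenset(subset)
            options = []
            for sink in subset:
                rest = S - {sink}
                # The sink may only take parents from nodes ordered before it.
                local, _ = best_parents(scores, sink, rest)
                options.append((best[rest][0] + local, sink))
            best[S] = max(options, key=lambda t: t[0])
    # Peel off sinks from the full set to recover an optimal DAG.
    parents: Dict[str, FrozenSet[str]] = {}
    S = frozenset(nodes)
    total = best[S][0]
    while S:
        sink = best[S][1]
        rest = S - {sink}
        parents[sink] = best_parents(scores, sink, rest)[1]
        S = rest
    return total, parents


# Toy example with made-up local scores for three nodes.
scores: LocalScores = {
    "A": {frozenset(): -10.0, frozenset({"B"}): -8.0},
    "B": {frozenset(): -12.0, frozenset({"A"}): -9.0},
    "C": {frozenset(): -15.0, frozenset({"B"}): -7.0, frozenset({"A", "B"}): -6.0},
}
total, parents = optimal_network(scores)
print(total)  # -25.0
for node in sorted(parents):
    print(node, "<-", sorted(parents[node]))
```

In this sketch the subset table still has 2^n entries, so shrinking the number of variables up front is what keeps the search tractable; restricting candidate parent sets only shortens the per-node lookup inside each step.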


Identifiers

pubmed: 36788490
doi: 10.1186/s12859-023-05159-6
pii: 10.1186/s12859-023-05159-6
pmc: PMC9926787

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

46

Grants

Agency: NCI NIH HHS
ID: P01 CA196569
Country: United States
Agency: NIA NIH HHS
ID: P01AG055367
Country: United States

Copyright information

© 2023. The Author(s).

Authors

Nand Sharma

Division of Biostatistics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, USA. nandsh11@gmail.com.

Joshua Millstein

Division of Biostatistics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, USA.

Similar articles

Selecting optimal software code descriptors-The case of Java.

Yegor Bugayenko, Zamira Kholmatova, Artem Kruglov et al.
Software Algorithms Programming Languages

Unsupervised learning for real-time and continuous gait phase detection.

Dollaporn Anopas, Yodchanan Wongsawat, Jetsada Arnin
Humans Gait Neural Networks, Computer Unsupervised Machine Learning Walking

MeSH classifications