Heuristic algorithms for best match graph editing.

Arc modification problems Consistent algorithm Heuristic algorithm NP-hardness

Journal

Algorithms for molecular biology : AMB

ISSN: 1748-7188

Titre abrégé: Algorithms Mol Biol

Pays: England

ID NLM: 101265088

Informations de publication

Date de publication:
17 Aug 2021

Historique:

received: 17 03 2021

accepted: 26 06 2021

entrez: 18 8 2021

pubmed: 19 8 2021

medline: 19 8 2021

Statut: epublish

Résumé

Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho's supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing. Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.

Sections du résumé

BACKGROUND BACKGROUND

RESULTS RESULTS

Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho's supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing.

CONCLUSION CONCLUSIONS

Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.

Identifiants

DOI: 10.1186/s13015-021-00196-3 PMID: 34404422 PMC: PMC8369769

pubmed: 34404422

doi: 10.1186/s13015-021-00196-3

pii: 10.1186/s13015-021-00196-3

pmc: PMC8369769

doi:

Types de publication

Journal Article

Langues

eng

Pagination

Subventions

Organisme : Deutsche Forschungsgemeinschaft<

ID : STA 850/49-1

Organisme : competence centers for excellent technologies

ID : 865891

Informations de copyright

Références

J Bioinform Comput Biol. 2006 Feb;4(1):59-74

pubmed: 16568542

BMC Bioinformatics. 2011 Apr 28;12:124

pubmed: 21526987

BMC Genomics. 2014 Jun 25;15:522

pubmed: 24965762

Algorithms Mol Biol. 2020 Apr 9;15:5

pubmed: 32308731

Phys Rev E Stat Nonlin Soft Matter Phys. 2015 Sep;92(3):032801

pubmed: 26465522

Genes (Basel). 2018 Feb 28;9(3):

pubmed: 29495636

Trends Ecol Evol. 1994 Aug;9(8):297-8

pubmed: 21236859

Bioinformatics. 2018 Jul 1;34(13):i366-i375

pubmed: 29950018

Nat Methods. 2015 Jan;12(1):59-60

pubmed: 25402007

Brief Bioinform. 2016 Nov;17(6):1009-1023

pubmed: 26615024

J Math Biol. 2021 Feb 19;82(3):20

pubmed: 33606106

PLoS One. 2014 Aug 19;9(8):e105015

pubmed: 25137074

PLoS One. 2010 Oct 15;5(10):e13409

pubmed: 20976221

Protein Eng. 1999 Feb;12(2):85-94

pubmed: 10195279

J Theor Biol. 1987 Sep 7;128(1):11-45

pubmed: 3431131

J Math Biol. 2019 Jun;78(7):2015-2057

pubmed: 30968198

Evolution. 2002 Jul;56(7):1317-30

pubmed: 12206234

J Math Biol. 2021 Apr 5;82(6):47

pubmed: 33818665

J Theor Biol. 2016 May 21;397:89-102

pubmed: 26953649

Bioinformatics. 2008 Feb 1;24(3):319-24

pubmed: 18042555

BMC Genomics. 2020 Oct 24;21(1):741

pubmed: 33099302

Heuristic algorithms for best match graph editing.

Journal

Informations de publication

Résumé

Sections du résumé

Identifiants

Types de publication

Langues

Pagination

Subventions

Informations de copyright

Références

Auteurs

David Schaller (D)

Manuela Geiß (M)

Marc Hellmuth (M)

Peter F Stadler (PF)

Classifications MeSH