GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss.

gene duplication; gene family tree; horizontal gene transfer; maximum likelihood; reconciliation

Journal

Molecular biology and evolution
ISSN: 1537-1719
Abbreviated title: Mol Biol Evol
Country: United States
NLM ID: 8501455

Publication information

Publication date:
1 September 2020
History:
pubmed: 6 June 2020
medline: 17 April 2021
entrez: 6 June 2020
Status: ppublish

Abstract

Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).
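
The accuracy metric named in the abstract, relative Robinson-Foulds distance, can be illustrated with a minimal sketch. It is not taken from GeneRax or from the paper's evaluation pipeline. It assumes rooted trees written as nested Python tuples of leaf names, compares clades rather than unrooted bipartitions, and normalizes by the total number of non-trivial clades in both trees. The function names `clades` and `relative_rf` are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names.

    Assumed encoding: a tree is either a leaf name (str) or a tuple of subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        # Union of the leaf sets of all children is the clade below this node.
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    root = leaves(tree)
    # The root clade holds every leaf, so it cannot distinguish two trees.
    found.discard(root)
    return found


def relative_rf(tree_a, tree_b):
    """Fraction of non-trivial clades that appear in only one of the two trees."""
    a, b = clades(tree_a), clades(tree_b)
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return len(a ^ b) / total


true_tree = ((("A", "B"), "C"), "D")
inferred_tree = ((("A", "C"), "B"), "D")
print(relative_rf(true_tree, inferred_tree))
```

Under these assumptions a value of 0 means the two trees share every clade and a value of 1 means they share none; the two example trees agree on the clade {A, B, C} and disagree on one clade each.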

Identifiers

pubmed: 32502238
pii: 5851843
doi: 10.1093/molbev/msaa141
pmc: PMC8312565

Publication types

Evaluation Study; Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

2763-2774

Grants

Agency: European Research Council
ID: 714774
Country: International

Copyright information

© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Authors

Benoit Morel (B)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Alexey M Kozlov (AM)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Alexandros Stamatakis (A)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany.

Gergely J Szöllősi (GJ)

ELTE-MTA "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary.
Department of Biological Physics, Eötvös University, Budapest, Hungary.
Evolutionary Systems Research Group, Centre for Ecological Research, Hungarian Academy of Sciences, Tihany, Hungary.
