SpeciesRax: A Tool for Maximum Likelihood Species Tree Inference from Gene Family Trees under Duplication, Transfer, and Loss.

gene duplication; gene family tree; gene loss; horizontal gene transfer; maximum likelihood; species tree inference

Journal

Molecular biology and evolution
ISSN: 1537-1719
Abbreviated title: Mol Biol Evol
Country: United States
NLM ID: 8501455

Publication information

Publication date:
3 February 2022
History:
pubmed: 13 January 2022
medline: 1 April 2022
entrez: 12 January 2022
Status: ppublish

Abstract

Species tree inference from gene family trees is becoming increasingly popular because it can account for discordance between the species tree and the corresponding gene family trees. In particular, methods that can account for multiple-copy gene families exhibit potential to leverage paralogy as informative signal. At present, no inference method for this purpose is widely adopted. Here, we present SpeciesRax, the first maximum likelihood method that can infer a rooted species tree from a set of gene family trees and can account for gene duplication, loss, and transfer events. By explicitly modeling events by which gene trees can depart from the species tree, SpeciesRax leverages the phylogenetic rooting signal in gene trees. SpeciesRax infers species tree branch lengths in units of expected substitutions per site and branch support values via paralogy-aware quartets extracted from the gene family trees. Using both empirical and simulated data sets, we show that SpeciesRax is at least as accurate as the best competing methods while being one order of magnitude faster on large data sets. We used SpeciesRax to infer a biologically plausible rooted phylogeny of the vertebrates comprising 188 species from 31,612 gene families in 1 h using 40 cores. SpeciesRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax and on BioConda.

Identifiers

pubmed: 35021210
pii: 6503503
doi: 10.1093/molbev/msab365
pmc: PMC8826479

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2022. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Authors

Benoit Morel (B)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany.

Paul Schade (P)

Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany.

Sarah Lutteropp (S)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Tom A Williams (TA)

School of Biological Sciences, University of Bristol, Bristol, United Kingdom.

Gergely J Szöllősi (GJ)

ELTE-MTA "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary.
Department of Biological Physics, Eötvös University, Budapest, Hungary.
Institute of Evolution, Centre for Ecological Research, Budapest, Hungary.

Alexandros Stamatakis (A)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany.
