OrthoSNAP: A tree splitting and pruning algorithm for retrieving single-copy orthologs from gene family trees.
Journal
PLoS biology
ISSN: 1545-7885
Abbreviated title: PLoS Biol
Country: United States
NLM ID: 101183755
Publication information
Publication date:
10 2022
History:
received: 2021-11-04
accepted: 2022-09-13
revised: 2022-10-25
pubmed: 2022-10-14
medline: 2022-10-28
entrez: 2022-10-13
Status:
epublish
Abstract
Molecular evolution studies, such as phylogenomic studies and genome-wide surveys of selection, often rely on gene families of single-copy orthologs (SC-OGs). Large gene families with multiple homologs in 1 or more species (a phenomenon observed among several important families of genes, such as transporters and transcription factors) are often ignored because identifying and retrieving SC-OGs nested within them is challenging. To address this issue and increase the number of markers used in molecular evolution studies, we developed OrthoSNAP, software that uses a phylogenetic framework to simultaneously split gene families into SC-OGs and prune species-specific inparalogs. We refer to SC-OGs identified by OrthoSNAP as SNAP-OGs because they are identified using a splitting and pruning procedure analogous to snapping branches on a tree. From 415,129 orthologous groups of genes inferred across 7 eukaryotic phylogenomic datasets, we identified 9,821 SC-OGs; using OrthoSNAP on the remaining 405,308 orthologous groups of genes, we identified an additional 10,704 SNAP-OGs. Comparison of SNAP-OGs and SC-OGs revealed that their phylogenetic information content was similar, even in complex datasets that contain a whole-genome duplication, complex patterns of duplication and loss, transcriptome data where each gene typically has multiple transcripts, and contentious branches in the tree of life. OrthoSNAP is useful for increasing the number of markers used in molecular evolution data matrices, a critical step for robustly inferring and exploring the tree of life.
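The splitting and pruning idea described in the abstract can be illustrated with a minimal Python sketch. This is not OrthoSNAP's implementation: the nested-tuple tree format, the `species|gene` leaf labels, the function names, the `min_species` parameter, and the rule for choosing which inparalog to keep are all assumptions made for illustration, and the abstract does not specify any of these details.

```python
from typing import List, Tuple, Union

# Hypothetical toy representation: a leaf is "species|gene", a clade is a tuple.
Tree = Union[str, Tuple["Tree", ...]]


def species_of(leaf: str) -> str:
    """Return the species part of a 'species|gene' label."""
    return leaf.split("|", 1)[0]


def leaves(tree: Tree) -> List[str]:
    """List all leaf labels under a node."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def prune_inparalogs(tree: Tree) -> Tree:
    """Collapse every clade made up of a single species into one leaf.

    Assumption: the representative is simply the alphabetically first label.
    """
    if isinstance(tree, str):
        return tree
    children = tuple(prune_inparalogs(child) for child in tree)
    only_leaves = all(isinstance(child, str) for child in children)
    if only_leaves and len({species_of(child) for child in children}) == 1:
        return min(children)
    return children


def split_single_copy(tree: Tree, min_species: int = 2) -> List[List[str]]:
    """Return the largest subtrees that hold at most one gene per species."""
    tips = leaves(tree)
    species = [species_of(tip) for tip in tips]
    if len(species) == len(set(species)):
        # Already single-copy: keep it if enough species are represented.
        return [sorted(tips)] if len(tips) >= min_species else []
    groups: List[List[str]] = []
    for child in tree:  # a multi-copy node is always a tuple here
        groups.extend(split_single_copy(child, min_species))
    return groups


# Toy gene family: an old duplication (g1 vs g2) plus a recent
# species-specific duplication in sp3 (g1a, g1b).
toy_tree: Tree = (
    (("sp1|g1", "sp2|g1"), ("sp3|g1a", "sp3|g1b")),
    (("sp1|g2", "sp2|g2"), "sp3|g2"),
)

snap_ogs = split_single_copy(prune_inparalogs(toy_tree))
for group in snap_ogs:
    print(group)
```

In this toy case the family is not single-copy as a whole, but pruning the sp3 inparalog pair and splitting at the older duplication yields two single-copy groups of three species each.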
Identifiers
pubmed: 36228036
doi: 10.1371/journal.pbio.3001827
pii: PBIOLOGY-D-21-02885
pmc: PMC9595520
Chemical substances
Transcription Factors
0
Databanks
figshare
10.6084/m9.figshare.16875904
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
e3001827
Grants
Agency: NIAID NIH HHS
ID: R56 AI146096
Country: United States
Agency: NIAID NIH HHS
ID: R01 AI153356
Country: United States
Conflict of interest statement
I have read the journal’s policy and the authors of this manuscript have the following competing interests: Antonis Rokas is a scientific consultant for LifeMine Therapeutics, Inc. Jacob L. Steenwyk is a scientific consultant for Latch AI Inc.