DEPP: Deep Learning Enables Extending Species Trees using Single Genes.


Journal

Systematic biology
ISSN: 1076-836X
Abbreviated title: Syst Biol
Country: England
NLM ID: 9302532

Publication information

Publication date:
19 05 2023
History:
received: 18 05 2021
revised: 13 04 2022
accepted: 22 04 2022
medline: 22 05 2023
pubmed: 30 04 2022
entrez: 29 04 2022
Status: ppublish

Abstract

Placing new sequences onto reference phylogenies is increasingly used for analyzing environmental samples, especially microbiomes. Existing placement methods assume that query sequences have evolved under specific models directly on the reference phylogeny. For example, they assume single-gene data (e.g., 16S rRNA amplicons) have evolved under the GTR model on a gene tree. Placement, however, often has a more ambitious goal: extending a (genome-wide) species tree given data from individual genes without knowing the evolutionary model. Addressing this challenging problem requires new directions. Here, we introduce Deep-learning Enabled Phylogenetic Placement (DEPP), an algorithm that learns to extend species trees using single genes without prespecified models. In simulations and on real data, we show that DEPP can match the accuracy of model-based methods without any prior knowledge of the model. We also show that DEPP can update the multilocus microbial tree-of-life with single genes with high accuracy. We further demonstrate that DEPP can combine 16S and metagenomic data onto a single tree, enabling community structure analyses that take advantage of both sources of data. [Deep learning; gene tree discordance; metagenomics; microbiome analyses; neural networks; phylogenetic placement.].

Identifiers

pubmed: 35485976
pii: 6575921
doi: 10.1093/sysbio/syac031
pmc: PMC10198656

Chemical substances

RNA, Ribosomal, 16S 0

Publication types

Journal Article; Research Support, U.S. Gov't, Non-P.H.S.; Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

17-34

Grants

Agency: NIGMS NIH HHS
ID: R35 GM142725
Country: United States

Copyright information

© The Author(s) 2022. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

Authors

Yueyu Jiang (Y)

Department of Electrical and Computer Engineering, UC San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.

Metin Balaban (M)

Bioinformatics and Systems Biology Graduate Program, UC San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.

Qiyun Zhu (Q)

Center for Fundamental and Applied Microbiomics, Arizona State University, 1151 S Forest Ave, Tempe, AZ 85281, USA.

Siavash Mirarab (S)

Department of Electrical and Computer Engineering, UC San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.

Similar articles

Genome, Chloroplast Phylogeny Genetic Markers Base Composition High-Throughput Nucleotide Sequencing

Selecting optimal software code descriptors-The case of Java.

Yegor Bugayenko, Zamira Kholmatova, Artem Kruglov et al.
Software Algorithms Programming Languages
Databases, Protein Protein Domains Protein Folding Proteins Deep Learning
Animals Hemiptera Insect Proteins Phylogeny Insecticides

MeSH classifications