iPHoP: An integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria.
Journal
PLoS biology
ISSN: 1545-7885
Abbreviated title: PLoS Biol
Country: United States
NLM ID: 101183755
Publication information
Publication date: 04 2023
History:
received: 30 08 2022
accepted: 15 03 2023
revised: 03 05 2023
medline: 05 05 2023
pubmed: 21 04 2023
entrez: 21 04 2023
Status: epublish
Abstract
The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived sequences lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in recall, i.e., for a number of viruses they yield erroneous predictions or no prediction at all. Here, we describe iPHoP, a two-step framework that integrates multiple methods to reliably predict host taxonomy at the genus rank for a broad range of viruses infecting bacteria and archaea, while retaining a low false discovery rate. Based on a large dataset of metagenome-derived virus genomes from the IMG/VR database, we illustrate how iPHoP can provide extensive host prediction and guide further characterization of uncultivated viruses.
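The trade-off the abstract describes (erroneous predictions versus no prediction at all) can be made concrete with a small scoring sketch. This is not iPHoP code and does not reproduce its benchmarking; the function name `score_predictions`, the virus labels, and the toy genus assignments are all hypothetical, chosen only to show how precision, recall, and false discovery rate differ when some viruses receive no prediction.

```python
from typing import Optional


def score_predictions(
    truth: dict[str, str],
    predicted: dict[str, Optional[str]],
) -> dict[str, float]:
    """Score genus-level host predictions against known hosts.

    A value of None in `predicted` means no prediction was made for that virus.
    """
    # Keep only viruses with a known host for which a prediction was made.
    made = {v: g for v, g in predicted.items() if g is not None and v in truth}
    correct = sum(1 for v, g in made.items() if truth[v] == g)

    # Precision: fraction of the predictions made that are correct.
    precision = correct / len(made) if made else 0.0
    # Recall: fraction of all viruses that received a correct prediction.
    recall = correct / len(truth) if truth else 0.0
    # False discovery rate: fraction of the predictions made that are wrong.
    fdr = 1.0 - precision if made else 0.0
    return {"precision": precision, "recall": recall, "fdr": fdr}


# Hypothetical toy data: one wrong prediction (virus_B), one missing (virus_C).
truth = {
    "virus_A": "Escherichia",
    "virus_B": "Bacillus",
    "virus_C": "Sulfolobus",
    "virus_D": "Pseudomonas",
}
predicted = {
    "virus_A": "Escherichia",
    "virus_B": "Salmonella",
    "virus_C": None,
    "virus_D": "Pseudomonas",
}
scores = score_predictions(truth, predicted)
```

In this toy case the wrong call for `virus_B` lowers precision and raises the false discovery rate, while the missing call for `virus_C` lowers only recall.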
Identifiers
pubmed: 37083735
doi: 10.1371/journal.pbio.3002083
pii: PBIOLOGY-D-22-01906
pmc: PMC10155999
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
e3002083
Copyright information
Copyright: © 2023 Roux et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
The authors have declared that no competing interests exist.