To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics.
Journal
Nucleic acids research
ISSN: 1362-4962
Abbreviated title: Nucleic Acids Res
Country: England
NLM ID: 0411011
Publication information
Publication date: 4 June 2020
History:
accepted: 4 April 2020
revised: 20 March 2020
received: 18 December 2019
pubmed: 28 April 2020
medline: 31 July 2020
entrez: 28 April 2020
Status: ppublish
Abstract
As computational biologists continue to be inundated by ever-increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen a rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.
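The abstract's headline example, MinHash, estimates how similar two sequences are (the Jaccard index of their k-mer sets) from small fixed-size sketches instead of the full sets. The block below is a minimal Python sketch under stated assumptions, not the review's or any tool's implementation: it uses the classic variant with `num_hashes` independently seeded hash functions (a seeded BLAKE2b from `hashlib` stands in for the faster non-cryptographic hashes real tools use), it ignores reverse complements, and the function names `kmers`, `minhash_signature`, `estimate_jaccard` and `exact_jaccard` are hypothetical.

```python
import hashlib
import random


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    # Seeded 64-bit hash: each seed acts as an independent hash function.
    # BLAKE2b is an illustrative stand-in for the non-cryptographic
    # hashes (e.g. MurmurHash) that real sketching tools tend to use.
    digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(kmer_set: set[str], num_hashes: int = 128) -> list[int]:
    """For each hash function, keep only the minimum hash over the (non-empty) set."""
    return [min(_hash(km, seed) for km in kmer_set) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature slots whose minima agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    """|A intersect B| / |A union B|, computed from the full k-mer sets."""
    return len(a & b) / len(a | b)


# Toy data: seq_b shares its first 500 bases with seq_a; the rest is fresh random sequence.
rng = random.Random(0)
seq_a = "".join(rng.choice("ACGT") for _ in range(1000))
seq_b = seq_a[:500] + "".join(rng.choice("ACGT") for _ in range(500))

K, NUM_HASHES = 21, 256
set_a, set_b = kmers(seq_a, K), kmers(seq_b, K)
exact = exact_jaccard(set_a, set_b)
est = estimate_jaccard(minhash_signature(set_a, NUM_HASHES),
                       minhash_signature(set_b, NUM_HASHES))
print(f"exact Jaccard = {exact:.3f}, MinHash estimate = {est:.3f}")
```

Assuming ideal random hash functions, each slot matches with probability equal to the Jaccard index J, so the estimate is unbiased with a standard error of roughly sqrt(J(1 - J) / num_hashes); accuracy is set by the sketch size rather than by sequence length, which is what lets such sketches scale to very large archives. Tools such as Mash instead keep the s smallest values of a single hash function (a bottom-s sketch), which needs only one hash evaluation per k-mer.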
Identifiers
pubmed: 32338745
pii: 5825624
doi: 10.1093/nar/gkaa265
pmc: PMC7261164
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
5217-5234
Grants
Agency: NLM NIH HHS
ID: T15 LM007093
Country: United States
Agency: NINDS NIH HHS
ID: R21 NS106640
Country: United States
Copyright information
© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.