Contamination detection in genomic data: more is not enough.

Algorithms; Contamination detection; Corroboration; Databases; Genomics; Review

Journal

Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660

Publication information

Publication date:
21 02 2022
History:
received: 14 10 2021
accepted: 18 01 2022
entrez: 22 2 2022
pubmed: 23 2 2022
medline: 3 5 2022
Status: epublish

Abstract

The decreasing cost of sequencing and concomitant augmentation of publicly available genomes have created an acute need for automated software to assess genomic contamination. During the last 6 years, 18 programs have been published, each with its own strengths and weaknesses. Deciding which tools to use becomes more and more difficult without an understanding of the underlying algorithms. We review these programs, benchmarking six of them, and present their main operating principles. This article is intended to guide researchers in the selection of appropriate tools for specific applications. Finally, we present future challenges in the developing field of contamination detection.

Identifiers

pubmed: 35189924
doi: 10.1186/s13059-022-02619-9
pii: 10.1186/s13059-022-02619-9
pmc: PMC8862208

Publication types

Journal Article; Research Support, Non-U.S. Gov't; Review

Languages

eng

Citation subsets

IM

Pagination

60

Copyright information

© 2022. The Author(s).

Authors

Luc Cornet (L)

BCCM/IHEM, Mycology and Aerobiology, Sciensano, Bruxelles, Belgium.

Denis Baurain (D)

InBioS-PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium. denis.baurain@uliege.be.
