Contamination in Reference Sequence Databases: Time for Divide-and-Rule Tactics.
Keywords
NCBI RefSeq
assembly
contamination
databases
genomes
phylogenomics
sequencing
Journal
Frontiers in microbiology
ISSN: 1664-302X
Abbreviated title: Front Microbiol
Country: Switzerland
NLM ID: 101548977
Publication information
Publication date:
2021
History:
received: 2021-08-07
accepted: 2021-10-04
entrez: 2021-11-08
pubmed: 2021-11-09
medline: 2021-11-09
Status:
epublish
Abstract
Contaminating sequences in public genome databases are a pervasive issue with potentially far-reaching consequences. This problem has attracted much attention in the recent literature, and many different tools are now available to detect contaminants. Although these methods are based on diverse algorithms that can sometimes produce widely different estimates of the contamination level, the majority of genomic studies rely on a single detection method, which carries a risk of systematic error. In this work, we used two orthogonal methods to assess the level of contamination among National Center for Biotechnology Information Reference Sequence Database (RefSeq) bacterial genomes. First, we applied the most popular solution, CheckM, which is based on gene markers. We then complemented this approach with a genome-wide method, termed Physeter, which now implements a
Identifiers
pubmed: 34745061
doi: 10.3389/fmicb.2021.755101
pmc: PMC8570097
Data banks
figshare
10.6084/m9.figshare.13139810
Publication types
Journal Article
Languages
eng
Pagination
755101
Copyright information
Copyright © 2021 Lupo, Van Vlierberghe, Vanderschuren, Kerff, Baurain and Cornet.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.