Mash Screen: high-throughput sequence containment estimation for genome discovery.
Metagenomics
MinHash
Polyomavirus
SRA
Sequencing
Viral Discovery
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date:
05 11 2019
History:
received: 27 02 2019
accepted: 27 09 2019
entrez: 7 11 2019
pubmed: 7 11 2019
medline: 6 2 2020
Status:
epublish
Abstract
The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome.
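As a rough illustration of the distinction the abstract draws between resemblance and containment, the sketch below computes both quantities exactly on small k-mer sets. It is not the Mash Screen implementation, which works on MinHash sketches and streams the reads; the function names, the value of k, and the toy sequences are assumptions chosen only for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def resemblance(a, b):
    """Jaccard index: shared k-mers divided by all k-mers in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of the k-mers of a that also occur in b."""
    return len(a & b) / len(a) if a else 0.0


# Toy data (hypothetical): a short "genome" embedded in a longer "metagenome".
genome = "ACGTACGGTCA"
metagenome = "TTGACC" + genome + "GGATTCCAAGT"
k = 4

g = kmers(genome, k)
m = kmers(metagenome, k)

print("containment of genome in metagenome:", containment(g, m))
print("resemblance of genome and metagenome:", resemblance(g, m))
```

In this toy case the genome is fully present, so its containment is 1.0, while the resemblance is lower because the larger set contributes many k-mers the genome lacks. That size imbalance is why resemblance alone is a poor signal for a genome inside a metagenome.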
Identifiers
pubmed: 31690338
doi: 10.1186/s13059-019-1841-x
pii: 10.1186/s13059-019-1841-x
pmc: PMC6833257
Chemical substances
Proteome
0
Publication types
Journal Article
Research Support, N.I.H., Intramural
Languages
eng
Citation subsets
IM
Pagination
232