Strain level microbial detection and quantification with applications to single cell metagenomics.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
28 10 2022
History:
received: 23 09 2021
accepted: 04 10 2022
pubmed: 29 10 2022
medline: 2 11 2022
entrez: 28 10 2022
Status: epublish

Abstract

Computational identification and quantification of distinct microbes from high-throughput sequencing data is crucial for our understanding of human health. Existing methods use either accurate but computationally expensive alignment-based approaches or fast but less accurate alignment-free approaches, which often fail to correctly assign reads to genomes. Here we introduce CAMMiQ, a combinatorial optimization framework to identify and quantify distinct genomes (specified by a database) in a metagenomic dataset. As a key methodological innovation, CAMMiQ uses substrings of variable length, including those that appear in two genomes in the database, as opposed to the commonly used fixed-length, unique substrings. These substrings make it possible to accurately decouple mixtures of highly similar genomes, resulting in higher accuracy than the leading alternatives without requiring additional computational resources, as demonstrated on commonly used benchmarking datasets. Importantly, we show that CAMMiQ can distinguish closely related bacterial strains in simulated metagenomic and real single-cell metatranscriptomic data.
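
The distinction the abstract draws between fixed-length unique substrings and variable-length substrings shared by at most two genomes can be illustrated with a toy sketch. This is not CAMMiQ's implementation or its optimization step; the function names, the three short sequences, and the length range are all assumptions made up for illustration, and the brute-force enumeration would not scale to real genomes.

```python
from collections import defaultdict


def substring_owners(genomes, min_len, max_len):
    """Map every substring of length min_len..max_len to the genomes containing it."""
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for length in range(min_len, max_len + 1):
            for start in range(len(seq) - length + 1):
                owners[seq[start:start + length]].add(name)
    return owners


def informative_substrings(genomes, min_len, max_len):
    """Split substrings into those found in exactly one genome and in exactly two."""
    unique, doubly_unique = {}, {}
    for sub, names in substring_owners(genomes, min_len, max_len).items():
        if len(names) == 1:
            unique[sub] = next(iter(names))
        elif len(names) == 2:
            doubly_unique[sub] = frozenset(names)
    return unique, doubly_unique


# Hypothetical toy database: two near-identical "strains" and one unrelated genome.
genomes = {
    "strain_A": "ACGTACGGTC",
    "strain_B": "ACGTACGGTA",
    "species_C": "TTGCAAGCTT",
}
unique, doubly_unique = informative_substrings(genomes, min_len=3, max_len=5)
```

In this toy database, strain_B has no substring of length 3 that is unique to it, but it does have one of length 4 (GGTA), while ACG is shared by exactly the two strains. This hints at why restricting the index to a single fixed length, or to strictly unique substrings, can discard signal that separates closely related genomes.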

Identifiers

pubmed: 36307411
doi: 10.1038/s41467-022-33869-7
pii: 10.1038/s41467-022-33869-7
pmc: PMC9616933

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, N.I.H., Intramural

Languages

eng

Citation subsets

IM

Pagination

6430

Grants

Agency: NIAID NIH HHS
ID: R01 AI143254
Country: United States

Copyright information

© 2022. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.

Authors

Kaiyuan Zhu (K)

Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
Department of Computer Science & Engineering, UC San Diego, La Jolla, CA, USA.
Department of Computer Science, Indiana University, Bloomington, IN, USA.

Alejandro A Schäffer (AA)

Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.

Welles Robinson (W)

Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
Surgery Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.

Junyan Xu (J)

Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.

Eytan Ruppin (E)

Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.

A Funda Ergun (AF)

Department of Computer Science, Indiana University, Bloomington, IN, USA.

Yuzhen Ye (Y)

Department of Computer Science, Indiana University, Bloomington, IN, USA.

S Cenk Sahinalp (SC)

Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA. cenk.sahinalp@nih.gov.
Department of Computer Science, Indiana University, Bloomington, IN, USA. cenk.sahinalp@nih.gov.
