isolateR: an R package for generating microbial libraries from Sanger sequencing data.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
11 Jul 2024
History:
received: 30 04 2024
revised: 06 07 2024
accepted: 10 07 2024
medline: 12 7 2024
pubmed: 12 7 2024
entrez: 11 7 2024
Status: aheadofprint

Abstract

Sanger sequencing of taxonomic marker genes (e.g., 16S/18S/ITS/rpoB/cpn60) represents the leading method for identifying a wide range of microorganisms including bacteria, archaea, and fungi. However, the manual processing of sequence data and limitations associated with conventional BLAST searches impede the efficient generation of strain libraries essential for cataloging microbial diversity and discovering novel species. isolateR addresses these challenges by implementing a standardized and scalable three-step pipeline that includes: 1) automated batch processing of Sanger sequence files, 2) taxonomic classification via global alignment to type strain databases in accordance with the latest international nomenclature standards, and 3) straightforward creation of strain libraries and handling of clonal isolates, with the ability to set customizable sequence dereplication thresholds and combine data from multiple sequencing runs into a single library. The tool's user-friendly design also features interactive HTML outputs that simplify data exploration and analysis. Additionally, in silico benchmarking done on two comprehensive human gut genome catalogues (IMGG and Hadza hunter-gatherer populations) showcases the proficiency of isolateR in uncovering and cataloging the nuanced spectrum of microbial diversity, advocating for a more targeted and granular exploration within individual hosts to achieve the highest strain-level resolution possible when generating culture collections. isolateR is available at: https://github.com/bdaisley/isolateR. Supplementary data are available at Bioinformatics online.
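
To make the idea of a customizable dereplication threshold concrete, the following is a minimal Python sketch and not isolateR's implementation or API (isolateR is an R package, and its internals are not described in this record). It assumes a plain Needleman-Wunsch global alignment with illustrative scoring (match +1, mismatch -1, gap -1), defines identity as matching columns divided by alignment length, and groups sequences greedily under the first representative that meets the threshold. The function names `global_identity` and `dereplicate` and the default threshold of 0.99 are hypothetical choices made for illustration.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Identity (matching columns / alignment length) of one optimal
    Needleman-Wunsch global alignment. Scoring values are illustrative."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting columns and exact matches.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        columns += 1
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return matches / columns if columns else 1.0


def dereplicate(seqs, threshold=0.99):
    """Greedy dereplication: each sequence joins the first existing
    representative it matches at >= threshold, else founds a new group.
    `seqs` maps isolate name -> sequence; the 0.99 default is hypothetical."""
    groups = {}
    for name, seq in seqs.items():
        for rep in groups:
            if global_identity(seq, seqs[rep]) >= threshold:
                groups[rep].append(name)
                break
        else:
            groups[name] = [name]
    return groups


if __name__ == "__main__":
    # Toy sequences, far shorter than real marker-gene reads.
    isolates = {
        "A": "ACGTACGTAC",
        "B": "ACGTACGTAC",  # identical to A
        "C": "ACGTTCGTAC",  # one mismatch relative to A
        "D": "TTTTTTTTTT",
    }
    for cutoff in (0.95, 0.85):
        print(cutoff, dereplicate(isolates, threshold=cutoff))
```

Lowering the threshold merges near-identical isolates into one group, while raising it keeps them as separate library entries. Because the grouping is greedy, the result can depend on input order, and a real pipeline may use a different clustering strategy.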

Identifiers

pubmed: 38991828
pii: 7712426
doi: 10.1093/bioinformatics/btae448

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2024. Published by Oxford University Press.

Authors

Brendan Daisley (B)

Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.

Sarah J Vancuren (SJ)

Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.

Dylan J L Brettingham (DJL)

Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.

Jacob Wilde (J)

Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.

Simone Renwick (S)

Larsson-Rosenquist Foundation Mother-Milk-Infant Center of Research Excellence (MOMI CORE), The Human Milk Institute (HMI), University of California, CA 92093, USA.

Christine Macpherson (C)

Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.

David A Good (DA)

Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.

Alexander J Botschner (AJ)

Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.

Sandi Yen (S)

Medical Sciences Division, University of Oxford, Oxford, OX1 2JD, United Kingdom.

Janet E Hill (JE)

Department of Veterinary Microbiology, University of Saskatchewan, Saskatoon, S7N 5B4, Canada.

Matthew T Sorbara (MT)

Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.

Emma Allen-Vercoe (E)

Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.

MeSH classifications