BioKIT: a versatile toolkit for processing and analyzing diverse types of sequence data.

Keywords: bioinformatics; codon; gene-wise relative synonymous codon usage; genetic code; genome assembly quality; multiple sequence alignment

Journal

Genetics
ISSN: 1943-2631
Abbreviated title: Genetics
Country: United States
NLM ID: 0374636

Publication information

Publication date:
4 July 2022
History:
received: 21 February 2022
accepted: 3 May 2022
pubmed: 11 May 2022
medline: 7 July 2022
entrez: 10 May 2022
Status: ppublish

Abstract

Bioinformatic analysis (such as genome assembly quality assessment, alignment summary statistics, relative synonymous codon usage, file format conversion, and sequence processing and analysis) is integrated into diverse disciplines in the biological sciences. Several command-line tools have been developed to conduct some of these individual analyses, but unified toolkits that conduct all of them are lacking. To address this gap, we introduce BioKIT, a versatile command-line toolkit that, at the time of publication, offers 42 functions (several of them community-sourced) for routine and novel processing and analysis of genome assemblies, multiple sequence alignments, coding sequences, sequencing data, and more. To demonstrate the utility of BioKIT, we conducted a comprehensive examination of relative synonymous codon usage across 171 fungal genomes that use alternative genetic codes, showed that the novel metric of gene-wise relative synonymous codon usage can accurately estimate gene-wise codon optimization, evaluated the quality and characteristics of 901 eukaryotic genome assemblies, and calculated alignment summary statistics for 10 phylogenomic data matrices. BioKIT will help facilitate and streamline sequence analysis workflows. BioKIT is freely available under the MIT license from GitHub (https://github.com/JLSteenwyk/BioKIT), PyPi (https://pypi.org/project/jlsteenwyk-biokit/), and the Anaconda Cloud (https://anaconda.org/jlsteenwyk/jlsteenwyk-biokit). Documentation, user tutorials, and instructions for requesting new features are available online (https://jlsteenwyk.com/BioKIT).
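The abstract names relative synonymous codon usage (RSCU) and its gene-wise variant (gw-RSCU) without defining them. The RSCU of a codon is its observed count divided by the count expected if every synonymous codon for that amino acid were used equally often, so values above 1 mark preferred codons. The following is a minimal plain-Python sketch, not BioKIT's implementation or API. The function names are hypothetical. It assumes that gw-RSCU is the mean per-codon RSCU of a gene, with RSCU estimated from all input genes pooled together, and that stop and ambiguous codons are skipped. BioKIT's exact definition may differ in detail.

```python
from collections import Counter, defaultdict
from statistics import mean

# Standard genetic code (NCBI translation table 1), listed in TCAG order.
# Handling an alternative genetic code would mean passing a different table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def iter_codons(seq):
    """Yield successive in-frame codons, ignoring a trailing partial codon."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        yield seq[i:i + 3]


def codon_counts(sequences, table=CODON_TABLE):
    """Count sense codons across sequences; stop and ambiguous codons are skipped."""
    return Counter(
        codon
        for seq in sequences
        for codon in iter_codons(seq)
        if table.get(codon, "*") != "*"  # unknown codons (e.g. NNN) default to "*"
    )


def rscu(counts, table=CODON_TABLE):
    """RSCU = observed count / mean count of all synonymous codons of that amino acid."""
    synonyms = defaultdict(list)
    for codon, aa in table.items():
        if aa != "*":
            synonyms[aa].append(codon)
    values = {}
    for codons in synonyms.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid never observed, so its RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


def gene_wise_rscu(genes, table=CODON_TABLE):
    """Mean RSCU over each gene's codons, with RSCU estimated from all genes pooled."""
    values = rscu(codon_counts(genes.values(), table), table)
    result = {}
    for name, seq in genes.items():
        per_codon = [values[c] for c in iter_codons(seq) if c in values]
        result[name] = mean(per_codon) if per_codon else float("nan")
    return result
```

For example, `gene_wise_rscu({"g1": "ATGGCTGCTGCT", "g2": "ATGGCAGCCGCG"})` gives 1.75 for g1 and 0.75 for g2: g1 uses only GCT, the most frequent alanine codon in the pooled input, so it scores higher. Pooling all genes before computing RSCU is what lets a single gene's score be read against genome-wide codon preferences.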

Identifiers

pubmed: 35536198
pii: 6583183
doi: 10.1093/genetics/iyac079
pmc: PMC9252278

Chemical substances

Codon 0

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Grants

Agency: NIAID NIH HHS
ID: R56 AI146096
Country: United States
Agency: Howard Hughes Medical Institute through the James H. Gilliam Fellowships
Agency: NIAID NIH HHS
ID: R01 AI153356
Country: United States

Copyright information

© The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America. All rights reserved. For permissions, please email: journals.permissions@oup.com.


Authors

Jacob L Steenwyk (JL)

Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA.
Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA.

Thomas J Buida (TJ)

9 City Place #312, Nashville, TN 37209, USA.

Carla Gonçalves (C)

Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA.
Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA.
Associate Laboratory i4HB-Institute for Health and Bioeconomy, NOVA School of Science and Technology, NOVA University Lisbon, 2819-516 Caparica, Portugal.
UCIBIO-Applied Molecular Biosciences Unit, Department of Life Sciences, NOVA School of Science and Technology, NOVA University Lisbon, 2819-516 Caparica, Portugal.

Dayna C Goltz (DC)

2312 Elliston Place #510, Nashville, TN 37203, USA.

Grace Morales (G)

Department of Pathology, Microbiology & Immunology, Center for Personalized Microbiology, Vanderbilt University Medical Center, Nashville, TN 37232, USA.

Matthew E Mead (ME)

Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA.
Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA.

Abigail L LaBella (AL)

Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA.
Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA.

Christina M Chavez (CM)

Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA.
Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA.

Jonathan E Schmitz (JE)

Department of Pathology, Microbiology & Immunology, Center for Personalized Microbiology, Vanderbilt University Medical Center, Nashville, TN 37232, USA.

Maria Hadjifrangiskou (M)

Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA.
Department of Pathology, Microbiology & Immunology, Center for Personalized Microbiology, Vanderbilt University Medical Center, Nashville, TN 37232, USA.

Yuanning Li (Y)

Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA.

Antonis Rokas (A)

Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA.
Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA.
