PyIR: a scalable wrapper for processing billions of immunoglobulin and T cell receptor sequences using IgBLAST.


Journal

BMC Bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
16 Jul 2020
History:
received: 23 Feb 2020
accepted: 09 Jul 2020
entrez: 18 Jul 2020
pubmed: 18 Jul 2020
medline: 22 Aug 2020
Status: epublish

Abstract

Recent advances in DNA sequencing technologies have greatly increased the capacity to generate large volumes of DNA sequence data, which has spurred rapid growth in the use of bioinformatics to interrogate antibody variable gene repertoires. Common tools for annotating antibody sequences are often limited in functionality, modularity and usability. We have developed PyIR, a Python wrapper and library for IgBLAST that offers a command-line interface (CLI) and API requiring minimal setup, FASTQ support, file chunking for large sequence files, JSON and Python dictionary output, and built-in sequence filtering. PyIR processes sequences faster than multithreaded IgBLAST (version 1.14) when spawning more than 16 processes on a single computer system. Its customizable filtering and data encapsulation allow it to be adapted to a wide range of computing environments, and its API allows IgBLAST to be used in customized bioinformatics workflows.
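The general chunk-annotate-filter pattern the abstract describes (split a large sequence file into chunks, annotate the chunks in separate processes, then filter the per-sequence dictionaries) can be illustrated with a minimal Python sketch. This is not PyIR's code or API: `annotate_chunk` is a hypothetical placeholder for running IgBLAST on one chunk, and the filter criteria, chunk size and process count are illustrative assumptions.

```python
import json
from multiprocessing import Pool
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (sequence ID, nucleotide sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(parts)
            seq_id, parts = line[1:].strip(), []
        else:
            parts.append(line)
    if seq_id is not None:
        yield seq_id, "".join(parts)


def chunk(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most `size`; each list is one work unit."""
    batch: List[Record] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def annotate_chunk(batch: List[Record]) -> List[dict]:
    """HYPOTHETICAL stand-in for annotating one chunk with IgBLAST.

    A real worker would write the chunk to a temporary file and run igblastn
    on it; here we only compute placeholder fields per sequence.
    """
    return [
        {"sequence_id": sid, "length": len(seq), "has_ambiguous": "N" in seq.upper()}
        for sid, seq in batch
    ]


def passes_filter(rec: dict, min_length: int = 100) -> bool:
    """Illustrative filter (assumed criteria): drop short or ambiguous reads."""
    return rec["length"] >= min_length and not rec["has_ambiguous"]


def process(lines: Iterable[str], chunk_size: int = 1000,
            processes: int = 4, min_length: int = 100) -> List[dict]:
    """Chunk the input, annotate chunks (in parallel if processes > 1), then filter."""
    chunks = chunk(read_fasta(lines), chunk_size)
    if processes == 1:
        # Sequential fallback: no worker processes are spawned.
        return [r for b in map(annotate_chunk, chunks) for r in b
                if passes_filter(r, min_length)]
    with Pool(processes=processes) as pool:
        # imap streams chunks to workers and returns results in input order.
        results = pool.imap(annotate_chunk, chunks)
        return [r for b in results for r in b if passes_filter(r, min_length)]


if __name__ == "__main__":
    demo = [">seq1", "ACGT" * 40, ">seq2", "ACGN" * 40, ">seq3", "ACG"]
    for rec in process(demo, chunk_size=2, processes=2):
        print(json.dumps(rec))  # one JSON object per line
```

Here only `seq1` survives the filter: `seq2` contains an ambiguous base and `seq3` is shorter than the assumed 100-nt minimum. Emitting one JSON object per line keeps output streamable, which matters when results from billions of sequences cannot be held in memory at once.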


Identifiers

pubmed: 32677886
doi: 10.1186/s12859-020-03649-5
pii: 10.1186/s12859-020-03649-5
pmc: PMC7364545

Chemical substances

Immunoglobulins 0
Receptors, Antigen, T-Cell 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

314

Grants

Agency: NLM NIH HHS
ID: T15 LM007450
Country: United States
Agency: NIAID NIH HHS
ID: U19 AI117905
Country: United States
Agency: NIAID NIH HHS
ID: HHSN272201400024C
Country: United States
Agency: Human Vaccines Project
ID: NA


Authors

Cinque Soto (C)

Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.

Jessica A Finn (JA)

Department of Pathology, Microbiology, and Immunology, Vanderbilt University, Nashville, TN, 37232, USA.

Jordan R Willis (JR)

Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.

Samuel B Day (SB)

Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.

Robert S Sinkovits (RS)

San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA, 92093, USA.

Taylor Jones (T)

Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.

Samuel Schmitz (S)

Department of Chemistry, Vanderbilt University, Nashville, TN, 37212, USA.

Jens Meiler (J)

Department of Chemistry, Vanderbilt University, Nashville, TN, 37212, USA.

Andre Branchizio (A)

Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.

James E Crowe (JE)

Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA. james.crowe@vumc.org.
Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA. james.crowe@vumc.org.
Department of Pathology, Microbiology, and Immunology, Vanderbilt University, Nashville, TN, 37232, USA. james.crowe@vumc.org.
