PyIR: a scalable wrapper for processing billions of immunoglobulin and T cell receptor sequences using IgBLAST.
Antibody
CDR3
IgBLAST
Illumina
Immune repertoires
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
16 Jul 2020
History:
received: 23 Feb 2020
accepted: 9 Jul 2020
entrez: 18 Jul 2020
pubmed: 18 Jul 2020
medline: 22 Aug 2020
Status:
epublish
Abstract
Recent advances in DNA sequencing technologies have enabled significant leaps in capacity to generate large volumes of DNA sequence data, which has spurred a rapid growth in the use of bioinformatics as a means of interrogating antibody variable gene repertoires. Common tools used for annotation of antibody sequences are often limited in functionality, modularity and usability. We have developed PyIR, a Python wrapper and library for IgBLAST, which offers a minimal setup CLI and API, FASTQ support, file chunking for large sequence files, JSON and Python dictionary output, and built-in sequence filtering. PyIR offers improved processing speed over multithreaded IgBLAST (version 1.14) when spawning more than 16 processes on a single computer system. Its customizable filtering and data encapsulation allow it to be adapted to a wide range of computing environments. The API allows for IgBLAST to be used in customized bioinformatics workflows.
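The abstract mentions file chunking, built-in filtering, and JSON output. The following is only a minimal sketch of that general pattern, not PyIR's actual implementation or API. The function names (`read_fasta`, `chunked`, `annotate_chunk`, `process`) are hypothetical, `annotate_chunk` is a placeholder for a real IgBLAST call, and a thread pool stands in for the worker processes described in the abstract.

```python
import io
import json
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def chunked(records, size):
    """Group an iterable of records into lists of at most `size` items."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def annotate_chunk(chunk):
    """Hypothetical placeholder for annotating one chunk (e.g. with IgBLAST)."""
    return [{"id": header, "length": len(seq)} for header, seq in chunk]


def process(handle, chunk_size=2, workers=2, min_length=5):
    """Annotate chunks concurrently, filter by length, return JSON strings."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the input chunks.
        results = pool.map(annotate_chunk, chunked(read_fasta(handle), chunk_size))
        return [
            json.dumps(record, sort_keys=True)
            for chunk in results
            for record in chunk
            if record["length"] >= min_length  # simple stand-in for sequence filtering
        ]


if __name__ == "__main__":
    example = io.StringIO(">a\nACGTAC\n>b\nACG\n>c\nACGT\nACGT\n")
    for line in process(example):
        print(line)
```

Splitting the input into fixed-size chunks lets each worker handle an independent slice of the file, which is the property that allows throughput to scale with the number of workers.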
Identifiers
pubmed: 32677886
doi: 10.1186/s12859-020-03649-5
pii: 10.1186/s12859-020-03649-5
pmc: PMC7364545
Chemical substances
Immunoglobulins
0
Receptors, Antigen, T-Cell
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
314
Grants
Agency: NLM NIH HHS
ID: T15 LM007450
Country: United States
Agency: NIAID NIH HHS
ID: U19 AI117905
Country: United States
Agency: NIAID NIH HHS
ID: HHSN272201400024C
Country: United States
Agency: Human Vaccines Project
ID: NA