PxBLAT: an efficient python binding library for BLAT.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
19 Jun 2024
History:
received: 05 03 2024
accepted: 13 06 2024
medline: 20 6 2024
pubmed: 20 6 2024
entrez: 19 6 2024
Status: epublish

Abstract

BACKGROUND
With the surge in genomic data driven by advancements in sequencing technologies, the demand for efficient bioinformatics tools for sequence analysis has become paramount. BLAST-like alignment tool (BLAT), a sequence alignment tool, faces limitations in performance efficiency and integration with modern programming environments, particularly Python. This study introduces PxBLAT, a Python-based framework designed to enhance the capabilities of BLAT, focusing on usability, computational efficiency, and seamless integration within the Python ecosystem.
RESULTS
PxBLAT demonstrates significant improvements over BLAT in execution speed and data handling, as evidenced by comprehensive benchmarks conducted across various sample groups ranging from 50 to 600 samples. These experiments highlight a notable speedup, reducing execution time compared to BLAT. The framework also introduces user-friendly features such as improved server management, data conversion utilities, and shell completion, enhancing the overall user experience. Additionally, the provision of extensive documentation and comprehensive testing supports community engagement and facilitates the adoption of PxBLAT.
CONCLUSIONS
PxBLAT stands out as a robust alternative to BLAT, offering performance and user interaction enhancements. Its development underscores the potential for modern programming languages to improve bioinformatics tools, aligning with the needs of contemporary genomic research. By providing a more efficient, user-friendly tool, PxBLAT has the potential to impact genomic data analysis workflows, supporting faster and more accurate sequence analysis in a Python environment.

Identifiers

pubmed: 38898394
doi: 10.1186/s12859-024-05844-0
pii: 10.1186/s12859-024-05844-0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

219

Grants

Agency: US National Institute of General Medical Sciences
ID: R35GM142441

Copyright information

© 2024. The Author(s).

Authors

Yangyang Li (Y)

Department of Urology, Northwestern University Feinberg School of Medicine, 303 E Superior St, Chicago, IL, 60611, USA.

Rendong Yang (R)

Department of Urology, Northwestern University Feinberg School of Medicine, 303 E Superior St, Chicago, IL, 60611, USA. rendong.yang@northwestern.edu.
Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, 675 N St Clair St, Chicago, IL, 60611, USA. rendong.yang@northwestern.edu.
