Optimal gap-affine alignment in O(s) space.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Titre abrégé: Bioinformatics
Pays: England
ID NLM: 9808944

Informations de publication

Date de publication:
03 02 2023
Historique:
received: 17 08 2022
revised: 02 01 2023
pubmed: 8 2 2023
medline: 25 2 2023
entrez: 7 2 2023
Statut: ppublish

Résumé

Pairwise sequence alignment remains a fundamental problem in computational biology and bioinformatics. Recent advances in genomics and sequencing technologies demand faster and scalable algorithms that can cope with the ever-increasing sequence lengths. Classical pairwise alignment algorithms based on dynamic programming are strongly limited by quadratic requirements in time and memory. The recently proposed wavefront alignment algorithm (WFA) introduced an efficient algorithm to perform exact gap-affine alignment in O(ns) time, where s is the optimal score and n is the sequence length. Notwithstanding these bounds, WFA's O(s2) memory requirements become computationally impractical for genome-scale alignments, leading to a need for further improvement. In this article, we present the bidirectional WFA algorithm, the first gap-affine algorithm capable of computing optimal alignments in O(s) memory while retaining WFA's time complexity of O(ns). As a result, this work improves the lowest known memory bound O(n) to compute gap-affine alignments. In practice, our implementation never requires more than a few hundred MBs aligning noisy Oxford Nanopore Technologies reads up to 1 Mbp long while maintaining competitive execution times. All code is publicly available at https://github.com/smarco/BiWFA-paper. Supplementary data are available at Bioinformatics online.

Identifiants

pubmed: 36749013
pii: 7030690
doi: 10.1093/bioinformatics/btad074
pmc: PMC9940620
pii:
doi:

Types de publication

Journal Article Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov't

Langues

eng

Sous-ensembles de citation

IM

Subventions

Organisme : European Union
Organisme : Ministerio de Ciencia e Innovacion
Organisme : NHGRI NIH HHS
ID : R01 HG010485
Pays : United States
Organisme : NHGRI NIH HHS
ID : U01 HG010961
Pays : United States
Organisme : NIH HHS
ID : OT2 OD026682
Pays : United States
Organisme : NHLBI NIH HHS
ID : OT3 HL142481
Pays : United States
Organisme : NHGRI NIH HHS
ID : U24 HG011853
Pays : United States

Informations de copyright

© The Author(s) 2023. Published by Oxford University Press.

Auteurs

Santiago Marco-Sola (S)

Computer Sciences Department, Barcelona Supercomputing Center, Barcelona 08034, Spain.
Departament d'Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona 08193, Spain.

Jordan M Eizenga (JM)

Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.

Andrea Guarracino (A)

Genomics Research Centre, Human Technopole, Milan 20157, Italy.
Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA.

Benedict Paten (B)

Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.

Erik Garrison (E)

Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA.

Miquel Moreto (M)

Computer Sciences Department, Barcelona Supercomputing Center, Barcelona 08034, Spain.
Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona 08034, Spain.

Articles similaires

Genome, Chloroplast Phylogeny Genetic Markers Base Composition High-Throughput Nucleotide Sequencing

Selecting optimal software code descriptors-The case of Java.

Yegor Bugayenko, Zamira Kholmatova, Artem Kruglov et al.
1.00
Software Algorithms Programming Languages

Exploring blood-brain barrier passage using atomic weighted vector and machine learning.

Yoan Martínez-López, Paulina Phoobane, Yanaima Jauriga et al.
1.00
Blood-Brain Barrier Machine Learning Humans Support Vector Machine Software

Classifications MeSH