FMAlign2: a novel fast multiple nucleotide sequence alignment method for ultralong datasets.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
10 Jan 2024
History:
received: 29 07 2023
revised: 27 12 2023
accepted: 09 01 2024
medline: 11 1 2024
pubmed: 11 1 2024
entrez: 10 1 2024
Status: aheadofprint

Abstract

In bioinformatics, multiple sequence alignment (MSA) is a crucial task. However, conventional methods often struggle to align ultralong sequences. To address this issue, researchers have designed MSA methods rooted in a vertical division strategy, which segments sequence data for parallel alignment. A prime example of this approach is FMAlign, which uses the FM-index to extract common seeds and segment the sequences accordingly. FMAlign2 instead uses the suffix array to identify maximal exact matches, shifting FMAlign's approach from searching for global chains to searching for partial chains. With the vertical division strategy, a large-scale problem is decomposed into manageable tasks, enabling parallel execution of the subMSA tasks. Sequence-profile alignment and refinement are then used to concatenate the subsets into the final result. Compared to FMAlign, FMAlign2 markedly improves the segmentation of sequences and significantly reduces running time while maintaining accuracy, especially on ultralong datasets. Importantly, FMAlign2 enhances existing MSA methods by enabling them to handle sequences reaching billions in length within an acceptable time frame. Source code and datasets are available at https://github.com/malabz/FMAlign2 and https://zenodo.org/records/10435770. Contact: pingluzhang@outlook.com. Supplementary data are available at Bioinformatics online.
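
The vertical division idea described in the abstract can be illustrated with a toy sketch. This is not the FMAlign2 implementation: it handles only two sequences, builds the suffix array naively, and uses a single longest exact match per step as the anchor rather than chaining maximal exact matches across many sequences. The function names (`suffix_array`, `longest_exact_match`, `vertical_split`) and the `min_len` threshold are hypothetical and introduced only for illustration.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; suitable for toy inputs only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a, b):
    # Length of the longest common prefix of two strings.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_exact_match(s1, s2):
    """Return (start_in_s1, start_in_s2, length) of a longest common substring."""
    sep = "#"  # assumption: the separator never occurs in either sequence
    text = s1 + sep + s2
    sa = suffix_array(text)
    best = (0, 0, 0)
    # A longest cross-sequence match always shows up between two suffixes
    # that are adjacent in the suffix array and start in different sequences.
    for i, j in zip(sa, sa[1:]):
        if (i < len(s1)) == (j < len(s1)):
            continue  # both suffixes start in the same sequence
        n = common_prefix_len(text[i:], text[j:])
        if n > best[2]:
            a, b = min(i, j), max(i, j) - len(s1) - 1
            best = (a, b, n)
    return best


def vertical_split(s1, s2, min_len=3):
    """Recursively cut both sequences at exact-match anchors.

    Returns a list of (block_of_s1, block_of_s2) pairs in left-to-right order.
    Anchor blocks are identical in both sequences; the remaining blocks are
    the independent sub-problems a real aligner could process in parallel.
    """
    a, b, n = longest_exact_match(s1, s2)
    if n < min_len:
        return [(s1, s2)]  # no usable anchor: leave this block to a sub-aligner
    left = vertical_split(s1[:a], s2[:b], min_len)
    right = vertical_split(s1[a + n:], s2[b + n:], min_len)
    return left + [(s1[a:a + n], s2[b:b + n])] + right


if __name__ == "__main__":
    for block in vertical_split("ACGTTTGACCA", "ACGAATGACCT"):
        print(block)
```

Concatenating the first elements of the returned blocks reproduces the first sequence, and likewise for the second, so the split loses no data; only the non-anchor blocks would still need alignment.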

Identifiers

pubmed: 38200554
pii: 7515251
doi: 10.1093/bioinformatics/btae014

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2024. Published by Oxford University Press.

Authors

Pinglu Zhang (P)

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China.
Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China.

Huan Liu (H)

School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China.

Yanming Wei (Y)

School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China.

Yixiao Zhai (Y)

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China.
Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China.

Qinzhong Tian (Q)

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China.
Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China.

Quan Zou (Q)

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China.
Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China.

MeSH classifications