Sequence Flow: interactive web application for visualizing partial order alignments.


Journal

BMC Genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258

Publication information

Publication date:
16 Oct 2024
History:
received: 09 Jul 2024
accepted: 09 Oct 2024
medline: 17 Oct 2024
pubmed: 17 Oct 2024
entrez: 16 Oct 2024
Status: epublish

Abstract

BACKGROUND
Multiple sequence alignment (MSA) has proven extremely useful in computational biology, especially for inferring evolutionary relationships through phylogenetic analysis and for providing insight into protein structure and function. An alternative to the standard MSA model is partial order alignment (POA), in which aligned sequences are represented as paths in a graph rather than as rows in a matrix. Although the POA model has proven useful in several applications (e.g. sequencing read assembly and pangenome structure exploration), efficient visualization tools that could highlight its advantages are lacking.
RESULTS
We propose Sequence Flow, a web application designed to address this problem. Sequence Flow presents the POA as a Sankey diagram, a kind of graph visualization typically used for graphs representing flowcharts. Sequence Flow enables interactive alignment exploration, including fragment selection, highlighting of a selected group of sequences, repositioning of graph nodes, and structure simplification. After adjustment, the visualization can be saved as a high-quality graphic file. Because it is built on SanKEY.js, a JavaScript library for creating Sankey diagrams that was designed specifically to visualize POAs, Sequence Flow provides satisfactory performance even with large alignments.
CONCLUSIONS
We provide Sankey diagram-based POA visualization tools for both end users (Sequence Flow) and bioinformatics software developers (SanKEY.js). The Sequence Flow web service is available at https://sequenceflow.mimuw.edu.pl/ . The source code for SanKEY.js is available at https://github.com/Krzysiekzd/SanKEY.js and for Sequence Flow at https://github.com/Krzysiekzd/SequenceFlow .
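
To make the contrast between the matrix and graph models more concrete, the following minimal Python sketch converts a toy gapped alignment into a POA-style graph. It assumes one node per (column, residue) pair, treats each sequence as a path through those nodes, and counts how many sequences traverse each edge, which is one natural choice for the width of a Sankey link. The function name `msa_to_poa` and the toy sequences are illustrative assumptions; this is not the code used by Sequence Flow or SanKEY.js.

```python
from collections import defaultdict


def msa_to_poa(rows):
    """Convert gapped alignment rows into a POA-style graph.

    Nodes are (column, residue) pairs and each sequence is a path through
    them. Returns (paths, edge_weights), where edge_weights maps each edge
    to the number of sequences that traverse it.
    """
    paths = {}
    edge_weights = defaultdict(int)
    for name, row in rows.items():
        # A gap means the sequence bypasses that column entirely.
        path = [(col, res) for col, res in enumerate(row) if res != "-"]
        paths[name] = path
        # Consecutive nodes on a path form a directed edge.
        for src, dst in zip(path, path[1:]):
            edge_weights[(src, dst)] += 1
    return paths, dict(edge_weights)


# Hypothetical toy alignment (not data from the paper).
toy_msa = {
    "seq1": "ACGT",
    "seq2": "A-GT",
    "seq3": "ATGT",
}

paths, weights = msa_to_poa(toy_msa)
for (src, dst), count in sorted(weights.items()):
    print(f"{src[1]}{src[0]} -> {dst[1]}{dst[0]}: {count} sequence(s)")
```

In this toy case the three sequences diverge at the second column and rejoin at the third, so the shared final edge carries all three sequences while each branch carries one. A Sankey rendering would draw the shared edge as the widest link.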

Identifiers

pubmed: 39415087
doi: 10.1186/s12864-024-10886-y
pii: 10.1186/s12864-024-10886-y

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

973

Grants

Agency: Polish Ministry of Science and Higher Education
ID: 01/IDUB/2019/04

Copyright information

© 2024. The Author(s).

Authors

Krzysztof Zdąbłasz (K)

Institute of Informatics, University of Warsaw, Banacha 2, Warszawa, 02-097, Poland.

Anna Lisiecka (A)

Institute of Informatics, University of Warsaw, Banacha 2, Warszawa, 02-097, Poland.

Norbert Dojer (N)

Institute of Informatics, University of Warsaw, Banacha 2, Warszawa, 02-097, Poland. dojer@mimuw.edu.pl.
