Sequence Flow: interactive web application for visualizing partial order alignments.
Multiple sequence alignment
Partial order alignment
Sankey diagram
Webserver
Journal
BMC Genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258
Publication information
Publication date:
16 Oct 2024
History:
received: 9 Jul 2024
accepted: 9 Oct 2024
medline: 17 Oct 2024
pubmed: 17 Oct 2024
entrez: 16 Oct 2024
Status:
epublish
Abstract
BACKGROUND
Multiple sequence alignment (MSA) has proven extremely useful in computational biology, especially for inferring evolutionary relationships through phylogenetic analysis and for providing insight into protein structure and function. An alternative to the standard MSA model is partial order alignment (POA), in which aligned sequences are represented as paths in a graph rather than as rows in a matrix. Although the POA model has proven useful in several applications (e.g., sequencing read assembly and pangenome structure exploration), efficient visualization tools that could highlight its advantages are lacking.
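To make the "paths in a graph rather than rows in a matrix" idea concrete, here is a minimal Python sketch that collapses an existing MSA into a POA-style graph. It is an illustration only, not the authors' code: the function name `msa_to_poa`, the node encoding as `(column, residue)` pairs, and the toy sequences are all assumptions made for this example. A real POA is built by aligning sequences to the graph with dynamic programming; this sketch only converts an alignment that is already given.

```python
from collections import defaultdict


def msa_to_poa(msa):
    """Collapse an MSA (name -> gapped row) into a POA-style graph.

    Hypothetical helper: nodes are (column, residue) pairs, so identical
    residues in the same column share one node, and each sequence
    becomes a path through those nodes.
    """
    if len({len(row) for row in msa.values()}) != 1:
        raise ValueError("all MSA rows must have the same length")

    paths = {}
    for name, row in msa.items():
        # Gaps produce no node: the path simply skips that column.
        paths[name] = [(col, ch) for col, ch in enumerate(row) if ch != "-"]

    nodes = sorted({node for path in paths.values() for node in path})

    # Each edge remembers which sequences traverse it.
    edges = defaultdict(set)
    for name, path in paths.items():
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)].add(name)
    return nodes, dict(edges), paths


# Toy alignment (made up for illustration).
msa = {
    "seq1": "ACG-T",
    "seq2": "AC-TT",
    "seq3": "A-GTT",
}
nodes, edges, paths = msa_to_poa(msa)
```

In this toy case the three rows share the node `(0, "A")` and then diverge and rejoin, which is exactly the branching structure that a matrix view hides behind gap characters.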
RESULTS
We propose Sequence Flow, a web application designed to fill this gap. Sequence Flow presents the POA as a Sankey diagram, a type of graph visualization typically used for graphs representing flowcharts. It supports interactive alignment exploration, including fragment selection, highlighting of a selected group of sequences, repositioning of graph nodes, and structure simplification. After adjustment, the visualization can be saved as a high-quality graphics file. Because it is built on SanKEY.js, a JavaScript library for creating Sankey diagrams that was designed specifically to visualize POAs, Sequence Flow provides satisfactory performance even with large alignments.
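As a rough illustration of how a POA maps onto a Sankey diagram, the Python sketch below derives weighted links from sequence paths, where each link's width is the number of sequences passing through it. This is an assumption-laden sketch, not the Sequence Flow implementation: the helper names `sankey_links` and `highlight`, the string node labels, and the `source`/`target`/`value` field names are illustrative and are not the SanKEY.js input format, which the abstract does not describe.

```python
from collections import Counter


def sankey_links(paths):
    """Return one link per graph edge, weighted by the number of
    sequences that traverse it (hypothetical helper)."""
    flow = Counter()
    for path in paths.values():
        # Consecutive node pairs along a path are the edges it uses.
        flow.update(zip(path, path[1:]))
    return [
        {"source": src, "target": dst, "value": count}
        for (src, dst), count in sorted(flow.items())
    ]


def highlight(paths, selected):
    """Return the set of edges used by at least one selected sequence."""
    return {
        edge
        for name in selected
        for edge in zip(paths[name], paths[name][1:])
    }


# Toy paths (made up): node labels combine residue and column index.
paths = {
    "seq1": ["A0", "C1", "G2", "T4"],
    "seq2": ["A0", "C1", "T3", "T4"],
    "seq3": ["A0", "G2", "T3", "T4"],
}
links = sankey_links(paths)
seq3_edges = highlight(paths, ["seq3"])
```

Highlighting a group of sequences then amounts to marking the links in the returned edge set, while the link weights stay tied to the full alignment.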
CONCLUSIONS
We provide Sankey diagram-based POA visualization tools for both end users (Sequence Flow) and bioinformatics software developers (SanKEY.js). The Sequence Flow web service is available at https://sequenceflow.mimuw.edu.pl/. The source code for SanKEY.js is available at https://github.com/Krzysiekzd/SanKEY.js and for Sequence Flow at https://github.com/Krzysiekzd/SequenceFlow.
Identifiers
pubmed: 39415087
doi: 10.1186/s12864-024-10886-y
pii: 10.1186/s12864-024-10886-y
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
973
Grants
Agency: Polish Ministry of Science and Higher Education
ID: 01/IDUB/2019/04
Copyright information
© 2024. The Author(s).