Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements.

alignment-free; complexity; data compression; genome comparison; genome duplication; genomic rearrangement; high-throughput sequencing; information theory; probabilistic-algorithmic model; visualization

Journal

GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872

Publication information

Publication date:
1 May 2020
History:
received: 11 January 2020
revised: 6 April 2020
accepted: 20 April 2020
entrez: 21 May 2020
pubmed: 21 May 2020
medline: 5 October 2021
Status: ppublish

Abstract

The development of high-throughput sequencing technologies, and the resulting production of huge volumes of genomic data, has accelerated biological and medical research and discovery. The study of genomic rearrangements is crucial owing to their role in chromosomal evolution, genetic disorders, and cancer. We present Smash++, an alignment-free and memory-efficient tool to find and visualize small- and large-scale genomic rearrangements between 2 DNA sequences. This computational solution extracts the information content of the 2 sequences, exploiting a data compression technique to find rearrangements. We also present Smash++ visualizer, a tool that visualizes the detected rearrangements along with their self- and relative complexity by generating an SVG (Scalable Vector Graphics) image. Tested on several synthetic and real DNA sequences from bacteria, fungi, Aves, and Mammalia, the proposed tool was able to accurately find genomic rearrangements. The detected regions were in accordance with previous studies, which took alignment-based approaches or performed FISH (fluorescence in situ hybridization) analysis. The maximum peak memory usage among all experiments was ∼1 GB, which makes Smash++ feasible to run on present-day standard computers.
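
The compression-based idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the Smash++ implementation: it assumes a single fixed-order context model with additive smoothing, sequences containing only A, C, G and T, and a plain moving-average threshold. All function names, parameters (k, alpha, window, threshold) and default values are hypothetical choices made for illustration. The sketch learns base frequencies from a reference sequence, scores each base of a target sequence by its information content in bits, and reports target intervals that the reference predicts well, which are candidates for shared (possibly rearranged) regions.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: sequences contain only these four symbols


def build_model(reference, k):
    """Count how often each base follows each k-mer context in the reference."""
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    for i in range(k, len(reference)):
        counts[reference[i - k:i]][reference[i]] += 1
    return counts


def information_profile(target, counts, k, alpha=1.0):
    """Per-base information content (bits) of the target under the reference model.

    Entry j of the result describes target position j + k.
    """
    profile = []
    for i in range(k, len(target)):
        ctx = counts.get(target[i - k:i])  # None if the context was never seen
        seen = ctx[target[i]] if ctx else 0
        total = sum(ctx.values()) if ctx else 0
        # Additive (Laplace-style) smoothing keeps every probability above zero.
        p = (seen + alpha) / (total + alpha * len(ALPHABET))
        profile.append(-math.log2(p))
    return profile


def low_information_regions(profile, k, window, threshold):
    """Return half-open (start, end) target intervals whose windowed mean
    information content is below the threshold."""
    regions, start = [], None
    running = sum(profile[:window])
    for j in range(len(profile) - window + 1):
        if j > 0:
            # Slide the window one step: add the new entry, drop the old one.
            running += profile[j + window - 1] - profile[j - 1]
        below = running / window < threshold
        if below and start is None:
            start = j
        elif not below and start is not None:
            regions.append((start + k, j - 1 + window + k))
            start = None
    if start is not None:
        regions.append((start + k, len(profile) + k))
    return regions
```

A typical call would be `low_information_regions(information_profile(target, build_model(reference, k), k), k, window, threshold)`. Detecting inverted repeats would additionally require scoring the target against the reverse complement of the reference, which this sketch leaves out.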

Identifiers

pubmed: 32432328
pii: 5841055
doi: 10.1093/gigascience/giaa048
pmc: PMC7238676

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2020. Published by Oxford University Press.

Authors

Morteza Hosseini (M)

IEETA/DETI, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal.

Diogo Pratas (D)

IEETA/DETI, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal.
Department of Virology, University of Helsinki, Haartmaninkatu 3, 00014 Helsinki, Finland.

Burkhard Morgenstern (B)

Department of Bioinformatics, University of Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany.
Göttingen Center of Molecular Biosciences (GZMB), Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.

Armando J Pinho (AJ)

IEETA/DETI, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal.
