Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements.
Keywords
alignment-free
complexity
data compression
genome comparison
genome duplication
genomic rearrangement
high-throughput sequencing
information theory
probabilistic-algorithmic model
visualization
Journal
GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872
Publication information
Publication date:
1 May 2020
History:
received: 11 January 2020
revised: 6 April 2020
accepted: 20 April 2020
entrez: 21 May 2020
pubmed: 21 May 2020
medline: 5 October 2021
Status:
ppublish
Abstract
The development of high-throughput sequencing technologies, and the resulting production of huge volumes of genomic data, has accelerated biological and medical research and discovery. The study of genomic rearrangements is crucial owing to their role in chromosomal evolution, genetic disorders, and cancer. We present Smash++, an alignment-free and memory-efficient tool to find and visualize small- and large-scale genomic rearrangements between 2 DNA sequences. This computational solution extracts the information content of the 2 sequences, exploiting a data compression technique to find rearrangements. We also present Smash++ visualizer, a tool that visualizes the detected rearrangements, along with their self- and relative complexity, by generating an SVG (Scalable Vector Graphics) image. Tested on several synthetic and real DNA sequences from bacteria, fungi, Aves, and Mammalia, the proposed tool accurately found genomic rearrangements. The detected regions were consistent with previous studies that used alignment-based approaches or FISH (fluorescence in situ hybridization) analysis. The maximum peak memory usage across all experiments was ∼1 GB, which makes Smash++ feasible to run on present-day standard computers.
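The abstract states only that Smash++ extracts the information content of the 2 sequences using a data compression technique. The following is a minimal sketch of that general idea, not the Smash++ implementation, and it rests on stated assumptions. It trains an order-k finite-context model on both strands of a reference (so that inverted copies are also predictable) and applies an additive (alpha) probability estimator. It then measures the per-base information content, -log2 P, of the target, smooths it with a trailing moving average, and reports target intervals whose smoothed value stays below a bit threshold. All function names and parameter values (`k`, `window`, `threshold`, `min_len`, `alpha`) are hypothetical and illustrative, not Smash++ options or defaults.

```python
import math
import random
from collections import defaultdict

# Assumption: sequences are uppercase and contain only A, C, G, T.
ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def train_fcm(ref: str, k: int) -> dict:
    """Order-k finite-context model: base counts per k-mer context.

    Both strands are counted so reverse-complement (inverted) copies
    in the target are also well predicted.
    """
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for strand in (ref, reverse_complement(ref)):
        for i in range(k, len(strand)):
            counts[strand[i - k:i]][INDEX[strand[i]]] += 1
    return counts


def information_profile(tar: str, counts: dict, k: int, alpha: float = 1 / 16) -> list:
    """Bits -log2 P(x_i | previous k bases) for each target base after the first k."""
    profile = []
    for i in range(k, len(tar)):
        c = counts.get(tar[i - k:i], [0, 0, 0, 0])  # .get does not insert new keys
        # Additive estimator: an unseen context gives P = 1/4, i.e. 2 bits per base.
        p = (c[INDEX[tar[i]]] + alpha) / (sum(c) + len(ALPHABET) * alpha)
        profile.append(-math.log2(p))
    return profile


def moving_average(values: list, window: int) -> list:
    """Trailing moving average (uses a shorter window at the start)."""
    out, acc = [], 0.0
    for i, v in enumerate(values):
        acc += v
        if i >= window:
            acc -= values[i - window]
        out.append(acc / min(i + 1, window))
    return out


def find_similar_regions(ref, tar, k=12, window=50, threshold=1.0, min_len=100):
    """Target intervals [start, end) well predicted by ref (low information content)."""
    smoothed = moving_average(information_profile(tar, train_fcm(ref, k), k), window)
    regions, start = [], None
    # An infinite sentinel closes a run that reaches the end of the target.
    for j, bits in enumerate(smoothed + [float("inf")]):
        if bits < threshold and start is None:
            start = j
        elif bits >= threshold and start is not None:
            if j - start >= min_len:
                regions.append((start + k, j + k))  # shift profile index to target position
            start = None
    return regions


def random_dna(n: int, rng: random.Random) -> str:
    return "".join(rng.choice(ALPHABET) for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(1)
    ref = random_dna(5000, rng)
    # Hypothetical synthetic target: an inverted copy of ref[2000:3000] between random flanks.
    tar = random_dna(1000, rng) + reverse_complement(ref[2000:3000]) + random_dna(1000, rng)
    print(find_similar_regions(ref, tar))
```

A DNA base carries at most 2 bits, so stretches that fall well below 2 bits per base are ones the reference model predicts well, whether they are direct or inverted copies. Because the moving average is trailing, reported boundaries shift by up to about `window` bases.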
Identifiers
pubmed: 32432328
pii: 5841055
doi: 10.1093/gigascience/giaa048
pmc: PMC7238676
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Language
eng
Citation subsets
IM
Copyright information
© The Author(s) 2020. Published by Oxford University Press.