Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs.
Gene breakpoints
Genome
Mitochondria
de-Bruijn graph
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
05 Jun 2023
History:
received: 20 Mar 2023
accepted: 30 May 2023
medline: 07 Jun 2023
pubmed: 06 Jun 2023
entrez: 05 Jun 2023
Status:
epublish
Abstract
BACKGROUND
Identifying the locations of gene breakpoints between species of different taxonomic groups can provide useful insights into the underlying evolutionary processes. Given the exact locations of the genes, the breakpoints can be computed with little effort. However, existing gene annotations are often erroneous, or only nucleotide sequences are available. In mitochondrial genomes in particular, high variation in gene order is usually accompanied by a high degree of sequence inconsistency. This makes accurately locating breakpoints in mitogenomic nucleotide sequences a challenging task.
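When gene locations are known, the breakpoint computation mentioned above can be sketched as follows. This is a minimal illustration and not part of DeBBI: it assumes each genome is given as a circular list of signed integers (a hypothetical encoding in which the sign denotes the strand) and reports the adjacencies of the first gene order that are absent from the second. The function names `adjacencies` and `breakpoints` are illustrative.

```python
def neighbor_pairs(order, circular=True):
    """Consecutive gene pairs of a gene order (wrapping around if circular)."""
    pairs = list(zip(order, order[1:]))
    if circular and len(order) > 1:
        pairs.append((order[-1], order[0]))
    return pairs


def adjacencies(order, circular=True):
    """Set of signed adjacencies, including their reverse-strand reading."""
    adj = set()
    for a, b in neighbor_pairs(order, circular):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read from the other strand
    return adj


def breakpoints(order_a, order_b, circular=True):
    """Adjacencies of order_a that do not occur in order_b."""
    adj_b = adjacencies(order_b, circular)
    return [p for p in neighbor_pairs(order_a, circular) if p not in adj_b]


# Toy example: genes 2 and 3 are inverted, or transposed, in the second genome.
reference = [1, 2, 3, 4, 5]
inverted = [1, -3, -2, 4, 5]
transposed = [1, 3, 2, 4, 5]

inversion_breaks = breakpoints(reference, inverted)
transposition_breaks = breakpoints(reference, transposed)
```

In this toy setting the inversion leaves the adjacency between genes 2 and 3 intact and breaks only the two flanking adjacencies, whereas the transposition also breaks the inner one. The sketch presupposes a correct annotation, which is the assumption that fails in the setting the abstract addresses.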
RESULTS
This contribution presents a novel method for detecting gene breakpoints in the nucleotide sequences of complete mitochondrial genomes, taking into account possibly high substitution rates. The method is implemented in the software package DeBBI. DeBBI can analyze transposition- and inversion-based breakpoints independently and uses a parallel program design to exploit modern multi-processor systems. Extensive tests on synthetic data sets, covering a broad range of sequence dissimilarities and different numbers of introduced breakpoints, demonstrate DeBBI's ability to produce accurate results. Case studies using species of various taxonomic groups further show DeBBI's applicability to real-life data. While some multiple sequence alignment tools can also be used for the task at hand, we demonstrate that gene breaks between short, poorly conserved tRNA genes in particular can be detected more frequently with the proposed approach.
CONCLUSIONS
The proposed method constructs a position-annotated de-Bruijn graph of the input sequences. Using a heuristic algorithm, this graph is searched for particular structures, called bulges, which may be associated with the breakpoint locations. Despite the large size of these structures, the algorithm requires only a small number of graph traversal steps.
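To make the data structure more concrete, the following is a rough sketch of a position-annotated, colored de-Bruijn graph. It is not DeBBI's implementation and does not perform bulge detection. It assumes linear input strings, ignores reverse complements, and uses hypothetical names (`build_graph`, `branching_nodes`). Each k-mer node records, per input sequence (its "color"), the positions at which it occurs, and nodes with several successors mark places where the paths of the sequences diverge.

```python
from collections import defaultdict


def build_graph(sequences, k):
    """Build a toy position-annotated, colored de Bruijn graph.

    nodes: k-mer -> {color: [start positions of the k-mer in that sequence]}
    edges: k-mer -> {successor k-mer: set of colors using that edge}
    """
    nodes = defaultdict(lambda: defaultdict(list))
    edges = defaultdict(lambda: defaultdict(set))
    for color, seq in sequences.items():
        prev = None
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            nodes[kmer][color].append(i)
            if prev is not None:
                edges[prev][kmer].add(color)
            prev = kmer
    return nodes, edges


def branching_nodes(edges):
    """k-mers with more than one distinct successor, i.e. divergence points."""
    return {kmer: dict(succ) for kmer, succ in edges.items() if len(succ) > 1}


# Toy input: two sequences that differ by a single substitution.
sequences = {"A": "ACGTTGCA", "B": "ACGTAGCA"}
nodes, edges = build_graph(sequences, k=3)
branches = branching_nodes(edges)
```

Here the two sequences share the k-mers before and after the substitution, so their paths split at one node and rejoin at a later one. The position annotations on shared nodes are what allow a search procedure to relate a structure in the graph back to coordinates in each input sequence.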
Identifiers
pubmed: 37277700
doi: 10.1186/s12859-023-05371-4
pii: 10.1186/s12859-023-05371-4
pmc: PMC10243065
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
235
Grants
Agency: Deutsche Forschungsgemeinschaft
ID: 21210538
Agency: Universität Leipzig
ID: 21210538
Copyright information
© 2023. The Author(s).