Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs.

Software; Sequence Analysis, DNA / methods; Algorithms; Molecular Sequence Annotation; Genome, Mitochondrial; High-Throughput Nucleotide Sequencing / methods

Gene breakpoints; Genome; Mitochondria; de-Bruijn graph

Journal

BMC bioinformatics

ISSN: 1471-2105

Abbreviated title: BMC Bioinformatics

Country: England

NLM ID: 100965194

Publication information

Publication date:
05 Jun 2023

History:

received: 20 Mar 2023

accepted: 30 May 2023

medline: 07 Jun 2023

pubmed: 06 Jun 2023

entrez: 05 Jun 2023

Status: epublish

Abstract

BACKGROUND: Identifying the locations of gene breakpoints between species of different taxonomic groups can provide useful insights into the underlying evolutionary processes. Given the exact locations of the genes, the breakpoints can be computed with little effort. However, existing gene annotations are often erroneous, or only nucleotide sequences are available. In mitochondrial genomes especially, high variation in gene order is usually accompanied by a high degree of sequence inconsistency. This makes accurately locating breakpoints in mitogenomic nucleotide sequences a challenging task.

RESULTS: This contribution presents a novel method for detecting gene breakpoints in the nucleotide sequences of complete mitochondrial genomes, taking into account possibly high substitution rates. The method is implemented in the software package DeBBI. DeBBI allows transposition- and inversion-based breakpoints to be analyzed independently and uses a parallel program design that can exploit modern multi-processor systems. Extensive tests on synthetic data sets, covering a broad range of sequence dissimilarities and different numbers of introduced breakpoints, demonstrate DeBBI's ability to produce accurate results. Case studies using species of various taxonomic groups further show DeBBI's applicability to real-life data. While some multiple sequence alignment tools can also be used for this task, we demonstrate that gene breaks between short, poorly conserved tRNA genes in particular are detected more frequently with the proposed approach.

CONCLUSIONS: The proposed method constructs a position-annotated de-Bruijn graph of the input sequences. Using a heuristic algorithm, this graph is searched for particular structures, called bulges, which may be associated with the breakpoint locations. Despite the large size of these structures, the algorithm requires only a small number of graph traversal steps.
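To make the idea of a position-annotated, colored de-Bruijn graph concrete, the following is a minimal sketch and not DeBBI's implementation. It assumes linear sequences, a fixed k, and no reverse-complement handling; the function names `build_annotated_dbg` and `diverging_nodes` and the toy sequences are hypothetical. Each node stores, per color (input sequence), the positions at which its k-mer occurs, and a node whose outgoing edges carry different colors marks a place where the paths of the sequences separate, such as the start of a bulge.

```python
from collections import defaultdict


def build_annotated_dbg(sequences, k):
    """Build a colored de Bruijn graph whose nodes record k-mer positions.

    sequences: dict mapping a color (sequence name) to its nucleotide string.
    Returns (nodes, edges):
      nodes: k-mer -> {color: [0-based start positions]}
      edges: (k-mer, successor k-mer) -> set of colors using that edge
    """
    nodes = defaultdict(lambda: defaultdict(list))
    edges = defaultdict(set)
    for color, seq in sequences.items():
        prev = None
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            nodes[kmer][color].append(i)  # position annotation
            if prev is not None:
                edges[(prev, kmer)].add(color)  # color annotation
            prev = kmer
    return {kmer: dict(ann) for kmer, ann in nodes.items()}, dict(edges)


def diverging_nodes(edges):
    """Return k-mers whose outgoing edges carry different color sets."""
    successors = defaultdict(list)
    for (u, _v), colors in edges.items():
        successors[u].append(frozenset(colors))
    return sorted(u for u, sets in successors.items() if len(set(sets)) > 1)


# Toy input (hypothetical data): two sequences differing by one substitution.
seqs = {"A": "ACGTTGCA", "B": "ACGTAGCA"}
nodes, edges = build_annotated_dbg(seqs, k=3)
branch = diverging_nodes(edges)
```

In this toy case the two paths separate after the shared k-mer `CGT` and rejoin at `GCA`, and the stored positions (`nodes["CGT"]`, `nodes["GCA"]`) show where each sequence enters and leaves the diverging region. A breakpoint search would additionally have to compare these positions across colors, which the sketch leaves out.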

Identifiers

DOI: 10.1186/s12859-023-05371-4

PMID: 37277700

PMC: PMC10243065

PII: 10.1186/s12859-023-05371-4

Publication types

Journal Article

Languages

eng

Pagination

235

Grants

Agency: Deutsche Forschungsgemeinschaft

ID: 21210538

Agency: Universität Leipzig

ID: 21210538


Authors

Lisa Fiedler

Matthias Bernt

Martin Middendorf

Peter F Stadler
