IUPACpal: efficient identification of inverted repeats in IUPAC-encoded DNA sequences.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
06 Feb 2021
History:
received: 22 Jun 2020
accepted: 27 Jan 2021
entrez: 07 Feb 2021
pubmed: 08 Feb 2021
medline: 13 Mar 2021
Status: epublish

Abstract

An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and they have been linked with countless possible functions. Many international consortia provide a comprehensive description of common genetic variation making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets. We present IUPACPAL, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences allowing also for potential mismatches and gaps in the inverted repeats. Within the parameters that were tested, our experimental results show that IUPACPAL compares favourably to a similar application packaged with EMBOSS. We show that IUPACPAL identifies many previously unidentified inverted repeats when compared with EMBOSS, and that this is also performed with orders of magnitude improved speed.
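To make the definition above concrete, the following is a minimal Python sketch of a naive check for a single inverted repeat in an IUPAC-encoded sequence. It is not the IUPACPAL algorithm and does not reflect its implementation. The function names (`can_pair`, `arm_mismatches`, `is_inverted_repeat`) are hypothetical, and the matching rule is an assumption: two IUPAC symbols are treated as pairing when at least one base encoded by the first is complementary to at least one base encoded by the second.

```python
# Base sets for the standard IUPAC nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def can_pair(x, y):
    """Assumed rule: some base encoded by x complements some base encoded by y."""
    return any(COMPLEMENT[b] in IUPAC[y] for b in IUPAC[x])


def arm_mismatches(seq, start, arm_len, gap=0):
    """Count non-pairing positions between the left arm seq[start:start+arm_len]
    and the right arm that follows it after `gap` characters, read in reverse."""
    right_end = start + 2 * arm_len + gap
    if start < 0 or right_end > len(seq):
        raise ValueError("inverted repeat does not fit in the sequence")
    mismatches = 0
    for j in range(arm_len):
        left = seq[start + j]            # j-th symbol of the left arm
        right = seq[right_end - 1 - j]   # mirrored symbol of the right arm
        if not can_pair(left, right):
            mismatches += 1
    return mismatches


def is_inverted_repeat(seq, start, arm_len, gap=0, max_mismatches=0):
    """True if the two arms pair with at most max_mismatches mismatches."""
    return arm_mismatches(seq.upper(), start, arm_len, gap) <= max_mismatches
```

For example, `is_inverted_repeat("GAATTC", 0, 3)` holds because `GAA` is followed by its reverse complement `TTC`, and `is_inverted_repeat("ACGNNNCGT", 0, 3, gap=3)` holds with a three-character gap in the centre. A practical tool would have to search over all start positions, arm lengths and gap sizes, which this sketch does not attempt.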

Identifiers

pubmed: 33549041
doi: 10.1186/s12859-021-03983-2
pii: 10.1186/s12859-021-03983-2
pmc: PMC7866733

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

51

Grants

Agency: Engineering and Physical Sciences Research Council
ID: EP/M50788X-1

Authors

Hayam Alamro (H)

Department of Informatics, King's College London, 30 Aldwych, London, UK.
Department of Information Systems, Princess Nourah bint Abdulrahman University, Riyadh, Kingdom of Saudi Arabia.

Mai Alzamel (M)

Department of Informatics, King's College London, 30 Aldwych, London, UK.
Computer Science Department, King Saud University, Riyadh, Kingdom of Saudi Arabia.

Costas S Iliopoulos (CS)

Department of Informatics, King's College London, 30 Aldwych, London, UK.

Solon P Pissis (SP)

Centrum Wiskunde & Informatica, Amsterdam, The Netherlands. solon.pissis@cwi.nl.
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands. solon.pissis@cwi.nl.

Steven Watts (S)

Department of Informatics, King's College London, 30 Aldwych, London, UK.
