IUPACpal: efficient identification of inverted repeats in IUPAC-encoded DNA sequences.
Gaps
IUPAC
Inverted repeat
Mismatches
Palindrome
Software
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
06 Feb 2021
History:
received: 22 Jun 2020
accepted: 27 Jan 2021
entrez: 7 Feb 2021
pubmed: 8 Feb 2021
medline: 13 Mar 2021
Status:
epublish
Abstract
BACKGROUND: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and they have been linked with countless possible functions. Many international consortia provide a comprehensive description of common genetic variation making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets.
RESULTS: We present IUPACPAL, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences allowing also for potential mismatches and gaps in the inverted repeats.
CONCLUSIONS: Within the parameters that were tested, our experimental results show that IUPACPAL compares favourably to a similar application packaged with EMBOSS. We show that IUPACPAL identifies many previously unidentified inverted repeats when compared with EMBOSS, and that this is also performed with orders of magnitude improved speed.
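To make the definition in the abstract concrete, the following is a minimal brute-force sketch of what it means for an IUPAC-encoded sequence to contain an inverted repeat with a central gap and a bounded number of mismatches. It is not the IUPACPAL algorithm, which the abstract does not describe. The function names, the fixed arm length, and the matching rule (two IUPAC symbols pair if any base encoded by one is complementary to any base encoded by the other) are assumptions made for illustration only.

```python
# IUPAC nucleotide codes mapped to the bases each symbol can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def can_pair(x, y):
    """Assumed rule: x and y pair if some base of x complements some base of y."""
    return any(COMPLEMENT[b] in IUPAC[y] for b in IUPAC[x])


def count_mismatches(seq, start, arm, gap):
    """Count non-pairing positions between the left arm and the right arm.

    The left arm is seq[start:start + arm]; the right arm has the same length,
    begins after `gap` unpaired symbols, and is read from its far end inwards.
    """
    right_end = start + 2 * arm + gap - 1
    return sum(
        not can_pair(seq[start + i], seq[right_end - i]) for i in range(arm)
    )


def naive_inverted_repeats(seq, arm, max_gap, max_mismatches):
    """Return (start, arm, gap, mismatches) for every window within the limits.

    Hypothetical helper: exhaustive scan, quadratic-style cost, for clarity only.
    """
    seq = seq.upper()
    hits = []
    for gap in range(max_gap + 1):
        for start in range(len(seq) - 2 * arm - gap + 1):
            mismatches = count_mismatches(seq, start, arm, gap)
            if mismatches <= max_mismatches:
                hits.append((start, arm, gap, mismatches))
    return hits


# "AC" is followed, after the 2-symbol gap "TT", by its reverse complement "GT".
print(naive_inverted_repeats("ACTTGT", arm=2, max_gap=2, max_mismatches=0))
# prints [(0, 2, 2, 0)]
```

The exhaustive scan is chosen only because it mirrors the definition directly; a tool aiming for the speed reported in the abstract would need a more efficient search strategy.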
Identifiers
pubmed: 33549041
doi: 10.1186/s12859-021-03983-2
pii: 10.1186/s12859-021-03983-2
pmc: PMC7866733
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
51
Grants
Agency: Engineering and Physical Sciences Research Council
ID: EP/M50788X-1.