UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction.
bioinformatics
error correction
molecular barcodes
next-generation sequencing
unique molecular identifiers
Journal
Frontiers in genetics
ISSN: 1664-8021
Abbreviated title: Front Genet
Country: Switzerland
NLM ID: 101560621
Publication information
Publication date:
2021
History:
received: 29 January 2021
accepted: 8 April 2021
entrez: 14 June 2021
pubmed: 15 June 2021
medline: 15 June 2021
Status:
epublish
Abstract
A recent refinement in high-throughput sequencing is the incorporation of unique molecular identifiers (UMIs), which are random oligonucleotide barcodes, during library preparation. A UMI gives each DNA/RNA input molecule a unique identity that persists through polymerase chain reaction (PCR) amplification, thus reducing the bias introduced by this step. Here, we propose UMIc, an alignment-free framework that serves as a preprocessing step for FASTQ files and deduplicates and corrects reads by building a consensus sequence for each UMI. Our approach takes into account the frequency and the Phred quality of nucleotides, as well as the distances between the UMIs and between the actual sequences. We tested the tool on different scenarios of UMI-tagged library data, with wide applicability in mind. UMIc is an open-source tool implemented in R and is freely available from https://github.com/BiodataAnalysisGroup/UMIc.
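As an illustration of the general idea of UMI-based consensus building, the following is a minimal sketch in Python. It is not UMIc's actual implementation (UMIc is written in R). The function names (`group_by_umi`, `consensus`), the Hamming-distance threshold `max_dist`, and the rule of summing Phred scores per base are all assumptions made for this example. The sketch also assumes Phred+33 quality encoding and equal-length reads, and it merges groups on UMI distance only, leaving out any distance check between the read sequences themselves.

```python
from collections import defaultdict


def phred(ch, offset=33):
    """Decode one FASTQ quality character (Phred+33 assumed)."""
    return ord(ch) - offset


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def group_by_umi(records, max_dist=1):
    """Group (umi, seq, qual) records, merging rarer UMIs into a more
    frequent UMI within max_dist mismatches (hypothetical rule)."""
    exact = defaultdict(list)
    for umi, seq, qual in records:
        exact[umi].append((seq, qual))
    groups = {}
    # Visit UMIs from most to least frequent so frequent ones become parents.
    for umi in sorted(exact, key=lambda u: (-len(exact[u]), u)):
        parent = next((g for g in groups if hamming(g, umi) <= max_dist), None)
        groups.setdefault(parent or umi, []).extend(exact[umi])
    return groups


def consensus(reads):
    """Per position, pick the base with the highest summed Phred score,
    so both base frequency and base quality contribute to the call."""
    out = []
    for i in range(len(reads[0][0])):
        score = defaultdict(int)
        for seq, qual in reads:
            score[seq[i]] += phred(qual[i])
        # sorted() makes tie-breaking deterministic (alphabetical).
        out.append(max(sorted(score), key=score.get))
    return "".join(out)


records = [
    ("AAAA", "ACGT", "IIII"),
    ("AAAA", "ACGT", "IIII"),
    ("AAAT", "ACGA", "II##"),  # UMI with one error, low-quality read tail
    ("CCCC", "TTTT", "IIII"),
]
groups = group_by_umi(records)
for umi, reads in groups.items():
    print(umi, len(reads), consensus(reads))
```

In this toy input, the read tagged `AAAT` is merged into the `AAAA` group, and its low-quality final base is outvoted by the two high-quality reads.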
Identifiers
pubmed: 34122513
doi: 10.3389/fgene.2021.660366
pmc: PMC8193862
Publication types
Journal Article
Languages
eng
Pagination
660366
Copyright information
Copyright © 2021 Tsagiopoulou, Maniou, Pechlivanis, Togkousidis, Kotrová, Hutzenlaub, Kappas, Chatzidimitriou and Psomopoulos.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.