An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data.
Keywords: benchmarking; bioinformatics; HERV-K; retrovirus; whole-genome sequencing
Journal: Frontiers in bioinformatics
ISSN: 2673-7647
Abbreviated title: Front Bioinform
Country: Switzerland
NLM ID: 9918227263306676
Publication information
Publication date: 2022
History:
received: 2022-10-05
accepted: 2022-12-12
entrez: 2023-02-27
pubmed: 2023-02-28
medline: 2023-02-28
Status: epublish
Abstract
There is growing interest in the study of human endogenous retroviruses (HERVs), given the substantial body of evidence implicating them in many human diseases. Although their genomic characterization presents numerous technical challenges, next-generation sequencing (NGS) has shown potential for detecting HERV insertions and their polymorphisms in humans, and a number of computational tools currently exist to detect them in short-read NGS data. Designing optimal analysis pipelines requires an independent evaluation of these tools. We evaluated the performance of a set of such tools across a variety of experimental designs and datasets: 50 human short-read whole-genome sequencing samples, matched long- and short-read sequencing data, and simulated short-read NGS data. Tool performance varied substantially across the datasets, suggesting that different tools may suit different study designs. However, specialized tools designed to detect only HERVs consistently outperformed generalist tools that detect a wider range of transposable elements. We suggest that, if sufficient computing resources are available, using multiple HERV detection tools to obtain a consensus set of insertion loci may be ideal. Furthermore, given that the false discovery rate varied between 8% and 55% across tools and datasets, we recommend wet-lab validation of predicted insertions when DNA samples are available.
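The abstract recommends combining several HERV detection tools into a consensus set of insertion loci and reports each tool's false discovery rate, FDR = FP / (TP + FP). It does not say how the consensus was built or how predictions were matched to validated insertions, so the Python below is only a hypothetical sketch of those two ideas. The function names `consensus_loci` and `false_discovery_rate`, the (chromosome, position) call format, the 100 bp matching window and the two-tool support threshold are all illustrative assumptions, not the authors' method or any tool's output format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (chromosome, approximate insertion position in bp); hypothetical format
Locus = Tuple[str, int]


def _keep_cluster(chrom: str, cluster: List[Tuple[int, str]],
                  min_tools: int, out: List[Locus]) -> None:
    """Append the cluster's middle position if enough distinct tools support it."""
    tools = {tool for _, tool in cluster}
    if len(tools) >= min_tools:
        positions = [pos for pos, _ in cluster]  # already sorted by position
        out.append((chrom, positions[len(positions) // 2]))


def consensus_loci(calls: Dict[str, List[Locus]], window: int = 100,
                   min_tools: int = 2) -> List[Locus]:
    """Merge per-tool insertion calls into a consensus set (assumed scheme).

    Calls on the same chromosome are sorted by position and chained into one
    cluster while each call lies within `window` bp of the previous call.
    A cluster is kept only if at least `min_tools` distinct tools contributed.
    """
    by_chrom: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for tool, loci in calls.items():
        for chrom, pos in loci:
            by_chrom[chrom].append((pos, tool))

    consensus: List[Locus] = []
    for chrom in sorted(by_chrom):
        entries = sorted(by_chrom[chrom])
        cluster = [entries[0]]
        for pos, tool in entries[1:]:
            if pos - cluster[-1][0] <= window:
                cluster.append((pos, tool))
            else:
                _keep_cluster(chrom, cluster, min_tools, consensus)
                cluster = [(pos, tool)]
        _keep_cluster(chrom, cluster, min_tools, consensus)
    return consensus


def false_discovery_rate(predicted: List[Locus], validated: List[Locus],
                         window: int = 100) -> float:
    """FDR = FP / (TP + FP); a prediction is a false positive if no validated
    locus on the same chromosome lies within `window` bp of it."""
    if not predicted:
        return 0.0
    truth: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in validated:
        truth[chrom].append(pos)
    false_pos = sum(
        1 for chrom, pos in predicted
        if not any(abs(pos - t) <= window for t in truth.get(chrom, []))
    )
    return false_pos / len(predicted)


if __name__ == "__main__":
    # Toy calls from three hypothetical tools
    calls = {
        "toolA": [("chr1", 1000), ("chr2", 5000), ("chr3", 700)],
        "toolB": [("chr1", 1040), ("chr2", 9000)],
        "toolC": [("chr1", 990), ("chr2", 5030)],
    }
    loci = consensus_loci(calls)
    print(loci)  # [('chr1', 1000), ('chr2', 5030)]
    print(false_discovery_rate(loci, [("chr1", 1010)]))  # 0.5
```

Chaining nearby calls (single-linkage clustering) keeps the sketch short; it ignores insertion orientation, HERV subfamily and genotype, which a real pipeline would likely need to consider.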
Identifiers
pubmed: 36845320
doi: 10.3389/fbinf.2022.1062328
pii: 1062328
pmc: PMC9945273
Publication types
Journal Article
Languages
eng
Pagination
1062328
Grants
Agency: Motor Neurone Disease Association
ID: ALCHALABI-DOBSON/APR14/829-791
Country: United Kingdom
Agency: Medical Research Council
ID: MR/R024804/1
Country: United Kingdom
Copyright information
Copyright © 2023 Bowles, Kabiljo, Al Khleifat, Jones, Quinn, Dobson, Swanson, Al-Chalabi and Iacoangeli.
Conflict of interest statement
AC is the Principal Investigator of the Lighthouse 2 trial of Triumeq in ALS. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.