Recovering individual haplotypes and a contiguous genome assembly from pooled long-read sequencing of the diamondback moth (Lepidoptera: Plutellidae).
Plutella xylostella
assembly
haplotype
pool-seq
Journal
G3 (Bethesda, Md.)
ISSN: 2160-1836
Abbreviated title: G3 (Bethesda)
Country: England
NLM ID: 101566598
Publication information
Publication date: 30 September 2022
History:
received: 6 May 2022
accepted: 25 July 2022
pubmed: 19 August 2022
medline: 5 October 2022
entrez: 18 August 2022
Status:
ppublish
Abstract
The assembly of divergent haplotypes using noisy long-read data presents a challenge to the reconstruction of haploid genome assemblies, due to overlapping distributions of technical sequencing error, intralocus genetic variation, and interlocus similarity within these data. Here, we present a comparative analysis of assembly algorithms representing overlap-layout-consensus, repeat graph, and de Bruijn graph methods. We examine how postprocessing strategies attempting to reduce redundant heterozygosity interact with the choice of initial assembly algorithm and ultimately produce a series of chromosome-level assemblies for an agricultural pest, the diamondback moth, Plutella xylostella (L.). We compare evaluation methods and show that BUSCO analyses may overestimate haplotig removal processing in long-read draft genomes, in comparison to a k-mer method. We discuss the trade-offs inherent in assembly algorithm and curation choices and suggest that "best practice" is research question dependent. We demonstrate a link between allelic divergence and allele-derived contig redundancy in final genome assemblies and document the patterns of coding and noncoding diversity between redundant sequences. We also document a link between an excess of nonsynonymous polymorphism and haplotigs that are unresolved by assembly or postassembly algorithms. Finally, we discuss how this phenomenon may have relevance for the usage of noisy long-read genome assemblies in comparative genomics.
Identifiers
pubmed: 35980174
pii: 6671219
doi: 10.1093/g3journal/jkac210
pmc: PMC9526047
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Grants
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/M001512/1
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/M503472/1
Country: United Kingdom
Copyright information
© The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America.