Benchmarking hybrid assemblies of Giardia and prediction of widespread intra-isolate structural variation.
Genome assembly
Giardia
Heterozygosity
Long-read sequencing
MinION
Parasite
Polyploidy
Structural variants
Tetraploid
Journal
Parasites & vectors
ISSN: 1756-3305
Abbreviated title: Parasit Vectors
Country: England
NLM ID: 101462774
Publication information
Publication date:
28 Feb 2020
History:
received: 3 Oct 2019
accepted: 13 Feb 2020
entrez: 1 Mar 2020
pubmed: 1 Mar 2020
medline: 21 Oct 2020
Status:
epublish
Abstract
BACKGROUND
Currently available short-read genome assemblies of the tetraploid protozoan parasite Giardia intestinalis are highly fragmented, highlighting the need for improved genome assemblies at a reasonable cost. Long nanopore reads are well suited to resolving repetitive genomic regions, resulting in better-quality assemblies of eukaryotic genomes. Subsequent addition of highly accurate short reads to long-read assemblies further improves assembly quality. Using this hybrid approach, we assembled genomes for three Giardia isolates, two with published assemblies and one novel, to evaluate the improvement in genome quality gained from long reads. We then used the long reads to predict structural variants and examine this previously unexplored source of genetic variation in Giardia.
METHODS
With MinION reads for each isolate, we assembled genomes using several assemblers specializing in long reads. Assembly metrics, gene finding, and whole-genome alignments to the reference genomes enabled direct comparison to evaluate the performance of the nanopore reads. Further improvements from adding Illumina reads to the long-read assemblies were evaluated using gene finding. Structural variants were predicted from alignments of the long reads to the best hybrid genome for each isolate, and enrichment of key genes was analyzed using random genome sampling and calculation of percentiles to find thresholds of significance.
RESULTS
Our hybrid assembly method generated reference-quality genomes for each isolate. Consistent with previous findings based on SNPs, examination of heterozygosity using the structural variants found that Giardia BGS was considerably more heterozygous than the other isolates, which are from Assemblage A. Further, each isolate was shown to contain structural variant regions enriched for variant-specific surface proteins, a key class of virulence factor in Giardia.
CONCLUSIONS
The ability to generate reference-quality genomes from a single MinION run and a multiplexed MiSeq run enables future large-scale comparative genomic studies within the genus Giardia. Further, prediction of structural variants from long reads allows for more in-depth analyses of major sources of genetic variation within and between Giardia isolates that could affect both pathogenicity and host range.
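The abstract says only that gene enrichment was assessed by random genome sampling and percentile-based significance thresholds; it gives no further detail. The sketch below is one plausible reading of that idea, not the authors' implementation. It assumes a single linear sequence, half-open gene intervals, a fixed window length, 1000 random samples, and a 95th-percentile cutoff; all function names and parameter values are hypothetical.

```python
import math
import random


def count_overlapping(genes, start, end):
    """Count gene intervals (g_start, g_end) overlapping the half-open window [start, end)."""
    return sum(1 for g_start, g_end in genes if g_start < end and g_end > start)


def percentile(sorted_values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list sorted in ascending order."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def enrichment_threshold(genome_length, genes, region_length,
                         n_samples=1000, q=95, seed=0):
    """Estimate a gene-count threshold from randomly placed windows.

    Assumed procedure (not taken from the paper): draw n_samples windows of
    region_length at uniform random positions, count the genes of interest in
    each, and return the q-th percentile of those counts.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = []
    for _ in range(n_samples):
        start = rng.randrange(genome_length - region_length + 1)
        counts.append(count_overlapping(genes, start, start + region_length))
    counts.sort()
    return percentile(counts, q)


def is_enriched(observed_count, threshold):
    """Call a region enriched when its gene count exceeds the sampled threshold."""
    return observed_count > threshold


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_genes = [(1_000, 2_500), (40_000, 41_200), (40_500, 42_000), (90_000, 91_000)]
    cutoff = enrichment_threshold(100_000, toy_genes, region_length=5_000)
    observed = count_overlapping(toy_genes, 39_000, 44_000)
    print("threshold:", cutoff, "observed:", observed, "enriched:", is_enriched(observed, cutoff))
```

A real analysis would need to handle multiple contigs, match the sampled window lengths to the observed structural variant regions, and choose the cutoff according to the study design.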
Identifiers
pubmed: 32111234
doi: 10.1186/s13071-020-3968-8
pii: 10.1186/s13071-020-3968-8
pmc: PMC7048089
Chemical substances
DNA, Protozoan
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
108
Grants
Agency: Ontario Ministry of Agriculture, Food and Rural Affairs
ID: FS2016-3010
Agency: Alberta Agriculture and Forestry
ID: 2016F013R
Agency: Natural Sciences and Engineering Research Council of Canada
ID: 222982
Agency: Natural Sciences and Engineering Research Council of Canada
ID: Visiting Fellowship in Canadian Government Laboratories