Evolutionary superscaffolding and chromosome anchoring to improve Anopheles genome assemblies.
Bioinformatics
Chromosomes
Comparative genomics
Computational evolutionary biology
Gene synteny
Genome assembly
Mosquito genomes
Orthology
Physical mapping
Journal
BMC biology
ISSN: 1741-7007
Abbreviated title: BMC Biol
Country: England
NLM ID: 101190720
Publication information
Publication date:
02 01 2020
History:
received: 13 11 2019
accepted: 26 11 2019
entrez: 4 1 2020
pubmed: 4 1 2020
medline: 1 9 2020
Status:
epublish
Abstract
Abstract sections
BACKGROUND
New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from 'finished'. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies.
RESULTS
We evaluated and employed 3 gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies, we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: 6 with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and 3 with new assemblies based on re-scaffolding or long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: 7 for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further 7 with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi.
CONCLUSIONS
Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our evaluations show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.
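The consensus step mentioned under RESULTS can be illustrated with a minimal sketch. The abstract does not say how the consensus sets were built, so the majority rule used here (keep an adjacency predicted by at least two of the three methods) is an assumption, and the function name `consensus_adjacencies`, the `min_support` parameter, the method labels, and the scaffold names are all hypothetical.

```python
from collections import Counter
from itertools import chain


def consensus_adjacencies(predictions, min_support=2):
    """Return adjacencies supported by at least `min_support` methods.

    `predictions` maps a method label to an iterable of
    (scaffold_a, scaffold_b) pairs. This is an illustrative majority
    rule, not the procedure used in the paper.
    """
    # Treat each adjacency as unordered and count it once per method.
    per_method = [
        {frozenset(pair) for pair in pairs}
        for pairs in predictions.values()
    ]
    support = Counter(chain.from_iterable(per_method))
    return {adj: n for adj, n in support.items() if n >= min_support}


# Hypothetical predictions from three unnamed synteny-based methods.
predictions = {
    "method_1": [("s1", "s2"), ("s2", "s3")],
    "method_2": [("s2", "s1"), ("s3", "s4")],
    "method_3": [("s1", "s2"), ("s3", "s2")],
}

result = consensus_adjacencies(predictions)
for adjacency, n_methods in result.items():
    print(sorted(adjacency), n_methods)
```

Raising `min_support` to 3 keeps only adjacencies on which every method agrees. That trades recall for precision, and is the kind of choice that supporting data such as physical mapping or RNAseq could help evaluate.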
Identifiers
pubmed: 31898513
doi: 10.1186/s12915-019-0728-3
pii: 10.1186/s12915-019-0728-3
pmc: PMC6939337
Publication types
Comparative Study
Journal Article
Research Support, N.I.H., Extramural
Research Support, N.I.H., Intramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
1
Grants
Agency: NIAID NIH HHS
ID: R21 AI099528
Country: United States
Agency: NIAID NIH HHS
ID: R21 AI135298
Country: United States
Agency: NCI NIH HHS
ID: U24 CA211000
Country: United States
Agency: NIAID NIH HHS
ID: R21 AI112734
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG006677
Country: United States
Agency: NHGRI NIH HHS
ID: ZIA HG200398
Country: United States