Evolutionary superscaffolding and chromosome anchoring to improve Anopheles genome assemblies.

Bioinformatics; Chromosomes; Comparative genomics; Computational evolutionary biology; Gene synteny; Genome assembly; Mosquito genomes; Orthology; Physical mapping

Journal

BMC biology
ISSN: 1741-7007
Abbreviated title: BMC Biol
Country: England
NLM ID: 101190720

Publication information

Publication date:
2 January 2020
History:
received: 13 November 2019
accepted: 26 November 2019
entrez: 4 January 2020
pubmed: 4 January 2020
medline: 1 September 2020
Status: epublish

Abstract

BACKGROUND
New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from 'finished'. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies.
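
The idea of using gene order conservation to predict scaffold neighbours can be illustrated with a minimal Python sketch. It is not any of the methods evaluated in the article. It assumes a single reference gene order, one-to-one orthologs, and a rule that two scaffold ends are neighbours when the orthologs of their terminal genes sit next to each other in the reference. The function name, the data layout, and the "head"/"tail" labels are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

ScaffoldEnd = Tuple[str, str]          # (scaffold name, "head" or "tail")
Adjacency = FrozenSet[ScaffoldEnd]     # unordered pair of scaffold ends


def predict_adjacencies(
    reference_order: List[str],
    scaffolds: Dict[str, List[str]],
    orthologs: Dict[str, str],
) -> Set[Adjacency]:
    """Predict scaffold adjacencies from conserved gene order (toy rule)."""
    # Position of every reference gene along the reference chromosome.
    position = {gene: i for i, gene in enumerate(reference_order)}

    # Reference position -> scaffold ends whose terminal gene maps there.
    ends: Dict[int, List[ScaffoldEnd]] = {}
    for name, genes in scaffolds.items():
        if not genes:
            continue
        for end, gene in (("head", genes[0]), ("tail", genes[-1])):
            ref_gene = orthologs.get(gene)
            if ref_gene in position:
                ends.setdefault(position[ref_gene], []).append((name, end))

    # Two ends are predicted neighbours if their orthologs are consecutive
    # in the reference and they belong to different scaffolds.
    adjacencies: Set[Adjacency] = set()
    for pos, left_ends in ends.items():
        for right in ends.get(pos + 1, []):
            for left in left_ends:
                if left[0] != right[0]:
                    adjacencies.add(frozenset((left, right)))
    return adjacencies


if __name__ == "__main__":
    reference = ["r1", "r2", "r3", "r4", "r5", "r6"]
    target = {"scafA": ["a1", "a2", "a3"], "scafB": ["b1", "b2", "b3"]}
    ortho = {"a1": "r1", "a2": "r2", "a3": "r3",
             "b1": "r4", "b2": "r5", "b3": "r6"}
    for adjacency in predict_adjacencies(reference, target, ortho):
        print(sorted(adjacency))
```

Real synteny-based tools work with many genomes, a phylogeny, and gene duplications or losses. The single-reference rule here only shows why conserved gene neighbourhoods carry information about scaffold order and orientation.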
RESULTS
We evaluated and employed 3 gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies, we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: 6 with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and 3 with new assemblies based on re-scaffolding or long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: 7 for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further 7 with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi.
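
Combining predictions from several methods into a consensus set can be sketched as follows. The abstract does not state the consensus rule, so this example assumes a simple one: keep an adjacency if at least `min_support` methods predict it, then discard any kept adjacencies that compete for the same scaffold end. The function name, the threshold, and the conflict handling are assumptions for illustration, not the article's procedure.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set, Tuple

ScaffoldEnd = Tuple[str, str]          # (scaffold name, "head" or "tail")
Adjacency = FrozenSet[ScaffoldEnd]     # unordered pair of scaffold ends


def adj(a: ScaffoldEnd, b: ScaffoldEnd) -> Adjacency:
    """Build an order-independent adjacency between two scaffold ends."""
    return frozenset((a, b))


def consensus_adjacencies(
    predictions: Iterable[Set[Adjacency]],
    min_support: int = 2,
) -> Set[Adjacency]:
    """Keep adjacencies backed by enough methods and free of conflicts."""
    # Count how many methods predict each adjacency.
    support: Counter = Counter()
    for method_adjacencies in predictions:
        support.update(set(method_adjacencies))
    kept = {a for a, n in support.items() if n >= min_support}

    # A scaffold end can have only one neighbour: drop every kept adjacency
    # that shares an end with another kept adjacency (assumed conservative rule).
    usage = Counter(end for a in kept for end in a)
    return {a for a in kept if all(usage[end] == 1 for end in a)}


if __name__ == "__main__":
    ab = adj(("A", "tail"), ("B", "head"))
    bc = adj(("B", "tail"), ("C", "head"))
    bd = adj(("B", "tail"), ("D", "head"))
    methods = [{ab, bc}, {ab, bd}, {ab, bc, bd}]
    print(consensus_adjacencies(methods))
```

In this toy input all three methods agree on A-B, while B-C and B-D each have two votes but compete for the tail of scaffold B, so only A-B survives. Supporting data such as physical mapping or RNAseq, as described in the text, would be the natural way to resolve such conflicts rather than dropping both.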
CONCLUSIONS
Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our evaluations show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.

Identifiers

pubmed: 31898513
doi: 10.1186/s12915-019-0728-3
pii: 10.1186/s12915-019-0728-3
pmc: PMC6939337

Publication types

Comparative Study; Journal Article; Research Support, N.I.H., Extramural; Research Support, N.I.H., Intramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

1

Grants

Agency: NIAID NIH HHS
ID: R21 AI099528
Country: United States
Agency: NIAID NIH HHS
ID: R21 AI135298
Country: United States
Agency: NCI NIH HHS
ID: U24 CA211000
Country: United States
Agency: NIAID NIH HHS
ID: R21 AI112734
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG006677
Country: United States
Agency: NHGRI NIH HHS
ID: ZIA HG200398
Country: United States

Authors

Robert M Waterhouse (RM)

Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland. robert.waterhouse@unil.ch.

Sergey Aganezov (S)

Department of Computer Science, Princeton University, Princeton, NJ, 08450, USA.
Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA.

Yoann Anselmetti (Y)

ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France.

Jiyoung Lee (J)

The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.

Livio Ruzzante (L)

Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.

Maarten J M F Reijnders (MJMF)

Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.

Romain Feron (R)

Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.

Sèverine Bérard (S)

ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France.

Phillip George (P)

Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.

Matthew W Hahn (MW)

Departments of Biology and Computer Science, Indiana University, Bloomington, IN, 47405, USA.

Paul I Howell (PI)

Centers for Disease Control and Prevention, Atlanta, GA, 30329, USA.

Maryam Kamali (M)

Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.
Department of Medical Entomology and Parasitology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran.

Sergey Koren (S)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA.

Daniel Lawson (D)

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK.

Gareth Maslen (G)

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK.

Ashley Peery (A)

Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.

Adam M Phillippy (AM)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA.

Maria V Sharakhova (MV)

Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.
Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia, 634050.

Eric Tannier (E)

Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, Unité Mixte de Recherche 5558 Centre National de la Recherche Scientifique, 69622, Villeurbanne, France.
Institut national de recherche en informatique et en automatique, Montbonnot, 38334, Grenoble, Rhône-Alpes, France.

Maria F Unger (MF)

Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN, 46556, USA.

Simo V Zhang (SV)

Departments of Biology and Computer Science, Indiana University, Bloomington, IN, 47405, USA.

Max A Alekseyev (MA)

Department of Mathematics and Computational Biology Institute, George Washington University, Ashburn, VA, 20147, USA.

Nora J Besansky (NJ)

Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN, 46556, USA.

Cedric Chauve (C)

Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada.

Scott J Emrich (SJ)

Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, 37996, USA.

Igor V Sharakhov (IV)

The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA. igor@vt.edu.
Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA. igor@vt.edu.
Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia, 634050. igor@vt.edu.
