Improved high quality sand fly assemblies enabled by ultra low input long read sequencing.


Journal

Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192

Publication information

Publication date:
24 Aug 2024
History:
Received: 29 Feb 2024
Accepted: 09 Jul 2024
MEDLINE: 26 Aug 2024
PubMed: 26 Aug 2024
Entrez: 24 Aug 2024
Status: epublish

Abstract

Phlebotomine sand flies are the vectors of leishmaniasis, a neglected tropical disease. High-quality reference genomes are an important tool for understanding the biology and eco-evolutionary dynamics that underpin disease epidemiology. Previous leishmaniasis vector reference sequences were limited by the sequencing technologies available at the time and were inadequate for high-resolution genomic inquiry. Here, we present updated reference assemblies of two sand flies, Phlebotomus papatasi and Lutzomyia longipalpis. These chromosome-level assemblies were generated using an ultra-low input library protocol, PacBio HiFi long reads, and Hi-C technology. The new P. papatasi reference has a final assembly span of 351.6 Mb, with contig and scaffold N50s of 926 kb and 111.8 Mb, respectively. The new Lu. longipalpis reference has a final assembly span of 147.8 Mb, with contig and scaffold N50s of 1.09 Mb and 40.6 Mb, respectively. Benchmarking Universal Single-Copy Orthologue (BUSCO) assessments indicated 94.5% and 95.6% complete single-copy Insecta orthologues for P. papatasi and Lu. longipalpis, respectively. These improved assemblies will serve as an invaluable resource for future genomic work on phlebotomine sand flies.
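The contig and scaffold N50 values reported in the abstract summarize assembly contiguity: N50 is the length L such that sequences of length L or longer together cover at least half of the total assembly span. For example, a scaffold N50 of 111.8 Mb for P. papatasi means that scaffolds of at least 111.8 Mb account for at least half of its 351.6 Mb span. The Python sketch below is for illustration only and is not the authors' pipeline. It shows the standard calculation, and the function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths (bp).

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembly span.
    Illustrative sketch only; `n50` is a hypothetical helper name.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison (2 * running >= total) avoids float rounding.
        if 2 * running >= total:
            return length
    # Unreachable for non-negative lengths: running ends equal to total.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Hypothetical toy contig lengths (bp), not taken from the sand fly assemblies.
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]
    print(n50(toy_contigs))
```

Running the script prints 70. The toy lengths sum to 300 bp, and the two longest contigs (80 + 70 = 150 bp) are the first to reach half of that total.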

Identifiers

pubmed: 39181902
doi: 10.1038/s41597-024-03628-y
pii: 10.1038/s41597-024-03628-y

Publication types

Journal Article; Dataset

Languages

eng

Citation subsets

IM

Pagination

918

Grants

Agency: U.S. Department of Health & Human Services | NIH | National Institute of Allergy and Infectious Diseases (NIAID)
ID: 5R03AI153899-02

Copyright information

© 2024. The Author(s).

Authors

Michelle Huang (M)

Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA.

Sarah Kingan (S)

Pacific Biosciences, Menlo Park, CA, USA.

Douglas Shoue (D)

Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA.

Oanh Nguyen (O)

DNA Technologies and Expression Analysis Cores, UC Davis Genome Center, University of California, Davis, Davis, CA, USA.

Lutz Froenicke (L)

DNA Technologies and Expression Analysis Cores, UC Davis Genome Center, University of California, Davis, Davis, CA, USA.

Brendan Galvin (B)

Pacific Biosciences, Menlo Park, CA, USA.

Christine Lambert (C)

Pacific Biosciences, Menlo Park, CA, USA.

Ruqayya Khan (R)

The Center for Genome Architecture, Baylor College of Medicine, Houston, TX, 77030, USA.
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.

Chirag Maheshwari (C)

The Center for Genome Architecture, Baylor College of Medicine, Houston, TX, 77030, USA.
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.

David Weisz (D)

The Center for Genome Architecture, Baylor College of Medicine, Houston, TX, 77030, USA.
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.

Gareth Maslen (G)

Department of Life Sciences, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK.

Helen Davison (H)

Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK.

Erez Lieberman Aiden (EL)

The Center for Genome Architecture, Baylor College of Medicine, Houston, TX, 77030, USA.
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
Center for Theoretical and Biological Physics, Rice University, Houston, TX, 77030, USA.
Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, 02139, USA.

Jonas Korlach (J)

Pacific Biosciences, Menlo Park, CA, USA.

Olga Dudchenko (O)

The Center for Genome Architecture, Baylor College of Medicine, Houston, TX, 77030, USA.
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
Center for Theoretical and Biological Physics, Rice University, Houston, TX, 77030, USA.

Mary Ann McDowell (MA)

Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA. mcdowell.11@nd.edu.
Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA. mcdowell.11@nd.edu.

Stephen Richards (S)

Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA. stephenr@bcm.edu.