Assembly of a pan-genome from deep sequencing of 910 humans of African descent.


Journal

Nature genetics
ISSN: 1546-1718
Abbreviated title: Nat Genet
Country: United States
NLM ID: 9216904

Publication information

Publication date:
01 2019
History:
received: 20 11 2017
accepted: 08 10 2018
pubmed: 21 11 2018
medline: 25 4 2019
entrez: 21 11 2018
Status: ppublish

Abstract

We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.
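The step of collecting all reads that failed to align can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes alignments are available as SAM text and uses only the SAM convention that FLAG bit 0x4 marks an unmapped read. The function name `unaligned_reads` and the toy records are hypothetical, introduced purely for illustration.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "this read is unmapped"


def unaligned_reads(sam_lines):
    """Yield (read name, sequence) for each SAM record with the unmapped bit set."""
    for line in sam_lines:
        # Skip blank lines and header lines, which begin with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
        if flag & UNMAPPED:
            yield fields[0], fields[9]  # column 1 = name, column 10 = sequence


# Toy SAM records (hypothetical data): one aligned read and two unaligned reads.
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCGA\tIIIIIIII",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tGGCATTAC\tIIIIIIII",
]

collected = list(unaligned_reads(toy_sam))
print(collected)
```

In a real workflow the collected reads would then be passed to an assembler to build contigs; that stage is outside the scope of this sketch.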

Identifiers

pubmed: 30455414
doi: 10.1038/s41588-018-0273-y
pii: 10.1038/s41588-018-0273-y
pmc: PMC6309586
mid: NIHMS1509230

Publication types

Journal Article
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

30-35

Grants

Agency: NHGRI NIH HHS
ID: R01 HG006677
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL129239
Country: United States
Agency: NIGMS NIH HHS
ID: U54 GM115428
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL104608
Country: United States
Agency: NIAID NIH HHS
ID: R01 AI132476
Country: United States

Comments and corrections

Type: ErratumIn
Type: CommentIn

Authors

Rachel M Sherman (RM)

Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA. rsherman@jhu.edu.
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA. rsherman@jhu.edu.

Juliet Forman (J)

Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA.
Departments of Computer Science, Biology, and Mathematics, Harvey Mudd College, Claremont, CA, USA.

Valentin Antonescu (V)

Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA.

Daniela Puiu (D)

Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA.

Michelle Daya (M)

Department of Medicine, University of Colorado Denver, Aurora, CO, USA.

Nicholas Rafaels (N)

Department of Medicine, University of Colorado Denver, Aurora, CO, USA.

Meher Preethi Boorgula (MP)

Department of Medicine, University of Colorado Denver, Aurora, CO, USA.

Sameer Chavan (S)

Department of Medicine, University of Colorado Denver, Aurora, CO, USA.

Candelaria Vergara (C)

Department of Medicine, Johns Hopkins University, Baltimore, MD, USA.

Victor E Ortega (VE)

Department of Internal Medicine, Section on Pulmonary, Critical Care, Allergy and Immunologic Diseases, Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA.

Albert M Levin (AM)

Department of Public Health Sciences, Henry Ford Health System, Detroit, MI, USA.

Celeste Eng (C)

Department of Medicine, University of California, San Francisco, San Francisco, CA, USA.

Maria Yazdanbakhsh (M)

Department of Parasitology, Leiden University Medical Center, Leiden, The Netherlands.

James G Wilson (JG)

Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA.

Javier Marrugo (J)

Institute for Immunological Research, Universidad de Cartagena, Cartagena, Colombia.

Leslie A Lange (LA)

Department of Medicine, University of Colorado Denver, Aurora, CO, USA.

L Keoki Williams (LK)

Department of Internal Medicine, Henry Ford Health System, Detroit, MI, USA.

Harold Watson (H)

Faculty of Medical Sciences Cave Hill Campus, The University of the West Indies, Bridgetown, Barbados.

Lorraine B Ware (LB)

Department of Medicine, Vanderbilt University, Nashville, TN, USA.

Christopher O Olopade (CO)

Department of Medicine and Center for Global Health, University of Chicago, Chicago, IL, USA.

Olufunmilayo Olopade (O)

Department of Medicine, University of Chicago, Chicago, IL, USA.

Ricardo R Oliveira (RR)

Laboratório de Patologia Experimental, Centro de Pesquisas Gonçalo Moniz, Salvador, Brazil.

Carole Ober (C)

Department of Human Genetics, University of Chicago, Chicago, IL, USA.

Dan L Nicolae (DL)

Department of Medicine, University of Chicago, Chicago, IL, USA.

Deborah A Meyers (DA)

Department of Medicine, University of Arizona College of Medicine, Tucson, AZ, USA.

Alvaro Mayorga (A)

Centro de Neumologia y Alergias, San Pedro Sula, Honduras.

Jennifer Knight-Madden (J)

Caribbean Institute for Health Research, The University of the West Indies, Kingston, Jamaica.

Tina Hartert (T)

Department of Medicine, Vanderbilt University, Nashville, TN, USA.

Nadia N Hansel (NN)

Department of Medicine, Johns Hopkins University, Baltimore, MD, USA.

Marilyn G Foreman (MG)

Pulmonary and Critical Care Medicine, Morehouse School of Medicine, Atlanta, GA, USA.

Jean G Ford (JG)

Department of Medicine, Einstein Medical Center, Philadelphia, PA, USA.

Mezbah U Faruque (MU)

National Human Genome Center, Howard University College of Medicine, Washington, DC, USA.

Georgia M Dunston (GM)

Department of Microbiology, Howard University College of Medicine, Washington, DC, USA.

Luis Caraballo (L)

Institute for Immunological Research, Universidad de Cartagena, Cartagena, Colombia.

Esteban G Burchard (EG)

Departments of Bioengineering & Therapeutic Sciences and Medicine, University of California, San Francisco, San Francisco, CA, USA.

Eugene R Bleecker (ER)

Department of Medicine, University of Arizona College of Medicine, Tucson, AZ, USA.

Maria I Araujo (MI)

Immunology Service, Universidade Federal da Bahia, Salvador, Brazil.

Edwin F Herrera-Paz (EF)

Facultad de Ciencias Médicas, Universidad Tecnológica Centroamericana (UNITEC), Tegucigalpa, Honduras.

Monica Campbell (M)

Department of Medicine, University of Colorado Denver, Aurora, CO, USA.

Cassandra Foster (C)

Department of Medicine, Johns Hopkins University, Baltimore, MD, USA.

Margaret A Taub (MA)

Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.

Terri H Beaty (TH)

Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.

Ingo Ruczinski (I)

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.

Rasika A Mathias (RA)

Department of Medicine, Johns Hopkins University, Baltimore, MD, USA.
Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.

Kathleen C Barnes (KC)

Department of Medicine, University of Colorado Denver, Aurora, CO, USA.

Steven L Salzberg (SL)

Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA. salzberg@jhu.edu.
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA. salzberg@jhu.edu.
Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA. salzberg@jhu.edu.
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA. salzberg@jhu.edu.
