Pangenomics enables genotyping of known structural variants in 5202 diverse genomes.


Journal

Science (New York, N.Y.)
ISSN: 1095-9203
Abbreviated title: Science
Country: United States
NLM ID: 0404511

Publication information

Publication date:
17 Dec 2021
History:
entrez: 16 12 2021
pubmed: 17 12 2021
medline: 31 12 2021
Status: ppublish
Citation: Science. 2021 Dec 17;374(6574):abg8871

Abstract

We introduce Giraffe, a pangenome short-read mapper that can efficiently map to a collection of haplotypes threaded through a sequence graph. Giraffe maps sequencing reads to thousands of human genomes at a speed comparable to that of standard methods mapping to a single reference genome. The increased mapping accuracy enables downstream improvements in genome-wide genotyping pipelines for both small variants and larger structural variants. We used Giraffe to genotype 167,000 structural variants, discovered in long-read studies, in 5202 diverse human genomes that were sequenced using short reads. We conclude that pangenomics facilitates a more comprehensive characterization of variation and, as a result, has the potential to improve many genomic analyses.

Identifiers

pubmed: 34914532
doi: 10.1126/science.abg8871
pmc: PMC9365333
mid: NIHMS1824904

Publication types

Journal Article
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

abg8871

Grants

Agency: NHGRI NIH HHS
ID: U01 HG010961
Country: United States
Agency: NHLBI NIH HHS
ID: U01 HL137183
Country: United States
Agency: NHGRI NIH HHS
ID: U41 HG010972
Country: United States
Agency: NHGRI NIH HHS
ID: U41 HG007234
Country: United States
Agency: NHLBI NIH HHS
ID: N02 HL64278
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010485
Country: United States
Agency: NIH HHS
ID: OT2 OD026682
Country: United States
Agency: NHLBI NIH HHS
ID: OT3 HL142481
Country: United States

Authors

Jouni Sirén (J)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Jean Monlong (J)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Xian Chang (X)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Adam M Novak (AM)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Jordan M Eizenga (JM)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Charles Markello (C)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Jonas A Sibbesen (JA)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Glenn Hickey (G)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Pi-Chuan Chang (PC)

Google Inc., Mountain View, CA, USA.

Andrew Carroll (A)

Google Inc., Mountain View, CA, USA.

Namrata Gupta (N)

Genomics Platform, Broad Institute, Cambridge, MA, USA.

Stacey Gabriel (S)

Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA.

Thomas W Blackwell (TW)

Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.

Aakrosh Ratan (A)

Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA.

Kent D Taylor (KD)

The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA.

Stephen S Rich (SS)

Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA.

Jerome I Rotter (JI)

The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA.

David Haussler (D)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.
Howard Hughes Medical Institute, University of California, Santa Cruz, CA, USA.

Erik Garrison (E)

Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA.

Benedict Paten (B)

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.
