Pangenomics enables genotyping of known structural variants in 5202 diverse genomes.
Algorithms
Alleles
Computational Biology
Genetic Variation
Genome, Fungal
Genome, Human
Genomics / methods
Genotype
Genotyping Techniques
Haplotypes
High-Throughput Nucleotide Sequencing
Humans
Polymorphism, Single Nucleotide
Quantitative Trait Loci
Saccharomyces / genetics
Saccharomyces cerevisiae / genetics
Sequence Analysis, DNA
Journal
Science (New York, N.Y.)
ISSN: 1095-9203
Abbreviated title: Science
Country: United States
NLM ID: 0404511
Publication information
Publication date:
17 Dec 2021
History:
entrez: 16 Dec 2021
pubmed: 17 Dec 2021
medline: 31 Dec 2021
Status:
ppublish
Abstract
We introduce Giraffe, a pangenome short-read mapper that can efficiently map to a collection of haplotypes threaded through a sequence graph. Giraffe maps sequencing reads to thousands of human genomes at a speed comparable to that of standard methods mapping to a single reference genome. The increased mapping accuracy enables downstream improvements in genome-wide genotyping pipelines for both small variants and larger structural variants. We used Giraffe to genotype 167,000 structural variants, discovered in long-read studies, in 5202 diverse human genomes that were sequenced using short reads. We conclude that pangenomics facilitates a more comprehensive characterization of variation and, as a result, has the potential to improve many genomic analyses.
Identifiers
pubmed: 34914532
doi: 10.1126/science.abg8871
pmc: PMC9365333
mid: NIHMS1824904
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
abg8871
Grants
Agency: NHGRI NIH HHS
ID: U01 HG010961
Country: United States
Agency: NHLBI NIH HHS
ID: U01 HL137183
Country: United States
Agency: NHGRI NIH HHS
ID: U41 HG010972
Country: United States
Agency: NHGRI NIH HHS
ID: U41 HG007234
Country: United States
Agency: NHLBI NIH HHS
ID: N02 HL64278
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010485
Country: United States
Agency: NIH HHS
ID: OT2 OD026682
Country: United States
Agency: NHLBI NIH HHS
ID: OT3 HL142481
Country: United States