Haplotype-resolved diverse human genomes and integrated analysis of structural variation.
Journal
Science (New York, N.Y.)
ISSN: 1095-9203
Abbreviated title: Science
Country: United States
NLM ID: 0404511
Publication information
Publication date:
02 04 2021
History:
received: 13 11 2020
accepted: 09 02 2021
pubmed: 27 02 2021
medline: 10 04 2021
entrez: 26 02 2021
Status:
ppublish
Abstract
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
Identifiers
pubmed: 33632895
pii: science.abf7117
doi: 10.1126/science.abf7117
pmc: PMC8026704
mid: NIHMS1680320
Chemical substances
Retroelements
0
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Grants
Agency: NIGMS NIH HHS
ID: R35 GM138212
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG010973
Country: United States
Agency: NCI NIH HHS
ID: P30 CA034196
Country: United States
Agency: NHGRI NIH HHS
ID: UM1 HG008901
Country: United States
Agency: NICHD NIH HHS
ID: R01 HD081256
Country: United States
Agency: Wellcome Trust
Country: United Kingdom
Agency: NHGRI NIH HHS
ID: U24 HG007497
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG002385
Country: United States
Agency: NHGRI NIH HHS
ID: R15 HG009565
Country: United States
Agency: European Research Council
ID: 773026
Country: International
Agency: NIDDK NIH HHS
ID: T32 DK067872
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG002898
Country: United States
Agency: NHGRI NIH HHS
ID: K99 HG011041
Country: United States
Agency: NIMH NIH HHS
ID: R01 MH115957
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG007068
Country: United States
Agency: NHGRI NIH HHS
ID: T32 HG000035
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010169
Country: United States
Agency: NHLBI NIH HHS
ID: OT3 HL147154
Country: United States
Comments and corrections
Type: CommentIn
Copyright information
Copyright © 2021 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.