Telomere-to-telomere assembly of diploid chromosomes with Verkko.
Journal
Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648
Publication information
Publication date:
Oct 2023
History:
received: 24 June 2022
accepted: 3 January 2023
medline: 3 November 2023
pubmed: 17 February 2023
entrez: 16 February 2023
Status:
ppublish
Abstract
The Telomere-to-Telomere consortium recently assembled the first truly complete sequence of a human genome. To resolve the most complex repeats, this project relied on manual integration of ultra-long Oxford Nanopore sequencing reads with a high-resolution assembly graph built from long, accurate PacBio high-fidelity reads. We have improved and automated this strategy in Verkko, an iterative, graph-based pipeline for assembling complete, diploid genomes. Verkko begins with a multiplex de Bruijn graph built from long, accurate reads and progressively simplifies this graph by integrating ultra-long reads and haplotype-specific markers. The result is a phased, diploid assembly of both haplotypes, with many chromosomes automatically assembled from telomere to telomere. Running Verkko on the HG002 human genome resulted in 20 of 46 diploid chromosomes assembled without gaps at 99.9997% accuracy. The complete assembly of diploid genomes is a critical step towards the construction of comprehensive pangenome databases and chromosome-scale comparative genomics.
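The pipeline described above starts from a de Bruijn graph built from long, accurate reads. As a rough illustration only, the Python sketch below builds a plain single-k de Bruijn graph: nodes are (k-1)-mers, and each k-mer observed at least `min_count` times becomes an edge from its prefix to its suffix. This is not Verkko's implementation; a multiplex de Bruijn graph is more involved than a single-k graph, and the function name `build_de_bruijn`, the `min_count` threshold, the toy reads, and the choice to ignore reverse complements are all assumptions made for the example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k, min_count=1):
    """Build a simple single-k de Bruijn graph (illustrative sketch only).

    Nodes are (k-1)-mers. Every distinct k-mer seen at least `min_count`
    times adds an edge from its (k-1)-prefix to its (k-1)-suffix.
    Reverse complements are ignored for simplicity (assumption).
    Returns a dict mapping each node to a sorted list of successor nodes.
    """
    # Count every k-mer occurrence across all reads.
    counts = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1

    # Keep only k-mers seen often enough; rare ones are likely read errors.
    graph = defaultdict(set)
    for kmer, n in counts.items():
        if n >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(succ) for node, succ in graph.items()}


if __name__ == "__main__":
    # Hypothetical toy reads: two agree, one carries a substitution (T -> A).
    reads = ["ACGTT", "ACGTT", "ACGAT"]
    print(build_de_bruijn(reads, k=3, min_count=2))
    # {'AC': ['CG'], 'CG': ['GT'], 'GT': ['TT']}
```

With `min_count=1`, the single erroneous read would add a spurious branch `CG -> GA -> AT`; raising the threshold is one crude way to prune such branches, only loosely analogous to the progressive graph simplification described in the abstract.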
Identifiers
pubmed: 36797493
doi: 10.1038/s41587-023-01662-6
pii: 10.1038/s41587-023-01662-6
pmc: PMC10427740
mid: NIHMS1878337
Publication types
Journal Article
Language
eng
Citation subsets
IM
Pagination
1474-1482
Grants
Agency: NIGMS NIH HHS
ID: F32 GM134558
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010169
Country: United States
Agency: Intramural NIH HHS
ID: Z99 HG999999
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG002385
Country: United States
Copyright information
© 2023. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.