Pangenome graph construction from genome alignments with Minigraph-Cactus.
Journal
Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648
Publication information
Publication date:
10 May 2023
History:
received: 6 October 2022
accepted: 18 April 2023
pmc-release: 10 November 2024
pubmed: 11 May 2023
medline: 11 May 2023
entrez: 10 May 2023
Status:
aheadofprint
Abstract
Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph's ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a Drosophila melanogaster pangenome.
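To make the idea of "haplotypes and their alignment, usually as a graph" concrete, the following is a minimal toy sketch, not the Minigraph-Cactus data model or API. It assumes a sequence graph in which nodes carry DNA segments and each haplotype is a walk over node IDs; the node contents, the path names (`ref`, `hapA`, `hapB`), and the helper functions `spell` and `core_nodes` are all hypothetical.

```python
# Toy pangenome graph: nodes carry sequence, each haplotype is a walk over node IDs.
# All node contents and path names below are hypothetical illustrations.
NODES = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA", 5: "GGGGGG"}
PATHS = {
    "ref":  [1, 2, 4],     # reference haplotype
    "hapA": [1, 3, 4],     # differs from ref by a single-base substitution (node 2 vs 3)
    "hapB": [1, 2, 5, 4],  # carries an extra inserted segment (node 5)
}


def spell(path_name, nodes=NODES, paths=PATHS):
    """Reconstruct the linear sequence of one haplotype by concatenating its nodes."""
    return "".join(nodes[node_id] for node_id in paths[path_name])


def core_nodes(paths=PATHS):
    """Return the node IDs visited by every haplotype (sequence shared by all)."""
    walks = [set(walk) for walk in paths.values()]
    return set.intersection(*walks)


if __name__ == "__main__":
    for name in PATHS:
        print(name, spell(name))
    print("shared nodes:", sorted(core_nodes()))
```

In this sketch a small substitution and a larger insertion are both expressed the same way, as alternative walks through the graph, which is the sense in which a graph can represent variation at different scales.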
Identifiers
pubmed: 37165083
doi: 10.1038/s41587-023-01793-w
pii: 10.1038/s41587-023-01793-w
pmc: PMC10638906
mid: NIHMS1928546
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: NHGRI NIH HHS
ID: U01 HG010961
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010040
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG010973
Country: United States
Agency: NHLBI NIH HHS
ID: OT3 HL142481
Country: United States
Agency: NIH HHS
ID: OT2 OD033761
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG011274
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG010262
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG010971
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG011853
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010485
Country: United States
Investigators
Haley J Abel (HJ)
Lucinda L Antonacci-Fulton (LL)
Mobin Asri (M)
Gunjan Baid (G)
Carl A Baker (CA)
Anastasiya Belyaeva (A)
Konstantinos Billis (K)
Guillaume Bourque (G)
Silvia Buonaiuto (S)
Andrew Carroll (A)
Mark J P Chaisson (MJP)
Pi-Chuan Chang (PC)
Xian H Chang (XH)
Haoyu Cheng (H)
Justin Chu (J)
Sarah Cody (S)
Vincenza Colonna (V)
Daniel E Cook (DE)
Robert M Cook-Deegan (RM)
Omar E Cornejo (OE)
Mark Diekhans (M)
Daniel Doerr (D)
Peter Ebert (P)
Jana Ebler (J)
Evan E Eichler (EE)
Susan Fairley (S)
Olivier Fedrigo (O)
Adam L Felsenfeld (AL)
Xiaowen Feng (X)
Christian Fischer (C)
Paul Flicek (P)
Giulio Formenti (G)
Adam Frankish (A)
Robert S Fulton (RS)
Shilpa Garg (S)
Erik Garrison (E)
Nanibaa' A Garrison (NA)
Carlos Garcia Giron (CG)
Richard E Green (RE)
Cristian Groza (C)
Andrea Guarracino (A)
Leanne Haggerty (L)
Ira M Hall (IM)
William T Harvey (WT)
Marina Haukness (M)
David Haussler (D)
Simon Heumos (S)
Kendra Hoekzema (K)
Thibaut Hourlier (T)
Kerstin Howe (K)
Miten Jain (M)
Erich D Jarvis (ED)
Hanlee P Ji (HP)
Eimear E Kenny (EE)
Barbara A Koenig (BA)
Alexey Kolesnikov (A)
Jan O Korbel (JO)
Jennifer Kordosky (J)
Sergey Koren (S)
HoJoon Lee (H)
Alexandra P Lewis (AP)
Wen-Wei Liao (WW)
Shuangjia Lu (S)
Tsung-Yu Lu (TY)
Julian K Lucas (JK)
Hugo Magalhães (H)
Santiago Marco-Sola (S)
Pierre Marijon (P)
Charles Markello (C)
Tobias Marschall (T)
Fergal J Martin (FJ)
Ann McCartney (A)
Jennifer McDaniel (J)
Karen H Miga (KH)
Matthew W Mitchell (MW)
Jacquelyn Mountcastle (J)
Katherine M Munson (KM)
Moses Njagi Mwaniki (MN)
Maria Nattestad (M)
Sergey Nurk (S)
Hugh E Olsen (HE)
Nathan D Olson (ND)
Trevor Pesout (T)
Adam M Phillippy (AM)
Alice B Popejoy (AB)
David Porubsky (D)
Pjotr Prins (P)
Daniela Puiu (D)
Mikko Rautiainen (M)
Allison A Regier (AA)
Arang Rhie (A)
Samuel Sacco (S)
Ashley D Sanders (AD)
Valerie A Schneider (VA)
Baergen I Schultz (BI)
Kishwar Shafin (K)
Jonas A Sibbesen (JA)
Jouni Sirén (J)
Michael W Smith (MW)
Heidi J Sofia (HJ)
Ahmad N Abou Tayoun (ANA)
Françoise Thibaud-Nissen (F)
Chad Tomlinson (C)
Francesca Floriana Tricomi (FF)
Flavia Villani (F)
Mitchell R Vollger (MR)
Justin Wagner (J)
Brian Walenz (B)
Ting Wang (T)
Jonathan M D Wood (JMD)
Aleksey V Zimin (AV)
Justin M Zook (JM)
Copyright information
© 2023. The Author(s), under exclusive licence to Springer Nature America, Inc.