Pushing the limits of HiFi assemblies reveals centromere diversity between two Arabidopsis thaliana genomes.


Journal

Nucleic Acids Research
ISSN: 1362-4962
Abbreviated title: Nucleic Acids Res
Country: England
NLM ID: 0411011

Publication information

Publication date:
2022-11-28
History:
accepted: 2022-11-10
revised: 2022-09-13
received: 2022-02-18
pubmed: 2022-12-02
medline: 2022-12-21
entrez: 2022-12-01
Status: ppublish

Abstract

Although long-read sequencing can often enable chromosome-level reconstruction of genomes, it is still unclear how one can routinely obtain gapless assemblies. In the model plant Arabidopsis thaliana, apart from the reference accession Col-0, all accessions assembled de novo with long reads until now have used PacBio continuous long reads (CLR). Although these assemblies sometimes achieved chromosome-arm level contigs, they inevitably broke near the centromeres, excluding megabases of DNA from analysis in pan-genome projects. Since PacBio high-fidelity (HiFi) reads circumvent the high error rate of CLR technologies, albeit at the expense of read length, we compared a CLR assembly of accession Eyach15-2 to HiFi assemblies of the same sample. The use of five different assemblers starting from subsampled data allowed us to evaluate the impact of coverage and read length. We found that centromeres and rDNA clusters are responsible for 71% of contig breaks in the CLR scaffolds, while relatively short stretches of GA/TC repeats are at the core of >85% of the unfilled gaps in our best HiFi assemblies. Since the HiFi technology consistently enabled us to reconstruct gapless centromeres and 5S rDNA clusters, we demonstrate the value of the approach by comparing these previously inaccessible regions of the genome between the Eyach15-2 accession and the reference accession Col-0.

Identifiers

pubmed: 36453992
pii: 6858746
doi: 10.1093/nar/gkac1115
pmc: PMC9757041

Chemical substances

DNA, Ribosomal 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

12309-12327

Copyright information

© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.

Authors

Fernando A Rabanal (FA)

Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany.

Maike Gräff (M)

Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany.

Christa Lanz (C)

Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany.

Katrin Fritschi (K)

Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany.

Victor Llaca (V)

Genomics Technologies, Corteva Agriscience, Johnston, IA 50131, USA.

Michelle Lang (M)

Genomics Technologies, Corteva Agriscience, Johnston, IA 50131, USA.

Pablo Carbonell-Bejerano (P)

Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany.

Ian Henderson (I)

Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK.

Detlef Weigel (D)

Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany.
