The structure, function and evolution of a complete human chromosome 8.
Animals
Cell Line
Centromere / chemistry
Chromosomes, Human, Pair 8 / chemistry
DNA Methylation
DNA, Satellite / genetics
Epigenesis, Genetic
Evolution, Molecular
Female
Humans
Macaca mulatta / genetics
Male
Minisatellite Repeats / genetics
Pan troglodytes / genetics
Phylogeny
Pongo abelii / genetics
Telomere / chemistry
Journal
Nature
ISSN: 1476-4687
Abbreviated title: Nature
Country: England
NLM ID: 0410462
Publication information
Publication date:
May 2021
History:
received: 4 September 2020
accepted: 4 March 2021
pubmed: 9 April 2021
medline: 28 May 2021
entrez: 8 April 2021
Status:
ppublish
Abstract
The complete assembly of each human chromosome is essential for understanding human biology and evolution
Identifiers
pubmed: 33828295
doi: 10.1038/s41586-021-03420-7
pii: 10.1038/s41586-021-03420-7
pmc: PMC8099727
mid: NIHMS1696366
Chemical substances
DNA, Satellite
0
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, N.I.H., Intramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
101-107
Grants
Organization: NHGRI NIH HHS
ID: K99 HG011041
Country: United States
Organization: NHGRI NIH HHS
ID: R01 HG011274
Country: United States
Organization: NHGRI NIH HHS
ID: R21 HG010548
Country: United States
Organization: NHGRI NIH HHS
ID: U01 HG010971
Country: United States
Organization: NHGRI NIH HHS
ID: R01 HG002385
Country: United States
Organization: NIAMS NIH HHS
ID: P30 AR074990
Country: United States
Organization: NLM NIH HHS
ID: T32 LM012419
Country: United States
Organization: Howard Hughes Medical Institute
Country: United States
Organization: NIGMS NIH HHS
ID: F32 GM134558
Country: United States
Organization: NHGRI NIH HHS
ID: T32 HG000035
Country: United States
Organization: NHGRI NIH HHS
ID: R01 HG010169
Country: United States