Hi-C Data Formats.

4D nucleome; Bioinformatics; Data formats; Hi-C; Software

Journal

Methods in molecular biology (Clifton, N.J.)
ISSN: 1940-6029
Abbreviated title: Methods Mol Biol
Country: United States
NLM ID: 9214969

Publication information

Publication date:
2022
History:
entrez: 2021-08-20
pubmed: 2021-08-21
medline: 2022-01-12
Status: ppublish

Abstract

Processing, storing, and visualizing high-resolution Hi-C data has required the development of efficient data formats. Sparse matrix formats, which store only nonzero values, have become the norm. "Zoomable" matrix formats, which store multiple resolutions in a single file for interactive visualization, have also become popular. This chapter discusses the latest matrix file formats, such as .hic and .mcool, as well as intermediate formats including SAM/BAM and randomly accessible contact lists.
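
The two ideas in the abstract, storing only nonzero values and keeping several resolutions together, can be illustrated with a minimal Python sketch. It is not the on-disk layout of .hic or .mcool; the function names (bin_contacts, coarsen, build_zoom_levels), the toy coordinates, and the bin sizes are all hypothetical and chosen only for illustration. Contacts are assumed to lie on a single chromosome and are given as pairs of base-pair positions.

```python
from collections import defaultdict


def bin_contacts(contacts, bin_size):
    """Count contacts per (bin_i, bin_j) pair.

    Only pairs that actually occur are stored (sparse representation),
    and each pair is ordered so that bin_i <= bin_j (upper triangle).
    """
    counts = defaultdict(int)
    for pos1, pos2 in contacts:
        i, j = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(i, j)] += 1
    return dict(counts)


def coarsen(pixels, factor):
    """Merge `factor` adjacent bins along each axis by summing counts."""
    coarse = defaultdict(int)
    for (i, j), count in pixels.items():
        coarse[(i // factor, j // factor)] += count
    return dict(coarse)


def build_zoom_levels(contacts, base_bin_size, factors):
    """Return {bin_size: sparse pixels} for the base and coarser resolutions."""
    base = bin_contacts(contacts, base_bin_size)
    levels = {base_bin_size: base}
    for factor in factors:
        levels[base_bin_size * factor] = coarsen(base, factor)
    return levels


# Hypothetical toy contacts (base-pair positions on one chromosome).
toy_contacts = [(1200, 8400), (1500, 8900), (300, 700), (9100, 1100), (25000, 26000)]
zoom_levels = build_zoom_levels(toy_contacts, base_bin_size=1000, factors=[2, 5])

for bin_size, pixels in sorted(zoom_levels.items()):
    print(bin_size, sorted(pixels.items()))
```

Each resolution holds only the bin pairs with at least one contact, so storage grows with the number of observed pairs rather than with the square of the number of bins. Precomputing the coarser levels trades extra storage for fast retrieval when a viewer zooms out.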

Identifiers

pubmed: 34415533
doi: 10.1007/978-1-0716-1390-0_6

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

133-141

Copyright information

© 2022. Springer Science+Business Media, LLC, part of Springer Nature.

Authors

Soohyun Lee

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA. soohyun_lee@hms.harvard.edu.
