Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2.


Journal

Nature protocols
ISSN: 1750-2799
Titre abrégé: Nat Protoc
Pays: England
ID NLM: 101284307

Informations de publication

Date de publication:
03 2020
Historique:
received: 19 06 2018
accepted: 27 11 2019
pubmed: 26 1 2020
medline: 22 4 2020
entrez: 26 1 2020
Statut: ppublish

Résumé

Fit-Hi-C is a programming application to compute statistical confidence estimates for Hi-C contact maps to identify significant chromatin contacts. By fitting a monotonically non-increasing spline, Fit-Hi-C captures the relationship between genomic distance and contact probability without any parametric assumption. The spline fit together with the correction of contact probabilities with respect to bin- or locus-specific biases accounts for previously characterized covariates impacting Hi-C contact counts. Fit-Hi-C is best applied for the study of mid-range (e.g., 20 kb-2 Mb for human genome) intra-chromosomal contacts; however, with the latest reimplementation, named FitHiC2, it is possible to perform genome-wide analysis for high-resolution Hi-C data, including all intra-chromosomal distances and inter-chromosomal contacts. FitHiC2 also offers a merging filter module, which eliminates indirect/bystander interactions, leading to significant reduction in the number of reported contacts without sacrificing recovery of key loops such as those between convergent CTCF binding sites. Here, we describe how to apply the FitHiC2 protocol to three use cases: (i) 5-kb resolution Hi-C data of chromosome 5 from GM12878 (a human lymphoblastoid cell line), (ii) 40-kb resolution whole-genome Hi-C data from IMR90 (human lung fibroblast), and (iii) budding yeast whole-genome Hi-C data at a single restriction cut site (EcoRI) resolution. The procedure takes ~12 h with preprocessing when all use cases are run sequentially (~4 h when run parallel). With the recent improvements in its implementation, FitHiC2 (8 processors and 16 GB memory) is also scalable to genome-wide analysis of the highest resolution (1 kb) Hi-C data available to date (~48 h with 32 GB peak memory). FitHiC2 is available through Bioconda, GitHub and the Python Package Index.

Identifiants

pubmed: 31980751
doi: 10.1038/s41596-019-0273-0
pii: 10.1038/s41596-019-0273-0
pmc: PMC7451401
mid: NIHMS1584039
doi:

Substances chimiques

Chromatin 0

Types de publication

Journal Article Research Support, N.I.H., Extramural

Langues

eng

Sous-ensembles de citation

IM

Pagination

991-1012

Subventions

Organisme : NIGMS NIH HHS
ID : R35 GM128938
Pays : United States
Organisme : U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health)
ID : R35-GM128938
Pays : International

Références

Bickmore, W. A. The spatial organization of the human genome. Annu. Rev. Genomics Hum. Genet. 14, 67–84 (2013).
doi: 10.1146/annurev-genom-091212-153515
Dekker, J., Marti-Renom, M. A. & Mirny, L. A. Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat. Rev. Genet. 14, 390–403 (2013).
doi: 10.1038/nrg3454
Quinodoz, S. A. et al. Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Cell 174, 744–757.e24 (2018).
doi: 10.1016/j.cell.2018.05.024
Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
doi: 10.1016/j.cell.2014.11.021
Duan, Z. et al. A three-dimensional model of the yeast genome. Nature 465, 363–367 (2010).
doi: 10.1038/nature08973
Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011).
doi: 10.1038/nbt.2057
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
doi: 10.1126/science.1181369
Stadhouders, R. et al. Transcription regulation by distal enhancers: who’s in the loop? Transcription 3, 181–186 (2012).
doi: 10.4161/trns.20720
Ay, F. & Noble, W. S. Analysis methods for studying the 3D architecture of the genome. Genome Biol. 16, 183 (2015).
doi: 10.1186/s13059-015-0745-7
Lajoie, B. R., Dekker, J. & Kaplan, N. The hitchhiker’s guide to Hi-C analysis: practical guidelines. Methods 72, 65–75 (2015).
doi: 10.1016/j.ymeth.2014.10.031
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
doi: 10.1186/s13059-015-0831-x
Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
doi: 10.1038/nmeth.2148
Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet. 43, 1059–1065 (2011).
doi: 10.1038/ng.947
Ay, F., Bailey, T. L. & Noble, W. S. Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res. 24, 999–1011 (2014).
doi: 10.1101/gr.160374.113
Bhattacharyya, S., Chandra, V., Vijayanand, P. & Ay, F. Identification of significant chromatin contacts from HiChIP data by FitHiChIP. Nat. Commun. 10, 4221 (2019).
doi: 10.1038/s41467-019-11950-y
Knight, P. A. & Ruiz, D. A fast algorithm for matrix balancing. IMA J. Numer. Anal. 33, 1029–1047 (2013).
doi: 10.1093/imanum/drs019
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol. 57, 289–300 (1995).
Ay, F. et al. Three-dimensional modeling of the P. falciparum genome during the erythrocytic cycle reveals a strong connection between genome architecture and gene expression. Genome Res. 24, 974–988 (2014).
doi: 10.1101/gr.169417.113
Wang, C. et al. Genome-wide analysis of local chromatin packing in Arabidopsis thaliana. Genome Res. 25, 246–256 (2015).
doi: 10.1101/gr.170332.113
Wang, M. et al. Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication. Nat. Genet. 49, 579–587 (2017).
doi: 10.1038/ng.3807
Ay, F. et al. Identifying multi-locus chromatin contacts in human cells using tethered multiple 3C. BMC Genomics 16, 121 (2015).
doi: 10.1186/s12864-015-1236-7
Bunnik, E. M. et al. Comparative 3D genome organization in apicomplexan parasites. Proc. Natl Acad. Sci. USA 116, 3183–3192 (2019).
doi: 10.1073/pnas.1810815116
Forcato, M. et al. Comparison of computational methods for Hi-C data analysis. Nat. Methods 14, 679–685 (2017).
doi: 10.1038/nmeth.4325
Hwang, Y. C. et al. HIPPIE: a high-throughput identification pipeline for promoter interacting enhancer elements. Bioinformatics 31, 1290–1292 (2015).
doi: 10.1093/bioinformatics/btu801
Lun, A. T. & Smyth, G. K. diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data. BMC Bioinformatics 16, 258 (2015).
doi: 10.1186/s12859-015-0683-0
Mifsud, B. et al. GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in Hi-C data. PLoS One 12, e0174744 (2017).
doi: 10.1371/journal.pone.0174744
Carty, M. et al. An integrated model for detecting significant chromatin interactions from high-resolution Hi-C data. Nat. Commun. 8, 15454 (2017).
doi: 10.1038/ncomms15454
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
doi: 10.1038/nature11082
Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
doi: 10.1038/s41588-019-0538-0
Chakraborty, A. & Ay, F. Identification of copy number variations and translocations in cancer cells from Hi-C data. Bioinformatics, https://doi.org/10.1093/bioinformatics/btx664 (2017).
Dixon, J. R. et al. Integrative detection and analysis of structural variation in cancer genomes. Nat. Genet. 50, 1388–1398 (2018).
doi: 10.1038/s41588-018-0195-8
Jin, F. et al. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature 503, 290–294 (2013).
doi: 10.1038/nature12644
Yardimci, G. G. et al. Measuring the reproducibility and quality of Hi-C data. Genome Biol. 20, 57 (2019).
doi: 10.1186/s13059-019-1658-7
Huang, J., Marco, E., Pinello, L. & Yuan, G. C. Predicting chromatin organization using histone marks. Genome Biol. 16, 162 (2015).
doi: 10.1186/s13059-015-0740-z
Kaul, A., Bhattacharyya, S. & Ay, F. Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2. Zenodo, https://doi.org/10.5281/zenodo.3380589 (2019).
Kaul, A., Bhattacharyya, S. & Ay, F. Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2. Code Ocean, https://doi.org/10.24433/CO.5589539.v2 (2019).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
doi: 10.1016/j.cels.2016.07.002
Yardimci, G. G. & Noble, W. S. Software tools for visualizing Hi-C data. Genome Biol. 18, 26 (2017).

Auteurs

Arya Kaul (A)

Department of Bioengineering, University of California San Diego, La Jolla, CA, USA.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Sourya Bhattacharyya (S)

Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, CA, USA.

Ferhat Ay (F)

Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, CA, USA. ferhatay@lji.org.
School of Medicine, University of California San Diego, La Jolla, CA, USA. ferhatay@lji.org.

Articles similaires

[Redispensing of expensive oral anticancer medicines: a practical application].

Lisanne N van Merendonk, Kübra Akgöl, Bastiaan Nuijen
1.00
Humans Antineoplastic Agents Administration, Oral Drug Costs Counterfeit Drugs

Smoking Cessation and Incident Cardiovascular Disease.

Jun Hwan Cho, Seung Yong Shin, Hoseob Kim et al.
1.00
Humans Male Smoking Cessation Cardiovascular Diseases Female
Humans United States Aged Cross-Sectional Studies Medicare Part C
1.00
Humans Yoga Low Back Pain Female Male

Classifications MeSH