Integrative computational epigenomics to build data-driven gene regulation hypotheses.

bioinformatics; computational biology; data integration; deep learning; epigenetics; epigenomics; gene regulation; genomics; high-throughput sequencing; machine learning

Journal

GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872

Publication information

Publication date:
01 06 2020
History:
received: 25 03 2020
revised: 25 05 2020
accepted: 26 05 2020
entrez: 17 6 2020
pubmed: 17 6 2020
medline: 5 10 2021
Status: ppublish

Abstract

Diseases are complex phenotypes often arising as an emergent property of a non-linear network of genetic and epigenetic interactions. To translate this resulting state into a causal relationship with a subset of regulatory features, many experiments deploy an array of laboratory assays from multiple modalities. Often, each of these resulting datasets is large, heterogeneous, and noisy. Thus, it is non-trivial to unify these complex datasets into an interpretable phenotype. Although recent methods address this problem with varying degrees of success, they are limited in scope. Therefore, an important gap in the field is the lack of a universal data harmonizer with the capability to arbitrarily integrate multi-modal datasets. In this review, we perform a critical analysis of methods with the explicit aim of harmonizing data, as opposed to case-specific integration. This analysis reveals that matrix factorization, latent variable analysis, and deep learning are potent strategies. Finally, we describe the properties of an ideal universal data harmonization framework. A sufficiently advanced universal harmonizer has major medical implications: (1) identifying the dysregulated biological pathways responsible for a disease provides a powerful diagnostic tool; (2) investigating these pathways further allows the biological community to better understand a disease's mechanisms; and (3) precision medicine benefits from developments in this area, particularly in the context of the growing field of selective epigenome editing, which can suppress or induce a desired phenotype.

Identifiers

pubmed: 32543653
pii: 5858063
doi: 10.1093/gigascience/giaa064
pmc: PMC7297091

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2020. Published by Oxford University Press.

Authors

Tyrone Chen (T)

25 Rainforest Walk, School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia.

Sonika Tyagi (S)

25 Rainforest Walk, School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia.
