Integrative computational epigenomics to build data-driven gene regulation hypotheses.

Computational Biology / methods; Epigenesis, Genetic; Epigenomics / methods; Gene Expression Regulation; High-Throughput Nucleotide Sequencing; Machine Learning; Software

bioinformatics; computational biology; data integration; deep learning; epigenetics; epigenomics; gene regulation; genomics; high-throughput sequencing; machine learning

Journal

GigaScience

ISSN: 2047-217X

Abbreviated title: Gigascience

Country: United States

NLM ID: 101596872

Publication information

Publication date:
2020 Jun 1

History:

received: 2020 Mar 25

revised: 2020 May 25

accepted: 2020 May 26

entrez: 2020 Jun 17

pubmed: 2020 Jun 17

medline: 2021 Oct 5

Status: ppublish

Abstract

Diseases are complex phenotypes that often arise as an emergent property of a non-linear network of genetic and epigenetic interactions. To link this resulting state causally to a subset of regulatory features, many experiments deploy an array of laboratory assays spanning multiple modalities. Each resulting dataset is often large, heterogeneous, and noisy, so unifying these complex datasets into an interpretable phenotype is non-trivial. Although recent methods address this problem with varying degrees of success, each is constrained by a narrow scope or by method-specific limitations. Therefore, an important gap in the field is the lack of a universal data harmonizer capable of integrating arbitrary multi-modal datasets. In this review, we perform a critical analysis of methods with the explicit aim of harmonizing data, as opposed to case-specific integration. This analysis revealed that matrix factorization, latent variable analysis, and deep learning are potent strategies. Finally, we describe the properties of an ideal universal data harmonization framework. A sufficiently advanced universal harmonizer has major medical implications: (1) identifying the dysregulated biological pathways responsible for a disease provides a powerful diagnostic tool; (2) investigating these pathways further allows the biological community to better understand a disease's mechanisms; and (3) precision medicine also benefits from developments in this area, particularly in the context of the growing field of selective epigenome editing, which can suppress or induce a desired phenotype.
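Matrix factorization is one of the strategies named above for harmonizing modalities measured on the same samples. The following is a minimal Python sketch of generic joint non-negative matrix factorization (NMF). It assumes each modality is a non-negative samples × features matrix and that rows are aligned across modalities. All modalities share one sample-by-factor matrix W, and each modality keeps its own loading matrix H_k. This illustrates the general technique and is not a method described in the review. The function name `joint_nmf` and the synthetic "expression-like" and "methylation-like" inputs are hypothetical.

```python
import numpy as np


def joint_nmf(matrices, n_factors, n_iter=500, seed=0, eps=1e-10):
    """Hypothetical helper: factorize non-negative matrices X_k (samples x features_k)
    as X_k ~ W @ H_k, with one sample-by-factor matrix W shared by all modalities.

    Minimizes sum_k ||X_k - W H_k||_F^2 using Lee-Seung multiplicative updates.
    """
    n_samples = matrices[0].shape[0]
    if any(X.shape[0] != n_samples for X in matrices):
        raise ValueError("all modalities must share the same samples (rows)")
    if any((X < 0).any() for X in matrices):
        raise ValueError("NMF requires non-negative input")

    rng = np.random.default_rng(seed)
    W = rng.random((n_samples, n_factors))
    Hs = [rng.random((n_factors, X.shape[1])) for X in matrices]

    for _ in range(n_iter):
        # Modality-specific loadings: standard NMF update for each H_k with W fixed.
        for k, X in enumerate(matrices):
            Hs[k] *= (W.T @ X) / (W.T @ W @ Hs[k] + eps)
        # Shared factors: pool the evidence from every modality before updating W.
        numer = sum(X @ H.T for X, H in zip(matrices, Hs))
        denom = sum(W @ H @ H.T for H in Hs) + eps
        W *= numer / denom

    return W, Hs


if __name__ == "__main__":
    # Synthetic stand-ins for two assays measured on the same 30 samples.
    rng = np.random.default_rng(1)
    shared = rng.random((30, 3))
    rna_like = shared @ rng.random((3, 50))    # e.g. expression-like features
    meth_like = shared @ rng.random((3, 20))   # e.g. methylation-like features

    W, (H_rna, H_meth) = joint_nmf([rna_like, meth_like], n_factors=3)
    print(W.shape, H_rna.shape, H_meth.shape)  # (30, 3) (3, 50) (3, 20)
```

Because W is shared, each latent factor must explain every modality at once. This is what makes the factors candidates for cross-modal regulatory programs. Each H_k then shows which features (for example genes or CpG sites) load on each factor. In practice, modalities usually need per-modality scaling first so that the matrix with the most features or the largest values does not dominate the shared objective.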

Identifiers

DOI: 10.1093/gigascience/giaa064 PMID: 32543653 PMC: PMC7297091

pii: 5858063

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng


Authors

Tyrone Chen

Sonika Tyagi

Similar articles

Comprehensive comparative analysis and development of molecular markers for Lasianthus species based on complete chloroplast genome sequences.

Selecting optimal software code descriptors-The case of Java.

Exploring blood-brain barrier passage using atomic weighted vector and machine learning.

Fasciola hepatica and Fasciola hybrid form co-existence in yak from Tibet of China: application of rDNA internal transcribed spacer.

MeSH classifications