FactorNet: A deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data.


Journal

Methods (San Diego, Calif.)
ISSN: 1095-9130
Abbreviated title: Methods
Country: United States
NLM ID: 9426302

Publication information

Publication date:
15 08 2019
History:
received: 01 11 2018
revised: 05 03 2019
accepted: 20 03 2019
pubmed: 30 3 2019
medline: 18 6 2020
entrez: 30 3 2019
Status: ppublish

Abstract

Due to the large numbers of transcription factors (TFs) and cell types, querying binding profiles of all valid TF/cell type pairs is not experimentally feasible. To address this issue, we developed a convolutional-recurrent neural network model, called FactorNet, to computationally impute the missing binding data. FactorNet trains on binding data from reference cell types to make predictions on testing cell types by leveraging a variety of features, including genomic sequences, genome annotations, gene expression, and signal data, such as DNase I cleavage. FactorNet implements several convenient strategies to reduce runtime and memory consumption. By visualizing the neural network models, we can interpret how the model predicts binding. We also investigate the variables that affect cross-cell type accuracy, and offer suggestions to improve upon this field. Our method ranked among the top teams in the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge, achieving first place on six of the 13 final round evaluation TF/cell type pairs, the most of any competing team. The FactorNet source code is publicly available, allowing users to reproduce our methodology from the ENCODE-DREAM Challenge.
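As a rough illustration of how nucleotide-resolution inputs such as genomic sequence and a DNase I cleavage signal could be combined into one per-base feature matrix, the sketch below one-hot encodes a sequence and appends the signal as a fifth channel. It is not taken from the FactorNet source code: the function names `one_hot` and `build_features`, the channel order, and the example values are all assumptions made for illustration.

```python
# Hypothetical sketch: names, channel order, and values are illustrative
# assumptions, not the FactorNet implementation.
BASES = "ACGT"


def one_hot(seq):
    """Return one 4-element row per base; unknown bases (e.g. N) become all zeros."""
    rows = []
    for base in seq.upper():
        rows.append([1.0 if base == b else 0.0 for b in BASES])
    return rows


def build_features(seq, signal):
    """Append a per-base signal value (e.g. DNase I cleavage) as a fifth channel."""
    if len(seq) != len(signal):
        raise ValueError("sequence and signal must have the same length")
    return [row + [float(s)] for row, s in zip(one_hot(seq), signal)]


# Five bases with a made-up per-base signal track of the same length.
features = build_features("ACGTN", [0, 2, 5, 1, 0])
```

A matrix of this shape (sequence length by number of channels) is the kind of input a convolutional-recurrent model can scan along the sequence axis; further per-base tracks would simply add more columns under the same assumption.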

Identifiers

pubmed: 30922998
pii: S1046-2023(18)30329-3
doi: 10.1016/j.ymeth.2019.03.020
pmc: PMC6708499
mid: NIHMS1525354

Chemical substances

Chromatin 0
Nucleotides 0
Transcription Factors 0

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

40-47

Grants

Agency: NIBIB NIH HHS
ID: T32 EB009418
Country: United States

Copyright information

Copyright © 2019 Elsevier Inc. All rights reserved.

Authors

Daniel Quang (D)

University of California, Department of Computer Science, Irvine, CA 92697, United States. Electronic address: daquang@umich.edu.

Xiaohui Xie (X)

University of California, Department of Computer Science, Irvine, CA 92697, United States. Electronic address: xhx@ics.uci.edu.
