An intrinsically interpretable neural network architecture for sequence to function learning.


Journal

bioRxiv : the preprint server for biology
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187

Publication information

Publication date:
28 Mar 2023
History:
pubmed: 8 2 2023
medline: 8 2 2023
entrez: 7 2 2023
Status: epublish

Abstract

Sequence-based deep learning approaches have been shown to predict a multitude of functional genomic readouts, including regions of open chromatin and RNA expression of genes. However, a major limitation of current methods is that model interpretation relies on computationally demanding post hoc analyses, and even then, one often cannot explain the internal mechanics of highly parameterized models. Here, we introduce a deep learning architecture called tiSFM (totally interpretable sequence to function model). tiSFM improves upon the performance of standard multi-layer convolutional models while using fewer parameters. Additionally, while tiSFM is itself technically a multi-layer neural network, internal model parameters are intrinsically interpretable in terms of relevant sequence motifs. We analyze published open chromatin measurements across hematopoietic lineage cell types and demonstrate that tiSFM outperforms a state-of-the-art convolutional neural network model custom-tailored to this dataset. We also show that it correctly identifies context-specific activities of transcription factors with known roles in hematopoietic differentiation, including Pax5 and Ebf1 for B-cells, and Rorc for innate lymphoid cells. tiSFM's model parameters have biologically meaningful interpretations, and we show the utility of our approach on a complex task of predicting the change in epigenetic state as a function of developmental transition. The source code, including scripts for the analysis of key findings, can be found at https://github.com/boooooogey/ATAConv, implemented in Python.
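
The idea of parameters that are interpretable in terms of sequence motifs can be illustrated with a toy sketch. This is not the tiSFM implementation and does not use code from the ATAConv repository: the motif matrices, motif names, weights, and function names below are all hypothetical, chosen only to show how a model built on motif scans can expose one readable weight per motif. It assumes the input sequence is at least as long as the widest motif.

```python
# Hypothetical illustration only: motif matrices, names, and weights are
# invented for this sketch and are not taken from tiSFM or ATAConv.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def scan_motif(encoded, pwm):
    """Slide a position weight matrix along the sequence.

    Returns one match score per valid start offset, which is what a
    single convolutional filter computes over a one-hot input.
    """
    width = len(pwm)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            row = encoded[start + offset]
            score += sum(row[k] * pwm[offset][k] for k in range(4))
        scores.append(score)
    return scores


def motif_features(seq, motifs):
    """Max-pool each motif scan into a single named feature."""
    encoded = one_hot(seq)
    return {name: max(scan_motif(encoded, pwm)) for name, pwm in motifs.items()}


def predict(seq, motifs, weights, bias=0.0):
    """Linear readout: each weight is tied to exactly one named motif."""
    feats = motif_features(seq, motifs)
    return bias + sum(weights[name] * feats[name] for name in motifs)


# Toy motifs: each row is one motif position, columns are A, C, G, T.
TOY_MOTIFS = {
    "motif_TGA": [[0, 0, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0]],
    "motif_CC": [[0, 1, 0, 0], [0, 1, 0, 0]],
}
# Toy weights: positive means the motif raises the prediction.
TOY_WEIGHTS = {"motif_TGA": 2.0, "motif_CC": -1.0}

if __name__ == "__main__":
    print(motif_features("AATGACC", TOY_MOTIFS))
    print(predict("AATGACC", TOY_MOTIFS, TOY_WEIGHTS))
```

In this sketch the sign and size of each entry in `TOY_WEIGHTS` can be read directly as the contribution of a named motif, which is the kind of interpretation the abstract contrasts with post hoc analysis of a generic convolutional model.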

Identifiers

pubmed: 36747873
doi: 10.1101/2023.01.25.525572
pmc: PMC9900791

Publication types

Preprint

Languages

eng

Grants

Agency: NHGRI NIH HHS
ID: R01 HG009299
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL127349
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL157879
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL159805
Country: United States

Comments and corrections

Type: UpdateIn

Authors

Ali Tuğrul Balcı (AT)

Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Institution, Pittsburgh, 15213, United States.
Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, 15213, United States.

Mark Maher Ebeid (MM)

Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Institution, Pittsburgh, 15213, United States.
Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, 15213, United States.

Panayiotis V Benos (PV)

Department of Epidemiology, University of Florida, Gainesville, 32610, United States.

Dennis Kostka (D)

Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Institution, Pittsburgh, 15213, United States.
Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, 15213, United States.

Maria Chikina (M)

Joint Carnegie Mellon University-University of Pittsburgh Program in Computational Biology, Institution, Pittsburgh, 15213, United States.
Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, 15213, United States.

MeSH classifications