Mini-batch optimization enables training of ODE models on large-scale datasets.
Journal
Nature Communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
10 01 2022
History:
received: 30 11 2019
accepted: 11 11 2021
entrez: 11 1 2022
pubmed: 12 1 2022
medline: 28 1 2022
Status:
epublish
Abstract
Quantitative dynamic models are widely used to study cellular signal processing. A critical step in modelling is the estimation of unknown model parameters from experimental data. As model sizes and datasets are steadily growing, established parameter optimization approaches for mechanistic models become computationally extremely challenging. Mini-batch optimization methods, as employed in deep learning, have better scaling properties. In this work, we adapt, apply, and benchmark mini-batch optimization for ordinary differential equation (ODE) models, thereby establishing a direct link between dynamic modelling and machine learning. On our main application example, a large-scale model of cancer signaling, we benchmark mini-batch optimization against established methods, achieving better optimization results and reducing computation by more than an order of magnitude. We expect that our work will serve as a first step towards mini-batch optimization tailored to ODE models and enable modelling of even larger and more complex systems than what is currently possible.
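The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy one-parameter decay model dx/dt = -k*x, noise-free synthetic measurements, a hand-written RK4 integrator with forward sensitivities, and plain stochastic gradient descent. All names (`simulate`, `batch_gradient`, `fit_minibatch`, `K_TRUE`) and all numerical settings are hypothetical choices made for illustration only.

```python
import math
import random

K_TRUE = 0.7  # hypothetical "true" decay rate used to generate toy data


def simulate(k, x0, t_end, n_steps=50):
    """Integrate dx/dt = -k*x with RK4, together with the forward
    sensitivity s = dx/dk, which obeys ds/dt = -x - k*s with s(0) = 0."""
    def rhs(x, s):
        return -k * x, -x - k * s

    h = t_end / n_steps
    x, s = x0, 0.0
    for _ in range(n_steps):
        d1 = rhs(x, s)
        d2 = rhs(x + 0.5 * h * d1[0], s + 0.5 * h * d1[1])
        d3 = rhs(x + 0.5 * h * d2[0], s + 0.5 * h * d2[1])
        d4 = rhs(x + h * d3[0], s + h * d3[1])
        x += h / 6.0 * (d1[0] + 2 * d2[0] + 2 * d3[0] + d4[0])
        s += h / 6.0 * (d1[1] + 2 * d2[1] + 2 * d3[1] + d4[1])
    return x, s


def batch_gradient(k, batch):
    """Mean gradient of the squared residual over one mini-batch of
    experimental conditions (x0, t_end, measurement y)."""
    grad = 0.0
    for x0, t_end, y in batch:
        x, s = simulate(k, x0, t_end)
        grad += 2.0 * (x - y) * s
    return grad / len(batch)


def fit_minibatch(data, k_init=0.2, lr=0.05, epochs=200, batch_size=5, seed=0):
    """Plain mini-batch gradient descent: each update simulates only the
    conditions in the current batch instead of the full dataset."""
    rng = random.Random(seed)
    data = list(data)
    k = k_init
    for _ in range(epochs):
        rng.shuffle(data)  # new random partition into batches every epoch
        for i in range(0, len(data), batch_size):
            k -= lr * batch_gradient(k, data[i:i + batch_size])
    return k


# Toy dataset: 20 conditions with different initial values and readout times.
rng = random.Random(1)
conditions = [(rng.uniform(1.0, 2.0), rng.uniform(0.5, 2.0)) for _ in range(20)]
data = [(x0, t, x0 * math.exp(-K_TRUE * t)) for x0, t in conditions]

k_fit = fit_minibatch(data)
print(f"estimated k = {k_fit:.4f}")
```

In this sketch each parameter update costs five ODE simulations rather than twenty, which is the scaling property the abstract refers to. Realistic models would need a stiff ODE solver, many parameters, and a tuned optimizer.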
Identifiers
pubmed: 35013141
doi: 10.1038/s41467-021-27374-6
pii: 10.1038/s41467-021-27374-6
pmc: PMC8748893
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
34
Copyright information
© 2022. The Author(s).