Bidirectionally self-normalizing neural networks.
Neural networks
Optimization
Training
Vanishing/exploding gradient problem
Journal
Neural networks : the official journal of the International Neural Network Society
ISSN: 1879-2782
Abbreviated title: Neural Netw
Country: United States
NLM ID: 8805018
Publication information
Publication date:
Oct 2023
History:
received: 2022-10-11
revised: 2023-08-09
accepted: 2023-08-11
medline: 2023-10-23
pubmed: 2023-09-05
entrez: 2023-09-04
Status:
ppublish
Abstract
The problem of vanishing and exploding gradients has been a long-standing obstacle that hinders the effective training of neural networks. Although various tricks and techniques have been employed to alleviate the problem in practice, satisfactory theories and provable solutions are still lacking. In this paper, we address the problem from the perspective of high-dimensional probability theory. We provide a rigorous result that shows, under mild conditions, how the vanishing/exploding gradient problem disappears with high probability if the neural networks have sufficient width. Our main idea is to constrain both forward and backward signal propagation in a nonlinear neural network through a new class of activation functions, namely Gaussian-Poincaré normalized functions, and orthogonal weight matrices. Experiments on both synthetic and real-world data validate our theory and confirm its effectiveness on very deep neural networks when applied in practice.
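The abstract names two ingredients, orthogonal weight matrices and Gaussian-Poincaré normalized activations, without giving formulas. The following is a minimal NumPy sketch of the first ingredient only, plus a helper for inspecting an activation under a standard Gaussian input. It is not the authors' implementation. The helper names (`random_orthogonal`, `gaussian_moments`) are hypothetical, and the reading that such an activation φ satisfies E[φ(z)²] = 1 and E[φ'(z)²] = 1 for z ~ N(0, 1) is an assumption not stated in this record.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs using the diagonal of R; the result is still orthogonal.
    return q * np.sign(np.diag(r))


def gaussian_moments(phi, dphi, n_samples=1_000_000, seed=0):
    """Monte Carlo estimates of E[phi(z)^2] and E[phi'(z)^2] for z ~ N(0, 1).

    ASSUMPTION: both values being close to 1 is taken here as the
    normalization condition; the record itself does not define it.
    """
    z = np.random.default_rng(seed).standard_normal(n_samples)
    return float(np.mean(phi(z) ** 2)), float(np.mean(dphi(z) ** 2))


rng = np.random.default_rng(0)
width, depth = 128, 50  # illustrative sizes, not taken from the paper

# Forward pass through a deep *linear* stack of orthogonal layers:
# each layer preserves the Euclidean norm, so the signal neither
# vanishes nor explodes with depth.
x = rng.standard_normal(width)
h = x.copy()
for _ in range(depth):
    h = random_orthogonal(width, rng) @ h
norm_ratio = np.linalg.norm(h) / np.linalg.norm(x)

# Moments of the identity activation, the simplest function for which
# both estimates are 1 (up to Monte Carlo error).
m2, d2 = gaussian_moments(lambda z: z, lambda z: np.ones_like(z))
```

Since the transpose of an orthogonal matrix is also orthogonal, the same norm preservation holds for gradients propagated backward through the linear stack. The nonlinear case is what the paper's result addresses and is not reproduced here.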
Identifiers
pubmed: 37666186
pii: S0893-6080(23)00436-7
doi: 10.1016/j.neunet.2023.08.017
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
283-291
Copyright information
Copyright © 2023 Elsevier Ltd. All rights reserved.
Conflict of interest statement
Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.