Bidirectionally self-normalizing neural networks.

Neural Networks, Computer; Normal Distribution

Neural networks; Optimization; Training; Vanishing/exploding gradient problem

Journal

Neural networks : the official journal of the International Neural Network Society

ISSN: 1879-2782

Abbreviated title: Neural Netw

Country: United States

NLM ID: 8805018

Publication information

Publication date:
Oct 2023

History:

received: 11 10 2022

revised: 09 08 2023

accepted: 11 08 2023

medline: 23 10 2023

pubmed: 5 9 2023

entrez: 4 9 2023

Status: ppublish

Abstract

The problem of vanishing and exploding gradients has been a long-standing obstacle to the effective training of neural networks. Although various tricks and techniques have been employed to alleviate the problem in practice, satisfactory theories and provable solutions are still lacking. In this paper, we address the problem from the perspective of high-dimensional probability theory. We provide a rigorous result showing that, under mild conditions, the vanishing/exploding gradients problem disappears with high probability if the neural networks have sufficient width. Our main idea is to constrain both forward and backward signal propagation in a nonlinear neural network through a new class of activation functions, namely Gaussian-Poincaré normalized functions, and orthogonal weight matrices. Experiments on both synthetic and real-world data validate our theory and confirm its effectiveness on very deep neural networks when applied in practice.

Identifiers

DOI: 10.1016/j.neunet.2023.08.017 PMID: 37666186

pubmed: 37666186

pii: S0893-6080(23)00436-7

doi: 10.1016/j.neunet.2023.08.017

Publication types

Journal Article

Languages

eng

Pagination

283-291

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Yao Lu (Y)

Stephen Gould (S)

Thalaiyasingam Ajanthan (T)

Similar articles

Unsupervised learning for real-time and continuous gait phase detection.

Detection, classification, and characterization of proximal humerus fractures on plain radiographs.

Editorial: Artificial Intelligence (AI), Digital Image Analysis, and the Future of Cancer Diagnosis and Prognosis.

Deep learning-based automatic image classification of oral cancer cells acquiring chemoresistance in vitro.
