G2Basy: A framework to improve the RNN language model and ease overfitting problem.


Journal

PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2021
History:
received: 2020-09-18
accepted: 2021-03-26
entrez: 2021-04-14
pubmed: 2021-04-15
medline: 2021-09-24
Status: epublish

Abstract

Recurrent neural networks (RNNs) are an efficient way to train language models, and various RNN architectures have been proposed to improve performance. However, as networks grow larger, overfitting becomes a more pressing problem. In this paper, we propose a framework, G2Basy, to speed up the training process and ease the overfitting problem. Instead of using predefined hyperparameters, we devise a gradient increasing and decreasing technique that changes two parameters, the training batch size and the input dropout, simultaneously by a user-defined step size. Together with a pretrained word embedding initialization procedure and the introduction of different optimizers at different learning rates, our framework speeds up the training process dramatically and improves performance compared with a benchmark model of the same scale. For the word embedding initialization, we propose the concept of "artificial features" to describe the characteristics of the obtained word embeddings. We experiment on two of the most commonly used corpora, the Penn Treebank and WikiText-2 datasets; on both, our framework outperforms the benchmark results and shows potential for further improvement. Furthermore, our framework shows better results with the larger and more complicated WikiText-2 corpus than with the Penn Treebank. Compared with other state-of-the-art results, we achieve comparable results with network scales hundreds of times smaller and within fewer training epochs.
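The abstract describes changing the training batch size and the input dropout together by a user-defined step size, but it does not give the schedule itself. The following is a minimal illustrative sketch, not the authors' implementation. It assumes that the batch size grows and the input dropout shrinks by one step per epoch, and that both are clamped at user-chosen bounds. The direction of change, the per-epoch cadence, the bounds, and all names (`EpochSettings`, `stepped_schedule`, and its parameters) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpochSettings:
    """Hyperparameters to use for one training epoch (hypothetical container)."""
    epoch: int
    batch_size: int
    input_dropout: float


def stepped_schedule(n_epochs, batch_start, batch_step, batch_max,
                     dropout_start, dropout_step, dropout_min):
    """Return per-epoch settings, moving both hyperparameters one step per epoch.

    Assumption (not stated in the abstract): batch size increases while
    input dropout decreases, and each value is clamped at its bound.
    """
    settings = []
    for epoch in range(n_epochs):
        batch_size = min(batch_start + epoch * batch_step, batch_max)
        dropout = max(dropout_start - epoch * dropout_step, dropout_min)
        # Round to hide floating-point noise from repeated subtraction.
        settings.append(EpochSettings(epoch, batch_size, round(dropout, 6)))
    return settings


# Example values are arbitrary and chosen only for illustration.
schedule = stepped_schedule(
    n_epochs=5,
    batch_start=20, batch_step=10, batch_max=50,
    dropout_start=0.5, dropout_step=0.1, dropout_min=0.2,
)
for s in schedule:
    print(s)
```

In a training loop, each `EpochSettings` entry would be used to rebuild the data batches and to set the input dropout rate before the epoch starts. How the real framework couples the two changes is described in the full paper, not in this record.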

Identifiers

pubmed: 33852595
doi: 10.1371/journal.pone.0249820
pii: PONE-D-20-28022
pmc: PMC8046238

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e0249820

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Lu Yuwen (L)

School of Big Data & Software Engineering, ChongQing University, ChongQing, China.

Shuyu Chen (S)

School of Big Data & Software Engineering, ChongQing University, ChongQing, China.

Xiaohan Yuan (X)

School of Big Data & Software Engineering, ChongQing University, ChongQing, China.
