G2Basy: A framework to improve the RNN language model and ease overfitting problem.

Machine Learning; Natural Language Processing; Software / standards

Journal

PloS one

ISSN: 1932-6203

Abbreviated title: PLoS One

Country: United States

NLM ID: 101285081

Publication information

Publication date:
2021

History:

received: 18 09 2020

accepted: 26 03 2021

entrez: 14 4 2021

pubmed: 15 4 2021

medline: 24 9 2021

Status: epublish

Abstract

Recurrent neural networks (RNNs) are an efficient way to train language models, and various RNN architectures have been proposed to improve performance. However, as network scale increases, the overfitting problem becomes more pressing. In this paper, we propose a framework, G2Basy, to speed up the training process and ease the overfitting problem. Instead of using predefined hyperparameters, we devise a gradient increasing and decreasing technique that changes two training parameters, batch size and input dropout, simultaneously by a user-defined step size. Together with a pretrained word embedding initialization procedure and the introduction of different optimizers at different learning rates, our framework speeds up the training process dramatically and improves performance compared with a benchmark model of the same scale. For the word embedding initialization, we propose the concept of "artificial features" to describe the characteristics of the obtained word embeddings. We experiment on two of the most often used corpora, the Penn Treebank and WikiText-2 datasets, and on both we outperform the benchmark results and show potential for further improvement. Furthermore, our framework shows better results with the larger and more complicated WikiText-2 corpus than with the Penn Treebank. Compared with other state-of-the-art results, we achieve comparable results with network scales hundreds of times smaller and within fewer training epochs.
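The abstract does not give the schedule itself, so the following is only a minimal sketch of what changing batch size and input dropout simultaneously by a user-defined step size could look like. The function name `stepped_schedule`, its parameters, the direction of each change (batch size growing, dropout shrinking), and the example values are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple


def stepped_schedule(
    batch_start: int,
    batch_step: int,
    dropout_start: float,
    dropout_step: float,
    stages: int,
) -> List[Tuple[int, float]]:
    """Return one (batch_size, input_dropout) pair per training stage.

    Hypothetical helper: both values move together, each by its own
    user-defined step. Directions and bounds are assumptions.
    """
    schedule = []
    for k in range(stages):
        # Batch size changes linearly and never drops below 1.
        batch_size = max(1, batch_start + k * batch_step)
        # Dropout changes linearly and is clamped to the valid range [0, 1].
        dropout = min(1.0, max(0.0, dropout_start + k * dropout_step))
        schedule.append((batch_size, round(dropout, 4)))
    return schedule


# Example values are illustrative only, not the paper's settings.
schedule = stepped_schedule(
    batch_start=20, batch_step=10,
    dropout_start=0.6, dropout_step=-0.1,
    stages=5,
)
for stage, (batch_size, dropout) in enumerate(schedule):
    print(f"stage {stage}: batch_size={batch_size}, input_dropout={dropout}")
```

In a training loop, each stage's pair would be used to rebuild the data loader and reset the input dropout rate before continuing; how long each stage lasts is left open here because the abstract does not specify it.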

Identifiers

DOI: 10.1371/journal.pone.0249820 PMID: 33852595 PMC: PMC8046238

pubmed: 33852595

doi: 10.1371/journal.pone.0249820

pii: PONE-D-20-28022

pmc: PMC8046238

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Pagination

e0249820

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Lu Yuwen

Shuyu Chen

Xiaohan Yuan
