Publication:

On Evaluating the Generalization of LSTM Models in Formal Languages

Date

2019-01

Published Version

Publisher

Society for Computation in Linguistics
The Harvard community has made this article openly available.

Citation

Suzgun, Mirac; Belinkov, Yonatan; and Shieber, Stuart M. (2019) "On Evaluating the Generalization of LSTM Models in Formal Languages," Proceedings of the Society for Computation in Linguistics: Vol. 2, Article 29. https://doi.org/10.7275/s02b-4d91

Abstract

Recurrent Neural Networks (RNNs) are theoretically Turing-complete and have established themselves as a dominant model for language processing. Yet uncertainty remains regarding their language learning capabilities. In this paper, we empirically evaluate the ability of Long Short-Term Memory (LSTM) networks, a popular extension of simple RNNs, to inductively learn simple formal languages, in particular $a^n b^n$, $a^n b^n c^n$, and $a^n b^n c^n d^n$. We investigate the influence of various aspects of learning, such as training data regimes and model capacity, on generalization to unobserved samples. We find striking differences in model performance under different training settings and highlight the need for careful analysis and assessment when making claims about the learning capabilities of neural network models.
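
For readers unfamiliar with the languages named in the abstract, the following is a minimal sketch of how strings of $a^n b^n$, $a^n b^n c^n$, and $a^n b^n c^n d^n$ can be generated and recognized, and of what "unobserved samples" can mean when test strings are longer than training strings. It is an illustration only, not the authors' experimental setup: the helper names `make_string` and `is_member`, the restriction to n >= 1, and the length ranges 1 to 50 and 51 to 100 are all assumptions made for this example.

```python
def make_string(n, alphabet="ab"):
    """Return n copies of each symbol in order, e.g. n=2, "abc" -> "aabbcc"."""
    return "".join(symbol * n for symbol in alphabet)


def is_member(s, alphabet="ab"):
    """Check whether s has the form x1^n x2^n ... xk^n for some n >= 1.

    Excluding the empty string (n = 0) is an assumption of this sketch.
    """
    k = len(alphabet)
    if len(s) == 0 or len(s) % k != 0:
        return False
    n = len(s) // k
    return s == make_string(n, alphabet)


# Hypothetical split: train on short strings, test on strictly longer ones,
# so every test string is unobserved during training.
train = [make_string(n, "abc") for n in range(1, 51)]
test = [make_string(n, "abc") for n in range(51, 101)]

if __name__ == "__main__":
    print(make_string(3, "ab"))            # a^3 b^3
    print(is_member("aabbccdd", "abcd"))   # member of a^n b^n c^n d^n
    print(is_member("abab", "ab"))         # right counts, wrong order
    print(len(train), len(test))
```

Splitting by length in this way is one simple training data regime; other regimes, such as sampling n from a distribution, would change which strings count as unobserved.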

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
