TY - JOUR
T1 - MomentumRNN: Integrating Momentum into Recurrent Neural Networks
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
AU - Nguyen, Tan M.
AU - Baraniuk, Richard G.
AU - Bertozzi, Andrea L.
AU - Osher, Stanley J.
AU - Wang, Bao
N1 - Funding Information:
This material is also based upon work supported by the NSF under Grant# 2030859 to the Computing Research Association for the CIFellows Project, the NSF Graduate Research Fellowship Program, and the NSF IGERT Training Grant (DGE-1250104).
Funding Information:
This material is based on research sponsored by the NSF grant DMS-1924935 and DMS-1952339, and the DOE grant DE-SC0021142. Other grants that support the work include the NSF grants CCF-1911094, IIS-1838177, and IIS-1730574; the ONR grants N00014-18-12571 and N00014-17-1-2551; the AFOSR grant FA9550-18-1-0478; the DARPA grant G001534-7500; and a Vannevar Bush Faculty Fellowship, ONR grant N00014-18-1-2047.
Publisher Copyright:
© 2020 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2020
Y1 - 2020
AB - Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long short-term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance.
UR - http://www.scopus.com/inward/record.url?scp=85108433584&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108433584&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85108433584
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 6 December 2020 through 12 December 2020
ER -
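
The abstract above describes the momentum recurrent update only in words. The sketch below is an illustration, not the authors' reference implementation: it assumes the momentum cell augments the plain recurrent update h_t = tanh(U x_t + W h_{t-1} + b) with an auxiliary momentum state v_t, where the momentum coefficient mu and step size s are arbitrary placeholder values; consult the paper for the exact formulation and the MomentumLSTM / orthogonal-RNN variants.

import numpy as np

def momentum_rnn_step(x_t, h_prev, v_prev, U, W, b, mu=0.9, s=1.0):
    # Plain RNN step:    h_t = tanh(U @ x_t + W @ h_prev + b)
    # Momentum variant:  v_t = mu * v_prev + s * (U @ x_t)
    #                    h_t = tanh(W @ h_prev + v_t + b)
    # mu (momentum) and s (step size) are illustrative defaults, not values from the paper.
    v_t = mu * v_prev + s * (U @ x_t)
    h_t = np.tanh(W @ h_prev + v_t + b)
    return h_t, v_t

# Toy usage: run a short random sequence through the momentum cell.
rng = np.random.default_rng(0)
d_in, d_hid, seq_len = 4, 8, 5
U = rng.normal(scale=0.1, size=(d_hid, d_in))
W = rng.normal(scale=0.1, size=(d_hid, d_hid))
b = np.zeros(d_hid)
h, v = np.zeros(d_hid), np.zeros(d_hid)
for _ in range(seq_len):
    x_t = rng.normal(size=d_in)
    h, v = momentum_rnn_step(x_t, h, v, U, W, b)
print(h.shape)  # (8,)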