TY - JOUR
T1 - MomentumRNN: Integrating Momentum into Recurrent Neural Networks
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
AU - Nguyen, Tan M.
AU - Baraniuk, Richard G.
AU - Bertozzi, Andrea L.
AU - Osher, Stanley J.
AU - Wang, Bao
N1 - Funding Information:
This material is also based upon work supported by the NSF under Grant# 2030859 to the Computing Research Association for the CIFellows Project, the NSF Graduate Research Fellowship Program, and the NSF IGERT Training Grant (DGE-1250104).
Funding Information:
This material is based on research sponsored by the NSF grant DMS-1924935 and DMS-1952339, and the DOE grant DE-SC0021142. Other grants that support the work include the NSF grants CCF-1911094, IIS-1838177, and IIS-1730574; the ONR grants N00014-18-12571 and N00014-17-1-2551; the AFOSR grant FA9550-18-1-0478; the DARPA grant G001534-7500; and a Vannevar Bush Faculty Fellowship, ONR grant N00014-18-1-2047.
Publisher Copyright:
© 2020 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2020
Y1 - 2020
AB - Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long short-term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance.
UR - http://www.scopus.com/inward/record.url?scp=85108433584&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108433584&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85108433584
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 6 December 2020 through 12 December 2020
ER -
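
The abstract above describes the momentum recurrent update only in words. The sketch below is an illustration, not the authors' reference implementation: it assumes the momentum cell augments the plain recurrent update h_t = tanh(U x_t + W h_{t-1} + b) with an auxiliary momentum state v_t, where the momentum coefficient mu and step size s are arbitrary placeholder values; consult the paper for the exact formulation and the MomentumLSTM / orthogonal-RNN variants.

import numpy as np

def momentum_rnn_step(x_t, h_prev, v_prev, U, W, b, mu=0.9, s=1.0):
    # Plain RNN step:    h_t = tanh(U @ x_t + W @ h_prev + b)
    # Momentum variant:  v_t = mu * v_prev + s * (U @ x_t)
    #                    h_t = tanh(W @ h_prev + v_t + b)
    # mu (momentum) and s (step size) are illustrative defaults, not values from the paper.
    v_t = mu * v_prev + s * (U @ x_t)
    h_t = np.tanh(W @ h_prev + v_t + b)
    return h_t, v_t

# Toy usage: run a short random sequence through the momentum cell.
rng = np.random.default_rng(0)
d_in, d_hid, seq_len = 4, 8, 5
U = rng.normal(scale=0.1, size=(d_hid, d_in))
W = rng.normal(scale=0.1, size=(d_hid, d_hid))
b = np.zeros(d_hid)
h, v = np.zeros(d_hid), np.zeros(d_hid)
for _ in range(seq_len):
    x_t = rng.normal(size=d_in)
    h, v = momentum_rnn_step(x_t, h, v, U, W, b)
print(h.shape)  # (8,)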