The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization

Daniel LeJeune; Hamid Javadi; Richard G. Baraniuk

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Among the most successful methods for sparsifying deep (neural) networks are those that adaptively mask the network weights throughout training. By examining this masking, or dropout, in the linear case, we uncover a duality between such adaptive methods and regularization through the so-called “η-trick” that casts both as iteratively reweighted optimizations. We show that any dropout strategy that adapts to the weights in a monotonic way corresponds to an effective subquadratic regularization penalty, and therefore leads to sparse solutions. We obtain the effective penalties for several popular sparsification strategies, which are remarkably similar to classical penalties commonly used in sparse optimization. Considering variational dropout as a case study, we demonstrate similar empirical behavior between the adaptive dropout method and classical methods on the task of deep network sparsification, validating our theory.
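As a brief illustration of the "η-trick" named in the abstract, the block below gives the standard variational identity for the absolute value and shows how it turns an ℓ1-penalized problem into an iteratively reweighted ridge problem. This is a minimal sketch of the well-known identity, not the paper's own derivation; the loss $L$, the weight $\lambda$, and the index $j$ are notation assumed here for illustration.

```latex
% Standard eta-trick: for any real w,
\[
  |w| \;=\; \min_{\eta > 0} \; \tfrac{1}{2}\left( \frac{w^2}{\eta} + \eta \right),
  \qquad \text{attained at } \eta = |w| \text{ when } w \neq 0 .
\]
% Applied coordinate-wise to an l1-penalized objective
% (L and \lambda are assumed notation, not taken from the record):
\[
  \min_{w} \; L(w) + \lambda \sum_j |w_j|
  \;=\;
  \min_{w,\; \eta > 0} \; L(w) + \frac{\lambda}{2} \sum_j \left( \frac{w_j^2}{\eta_j} + \eta_j \right).
\]
% Alternating minimization: with eta fixed, the problem in w is a
% weighted ridge regression; with w fixed, the update is eta_j = |w_j|.
```

Alternating between the two updates is what makes the scheme "iteratively reweighted": each pass solves a quadratic problem whose per-coordinate weights $1/\eta_j$ depend on the previous weights.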

Original language	English (US)
Title of host publication	Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Editors	Marc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan
Publisher	Neural information processing systems foundation
Pages	23401-23412
Number of pages	12
ISBN (Electronic)	9781713845393
State	Published - 2021
Event	35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online; Duration: Dec 6 2021 → Dec 14 2021

Publication series

Name	Advances in Neural Information Processing Systems
Volume	28
ISSN (Print)	1049-5258

Conference

Conference	35th Conference on Neural Information Processing Systems, NeurIPS 2021
City	Virtual, Online
Period	12/6/21 → 12/14/21

ASJC Scopus subject areas

Computer Networks and Communications
Information Systems
Signal Processing

Cite this

LeJeune, D., Javadi, H., & Baraniuk, R. G. (2021). The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization. In MA. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, & J. Wortman Vaughan (Eds.), Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021 (pp. 23401-23412). (Advances in Neural Information Processing Systems; Vol. 28). Neural information processing systems foundation.

The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization. / LeJeune, Daniel; Javadi, Hamid; Baraniuk, Richard G.
Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021. ed. / Marc'Aurelio Ranzato; Alina Beygelzimer; Yann Dauphin; Percy S. Liang; Jenn Wortman Vaughan. Neural information processing systems foundation, 2021. p. 23401-23412 (Advances in Neural Information Processing Systems; Vol. 28).

LeJeune, D, Javadi, H & Baraniuk, RG 2021, The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization. in MA Ranzato, A Beygelzimer, Y Dauphin, PS Liang & J Wortman Vaughan (eds), Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021. Advances in Neural Information Processing Systems, vol. 28, Neural information processing systems foundation, pp. 23401-23412, 35th Conference on Neural Information Processing Systems, NeurIPS 2021, Virtual, Online, 12/6/21.

LeJeune D, Javadi H, Baraniuk RG. The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization. In Ranzato MA, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, editors, Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021. Neural information processing systems foundation. 2021. p. 23401-23412. (Advances in Neural Information Processing Systems).

LeJeune, Daniel ; Javadi, Hamid ; Baraniuk, Richard G. / The Flip Side of the Reweighted Coin : Duality of Adaptive Dropout and Regularization. Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021. editor / Marc'Aurelio Ranzato ; Alina Beygelzimer ; Yann Dauphin ; Percy S. Liang ; Jenn Wortman Vaughan. Neural information processing systems foundation, 2021. pp. 23401-23412 (Advances in Neural Information Processing Systems).

@inproceedings{185d440cb77f4dcf98e35a8a9fa4b327,
  title = "The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization",
  abstract = "Among the most successful methods for sparsifying deep (neural) networks are those that adaptively mask the network weights throughout training. By examining this masking, or dropout, in the linear case, we uncover a duality between such adaptive methods and regularization through the so-called “η-trick” that casts both as iteratively reweighted optimizations. We show that any dropout strategy that adapts to the weights in a monotonic way corresponds to an effective subquadratic regularization penalty, and therefore leads to sparse solutions. We obtain the effective penalties for several popular sparsification strategies, which are remarkably similar to classical penalties commonly used in sparse optimization. Considering variational dropout as a case study, we demonstrate similar empirical behavior between the adaptive dropout method and classical methods on the task of deep network sparsification, validating our theory.",
  author = "Daniel LeJeune and Hamid Javadi and Baraniuk, {Richard G.}",
  note = "Funding Information: This work was supported by NSF grants CCF-1911094, IIS-1838177, and IIS-1730574; ONR grants N00014-18-12571, N00014-20-1-2534, and MURI N00014-20-1-2787; AFOSR grant FA9550-18-1-0478; and a Vannevar Bush Faculty Fellowship, ONR grant N00014-18-1-2047. Publisher Copyright: {\textcopyright} 2021 Neural information processing systems foundation. All rights reserved.; 35th Conference on Neural Information Processing Systems, NeurIPS 2021 ; Conference date: 06-12-2021 Through 14-12-2021",
  year = "2021",
  language = "English (US)",
  series = "Advances in Neural Information Processing Systems",
  publisher = "Neural information processing systems foundation",
  pages = "23401--23412",
  editor = "Marc'Aurelio Ranzato and Alina Beygelzimer and Yann Dauphin and Liang, {Percy S.} and {Wortman Vaughan}, Jenn",
  booktitle = "Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021",
}

TY  - GEN
T1  - The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization
T2  - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
AU  - LeJeune, Daniel
AU  - Javadi, Hamid
AU  - Baraniuk, Richard G.
N1  - Funding Information: This work was supported by NSF grants CCF-1911094, IIS-1838177, and IIS-1730574; ONR grants N00014-18-12571, N00014-20-1-2534, and MURI N00014-20-1-2787; AFOSR grant FA9550-18-1-0478; and a Vannevar Bush Faculty Fellowship, ONR grant N00014-18-1-2047. Publisher Copyright: © 2021 Neural information processing systems foundation. All rights reserved.
PY  - 2021
Y1  - 2021
AB  - Among the most successful methods for sparsifying deep (neural) networks are those that adaptively mask the network weights throughout training. By examining this masking, or dropout, in the linear case, we uncover a duality between such adaptive methods and regularization through the so-called “η-trick” that casts both as iteratively reweighted optimizations. We show that any dropout strategy that adapts to the weights in a monotonic way corresponds to an effective subquadratic regularization penalty, and therefore leads to sparse solutions. We obtain the effective penalties for several popular sparsification strategies, which are remarkably similar to classical penalties commonly used in sparse optimization. Considering variational dropout as a case study, we demonstrate similar empirical behavior between the adaptive dropout method and classical methods on the task of deep network sparsification, validating our theory.
UR  - http://www.scopus.com/inward/record.url?scp=85132016166&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85132016166&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:85132016166
T3  - Advances in Neural Information Processing Systems
SP  - 23401
EP  - 23412
BT  - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2  - Ranzato, Marc'Aurelio
A2  - Beygelzimer, Alina
A2  - Dauphin, Yann
A2  - Liang, Percy S.
A2  - Wortman Vaughan, Jenn
PB  - Neural information processing systems foundation
Y2  - 6 December 2021 through 14 December 2021
ER  - 
