Periodic step-size adaptation in second-order gradient descent for single-pass on-line structured learning

Chun-Nan Hsu*, Han-Shen Huang, Yu-Ming Chang, Yuh-Jye Lee

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

It has been established that the second-order stochastic gradient descent (SGD) method can potentially achieve generalization performance as good as the empirical optimum in a single pass through the training examples. However, second-order SGD requires computing the inverse of the Hessian matrix of the loss function, which is prohibitively expensive for structured prediction problems, which usually involve a very high-dimensional feature space. This paper presents a new second-order SGD method called Periodic Step-size Adaptation (PSA). PSA approximates the Jacobian matrix of the mapping function and exploits a linear relation between the Jacobian and the Hessian to approximate the Hessian, an approach that proves simpler and more effective than directly approximating the Hessian in an on-line setting. We tested PSA on a wide variety of models and tasks, including large-scale sequence labeling with conditional random fields and large-scale classification with linear support vector machines and convolutional neural networks. Experimental results show that the single-pass performance of PSA is consistently very close to the empirical optimum.
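
The abstract describes the mechanism only at a high level. The sketch below is a minimal illustration of the stated idea, assuming the "mapping function" is the SGD update map M(w) = w − η∇L(w), whose Jacobian is J = I − ηH, so an on-line estimate of (the diagonal of) J yields a cheap Hessian estimate H ≈ (I − J)/η that can drive periodic per-coordinate step-size adaptation. The function name psa_sgd, its parameters, and the diagonal secant estimate are illustrative assumptions, not the update rules given in the paper.

```python
import numpy as np


def psa_sgd(grad, w, eta0=0.1, period=20, n_steps=1000, gamma=0.9, eps=1e-8):
    """Single-pass SGD with periodic per-coordinate step-size adaptation.

    Illustrative sketch only, not the paper's exact update rules. It uses
    the relation suggested by the abstract: the SGD update defines a
    mapping M(w) = w - eta * grad(w) with Jacobian J = I - eta * H, so an
    estimate of the diagonal of J gives a cheap Hessian estimate
    H ~ (I - J) / eta for rescaling the step sizes.

    grad(w, t) must return a stochastic gradient for example t.
    """
    eta = np.full_like(w, eta0, dtype=float)   # per-coordinate step sizes
    dw_prev = None
    for t in range(n_steps):
        dw = -eta * grad(w, t)                 # plain SGD step
        w = w + dw
        if dw_prev is not None and (t + 1) % period == 0:
            # Secant estimate of the diagonal of the update-map Jacobian:
            # J_ii ~ dw_i(t) / dw_i(t-1), hence H_ii ~ (1 - J_ii) / eta_i.
            denom = np.where(np.abs(dw_prev) > eps, dw_prev, eps)
            h_diag = np.clip((1.0 - dw / denom) / eta, eps, None)
            # Move eta smoothly toward the Newton-like scaling 1 / H_ii,
            # capped to avoid blow-up from noisy curvature estimates.
            eta = np.minimum(gamma * eta + (1.0 - gamma) / h_diag, 10.0 * eta0)
        dw_prev = dw
    return w


# Toy usage: noisy gradients of a quadratic with per-coordinate curvature.
rng = np.random.default_rng(0)
curvature = np.array([1.0, 10.0, 100.0])
noisy_grad = lambda w, t: curvature * w + 0.01 * rng.standard_normal(3)
w_hat = psa_sgd(noisy_grad, np.ones(3), eta0=0.005, period=10, n_steps=2000)
```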

Original language: English
Pages (from-to): 195-224
Number of pages: 30
Journal: Machine Learning
Volume: 77
Issue number: 2-3
DOIs
State: Published - 1 Dec 2009

Keywords

  • Conditional random fields
  • Convolutional neural networks
  • On-line learning
  • Sequence labeling
  • Stochastic gradient descent
