Variance reduction for optimization in speech recognition

Jen-Tzung Chien, Pei Wen Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A deep neural network (DNN) is trained by mini-batch optimization based on the stochastic gradient descent algorithm. Such stochastic learning suffers from instability in parameter updating and may easily become trapped in a local optimum. This study addresses the stability of stochastic learning by reducing the variance of the gradients in the optimization procedure. We upgrade the optimization from stochastic dual coordinate ascent (SDCA) to accelerated SDCA without duality (dual-free ASDCA). This optimization incorporates the momentum method to accelerate the update rule, so that the variance of the gradients is reduced. With dual-free ASDCA, the optimization of the dual function of SDCA, which requires a convex loss, is replaced by directly optimizing the primal function with respect to pseudo-dual parameters. The non-convex optimization in DNN training can thus be handled and accelerated. Experimental results illustrate the reduction of training loss, gradient variance and word error rate achieved by the proposed optimization for DNN speech recognition.
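As a rough illustration of the variance-reduction idea, the sketch below runs dual-free SDCA with a simple momentum term on a toy ridge-regression problem. The objective, step size and momentum coefficient are illustrative assumptions rather than the paper's experimental setup, and the paper applies the method to non-convex DNN training rather than to this convex toy problem.

```python
# Minimal sketch of dual-free SDCA with a momentum term, illustrating the
# variance-reduction idea summarized in the abstract. This is not the
# authors' implementation: the ridge-regression objective, step size eta and
# momentum coefficient beta are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: min_w (1/n) sum_i 0.5*(x_i.w - y_i)^2 + (lam/2)*||w||^2
n, d, lam = 200, 10, 0.1
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def grad_i(w, i):
    """Gradient of the i-th unregularized loss term 0.5*(x_i.w - y_i)^2."""
    return (X[i] @ w - y[i]) * X[i]

eta, beta = 0.001, 0.9             # step size and momentum coefficient (assumed)
alpha = np.zeros((n, d))           # pseudo-dual variables, one per training example
w = alpha.sum(axis=0) / (lam * n)  # primal iterate is tied to the pseudo-duals
v = np.zeros(d)                    # momentum buffer

for t in range(100_000):
    i = rng.integers(n)
    # Corrected stochastic gradient: as alpha_i approaches -grad_i(w*), this
    # term shrinks toward zero, which is the variance-reduction effect of SDCA.
    g = grad_i(w, i) + alpha[i]
    v = beta * v + g               # momentum accelerates the update rule
    alpha[i] -= eta * lam * n * v  # pseudo-dual update
    w -= eta * v                   # keeps w = sum_j alpha_j / (lam * n) exact

# Compare against the closed-form ridge solution of the toy problem.
w_star = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
print("distance to optimum:", np.linalg.norm(w - w_star))
```

The same momentum buffer drives both the pseudo-dual and the primal updates so that the coupling w = (1/(λn)) Σ_j α_j is preserved; this is one simple way to combine the two ingredients named in the abstract, not necessarily the exact update rule of the paper.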

Original language: English
Title of host publication: 2016 IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2016 - Proceedings
Editors: Kostas Diamantaras, Aurelio Uncini, Francesco A. N. Palmieri, Jan Larsen
Publisher: IEEE Computer Society
ISBN (Electronic): 9781509007462
DOIs
State: Published - 8 Nov 2016
Event: 26th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2016 - Vietri sul Mare, Salerno, Italy
Duration: 13 Sep 2016 – 16 Sep 2016

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
Volume: 2016-November
ISSN (Print): 2161-0363
ISSN (Electronic): 2161-0371

Conference

Conference: 26th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2016
Country: Italy
City: Vietri sul Mare, Salerno
Period: 13/09/16 – 16/09/16

Keywords

  • deep neural network
  • optimization algorithm
  • speech recognition
  • variance reduction
