The shared Dirichlet priors for Bayesian language modeling

Jen-Tzung Chien*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

We present a new fully Bayesian approach to language modeling based on shared Dirichlet priors. The model is constructed by introducing a Dirichlet distribution to represent the uncertainty of n-gram parameters in the training phase as well as at test time. Given a set of training data, the marginal likelihood over n-gram probabilities takes the form of linearly interpolated n-grams. The hyperparameters of the Dirichlet distributions are interpreted as prior backoff information that is shared across a group of n-gram histories. We estimate the shared hyperparameters by maximizing the marginal distribution of the n-grams given the training data. The resulting Bayesian language model is closely connected to smoothed language models. Experimental results show the superiority of the proposed method over other methods in terms of perplexity and word error rate.
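
For intuition, the sketch below (plain Python, not the paper's implementation) writes out the Dirichlet posterior predictive for one n-gram history and checks that it equals the linearly interpolated form mentioned in the abstract; the counts and hyperparameter values are hypothetical toy data.

    from collections import Counter

    def dirichlet_predictive(counts, alpha):
        """Posterior predictive p(w | h) for one history h when the n-gram
        parameters carry a Dirichlet(alpha) prior:
        (c(h, w) + alpha_w) / (c(h) + sum(alpha))."""
        c_h = sum(counts.values())      # total count c(h) for this history
        a_sum = sum(alpha.values())     # total prior pseudo-count mass
        return {w: (counts[w] + alpha[w]) / (c_h + a_sum) for w in alpha}

    # Hypothetical toy counts for one history and a shared prior over its group.
    counts = Counter({"cat": 3, "dog": 1})
    alpha = {"cat": 0.5, "dog": 0.5, "fish": 1.0}
    pred = dirichlet_predictive(counts, alpha)

    # The same predictive, rewritten as a linear interpolation between the
    # maximum-likelihood n-gram and the prior backoff alpha_w / sum(alpha).
    c_h, a_sum = sum(counts.values()), sum(alpha.values())
    lam = c_h / (c_h + a_sum)           # interpolation weight
    interp = {w: lam * (counts[w] / c_h) + (1 - lam) * (alpha[w] / a_sum)
              for w in alpha}
    assert all(abs(pred[w] - interp[w]) < 1e-12 for w in alpha)

In the paper the hyperparameters are not fixed by hand as above; they are estimated by maximizing the marginal likelihood of the training data, with one shared set per group of n-gram histories.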

Original language: English
Title of host publication: 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2081-2085
Number of pages: 5
ISBN (Electronic): 9781467369978
DOIs
State: Published - 4 Aug 2015
Event: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Brisbane, Australia
Duration: 19 Apr 2015 – 24 Apr 2015

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2015-August
ISSN (Print): 1520-6149

Conference

Conference: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
Country: Australia
City: Brisbane
Period: 19/04/15 – 24/04/15

Keywords

  • Bayesian learning
  • language model
  • model smoothing
  • optimal hyperparameter

