Convolutional neural Turing machine for speech separation

Jen-Tzung Chien, Kai Wei Tsou

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Long short-term memory (LSTM) has been successfully applied to monaural speech separation. Temporal information is learned through dynamic states that evolve over time and are stored as an internal memory, and the spectro-temporal data matrix of the mixed signal is flattened into input vectors. This approach has two limitations. First, the internal memory in LSTM cannot sufficiently characterize long-term information from different sources. Second, the temporal correlation and the frequency neighborhood structure are smeared in the flattened vectors. To address these limitations, this paper presents a convolutional neural Turing machine (ConvNTM), in which feature maps of the spectro-temporal data are extracted and embedded in an external memory at each time step. ConvNTM aims to preserve the spectro-temporal structure in long sequential signals, which is then exploited to estimate the separated spectral signals. An addressing mechanism continuously calculates the read and write heads, which retrieve and update memory slots, respectively. The memory-augmented source separation is implemented for single-channel speech enhancement. Experimental results show that ConvNTM outperforms LSTM, NTM, and convolutional LSTM for speech enhancement in terms of the short-time objective intelligibility (STOI) measure.
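
The read and write heads mentioned in the abstract can be illustrated with a minimal sketch of the content-based addressing from the original neural Turing machine. This is not the authors' implementation: it assumes each memory slot holds a flattened feature map, omits the convolutional feature extractor and any location-based addressing, and uses hypothetical names and sizes (`content_weights`, `read`, `write`, three slots of four values).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)  # small constant avoids division by zero

def content_weights(memory, key, beta):
    """Softmax over beta-scaled cosine similarities, one weight per slot."""
    scores = [beta * cosine(slot, key) for slot in memory]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weights):
    """Read vector: weighted sum of the memory slots."""
    width = len(memory[0])
    return [sum(w * slot[j] for w, slot in zip(weights, memory))
            for j in range(width)]

def write(memory, weights, erase, add):
    """Return updated memory: erase, then add, scaled by each slot's weight."""
    return [
        [m * (1.0 - w * e) + w * a for m, e, a in zip(slot, erase, add)]
        for w, slot in zip(weights, memory)
    ]

# Toy memory (assumed sizes): 3 slots, each a flattened 4-value feature map.
memory = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
key = [0.0, 0.9, 0.1, 0.0]  # hypothetical key emitted by a controller

w = content_weights(memory, key, beta=5.0)  # soft address over slots
r = read(memory, w)                         # retrieve from memory
memory = write(memory, w, erase=[1.0] * 4, add=key)  # update memory
```

Because the weights are a soft distribution over slots, both operations are differentiable, which is what allows such a memory to be trained jointly with the rest of a network.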

Original language: English
Title of host publication: 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 81-85
Number of pages: 5
ISBN (Electronic): 9781538656273
State: Published - 2 Jul 2018
Event: 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Taipei, Taiwan
Duration: 26 Nov 2018 to 29 Nov 2018

Publication series

Name: 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings

Conference

Conference: 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018
Country: Taiwan
City: Taipei
Period: 26/11/18 to 29/11/18

Keywords

  • Convolutional neural network
  • Monaural speech separation
  • Neural Turing machine
  • Recurrent neural network
