A two-stage singing voice separation algorithm using spectro-temporal modulation features

Frederick Z. Yen, Mao Chang Huang, Tai-Shih Chi

Research output: Contribution to journal › Conference article › peer-review


Abstract

A two-stage singing voice separation algorithm using spectro-temporal modulation features is proposed in this paper. First, music clips are transformed into auditory spectrograms and the spectro-temporal modulation contents of all time-frequency (T-F) units of the auditory spectrograms are extracted using an auditory model. Then, the T-F units are sequentially clustered into percussive, harmonic, and vocal units by the expectation-maximization (EM) algorithm through the proposed two-stage procedure. Lastly, the singing voice is synthesized from the clustered vocal T-F units via time-frequency masking. The algorithm was evaluated on the MIR-1K dataset and demonstrated better separation results than our previously proposed one-stage algorithm.
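The abstract outlines a cluster-then-mask pipeline. Below is a minimal Python sketch of that general idea, not the authors' implementation: it substitutes an STFT for the auditory spectrogram, uses placeholder log-magnitude features in place of spectro-temporal modulation features, fits an EM-trained Gaussian mixture to cluster T-F units, and resynthesizes one cluster through a binary time-frequency mask. The function name and the heuristic for picking the "vocal" cluster are illustrative assumptions.

```python
# Sketch only: EM clustering of T-F units followed by binary T-F masking.
# The paper's auditory model and spectro-temporal modulation features are
# replaced by simple placeholders here.
import numpy as np
from scipy.signal import stft, istft
from sklearn.mixture import GaussianMixture

def separate_vocal_sketch(x, fs, n_clusters=3):
    # STFT magnitude stands in for the auditory spectrogram.
    f, t, X = stft(x, fs, nperseg=1024)
    mag = np.abs(X)

    # Placeholder per-unit feature; the paper extracts spectro-temporal
    # modulation content for each T-F unit instead.
    feats = np.log1p(mag).reshape(-1, 1)

    # EM clustering of T-F units (percussive / harmonic / vocal in the paper).
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(feats)
    labels = gmm.predict(feats).reshape(mag.shape)

    # Assumption: treat the highest-energy cluster as the vocal one.
    vocal = np.argmax([mag[labels == k].mean() for k in range(n_clusters)])
    mask = (labels == vocal).astype(float)

    # Time-frequency masking and resynthesis of the estimated singing voice.
    _, y = istft(X * mask, fs, nperseg=1024)
    return y
```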

Original language: English
Pages (from-to): 3321-3324
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
State: Published - 1 Jan 2015
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: 6 Sep 2015 – 10 Sep 2015

Keywords

  • Auditory scene analysis
  • Singing voice separation
  • Spectro-temporal modulation
