Effective video annotation by mining visual features and speech features

Vincent S. Tseng, Ja Hwung Su, Chih Jen Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

In the area of multimedia processing, a number of studies have been devoted to narrowing the gap between multimedia content and human perception. In fact, multimedia understanding is a difficult and challenging task even with machine-learning techniques. To address this challenge, in this paper we propose an innovative method that employs data mining techniques and a content-based paradigm to conceptualize videos. Our proposed method focuses on: (1) construction of prediction models, namely the speech-association model Model_SA and the visual-statistical model Model_CRM, and (2) fusion of these prediction models to annotate unknown videos automatically. Without additional manual cost, the discovered speech-association patterns reveal implicit relationships among sequential images. In turn, visual features compensate for the inadequacy of speech-association patterns. Empirical evaluations show that, on average, our approach yields more promising results than other methods for annotating videos.
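The two-stage idea in the abstract — build per-modality prediction models, then fuse their outputs to annotate a new video — can be sketched roughly as follows. This is a minimal illustration, not the paper's actual algorithm: the function names, the per-concept score dictionaries, and the weighted-sum fusion rule are all assumptions for illustration (the paper's fusion scheme may differ).

```python
# Hypothetical sketch of late fusion of two annotation models:
# a speech-association model and a visual-statistical model each
# score candidate concepts for a video, and the scores are combined
# by a simple weighted sum (an assumed fusion rule, for illustration).

def fuse_annotations(speech_scores, visual_scores, alpha=0.5):
    """Combine per-concept scores from the two prediction models.

    speech_scores / visual_scores: dict mapping concept -> confidence in [0, 1].
    alpha: weight given to the speech-association model.
    """
    concepts = set(speech_scores) | set(visual_scores)
    fused = {}
    for c in concepts:
        s = speech_scores.get(c, 0.0)   # speech-association evidence
        v = visual_scores.get(c, 0.0)   # visual-statistical evidence
        fused[c] = alpha * s + (1 - alpha) * v
    return fused

def annotate(speech_scores, visual_scores, top_k=3, alpha=0.5):
    """Return the top-k fused concepts as the video's annotation."""
    fused = fuse_annotations(speech_scores, visual_scores, alpha)
    return sorted(fused, key=fused.get, reverse=True)[:top_k]
```

With this shape, a concept supported by only one modality still receives a (down-weighted) score, which reflects the abstract's point that visual features can cover cases where speech-association patterns are inadequate.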

Original language: English
Title of host publication: Proceedings - 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP 2007
Pages: 202-205
Number of pages: 4
DOIs
State: Published - 1 Dec 2007
Event: 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP 2007 - Kaohsiung, Taiwan
Duration: 26 Nov 2007 - 28 Nov 2007

Publication series

Name: Proceedings - 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP 2007
Volume: 1

Conference

Conference: 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP 2007
Country: Taiwan
City: Kaohsiung
Period: 26/11/07 - 28/11/07

