Multi-View and Multi-Modal Action Recognition with Learned Fusion

Sandy Ardianto, Hsueh-Ming Hang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

In this paper, we study a multi-modal, multi-view action recognition system based on deep-learning techniques. We extend the Temporal Segment Network with an additional data-fusion stage that combines information from different sources. We use multiple modalities, such as RGB, depth, and infrared data, to detect predefined human actions, and we test various combinations of these data sources to examine their impact on the final detection accuracy. We designed three information-fusion methods to generate the final decision; the most interesting is our Learned Fusion Net. It turns out that the Learned Fusion structure gives the best results but requires more training.
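The abstract does not specify the internals of the three fusion stages, so purely as an illustration, the sketch below shows late fusion of per-modality classification scores in plain Python. The fixed weight vector stands in for what a learned fusion network would produce; all function names, scores, and weights here are hypothetical and not taken from the paper.

```python
import math

def softmax(scores):
    """Convert raw class scores to a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def average_fusion(modality_scores):
    """Late fusion by averaging per-modality class probabilities."""
    probs = [softmax(s) for s in modality_scores]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]

def weighted_fusion(modality_scores, weights):
    """Late fusion with per-modality weights (a stand-in for weights
    a learned fusion module would produce; here they are fixed)."""
    probs = [softmax(s) for s in modality_scores]
    n_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs))
            for c in range(n_classes)]

# Toy example: scores for 3 action classes from RGB, depth, and infrared streams.
rgb = [2.0, 0.5, 0.1]
depth = [0.3, 1.5, 0.2]
ir = [1.8, 0.4, 0.3]

avg = average_fusion([rgb, depth, ir])
wgt = weighted_fusion([rgb, depth, ir], weights=[0.5, 0.2, 0.3])
predicted = max(range(3), key=lambda c: avg[c])  # index of the fused top class
```

A learned fusion stage would replace the fixed `weights` with parameters trained jointly with (or on top of) the per-modality networks, which matches the abstract's observation that it performs best but requires more training.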

Original language: English
Title of host publication: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1601-1604
Number of pages: 4
ISBN (Electronic): 9789881476852
DOI: 10.23919/APSIPA.2018.8659539
State: Published - 4 Mar 2019
Event: 10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Honolulu, United States
Duration: 12 Nov 2018 - 15 Nov 2018

Publication series

Name: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings

Conference

Conference: 10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018
Country: United States
City: Honolulu
Period: 12/11/18 - 15/11/18

Keywords

  • deep learning
  • human action recognition
  • information fusion
  • multi-modal video
  • multi-view video
  • neural nets


Cite this

Ardianto, S., & Hang, H-M. (2019). Multi-View and Multi-Modal Action Recognition with Learned Fusion. In 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings (pp. 1601-1604). [8659539] (2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/APSIPA.2018.8659539