Summarizing first-person videos from third persons’ points of views

Hsuan I. Ho, Wei-Chen Chiu, Yu Chiang Frank Wang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Video highlighting and summarization are among the interesting topics in computer vision, benefiting a variety of applications such as viewing, searching, and storage. However, most existing studies rely on training data of third-person videos, which does not easily generalize to highlighting first-person ones. With the goal of deriving an effective model to summarize first-person videos, we propose a novel deep neural network architecture for describing and discriminating vital spatiotemporal information across videos with different points of view. Our proposed model is realized in a semi-supervised setting, in which fully annotated third-person videos, unlabeled first-person videos, and a small number of annotated first-person ones are presented during training. In our experiments, qualitative and quantitative evaluations on both benchmarks and our collected first-person video datasets are presented.

Original language: English
Title of host publication: Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
Editors: Yair Weiss, Vittorio Ferrari, Cristian Sminchisescu, Martial Hebert
Publisher: Springer Verlag
Number of pages: 18
ISBN (Print): 9783030012663
State: Published - 1 Jan 2018
Event: 15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany
Duration: 8 Sep 2018 – 14 Sep 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11219 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 15th European Conference on Computer Vision, ECCV 2018


Keywords:

  • First-person vision
  • Metric learning
  • Transfer learning
  • Video summarization
