First Person (Egocentric) Vision: History and Applications
Tutorial at VISIGRAPP 2024 - 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
February 28 2024
[Program & Slides][Further Reading]
Tutorial Description
Abstract
Wearable devices equipped with a camera and computing abilities are attracting the attention of both the market and society, with commercial devices increasingly available and many companies announcing the upcoming release of new devices. The main appeal of wearable devices lies in their mobility and in their ability to enable user-machine interaction through Augmented Reality. Due to these characteristics, wearable devices provide an ideal platform for developing intelligent assistants that support humans and augment their abilities, a goal in which Artificial Intelligence and Computer Vision play a major role.
Unlike classic computer vision (the so-called "third person vision"), which analyzes images collected from a static point of view, first person (egocentric) vision assumes that images are collected from the point of view of the user, which provides privileged information on the user's activities and the way they perceive and interact with the world. Indeed, the visual data acquired with wearable cameras usually carries useful information about the users, their intentions, and how they interact with the world.
This tutorial will discuss the challenges and opportunities offered by first person (egocentric) vision, covering the historical background and seminal works, presenting the main technological tools and building blocks, and discussing applications.
Keywords
wearable, first person vision, egocentric vision, augmented reality, visual localization, action recognition, action anticipation, human-object interaction
Aims and learning objectives
The participants will understand the main advantages of first person (egocentric) vision over third person vision for analyzing the user's behavior, building personalized applications, and predicting future events. Specifically, the participants will learn about: 1) the main differences between third person and first person (egocentric) vision, including the way in which the data is collected and processed, 2) the devices which can be used to collect data and provide services to the users, 3) the algorithms which can be used to process first person visual data, for instance to perform localization, indexing, object detection, action recognition, human-object interaction detection, and the prediction of future events.
Program
[09.00 - 10.30] Part I: History and motivations Slides
- Definitions, motivations, history and research trends of First Person (egocentric) Vision;
- Seminal works in First Person (Egocentric) Vision;
- Differences between Third Person and First Person Vision;
- First Person Vision datasets;
- Wearable devices to acquire/process first person visual data;
- Main research trends in First Person (Egocentric) Vision;
[12.00 - 13.00] Part II: Fundamental tasks for first person vision systems Slides
- Localization;
- Hand/Object Detection;
- Egocentric Human-Object Interaction;
- Action/Activity Recognition;
- Anticipation;
- Real Application Examples developed at Next Vision;
- Conclusion.