Egocentric Vision: Exploring User-Centric Perspectives

Tutorial at VISIGRAPP 2025 - 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications

February 27, 2025


Tutorial Description

Abstract

Wearable devices with integrated cameras and computing capabilities are gaining significant attention from both the market and society, with interest rising as commercial devices become more widely available and numerous companies announce new products. The main attraction of wearable devices lies in their mobility and their ability to facilitate user-machine interaction through Augmented Reality. These features make them an ideal platform for developing intelligent assistants that can support and enhance human abilities, with Artificial Intelligence and Computer Vision playing crucial roles.

Unlike traditional computer vision (known as "third-person vision"), which analyzes images from a static viewpoint, first-person (egocentric) vision assumes that images are captured from the user's perspective, providing privileged information about the user's activities and how they perceive and interact with the world. Visual data acquired from wearable cameras typically offers valuable insights into users, their intentions, and their interactions with the environment.

This tutorial will explore the challenges and opportunities presented by first-person (egocentric) vision, covering its historical background and seminal works, presenting key technological tools and building blocks, and discussing various applications.

Keywords

Wearable devices, first-person vision, egocentric vision, augmented reality, visual localization, action recognition, action anticipation, human-object interaction, procedural assistance

Aims and learning objectives

The participants will understand the main advantages of first-person (egocentric) vision over third-person vision for analyzing the user's behavior, building personalized applications, and predicting future events. Specifically, the participants will learn about: 1) the main differences between third-person and first-person (egocentric) vision, including the way in which data is collected and processed; 2) the devices that can be used to collect data and provide services to users; 3) the algorithms that can be used to process first-person visual data, for instance to perform localization, indexing, object detection, action recognition, human-object interaction analysis, and the prediction of future events.

Program

[14.15 - 15.45] Part I: History and motivations Slides

  • Perception and Egocentric Vision;
  • Seminal works in Egocentric Vision;
  • Differences between Third Person and First Person Vision;
  • First Person Vision datasets;
  • Wearable devices to acquire/process first person visual data;
  • Main research trends in First Person (Egocentric) Vision;
  • What's next?

[17.15 - 18.30] Part II: Fundamental tasks for first person vision systems Slides

  • Localization;
  • Hand/Object Detection;
  • Action/Activity Recognition;
  • Human-Object Interaction;
  • Anticipation;
  • Dual-Agent Language Assistance;
  • Industrial Applications;
  • Conclusion.

References - Further Reading