Seeing Through the User's Eyes: Advances in Human-Centric Egocentric Vision

Tutorial at VISAPP 2026 - 21st International Conference on Computer Vision Theory and Applications

March 9, 2026


Tutorial Description

Abstract

Wearable devices equipped with cameras, sensors, and on‑device AI capabilities are rapidly evolving, driven by the growing availability of commercial solutions and the integration of Augmented Reality into everyday workflows. These devices enable natural and continuous user-machine interaction and open the door to intelligent assistants that expand human capabilities. The combination of mobility, contextual awareness, and multimodal sensing makes wearable systems a unique platform for advanced AI and Computer Vision applications.

First-person (egocentric) vision, unlike traditional third-person approaches that observe the scene from an external point of view, captures the world directly from the user's perspective. This viewpoint provides privileged access to users’ actions, intentions, attention, and interactions with objects and the environment. Recent advances in multimodal learning, large-scale datasets, and foundation models have further accelerated research in egocentric understanding, task reasoning, and human-AI collaboration.

This tutorial will present an updated overview of the challenges and opportunities in egocentric vision, discussing its foundations while highlighting recent methodological breakthroughs and emerging applications.

Keywords

Wearable devices, first person vision, egocentric vision, augmented reality, egocentric datasets, action recognition, action anticipation, human-object interaction, procedural understanding

Aims and learning objectives

Participants will understand the main advantages of first person (egocentric) vision over third person vision for analyzing user behavior, building personalized applications, and predicting future events. Specifically, participants will learn about: 1) the main differences between third person and first person (egocentric) vision, including how the data is collected and processed; 2) the devices that can be used to collect data and provide services to users; 3) the algorithms used to process first person visual data, for instance to perform action recognition, human-object interaction understanding, procedural understanding, and the prediction of future events.

Program

[10.30 - 12.00] Part I: History and motivations Slides

  • Perception and Egocentric Vision;
  • Seminal works in Egocentric Vision;
  • Differences between Third Person and First Person Vision;
  • First Person Vision datasets;
  • Wearable devices to acquire/process first person visual data;
  • Main research trends in First Person (Egocentric) Vision;
  • What's next?

[13.00 - 15.00] Part II: Fundamental tasks for first person vision systems Slides

  • Localization;
  • Hand/Object Detection;
  • Action/Activity Recognition;
  • Human-Object Interaction;
  • Anticipation;
  • Dual-Agent Language Assistance;
  • Industrial Applications;
  • Conclusion.

References - Further Reading