Egocentric Vision: Emerging Trends and Human-Centric Applications
Tutorial at ICIAP 2025 - 23rd International Conference on Image Analysis and Processing
September 15, 2025
Tutorial Description
Abstract
Wearable devices with integrated cameras and computing capabilities are attracting significant attention, driven by the increasing availability of commercial devices and new product announcements. Their main appeal lies in their mobility and in their ability to support user-machine interaction through Augmented Reality. These features make them ideal platforms for intelligent assistants that enhance human abilities, with Artificial Intelligence and Computer Vision playing crucial roles.
Unlike traditional "third-person vision," which analyzes images from a static viewpoint, first-person (egocentric) vision captures images from the user's perspective, providing unique insights into their activities and interactions. Visual data from wearable cameras offers valuable information about users, their intentions, and their environment.
This tutorial will explore the challenges and opportunities of first-person (egocentric) vision, covering its historical background, key technological tools, and various applications.
Keywords
Wearable devices, first-person vision, egocentric vision, augmented reality, visual localization, action recognition, action anticipation, human-object interaction, procedural assistance
Aims and learning objectives
Participants will understand the advantages of first-person vision over third-person vision in analyzing behavior and building personalized applications. They will learn about: 1) the differences between third-person and first-person vision, 2) devices for data collection and user services, and 3) algorithms for managing first-person visual data, including object detection, human-object interaction, gaze understanding, and visual-language relations.
Program (Afternoon)
[TBA] Part I: History and motivations (Francesco Ragusa) Slides
- Agenda of the tutorial
- Definitions, motivations, and history of First Person (Egocentric) Vision
- Seminal works in First Person (Egocentric) Vision
- Differences between Third Person and First Person Vision
- First Person Vision datasets
- Wearable devices to acquire and process first-person visual data
- Main research trends in First Person (Egocentric) Vision
[TBA] Part II: Hand-Object Interactions in Egocentric Vision (Rosario Leonardi) Slides
- Introduction to Hand-Object Interaction Detection
- Definition and importance of Hand-Object Interactions (HOI)
- Applications in AR/VR, robotics, industrial monitoring, and assistive systems
- Datasets and Benchmarks for HOI in Egocentric Vision
- Overview of popular datasets
- Challenges in dataset collection and annotation
- Models and Architectures for Hand-Object Interaction Detection
[TBA] Part III: Gaze Understanding and Visual-Language Benchmarks (Michele Mazzamuto) Slides
- Gaze Signal Fundamentals
- Definitions
- Gaze-Based Datasets
- Tasks
- Gaze signal in computer vision
- Gaze prediction
- Uses of gaze information
- Attended object detection
- Gaze signal for mistake detection
- Building procedural assistants with VLLMs
- Open Challenges and Future Directions
References - Further Reading
Acknowledgment
Future Artificial Intelligence Research (FAIR) – PNRR MUR Cod. PE0000013 - CUP: E63C22001940006. This research has been partially supported by the project EXTRA-EYE - PRIN 2022 - CUP E53D23008280006 - Funded by the European Union - Next Generation EU.