Environmental awareness is a crucial skill for robotic systems intended to autonomously navigate and interact with their surroundings.
Robots access knowledge about their environment through maps. However, robotic mapping currently suffers from a significant “complexity gap”: while recent advances in computer vision, such as object detection and people tracking, allow machines to perceive their surroundings like never before, robots still rely on maps that contain just enough information for navigation but too little for many other tasks required by advanced autonomy. For example, most maps carry no semantic or dynamic information about the environment, which is needed for any application involving interaction with people or specific objects. Until this gap is bridged, mobile robots will not be able to operate autonomously in dynamic environments.
Hypermaps lays the groundwork for the next level of interaction between robots and their environment by closing the complexity gap. In this project, we propose to go beyond today’s multi-layer maps with a new formalism, called hypermaps, in which spatio-temporal knowledge (e.g., occupancy, semantics from deep object recognition, and people movement in the environment) is stored and processed through advanced artificial intelligence to provide the robot with task-specific maps for completing its missions. The core hypothesis of the project is that such a formalism will leverage the interplay between different maps to extract more information and enable deeper reasoning: anomalies in one map can be detected and corrected by examining that map’s correlation with the other maps, and information not visible in any single map becomes visible when the layers are combined.
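To make the idea concrete, one can picture a hypermap as a set of spatially aligned layers defined over a common grid, which can be combined to answer task-specific queries. The minimal Python sketch below uses illustrative class and layer names (Hypermap, occupancy, people_flow); it is only a conceptual sketch, not the project’s implementation.

```python
import numpy as np

class Hypermap:
    """Illustrative container for spatially aligned map layers (not the
    project's implementation). Each layer is an array over the same grid,
    e.g. occupancy probabilities, semantic labels, or expected people flow."""

    def __init__(self, shape):
        self.shape = shape
        self.layers = {}

    def add_layer(self, name, data):
        assert data.shape[:2] == self.shape, "layers must share the same grid"
        self.layers[name] = data

    def query(self, name, cell):
        """Read one layer at a grid cell (row, col)."""
        return self.layers[name][cell]

# Combining layers answers questions no single layer can, e.g.
# "which free cells are expected to see heavy people flow?"
hmap = Hypermap((100, 100))
hmap.add_layer("occupancy", np.random.rand(100, 100))    # P(cell occupied)
hmap.add_layer("people_flow", np.random.rand(100, 100))  # expected traffic
busy_and_free = (hmap.layers["occupancy"] < 0.2) & (hmap.layers["people_flow"] > 0.8)
```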
Closing the complexity gap constitutes a fundamental step towards the development of general, fully autonomous robots, able to execute high-level tasks and interact with us and their environment.
related publications
conference articles
Bayesian Floor Field: Transferring people flow predictions across environments
Francesco Verdoja, Tomasz Piotr Kucner, and Ville Kyrki
In 2024 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Oct 2024
Mapping people dynamics is a crucial skill for robots, because it enables them to coexist in human-inhabited environments. However, learning a model of people dynamics is a time-consuming process which requires observing a large number of people moving in an environment. Moreover, approaches for mapping dynamics are unable to transfer the learned models across environments: each model can only describe the dynamics of the environment it was built in. However, the impact of architectural geometry on people’s movement can be used to anticipate their patterns of dynamics, and recent work has looked into learning maps of dynamics from occupancy. So far, however, approaches based on trajectories and those based on geometry have not been combined. In this work we propose a novel Bayesian approach to learning people dynamics that combines knowledge about the environment geometry with observations from human trajectories. An occupancy-based deep prior is used to build an initial transition model without requiring any observations of pedestrians; the model is then updated when observations become available using Bayesian inference. We demonstrate the ability of our model to increase data efficiency and to generalize across real large-scale environments, which is unprecedented for maps of dynamics.
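The following is a minimal sketch of the Bayesian idea described in the abstract, reduced to a Dirichlet-multinomial update of an 8-connected grid transition model: prior pseudo-counts (standing in for the occupancy-based deep prior) are added to observed transition counts, and the posterior mean gives the updated people-flow model. The function names, grid size, and uniform prior are illustrative assumptions, not the paper’s actual model.

```python
import numpy as np

# Toy Dirichlet-multinomial version of the idea: each grid cell holds a
# distribution over 8 movement directions. Prior pseudo-counts (standing in
# for the occupancy-based deep prior) are combined with observed pedestrian
# transition counts; the posterior mean is the updated flow model.

N_DIRS = 8  # 8-connected neighbours

def make_prior(prior_probs, strength=5.0):
    """Turn prior direction probabilities of shape (H, W, 8) into pseudo-counts."""
    return strength * prior_probs

def update_with_observations(alpha, observed_counts):
    """Bayesian update: posterior pseudo-counts = prior + observed counts."""
    return alpha + observed_counts

def posterior_mean(alpha):
    """Expected transition probabilities under the Dirichlet posterior."""
    return alpha / alpha.sum(axis=-1, keepdims=True)

# Usage on a toy 50x50 grid: a uniform prior refined by random "observations".
H, W = 50, 50
uniform_prior = np.full((H, W, N_DIRS), 1.0 / N_DIRS)
alpha = make_prior(uniform_prior)
counts = np.random.poisson(0.5, size=(H, W, N_DIRS)).astype(float)
flow = posterior_mean(update_with_observations(alpha, counts))
```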
workshop articles
Evaluating the quality of robotic visual-language maps
Matti Pekkanen, Tsvetomila Mihaylova, Francesco Verdoja, and Ville Kyrki
May 2024
Presented at the “Vision-Language Models for Navigation and Manipulation (VLMNM)” workshop at the IEEE Int. Conf. on Robotics and Automation (ICRA)
Visual-language models (VLMs) have recently been introduced in robotic mapping by using the latent representations, i.e., embeddings, of the VLMs to represent the natural language semantics in the map. The main benefit is moving beyond a small set of human-created labels toward open-vocabulary scene understanding. While there is anecdotal evidence that maps built this way support downstream tasks, such as navigation, a rigorous analysis of the quality of the maps using these embeddings is lacking. In this paper, we propose a way to analyze the quality of maps created using VLMs by evaluating two critical properties: queryability and consistency. We demonstrate the proposed method by evaluating the maps created by two state-of-the-art methods, VLMaps and OpenScene, with two encoders, LSeg and OpenSeg, on real-world data from the Matterport3D dataset. We find that OpenScene outperforms VLMaps with both encoders, and LSeg outperforms OpenSeg with both methods.
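As a rough illustration of what “queryability” means in practice, the sketch below scores each cell of a hypothetical visual-language map against a text query by cosine similarity between embeddings. The embedding dimension and the random stand-in embeddings are assumptions; this is not the evaluation protocol of the paper, which would use real LSeg/OpenSeg features.

```python
import numpy as np

# Score every cell of a (hypothetical) visual-language map against a text
# query via cosine similarity. Random vectors stand in for real LSeg/OpenSeg
# embeddings; the dimension D is an assumption.

def cosine_similarity(cells, query):
    cells = cells / np.linalg.norm(cells, axis=-1, keepdims=True)
    query = query / np.linalg.norm(query)
    return cells @ query

D = 512                                         # assumed embedding dimension
cell_embeddings = np.random.randn(100, 100, D)  # one embedding per map cell
text_embedding = np.random.randn(D)             # e.g. the encoding of "chair"

scores = cosine_similarity(cell_embeddings, text_embedding)
best_cell = np.unravel_index(np.argmax(scores), scores.shape)
print("cell most similar to the query:", best_cell)
```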
Using occupancy priors to generalize people flow predictions
Francesco Verdoja, Tomasz Piotr Kucner, and Ville Kyrki
May 2024
Presented at the “Long-term Human Motion Prediction” workshop at the IEEE Int. Conf. on Robotics and Automation (ICRA)
Mapping people dynamics is a crucial skill for robots, because it enables them to coexist in human-inhabited environments. However, learning a model of people dynamics is a time-consuming process which requires observing a large number of people moving in an environment. Moreover, approaches for mapping dynamics are unable to transfer the learned models across environments: each model can only describe the dynamics of the environment it was built in. However, the impact of architectural geometry on people’s movement can be used to anticipate their patterns of dynamics, and recent work has looked into learning maps of dynamics from occupancy. So far, however, approaches based on trajectories and those based on geometry have not been combined. In this work we propose a novel Bayesian approach to learning people dynamics that combines knowledge about the environment geometry with observations from human trajectories. An occupancy-based deep prior is used to build an initial transition model without requiring any observations of pedestrians; the model is then updated when observations become available using Bayesian inference. We demonstrate the ability of our model to increase data efficiency and to generalize across real large-scale environments, which is unprecedented for maps of dynamics.
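As a toy illustration of building a transition prior from occupancy alone, the sketch below allows motion only between free 8-connected cells, with uniform probability over the free neighbours. This hand-crafted rule is a stand-in for the learned occupancy-based prior discussed in the abstract, not the actual model; such a prior could then be refined with observed trajectories as in the Bayesian update sketched above.

```python
import numpy as np

# Hand-crafted occupancy-only prior: from each free cell, allow transitions
# only to free 8-connected neighbours, with uniform probability. A stand-in
# for the learned occupancy-based prior, not the actual model.

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]

def occupancy_prior(occupancy, threshold=0.5):
    """occupancy: (H, W) array of P(occupied). Returns an (H, W, 8) prior."""
    H, W = occupancy.shape
    free = occupancy < threshold
    prior = np.zeros((H, W, len(OFFSETS)))
    for r in range(H):
        for c in range(W):
            if not free[r, c]:
                continue
            for k, (dr, dc) in enumerate(OFFSETS):
                nr, nc = r + dr, c + dc
                if 0 <= nr < H and 0 <= nc < W and free[nr, nc]:
                    prior[r, c, k] = 1.0
            total = prior[r, c].sum()
            if total > 0:
                prior[r, c] /= total
    return prior

# Usage: a prior built purely from geometry, ready for Bayesian refinement.
occupancy = (np.random.rand(50, 50) > 0.7).astype(float)  # toy occupancy grid
prior_probs = occupancy_prior(occupancy)
```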