Visual Perception and Spatial Computing

Teacher: Luigi Freda (Inglobe Technologies)

Today, commodity cameras are used extensively everywhere (e.g. robots, cars, smartphones, AR/VR headsets, wearable devices, etc.). These lightweight, low-cost sensors provide very rich information that allows a system to build a 3D model of the surrounding environment and understand its structure. This course introduces perception methods and tools for building 3D models and extracting their semantic structure. Such a capability is crucial for AI systems that must interact with a scene intelligently and over the long term. 

To this end, we provide an introduction to visual SLAM and real-time techniques, focusing on how to 

* robustly localize a camera system with respect to the environment,

* compute a dense 3D reconstruction of the surrounding scene,

* segment the obtained 3D model by using both geometry and semantics,

* leverage deep learning to enable advanced scene understanding and improve SLAM performance.
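To make the first point concrete, the following is a minimal NumPy sketch of the pinhole projection model and the reprojection error that visual SLAM systems minimize when localizing a camera. All numeric values (intrinsics, pose, points) are illustrative assumptions, not taken from any real dataset or course material.

```python
import numpy as np

def project(points_w, R, t, K):
    """Project 3D world points into the image with a pinhole camera.

    points_w: (N, 3) world points; R: (3, 3) rotation; t: (3,) translation;
    K: (3, 3) intrinsic matrix. Returns (N, 2) pixel coordinates.
    """
    points_c = points_w @ R.T + t      # world frame -> camera frame
    uv = points_c @ K.T                # apply camera intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide

# Illustrative intrinsics and pose (hypothetical values)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
R = np.eye(3)                          # camera axes aligned with the world
t = np.array([0.0, 0.0, 0.0])

points = np.array([[0.0,  0.0, 5.0],
                   [1.0, -1.0, 4.0]])
pixels = project(points, R, t, K)

# Reprojection error: the residual that SLAM / bundle adjustment minimizes
observed = pixels + np.array([[0.5, -0.5], [0.0, 1.0]])  # noisy measurements
error = np.linalg.norm(observed - project(points, R, t, K), axis=1)
```

In a full SLAM pipeline this residual is minimized jointly over camera poses and 3D points (bundle adjustment); the sketch above only shows the forward model and the per-point error.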

We also present emerging spatial AI techniques with many potential applications, including mixed reality, virtual reality, and cognitive robotics.