Visual Perception and Spatial Computing (12 hours)

Teacher: Luigi Freda (PhD, Robotics & Computer Vision Engineer/Researcher)

Today, commodity cameras are used extensively in robots, cars, smartphones, AR/VR headsets, wearable devices, and many other systems. These lightweight and low-cost sensors provide very rich information, which makes it possible to build a 3D model of the surrounding environment and understand its structure. This course introduces perception methods and tools for building 3D models and extracting their semantic structure. Such a capability is crucial for AI systems that must interact with a scene intelligently and over the long term.
To this end, we provide an introduction to visual SLAM and real-time techniques, focusing on how to

* robustly localize a camera system with respect to the environment,

* compute a dense 3D reconstruction of the surrounding scene,

* segment the obtained 3D model by using both geometry and semantics,

* use deep learning to enable advanced scene understanding and improve SLAM performance.
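The first two points above build on the pinhole camera model, which relates a 3D world point to its pixel coordinates through the camera intrinsics and pose. The following is a minimal sketch of that projection; the intrinsic matrix `K` and the identity pose are illustrative values, not taken from the course material.

```python
import numpy as np

def project(point_w, K, R, t):
    """Project a 3D point (world frame) into pixel coordinates."""
    p_cam = R @ point_w + t          # world frame -> camera frame
    u, v, w = K @ p_cam              # perspective projection (homogeneous)
    return np.array([u / w, v / w])  # normalize by depth

# Illustrative intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)            # camera axes aligned with the world frame
t = np.zeros(3)          # camera at the world origin

# A point 2 m in front of the camera and 0.4 m to its right:
pixel = project(np.array([0.4, 0.0, 2.0]), K, R, t)
print(pixel)  # -> [420. 240.]
```

Visual SLAM localization inverts this relation: given many such 2D-3D correspondences, it estimates the pose (R, t) that best explains the observed pixels.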

We present emerging spatial AI techniques with many potential applications, including mixed reality, virtual reality, and cognitive robotics. The course includes hands-on use of the presented techniques, based on dedicated computer vision libraries such as OpenCV, applied to selected case studies.