Embodied Intelligence is an advanced exploration of AI systems that perceive, act, and learn through direct interaction with the physical world, combining machine learning, computer vision, robotics, and language technologies (NLP and Generative AI). Unlike traditional AI, which relies on abstract models and symbolic reasoning, this course treats intelligence as an emergent property of real-world interaction. By integrating perception, movement, and decision-making, embodied systems develop a richer understanding of their environments and execute tasks with greater adaptability.

The course covers key principles of embodied AI, including multimodal sensory processing; behavior-driven intelligence enabled by machine learning, deep learning, and Generative AI; and robotics in unstructured environments. Students will explore the interplay between perception and action, studying techniques in vision-language navigation, embodied task completion, adaptive control, and learning from experience. Core topics include sensorimotor coordination, real-time decision-making, spatial reasoning, and the role of physical embodiment in shaping AI capabilities.

Through hands-on projects, homework, and critical discussions, students will gain a deep understanding of how AI systems, especially Generative AI (LLMs, Multimodal Large Models (MLMs), and World Models (WMs)), can be designed to interact with the physical world. By the end of the course, students will be able to develop intelligent agents that integrate perception, action, and learning to operate effectively in dynamic, real-world settings.
Course Prerequisite(s)
EN.705.643 – Deep Learning Developments with PyTorch