Course Number
705.746
Course Format
Online - Asynchronous

This course explores the foundations, methodologies, and applications of self-supervised and multimodal representation learning, two of the most transformative paradigms in modern deep learning. We provide comprehensive coverage of how self-supervised learning techniques leverage unlabeled data through pretext tasks, contrastive objectives, positive/negative sample design, representation learning, and generative modeling, enabling state-of-the-art performance on a variety of downstream tasks and applications with minimal supervision. The course also systematically covers the classic theory and methods of multimodal learning, in which information from multiple data modalities – such as images, videos, text, audio, and sensor streams – is integrated to build powerful and generalizable machine perception systems. Students will study key principles of multimodal representations, fusion strategies, and alignment methods that enable effective cross-modal reasoning, as well as pretraining paradigms, multimodal neural architectures, and the design of large-scale foundation models.
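To give a flavor of the contrastive objectives covered in the course, below is a minimal NumPy sketch of an InfoNCE-style loss, one common contrastive formulation: each embedding is pulled toward its paired "positive" view and pushed away from the other samples in the batch, which act as negatives. The function name and the batch setup are illustrative, not drawn from any specific paper's implementation.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Illustrative InfoNCE contrastive loss: row i of z_a is paired with
    row i of z_b (positive); all other rows of z_b serve as negatives."""
    # L2-normalize embeddings so dot products become cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature           # pairwise similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal; maximize their log-probability
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 8))
# Two slightly perturbed views of the same samples vs. unrelated samples:
aligned = info_nce_loss(z, z + 0.01 * rng.standard_normal((4, 8)))
unrelated = info_nce_loss(z, rng.standard_normal((4, 8)))
print(aligned < unrelated)  # aligned pairs yield the lower loss
```

In practice such objectives are computed on learned encoder outputs and backpropagated through the encoder; the sketch only shows the loss geometry on fixed vectors.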