This foundational course provides an overview of the data analysis process and introduces students to common techniques for data preprocessing, feature extraction, and the creation of statistical models. In particular, students will develop competence in areas of high importance for data scientists and engineers, such as exploring the trade-off between bias and variance, selecting and creating features, regularizing models, determining optimal hyperparameters, and evaluating model performance. Multiple datasets and data types (e.g., unstructured text, imagery, and time-varying signals) will be considered, with the goal of building student confidence across a spectrum of analysis challenges. Specific topics include linear and non-linear regression, decision trees, various approaches to dimensionality reduction, clustering, topic modeling, Bayesian methods, and neural networks.
Programming experience in Python and introductory coursework in linear algebra and probability theory are recommended.