This course will cover the core concepts and skills in the emerging field of data science. The data science pipeline will be explored in depth: problem formulation; the acquisition and cleaning of multisource data sets; data summarization and exploratory analysis; model building, analysis, and evaluation; and the presentation of results. Topics covered will include types of data sources and databases, web scraping and APIs, text parsing and regular expressions, experimental design, summary statistics, data visualization, supervised (regression, logistic regression, decision trees, random forests, etc.) and unsupervised (clustering, network analysis) machine learning techniques, model evaluation and testing, and the construction of web applications and reports to present results. Students will gain hands-on experience solving the programming and analytical challenges associated with data science through short assignments and a larger project.
Programming experience in Python is recommended.