The development of microarray technology, rapid sequencing, and protein chips, along with the growing availability of metabolic data, has led to an explosion in the collection of "high-content" biological data. This course explores the analysis and mining of gene expression data and other high-content biological data. A survey of gene and protein arrays, laboratory information management systems, data normalization, and available tools is followed by a more in-depth treatment of differential gene expression detection, clustering techniques, pathway extraction, network model building, biomarker evaluation, and model identification. Both clinical and research data will be considered. Students will develop skills in statistical analysis and data mining, including statistical detection theory, nonlinear and multiple regression, entropy measurement, detection of hidden patterns in data, and heuristic search and learning algorithms. Applied mathematical concepts and biological principles will be introduced, and students will focus on algorithm design and software application to design and implement novel ways of analyzing gene, protein, and metabolic expression data. The statistical programming language R is used extensively in lectures and homework. Bioconductor packages, many of which include data sets, are also used regularly. Students will complete data analysis assignments individually and in small teams.

Course prerequisite(s): 

605.205 Molecular Biology for Computer Scientists or equivalent, or a prior course in bioinformatics; a course in probability and statistics; and the ability to program in a high-level language.

Course note(s): 

There are no exams, but programming assignments are intensive. Students in the MS Bioinformatics program may take both this course and 410.671 Microarrays and Analysis, as the two courses have little overlap in content.