The sequencing of thousands of genomes, including those related to disease states, interest in proteomics, epigenetics, and variation have resulted in an explosive growth in the number of biological databases, as well as the need to develop new databases to handle the diverse new content being generated. The course focuses on the design of biological databases and examines issues such as those related to data modeling, heterogeneity, interoperability, evidence, and tool integration. It also surveys a wide range of biological databases and their access tools and enables students to develop proficiency in their use. Databases introduced include genome and sequence databases such as GenBank and Ensembl, as well as protein databases such as PDB and UniProt. Databases related to RNA, sequence variation, pathways and interactions, metagenomics, and epigenomics are also presented. Tools for accessing and manipulating data from databases such as BLAST, genome browsers, multiple sequence alignment, gene finding, and protein tools are reviewed. The programming language Perl is introduced, along with the use of Perl in obtaining data via web services and in storing data in an SQLite database. Students will use Perl (Python may be used for some assignments), biological databases, and database tools to complete homework assignments. They will also design a database and will write code in the language of their choice to create their own database as a course project.
(For JHEP Students) EN.605.205 Molecular Biology for Computer Scientists or AS.410.634 Practical Computer Concepts for Bioinformatics or equivalent; EN.605.641 Principles of Database Systems or equivalent; EN.605.202 Data Structures and EN.605.201 Introduction to Programming Using Java.