This course investigates the theory and practice of modern large-scale database systems. Large-scale approaches include distributed relational databases; data warehouses; and the Hadoop ecosystem (Hadoop, Accumulo, and the Mahout machine learning libraries). Topics discussed include data design and architecture; database security, integrity, query processing, query optimization, transaction management, concurrency control, and fault tolerance; and query formulation, algorithms, and cloud analytics. By the end of the course, students will understand the principles of several common large-scale data systems, including their architecture, performance, and costs. Students will also gain a sense of which approach is best suited to which circumstances.
Prerequisites: 605.202 Data Structures; 605.641 Principles of Database Systems or equivalent. Familiarity with “big-O” concepts and notation is recommended.