The course complements distributed systems courses, with a focus on processing, storing and analyzing massive data. It prepares students for master's projects and Ph.D. studies in the area of data-intensive computing systems. The main objective of the course is to provide students with a solid foundation for understanding the large-scale distributed systems used for storing and processing massive data.
More specifically, after completing the course, the student will be able to:
- explain the architecture and properties of the computer systems needed to store, search and index large volumes of data
- describe the different computational models for processing large data sets, both data at rest (batch processing) and data in motion (stream processing)
- use various computational engines to design and implement nontrivial analytics on massive data
- explain the different models for scheduling and resource allocation of computational tasks on large computing clusters
- elaborate on the tradeoffs when designing efficient algorithms for processing massive data in a distributed computing setting.