Course Syllabus: CS 599 L1 "User-centric Systems for Data Science" (Fall 2022)

Instructor Name: John Liagouris
TA: Vivek Unnikrishnan
Course Time & Location: Tue/Thu 12:30-13:45, WED 130
Instructor's Office Hours: Tue 4-6pm, MCS 207
TA's Office Hours: Fri 10am-12pm, MCS 103

Courseware

  • We will use the course website to maintain an up-to-date class schedule.
  • We will use Piazza for discussions on course topics, as well as questions and clarifications regarding the assignments. You may also post on Piazza if you have any question on course logistics.

Course Description

CS 599 L1 will be taught in the style of a graduate course that requires reading research papers and independent exploration of the material. The class will include lectures, readings, quizzes, programming assignments, and an optional final project. During the semester, we will discuss the following topics:

  • PART I: Data provenance
    • Lineage, Why, How and Where provenance
    • Why-not provenance
    • Database causality and responsibility
  • PART II: Interpretable ML
    • Glass-box models
    • Explaining classification results (LIME)
    • Interpreting model predictions (SHAP)
  • PART III: System observability
    • Causal profiling
    • End-to-end tracing
    • Critical path analysis

Course Objectives

The course aims to train students in fundamental and emerging techniques that help humans understand complex data processing pipelines. The course is divided into three parts. In Part I, we will discuss concepts of database provenance and causality that provide insights into query results. In Part II, we will discuss AI techniques for explaining classifications and interpreting model predictions. In Part III, we will discuss state-of-the-art profiling, tracing, and performance analysis techniques for distributed systems.

At the end of the course, successful students will have a solid understanding of:

  • techniques that provide insights into the outputs and performance of data processing pipelines
  • the challenges and trade-offs one needs to consider when designing systems with a focus on explainability and observability

Course Materials

There is no required textbook for this class. After each lecture, slides will be posted on Piazza. Further publicly available resources are listed on the course website, under "Readings". Some of the resources listed there will be given as (non-graded) reading assignments during the course. You should be able to access all of these for free when connected to the campus network.

Class Schedule

The (tentative) lecture schedule is available here.

Attendance

Students are expected to attend lectures in person. All course material will be posted on Piazza. Ultimately, students are responsible for their own learning and, thus, for keeping up with the material.

Grading Scheme

The course includes lectures, quizzes, and hands-on assignments. There is no final exam. Your grade will be determined as follows:

  • 3 in-class quizzes: 15%
    • Each 20-min quiz contributes 5% to the final grade.
  • 4 assignments (or 3 assignments and a final project): 85%
    • Assignment #1 contributes 15%.
    • Assignment #2 contributes 20%.
    • Assignment #3 contributes 20%.
    • Assignment #4 / Final Project contributes 30%.
    • To be considered complete, code deliverables must include sufficient documentation.
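
As an illustration only, the weighting above can be sketched in Python. The component names and the 0-100 score scale are assumptions made for this example; they are not part of the official grading process.

```python
# Weights copied from the grading scheme above, as fractions of the final grade.
# The component names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "quiz1": 0.05,
    "quiz2": 0.05,
    "quiz3": 0.05,
    "assignment1": 0.15,
    "assignment2": 0.20,
    "assignment3": 0.20,
    "assignment4_or_project": 0.30,
}


def final_grade(scores):
    """Return the weighted final grade on a 0-100 scale.

    `scores` maps every component name in WEIGHTS to a score in [0, 100]
    (the scale is an assumption of this sketch).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

For example, a student who scores 80 on every component would receive a final grade of 80, since the weights sum to 100%.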

Final project (optional)

  • Instead of Assignment #4.
  • Can be done individually or in teams of at most 3 students.
  • Students must discuss the scope of the project with the instructor and submit a project proposal.
  • Deadline for project proposals is November 1st, 2022.
  • Final projects also require a 15-minute presentation.

Late Submission

Assignment solutions and final projects will be submitted using the course GitLab. Students who submit assignments late will only be eligible for up to 50% of the original score.
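
The syllabus does not spell out how the 50% limit is applied. The sketch below assumes one possible reading, in which a late submission is graded normally and then capped at half of the maximum points. The function name and parameters are hypothetical, and the instructor's actual policy may differ.

```python
def late_adjusted_score(raw_score, max_score, is_late):
    """Apply the late-submission cap under the ASSUMED reading that a late
    submission can earn at most 50% of the maximum points."""
    if not 0 <= raw_score <= max_score:
        raise ValueError("raw_score must lie between 0 and max_score")
    if is_late:
        # Cap at half of the maximum; scores already below the cap are unchanged.
        return min(raw_score, 0.5 * max_score)
    return raw_score
```

Under this reading, a late submission graded at 90 out of 100 would be recorded as 50, while a late submission graded at 40 would remain 40.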

Prerequisites

The course will be self-contained. Each one of the three parts will have an introductory lecture on the necessary concepts to understand the related research papers. Students must have strong programming skills (C/Python) and basic knowledge of data structures, algorithms, and computer systems (CS 112, CS 210 or equivalent experience).

Academic Conduct

All hands-on assignments must be completed individually. Discussions with fellow students via Piazza or in person are encouraged, but presenting the work of another person as your own is expressly forbidden. This includes "borrowing", "stealing", or copying programs/solutions or parts of them from others. Note that we may use an automated plagiarism checker. Cheating will not be tolerated under any circumstances.

Please review the BU Academic Conduct Code for more information.

Supporting Students With Disabilities

If you are a student with a disability or believe you might have a disability that requires accommodations, please contact the Office for Disability Services (ODS) at (617) 353-3658 or [email protected] to coordinate any reasonable accommodation requests. ODS is located at 25 Buick Street on the 3rd floor.