Capstone Sequence in Data Science

The two-quarter Capstone Sequence, required of every advanced student in the Data Science program, challenges students to begin with a problem in a chosen domain and attempt to better understand, approach, and solve it using the computational and statistical tools they have developed through their coursework.

A capstone sequence in Data Science is unusual in two ways:

  • Data Science students may lack meaningful domain knowledge in their chosen domain. However, they have skills a domain expert likely does not. This is a property of Data Science in general.
  • The possible Data Science Capstone projects are a sparse set inside a high-dimensional space. Natural choices for projects may come from the sciences, engineering, the social sciences, or the arts and humanities.

To address these issues in a manner that scales to a couple hundred students per year, the Data Science Capstone Sequence centers on sponsoring “Domains of Inquiry”. A Domain of Inquiry is an approachable problem area that is accessible to advanced undergraduate students and rich enough to support a broad set of questions, i.e., a small neighborhood inside this high-dimensional space. Students choose a Domain of Inquiry at the outset of the two quarters and commit to pursuing a project in that domain. Each domain supports numerous group projects.

The core of the Capstone Sequence is thus a loosely coordinated collection of small workshops, divided by Domain of Inquiry, where groups of students engage in a discussion-based learning environment with a mentor (“domain expert”) and other groups of students. Additionally, a parallel lecture component sets “methodological best practices” for projects across the Domains of Inquiry.

After replicating results in their chosen area, students propose their own project within the area. See the project directory to browse the projects from 2019-2020.

Course Materials

The course page contains all course material.