Capstone Sequence in Data Science

The two-quarter Capstone Sequence, required of every advanced student in the Data Science program, challenges students to begin with a problem in a chosen domain and to better understand, approach, and solve that problem using the computational and statistical tools they have developed over their coursework.

A capstone sequence in Data Science is unusual in two ways:

  • Data Science students may lack meaningful domain knowledge in their chosen domain. However, they have skills a domain expert likely does not. This is a property of Data Science in general.
  • The possible Data Science Capstone projects are a sparse set inside a high-dimensional space. Natural choices for projects may come from the sciences, engineering, the social sciences, or the arts and humanities.

To address these issues in a manner that scales to a couple hundred students per year, the Data Science Capstone Sequence centers on sponsoring “Domains of Inquiry”. A Domain of Inquiry is a problem area that is accessible to advanced undergraduate students yet rich enough to support a broad set of questions, i.e. a small neighborhood inside this high-dimensional space. Students choose a Domain of Inquiry at the outset of the two quarters and commit to pursuing a project in that domain. Each domain supports numerous group projects.

The core of the Capstone Sequence is thus a loosely coordinated collection of small workshops, divided by domain of inquiry, where groups of students engage in a discussion-based learning environment with a mentor (“domain expert”) and other groups of students. Additionally, a parallel lecture component sets “methodological best practices” for projects across the domains of inquiry.

Example domains of inquiry include:

After replicating results in each of these areas, students propose their own project within the area.

See the project directory to browse the projects from 2019-2020.

Course Materials

The course consists of a two-part sequence, each part with two components:

Part I: DS-Project Development; Result Replication
Part II: Team work and communication; Student Project Development


The syllabi for the two-quarter sequence:


The quarter-1 assignments are here:

(Domain) Instructor Material

The “Domain of Inquiry” proposal template, student assignment template, and schedule for guided self-learning are general enough to reasonably adapt to data science projects in most domains. The templates are meant to ensure a common rubric across a wide variety of projects.