The two-quarter Capstone Sequence, required of every advanced student in the Data Science program, challenges students to begin with a problem in a chosen domain and to better understand, approach, and solve that problem using the computational and statistical tools they have developed over the course of their studies.
A capstone sequence in Data Science is unusual in two ways:
- Data Science students may lack meaningful domain knowledge in their chosen domain. However, they have skills a domain expert likely does not. This is a property of Data Science in general.
- The possible Data Science Capstone projects are a sparse set inside a high-dimensional space. Natural choices for projects may come from the sciences, engineering, the social sciences, or the arts and humanities.
To address these issues in a manner that scales to a couple hundred students per year, the Data Science Capstone Sequence centers on sponsoring “Domains of Inquiry”. A Domain of Inquiry is an approachable problem area, accessible to advanced undergraduate students, that is rich enough to support a broad set of questions – i.e., a small neighborhood inside this high-dimensional space. Students choose a Domain of Inquiry at the outset of the two quarters and commit to pursuing a project in that domain. Each domain supports numerous group projects.
The core of the Capstone Sequence is thus a loosely coordinated collection of small workshops, divided by domain of inquiry, where groups of students engage in a discussion-based learning environment with a mentor (“domain expert”) and other groups of students. Additionally, a parallel lecture component sets “methodological best practices” for projects across the domains of inquiry.
Example domains of inquiry include:
- Wikipedia Edit Wars (Roberts)
- Quantitative Measurement of Artistic Style (Twomey)
- Fair Policing and Predictive Policing (Fraenkel)
- Clustering the Human Genome (Ellis)
- Malware and Graph Embeddings (Fraenkel)
After replicating results in one of these areas, students propose their own project within that area.
See the project directory to browse the projects from 2019-2020.
The course consists of a two-part sequence, and each part has two components:
| | Component 1 | Component 2 |
|---|---|---|
| Part I | DS-Project Development | Result Replication |
| Part II | Team work and communication | Student Project Development |
The syllabi for the two-quarter sequence:
The quarter-1 assignments are here:
(Domain) Instructor Material
The “Domain of Inquiry” proposal template, student assignment template, and schedule for guided self-learning are general enough to reasonably adapt to data science projects in most domains. The templates are meant to ensure a common rubric across a wide variety of projects.