Quarter 1 of the capstone covers two parallel topics:
- The basics of “data science methodology” for a large project, including best practices for data handling and project reproducibility.
- Beginning research into your choice of domain. You become acquainted with a domain by replicating a specified result on a curated dataset in one of the listed areas (e.g. genetics, cognitive science, computer vision, oceanography, political science, sociology, industry analytics, and art).
The replication of the domain result (the second topic) will use the best practices learned in the first. This work then serves as a foundation for the project proposals due at the end of the quarter. The projects themselves will be worked on, in groups, in the second quarter. While the methodology portion is taught in a traditional lecture setting, most of the material in this course is covered through reading, data exploration, and the ensuing discussion.
Lecture (data science methodology)
One hour per week will be devoted to lecture on data science methodology. There will be accompanying light homework assignments.
Two hours per week will be devoted to discussion of domain-specific topics; as such, you must attend the section for your chosen domain. Sections will discuss the readings and assignments, so it is imperative that you complete the relevant assignments before attending. Each section begins with a set of questions to which you will write a response; your response will serve to stimulate class discussion.
Remark on how the course is split
As is common in Data Science, you will likely find yourself as a bridge between domain specialists and (computing) methodology specialists. In the case of this course, it is expected and normal that discussion section leaders will not know specifics of your code (or even know the language you are coding in!). You will have both (1) office hours with a methodology expert and (2) office hours and discussion with domain experts. As such, it is up to you to formulate your questions for the appropriate audience (domain expert or computing expert), so that you can adequately communicate with them to solve the problem you are facing.
Weekly HW on the lecture component of the course is meant to reinforce lecture topics enough to enable you to apply the topics to the domain-specific project. This HW is worked on individually.
Replication of a result in chosen domain
These assignments serve the dual purpose of introducing you to your domain of choice and helping you learn the mechanics of putting together a project in that domain. Your output will serve as “starter code” for your own project work in Q2. This output generally falls into the following categories:
- Literature review (written report)
- Exploratory data analysis of dataset (written report)
- Result replication (written report)
- Data acquisition + cleaning (code)
- Full project pipeline (code)
This HW is worked on individually.
The final course deliverable is a project proposal for the second quarter project. This proposal (along with the project itself) will be worked on in small groups.
- Written proposal (with background research)
- Skeleton project workflow / GitHub repository / page
- Elevator pitch (video on Piazza)
Assessments and Grades
The course grade will be computed using the following proportions:
| Component | % of Grade |
|---|---|
| Discussion Section Participation | 10% |
| Domain result replication (3 reports) | 30% |
| Domain result replication (workflow) | 20% |
Collaboration Policy and Academic Integrity
In DSC 180, we expect you to work hard and engage with material that originates outside the academic walls. All ideas and work must be your own, that of your approved group, or properly cited. Act with integrity and don’t cheat.
In DSC 180 you are encouraged to use outside resources to help with your work. However, you must properly cite any concepts, writing, or code that originates from other sources. If you are unsure of whether something needs a citation, it’s best to:
- consult the domain expert for your section,
- follow the examples in course readings, and
- place code citations, with the relevant link, in comments.
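As an illustration of a code citation placed in a comment, here is a minimal sketch. The function name `chunk_rows` is hypothetical, and the comment layout is only one reasonable convention, not a format the course prescribes; the cited source is the "roughly equivalent" code shown in the Python documentation for `itertools.batched`.

```python
from itertools import islice


def chunk_rows(rows, n):
    """Yield successive tuples of up to n items from rows."""
    # Citation: adapted from the "roughly equivalent" code for
    # itertools.batched in the Python documentation:
    # https://docs.python.org/3/library/itertools.html#itertools.batched
    # Change from the source: the optional `strict` argument was dropped.
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(rows)
    # islice pulls at most n items each pass; an empty tuple ends the loop.
    while batch := tuple(islice(iterator, n)):
        yield batch
```

Noting both the link and what you changed makes it easy for a reader to tell which parts are borrowed and which are your own.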
The following activities are considered cheating and ARE NOT ALLOWED in DSC 180 (this is not an exhaustive list):
- Using or submitting either writing or code acquired from other students (except your partner, where allowed).
- Not properly citing ideas, writing, or code acquired from outside sources. (Citations are a good thing!)
- Having any other student complete any part of an assignment on your behalf.
- Completing an assignment on behalf of someone else.
The following activities are examples of appropriate collaboration and ARE ALLOWED in DSC 180:
- Discussing the general approach to understanding or solving a problem.
- Talking about debugging/cleaning strategies or issues you ran into and how you solved them.
- Using outside material with proper citations (including StackOverflow code!).