Measuring Conflict on Wikipedia
Topics
This week’s assignments will guide you through the following topics:
- An introduction to edit wars on Wikipedia
- Using edits to measure participation on Wikipedia
Reading
Please read the following:
Optional Reading
On the topic of edit session data, you may find this article’s careful
usage of this data type interesting:
- R. Stuart Geiger and Aaron Halfaker. “Using Edit Sessions to Measure
Participation in Wikipedia.”
CSCW 2013.
Tasks
Complete the following tasks:
- Read the companion webpage for the Edit Wars paper, WikiWarMonitor,
and download and analyze its contents. Pay careful attention to:
- ‘Light-dump’ data, the intermediate data the authors use to
identify edit wars.
- The script that transforms Wikipedia dump data into light-dump data.
- The script that computes the M-Statistic.
- There are two articles contained in two files in the location
/teams/DSC180A_FA20_A00/b03onlinecommunities/testdata. One of the
files is in the Wikipedia edit dump (XML) format, while the other is in
light-dump format. Look at the contents and match the reversions
in the light-dump data to the actual changes in text in the
XML (you may find it useful to look at the articles in the browser
as well). Write code that translates Wikipedia edit data into light-dump
data. Hint: you can use hashing to tell whether the contents of two
versions of an article are exactly the same.
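As a starting point for the hashing hint, here is a minimal sketch in Python. It assumes the revision texts of one article have already been extracted into a list in chronological order. The function name `find_reverts` and the choice to report the first earlier revision with identical text are illustrative assumptions, not necessarily what the WikiWarMonitor scripts do.

```python
import hashlib

def find_reverts(revision_texts):
    """Return (index, earlier_index) pairs for revisions whose text is
    byte-for-byte identical to an earlier revision of the same article.

    Assumption: a "revert" here simply means "restores an earlier text";
    the light-dump definition may differ in its details.
    """
    first_seen = {}   # digest -> index of the first revision with that text
    reverts = []
    for i, text in enumerate(revision_texts):
        # Hash the full text so comparisons are cheap dictionary lookups.
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in first_seen:
            reverts.append((i, first_seen[digest]))
        else:
            first_seen[digest] = i
    return reverts

# Made-up revision history: revision 2 restores revision 0,
# and revision 4 restores revision 1.
example = ["A", "A B", "A", "A B C", "A B"]
print(find_reverts(example))
```

Comparing digests instead of full texts keeps memory use low on long histories, since only one short hash per distinct version has to be stored.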
Weekly Questions
Answer the following questions:
- How do the authors of Edit Wars define ‘controversiality’ in
Wikipedia edits? Give two answers:
- an answer that is understandable to a general audience,
- an answer that is precise enough to translate into code.
- Give an observation you found interesting in the Wikipedia policy
on edit wars. Try to give an observation relevant to the Edit Wars
paper.
- In the test data, how often does a reversion (as defined in the
light-dump data) have the word ‘revert’ in the ‘comment’ field of
the XML?
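For the last question, one possible way to pull the ‘comment’ field out of each revision is sketched below in Python. It assumes the standard MediaWiki export layout, in which each `<revision>` element may contain an optional `<comment>` child. The helper names and the inline sample are made up for illustration, and the sample is not the course test data. XML namespaces are stripped so the sketch does not depend on a particular export schema version.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]

def revision_comments(xml_text):
    """Return one comment string per <revision>, '' when none is given."""
    root = ET.fromstring(xml_text)
    comments = []
    for elem in root.iter():
        if local_name(elem.tag) != "revision":
            continue
        comment = ""
        for child in elem:
            if local_name(child.tag) == "comment":
                comment = child.text or ""
        comments.append(comment)
    return comments

# Tiny made-up sample in the shape of a MediaWiki export.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><id>1</id><comment>initial version</comment><text>A</text></revision>
    <revision><id>2</id><text>A B</text></revision>
    <revision><id>3</id><comment>Reverted edits by X</comment><text>A</text></revision>
  </page>
</mediawiki>"""

comments = revision_comments(SAMPLE)
# Case-insensitive check for the word 'revert' in each comment.
mentions = ["revert" in c.lower() for c in comments]
print(mentions)
```

For a full-size dump, `ET.iterparse` would avoid loading the whole file into memory. The per-revision flags can then be lined up against the reversions recorded in the light-dump file.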