Understanding the M-Statistic
Topics
This week’s assignments will guide you through the following topics:
- Calculating the M-Statistic
- Doing a data analysis of the M-Statistic
Note: it’s very important to work through this week’s tasks (and
last week’s as well). They serve as the foundation for everything we
will work on going forward.
Reading
Please read the following:
- Again, read Section II of Edit Wars in Wikipedia carefully. Make
sure you really understand it, and try to verify what you are reading
in the data.
- Read through Section III (‘Other Indicators of Controversiality’)
and verify the observations in that section against the data.
- Read about the temporal aspects of article editing in Dynamics of
Conflicts in Wikipedia, by the same authors.
Tasks
Complete the following tasks:
- Calculate the M-Statistic for all articles in the light-dump data.
- Analyze the M-Statistic of all articles and incorporate it into
your EDA. In particular, use it to address the issues discussed in
Section III of the ‘Edit Wars’ paper.
- Among the articles with edit wars, calculate the M-Statistic as a
function of time (or of the article’s edit number). When does the
M-Statistic go up over time? When does it go down? What is the
correlation between an article’s M-Statistic and various
time-related variables (e.g. age of the article, time since the last
edit)?
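The first and third tasks can be sketched in Python as below. This is a minimal sketch, not the paper's reference implementation, and it rests on assumptions you should check against Section II of Edit Wars in Wikipedia: (1) each revision is reduced to an editor name and a version identifier (for example an MD5 of the text), so `Revision` is a hypothetical record, not the light-dump format itself; (2) a revision identical to an earlier, non-adjacent version counts as a revert of the editor of the immediately preceding revision, with self-reverts ignored; (3) M = E × Σ min(N^d, N^r) over mutually reverting editor pairs, leaving out the pair with the largest term, where N^d and N^r are the two editors' edit counts on the article and E is the number of editors involved in those mutual reverts.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, List, Sequence, Tuple


@dataclass(frozen=True)
class Revision:
    """Hypothetical per-revision record; adapt it to whatever your light-dump parser yields."""
    editor: str
    version: Hashable  # identical text => identical version id (e.g. an MD5 hash)


def find_reverts(revisions: Sequence[Revision]) -> List[Tuple[str, str]]:
    """Return (reverting editor, reverted editor) pairs in edit order.

    Assumption: revision i is a revert if it reproduces a version first seen
    before revision i-1 (and differs from revision i-1); the reverted editor
    is the author of revision i-1. Self-reverts are dropped.
    """
    first_seen = {}  # version id -> index of its first occurrence
    reverts = []
    for i, rev in enumerate(revisions):
        j = first_seen.get(rev.version)
        if j is not None and j < i - 1 and revisions[i - 1].version != rev.version:
            reverted = revisions[i - 1].editor
            if reverted != rev.editor:
                reverts.append((rev.editor, reverted))
        first_seen.setdefault(rev.version, i)
    return reverts


def m_statistic(revisions: Sequence[Revision]) -> int:
    """M = E * sum of min(N_d, N_r) over mutually reverting pairs, minus the largest term."""
    directed = set(find_reverts(revisions))
    # A pair is "mutual" if each editor reverted the other at least once.
    pairs = {frozenset(p) for p in directed if (p[1], p[0]) in directed}
    if not pairs:
        return 0
    edits = Counter(rev.editor for rev in revisions)  # assumption: per-article edit counts
    terms = sorted(min(edits[e] for e in pair) for pair in pairs)
    involved = set().union(*pairs)  # E: editors taking part in mutual reverts
    return len(involved) * sum(terms[:-1])  # drop the topmost pair's term


def m_over_time(revisions: Sequence[Revision]) -> List[int]:
    """M recomputed on each prefix of the history, i.e. as a function of edit number.

    Quadratic in the number of revisions: fine for exploration, slow for huge articles.
    """
    return [m_statistic(revisions[:k]) for k in range(1, len(revisions) + 1)]


# Toy history (made up): alice fights bob, then alice fights carol.
toy = [Revision(e, v) for e, v in [
    ("alice", "A"), ("bob", "B"), ("alice", "A"), ("bob", "B"),
    ("carol", "C"), ("alice", "B"), ("carol", "C"),
]]
print(m_statistic(toy))  # 6
print(m_over_time(toy))  # [0, 0, 0, 0, 0, 0, 6]
```

In the toy history M stays at 0 until the second fight appears, because a lone mutually reverting pair is always the excluded top term. Because this cumulative version only accumulates reverts, pairs, and edit counts, it can never decrease; seeing M go down requires a different convention, such as recomputing it over a sliding window of recent edits. To relate M to time-related variables, `statistics.correlation` (Python 3.10+) returns Pearson's r for two equal-length lists.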
Weekly Questions
Answer the following questions:
- For the English and Simple English Wikipedias, give the 50th, 90th,
95th, and 99th percentiles of the M-Statistic. (Think about how to
explain the discrepancy, though you don’t need to turn that in.)
- Give an observation from your data analysis you found interesting
(and tell me why it’s interesting!)
- What was a difficulty you came across doing your work this week?
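For the percentile question, one option is linear interpolation between sorted values, the same convention as NumPy's default `numpy.percentile`. A minimal standard-library sketch, assuming you already have one M value per article for each wiki (the names `en_m` and `simple_m` in the comment are hypothetical):

```python
import math
from typing import Dict, Iterable, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """p-th percentile (0 <= p <= 100), linearly interpolating between sorted values."""
    if not values:
        raise ValueError("percentile of empty data")
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def m_percentiles(m_values: Sequence[float],
                  ps: Iterable[float] = (50, 90, 95, 99)) -> Dict[float, float]:
    """Percentile table for one wiki's per-article M values."""
    return {p: percentile(m_values, p) for p in ps}


# Hypothetical usage with real data: m_percentiles(en_m), m_percentiles(simple_m)
print(m_percentiles(list(range(101))))  # {50: 50.0, 90: 90.0, 95: 95.0, 99: 99.0}
```

Sorting on every call is wasteful for four percentiles, but it keeps each function self-contained; for millions of articles, sort once and reuse the sorted list.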