Understanding the M-Statistic

Topics

This week’s assignments will guide you through the following topics:

  • Calculating the M-Statistic
  • Performing a data analysis of the M-Statistic

Note: It’s very important to work through this week’s tasks (and last week’s as well). They form the foundation for everything we will work on going forward.

Reading

Please read the following:

  • Reread Section 02 of Edit Wars on Wikipedia carefully. Make sure you really understand it, and try to verify what you read against the data.
  • Read through Section III (‘Other Indicators of Controversiality’) of the same paper and verify that section’s observations against the data.
  • Read about the temporal aspects of editing articles by the same authors in Dynamics of Conflicts in Wikipedia.

Tasks

Complete the following tasks:

  • Calculate the M-Statistic for all articles in the light-dump data.
  • Analyze the M-Statistic of all articles and incorporate this into your EDA. In particular, use this to address the issues discussed in Section III of the ‘Edit Wars’ paper.
  • Among those articles with edit wars, calculate the M-Statistic as a function of time (or edit number of the article). When does the M-Statistic go up in time? When does it go down in time? What is the correlation between an article’s M-Statistic and various time-related variables (e.g. age of article, time since last edit, etc.)?
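
As a starting point for the tasks above, here is a minimal sketch of the M-Statistic for a single article. It assumes one reading of the definition in Section 02 of the ‘Edit Wars’ paper: M is the number of editors involved in mutual reverts, multiplied by the sum over mutually reverting pairs of the smaller of the two editors’ edit counts, with the largest pair dropped. Check this against the paper before relying on it. The input format (a chronological list of `(editor, reverted_editor_or_None)` tuples) and the function names `m_statistic` and `m_over_time` are hypothetical; you will need to build that list from the light-dump data yourself.

```python
from collections import Counter


def m_statistic(revisions):
    """M-Statistic for one article (assumed definition, see lead-in).

    revisions: chronological list of (editor, reverted_editor_or_None).
    This input format is an assumption, not the light-dump format.
    """
    # Number of edits each editor made to this article.
    edit_counts = Counter(editor for editor, _ in revisions)

    # Directed reverts (reverter, reverted), ignoring self-reverts.
    reverts = {(a, b) for a, b in revisions if b is not None and a != b}

    # A pair is "mutual" if each editor has reverted the other.
    mutual = {frozenset(p) for p in reverts if (p[1], p[0]) in reverts}
    if not mutual:
        return 0

    # Weight of a pair: the smaller of the two editors' edit counts.
    weights = sorted(min(edit_counts[u] for u in pair) for pair in mutual)

    # Editors engaged in at least one mutual revert.
    editors = set().union(*mutual)

    # Drop the largest pair, then scale by the number of editors.
    return len(editors) * sum(weights[:-1])


def m_over_time(revisions):
    """M-Statistic after each edit; recomputes from scratch (quadratic)."""
    return [m_statistic(revisions[:k]) for k in range(1, len(revisions) + 1)]


# Toy history: editors a, b and c revert one another.
toy = [("a", None), ("b", "a"), ("a", "b"), ("c", "a"),
       ("a", "c"), ("c", None), ("b", None)]
print(m_statistic(toy))   # 6
print(m_over_time(toy))   # [0, 0, 0, 0, 3, 3, 6]
```

Recomputing on every prefix is simple but slow for long histories; an incremental version that updates the counts and the set of mutual pairs edit by edit would scale better.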

Weekly Questions

Answer the following questions:

  • For the English and Simple English Wikipedias, give the 50th/90th/95th/99th percentiles of the M-Statistic. (Think about how to explain the discrepancy, though you don’t need to turn that in.)
  • Give an observation from your data analysis you found interesting (and tell me why it’s interesting!)
  • What was a difficulty you came across doing your work this week?
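
For the percentile question above, one possible approach uses only the Python standard library. This sketch assumes you already have a list with one M-Statistic value per article; the function name `m_percentiles` is hypothetical. The `inclusive` method treats the list as the full population, which is a choice you may want to revisit, since different interpolation methods can give slightly different percentiles.

```python
from statistics import quantiles


def m_percentiles(m_values, levels=(50, 90, 95, 99)):
    """Return {percentile: value} for a list of per-article M-Statistics.

    m_values must contain at least two numbers.
    """
    # n=100 yields 99 cut points; cut point i (1-based) is the i-th percentile.
    cuts = quantiles(m_values, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in levels}


# Usage with made-up values (not real Wikipedia data).
print(m_percentiles([0, 0, 0, 0, 12, 40, 150, 900, 5000, 120000]))
```

Running this once per Wikipedia edition gives two sets of percentiles to compare side by side.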