Analyzing Text Data
Topics
This week’s assignments will guide you through the following topics:
- Know the essential components of text analysis
- Analyze text on Wikipedia
Reading
Please read these two papers:
- Sergio Martinez-Ortuno, Deepak Menghani, and Lars Roemheld.
“Sentiment as a Predictor of Wikipedia Editor Activity”
[Link]
- Chan Young Park, Xinru Yan, Anjalie Field, and Yulia Tsvetkov.
“Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia”
[Link]
Tasks
Complete the following tasks:
- Continue computing and optimizing the M-Statistic for all English Wikipedia articles
- Compare text written by humans with text generated by bots
  - Pick some highly edited pages
  - Separate the text contributed by bots from the text contributed by humans (you may consider only the parts of the text that were changed)
  - Run a topic model (e.g., LDA) or sentiment analysis on the two sets of edits
  - Report what you find and how you interpret it
- You might find these links helpful:
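
As a starting point for the bot-versus-human comparison, here is a minimal sketch of one possible pipeline, using only the Python standard library. It extracts the words each edit added (by diffing the old and new revision text) and scores them against a tiny sentiment lexicon. Everything named here is an assumption for illustration: the `POSITIVE` and `NEGATIVE` word sets are a hypothetical toy lexicon, the helper functions are not from any library, and the example revisions are made up rather than taken from Wikipedia. A real analysis would substitute an established lexicon or model and real revision text.

```python
import difflib
import re
from collections import Counter

# Hypothetical toy lexicon, for illustration only. A real analysis would use
# an established sentiment lexicon or a trained model instead.
POSITIVE = {"good", "great", "notable", "acclaimed"}
NEGATIVE = {"bad", "poor", "controversial", "failed"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def added_tokens(old_text, new_text):
    """Return the tokens that a revision inserted or substituted in."""
    old, new = tokenize(old_text), tokenize(new_text)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # Keep only the changed parts of the new revision.
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added


def sentiment_score(tokens):
    """Return (positive - negative) / total tokens, or 0.0 for no tokens."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return (pos - neg) / len(tokens)


def mean_sentiment(edits):
    """Average the sentiment of the added text over (old, new) pairs."""
    scores = [sentiment_score(added_tokens(old, new)) for old, new in edits]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up (old text, new text) pairs, not real Wikipedia revisions.
human_edits = [("The film was released.", "The acclaimed film was released.")]
bot_edits = [("Born in Paris", "Born in Paris, France")]

print("human:", mean_sentiment(human_edits))
print("bot:", mean_sentiment(bot_edits))
```

Diffing at the token level keeps the comparison focused on what each editor actually changed, which matches the option of considering only the changed parts of the text. The same `added_tokens` output could instead be fed to a topic model if you choose that route.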
Weekly Questions
Answer the standard participation questions.