Which Politicians Receive Abuse?

dataset

posted on 2020-05-20, 18:29 authored by Genevieve Gorrell, Mehmet Bakir, Ian Roberts, Mark Greenwood, Kalina Bontcheva

The spreadsheets contain aggregate statistics on abusive language found in tweets sent to UK politicians in 2019. An overview spreadsheet is provided for each month from January to November ("per-mp-xxx-2019.csv", where xxx is the abbreviation for the month), with one row per MP. A further spreadsheet ("campaign-period-per-cand-per-day.csv") gives data per day for the campaign period of the UK 2019 general election, with one row per candidate, starting at the beginning of November and finishing on December 15th, a few days after the election. For each individual, these spreadsheets list gender, party, the start and end times of the counts, tweets authored, retweets *by* the individual, replies by the individual, the number of times the individual was retweeted, replies received by the individual ("replyTo"), abusive tweets received in total, and abusive tweets received in each of the categories sexist, racist and political.
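As an illustration of how the per-MP counts described above might be used, the following minimal sketch sums replies and abusive replies by party and reports the abusive share. Only the "replyTo" heading is quoted in the description; the "party" and "abusive" headings, the function name and the sample rows are assumptions made for this example, and the real headings should be taken from the readme.txt file.

```python
import csv
import io
from collections import defaultdict

# "replyTo" is the heading quoted in the dataset description.
REPLIES_COL = "replyTo"
# Assumed headings, not confirmed by the description: check readme.txt.
ABUSIVE_COL = "abusive"
PARTY_COL = "party"


def abuse_rate_by_party(csv_file):
    """Return {party: abusive replies / replies received}, summed over rows."""
    replies = defaultdict(int)
    abusive = defaultdict(int)
    for row in csv.DictReader(csv_file):
        party = row[PARTY_COL]
        replies[party] += int(row[REPLIES_COL])
        abusive[party] += int(row[ABUSIVE_COL])
    # Parties with no replies at all are left out to avoid dividing by zero.
    return {p: abusive[p] / replies[p] for p in replies if replies[p] > 0}


# Made-up rows for demonstration only; these are not values from the dataset.
SAMPLE = """party,replyTo,abusive
A,100,5
A,300,15
B,50,1
B,0,0
"""

rates = abuse_rate_by_party(io.StringIO(SAMPLE))
for party, rate in sorted(rates.items()):
    print(f"{party}: {rate:.3f}")
```

To run this on one of the monthly files, pass an open file handle instead of the in-memory sample, for example `open(path, newline="")`, where `path` points to a downloaded spreadsheet.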

Two additional spreadsheets focus on topics: "topics-of-cands.csv" and "topics-of-replies.csv". The first gives, for tweets *by* each candidate, counts of tweets mentioning each of a set of topics, alongside counts of abusive tweets mentioning each topic. The second gives counts of replies received when a candidate mentioned a topic, alongside counts of abusive replies received when they mentioned that topic.
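The reply counts described above lend themselves to a per-topic abusive share. The sketch below is hedged: it assumes a long layout with one row per candidate and topic, and the headings "topic", "replies" and "abusiveReplies" are hypothetical. The actual layout of "topics-of-replies.csv" may differ and is documented in the readme.txt file.

```python
import csv
import io


def abusive_share_per_topic(csv_file, topic_col="topic",
                            replies_col="replies",
                            abusive_col="abusiveReplies"):
    """Return {topic: abusive replies / replies}, summed over candidates.

    All three column names are hypothetical defaults, not the dataset's
    confirmed headings. Topics with zero replies map to None.
    """
    totals = {}
    for row in csv.DictReader(csv_file):
        replies, abusive = totals.get(row[topic_col], (0, 0))
        totals[row[topic_col]] = (
            replies + int(row[replies_col]),
            abusive + int(row[abusive_col]),
        )
    return {t: (a / r if r else None) for t, (r, a) in totals.items()}


# Made-up rows for demonstration only; these are not values from the dataset.
TOPIC_SAMPLE = """candidate,topic,replies,abusiveReplies
cand1,topicX,200,10
cand2,topicX,300,5
cand1,topicY,40,4
cand2,topicY,0,0
"""

topic_shares = abusive_share_per_topic(io.StringIO(TOPIC_SAMPLE))
for topic, share in sorted(topic_shares.items()):
    print(topic, share)
```

Summing counts before dividing weights each topic's share by reply volume, so a candidate with few replies does not dominate the result. Averaging per-candidate ratios instead would be a different design choice.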

The data complement the forthcoming paper "Which Politicians Receive Abuse? Four Factors Illuminated in the UK General Election 2019", by Genevieve Gorrell, Mehmet E Bakir, Ian Roberts, Mark A Greenwood and Kalina Bontcheva. The way the data were acquired is described more fully in the paper.

Ethics approval was granted to collect the data through application 25371 at the University of Sheffield.

Funding

ESRC Grant number ES/T012714/1 "Responsible AI for Inclusive, Democratic Societies: A cross-disciplinary approach to detecting and countering abusive language online"

Ethics

The project has ethical approval, and the approval number is included in the description field

Policy

The data comply with the institution's and funders' policies on access and sharing

Sharing and access restrictions

The data can be shared openly

Data description

The file formats are open or commonly used

Methodology, headings and units

There is a readme.txt file describing the methodology, headings and units

Keywords

Politics, Twitter, Online abuse, Election, Natural language processing, Human Information Behaviour, Social and Community Informatics

Licence

CC BY 4.0