arXiv preprint
Content moderation is the process of flagging content based on predefined
platform rules. There has been a growing need for AI moderators to safeguard
users and to protect human moderators from the mental-health toll of reviewing
traumatic content. While prior work has focused on identifying hateful or
offensive language, it is not adequate for meeting the challenges of content
moderation, because 1) moderation decisions are based on violations of platform
rules, which subsumes the detection of offensive speech, and 2) such rules
often differ across communities, which calls for adaptive solutions. We propose
to study the challenges of content moderation by introducing a multilingual
dataset of 1.8 million Reddit comments spanning 56 subreddits in English,
German, Spanish, and French. We perform extensive experimental analysis to
highlight the underlying challenges and suggest related research problems,
such as cross-lingual transfer, learning under label noise (human biases),
transfer of moderation models, and predicting the violated rule. Our dataset
and analysis can help better prepare for the challenges and opportunities of
automated moderation.
Meng Ye, Karan Sikka, Katherine Atwell, Sabit Hassan, Ajay Divakaran, Malihe Alikhani
2023-02-19