Cyberbullying is a real and worrisome problem affecting many school children across the world. Sadly, it is becoming customary to hear tragic stories in the news about cases of cyberbullying leading to self-harm or worse. Although these extreme cases are a minority, a large number of children go through innumerable hours of pain and suffering because of things that are written about them in social media fora.

This problem has attracted quite a lot of attention, and several researchers have tapped into natural language processing technology to develop tools to detect cases of cyberbullying. One notable example is the research at the MIT Media Lab (Ganging Up on Cyberbullying), and there are others.

To contribute a bit to that cause, three Edmonton high-school students, Jacob Reckhard, Christopher West, and Ibrahim Elmallah, worked in my lab during the summer of 2016, funded through the Ross and Verna Tate High School Internship Program. They developed a custom keyboard for Android devices that checks whether typed text could be seen as offensive because it expresses a negative sentiment.

Here are examples of the output of the keyboard when positive or neutral text is typed:

positive sentence

And here is an example of negative text:

negative sentence

Our motivation was to provide a way to give children and adolescents a chance to reflect and possibly revise what they were going to post, by having the keyboard verify the content and require further action when that content was deemed negative.

The keyboard can be set as the default on an Android device, for all apps, thus checking the text for every single post on every platform.

The students developed and evaluated both rule-based and machine-learned statistical classifiers for the problem. Their evaluation was done on a corpus of 5K tweets compiled and labeled by Sanders Analytics. For each model they considered its efficacy, its ease of deployment, and its impact on the user experience on the device.
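To give a flavour of what a rule-based approach can look like, here is a minimal sketch in Python. It is not the students' code: the word lists, the one-word negation rule, and the function names are all assumptions made up for illustration, and a usable system would need a much larger lexicon.

```python
import re

# Hypothetical mini-lexicons, invented for this sketch.
NEGATIVE_WORDS = {"hate", "stupid", "ugly", "loser", "dumb"}
POSITIVE_WORDS = {"love", "great", "nice", "awesome", "thanks"}
NEGATIONS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rule_based_score(text):
    """Return a signed score; a value below zero suggests negative sentiment."""
    score = 0
    negate = False
    for token in tokenize(text):
        if token in NEGATIONS:
            # Assumed rule: a negation flips only the very next word.
            negate = True
            continue
        polarity = (token in POSITIVE_WORDS) - (token in NEGATIVE_WORDS)
        score += -polarity if negate else polarity
        negate = False
    return score


def is_negative(text):
    """True when the text should trigger a warning before posting."""
    return rule_based_score(text) < 0
```

With these toy lists, `is_negative("You are such a loser")` is true, while "you are not stupid" gets a positive score because the negation flips the word that follows it. Rules like these are cheap to run on a phone, but they miss anything outside the lexicon.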

From a technical point of view, this was a great opportunity to teach them fundamental ideas of natural language processing and machine learning. They experienced hands-on what it is to train a model and how to deploy it inside an application. In the end we chose a simple regression model, which was accurate enough and very efficient to deploy. The project was also a great opportunity for the students to consider and reflect on how technology can be put to use for a good cause.
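For readers curious about what training such a model involves, the sketch below fits a tiny classifier in plain Python. It assumes a logistic regression over bag-of-words features, which is one common reading of "simple regression model" and not necessarily what the keyboard uses. The training sentences, the hyperparameters, and every function name are invented for illustration.

```python
import math
import re
from collections import defaultdict

# Toy training data, invented for illustration (label 1 = negative sentiment).
TRAIN = [
    ("you are so stupid and ugly", 1),
    ("nobody likes you loser", 1),
    ("i hate you so much", 1),
    ("you are a great friend", 0),
    ("thanks for the awesome help", 0),
    ("i love this song", 0),
]


def features(text):
    """Bag-of-words features: each distinct lowercase word maps to 1.0."""
    return {word: 1.0 for word in re.findall(r"[a-z']+", text.lower())}


def predict_proba(weights, bias, text):
    """Probability that the text is negative under the logistic model."""
    z = bias + sum(weights[w] * v for w, v in features(text).items())
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, learning_rate=0.5):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            error = predict_proba(weights, bias, text) - label
            for word, value in features(text).items():
                weights[word] -= learning_rate * error * value
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(TRAIN)
```

Once trained, the model is just a table of per-word weights plus a bias, so scoring a message costs one lookup per word. That is the kind of property that makes a model like this easy to ship inside a keyboard.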

The code is open source (and built on other open source tools). We hope the resources below are useful.

Resources

Media Coverage