Virtually all modern communication technologies rely on data. This data often carries biases that reflect our own human and social biases. If dealt with blindly, these biases are not only reproduced but magnified by technology, with potentially severe and discriminatory effects. What is more, different manifestations of bias have emerged in recent years. For example, if Google Translate is trained on data that predominantly associates doctors with men and nurses with women, these associations will show up in its suggested translations.
A different scenario is research on hate speech detection. Preparing the data still largely relies on human annotators, who are predominantly Caucasian. Too often this leads to examples of African American English being falsely categorised as offensive, simply because the annotators are unfamiliar with the variety. An automated hate speech detection system built on such annotations will inevitably inherit a negative bias towards African American language use.
This talk focuses on gender and racial bias in data to explore these problems and discuss some of the possibilities for improvement.
Presented by Stefanie Ullmann, Giving Voice to Digital Democracies Research Project.