Published by Springer, 2022
Author: Stefanie Ullmann, Postdoctoral Research Associate, Giving Voice to Digital Democracies, 2018–2022
Edited by Ariane Hanemaayer, Visiting Fellow 2019–2021
In recent years, headlines such as ‘Is Google Translate Sexist? Users Report Biased Results when Translating Gender-Neutral Languages into English’ (Mail Online, 2017) or ‘The Algorithm that Helped Google Translate Become Sexist’ (Olson, 2018) have appeared in the technology sections of the world’s news providers. The nature of our highly interconnected world has made online translators indispensable tools in our daily lives. However, their output has the potential to cause great social harm. Owing to the continuing pursuit of ever larger language models and, as a consequence, the opaque nature of the unsupervised training datasets this requires, language-based AI systems such as online translators can easily produce biased content. If left unchecked, this will inevitably have detrimental consequences.

This chapter addresses the nature, impact and risks of bias in training data by examining the concrete example of gender bias in machine translation (MT). The first section introduces recent proposals for ethical AI guidelines in different sectors and in the field of natural language processing (NLP). Next, I explain different types of bias in machine learning and how they can manifest themselves in language models. This is followed by the results of a corpus-linguistic analysis I performed of a sample dataset that was later used to train an MT system, exploring the gender-related imbalances in the corpus that are likely to give rise to biased results. In the final section of this chapter, I discuss different approaches to reducing gender bias in MT and present findings from a set of experiments my colleagues and I conducted to mitigate bias in MT.
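To make the idea of a gender-related corpus imbalance concrete, the sketch below counts how often occupation words co-occur with gendered pronouns in a toy collection of sentences. This is purely illustrative: the sentences, the occupation list, and the function name are hypothetical stand-ins, not the dataset or method analysed in this chapter, but the same counting logic underlies many corpus-linguistic bias audits.

```python
import re
from collections import Counter

# Hypothetical toy sentences standing in for an MT training corpus;
# NOT drawn from the dataset analysed in this chapter.
corpus = [
    "He is a doctor and he works at the hospital.",
    "She is a nurse and she cares for patients.",
    "He is an engineer who designs bridges.",
    "She is a teacher at the local school.",
    "He is a doctor too.",
]

OCCUPATIONS = {"doctor", "nurse", "engineer", "teacher"}

def gendered_occupation_counts(sentences):
    """Count sentence-level co-occurrences of gendered pronouns and occupations."""
    counts = Counter()
    for sent in sentences:
        tokens = set(re.findall(r"[a-z]+", sent.lower()))
        for pronoun in {"he", "she"} & tokens:
            for occupation in OCCUPATIONS & tokens:
                counts[(pronoun, occupation)] += 1
    return counts

counts = gendered_occupation_counts(corpus)
print(counts[("he", "doctor")])   # 2 — 'doctor' skews male in this toy data
print(counts[("she", "doctor")])  # 0
```

A model trained on data with such skews has no counter-evidence against the stereotype, which is one route by which gender-neutral source sentences end up translated with stereotyped English pronouns.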
The research presented in this chapter takes a highly interdisciplinary approach, drawing on expertise from linguistics, philosophy, computer science and engineering in order to dismantle and solve the complex problem of bias in NLP.