The Giving Voice to Digital Democracies project hosted its second workshop on 17 May 2019 at the Centre for Research in the Arts, Social Sciences and Humanities (CRASSH). The project is part of the Centre for the Humanities and Social Change, Cambridge, funded by the Humanities and Social Change International Foundation.
The project investigates the social impact of Artificially Intelligent Communications Technology (AICT). The talks and discussions of this second workshop focused specifically on different aspects of the complex relationships between language, gender, and technology. The one-day event brought together experts and researchers from various academic disciplines, looking at questions of gender in the context of language-based AI from linguistic, philosophical, sociological, and technical perspectives.
Professor Alison Adam (Sheffield Hallam University) was the first speaker of the day. She asked the pertinent question of how relevant traditional feminist arguments and philosophical critiques of AI remain today in relation to gender, knowledge, and language. She pointed out that while older critiques focused on the distinction between machines and humans, newer critical approaches address concerns about fairness and bias. Despite these shifts, Adam stressed the abiding need to allow feminist arguments to inform the discussion.
Taking an approach based on formal semantics and computational psycholinguistics, Dr Heather Burnett (CNRS – Université Diderot Paris) presented the results of a study investigating the overuse of masculine pronouns in English and French. The talk ranged across numerous topics, including the way in which dominance relations can make similarity judgments asymmetric, so that judging A similar to B no longer implies judging B similar to A. For instance, in male-dominated professions women are likely to be considered similar to men, but in female-dominated professions men are unlikely to be considered similar to women. These asymmetries have implications for language use.
In his talk, Dr Dirk Hovy (Bocconi University) focused on the relation between gender and syntactic choices. He concluded, for instance, that the gender people identify with (subconsciously) determines the syntactic constructions they use: women use intensifiers (e.g., ‘very’) more often, while men use downtoners (e.g., ‘a bit’) more often. On the basis of a study of Wall Street Journal articles, he also noted that training data consisting mostly of female writing would in fact benefit both men and women, as women’s writing has been shown to be more diverse overall. The importance of the corpora used for AICT research was emphasised repeatedly. Any linguistic corpus is a sample of a language, but it is also a sample of a particular demographic (or set of demographics).
Dr Ruth Page (University of Birmingham) discussed ‘ugliness’ on Instagram and how the perception and representation of ‘ugly’ images on social media relate to identity and gender. She took a multimodal approach combining image and discourse analysis. Her research indicates that perceptions and discourses of ugliness are shifting on social media and, in particular, that users distinguish between playful and ironic illustrations of ugliness (using the hashtag #uglyselfie) and painful, negative posts (#ugly). While ‘ugly’ occurs much more frequently with ‘girl’ than with ‘boy’, the opposite is true for ‘man’ and ‘woman’. She also showed that men favour self-deprecation, whereas women are more likely to use self-mockery.
In her talk, Dr Stefanie Ullmann (University of Cambridge) presented a corpus study of representations of gender in the OPUS English-German parallel data. She showed that the data sets are strongly biased towards male forms, particularly in German occupation words. The results of her study also indicate that representations of men and women reflect traditional gender stereotypes, such as that doctors are male and nurses female, or that women are caretakers while men are dominant and powerful. Using such clearly skewed texts as training data for machine translation inevitably leads to biased results and errors in translation.
Finally, Dr Dong Nguyen (Alan Turing Institute, University of Utrecht) took a computational sociolinguistic perspective on the relation between language and gender. She presented the results of an experimental study in which a system (TweetGenie) had been trained to predict people’s gender and age from the tweets they had written. She showed that speakers construct their own identity linguistically, a process that involves the gendered aspects of their language. Consequently, gender as manifested in written texts is fluid and variable, rather than biological and fixed.
The workshop ended with a roundtable discussion involving all speakers, which gave the highly engaged audience a final chance to ask questions. It also gave the speakers, having heard each other’s talks, an opportunity to reconsider core issues and discuss overarching themes in more detail. One notable conclusion from the discussion was that all participants had encountered similar difficulties in addressing and representing non-binary notions of gender in their research. It was observed that technology tends to impose a binary notion of gender, and that little or no data is available for analysing other forms of gender identification.
The workshop demonstrated the acute contemporary relevance of gender as a topic in relation to language-based AI. The engaged participation of the audience, which included representatives from several tech companies, underlined the importance of this issue for understanding the social impact of language-based AI systems.
The views, thoughts and opinions expressed on the CRASSH blog belong solely to the authors and do not necessarily represent the views of CRASSH or the University of Cambridge.