Publication: Gender Bias in Machine Translation: Improving Gendered Disambiguation of Pronouns Using Context-Aware Translation
Date
2019-08-23
Citation
Xu, Susan. 2019. Gender Bias in Machine Translation: Improving Gendered Disambiguation of Pronouns Using Context-Aware Translation. Bachelor's thesis, Harvard College.
Abstract
We examine gender bias in machine translation, focusing on how genderless pronouns are disambiguated into gendered ones. For example, Turkish uses the genderless pronoun ‘o’, which can be translated as ‘he’ or ‘she’. We argue that context-aware translation, a type of machine learning model that draws on more of the surrounding text than current models do, is a possible method of improving this disambiguation.
We present a task to quantify how gender-biased one translation model is compared to another by applying a translation model in a classification setting to disambiguate a genderless pronoun into a gendered one. We define a ‘parity score’, given as the harmonic mean of the class-conditional accuracies.
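As a minimal sketch of the parity score described above, the following computes the harmonic mean of the per-class accuracies for a two-class pronoun task. The function name `parity_score`, the class labels `"he"` and `"she"`, and the list-based input format are assumptions made for illustration; they are not taken from the thesis.

```python
from typing import Sequence


def parity_score(
    gold: Sequence[str],
    predicted: Sequence[str],
    classes: Sequence[str] = ("he", "she"),  # assumed labels, for illustration
) -> float:
    """Harmonic mean of class-conditional accuracies.

    For each class c, the class-conditional accuracy is the fraction of
    examples whose gold label is c that were also predicted as c.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    accuracies = []
    for cls in classes:
        indices = [i for i, label in enumerate(gold) if label == cls]
        if not indices:
            raise ValueError(f"no gold examples for class {cls!r}")
        correct = sum(1 for i in indices if predicted[i] == cls)
        accuracies.append(correct / len(indices))

    # The harmonic mean is 0 as soon as any single class accuracy is 0.
    if any(acc == 0 for acc in accuracies):
        return 0.0
    return len(accuracies) / sum(1.0 / acc for acc in accuracies)


gold = ["he", "he", "she", "she"]
always_he = ["he", "he", "he", "he"]   # a model that never outputs "she"
mixed = ["he", "he", "she", "he"]      # accuracy 1.0 on "he", 0.5 on "she"

score_always_he = parity_score(gold, always_he)
score_mixed = parity_score(gold, mixed)
```

Because the harmonic mean collapses to zero whenever one class is never predicted correctly, a model that always emits the same pronoun scores 0 regardless of how well it does on the other class.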
We find that a baseline model achieves relatively high translation quality (31 BLEU) but fails the classification task (parity = 0). We then experiment with context-aware translation to improve performance on this task while still attaining a high level of translation quality. We compare a context-aware translation model with standard debiasing methods in natural language processing and find that balancing the dataset to have an equal frequency of male and female pronouns, in combination with a context-aware model, yields the best results on our task (parity = 0.42).
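The abstract does not say how the dataset was balanced. One common approach, shown here purely as an assumption, is to downsample the sentence pairs of the more frequent gender so that both appear equally often. The function `balance_by_pronoun`, the pronoun word lists, and the `(source, target)` pair format are all hypothetical, and the sketch equalizes sentence counts rather than exact pronoun token counts.

```python
import random
import re
from typing import List, Tuple

# Assumed English pronoun lists, for illustration only.
MALE = re.compile(r"\b(he|him|his)\b", re.IGNORECASE)
FEMALE = re.compile(r"\b(she|her|hers)\b", re.IGNORECASE)

Pair = Tuple[str, str]  # (source sentence, English target sentence)


def balance_by_pronoun(pairs: List[Pair], seed: int = 0) -> List[Pair]:
    """Downsample so male-only and female-only targets are equally frequent.

    Pairs whose target contains no gendered pronoun, or pronouns of both
    genders, are kept unchanged.
    """
    male, female, other = [], [], []
    for pair in pairs:
        target = pair[1]
        has_male = bool(MALE.search(target))
        has_female = bool(FEMALE.search(target))
        if has_male and not has_female:
            male.append(pair)
        elif has_female and not has_male:
            female.append(pair)
        else:
            other.append(pair)

    n = min(len(male), len(female))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = rng.sample(male, n) + rng.sample(female, n) + other
    rng.shuffle(balanced)
    return balanced


corpus = [
    ("o bir doktor", "he is a doctor"),
    ("o bir mühendis", "he is an engineer"),
    ("o bir pilot", "he is a pilot"),
    ("o bir hemşire", "she is a nurse"),
    ("hava güzel", "the weather is nice"),
]
balanced_corpus = balance_by_pronoun(corpus)
```

Downsampling discards training data, so in practice it trades corpus size for balance; upsampling the rarer class is an alternative with the opposite trade-off.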
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service