Gender Bias in Machine Translation: Improving Gendered Disambiguation of Pronouns Using Context-Aware Translation
Citation: Xu, Susan. 2019. Gender Bias in Machine Translation: Improving Gendered Disambiguation of Pronouns Using Context-Aware Translation. Bachelor's thesis, Harvard College.
Abstract: We examine gender bias in machine translation in the context of disambiguating genderless pronouns to gendered ones. For example, Turkish uses the genderless pronoun ‘o’, which can be translated to ‘he’ or ‘she’. We argue that context-aware translation, a type of machine learning model that incorporates more context from the text than current models, is a possible method of improving this disambiguation.
We present a task to quantify how gender-biased one translation model is relative to another by applying a translation model in a classification setting to disambiguate a genderless pronoun into a gendered one. We define a ‘parity score’, given as the harmonic mean of the class-conditional accuracies.
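The parity score described above can be sketched as a small function; the harmonic mean rewards models that are accurate on both classes and drops to zero whenever one class is never predicted correctly. The function name and the 'he'/'she' label encoding are illustrative assumptions, not the thesis's actual implementation:

```python
def parity_score(true_genders, predicted_genders):
    """Harmonic mean of class-conditional accuracies for a
    binary pronoun-gender classification task (sketch)."""
    class_accuracies = []
    for cls in ("he", "she"):
        # Restrict to examples whose gold label is this class.
        pairs = [(t, p) for t, p in zip(true_genders, predicted_genders) if t == cls]
        if not pairs:
            return 0.0  # a class absent from the gold labels gives no signal
        acc = sum(t == p for t, p in pairs) / len(pairs)
        class_accuracies.append(acc)
    # Harmonic mean is 0 if either class-conditional accuracy is 0,
    # so a model that always outputs one gender scores parity = 0.
    if min(class_accuracies) == 0:
        return 0.0
    return len(class_accuracies) / sum(1 / a for a in class_accuracies)
```

For example, a model that is always right on ‘he’ but right only half the time on ‘she’ scores 2 / (1/1.0 + 1/0.5) ≈ 0.67, while a model that always predicts ‘he’ scores 0, matching the baseline behavior reported below.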
We find that a baseline model achieves relatively high translation quality (31 BLEU) but fails the classification task entirely (parity = 0). We then experiment with context-aware translation to improve performance on this task while maintaining a high level of translation quality. We compare the context-aware translation model against standard debiasing methods in natural language processing and find that balancing the dataset to have an equal frequency of male and female pronouns, in combination with a context-aware model, yields the best results on our task (parity = .42).
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364657
Collection: FAS Theses and Dissertations