Gender Bias in Machine Translation: Improving Gendered Disambiguation of Pronouns Using Context-Aware Translation
Citation: Xu, Susan. 2019. Gender Bias in Machine Translation: Improving Gendered Disambiguation of Pronouns Using Context-Aware Translation. Bachelor's thesis, Harvard College.
Abstract: We examine gender bias in machine translation in the context of disambiguating genderless pronouns to gendered ones. For example, Turkish uses the genderless pronoun ‘o’, which can be translated to ‘he’ or ‘she’. We argue that context-aware translation, a type of machine learning model that incorporates more context from the text than current models, is a possible method of improving this disambiguation.
We present a task to quantify how gender-biased one translation model is relative to another by applying a translation model in a classification setting to disambiguate a genderless pronoun into a gendered one. We define a ‘parity score’, given as the harmonic mean of the class-conditional accuracies.
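The parity score described above can be sketched as a small function; the harmonic mean rewards models that are accurate on both classes and drops to zero whenever one class is never predicted correctly. The function name and the 'he'/'she' label encoding are illustrative assumptions, not the thesis's actual implementation:

```python
def parity_score(true_genders, predicted_genders):
    """Harmonic mean of class-conditional accuracies for a
    binary pronoun-gender classification task (sketch)."""
    class_accuracies = []
    for cls in ("he", "she"):
        # Restrict to examples whose gold label is this class.
        pairs = [(t, p) for t, p in zip(true_genders, predicted_genders) if t == cls]
        if not pairs:
            return 0.0  # a class absent from the gold labels gives no signal
        acc = sum(t == p for t, p in pairs) / len(pairs)
        class_accuracies.append(acc)
    # Harmonic mean is 0 if either class-conditional accuracy is 0,
    # so a model that always outputs one gender scores parity = 0.
    if min(class_accuracies) == 0:
        return 0.0
    return len(class_accuracies) / sum(1 / a for a in class_accuracies)
```

For example, a model that is always right on ‘he’ but right only half the time on ‘she’ scores 2 / (1/1.0 + 1/0.5) ≈ 0.67, while a model that always predicts ‘he’ scores 0, matching the baseline behavior reported below.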
We find that a baseline model achieves relatively high translation quality (31 BLEU) but fails the classification task entirely (parity = 0). We then experiment with context-aware translation to improve performance on this task while maintaining a high level of translation quality. We compare the context-aware translation model against standard debiasing methods in natural language processing and find that balancing the dataset to have an equal frequency of male and female pronouns, in combination with a context-aware model, yields the best results on our task (parity = .42).
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364657
Collection: FAS Theses and Dissertations