Publication:
Gender Bias in Machine Translation: Improving Gendered Disambiguation of Pronouns Using Context-Aware Translation

Date

2019-08-23

Citation

Xu, Susan. 2019. Gender Bias in Machine Translation: Improving Gendered Disambiguation of Pronouns Using Context-Aware Translation. Bachelor's thesis, Harvard College.

Abstract

We examine gender bias in machine translation in the context of disambiguating genderless pronouns to gendered ones. For example, Turkish uses the genderless pronoun ‘o’, which can be translated to ‘he’ or ‘she’. We argue that context-aware translation, a type of machine learning model that incorporates more context from the text than current models, is a possible method of improving this disambiguation. We present a task to quantify how gender-biased one translation model is compared to another by applying a translation model in a classification setting to disambiguate a genderless pronoun into a gendered one. We define a ‘parity score’, given as the harmonic mean of the class-conditional accuracies. We find that a baseline model is able to achieve relatively high translation quality (31 BLEU) but fails the classification task (parity = 0). We then experiment with using context-aware translation to improve performance on this task while still attaining a high level of translation quality. We compare the use of a context-aware translation model to standard debiasing methods in natural language processing and find that balancing the dataset to have an equal frequency of male and female pronouns in combination with a context-aware model yields the best results on our task (parity = .42).
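The parity score described in the abstract can be illustrated with a minimal sketch. The abstract states only that the score is the harmonic mean of the class-conditional accuracies; everything else here is an assumption made for illustration: the function names, the use of the labels "he" and "she" as the two classes, and the convention that the score is 0 whenever any class has zero accuracy (which is the limiting value of the harmonic mean, and is consistent with a model that always predicts one pronoun scoring parity = 0). This is not the thesis's own code.

```python
from collections import defaultdict


def class_conditional_accuracies(gold, predicted):
    """Return accuracy per gold class: P(prediction correct | gold label)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(gold, predicted):
        total[g] += 1
        if g == p:
            correct[g] += 1
    return {label: correct[label] / total[label] for label in total}


def parity_score(gold, predicted):
    """Harmonic mean of the class-conditional accuracies.

    Assumption: if any class has accuracy 0 (or there are no examples),
    the score is defined as 0.0 to avoid dividing by zero.
    """
    accuracies = list(class_conditional_accuracies(gold, predicted).values())
    if not accuracies or any(a == 0 for a in accuracies):
        return 0.0
    return len(accuracies) / sum(1.0 / a for a in accuracies)


# Hypothetical gold pronouns and two hypothetical sets of model outputs.
gold = ["he", "he", "she", "she"]
always_he = ["he", "he", "he", "he"]      # ignores the female class entirely
mostly_right = ["he", "he", "she", "he"]  # 100% on "he", 50% on "she"

print(parity_score(gold, always_he))
print(parity_score(gold, mostly_right))
```

Because the harmonic mean is dominated by the smaller value, a model cannot reach a high parity score by performing well on only one pronoun class, which an overall accuracy figure would not reveal on an imbalanced dataset.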

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
