Publication: Hebrew Transformed: Machine Translation of Hebrew Using the Transformer Architecture
Date
2022-02-07
Authors
Crater, David T.
Published Version
The Harvard community has made this article openly available.
Citation
Crater, David T. 2022. Hebrew Transformed: Machine Translation of Hebrew Using the Transformer Architecture. Master's thesis, Harvard University Division of Continuing Education.
Abstract
This thesis presents the first known end-to-end application to the Hebrew language of Google’s state-of-the-art Transformer architecture for natural language processing (NLP). The state of the art in machine translation (MT) of Hebrew remains poor. Scholarly work in MT, deep learning (DL), and other areas of NLP for Hebrew began to develop much later, and remains much less mature, than for other languages. The problem is difficult because of the nature of Hebrew as a morphologically rich language (MRL), the small size of the total corpus of electronic Hebrew documents available as training material, and the small size of the Hebrew-literate computing community worldwide. Nonetheless, significant advances in Hebrew NLP tools, data, methods, and scholarly infrastructure over the last 15 years, combined with recent advances in general NLP and MT, especially the rise of neural networks and deep learning, create an enticing opportunity to advance the current state of Hebrew MT. More specifically, Google’s Transformer neural network and associated technologies such as bidirectional encoder representations from Transformers (BERT) have revolutionized general MT and hold great promise for improving automatic Hebrew translation. This thesis demonstrates that, as measured by METEOR scores, a basic Hebrew Transformer trained in a few hours on a single GPU (graphics processing unit) exceeds the current performance of Google Translate on in-genre Hebrew translation tasks and is not far behind Google Translate on Hebrew translation tasks in general.
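The core operation of the Transformer architecture named in the abstract is scaled dot-product attention. The following is a minimal NumPy sketch of that one operation, not code from the thesis; the function name and the toy matrix sizes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # similarity of each query to each key
    return softmax(scores) @ V        # weighted average of the value vectors

# Toy example: 3 tokens, model dimension 4 (sizes chosen for illustration only).
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per input token
```

In the full architecture this operation is repeated across multiple heads and layers, with learned projections producing Q, K, and V from the token embeddings.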
Keywords
bidirectional encoder representations from transformers (BERT), computational linguistics, Hebrew, machine translation, natural language processing (NLP), Transformer, Computer science, Artificial intelligence, Linguistics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service