Publication: Arabic diacritization using weighted finite-state transducers
Date
2005
Authors
Rani Nelken; Stuart M. Shieber
Publisher
Association for Computational Linguistics
Citation
Rani Nelken and Stuart M. Shieber. Arabic diacritization using weighted finite-state transducers. In Proceedings of the 2005 ACL Workshop on Computational Approaches to Semitic Languages, pages 79-86, Ann Arbor, Michigan, June 2005.
Abstract
Arabic is usually written without short vowels and additional diacritics, which are nevertheless important for several applications. We present a novel algorithm for restoring these symbols, using a cascade of probabilistic finite-state transducers trained on the Arabic treebank, integrating a word-based language model, a letter-based language model, and an extremely simple morphological model. This combination of probabilistic methods and simple linguistic information yields high levels of accuracy.
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service