Publication: A learning approach to improving sentence-level MT evaluation
Abstract
The problem of evaluating machine translation (MT) systems is more challenging than it may first appear, as diverse translations can often be considered equally correct. The task is even more difficult when practical circumstances require that evaluation be done automatically over short texts, for instance, during incremental system development and error analysis. While several automatic metrics, such as BLEU, have been proposed and adopted for large-scale MT system discrimination, they all fail to achieve satisfactory levels of correlation with human judgments at the sentence level. Here, a new class of metrics based on machine learning is introduced. A novel method involving classifying translations as machine- or human-produced rather than directly predicting numerical human judgments eliminates the need for labor-intensive user studies as a source of training data. The resulting metric, based on support vector machines, is shown to significantly improve upon current automatic metrics, increasing correlation with human judgments at the sentence level halfway toward that achieved by an independent human evaluator.
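
The core idea of the abstract, training a classifier to separate human-produced from machine-produced translations and then reading its continuous output as a sentence-level quality score, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the publication: the three features (clipped unigram and bigram precision, length ratio), the toy sentences, and the small pure-Python linear SVM trained by subgradient descent on the hinge loss. The publication's actual features, kernel, and training data may differ.

```python
from collections import Counter


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate token list against one reference."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def features(candidate, reference):
    """Hypothetical feature vector: unigram precision, bigram precision, length ratio."""
    c, r = candidate.split(), reference.split()
    length_ratio = min(len(c), len(r)) / max(len(c), len(r), 1)
    return [ngram_precision(c, r, 1), ngram_precision(c, r, 2), length_ratio]


def train_linear_svm(X, y, epochs=500, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularised hinge loss; labels are +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            violated = margin < 1
            for j in range(len(w)):
                grad = reg * w[j] - (label * x[j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * label  # the bias term is left unregularised
    return w, b


def metric(w, b, candidate, reference):
    """Signed decision value: higher means 'looks more human-produced'."""
    x = features(candidate, reference)
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Toy training data (invented): label +1 = human-produced, -1 = machine-produced.
# No human quality ratings are needed, only the provenance of each translation.
TRAIN = [
    ("the cat sat on a mat", "the cat sat on the mat", 1),
    ("cat mat the on sat sat", "the cat sat on the mat", -1),
    ("she reads a book every night", "she reads a book each night", 1),
    ("book she night reads the", "she reads a book each night", -1),
]

X = [features(c, r) for c, r, _ in TRAIN]
y = [label for _, _, label in TRAIN]
w, b = train_linear_svm(X, y)

# Score two unseen candidates against the same reference.
reference = "he drives a red car to the office"
human_like = metric(w, b, "he drives a red car to work", reference)
machine_like = metric(w, b, "car red he to office drives the", reference)
print(human_like > machine_like)
```

The design point this sketch tries to capture is that the training labels come for free from knowing who produced each translation, while the classifier's real-valued output, rather than its binary decision, serves as the evaluation score for a single sentence.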