Publication:
Challenges in Data-to-Document Generation

Thumbnail Image

Date

2017

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

Association for Computational Linguistics
The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Wiseman, Sam, Stuart M. Shieber and Alexander M. Rush. 2017. Challenges in Data-to-Document Generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP2017), Copenhagen, Denmark, September 7-11, 2017: 2243–2253.

Research Data

Abstract

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.

Description

Other Available Sources

Keywords

Terms of Use

This article is made available under the terms and conditions applicable to Open Access Policy Articles (OAP), as set forth at Terms of Service

Endorsement

Review

Supplemented By

Referenced By

Related Stories