Person: Rush, Alexander Sasha

Last Name: Rush
First Name: Alexander Sasha

Search Results

Now showing 1 - 10 of 10
  • Publication
    Learning Neural Templates for Text Generation
    (Association for Computational Linguistics, 2018-10) Wiseman, Sam; Shieber, Stuart; Rush, Alexander Sasha
    While neural, encoder-decoder models have had significant empirical success in text generation, there remain several unaddressed problems with this style of generation. Encoder-decoder models are largely (a) uninterpretable, and (b) difficult to control in terms of their phrasing or content. This work proposes a neural generation system using a hidden semi-Markov model (HSMM) decoder, which learns latent, discrete templates jointly with learning to generate. We show that this model learns useful templates, and that these templates make generation both more interpretable and controllable. Furthermore, we show that this approach scales to real data sets and achieves strong performance nearing that of encoder-decoder text generation models.
  • Publication
    Adapting Sequence Models for Sentence Correction
    (Association for Computational Linguistics, 2017) Schmaltz, Allen; Kim, Yoon; Shieber, Stuart; Rush, Alexander Sasha
    In a controlled experiment of sequence-to-sequence approaches for the task of sentence correction, we find that character-based models are generally more effective than word-based models and models that encode subword information via convolutions, and that modeling the output data as a series of diffs improves effectiveness over standard approaches. Our strongest sequence-to-sequence model improves over our strongest phrase-based statistical machine translation model, with access to the same data, by $6 M^2$ (0.5 GLEU) points. Additionally, in the data environment of the standard CoNLL-2014 setup, we demonstrate that modeling (and tuning against) diffs yields similar or better $M^2$ scores with simpler models and/or significantly less data than previous sequence-to-sequence approaches. (See the diff-representation sketch after this list.)
  • Publication
    Antecedent Prediction Without a Pipeline
    (Association for Computational Linguistics, 2016) Wiseman, Sam Joshua; Rush, Alexander Sasha; Goodridge, Andrew
    We consider several antecedent prediction models that use no pipelined features generated by upstream systems. Models trained in this way are interesting because they allow for side-stepping the intricacies of upstream models, and because we might expect them to generalize better to situations in which upstream features are unavailable or unreliable. Through quantitative and qualitative error analysis we identify what sorts of cases are particularly difficult for such models, and suggest some directions for further improvement.
  • Publication
    Learning Anaphoricity and Antecedent Ranking Features for Coreference Resolution
    (Association for Computational Linguistics, 2015) Wiseman, Sam Joshua; Rush, Alexander Sasha; Goodridge, Andrew; Weston, Jason
    We introduce a simple, non-linear mention-ranking model for coreference resolution that attempts to learn distinct feature representations for anaphoricity detection and antecedent ranking, which we encourage by pre-training on a pair of corresponding subtasks. Although we use only simple, unconjoined features, the model is able to learn useful representations, and we report the best overall score on the CoNLL 2012 English test set to date. (See the mention-ranking sketch after this list.)
  • Publication
    Induction of Probabilistic Synchronous Tree-Insertion Grammars
    (2005) Nesson, Rebecca; Goodridge, Andrew; Rush, Alexander Sasha
    Increasingly, researchers developing statistical machine translation systems have moved to incorporate syntactic structure in the models that they induce. These researchers are motivated by the intuition that the limitations in the finite-state translation models exemplified by IBM’s “Model 5” follow from the inability to use phrasal and hierarchical information in the interlingual mapping. What is desired is a formalism that has the substitution-based hierarchical structure provided by context-free grammars, with the lexical relationship potential of n-gram models, with processing efficiency no worse than CFGs. Further, it should ideally allow for discontinuity in phrases, and be synchronizable, to allow for multilinguality. Finally, in order to support automated induction, it should allow for a probabilistic variant. We introduce probabilistic synchronous tree-insertion grammars (PSTIG) as such a formalism. In this paper, we define a restricted version of PSTIG, and provide algorithms for parsing, parameter estimation, and translation. As a proof of concept, we successfully apply these algorithms to a toy problem, corpus-based induction of a statistical translator of arithmetic expressions from postfix to partially parenthesized infix.
  • Publication
    Challenges in Data-to-Document Generation
    (Association for Computational Linguistics, 2017) Wiseman, Sam Joshua; Shieber, Stuart; Rush, Alexander Sasha
    Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.
  • Publication
    Word Ordering Without Syntax
    (Association for Computational Linguistics, 2016) Schmaltz, Allen; Rush, Alexander Sasha; Shieber, Stuart
    Recent work on word ordering has argued that syntactic structure is important, or even required, for effectively recovering the order of a sentence. We find that, in fact, an n-gram language model with a simple heuristic gives strong results on this task. Furthermore, we show that a long short-term memory (LSTM) language model is even more effective at recovering order, with our basic model outperforming a state-of-the-art syntactic model by 11.5 BLEU points. Additional data and larger beams yield further gains, at the expense of training and search time. (See the word-ordering sketch after this list.)
  • Publication
    Sentence-level grammatical error identification as sequence-to-sequence correction
    (Association for Computational Linguistics, 2016) Schmaltz, Allen; Kim, Yoon; Rush, Alexander Sasha; Shieber, Stuart
    We demonstrate that an attention-based encoder-decoder model can be used for sentence-level grammatical error identification for the Automated Evaluation of Scientific Writing (AESW) Shared Task 2016. The attention-based encoder-decoder models can be used for the generation of corrections, in addition to error identification, which is of interest for certain end-user applications. We show that a character-based encoder-decoder model is particularly effective, outperforming other results on the AESW Shared Task on its own, and showing gains over a word-based counterpart. Our final model, a combination of three character-based encoder-decoder models, one word-based encoder-decoder model, and a sentence-level CNN, is the highest performing system on the AESW 2016 binary prediction Shared Task.
  • Publication
    Don't Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference
    (Association for Computational Linguistics, 2019-07) Belinkov, Yonatan; Poliak, Adam; Shieber, Stuart; Van Durme, Benjamin; Rush, Alexander Sasha
    Natural Language Inference (NLI) datasets often contain hypothesis-only biases—artifacts that allow models to achieve non-trivial performance without learning whether a premise entails a hypothesis. We propose two probabilistic methods to build models that are more robust to such biases and better transfer across datasets. In contrast to standard approaches to NLI, our methods predict the probability of a premise given a hypothesis and NLI label, discouraging models from ignoring the premise. We evaluate our methods on synthetic and existing NLI datasets by training on datasets containing biases and testing on datasets containing no (or different) hypothesis-only biases. Our results indicate that these methods can make NLI models more robust to dataset-specific artifacts, transferring better than a baseline architecture in 9 out of 12 NLI datasets. Additionally, we provide an extensive analysis of the interplay of our methods with known biases in NLI datasets, as well as the effects of encouraging models to ignore biases and fine-tuning on target datasets. (See the premise-probability sketch after this list.)
  • Publication
    On Adversarial Removal of Hypothesis-only Bias in Natural Language Inference
    (2019-06) Belinkov, Yonatan; Poliak, Adam; Shieber, Stuart; Van Durme, Benjamin; Rush, Alexander Sasha
    Popular Natural Language Inference (NLI) datasets have been shown to be tainted by hypothesis-only biases. Adversarial learning may help models ignore sensitive biases and spurious correlations in data. We evaluate whether adversarial learning can be used in NLI to encourage models to learn representations free of hypothesis-only biases. Our analyses indicate that the representations learned via adversarial learning may be less biased, with only small drops in NLI accuracy. (See the gradient-reversal sketch after this list.)
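
Code sketches referenced above

For "Adapting Sequence Models for Sentence Correction": a minimal sketch of representing a corrected sentence as a sequence of diffs against the source rather than as the full corrected sentence, the modeling choice the abstract credits with improved effectiveness. Python's difflib is used here only for illustration; the <del>/<ins> tag names and the example sentence are assumptions, not the paper's exact scheme.

    import difflib

    def to_diff_tokens(source_tokens, target_tokens):
        # Encode the target as diff operations relative to the source:
        # copy equal spans verbatim, wrap removed source tokens in <del> ... </del>,
        # and wrap inserted target tokens in <ins> ... </ins>.
        out = []
        matcher = difflib.SequenceMatcher(a=source_tokens, b=target_tokens)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "equal":
                out.extend(source_tokens[i1:i2])
            else:
                if i1 < i2:
                    out.extend(["<del>"] + source_tokens[i1:i2] + ["</del>"])
                if j1 < j2:
                    out.extend(["<ins>"] + target_tokens[j1:j2] + ["</ins>"])
        return out

    src = "There are many reason to celebrate .".split()
    tgt = "There are many reasons to celebrate .".split()
    print(to_diff_tokens(src, tgt))
    # ['There', 'are', 'many', '<del>', 'reason', '</del>', '<ins>', 'reasons', '</ins>', 'to', 'celebrate', '.']

A sequence-to-sequence model trained on such targets mostly copies unchanged context and only generates the edited spans, which is consistent with the abstract's observation that diff modeling reaches similar scores with simpler models or less data.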
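
For "Learning Anaphoricity and Antecedent Ranking Features for Coreference Resolution": a hedged PyTorch sketch of a mention-ranking scorer that keeps separate hidden representations for anaphoricity detection and antecedent ranking, the split the abstract says is encouraged by pre-training on the two subtasks. All dimensions, layer choices, and names below are illustrative; the paper's features, architecture, and training objective differ in detail.

    import torch
    import torch.nn as nn

    class MentionRanker(nn.Module):
        # Scores the "start a new cluster" option from an anaphoricity representation
        # of the mention, and scores each candidate antecedent from a pair
        # representation combined with that same mention representation.
        def __init__(self, mention_dim, pair_dim, hidden=200):
            super().__init__()
            self.anaphoricity = nn.Sequential(nn.Linear(mention_dim, hidden), nn.Tanh())
            self.pair = nn.Sequential(nn.Linear(pair_dim, hidden), nn.Tanh())
            self.score_new = nn.Linear(hidden, 1)        # "no antecedent" score
            self.score_link = nn.Linear(2 * hidden, 1)   # link-to-candidate score

        def forward(self, mention_feats, pair_feats):
            # mention_feats: (mention_dim,); pair_feats: (num_candidates, pair_dim)
            h_a = self.anaphoricity(mention_feats)
            h_p = self.pair(pair_feats)
            new_score = self.score_new(h_a)                                   # (1,)
            link_scores = self.score_link(
                torch.cat([h_a.expand(h_p.size(0), -1), h_p], dim=-1)
            ).squeeze(-1)                                                     # (num_candidates,)
            # Index 0 = non-anaphoric; indices 1.. = candidate antecedents.
            return torch.cat([new_score, link_scores])

Under these assumptions, the pre-training the abstract mentions would amount to first fitting the anaphoricity branch on a binary anaphoric/non-anaphoric subtask and the pair branch on antecedent ranking before joint training.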
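
For "Word Ordering Without Syntax": a toy sketch of the syntax-free approach the abstract describes, beam search over orderings of a bag of words scored by an n-gram language model. The tiny corpus, add-one smoothing, and beam size are assumptions made so the example runs on its own; per the abstract, the paper uses much larger n-gram and LSTM language models together with a simple scoring heuristic.

    import math
    from collections import Counter

    # Toy add-one-smoothed bigram language model.
    corpus = [
        "<s> the dog chased the cat </s>",
        "<s> the cat sat on the mat </s>",
        "<s> a dog sat on the mat </s>",
    ]
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        toks = line.split()
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    vocab_size = len(unigrams)

    def logp(prev, word):
        return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))

    def order_words(bag, beam_size=8):
        # Each hypothesis: (LM score, ordered prefix, multiset of words still unused).
        beam = [(0.0, ["<s>"], Counter(bag))]
        for _ in range(len(bag)):
            candidates = []
            for score, prefix, remaining in beam:
                for word in remaining:
                    new_remaining = remaining.copy()
                    new_remaining[word] -= 1
                    if new_remaining[word] == 0:
                        del new_remaining[word]
                    candidates.append(
                        (score + logp(prefix[-1], word), prefix + [word], new_remaining)
                    )
            beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
        # Pick the completed ordering whose score, including end-of-sentence, is best.
        best = max(beam, key=lambda c: c[0] + logp(c[1][-1], "</s>"))
        return best[1][1:]

    print(order_words("the cat sat on the mat".split()))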
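
For "Don't Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference": the abstract's central move, predicting the premise given the hypothesis and label rather than the label directly, can be written as a Bayes-rule rearrangement. The notation below (P for premise, H for hypothesis, y for label) and the factorization are a hedged illustration of that idea, not the paper's exact objective.

    % A minimal sketch of the premise-prediction idea, stated in LaTeX.
    % Standard discriminative NLI training fits
    \[
      \hat{y} \;=\; \arg\max_{y}\; p_\theta(y \mid P, H),
    \]
    % which a model can optimize using hypothesis-only artifacts alone.
    % Modeling the premise instead, and recovering labels via Bayes' rule,
    \[
      p(y \mid P, H) \;\propto\; p_\theta(P \mid H, y)\, p(y \mid H),
    \]
    % forces the model to account for the premise: a predictor that ignores P
    % cannot assign it high probability conditioned on (H, y).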
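
For "On Adversarial Removal of Hypothesis-only Bias in Natural Language Inference": a hedged PyTorch sketch of one standard way to realize such an adversary, a gradient-reversal layer between the hypothesis encoder and a hypothesis-only label classifier, so the shared representation is pushed to be uninformative for the hypothesis-only predictor. The encoders, sizes, and single-vector inputs below are placeholders; the abstract does not specify the paper's exact adversarial configurations, which may differ from this one.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        # Identity in the forward pass; flips (and scales) the gradient in the
        # backward pass so the encoder is trained against the adversary.
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lambd * grad_output, None

    class AdversarialNLI(nn.Module):
        def __init__(self, input_dim=300, hidden=300, num_labels=3, lambd=1.0):
            super().__init__()
            self.lambd = lambd
            # Placeholder encoders standing in for real sentence encoders.
            self.encode_premise = nn.Linear(input_dim, hidden)
            self.encode_hypothesis = nn.Linear(input_dim, hidden)
            self.nli_classifier = nn.Linear(2 * hidden, num_labels)
            # The adversary predicts the label from the hypothesis alone, but only
            # sees the hypothesis representation through the gradient-reversal layer.
            self.hyp_only_classifier = nn.Linear(hidden, num_labels)

        def forward(self, premise_vec, hypothesis_vec):
            h_p = torch.tanh(self.encode_premise(premise_vec))
            h_h = torch.tanh(self.encode_hypothesis(hypothesis_vec))
            nli_logits = self.nli_classifier(torch.cat([h_p, h_h], dim=-1))
            adv_logits = self.hyp_only_classifier(GradReverse.apply(h_h, self.lambd))
            # Train both heads with cross-entropy on the gold label; the reversed
            # gradient pushes the hypothesis encoder toward representations the
            # adversary cannot exploit.
            return nli_logits, adv_logits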