Person: Yamangil, Elif

Last Name: Yamangil
First Name: Elif
Name: Yamangil, Elif

Search Results

Now showing 1 - 7 of 7
  • Publication
    Rich Linguistic Structure from Large-Scale Web Data
    (2013-10-18) Yamangil, Elif; Shieber, Stuart M.; Grosz, Barbara; Adams, Ryan
    The past two decades have shown an unexpected effectiveness of Web-scale data in natural language processing. Even the simplest models, when paired with unprecedented amounts of unstructured and unlabeled Web data, have been shown to outperform sophisticated ones. It has been argued that the effectiveness of Web-scale data has undermined the necessity of sophisticated modeling or laborious data set curation. In this thesis, we argue for and illustrate an alternative view, that Web-scale data not only serves to improve the performance of simple models, but also can allow the use of qualitatively more sophisticated models that would not be deployable otherwise, leading to even further performance gains.
  • Publication
    Correction Detection and Error Type Selection as an ESL Educational Aid
    (Association for Computational Linguistics, 2012) Swanson, Ben; Yamangil, Elif
    We present a classifier that discriminates between types of corrections made by teachers of English in student essays. We define a set of linguistically motivated feature templates for a log-linear classification model, train this classifier on sentence pairs extracted from the Cambridge Learner Corpus, and achieve 89% accuracy, improving upon a 33% baseline. Furthermore, we incorporate our classifier into a novel application that takes as input a set of corrected essays that have been sentence-aligned with their originals and outputs the individual corrections classified by error type. We report the F-score of our implementation on this task. (A minimal sketch of such a log-linear error-type classifier appears after this list.)
  • Publication
    Nonparametric Bayesian Inference and Efficient Parsing for Tree-adjoining Grammars
    (Association for Computational Linguistics, 2013) Yamangil, Elif; Shieber, Stuart
    In the line of research extending statistical parsing to more expressive grammar formalisms, we demonstrate for the first time the use of tree-adjoining grammars (TAG). We present a Bayesian nonparametric model for estimating a probabilistic TAG from a parsed corpus, along with novel block sampling methods and approximation transformations for TAG that allow efficient parsing. Our work shows performance improvements on the Penn Treebank and finds more compact yet linguistically rich representations of the data, but more importantly provides techniques in grammar transformation and statistical inference that make practical the use of these more expressive systems, thereby enabling further experimentation along these lines.
  • Publication
    Estimating Compact Yet Rich Tree Insertion Grammars
    (Association for Computational Linguistics, 2012) Yamangil, Elif; Shieber, Stuart
    We present a Bayesian nonparametric model for estimating tree insertion grammars (TIG), building upon recent work in Bayesian inference of tree substitution grammars (TSG) via Dirichlet processes. Under our general variant of TIG, grammars are estimated via the Metropolis-Hastings algorithm that uses a context-free grammar transformation as a proposal, which allows for cubic-time string parsing as well as tree-wide joint sampling of derivations in the spirit of Cohn and Blunsom (2010). We use the Penn Treebank for our experiments and find that our proposed Bayesian TIG model not only has competitive parsing performance but also finds compact yet linguistically rich TIG representations of the data. (A minimal sketch of the Metropolis-Hastings proposal step appears after this list.)
  • Publication
    Mining Wikipedia's Article Revision History for Training Computational Linguistics Algorithms
    (AAAI Press, 2008) Nelken, Rani; Yamangil, Elif
    We present a novel paradigm for obtaining large amounts of training data for computational linguistics tasks by mining Wikipedia's article revision history. By comparing adjacent versions of the same article, we extract voluminous training data for tasks for which data is usually scarce or costly to obtain. We illustrate this paradigm by applying it to three separate text processing tasks at various levels of linguistic granularity. We first apply this approach to the collection of textual errors and their correction, focusing on the specific type of lexical errors known as "eggcorns". Second, moving up to the sentential level, we show how to mine Wikipedia revisions for training sentence compression algorithms. By dramatically increasing the size of the available training data, we are able to create more discerning lexicalized models, providing improved compression results. Finally, moving up to the document level, we present some preliminary ideas on how to use the Wikipedia data to bootstrap text summarization systems. We propose to use a sentence's persistence throughout a document's evolution as an indicator of its fitness as part of an extractive summary. (A minimal revision-mining sketch appears after this list.)
  • Publication
    A Context Free TAG Variant
    (Association for Computational Linguistics, 2013) Swanson, Ben; Yamangil, Elif; Charniak, Eugene; Shieber, Stuart
  • Publication
    Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
    (Association for Computational Linguistics, 2010) Yamangil, Elif; Shieber, Stuart
    We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar. (A sketch of the Dirichlet-process fragment probability used in this line of work appears after this list.)
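
Illustrative Code Sketches

The sketches below are minimal reconstructions of ideas described in the publications above. None of them is the authors' implementation; all data, feature templates, and parameter values are invented for illustration.

The correction-type classifier in "Correction Detection and Error Type Selection as an ESL Educational Aid" is a log-linear model over linguistically motivated feature templates trained on sentence pairs. In the sketch below, scikit-learn's LogisticRegression stands in for the paper's log-linear trainer, and the feature templates and training pairs are toy stand-ins (the real system draws its data from the Cambridge Learner Corpus).

```python
# Minimal sketch of a log-linear (maximum-entropy) classifier over feature
# templates that labels an (original, corrected) sentence pair with an error
# type. The templates and toy data are hypothetical; the real system uses
# richer linguistic features mined from the Cambridge Learner Corpus.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(original: str, corrected: str) -> dict:
    """Toy feature templates comparing the two sides of a correction."""
    orig_toks, corr_toks = original.split(), corrected.split()
    added = set(corr_toks) - set(orig_toks)
    removed = set(orig_toks) - set(corr_toks)
    feats = {
        "len_diff": len(corr_toks) - len(orig_toks),
        "added_determiner": any(t.lower() in {"a", "an", "the"} for t in added),
        "removed_determiner": any(t.lower() in {"a", "an", "the"} for t in removed),
        "changed_verb_suffix": any(t.endswith(("ed", "ing", "s")) for t in added | removed),
    }
    # One indicator feature per substituted word pair (a crude "template").
    for o, c in zip(orig_toks, corr_toks):
        if o != c:
            feats[f"sub={o}->{c}"] = 1.0
    return feats

# Hypothetical training pairs labeled with error types.
pairs = [
    ("I went to store", "I went to the store", "MissingDeterminer"),
    ("She have a cat", "She has a cat", "VerbAgreement"),
    ("He go school yesterday", "He went to school yesterday", "VerbTense"),
    ("They eats lunch", "They eat lunch", "VerbAgreement"),
]
X = [features(o, c) for o, c, _ in pairs]
y = [label for _, _, label in pairs]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.predict([features("We discussed about it", "We discussed it")]))
```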
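In "Estimating Compact Yet Rich Tree Insertion Grammars", the TIG posterior is sampled with Metropolis-Hastings: whole derivations are proposed from a tractable context-free approximation of the grammar and then accepted or rejected against the true model. The sketch below shows that sampling pattern, an independence-proposal MH chain, on a toy discrete space; the "derivations" and both distributions are invented.

```python
# Minimal sketch of Metropolis-Hastings with an independence proposal:
# candidates are drawn from a tractable proposal distribution q and
# accepted or rejected against the target p. The "derivations" and both
# distributions below are toy stand-ins, not the paper's actual grammar.
import random
from collections import Counter

random.seed(0)

derivations = ["d1", "d2", "d3"]
p_true = {"d1": 0.6, "d2": 0.3, "d3": 0.1}      # intractable target (toy)
q_proposal = {"d1": 0.4, "d2": 0.4, "d3": 0.2}  # tractable proposal (toy)

def propose() -> str:
    """Draw one candidate from the proposal distribution."""
    r, acc = random.random(), 0.0
    for d in derivations:
        acc += q_proposal[d]
        if r < acc:
            return d
    return derivations[-1]

def mh_chain(n_steps: int) -> list:
    current = propose()
    samples = []
    for _ in range(n_steps):
        cand = propose()
        # Independence-proposal acceptance ratio:
        #   min(1, p(cand) * q(current) / (p(current) * q(cand)))
        ratio = (p_true[cand] * q_proposal[current]) / (p_true[current] * q_proposal[cand])
        if random.random() < min(1.0, ratio):
            current = cand
        samples.append(current)
    return samples

counts = Counter(mh_chain(20000))
print({d: round(c / 20000, 3) for d, c in sorted(counts.items())})
# With enough steps the empirical frequencies approach p_true,
# even though every proposal was drawn from q_proposal.
```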
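The mining paradigm in "Mining Wikipedia's Article Revision History for Training Computational Linguistics Algorithms" compares adjacent revisions of the same article and harvests the sentence pairs that changed as candidate training examples (corrections, compressions). The sketch below does this with Python's difflib over two invented revisions; a real pipeline would iterate over a Wikipedia history dump rather than in-memory strings.

```python
# Minimal sketch of revision mining: align the sentences of two adjacent
# revisions of the same article and keep the sentence pairs that changed.
# The two revisions below are invented; a real pipeline would stream a
# Wikipedia history dump.
import difflib
import re

def sentences(text: str) -> list:
    """Very rough sentence splitter, sufficient for the sketch."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def changed_pairs(old_text: str, new_text: str) -> list:
    old_sents, new_sents = sentences(old_text), sentences(new_text)
    matcher = difflib.SequenceMatcher(a=old_sents, b=new_sents, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' blocks of equal size are the cleanest candidates:
        # each old sentence lines up with exactly one rewritten sentence.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.extend(zip(old_sents[i1:i2], new_sents[j1:j2]))
    return pairs

old_rev = ("The city was founded in 1850. It is home to a large anual festival. "
           "The festival attracts many visitors.")
new_rev = ("The city was founded in 1850. It is home to a large annual festival. "
           "The festival attracts visitors from across the region.")

for before, after in changed_pairs(old_rev, new_rev):
    print("BEFORE:", before)
    print("AFTER: ", after)
```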
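The Bayesian grammar-induction papers above ("Bayesian Synchronous Tree-Substitution Grammar Induction...", "Estimating Compact Yet Rich Tree Insertion Grammars") share a Dirichlet-process prior over grammar fragments: a fragment's posterior predictive probability mixes its cached count with a base distribution P0, so frequently reused fragments are cached while novel ones back off to P0. The sketch below computes that probability for toy fragments; the cache counts, concentration parameter, and base distribution are invented (a real P0 scores a fragment by the probabilities of the CFG rules inside it).

```python
# Minimal sketch of the Dirichlet-process posterior predictive used in
# Bayesian TSG/TIG/STSG induction. For a fragment e rooted in category c:
#     p(e | cache) = (count_c(e) + alpha * P0(e)) / (n_c + alpha)
# All counts and parameter values below are toy stand-ins.
from collections import Counter

alpha = 1.0                       # DP concentration parameter (toy value)
cache = Counter({                 # previously sampled fragments rooted in NP
    "(NP (DT the) (NN dog))": 3,
    "(NP (DT the) NN)": 5,
    "(NP DT NN)": 10,
})

def base_prob(fragment: str) -> float:
    """Stand-in for P0: a simple size penalty on the fragment."""
    return 0.5 ** fragment.count("(")

def posterior_predictive(fragment: str) -> float:
    n_root = sum(cache.values())
    return (cache[fragment] + alpha * base_prob(fragment)) / (n_root + alpha)

for frag in ["(NP DT NN)", "(NP (DT the) (JJ small) NN)"]:
    print(f"{frag:30s} {posterior_predictive(frag):.4f}")
# A Gibbs sampler for grammar induction resamples, node by node, whether each
# treebank node starts a new fragment, scoring each choice with probabilities
# of this form; block and tree-level variants resample whole derivations.
```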