Publication: Geometric Methods for Quantitative Analysis of Romance Languages
Abstract
Previous work has introduced various quantitative methods for investigating the historical and phonetic interrelations of languages and their speakers. Separately, hyperbolic space has been found, both theoretically and empirically, to be well suited to representing hierarchically structured datasets such as phylogenetic cell data. This thesis tests the suitability of hyperbolic space for representing pronunciation data from several Romance languages, a linguistic family that apparently developed according to a hierarchical structure, i.e., one in which modern languages are interrelated via tree-like descent from common ancestors. The thesis involves Python implementations of (a) a pipeline that transforms audio files into workable mathematical objects and (b) baseline methods for aggregating and analyzing this speech data with respect to language-wise covariance structures. We then outline a framework for analyzing the speech data in a hyperbolic setting and compare its performance to that of the baseline methods on two tasks: (a) language space reconstruction and (b) interspeaker interpolation. We find that, with proper hyperparameter tuning, the Poincaré disk model of hyperbolic geometry is indeed capable of representing the language space and speaker interrelations apparent in our Romance language dataset, suggesting that the hyperbolic setting could be a promising quantitative framework for future linguistic analysis.
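For readers unfamiliar with the Poincaré disk model named in the abstract, the following is a minimal sketch of its standard distance function. It is not taken from the thesis: the function name `poincare_distance` and the use of NumPy are assumptions made for illustration, and the thesis's own implementation may differ.

```python
import numpy as np


def poincare_distance(u, v):
    """Geodesic distance between two points of the open unit (Poincare) disk.

    Hypothetical helper for illustration only; uses the standard formula
    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    sq_norm_u = np.dot(u, u)
    sq_norm_v = np.dot(v, v)
    # Points must lie strictly inside the unit disk.
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must have Euclidean norm strictly below 1")
    diff = u - v
    arg = 1.0 + 2.0 * np.dot(diff, diff) / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return float(np.arccosh(arg))
```

Under this metric, distances grow rapidly as points approach the boundary of the disk, which is the property usually cited to explain why tree-like data can be embedded with low distortion.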