Publication: Geometric Methods for Quantitative Analysis of Romance Languages
Date
2024-11-26
Authors
McDonald, Patrick William
Citation
McDonald, Patrick William. 2024. Geometric Methods for Quantitative Analysis of Romance Languages. Bachelor's thesis, Harvard University Engineering and Applied Sciences.
Abstract
Previous work has introduced various quantitative methods for investigating the historical and phonetic interrelation of languages and their speakers. Separately, hyperbolic space has been found, both theoretically and empirically, to be well suited to representing hierarchically structured datasets such as phylogenetic cell data. This thesis tests the suitability of hyperbolic space for representing pronunciation data from several Romance languages, a linguistic family thought to have developed hierarchically: its modern languages are related through tree-like descent from common ancestors. The thesis provides Python implementations of (a) a pipeline that transforms audio files into workable mathematical objects and (b) baseline methods for aggregating and analyzing this speech data with respect to language-wise covariance structures. We then outline a framework for analyzing the speech data in a hyperbolic setting and compare its performance to that of the baseline methods on two tasks: (a) language space reconstruction and (b) interspeaker interpolation. We find that, with proper hyperparameter tuning, the Poincaré disk model of hyperbolic geometry is capable of representing the language space and speaker interrelations apparent in our Romance language dataset, suggesting that the hyperbolic setting could be a promising quantitative framework for future linguistic analysis.
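For readers unfamiliar with the Poincaré disk model named in the abstract, the following is a minimal sketch of its standard geodesic distance, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). It is not taken from the thesis: the function name `poincare_distance` and the plain-tuple point representation are assumptions made for illustration only.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit disk.

    Hypothetical helper for illustration; not the thesis's implementation.
    """
    norm_u_sq = sum(x * x for x in u)
    norm_v_sq = sum(x * x for x in v)
    if norm_u_sq >= 1.0 or norm_v_sq >= 1.0:
        raise ValueError("points must lie strictly inside the unit disk")
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    # Standard closed form for the Poincare disk (curvature -1).
    return math.acosh(1.0 + 2.0 * diff_sq / ((1.0 - norm_u_sq) * (1.0 - norm_v_sq)))


# Two pairs with the same Euclidean separation (0.1) but different radii.
near_center = poincare_distance((0.0, 0.0), (0.1, 0.0))
near_boundary = poincare_distance((0.8, 0.0), (0.9, 0.0))
```

The same Euclidean gap corresponds to a larger hyperbolic distance near the boundary than near the center, which is the property that lets tree-like data spread out toward the rim of the disk.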
Keywords
Applied mathematics, Computer science, Linguistics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service