Publication: Applications of Machine Learning for Modeling Sound Symbolic Systems in Japanese and Korean Ideophones
Date
2024-06-12
Authors
Fernandes, Jacob Jared
Citation
Fernandes, Jacob Jared. 2024. Applications of Machine Learning for Modeling Sound Symbolic Systems in Japanese and Korean Ideophones. Bachelor's thesis, Harvard University Engineering and Applied Sciences.
Abstract
Ideophones, or ‘onomatopoeia’, are a special class of adverbial modifiers, strongly associated with imagery and with audio-visual and other sensory traits, that occur in Japanese, Korean, and many other languages across Asia and Sub-Saharan Africa. Ideophones are an often-neglected linguistic anomaly: research on sound symbolism has largely been restricted to phonesthetic phenomena in English and other Indo-European languages (Azari and Sharififar 2017).
Yet evidence suggests that ideophones play a large role in the discourse of Japanese, Korean, and many other languages (ibid.). Research into ideophones has accordingly been a particular focus of Japanese linguistics, with Hirose (1981) and Hamano (1998) publishing landmark studies, although much of the existing research has been restricted to their typology (Kita 1997). This study expands on Hamano’s sound-symbolic system, investigating the links between sound and meaning in mimetics and their significant verbal collocations, with particular reference to the finding of Akita and Usuki (2016) that the semantic content of ideophones is strongly connected to complement information. This connection, together with the promise of Frame Semantics of complements as an effective way of evaluating adverbs in Natural Language Processing (Nikolaev et al. 2023), suggests that a model correlating the phonological form of a mimetic with the semantic frame of its significant collocations may yield insights into the semantic behavior of ideophones.
Using three classifiers (Naïve Bayes, Random Forest, and a Recurrent Neural Network, or RNN), I construct a model trained on distinctive-feature representations of the phonological form of a large set of mimetics, drawn from a variety of vocabulary sources, to predict the semantic content of verbal collocations in the form of Semantic Frames, with the aid of FrameNet as used by Akita (2013). I found that vowels in particular had a significant effect on the semantic frame of a complement, with [±High] as the most significant feature in both Korean and Japanese mimetics. Although these models struggled in particular with the polysemy present in Korean and Japanese mimetics and verbal elements, this thesis found that Random Forest classifiers are very effective learners of the relationships between sound and meaning. I further found that mimetic adverbials indeed performed better than their non-mimetic adverbial counterparts, indicating a real iconicity in so-called ideophones that is not present in regular manner adverbials in either language.
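To make the general shape of such a pipeline concrete, here is a minimal sketch that encodes each romanized mimetic root as binary distinctive features of its first two vowels and fits a Bernoulli Naïve Bayes classifier (one of the three classifier types named above) to predict a semantic-frame label. This is not the thesis's implementation: the four-feature vowel table, the `featurize` helper, the `BernoulliNB` class, and the placeholder labels `FRAME_A`/`FRAME_B` are all hypothetical, and the toy training pairs are illustrative only, not data or results from the thesis. It uses only the Python standard library.

```python
import math

# HYPOTHETICAL, simplified distinctive-feature table for the five Japanese vowels.
# Real feature systems (and Korean's larger vowel inventory) differ; illustrative only.
FEATURE_NAMES = ("high", "low", "back", "round")
VOWEL_FEATURES = {
    "i": (1, 0, 0, 0),
    "e": (0, 0, 0, 0),
    "a": (0, 1, 1, 0),
    "o": (0, 0, 1, 1),
    "u": (1, 0, 1, 1),
}
N_VOWEL_SLOTS = 2  # assumption: encode only the first two vowels of a CVCV root


def featurize(root):
    """Return a flat 0/1 vector: FEATURE_NAMES for vowel slot 1, then for slot 2."""
    vowels = [ch for ch in root.lower() if ch in VOWEL_FEATURES][:N_VOWEL_SLOTS]
    vec = []
    for slot in range(N_VOWEL_SLOTS):
        if slot < len(vowels):
            vec.extend(VOWEL_FEATURES[vowels[slot]])
        else:
            vec.extend([0] * len(FEATURE_NAMES))  # pad roots with fewer vowels
    return vec


class BernoulliNB:
    """Naive Bayes over binary features with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        counts = {c: 0 for c in self.classes}
        ones = {c: [0] * n_features for c in self.classes}
        for xs, label in zip(X, y):
            counts[label] += 1
            for j, v in enumerate(xs):
                ones[label][j] += v
        self.log_prior = {c: math.log(counts[c] / len(y)) for c in self.classes}
        # P(x_j = 1 | c) = (examples of c with x_j = 1, plus 1) / (examples of c, plus 2)
        self.p = {
            c: [(ones[c][j] + 1) / (counts[c] + 2) for j in range(n_features)]
            for c in self.classes
        }
        return self

    def predict(self, xs):
        # Pick the class with the highest log prior + summed log likelihoods.
        def score(c):
            return self.log_prior[c] + sum(
                math.log(p if v else 1.0 - p) for v, p in zip(xs, self.p[c])
            )
        return max(self.classes, key=score)


# Toy training pairs: real mimetic roots, but PLACEHOLDER frame labels (not thesis data).
TRAIN = [
    ("pika", "FRAME_A"), ("kira", "FRAME_A"), ("chiku", "FRAME_A"),
    ("goro", "FRAME_B"), ("doro", "FRAME_B"), ("boro", "FRAME_B"),
]
model = BernoulliNB().fit([featurize(w) for w, _ in TRAIN], [lab for _, lab in TRAIN])
print(model.predict(featurize("piri")))  # FRAME_A
```

Under this encoding a feature such as [±High] is simply one binary column per vowel slot, so its influence on each label can be inspected through the per-class probabilities in `model.p`. A Random Forest or RNN could consume the same kind of feature vectors; the choice of Naïve Bayes here is only for brevity.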
Keywords
Classification, Ideophones, Japanese, Korean, NLP, Sound-Symbolism, Linguistics, Computer science
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.