Publication:

Neural Network-Aided Audio Processing for Automated Vocal Coaching

Date

2019-08-23

The Harvard community has made this article openly available.

Citation

Craigo, Ethan T. 2019. Neural Network-Aided Audio Processing for Automated Vocal Coaching. Bachelor's thesis, Harvard College.

Abstract

Research is conducted assessing the feasibility of using neural networks to detect features of the singing voice both idiomatic and unidiomatic to the Western classical singing tradition. This is done in the context of a hypothetical “automated vocal coach” capable of providing singing voice advice independently of any human agent. Data are presented focusing on four criteria of Western classical singing: phonation, laryngeal registration, resonance management, and vibrato. Literature reviews are also presented on all of these criteria to determine their physiology, acoustical properties, and strategies in vocal pedagogy. Networks are binary and ternary classifiers that apply convolution to mel-scaled spectrograms of the isolated singing voice and make judgments based on resulting features. Training results are promising overall but are severely hampered by the absence of large and contrastive datasets illustrating these criteria. Unsurprisingly, factors of singing most visible on a spectrogram are easiest for networks to distinguish. An automated vocal coach does not appear impossible to build, but its constituent networks require much more data collection to be useful in practice.
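As a rough illustration of the "mel-scaled spectrogram" input mentioned in the abstract, the sketch below builds a triangular mel filterbank and applies it to a single power-spectrum frame. It is not taken from the thesis: the mel formula used here (the common 2595 * log10(1 + f/700) variant), the function names, and the parameter values are all assumptions chosen for illustration, and the thesis may well have used a library routine with different settings.

```python
import math


def hz_to_mel(f_hz):
    # Assumed mel formula (one common variant); the thesis does not specify one here.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Return n_mels triangular filters, each with n_fft // 2 + 1 weights.

    Hypothetical helper: filter edges are spaced evenly on the mel scale
    between 0 Hz and the Nyquist frequency.
    """
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels filters need n_mels + 2 edge frequencies (left, center, right).
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    # Center frequency of each FFT bin in Hz.
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]

    bank = []
    for m in range(n_mels):
        left, center, right = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            if left < f <= center:
                weight = (f - left) / (center - left)    # rising slope
            elif center < f < right:
                weight = (right - f) / (right - center)  # falling slope
            else:
                weight = 0.0
            row.append(weight)
        bank.append(row)
    return bank


def apply_filterbank(bank, power_spectrum):
    # Weighted sum of FFT-bin energies for each mel band (one spectrogram column).
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]


if __name__ == "__main__":
    bank = mel_filterbank(n_mels=8, n_fft=256, sample_rate=16000)
    flat_frame = [1.0] * (256 // 2 + 1)  # placeholder frame, not real audio
    print(apply_filterbank(bank, flat_frame))
```

Stacking such columns over successive frames would give the two-dimensional, image-like array that a convolutional classifier of the kind the abstract describes could take as input.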

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
