Publication: Deep Learning for Music Composition: Generation, Recommendation and Control
Date
2019-05-21
Authors
Huang, Cheng-Zhi Anna
Citation
Huang, Cheng-Zhi Anna. 2019. Deep Learning for Music Composition: Generation, Recommendation and Control. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract
Technology has always helped expand the range of musical expression, from the fortepiano to synthesizers to electronic sequencers. Could machine learning further extend human creativity? We explore three ways deep learning supports the creative process: generation, recommendation, and control. Generative models can synthesize stylistic idioms, enabling artists to explore a wider palette of possibilities. Recommendation tools can assist artists in curation. Better model control helps artists stay in the creative loop. Furthermore, this control could take place at one or more musically meaningful levels -- the score, the performance, or timbre -- or at a non-musical level, such as a subjective quality like “scary.” This dissertation posits that deep learning models designed to better match the structure of music can generate, recommend, and provide control in the creative process, making music composition more accessible. I describe four projects to support this statement.

AdaptiveKnobs uses Gaussian processes to capture the nonlinear, multimodal relationship between low-level sound-synthesis parameters and perceived sound qualities. Through active learning, it assists sound designers in defining their own intuitive knobs, querying them on the sounds the model expects will improve the controls most. ChordRipple uses Chord2Vec to learn chord embeddings for recommending creative substitutions, and a Ripple mechanism to propagate changes, allowing novices to compose more adventurous chord progressions. Music Transformer uses self-attention to capture the self-similarity structure of music, generating coherent, expressive piano music from scratch; because the model treats composition and performance as one, improvisers can play an initial motif and have the model develop it coherently. Coconet uses convolutions to capture pitch and temporal invariance. The generative model fills in arbitrarily partial musical scores, allowing it to perform a wide range of musical tasks, and it uses Gibbs sampling to approximate how human composers improve their music through rewriting. Recently, Coconet powered the Bach Doodle, harmonizing more than 50 million melodies composed by users.

We hope machine learning can enable new ways of approaching the creative process for both novices and musicians.
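To illustrate the active-learning idea behind AdaptiveKnobs, the following is a minimal sketch only, not the dissertation's implementation: it fits a Gaussian process from synthesis parameters to a designer's ratings and, as a simple stand-in acquisition criterion, queries the sound with the highest predictive uncertainty. The function `render_and_ask_designer`, the two-dimensional parameter space, and the loop sizes are all hypothetical.

```python
# Minimal active-learning sketch (illustrative only): learn a mapping from
# synthesizer parameters to a perceived quality, then query the designer on
# the sound whose rating the model is least certain about.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def render_and_ask_designer(params):
    """Hypothetical stand-in: synthesize `params`, play the sound, and return
    the designer's rating of the target quality (stubbed with a toy function)."""
    return float(np.sin(3 * params[0]) + params[1] ** 2 + 0.1 * rng.normal())

candidate_params = rng.uniform(0, 1, size=(200, 2))   # low-level synthesis parameters
X = candidate_params[:3].tolist()                     # a few seed examples
y = [render_and_ask_designer(p) for p in X]

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)

for _ in range(10):                                   # active-learning loop
    gp.fit(np.array(X), np.array(y))
    _, std = gp.predict(candidate_params, return_std=True)
    query = candidate_params[np.argmax(std)]          # most uncertain sound
    X.append(query.tolist())
    y.append(render_and_ask_designer(query))

# `gp` now approximates an intuitive "knob": search or invert it to find
# parameter settings that achieve a desired perceived quality.
```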
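The rewriting procedure described for Coconet can be sketched schematically as blocked Gibbs sampling over a partially fixed piano roll. The sketch below is in the spirit of that procedure but simplifies the details (masking scheme, annealed block size); `model_predict` is a random stand-in for a trained network, and the voice, step, and pitch dimensions are hypothetical.

```python
# Schematic Gibbs-sampling infilling over a score: repeatedly mask a random
# block of editable cells and resample them from the model's predictions,
# approximating iterative rewriting of a partially given score.
import numpy as np

rng = np.random.default_rng(0)
N_VOICES, N_STEPS, N_PITCHES = 4, 32, 46   # hypothetical piano-roll dimensions

def model_predict(score, mask):
    """Stand-in for a trained Coconet-like network: returns a categorical
    distribution over pitches for every (voice, step) cell, given the score
    with `mask`ed cells hidden. Replace with a real model."""
    logits = rng.normal(size=(N_VOICES, N_STEPS, N_PITCHES))
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    return probs / probs.sum(-1, keepdims=True)

def gibbs_infill(score, editable, n_sweeps=50, block_frac=0.25):
    """Resample random blocks of editable cells for several sweeps."""
    score = score.copy()
    for _ in range(n_sweeps):
        block = editable & (rng.random(editable.shape) < block_frac)
        probs = model_predict(score, mask=block)
        for v, t in zip(*np.nonzero(block)):
            score[v, t] = rng.choice(N_PITCHES, p=probs[v, t])
    return score

# Example: keep a user-given melody in voice 0 fixed and fill in the rest.
score = np.zeros((N_VOICES, N_STEPS), dtype=int)
score[0] = rng.integers(0, N_PITCHES, size=N_STEPS)
editable = np.ones_like(score, dtype=bool)
editable[0] = False
harmonization = gibbs_infill(score, editable)
```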
Keywords
Generative models for music, music and computing, computer music, music composition, artificial intelligence, machine learning, deep learning, human computer interaction
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.