Publication: Minimizing NUMA Effects on Machine Learning Workloads in Virtualized Environments
Abstract
This thesis investigates the performance implications of reserving specialized components for machine learning workloads in virtualized environments. Because machine learning relies heavily on floating-point operations, graphics processing units (GPUs) are important for processing these workloads quickly. Although virtualization is one of the most widely used consolidation techniques in data centers of all sizes, compatibility issues between GPUs and virtualization platforms have slowed its adoption for machine learning workloads. To that end, this thesis discusses the motivations and history behind virtualization and application-specific acceleration devices, and how they are applied to machine learning on various public and private computing platforms. It then presents an experimental framework for testing the impact of controlling for non-uniform memory access (NUMA) when running machine learning workloads. Using this framework, a series of experiments tested multiple GPU placement configurations in a virtualized system and measured how each affected data throughput from the host system to the device. Current virtualization platforms recommend using these settings but do not describe the specific impacts of implementing them. Based on the experimental results, configuration parameters and placement recommendations are presented, along with information about how these settings can help optimize the machine learning pipeline and the potential pitfalls of their use.