Enabling High Performance, Efficient, and Sustainable Deep Learning Systems At Scale
Abstract
The world has witnessed an exponential rise in AI, particularly deep learning-based engines, over the last decade. These deep learning-based AI engines form the backbone of the modern Internet, determining how we interact with technology and society daily. They also pose a multitude of barriers to the design, development, and deployment of modern software and hardware systems. These barriers stem from unique algorithm-level requirements, including high compute, memory, and storage intensity, and from application-level requirements arising from the scale at which deep learning engines operate. This dissertation investigates how to enable high-performance, efficient, and sustainable deep learning systems at scale. The thesis first identifies deep learning-based personalized recommendation engines as the dominant consumer of AI training and inference cycles in production data centers; their high infrastructure demands not only impede efficiency but also impose high environmental costs. To tackle the unique system design challenges that personalized recommendation engines pose, this thesis designs solutions across the software and hardware stack that optimize inference efficiency by jointly considering application-level characteristics, unique neural network model architectures, data-center-scale implications, and the underlying hardware. Furthermore, given the rapidly growing infrastructure demands posed by AI and recommendation engines, we show that systems must go beyond performance, power, and energy efficiency and treat environmental footprint as a first-order design target to enable sustainable computing. The dissertation concludes by charting paths toward future systems that enable emerging AI-driven applications by balancing performance, efficiency, and sustainability.