Publication: Systolic Architectures for Efficient Deep Neural Network Implementations with Assured Performance
Date
2022-01-24
Authors
Zhang, Sai Qian
The Harvard community has made this article openly available.
Citation
Zhang, Sai Qian. 2021. Systolic Architectures for Efficient Deep Neural Network Implementations with Assured Performance. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Deep neural networks (DNNs) have gained major interest in a variety of application domains, including computer vision, natural language processing, and autonomous driving. While DNNs deliver state-of-the-art accuracy on many AI tasks, they come at the cost of high computational complexity. Accordingly, designing efficient hardware architectures for DNNs is an important step towards enabling their wide deployment in modern AI systems. Although DNNs have been developed to solve a variety of problems, the major arithmetic operations involved are almost identical: a series of matrix multiplications between the inputs and the DNN weights. Therefore, an efficient hardware architecture for matrix operations plays a key role in DNN implementations.
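To make the matrix-multiplication view concrete, the following is a minimal sketch (not taken from the thesis; layer sizes and the use of NumPy are illustrative assumptions) showing that a small feed-forward DNN's forward pass reduces to a series of matrix multiplications between inputs and weights.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 128))            # input activations
weights = [rng.standard_normal((128, 64)),   # layer 1 weights
           rng.standard_normal((64, 10))]    # layer 2 weights

for w in weights:
    x = np.maximum(x @ w, 0.0)               # matrix multiply followed by ReLU

print(x.shape)                               # (1, 10)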
Systolic arrays are widely known to be appealing for matrix multiplication due to their regular layout of processing elements, efficient interprocessor communication, and reduced memory access. This has led to numerous commercial DNN accelerators that use systolic arrays for matrix multiplication (e.g., the Google Tensor Processing Unit).
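As a rough illustration of the dataflow, here is a minimal software sketch of an output-stationary systolic array computing C = A @ B; the output-stationary choice and the cycle model are assumptions for exposition, not the designs proposed in the thesis. Each processing element (i, j) holds one output and, every cycle, multiplies the operands that arrive skewed from the left (rows of A) and from the top (columns of B).

import numpy as np

def systolic_matmul(A, B):
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    total_cycles = K + M + N - 2            # cycles until all skewed data has drained
    for t in range(total_cycles):
        for i in range(M):
            for j in range(N):
                k = t - i - j               # operand index reaching PE(i, j) at cycle t
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 6)), rng.standard_normal((6, 4))
assert np.allclose(systolic_matmul(A, B), A @ B)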
In this thesis, we explore innovative systolic architectures that support DNN executions at low cost.
We leverage recent advances in DNN pruning (removing superfluous parameters from DNNs) and DNN quantization (converting high-precision DNN parameters into low-precision parameters), and describe (1) techniques to overcome the limitations of systolic arrays in handling sparse matrices with irregular sparsity structures and in balancing load across systolic cells, and (2) new uses of systolic arrays in DNNs.
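For readers unfamiliar with these compression ideas, the following is a minimal, generic sketch of magnitude pruning (zeroing the smallest-magnitude weights) and symmetric uniform quantization (mapping weights to low-precision integer levels). The specific threshold, bit width, and functions shown are illustrative assumptions and not the thesis's particular pruning or quantization schemes.

import numpy as np

def magnitude_prune(w, sparsity=0.75):
    # Zero out the smallest-magnitude fraction of weights.
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def uniform_quantize(w, bits=4):
    # Symmetric uniform quantization to signed `bits`-bit integer levels.
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    q = np.round(w / scale).astype(np.int8)   # low-precision representation
    return q, scale                           # dequantize with q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
w_sparse = magnitude_prune(w)
q, scale = uniform_quantize(w_sparse)
print("sparsity:", np.mean(w_sparse == 0),
      "max quantization error:", np.max(np.abs(q * scale - w_sparse)))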
By evaluating the novel systolic architectures on both application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) platforms, we demonstrate the superior performance of our systolic designs in terms of both processing latency and energy efficiency across multiple DNN models and datasets. In addition to the evaluation results, we provide detailed justification for the assured performance of our proposed solutions.
Keywords
Computer science
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service