Publication:
Systolic Architectures for Efficient Deep Neural Network Implementations with Assured Performance

Date

2022-01-24

Citation

Zhang, Sai Qian. 2021. Systolic Architectures for Efficient Deep Neural Network Implementations with Assured Performance. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Deep neural networks (DNNs) have attracted major interest in a variety of application domains, including computer vision, natural language processing, and autonomous driving. While DNNs deliver state-of-the-art accuracy on many AI tasks, this comes at the cost of high computational complexity. Accordingly, designing efficient hardware architectures for DNNs is an important step toward enabling their wide deployment in modern AI systems. Although DNNs have been developed to solve a variety of problems, the major arithmetic operations they involve are almost identical: a series of matrix multiplications between the inputs and the DNN weights. Therefore, an efficient hardware architecture for matrix operations plays a key role in DNN implementations. Systolic arrays are widely known to be appealing for matrix multiplication due to their regular layout of processing elements, efficient interprocessor communication, and reduced memory access; this has led numerous commercial DNN accelerators to use systolic arrays for matrix multiplication (e.g., the Google Tensor Processing Unit). In this thesis, we explore innovative systolic architectures that support DNN execution at low cost. We leverage recent advances in DNN pruning (removing superfluous parameters from DNNs) and DNN quantization (converting high-precision DNN parameters into low-precision ones), and describe (1) techniques to overcome the limitations of systolic arrays in handling sparse matrices with irregular sparsity structures and in balancing load across systolic cells, and (2) new uses of systolic arrays in DNNs. By evaluating the novel systolic architectures on both application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) platforms, we demonstrate the superior performance of our systolic designs in terms of both processing latency and energy efficiency across multiple DNN models and datasets. In addition to the evaluation results, we provide detailed justification for the assured performance of our proposed solutions.
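
As a concrete illustration of the dataflow the abstract describes, below is a minimal Python sketch of an output-stationary systolic array computing the product C = AB. It is not taken from the dissertation; the function name and the cycle model are illustrative assumptions, and a hardware implementation would pipeline operands through physical processing elements rather than index them directly.

import numpy as np

def systolic_matmul(A, B):
    """Cycle-by-cycle sketch of an output-stationary systolic array.

    The processing element (PE) at grid position (i, j) accumulates
    C[i, j]. Rows of A stream in from the left and columns of B stream
    in from the top, each skewed by one cycle per row/column so that
    matching operands meet at the right PE at the right time.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"

    C = np.zeros((n, m))
    # k partial products per PE, plus (n - 1) + (m - 1) cycles of skew.
    for t in range(k + n + m - 2):
        for i in range(n):
            for j in range(m):
                s = t - i - j  # operand index reaching PE (i, j) at cycle t
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C

# Sanity check against NumPy's dense matrix multiply.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 5))
assert np.allclose(systolic_matmul(A, B), A @ B)

The skewed schedule (s = t - i - j) models the property the abstract credits for systolic arrays' efficiency: each PE communicates only with its immediate neighbors, so operands flow through the grid with regular, purely local data movement and minimal memory traffic.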

Keywords

Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the repository's Terms of Service.
