Publication: Device, Circuit, and Algorithm Co-Design for Efficient Neural Network Inference in Hardware
Date
2023-11-21
Authors
Ma, Siming
Citation
Ma, Siming. 2023. Device, Circuit, and Algorithm Co-Design for Efficient Neural Network Inference in Hardware. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Artificial neural networks (ANNs) using machine learning algorithms have achieved remarkable performance on many artificial intelligence tasks. However, hardware implementations of these ANNs face two major challenges: a memory bottleneck and a von Neumann architecture bottleneck. This thesis pursues co-design opportunities across devices, circuits, and algorithms for efficient hardware ANN inference, targeting low-cost, low-power, always-on applications in edge devices. First, I present a novel fully-CMOS multi-level-cell embedded nonvolatile memory device technology (CMOS-MLC eNVM). Experimental results in both 16nm FinFET and 28nm planar processes demonstrate high density, low power, low cost, and robust retention of MLC nonvolatile storage, showing these devices' potential for solving the memory bottleneck of hardware ANN inference. I also design a novel ADC/DAC-free processing-in-memory (PIM) architecture co-designed with an effective training algorithm for binary-activation, multi-level-weight (BA-MLW) ANNs. The binary activations enable efficient PIM implementations that address the von Neumann architecture bottleneck without the costly ADCs and DACs that plagued previous PIM implementations, while the multi-level weights take full advantage of MLC nonvolatile storage devices. I implement this PIM architecture with the new CMOS-MLC eNVM in 16nm FinFET as a trigger-word detection accelerator, showcasing a low-cost Internet-of-Things (IoT) application of hardware ANN inference. Chip testing results show high inference accuracy and superior area, cost, and energy compared with alternative designs. Finally, I discuss the generalizability of the PIM architecture, as well as promising future research directions using co-design techniques for efficient ANN inference in hardware.
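To make the BA-MLW idea in the abstract concrete, the following Python snippet is a minimal, hypothetical sketch — not the dissertation's actual algorithm, training procedure, or circuit behavior. It assumes a symmetric uniform weight-quantization scheme (standing in for MLC storage levels) and a sign/threshold nonlinearity, so the accumulated dot product needs only a comparison rather than an ADC; all function names and parameters are illustrative assumptions.

```python
import numpy as np

def quantize_weights(w, levels=4, w_max=1.0):
    """Uniformly quantize real-valued weights to a few discrete levels,
    mimicking multi-level-cell (MLC) storage (hypothetical scheme)."""
    # Symmetric uniform grid of `levels` values in [-w_max, +w_max]
    grid = np.linspace(-w_max, w_max, levels)
    idx = np.argmin(np.abs(w[..., None] - grid), axis=-1)
    return grid[idx]

def ba_mlw_layer(x_binary, w_real, threshold=0.0, levels=4):
    """Forward pass of one binary-activation, multi-level-weight layer.

    x_binary: input activations in {-1, +1}
    w_real:   real-valued weights, quantized to `levels` values
    Returns binary activations in {-1, +1}.
    """
    w_q = quantize_weights(w_real, levels=levels)
    # In a PIM array this dot product would be computed in place on the
    # memory cells; with binary inputs it reduces to signed accumulation,
    # so a simple threshold comparison can replace an ADC.
    pre_activation = x_binary @ w_q
    return np.where(pre_activation > threshold, 1.0, -1.0)

# Toy usage: a two-layer BA-MLW network on a random binary input.
rng = np.random.default_rng(0)
x = np.sign(rng.standard_normal(16))   # binary input in {-1, +1}
w1 = rng.standard_normal((16, 8))
w2 = rng.standard_normal((8, 4))
h = ba_mlw_layer(x, w1)
y = ba_mlw_layer(h, w2)
print(y)  # four binary outputs in {-1., +1.}
```

In this sketch, the threshold comparison plays the role of a sense amplifier reading out each column of the PIM array, which is why no data converters appear anywhere in the forward pass.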
Keywords
Artificial Neural Networks, Embedded Non Volatile Memory, Hardware-Software Co-Design, Hot Carrier Injection, Processing In Memory, Electrical engineering
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.