Publication:
Device, Circuit, and Algorithm Co-Design for Efficient Neural Network Inference in Hardware

Date

2023-11-21

The Harvard community has made this article openly available.

Citation

Ma, Siming. 2023. Device, Circuit, and Algorithm Co-Design for Efficient Neural Network Inference in Hardware. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Artificial neural networks (ANNs) using machine learning algorithms have achieved remarkable performance on many artificial intelligence tasks. However, hardware implementations of these ANNs face two major challenges: a memory bottleneck and a von Neumann architecture bottleneck. This thesis pursues co-design opportunities across devices, circuits, and algorithms for efficient hardware ANN inference, targeting low-cost, low-power, always-on applications in edge devices. First, I present a novel fully-CMOS multi-level-cell embedded nonvolatile memory device technology (CMOS-MLC eNVM). Experimental results in both 16nm FinFET and 28nm planar processes demonstrate high density, low power, low cost, and robust retention of MLC nonvolatile storage, showing these devices' potential for solving the memory bottleneck of hardware ANN inference. I then design a novel ADC/DAC-free processing-in-memory (PIM) architecture co-designed with an effective training algorithm for binary-activation, multi-level-weight (BA-MLW) ANNs. This new ANN algorithm uses binary activations, which enable efficient PIM implementations that address the von Neumann architecture bottleneck without introducing the costly ADCs and DACs that plagued previous PIM implementations; meanwhile, the multi-level weights take full advantage of MLC nonvolatile storage devices. I implement the novel PIM architecture with the new CMOS-MLC eNVM in 16nm FinFET as a trigger-word detection accelerator, showcasing a low-cost internet-of-things (IoT) application of hardware ANN inference. Chip testing results show high inference accuracy and superior area, cost, and energy compared with alternative designs. Finally, I discuss the generalizability of the novel PIM architecture, as well as promising future research directions using co-design techniques for efficient ANN inference in hardware.
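To make the BA-MLW idea concrete, the following NumPy sketch illustrates (in software, not as code from the thesis) how a layer with binary activations and discretized multi-level weights might be modeled: the weights are rounded to a few storage levels, mirroring MLC nonvolatile cells, and the thresholded output replaces an ADC. All function names, the choice of 4 weight levels (2 bits per cell), and the zero threshold are illustrative assumptions, not details taken from the dissertation.

```python
import numpy as np

def quantize_weights(w, num_levels=4):
    """Snap trained real-valued weights to a small set of discrete levels,
    mimicking storage in a multi-level-cell (MLC) eNVM device.
    num_levels=4 (2 bits per cell) is an illustrative assumption."""
    levels = np.linspace(w.min(), w.max(), num_levels)
    idx = np.abs(w[..., None] - levels).argmin(axis=-1)
    return levels[idx]

def ba_mlw_layer(x_bin, w_mlc, threshold=0.0):
    """Forward pass of a binary-activation, multi-level-weight layer.
    In a PIM chip the multiply-accumulate would happen in analog inside
    the memory array; a simple comparison against a threshold produces
    the binary output, so no ADC or DAC is needed."""
    pre = x_bin @ w_mlc                         # in-memory multiply-accumulate
    return (pre >= threshold).astype(np.int8)   # binary activation

# Usage: one layer mapping 8 binary inputs to 4 binary outputs.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4))
x = rng.integers(0, 2, size=8).astype(np.int8)
y = ba_mlw_layer(x, quantize_weights(w))
```

Because both the inputs and outputs of such a layer are binary, layers compose directly without any digital-to-analog conversion between them, which is the property the abstract credits for avoiding the ADC/DAC cost of earlier PIM designs.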

Keywords

Artificial Neural Networks, Embedded Non-Volatile Memory, Hardware-Software Co-Design, Hot Carrier Injection, Processing-in-Memory, Electrical engineering

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
