On the Design and Optimization of Specialized Hardware With Applications in Deep Learning
Citation: Reagen, Brandon. 2018. On the Design and Optimization of Specialized Hardware With Applications in Deep Learning. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: Traditional computer architecture no longer offers sufficient performance scaling.
As power now bounds the capabilities of integrated circuits, new architectures and mechanisms are needed to continue to improve performance in light of strict power budgets.
Over the course of my PhD, I proposed methods, paradigms, and tools to build specialized hardware accelerators. Accelerators offer orders-of-magnitude gains in power efficiency and performance but pose a set of four research challenges.
The four challenges associated with accelerator-centric computing are: design, optimization, integration/programmability, and workload selection. In the first half of my PhD, I sought to understand and contribute to each of these problems.
I found that high-level synthesis tools can substantially reduce design cost but expose a vast optimization/design space. Next, I collaborated to develop the Aladdin accelerator simulator framework to rapidly conduct design space exploration and identify optimal designs.
I then built three accelerators for the RoboBee brain SoC; these were integrated as slave bus ports and used explicitly managed, shared scratch-pad memories.
Finally, I developed MachSuite, the accepted accelerator benchmark suite for accelerator design tools and accelerator-centric system studies.
In the second half of my PhD, I applied everything I had learned about low-power, efficient accelerator design to deep learning. I set out to enable truly ubiquitous DNN inference. While DNNs were being used everywhere, many resource-constrained devices relied on the cloud for execution.
For me, the most fascinating aspect of accelerating DNNs was the need for creative solutions. DNNs are composed of repeated vector-matrix and convolution kernels. To find the limit of power-efficient inference, intricate co-designs between the unique algorithmic properties of DNNs and the circuits they execute on were needed.
The most notable aspect I exploited was the implicit resiliency of DNNs, namely in their weights, which allows correctness requirements to be relaxed during execution to improve efficiency. I began with the Minerva project to reduce the power budget of DNN accelerators by 8x without compromising accuracy. As Minerva concluded, it was clear that more work was needed to understand the relationship between faults and DNN structures.
To quantify these relationships, I built Ares, a fault-injection framework for DNN inference. Another related problem facing ubiquitous inference is model distribution, as weights can be large. To solve this problem, I developed a novel, lossy weight encoding technique named Weightless that, in experiments, compressed a layer's weights by 496x.
Finally, to conclude my PhD, I built Aergia: a sparse-vector, sparse-matrix accelerator for processing bi-directional recurrent networks, targeting speech recognition and translation tasks.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:41129181
- FAS Theses and Dissertations