Publication:

Advancing System-Level Analysis and Design of Specialized Architectures

Date

2018-08-31

Published Version

Citation

Xi, Likun. 2018. Advancing System-Level Analysis and Design of Specialized Architectures. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

Over the past decade, computation has steadily shifted to the cloud and to mobile devices. To meet the growing computational demands of contemporary cloud and mobile workloads, architects have increasingly turned to hardware specialization. Once confined to niche, standardized tasks such as video decoding and audio processing, hardware accelerators have now spread to many more fields. Fueled by the recent explosion of demand for deep neural networks, a huge amount of effort has gone into advancing state-of-the-art accelerator architectures and designs. However, current research into specialized hardware often neglects to evaluate the complete system, which can lead to suboptimal designs. First, a large fraction of the total power consumed by a system-on-chip (SoC) is actually due to CPUs, yet the accuracy of widely used CPU power models has not been thoroughly validated. Second, while we know how to design efficient accelerators in isolation, we understand less about how SoC integration affects their performance and power. In addition, we have not explored how SoC-accelerator interfaces can be leveraged to improve efficiency. Finally, architects have mostly explored “deep” acceleration, which targets compute-heavy workloads with hot functions, but have largely ignored “broad” acceleration, which aims to accelerate common low-level routines present across a diverse set of workloads.

This dissertation makes the case for a holistic approach to accelerator design that accounts for the constraints of the surrounding system, for both “deep” and “broad” acceleration. First, it presents a comprehensive validation of McPAT, a widely used CPU power model, with a quantitative analysis of its sources of error. Second, it presents gem5-Aladdin, a complete SoC simulator that can model complex specialized SoCs and run end-to-end accelerated workloads without requiring any RTL to be written. Third, it shows how accounting for system-level effects and SoC interfacing during accelerator design can dramatically improve overall efficiency, with a deep dive into accelerating deep neural networks and vision pipelines. Finally, it leverages recent work on system-wide datacenter profiling to make the case for broad acceleration, presenting the design of an accelerator for dynamic memory allocation, a widely used programming paradigm that accounts for a significant fraction of total CPU cycles in a major cloud provider’s datacenters. The work presented in this dissertation identifies both the challenges and the opportunities in extracting maximum performance from acceleration at the system level, for traditional deep acceleration as well as broad acceleration in the cloud. We hope it will stimulate more interest and spur further research and development in holistic accelerator design.

Keywords

accelerator, aladdin, modeling, specialized architectures, system, holistic, soc, dnn, broad acceleration

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.