Publication: Early-Stage Non-Conventional Hardware Accelerator Discovery via Optimization Methods and Compiler Analysis
Abstract
In the post-Moore era, with diminishing returns from traditional transistor scaling, accelerator design methodologies have had to change in order to keep improving power, performance, and area (PPA) characteristics. To this end, High-Level Synthesis (HLS) tools have emerged as a prominent solution, translating high-level programs into specific Register-Transfer Level (RTL) implementations. These tools offer many implementation pathways, accommodating the growing demand for PPA optimization.
Concurrently, advanced design space exploration (DSE) utilities have significantly refined the simulation of hardware-software co-designed systems, delivering high-fidelity point solutions without the complexity of HLS toolchains or their accompanying simulators. However, these late-stage DSE tools have shortcomings: they impose a substantial manual burden on the developer to dissect the application. This task demands a nuanced understanding of computational granularity, partitioning overheads, and opportunities for reusing circuitry, insights that existing tools do not readily provide.
The prevailing trajectory in this field is to escalate the abstraction level at which designers can evaluate system configurations, thereby reducing reliance on HLS and late-stage DSE tools and enhancing automation. At the research frontier, we probe whether it is possible to derive insights into potential accelerator designs, hardware-software partitioning, and circuit reuse directly from the source code of applications coded in high-level languages like C++, supplemented by profiling data.
In this thesis, we identify the limitations and opportunities available at this level of abstraction. We discover coarse-grained patterns that help design area- and energy-efficient accelerators, as well as partitioning schemes that are aware of function call-graph hierarchies. We identify sequence alignment techniques, mixed integer linear programming, and machine learning for systems as well-suited methods to assist the next generation of system-design tools. We observe that related work does not analyze the impact of hardware models on the accelerator selection or pattern detection problems. Our analysis in these areas allows us to create tools that are better at selecting energy-, area-, and latency-efficient accelerators. Our approach extracts this valuable information in the earliest design stages with low error, yielding designs close to the global optimum.