Person:
Brooks, David

Loading...
Profile Picture

Email Address

AA Acceptance Date

Birth Date

Research Projects

Organizational Units

Job Title

Last Name

Brooks

First Name

David

Name

Brooks, David

Search Results

Now showing 1 - 10 of 19
  • Thumbnail Image
    Publication
    Eliminating voltage emergencies via software-guided code transformations
    (Association for Computing Machinery (ACM), 2010) Reddi, Vijay Janapa; Campanoni, Simone; Gupta, Meeta S.; Smith, Michael; Wei, Gu-Yeon; Brooks, David; Hazelwood, Kim
    In recent years, circuit reliability in modern high-performance processors has become increasingly important. Shrinking feature sizes and diminishing supply voltages have made circuits more sensitive to microprocessor supply voltage fluctuations. These fluctuations result from the natural variation of processor activity as workloads execute, but when left unattended, these voltage fluctuations can lead to timing violations or even transistor lifetime issues. In this article, we present a hardware--software collaborative approach to mitigate voltage fluctuations. A checkpoint-recovery mechanism rectifies errors when voltage violates maximum tolerance settings, while a runtime software layer reschedules the program's instruction stream to prevent recurring violations at the same program location. The runtime layer, combined with the proposed code-rescheduling algorithm, removes 60% of all violations with minimal overhead, thereby significantly improving overall performance. Our solution is a radical departure from the ongoing industry-standard approach to circumvent the issue altogether by optimizing for the worst-case voltage flux, which compromises power and performance efficiency severely, especially looking ahead to future technology generations. Existing conservative approaches will have severe implications on the ability to deliver efficient microprocessors. The proposed technique reassembles a traditional reliability problem as a runtime performance optimization problem, thus allowing us to design processors for typical case operation by building intelligent algorithms that can prevent recurring violations.
  • Thumbnail Image
    Publication
    HELIX: Automatic Parallelization of Irregular Programs for Chip Multiprocessing.
    (Association for Computing Machinery, 2012) Campanoni, Simone; Jones, Timothy; Holloway, Glenn; Reddi, Vijay Janapa; Wei, Gu-Yeon; Brooks, David
    We describe and evaluate HELIX, a new technique for automatic loop parallelization that assigns successive iterations of a loop to separate threads. We show that the inter-thread communication costs forced by loop-carried data dependences can be mitigated by code optimization, by using an effective heuristic for selecting loops to parallelize, and by using helper threads to prefetch synchronization signals. We have implemented HELIX as part of an optimizing compiler framework that automatically selects and parallelizes loops from general sequential programs. The framework uses an analytical model of loop speedups, combined with profile data, to choose loops to parallelize. On a six-core Intel® Core i7-980X, HELIX achieves speedups averaging 2.25 x, with a maximum of 4.12x, for thirteen C benchmarks from SPEC CPU2000.
  • Thumbnail Image
    Publication
    Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production Processors via Software-Guided Thread Scheduling
    (IEEE, 2010) Reddi, Vijay Janapa; Kanev, Svilen; Kim, Wonyoung; Campanoni, Simone; Smith, Michael; Wei, Gu-Yeon; Brooks, David
    Parameter variations have become a dominant challenge in microprocessor design. Voltage variation is especially daunting because it happens so rapidly. We measure and characterize voltage variation in a running Intel Core2 Duo processor. By sensing on-die voltage as the processor runs single-threaded, multi-threaded, and multi-program workloads, we determine the average supply voltage swing of the processor to be only 4 percent, far from the processor's 14percent worst-case operating voltage margin. While such large margins guarantee correctness, they penalize performance and power efficiency. We investigate and quantify the benefits of designing a processor for typical-case (rather than worst-case) voltage swings, assuming that a fail-safe mechanism protects it from infrequently occurring large voltage fluctuations. With today's processors, such resilient designs could yield 15 percent to 20 percent performance improvements. But we also show that in future systems, these gains could be lost as increasing voltage swings intensify the frequency of fail-safe recoveries. After characterizing micro architectural activity that leads to voltage swings within multi-core systems, we show that a voltage-noise-aware thread scheduler in software can co-schedule phases of different programs to mitigate error recovery overheads in future resilient processor designs.
  • Thumbnail Image
    Publication
    A fully integrated battery-connected switched-capacitor 4:1 voltage regulator with 70% peak efficiency using bottom-plate charge recycling
    (2016-02-24) Tong, Tao; Zhang, Xuan; Kim, Wonyoung; Brooks, David; Wei, Gu-Yeon
    This work presents a switched-capacitor (SC) DC-DC voltage regulator that converts a 3.7V battery voltage down to ~0.8V in order to power the `brain' SoC of a flapping-wing microrobotic bee. A cascade of two 2:1 SC converters offers high efficiency for a 4:1 conversion ratio. A charge recycling technique reduces the flying capacitor's bottom-plate parasitic loss by 50% and overall conversion efficiency reaches 70%. The output droop is less than 10% of the nominal output voltage for a worst-case 47mA load step.
  • Thumbnail Image
    Publication
    A fully-integrated 3-level DC/DC converter for nanosecond-scale DVS with fast shunt regulation
    (IEEE, 2011) Kim, Wonyoung; Brooks, David; Wei, Gu-Yeon
    In recent years, chip multiprocessor architectures have emerged to scale performance while staying within tight power constraints. This trend motivates per core/block dynamic voltage and frequency scaling (DVFS) with fast voltage transition. Given the high cost and bulk of off-chip DC/DC converters to implement multiple on-chip power domains, there has been a surge of interest in on-chip converters. This paper presents the design and experimental results of a fully integrated 3-level DC/DC converter that merges characteristics of both inductor-based buck and switched-capacitor (SC) converters. While off-chip buck converters show high conversion efficiency, their on-chip counterparts suffer from loss due to low quality inductors. With the help of flying capacitors, the 3-level converter requires smaller inductors than the buck converter, reducing loss and on-die area overhead. Compared to SC converters that need more com plex structures to regulate higher than half the input voltage, 3-level converters can efficiently regulate the output voltage across a wide range of levels and load currents. Measured results from a 130nm CMOS test-chip prototype demon strate nanosecond-scale voltage transition times and peak conversion efficiency of 77%.
  • Thumbnail Image
    Publication
    Voltage Noise in Production Processors
    (Institute of Electrical & Electronics Engineers (IEEE), 2011) Janapa Reddi, Vijay; Kanev, Svilen; Kim, Wonyoung; Campanoni, Simone; Smith, Michael; Wei, Gu-Yeon; Brooks, David
    Voltage variations are a major challenge in processor design. Here, researchers characterize the voltage noise characteristics of programs as they run to completion on a production Core 2 Duo processor. Furthermore, they characterize the implications of resilient architecture design for voltage variation in future systems.
  • Thumbnail Image
    Publication
    Helix: Making the Extraction of Thread-Level Parallelism Mainstream
    (Institute of Electrical & Electronics Engineers (IEEE), 2012) Campanoni, Simone; Jones, Timothy; Holloway, Glenn; Wei, Gu-Yeon; Brooks, David
    Improving system performance increasingly depends on exploiting microprocessor parallelism, yet mainstream compilers still don't parallelize code automatically. Helix automatically parallelizes general-purpose programs without requiring any special hardware; avoids slowing down compiled programs, making it a suitable candidate for mainstream compilers; and outperforms the most similar historical technique that has been implemented in production compilers.
  • Thumbnail Image
    Publication
    The HELIX project
    (IEEE, 2012) Campanoni, Simone; Jones, Timothy; Holloway, Glenn; Wei, Gu-Yeon; Brooks, David
    Parallelism has become the primary way to maximize processor performance and power efficiency. But because creating parallel programs by hand is difficult and prone to error, there is an urgent need for automatic ways of transforming conventional programs to exploit modern multicore systems. The HELIX compiler transformation is one such technique that has proven effective at parallelizing individual sequential programs automatically for a real six-core processor. We describe that transformation in the context of the broader HELIX research project, which aims to optimize the throughput of a multicore processor by coordinated changes in its architecture, its compiler, and its operating system. The goal is to make automatic parallelization mainstream in multiprogramming settings through adaptive algorithms for extracting and tuning thread-level parallelism.
  • Thumbnail Image
    Publication
    The accelerator store
    (Association for Computing Machinery (ACM), 2012) Lyons, Michael John; Hempstead, Mark; Wei, Gu-Yeon; Brooks, David
    In recent years, circuit reliability in modern high-performance processors has become increasingly important. Shrinking feature sizes and diminishing supply voltages have made circuits more sensitive to microprocessor supply voltage fluctuations. These fluctuations result from the natural variation of processor activity as workloads execute, but when left unattended, these voltage fluctuations can lead to timing violations or even transistor lifetime issues. In this article, we present a hardware--software collaborative approach to mitigate voltage fluctuations. A checkpoint-recovery mechanism rectifies errors when voltage violates maximum tolerance settings, while a runtime software layer reschedules the program's instruction stream to prevent recurring violations at the same program location. The runtime layer, combined with the proposed code-rescheduling algorithm, removes 60% of all violations with minimal overhead, thereby significantly improving overall performance. Our solution is a radical departure from the ongoing industry-standard approach to circumvent the issue altogether by optimizing for the worst-case voltage flux, which compromises power and performance efficiency severely, especially looking ahead to future technology generations. Existing conservative approaches will have severe implications on the ability to deliver efficient microprocessors. The proposed technique reassembles a traditional reliability problem as a runtime performance optimization problem, thus allowing us to design processors for typical case operation by building intelligent algorithms that can prevent recurring violations.
  • Thumbnail Image
    Publication
    Shrink-Fit: A Framework for Flexible Accelerator Sizing
    (Institute of Electrical & Electronics Engineers (IEEE), 2013) Lyons, Michael John; Wei, Gu-Yeon; Brooks, David
    RTL design complexity discouraged adoption of reconfigurable logic in general purpose systems, impeding opportunities for performance and energy improvements. Recent improvements to HLS compilers simplify RTL design and are easing this barrier. A new challenge will emerge: managing reconfigurable resources between multiple applications with custom hardware designs. In this paper, we propose a method to "shrink-fit" accelerators within widely varying fabric budgets. Shrink-fit automatically shrinks existing accelerator designs within small fabric budgets and grows designs to increase performance when larger budgets are available. Our method takes advantage of current accelerator design techniques and introduces a novel architectural approach based on fine-grained virtualization. We evaluate shrink-fit using a synthesized implementation of an IDCT for decoding JPEGs and show the IDCT accelerator can shrink by a factor of 16x with minimal performance and area overheads. Using shrink-fit, application designers can achieve the benefits of hardware acceleration with single RTL designs on FPGAs large and small.