Person: Smith, Michael
Last Name: Smith
First Name: Michael
Search Results (17 results; showing 1-10)
Publication: Eliminating voltage emergencies via software-guided code transformations (Association for Computing Machinery (ACM), 2010)
Authors: Reddi, Vijay Janapa; Campanoni, Simone; Gupta, Meeta S.; Smith, Michael; Wei, Gu-Yeon; Brooks, David; Hazelwood, Kim
Abstract: In recent years, circuit reliability in modern high-performance processors has become increasingly important. Shrinking feature sizes and diminishing supply voltages have made circuits more sensitive to microprocessor supply voltage fluctuations. These fluctuations result from the natural variation of processor activity as workloads execute, but when left unattended, they can lead to timing violations or even transistor lifetime issues. In this article, we present a hardware-software collaborative approach to mitigate voltage fluctuations. A checkpoint-recovery mechanism rectifies errors when voltage violates maximum tolerance settings, while a runtime software layer reschedules the program's instruction stream to prevent recurring violations at the same program location. The runtime layer, combined with the proposed code-rescheduling algorithm, removes 60% of all violations with minimal overhead, thereby significantly improving overall performance. Our solution is a radical departure from the ongoing industry-standard approach of circumventing the issue altogether by optimizing for the worst-case voltage swing, which severely compromises power and performance efficiency, especially looking ahead to future technology generations. Existing conservative approaches will have severe implications for the ability to deliver efficient microprocessors.
The proposed technique recasts a traditional reliability problem as a runtime performance optimization problem, allowing us to design processors for typical-case operation by building intelligent algorithms that can prevent recurring violations.

Publication: Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production Processors via Software-Guided Thread Scheduling (IEEE, 2010)
Authors: Reddi, Vijay Janapa; Kanev, Svilen; Kim, Wonyoung; Campanoni, Simone; Smith, Michael; Wei, Gu-Yeon; Brooks, David
Abstract: Parameter variations have become a dominant challenge in microprocessor design. Voltage variation is especially daunting because it happens so rapidly. We measure and characterize voltage variation in a running Intel Core 2 Duo processor. By sensing on-die voltage as the processor runs single-threaded, multi-threaded, and multi-program workloads, we determine the average supply voltage swing of the processor to be only 4 percent, far from the processor's 14 percent worst-case operating voltage margin. While such large margins guarantee correctness, they penalize performance and power efficiency. We investigate and quantify the benefits of designing a processor for typical-case (rather than worst-case) voltage swings, assuming that a fail-safe mechanism protects it from infrequently occurring large voltage fluctuations. With today's processors, such resilient designs could yield 15 to 20 percent performance improvements. But we also show that in future systems, these gains could be lost as increasing voltage swings intensify the frequency of fail-safe recoveries.
After characterizing the microarchitectural activity that leads to voltage swings within multi-core systems, we show that a voltage-noise-aware thread scheduler in software can co-schedule phases of different programs to mitigate error recovery overheads in future resilient processor designs.

Publication: Voltage Noise in Production Processors (Institute of Electrical & Electronics Engineers (IEEE), 2011)
Authors: Janapa Reddi, Vijay; Kanev, Svilen; Kim, Wonyoung; Campanoni, Simone; Smith, Michael; Wei, Gu-Yeon; Brooks, David
Abstract: Voltage variations are a major challenge in processor design. Here, researchers characterize the voltage noise characteristics of programs as they run to completion on a production Core 2 Duo processor. Furthermore, they characterize the implications of resilient architecture design for voltage variation in future systems.

Publication: Performance issues in correlated branch prediction schemes (1995)
Authors: Gloy, Nicolas; Smith, Michael; Young, Cliff
Abstract: Accurate static branch prediction is the key to many techniques for exposing, enhancing, and exploiting Instruction Level Parallelism (ILP). The initial work on static correlated branch prediction (SCBP) demonstrated improvements in branch prediction accuracy, but did not address overall performance. In particular, SCBP expands the size of executable programs, which negatively affects the performance of the instruction memory hierarchy. Using the profile information available under SCBP, we can minimize these negative performance effects through the application of code layout and branch alignment techniques. We evaluate the performance effect of SCBP and these profile-driven optimizations on instruction cache misses, branch mispredictions, and branch misfetches for a number of recent processor implementations. We find that SCBP improves performance over (traditional) per-branch static profile prediction. We also find that SCBP improves the performance benefits gained from branch alignment.
As expected, SCBP gives larger benefits on machine organizations with high mispredict/misfetch penalties and low cache miss penalties. Finally, we find that the application of profile-driven code layout and branch alignment techniques (without SCBP) can improve the performance of dynamic correlated branch prediction techniques.

Publication: Infrastructure for Research towards Ubiquitous Information Systems (1994)
Authors: Grosz, Barbara; Kung, H.; Seltzer, Margo; Shieber, Stuart; Smith, Michael
Abstract: The availability of fast, inexpensive computers and the growth of network technology have resulted in the proliferation of computing power and an enormous increase in information available in electronic form. However, most of the information stored on computers is extremely difficult for the common person to obtain. Thus, a central challenge for computer science and engineering in the next decade is to create the scientific and technological base for large-scale and easy-to-use information systems. These systems must work together in a coherent and cohesive manner, providing shared information easily to the general user. We refer to these systems as systems for ubiquitous information. The development of the National Information Infrastructure (NII) amplifies the urgent need for research in this area. We propose to develop a new-generation computing facility to support experimental research in ubiquitous information systems. The research to be carried out using this facility spans from the development of new technologies that support the rapid transmission of large amounts of data between computer systems to the development of more flexible and adaptable systems for human-computer communication.
The proposed infrastructure will include emerging equipment with new capabilities critical to this new research, such as Asynchronous Transfer Mode (ATM) networks capable of guaranteeing performance, file servers capable of handling video, and graphics workstations with advanced human interface capabilities. This equipment will supplement the basic computing and networking equipment typically found in computer science departments.

Publication: Modeling the Effects of Memory Hierarchy Performance on Throughput of Multithreaded Processors (2005)
Authors: Fedorova, Alexandra; Seltzer, Margo; Smith, Michael
Abstract: Understanding the relationship between the performance of the on-chip processor caches and the overall performance of the processor is critical for both hardware design and software program optimization. While this relationship is well understood for conventional processors, it is not understood for new multithreaded processors that hide a workload's memory latency by executing instructions from several threads in parallel. In this paper we present a model for estimating processor throughput as a function of cache hierarchy performance. Our model has a closed-form solution, is robust against a range of workloads and input parameters, and gives estimates of processor throughput that are within 13% of measured values for heterogeneous workloads. We demonstrate how this model can be used in an operating system scheduler tailored for multithreaded processor systems.

Publication: Abstract Execution in a Multi-Tasking Environment (1994)
Authors: Mazières, David; Smith, Michael
Abstract: Tracing software execution is an important part of understanding system performance. Raw CPU power has been increasing at a rate far greater than memory and I/O bandwidth, with the result that the performance of client/server and I/O-bound applications is not scaling as one might hope.
Unfortunately, the behavior of these types of applications is particularly sensitive to the kinds of distortion induced by traditional tracing methods, so that current traces are either incomplete or of questionable accuracy. Abstract execution is a powerful tracing technique which was invented to speed the tracing of single processes and to store trace data more compactly. In this work, abstract execution was extended to trace multi-tasking workloads. The resulting system is more than 5 times faster than other current methods of gathering multi-tasking traces, and can therefore generate traces with far less time distortion.

Publication: The Impact of Operating System Structure on Personal Computer Performance (1995)
Authors: Chen, J. Bradley; Endo, Yashuhiro; Chan, Kee; Mazieres, David; Dias, Antonio; Seltzer, Margo; Smith, Michael
Abstract: This paper presents a comparative study of the performance of three operating systems that run on the personal computer architecture derived from the IBM PC. The operating systems, Windows for Workgroups (tm), Windows NT (tm), and NetBSD (a freely available UNIX (tm) variant), cover a broad range of system functionality and user requirements, from a single address space model to full protection with preemptive multi-tasking. Our measurements were enabled by hardware counters in Intel's Pentium (tm) processor that permit measurement of a broad range of processor events, including instruction counts and on-chip cache miss rates. We used both microbenchmarks, which expose specific differences between the systems, and application workloads, which provide an indication of expected end-to-end performance. Our microbenchmark results show that accessing system functionality is more expensive in Windows than in the other two systems due to frequent changes in machine mode and the use of system call hooks. When running native applications, Windows NT is more efficient than Windows, but it does incur overhead from its microkernel structure.
Overall, system functionality can be accessed most efficiently in NetBSD; we attribute this to its monolithic structure and to the absence of the complications created by backwards compatibility in the other systems. Measurements of application performance show that the impact of these differences is significant in terms of overall execution time.

Publication: A Comparative Analysis of Schemes for Correlated Branch Prediction (1995)
Authors: Young, Cliff; Gloy, Nicolas; Smith, Michael
Abstract: Modern high-performance architectures require extremely accurate branch prediction to overcome the performance limitations of conditional branches. We present a framework that categorizes branch prediction schemes by the way in which they partition dynamic branches and by the kind of predictor that they use. The framework allows us to compare and contrast branch prediction schemes, and to analyze why they work. We use the framework to show how a static correlated branch prediction scheme increases branch bias and thus improves overall branch prediction accuracy. We also use the framework to identify the fundamental differences between static and dynamic correlated branch prediction schemes. This study shows that there is room to improve the prediction accuracy of existing branch prediction schemes.

Publication: Cache-Fair Thread Scheduling for Multicore Processors (2006)
Authors: Fedorova, Alexandra; Seltzer, Margo; Smith, Michael
Abstract: We present a new operating system scheduling algorithm for multicore processors. Our algorithm reduces the effects of unequal CPU cache sharing that occur on these processors and cause unfair CPU sharing, priority inversion, and inadequate CPU accounting. We describe the implementation of our algorithm in the Solaris operating system and demonstrate that it produces fairer schedules, enabling better priority enforcement and improved performance stability for applications.
With conventional scheduling algorithms, application performance on multicore processors varies by up to 36% depending on the runtime characteristics of concurrent processes. We reduce this variability by up to a factor of seven.
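The cache-fair idea described in the last abstract (compensating threads whose effective performance is degraded by co-runners' cache pressure) can be illustrated with a small sketch. This is a hypothetical toy model, not the paper's Solaris implementation: the function names, the `penalty` fractions, and the linear timeslice-compensation rule are all illustrative assumptions, whereas the real algorithm relies on hardware performance counters and a more detailed performance model.

```python
# Toy sketch of cache-fair timeslice adjustment (hypothetical model and
# numbers; the published algorithm uses hardware counters and a more
# detailed model inside the Solaris scheduler).

def fair_share_ipc(measured_ipc, co_runner_penalty):
    """Estimate the IPC a thread would achieve under equal cache sharing.
    `co_runner_penalty` is the assumed fraction of performance lost to
    co-runners' cache pressure."""
    return measured_ipc / (1.0 - co_runner_penalty)

def adjust_timeslices(threads, base_slice_ms=10.0):
    """Lengthen a thread's timeslice in proportion to how far its measured
    IPC falls below its estimated fair-share IPC, so that CPU time received
    tracks the work the thread would have done under fair cache sharing."""
    slices = {}
    for t in threads:
        fair = fair_share_ipc(t["ipc"], t["penalty"])
        # A thread running below its fair share gets a proportionally
        # longer slice; a thread at its fair share keeps the base slice.
        slices[t["name"]] = base_slice_ms * (fair / t["ipc"])
    return slices

threads = [
    {"name": "A", "ipc": 0.8, "penalty": 0.2},  # hurt by cache contention
    {"name": "B", "ipc": 1.0, "penalty": 0.0},  # unaffected co-runner
]
print(adjust_timeslices(threads))  # thread A receives a longer slice than B
```

In this sketch, thread A's measured IPC of 0.8 against an estimated fair-share IPC of 1.0 earns it a 12.5 ms slice versus B's 10 ms, which captures the abstract's point: equalizing effective progress rather than raw CPU time restores fair sharing and priority enforcement.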