Performance of Multithreaded Chip Multiprocessors And Implications for Operating System Design

Title: Performance of Multithreaded Chip Multiprocessors And Implications for Operating System Design
Author: Fedorova, Alexandra; Seltzer, Margo I.; Small, Christopher A.; Nussbaum, Daniel

Note: Order does not necessarily reflect citation order of authors.

Citation: Fedorova, Alexandra, Margo Seltzer, Christopher Small, and Daniel Nussbaum. 2005. Performance of Multithreaded Chip Multiprocessors And Implications for Operating System Design. Harvard Computer Science Group Technical Report TR-09-05.
Abstract: An operating system’s design is often influenced by the architecture of the target hardware. While uniprocessor and multiprocessor architectures are well understood, such is not the case for multithreaded chip multiprocessors (CMT), a new generation of processors designed to improve the performance of memory-intensive applications. The first systems equipped with CMT processors are just becoming available, so it is critical that we now understand how to obtain the best performance from such systems. The goal of our work is to understand the fundamentals of CMT performance and identify the implications for operating system design. We have analyzed how the performance of a CMT processor is affected by contention for the processor pipeline, the L1 data cache, and the L2 cache, and have investigated operating system approaches to the management of these performance-critical resources. Having found that contention for the L2 cache can have the greatest negative impact on processor performance, we have quantified the potential performance improvement that can be achieved from L2-aware OS scheduling. We evaluated a scheduling policy based on the balance-set principle and found that it has the potential to reduce miss ratios in the L2 cache by 19-37% and improve processor throughput by 27-45%. Achieving a similar improvement in hardware would require doubling the size of the L2 cache.
Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:24829606