dc.contributor.advisor  Brooks, David
dc.contributor.advisor  Wei, Gu-Yeon
dc.contributor.author  Kanev, Svilen
dc.date.accessioned  2018-12-20T08:11:25Z
dc.date.created  2017-03
dc.date.issued  2017-01-25
dc.date.submitted  2017
dc.identifier.uri  http://nrs.harvard.edu/urn-3:HUL.InstRepos:37945003
dc.description.abstract  Computation has been steadily migrating from isolated on-premise deployments to the datacenters of a small number of large-scale cloud providers. The datacenters powering the cloud, also known as warehouse-scale computers (WSCs), face a unique set of design constraints, balancing efficiency at scale against ever-growing application demands for performance. Designing next-generation server platforms for WSCs after the end of Dennard scaling is one of the most important challenges facing computer architects. To guide such designs, we performed what is, to the best of our knowledge, the first longitudinal profiling study of a live production WSC. Our performance measurements span tens of thousands of machines over several years, collected while those machines served requests from billions of users. Although we observe significant diversity in both applications and architectural behavior, patterns emerge. We identify a "datacenter tax": a set of shared low-level software components that together account for almost 30% of all processor cycles in production datacenters. The constituents of this tax, the components necessary for distributed computation (data serialization, compression, etc.), are also prime candidates for optimization, both in software and through specialized hardware. Specialized hardware has especially high potential upside, but it requires accelerators that differ markedly from traditional designs. These new "broad" accelerators face a unique set of challenges. Because calls to tax routines tend to be frequent, short, and interspersed with other application code, accelerators must be optimized for latency rather than throughput. Moreover, because each accelerator yields only a limited overall application speedup, its overheads must be kept to a bare minimum. We demonstrate by construction that meeting these constraints, while non-trivial, is possible.
Our memory allocation accelerator, Mallacc, reduces the latency of already fast malloc calls by up to 50% while occupying only 0.006% of the silicon area of a typical high-performance core. This thesis identifies the opportunity for broad acceleration and presents first steps toward designing datacenter tax accelerators. We expect it to spur additional interest from industry and academia and to help bridge the gap between research on datacenters and research on specialized hardware.
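As an illustrative aside (not taken from the thesis), Amdahl's law makes the overhead argument above concrete. Assume a hypothetical tax routine accounts for a fraction $f$ of cycles, an accelerator speeds that routine up by a factor $s$, and invoking the accelerator adds an overhead $o$, expressed as a fraction of the original runtime. The values $f = 0.05$, $s = 2$, and $o = 0.02$ are made up for illustration only.

```latex
\begin{aligned}
S(f, s, o) &= \frac{1}{(1 - f) + f/s + o},
  \qquad \lim_{s \to \infty} S(f, s, 0) = \frac{1}{1 - f} \\
S(0.05,\, 2,\, 0) &= \frac{1}{0.95 + 0.025} = \frac{1}{0.975} \approx 1.026 \\
S(0.05,\, 2,\, 0.02) &= \frac{1}{0.95 + 0.025 + 0.02} = \frac{1}{0.995} \approx 1.005
\end{aligned}
```

Under these assumptions, even an infinitely fast accelerator for a 5% component caps whole-application speedup at about 5.3%, and an invocation overhead of just 2% of runtime shrinks the 2.6% gain from a twofold speedup to roughly 0.5%.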
dc.description.sponsorship  Engineering and Applied Sciences - Computer Science
dc.format.mimetype  application/pdf
dc.language.iso  en
dash.license  LAA
dc.subject  Computer Science
dc.title  Efficiency in warehouse-scale computers: a datacenter tax study
dc.type  Thesis or Dissertation
dash.depositing.author  Kanev, Svilen
dc.date.available  2018-12-20T08:11:25Z
thesis.degree.date  2017
thesis.degree.grantor  Graduate School of Arts & Sciences
thesis.degree.level  Doctoral
thesis.degree.name  Doctor of Philosophy
dc.contributor.committeeMember  Kohler, Eddie
dc.contributor.committeeMember  Moseley, Tipp
dc.type.material  text
thesis.degree.department  Engineering and Applied Sciences - Computer Science
dash.identifier.vireo  http://etds.lib.harvard.edu/gsas/admin/view/1360
dc.description.keywords  computer architecture; datacenter; warehouse-scale computer; accelerator
dash.author.email  svilen.kanev@gmail.com