Publication: Improving Data Placement Decisions for Heterogeneous Clustered File Systems
Abstract
With the advent of cloud computing, datacenters are using distributed applications more than ever. MapReduce is used to process over 20 petabytes of data per day on large numbers of commodity servers (Dean & Ghemawat, 2008). Many companies use large-scale clusters to perform various computational tasks with Hadoop, the open-source MapReduce implementation (White, 2012), or operate a virtualized datacenter that lets them migrate virtual machines between physical hosts for high availability. As hardware economics change, a scalable cloud will likely need to mix node types, combining higher-performance, higher-capacity nodes with lower-performance, lower-capacity ones. This thesis presents an adaptive data placement method in the Nutanix distributed file system that addresses some common problems found in many heterogeneous clustered file systems.