Publication: Timely and Efficient Resource Management in Networked Systems
Abstract
Resource management is ubiquitous in networked systems and ensures the effective sharing of resources among various demands. Examples include traffic engineering, cluster scheduling, and load balancing. Two key requirements in resource management are timeliness and efficiency. However, achieving the two requirements together is challenging due to the large scale and high complexity of resource management. In industry, commercial solvers lack timeliness, while heuristics lack efficiency. Meanwhile, recent decomposition-based approaches from academia remain too slow and inefficient.
In this thesis, I build timely and efficient resource management systems based on two design principles: separating resources and demands to enable massive parallelism, and capturing complexity with machine learning. I first discuss Teal, a traffic engineering algorithm for wide-area networks that uses highly parallelizable neural network inference on GPUs for timeliness while capturing complicated flow patterns in the network for efficiency. I then describe DeDe, a more general resource allocation framework that decouples resource and demand constraints and decomposes a large, complex resource allocation problem into small, simple per-resource and per-demand allocations solved in parallel. Beyond resource management, attack defense in networked systems also requires timeliness and efficiency. I introduce Xatu, which boosts DDoS attack detection by learning from auxiliary signals that precede an imminent attack, such as spoofed traffic or small attacks. All the resulting systems achieve timeliness and high efficiency in large-scale evaluations on real data.