Publication: Timely and Efficient Resource Management in Networked Systems
Date
2024-08-27
Authors
Xu, Zhiying
Citation
Xu, Zhiying. 2024. Timely and Efficient Resource Management in Networked Systems. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Resource management is ubiquitous in networked systems and ensures that resources are shared effectively among various demands. Examples include traffic engineering, cluster scheduling, and load balancing. Two key requirements in resource management are timeliness and efficiency. However, achieving both at once is challenging due to the large scale and high complexity of resource management. In industry, commercial solvers lack timeliness, while heuristics lack efficiency. Meanwhile, recent decomposition-based approaches from academia remain too slow and inefficient.
In this thesis, I build timely and efficient resource management systems based on two design principles: separating resources from demands to enable massive parallelism, and capturing complexity with machine learning. I will first discuss Teal, a traffic engineering algorithm for wide-area networks that uses highly parallelizable neural network inference on GPUs for timeliness while capturing complicated flow patterns on networks for efficiency. Subsequently, I will describe DeDe, a more general resource allocation framework that decouples resource and demand constraints and decomposes a large, complex resource allocation problem into small, simple per-resource and per-demand allocations solved in parallel. Beyond resource management, attack defense in networked systems also requires timeliness and efficiency, so I will introduce Xatu, which boosts DDoS attack detection by learning auxiliary signals that precede an imminent attack, such as spoofed traffic or small attacks. All the resulting systems achieve timeliness and high efficiency in large-scale evaluations on real data.
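A toy problem can illustrate the idea of decoupling resource and demand constraints, as DeDe does. This is a minimal sketch built on assumptions and is not DeDe's actual algorithm. It assumes an ADMM-style splitting, because the abstract does not name a solver. It also assumes a linear objective that maximizes total allocated volume, and that any demand may use any resource.

In the sketch, `X[i][j]` is the amount resource `i` gives demand `j`. A resource-side copy `X` respects per-resource capacities, and a demand-side copy `Z` respects per-demand limits. A scaled dual variable `U` pushes the two copies toward agreement. The helper names `project_capped` and `decoupled_allocate` are hypothetical.

```python
def project_capped(y, cap):
    """Euclidean projection of vector y onto {x >= 0, sum(x) <= cap}."""
    if cap <= 0:
        return [0.0] * len(y)
    clipped = [max(v, 0.0) for v in y]
    if sum(clipped) <= cap:
        return clipped
    # Otherwise the projection lies on the simplex {x >= 0, sum(x) == cap}:
    # find the shift theta via the standard sort-based rule.
    u = sorted(y, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, val in enumerate(u, start=1):
        cumsum += val
        t = (cumsum - cap) / k
        if val - t > 0:
            theta = t
    return [max(v - theta, 0.0) for v in y]


def decoupled_allocate(capacity, demand, rho=1.0, iters=200):
    """Hypothetical ADMM-style sketch: maximize total allocation subject to
    per-resource capacities (rows) and per-demand limits (columns)."""
    n, m = len(capacity), len(demand)
    X = [[0.0] * m for _ in range(n)]  # resource-side copy
    Z = [[0.0] * m for _ in range(n)]  # demand-side copy
    U = [[0.0] * m for _ in range(n)]  # scaled dual for the constraint X == Z
    for _ in range(iters):
        # Per-resource step: each row is an independent subproblem.
        # Minimizing -sum(x) + rho/2 * ||x - (Z - U)||^2 over the row's
        # feasible set equals projecting (Z - U + 1/rho) onto that set.
        for i in range(n):
            target = [Z[i][j] - U[i][j] + 1.0 / rho for j in range(m)]
            X[i] = project_capped(target, capacity[i])
        # Per-demand step: each column is an independent subproblem.
        for j in range(m):
            col = project_capped([X[i][j] + U[i][j] for i in range(n)], demand[j])
            for i in range(n):
                Z[i][j] = col[i]
        # Dual update: accumulate the disagreement between the two copies.
        for i in range(n):
            for j in range(m):
                U[i][j] += X[i][j] - Z[i][j]
    return Z
```

Consider capacities `[3, 2]` and demands `[2, 4]`. The total allocation should approach min(3 + 2, 2 + 4) = 5. Each row update touches only one resource, and each column update touches only one demand. The inner loops are therefore the independent subproblems that a real system could dispatch in parallel. Here they run sequentially for simplicity.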
Keywords
Computer science
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.