Publication:
Timely and Efficient Resource Management in Networked Systems

Date

2024-08-27

Citation

Xu, Zhiying. 2024. Timely and Efficient Resource Management in Networked Systems. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Resource management is ubiquitous in networked systems and ensures the effective sharing of resources among various demands. Examples include traffic engineering, cluster scheduling, and load balancing. Two key requirements in resource management are timeliness and efficiency. However, achieving both together is challenging due to the large scale and high complexity of resource management. In industry, commercial solvers lack timeliness, while heuristics lack efficiency. Meanwhile, recent decomposition-based approaches from academia remain too slow and inefficient. In this thesis, I build timely and efficient resource management based on two design principles: separating resources and demands to enable massive parallelism, and capturing complexity with machine learning. I first discuss Teal, a traffic engineering algorithm for wide-area networks, which uses highly parallelizable neural network inference on GPUs for timeliness while capturing complicated flow patterns on networks for efficiency. I then describe a more general resource allocation framework, DeDe, which decouples resource and demand constraints and decomposes a large, complex resource allocation problem into small, simple per-resource and per-demand allocations that are solved in parallel. Outside of resource management, attack defense in networked systems also requires timeliness and efficiency; I introduce Xatu, which boosts DDoS attack detection by learning from auxiliary signals that precede an imminent attack, such as spoofed traffic or small attacks. All the resulting systems achieve timeliness and high efficiency in large-scale evaluations on real data.

Keywords

Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
