Detecting System Anomalies Using Kernel-level Data Provenance
Date
2022-05-11
Authors
Han, Xueyuan
The Harvard community has made this article openly available.
Citation
Han, Xueyuan. 2022. Detecting System Anomalies Using Kernel-level Data Provenance. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Modern cyber-attacks are increasingly difficult to detect. For example, advanced persistent threats (APTs) slowly infiltrate a network of systems and can remain undetected for months. Similarly, attackers can compromise digital supply chains to distribute malware through trusted software vendors, infecting millions of systems for weeks before the malware (and the compromised supply channel) is discovered.
We develop anomaly-based techniques that detect attacks without a priori knowledge of them. A key insight of our work is that attackers frequently request system-level services, such as system calls, to accomplish their mission, and in doing so they manipulate kernel objects in ways that deviate from normal system behavior. Therefore, by analyzing the interactions between these objects, we can identify anomalies that signify attack footprints. To analyze these interactions, we use kernel-level data provenance, which represents the history of system execution as a directed acyclic graph (DAG) called a provenance graph.
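A minimal sketch of how kernel-level interactions could be recorded as a provenance DAG, assuming (hypothetically) that each event arrives as a (source object, relation, destination object) triple. The class `ProvenanceGraph` and its methods are illustrative, not the dissertation's implementation. To keep the graph acyclic, the sketch creates a new version of the destination object on every event; object versioning is a common approach in provenance systems, but it is an assumption here rather than something stated in the abstract.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical provenance DAG whose nodes are (object_id, version) pairs."""
    versions: dict = field(default_factory=lambda: defaultdict(int))
    labels: dict = field(default_factory=dict)  # node -> object type
    edges: list = field(default_factory=list)   # (src_node, dst_node, relation)

    def _node(self, obj_id, obj_type):
        # Current version of a kernel object (process, file, socket, ...).
        node = (obj_id, self.versions[obj_id])
        self.labels[node] = obj_type
        return node

    def record(self, src_id, src_type, relation, dst_id, dst_type):
        """Record that information flowed from src to dst (e.g., a process writes a file)."""
        src = self._node(src_id, src_type)
        old_dst = self._node(dst_id, dst_type)
        # Each event creates a new version of the destination, so every edge
        # points to a freshly created node and no cycle can ever form.
        self.versions[dst_id] += 1
        new_dst = self._node(dst_id, dst_type)
        self.edges.append((old_dst, new_dst, "version"))
        self.edges.append((src, new_dst, relation))
        return new_dst

    def is_acyclic(self):
        """Kahn's algorithm: the graph is a DAG iff every node can be topologically sorted."""
        indegree = {node: 0 for node in self.labels}
        successors = defaultdict(list)
        for src, dst, _ in self.edges:
            successors[src].append(dst)
            indegree[dst] += 1
        queue = deque(node for node, d in indegree.items() if d == 0)
        visited = 0
        while queue:
            node = queue.popleft()
            visited += 1
            for nxt in successors[node]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    queue.append(nxt)
        return visited == len(indegree)
```

For example, a shell that writes `/tmp/x` and later reads it back would form a cycle if each object were a single node; with versioning, the read produces a new version of the shell process and the graph stays a DAG.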
To detect APTs, we present Unicorn, which continuously monitors host behavior to detect anomalous system activities that hide among normal operations for long periods of time. To remain efficient as the provenance graph grows over time, Unicorn summarizes essential properties of the graph into a time series of compact graph sketches. Graph sketches that deviate significantly from known benign sketches indicate the presence of an attack.
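The idea of summarizing a growing graph into a time series of fixed-size sketches can be illustrated with a deliberately simplified stand-in. This is an assumption, not Unicorn's actual algorithm: it relabels each node once with its in-neighbors' labels (a single Weisfeiler-Lehman-style step), feature-hashes the resulting label counts into a fixed-length vector, and flags a snapshot whose best cosine similarity to any benign sketch falls below a threshold. `SKETCH_DIM`, `THRESHOLD`, and all function names are hypothetical.

```python
import hashlib
import math
from collections import Counter

SKETCH_DIM = 256  # hypothetical sketch length
THRESHOLD = 0.9   # hypothetical minimum similarity to some benign sketch


def substructure_histogram(edges, labels):
    """Count node labels extended with sorted (relation, in-neighbor label) pairs."""
    incoming = {node: [] for node in labels}
    for src, dst, relation in edges:
        incoming[dst].append(f"{relation}:{labels[src]}")
    return Counter(
        labels[node] + "|" + ",".join(sorted(incoming[node])) for node in labels
    )


def sketch(histogram, dim=SKETCH_DIM):
    """Feature-hash a histogram into a unit-length vector of fixed size."""
    vec = [0.0] * dim
    for feature, count in histogram.items():
        digest = hashlib.blake2b(feature.encode(), digest_size=8).digest()
        vec[int.from_bytes(digest, "big") % dim] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def similarity(a, b):
    # Cosine similarity; both sketches are already unit length.
    return sum(x * y for x, y in zip(a, b))


def monitor(snapshots, benign_sketches, threshold=THRESHOLD):
    """Sketch each (edges, labels) snapshot in time order and flag outliers."""
    flags = []
    for edges, labels in snapshots:
        s = sketch(substructure_histogram(edges, labels))
        best = max(similarity(s, b) for b in benign_sketches)
        flags.append(best < threshold)
    return flags
```

Feature hashing keeps every sketch the same size no matter how large the graph grows, which is the property that makes long-running monitoring tractable; the price is that unrelated substructures can occasionally collide in the same bucket.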
To detect malicious software installations, we introduce SIGL. SIGL learns the kernel-level interactions of benign software installations and uses this knowledge to pinpoint unusual graph patterns in installation processes that are likely launched by malware. To do so, SIGL adapts graph-based machine learning (ML) techniques to system-level provenance graphs and uses an unsupervised learning architecture that can explain its detections.
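SIGL's actual model is a graph-based ML architecture; the sketch below swaps in a much simpler unsupervised technique, a rank-k linear autoencoder (PCA) over per-node feature vectors, to show the general idea: learn from benign installations only, score each node of a new installation graph by how poorly it is reconstructed, and explain an alert by ranking the worst-reconstructed nodes. The node feature vectors, `k`, the threshold, and all function names are hypothetical.

```python
import numpy as np


def fit_benign_model(benign_features, k=2):
    """Fit a rank-k linear autoencoder (PCA) on node features from benign installs."""
    X = np.asarray(benign_features, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, sorted by singular value; keep the top k.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def node_errors(model, features):
    """Per-node reconstruction error: encode to k dims, decode, measure the residual."""
    mean, components = model
    X = np.asarray(features, dtype=float) - mean
    recon = X @ components.T @ components
    return np.linalg.norm(X - recon, axis=1)


def explain(model, node_names, features, threshold, top_k=3):
    """Flag the graph if any node is poorly reconstructed; return the worst nodes."""
    errors = node_errors(model, features)
    order = np.argsort(errors)[::-1][:top_k]
    ranked = [(node_names[i], float(errors[i])) for i in order]
    return bool(errors.max() > threshold), ranked
```

Because the model only ever sees benign data, anything it reconstructs poorly is suspicious, and the per-node errors double as an explanation: the top-ranked nodes are where an analyst would look first.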
In summary, this dissertation demonstrates that modern attacks can exhibit distinguishable system-level behaviors; that these behaviors produce anomalies captured in kernel-level data provenance; and that provenance graphs can be used to detect these anomalies, enabling effective intrusion and malware detection.
Keywords
Applied Machine Learning, Computer Security, Computer Systems, Data Provenance, Intrusion Detection, Malware Detection, Computer Science
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.