Publication:
Detecting System Anomalies Using Kernel-level Data Provenance

Date

2022-05-11

Citation

Han, Xueyuan. 2022. Detecting System Anomalies Using Kernel-level Data Provenance. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Modern cyber-attacks are increasingly difficult to detect. For example, advanced persistent threats (APTs) slowly infiltrate a network of systems and can remain undetected for months. Alternatively, attackers can corrupt digital supply chains to distribute malware from trusted software vendors, infesting millions of systems for weeks before the malware (and the corrupted supply channels) is discovered. We develop anomaly-based techniques to detect attacks without a priori attack knowledge. A key insight of our work is that attackers frequently request system-level services, such as system calls, to accomplish their mission; however, they manipulate kernel objects in ways that deviate from normal system behavior. Therefore, by analyzing interactions between these objects, we can identify anomalies that signify attack footprints. To this end, we leverage kernel-level data provenance, which represents the history of system execution as a directed acyclic graph (DAG) called a provenance graph, as the information source to analyze these interactions.
To detect APTs, we present Unicorn. Unicorn continuously monitors host behavior to detect anomalous system activities that hide amongst normal operations for long periods of time. To do so efficiently as the provenance graph grows large over time, Unicorn summarizes essential properties of the graph into a time series of compact graph sketches. Graph sketches that deviate significantly from known benign sketches indicate the presence of an attack.
To detect malicious software installation, we introduce SIGL. SIGL learns kernel interactions of benign software installations to pinpoint unusual graph patterns from specific installation processes that are likely launched by malware. To do so, SIGL adapts graph-based machine learning (ML) techniques to system-level provenance graphs and uses an unsupervised learning architecture to explain its detection.
In summary, this dissertation demonstrates that modern attacks can exhibit distinguishable system-level behaviors; that these behaviors result in anomalies that are captured in kernel-level data provenance; and that provenance graphs can be used to detect system anomalies to perform effective intrusion and malware detection.
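
The idea of comparing a compact summary of recent provenance activity against known benign summaries can be illustrated with a deliberately simplified sketch. This is not Unicorn's sketching algorithm, which the abstract does not specify; it assumes a hypothetical edge format (source kind, relation, destination kind), uses a normalized histogram of edge labels as a stand-in for a graph sketch, and uses cosine distance with an arbitrary threshold as a stand-in for the deviation test.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Tuple

# Hypothetical provenance edge: (source kind, relation, destination kind).
Edge = Tuple[str, str, str]
Sketch = Dict[Edge, float]


def sketch(edges: List[Edge]) -> Sketch:
    """Summarize one time window of edges as a normalized label histogram."""
    counts = Counter(edges)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def cosine_distance(a: Sketch, b: Sketch) -> float:
    """Return 1 - cosine similarity; 1.0 if either sketch is empty."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def is_anomalous(window: List[Edge], benign: List[Sketch],
                 threshold: float = 0.5) -> bool:
    """Flag a window whose sketch is far from every known benign sketch.

    The threshold is an illustrative assumption, not a tuned value.
    With no benign sketches available, every window is flagged.
    """
    current = sketch(window)
    nearest = min((cosine_distance(current, b) for b in benign), default=1.0)
    return nearest > threshold
```

In this toy setting, a window made up of the same kinds of interactions as a benign window has a distance near zero and is not flagged, while a window dominated by interactions never seen in benign activity has a distance near one and is flagged.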

Keywords

Applied Machine Learning, Computer Security, Computer Systems, Data Provenance, Intrusion Detection, Malware Detection, Computer Science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
