Publication:
Systems and Algorithms for Efficient, Secure and Private Machine Learning Inference

Date

2024-04-15

Citation

Lam, Maximilian. 2024. Systems and Algorithms for Efficient, Secure and Private Machine Learning Inference. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

As artificial intelligence and machine learning become ubiquitous, data privacy emerges as a critical concern. The use of sensitive data in machine learning applications exposes vulnerabilities that could jeopardize user privacy, posing ethical and legal risks. Protecting privacy in current machine learning systems requires significant modifications, such as on-device computation or encryption, which increase computational cost and may reduce accuracy; these computational challenges are the key barrier to adoption. Addressing the challenges at the intersection of machine learning, data privacy, and computational efficiency is therefore essential for the future deployment of privacy-enhanced machine learning systems.

My PhD focuses on this intersection of machine learning, data privacy, and systems, with the high-level aim of making privacy-enhanced machine learning techniques efficient enough to deploy. Over the course of my PhD I have developed systems and algorithms that accelerate privacy-preserving machine learning inference, including on-device inference and secure neural network inference, by up to an order of magnitude. These gains come from leveraging unique aspects of neural networks such as quantization, harnessing systems and hardware acceleration techniques such as GPU acceleration, and co-designing these hardware-software optimizations with the specific privacy-preserving algorithm to obtain maximal efficiency at inference time, all while remaining cognizant of, and defending against, attack vectors (e.g., data privacy leaks through gradients in federated learning) that may compromise the security and privacy of the machine learning system. My PhD pushes the boundary of solutions towards machine learning systems that are simultaneously efficient, private, and secure.

Towards efficiency, we develop \textsc{PrecisionBatching}, a general neural network acceleration technique that uses quantization to accelerate neural network inference by up to $2\times$, maximizing GPU utilization at small batch sizes by turning a memory-bound operation into a compute-bound one. Although \textsc{PrecisionBatching} makes non-private neural network inference more efficient, it is a crucial step towards applying quantization to privacy-preserving machine learning systems, and it demonstrates the importance of leveraging hardware acceleration, specifically GPU acceleration, for obtaining maximal system performance.

Towards privacy, we develop \textsc{Tabula}, an approach that uses quantization to enable secure lookup tables, speeding up the private computation of neural network activation functions by over $100\times$. \textsc{Tabula} enables private neural network inference that is over an order of magnitude more efficient in runtime and communication than prior work, enabling the real-world deployment of secure neural network inference applications. We furthermore develop \textsc{GPU-DPF}, a GPU algorithm that accelerates distributed point functions (DPFs) for private information retrieval by over $30\times$ relative to a CPU by massively parallelizing the expensive cryptographic primitives, enabling private on-device machine learning inference with embedding tables too large to store on device.
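To make the private-retrieval setting concrete, the following is a minimal illustrative sketch, not the dissertation's implementation, of the two-server functionality that a distributed point function provides: the client secret-shares a one-hot selection vector between two non-colluding servers, each server answers using only its share, and the client reconstructs the requested embedding row without either server learning which row was read. All names, sizes, and parameters here are hypothetical; a real DPF compresses these shares into short keys that the servers expand, and that expansion is the expensive cryptographic work that a GPU can parallelize.

# Hypothetical, simplified sketch: two-server private information retrieval over an
# embedding table using additive secret shares of a one-hot query vector. A distributed
# point function provides the same functionality with compressed shares.
import numpy as np

rng = np.random.default_rng(0)
num_rows, dim = 1024, 8
MOD = 2**32
table = rng.integers(0, 2**16, size=(num_rows, dim), dtype=np.int64)  # table held by both servers

# Client wants row `idx` without revealing it to either server.
idx = 42
one_hot = np.zeros(num_rows, dtype=np.int64)
one_hot[idx] = 1

# Additive secret shares of the one-hot vector modulo 2^32; each share alone looks uniformly random.
share_a = rng.integers(0, MOD, size=num_rows, dtype=np.int64)
share_b = (one_hot - share_a) % MOD

# Each server computes a response from its share alone (a dot product with the table).
resp_a = (share_a @ table) % MOD
resp_b = (share_b @ table) % MOD

# The shares sum to the one-hot vector, so the responses sum to the requested row.
row = (resp_a + resp_b) % MOD
assert np.array_equal(row, table[idx])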
Finally, towards security, we develop \textsc{Gradient Disaggregation}, an attack that disaggregates the summed gradients of up to thousands of users observed during federated learning, undermining the privacy safeguards of federated learning systems; we furthermore propose possible defenses against the attack, with the high-level goal of developing machine learning systems that are more secure.
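As a rough illustration of the linear-algebraic intuition behind gradient disaggregation, and not the attack as developed in the dissertation, the sketch below assumes a deliberately simplified setting: each user's update is fixed across rounds and the attacker already knows the binary participation matrix, so the observed aggregates form a linear system that can be inverted to recover per-user gradients. All names and parameters are hypothetical.

# Hypothetical, simplified sketch: if the server observes aggregates G_agg = P @ G_user
# over many rounds, where P is the (rounds x users) 0/1 participation matrix, and P has
# full column rank, the per-user gradients can be recovered by least squares.
import numpy as np

rng = np.random.default_rng(1)
num_users, num_rounds, grad_dim = 50, 200, 16

G_user = rng.standard_normal((num_users, grad_dim))                 # per-user updates (held fixed here)
P = (rng.random((num_rounds, num_users)) < 0.1).astype(np.float64)  # which users joined each round
G_agg = P @ G_user                                                   # the aggregates the server sees

# Disaggregation: invert the participation matrix in the least-squares sense.
G_recovered, *_ = np.linalg.lstsq(P, G_agg, rcond=None)
print("max reconstruction error:", np.abs(G_recovered - G_user).max())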

Keywords

Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.
