Publication:
Leveraging Latent Spaces for Fair Results in Vector Database Image Retrieval

Date

2024-11-26

Citation

Glynn, Alexander P. 2024. Leveraging Latent Spaces for Fair Results in Vector Database Image Retrieval. Bachelor's thesis, Harvard University Engineering and Applied Sciences.

Abstract

This work attempts to debias image retrieval from a vector database across a potentially broad class of sensitive attributes. Vector representations of images and text are increasingly common as components in downstream models and also display strong capabilities without further training (zero-shot) for classification and retrieval tasks. However, undesired bias within these representations, especially with regard to race and gender, is well studied and affects downstream tasks. We present the Conceptually Diverse Images (CDI) algorithm to confront these biases in image retrieval. CDI debiases image retrieval over a set of flexibly chosen group attributes that serve as protected classes. CDI leverages latent information from a foundational vector embedding model to work in a zero-shot fashion: no training is required to intervene for a particular set of protected attributes. A concept layer is produced by projecting a set of images similar to the query into a space defined by their positions with respect to a zero-shot classification problem for each attribute. By then taking a maximally diverse set across their positions on this “concept bottleneck,” CDI increases the fairness of the returned images across several measurable metrics, including subgroup fairness notions. We present provable results on the relationship between a maximally diverse set on an idealized concept space and fairness notions for retrieval problems, and we show the in-practice performance of CDI against recent competing methods of debiasing image search from a vector database. CDI displays competitive performance with prior (trained and zero-shot) methods of debiasing, several of which we extend for the first time to subgroup fairness. It shows best-in-class performance on certain metrics, and on most it extends the Pareto front of the precision-bias curve to allow for more aggressive fairness trade-offs.
We additionally examine the use of more attributes than we can measure, which promisingly comes at low cost to those we can, and conduct multiple ablation tests to justify components of CDI.
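
The pipeline described in the abstract (retrieve images similar to the query, place them in a concept space built from one zero-shot classification problem per attribute, then keep a diverse subset) can be illustrated with a minimal sketch. This is not the thesis implementation. All function names, the softmax temperature, and the array shapes are assumptions made for illustration, and greedy farthest-point selection is swapped in as one simple heuristic for choosing a diverse set; the abstract does not say which selection procedure CDI uses. Embeddings are assumed to come from some image-text embedding model and are passed in as plain NumPy arrays.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def concept_positions(image_emb, attribute_prompts, temperature=100.0):
    """Hypothetical concept layer: one zero-shot softmax block per attribute.

    image_emb: (n, d) image embeddings.
    attribute_prompts: list of (c_i, d) text embeddings, one array per
        attribute, one row per class of that attribute.
    The temperature value is an assumption, not taken from the thesis.
    """
    blocks = []
    for prompt_emb in attribute_prompts:
        logits = temperature * (normalize(image_emb) @ normalize(prompt_emb).T)
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        blocks.append(exp / exp.sum(axis=1, keepdims=True))
    return np.concatenate(blocks, axis=1)


def greedy_diverse_subset(points, k):
    """Greedy farthest-point selection, a stand-in diversity heuristic.

    Starts from row 0 and repeatedly adds the row whose distance to the
    already chosen rows is largest.
    """
    if k > len(points):
        raise ValueError("k cannot exceed the number of points")
    chosen = [0]
    min_dist = np.linalg.norm(points - points[0], axis=1)
    min_dist[0] = -np.inf  # never pick the same row twice
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
        min_dist[nxt] = -np.inf
    return chosen


def retrieve_diverse(query_emb, image_emb, attribute_prompts, n_candidates, k):
    """Return indices of k images: similar to the query, diverse in concept space."""
    sims = normalize(image_emb) @ normalize(query_emb)
    candidates = np.argsort(-sims)[:n_candidates]  # most similar first
    concepts = concept_positions(image_emb[candidates], attribute_prompts)
    picked = greedy_diverse_subset(concepts, k)
    return candidates[picked]


if __name__ == "__main__":
    # Random stand-in embeddings; real use would take them from an embedding model.
    rng = np.random.default_rng(0)
    images = rng.normal(size=(200, 16))
    query = rng.normal(size=16)
    prompts = [rng.normal(size=(2, 16)), rng.normal(size=(4, 16))]
    print(retrieve_diverse(query, images, prompts, n_candidates=50, k=10))
```

In this sketch, the size of the candidate pool (`n_candidates`) relative to `k` acts as the knob between precision and diversity: a larger pool leaves more room to diversify but admits images less similar to the query.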

Keywords

Fairness, Image Retrieval, Machine Learning, Vector Database, Zero-shot, Applied mathematics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
