Publication: Leveraging Latent Spaces for Fair Results in Vector Database Image Retrieval
Abstract
This work attempts to debias image retrieval from a vector database across a potentially broad class of sensitive attributes. Vector representations of images and text are increasingly common as components in downstream models, and they also display strong zero-shot capabilities for classification and retrieval tasks without further training. However, undesired bias within these representations, especially with regard to race and gender, is well studied and impacts downstream tasks. We present the Conceptually Diverse Images (CDI) algorithm to confront these biases in image retrieval. CDI debiases image retrieval over a flexibly chosen set of group attributes that serve as protected classes. CDI leverages latent information from a foundation vector embedding model to work in a zero-shot fashion: no training is required to intervene for a particular set of protected attributes. A concept layer is produced by projecting a set of images similar to the query into a space defined by their positions with respect to a zero-shot classification problem for each attribute. By then taking a maximally diverse set across their positions on this “concept bottleneck,” CDI increases the fairness of the returned images across several measurable metrics, including subgroup fairness notions. We present provable results on the relationship between a maximally diverse set in an idealized concept space and fairness notions for retrieval problems, and we show the in-practice performance of CDI against recent competing methods for debiasing image search from a vector database. CDI displays competitive performance with prior (trained and zero-shot) debiasing methods, several of which we extend for the first time to subgroup fairness. It shows best-in-class performance on certain metrics, and on most it extends the Pareto front of the precision-bias curve to allow for more aggressive fairness trade-offs.
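The two-stage idea described above can be sketched in a few lines. The sketch below is a hypothetical illustration, not the paper's implementation: it uses random vectors in place of real image/text embeddings, cosine similarity to prompt embeddings as a stand-in for the zero-shot concept projection, and greedy farthest-point selection as one common approximation of a maximally diverse subset. All names (`concept_scores`, `diverse_subset`) are invented for this example.

```python
import numpy as np

def concept_scores(image_embs, attr_prompt_embs):
    """Project candidate image embeddings into a concept layer.

    Each column is a zero-shot score for one protected-attribute value,
    computed as cosine similarity to a text-prompt embedding for that value
    (a CLIP-style stand-in; the paper's exact projection may differ).
    """
    norm = lambda x: x / np.linalg.norm(x, axis=-1, keepdims=True)
    return norm(image_embs) @ norm(attr_prompt_embs).T

def diverse_subset(concepts, k):
    """Greedily pick k candidates that are maximally spread out in concept space.

    Farthest-point selection approximates the max-min diversity objective:
    each step adds the candidate farthest from everything chosen so far.
    """
    chosen = [0]  # deterministic seed point for the sketch
    while len(chosen) < k:
        # Distance from every candidate to its nearest already-chosen point.
        dists = np.linalg.norm(concepts[:, None] - concepts[chosen][None], axis=-1)
        nearest = dists.min(axis=1)
        nearest[chosen] = -1.0  # never re-pick a chosen candidate
        chosen.append(int(np.argmax(nearest)))
    return chosen

# Toy usage with synthetic embeddings (8-dim, 50 retrieved candidates,
# 4 attribute-value prompts).
rng = np.random.default_rng(0)
candidates = rng.normal(size=(50, 8))   # placeholder image embeddings
prompts = rng.normal(size=(4, 8))       # placeholder attribute prompts
layer = concept_scores(candidates, prompts)
selected = diverse_subset(layer, k=5)   # indices of the returned images
```

The selection step operates only on the low-dimensional concept layer, which is what lets an intervention target a chosen set of attributes without retraining the embedding model.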
We additionally examine the use of more attributes than we can measure, which, promisingly, comes at low cost to those we can measure, and we conduct multiple ablation tests to justify the components of CDI.