Publication:

Towards Positive Outcomes in the AI Economy: Mitigating Algorithmic Collusion and Enabling Fair Recourse

Date

2024-05-13

Citation

Mibuari, Eric. 2024. Towards Positive Outcomes in the AI Economy: Mitigating Algorithmic Collusion and Enabling Fair Recourse. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

The rise of Artificial Intelligence (AI) promises to solve many important problems in the world. At the same time, awareness has been increasing about its potential and real harms. How can we extract maximum benefit from the promise of AI while minimizing present harms and mitigating future risks?

In this thesis, I frame and answer this question from the perspective of enabling and promoting positive outcomes in the AI-enabled economy, where AI algorithms facilitate markets, including agent behavior, pricing, and matching and clearing. The goal of my research is to find and create the conditions under which the benefits of AI are preserved or even enhanced while the tendency to diminish welfare, perhaps for particular groups, is contained to the greatest possible extent.

In lending domains, machine learning can be used to form a predictive model of the probability of default (a "risk score"), which then drives loan decisions. For simple models, this brings the benefits of transparency and explainability, as well as guidance in regard to recourse. An alternative is to use policy learning, that is, learning a policy that maps borrower characteristics directly to loan decisions, without explicit risk scoring. This emphasizes profit and can speed up learning as a lender comes to understand a borrower population, but with a concomitant loss of transparency. I introduce a risk-score-based policy learning method, as well as a new metric of recourse effort fairness, and demonstrate that this method achieves optimal profits, explainability, and transparency, as well as recourse effort fairness.
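As a minimal illustration of how a risk score can drive a loan decision (a sketch under stated assumptions, not the method developed in the thesis), suppose a repaid loan earns the lender a fixed gain and a defaulted loan costs a fixed loss. The function names and the two-outcome profit model below are hypothetical. Under this model, a loan has positive expected profit exactly when the risk score is below a break-even value.

```python
def break_even_risk_score(gain_if_repaid: float, loss_if_default: float) -> float:
    """Risk score at which expected profit is zero.

    Assumed (hypothetical) profit model:
        expected_profit = (1 - p) * gain_if_repaid - p * loss_if_default
    Setting this to zero and solving for p gives gain / (gain + loss).
    """
    return gain_if_repaid / (gain_if_repaid + loss_if_default)


def expected_profit(risk_score: float, gain_if_repaid: float, loss_if_default: float) -> float:
    """Expected lender profit for a borrower with the given default probability."""
    return (1.0 - risk_score) * gain_if_repaid - risk_score * loss_if_default


def approve_loan(risk_score: float, gain_if_repaid: float, loss_if_default: float) -> bool:
    """Approve only when the loan has strictly positive expected profit."""
    return expected_profit(risk_score, gain_if_repaid, loss_if_default) > 0.0
```

Because the decision reduces to comparing a score against a single threshold, a declined borrower can in principle be told how far their score sits from the cutoff, which is one sense in which simple risk-score models offer guidance on recourse.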

There are a number of problems where economic actors follow sequential behaviors, for example, in making pricing adjustments over time on e-commerce platforms, or in optimizing power trading and storage through a typical day in an electricity market. In this thesis, I work with the Stackelberg POMDP (partially observable Markov decision process) framework to design interventions in these kinds of sequential settings, seeking to improve economic welfare. I am especially focused on concerns about tacit collusion among automated agents that make AI-mediated pricing decisions or trading and storage decisions.

In a first setting, I study algorithmic pricing on e-commerce platforms, where reinforcement learning (RL) algorithms have been shown to learn to set collusive prices with nothing more than profit feedback. This raises the question as to whether collusive pricing can be prevented through the design of suitable "buy boxes," i.e., through the design of the rules that govern the promotion of particular products and prices to consumers. I show that RL can also be used by platforms to learn buy box rules that are effective in preventing collusion by RL sellers. For this, I adopt the Stackelberg POMDP framework, and demonstrate success in learning robust rules that provide high consumer welfare.
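To make the idea of a buy box rule concrete, here is a hand-written toy rule (an assumption for illustration only; the thesis learns its rules with RL and this is not one of them). The hypothetical function `buy_box_winner` features the lowest-priced seller, but only if that price does not exceed a cap chosen by the platform, so that sellers who jointly raise prices above the cap lose promotion.

```python
from typing import Optional, Sequence


def buy_box_winner(prices: Sequence[float], price_cap: float) -> Optional[int]:
    """Return the index of the seller featured in the buy box, or None.

    Hypothetical rule: feature the cheapest seller, but leave the buy box
    empty when even the cheapest price is above the platform's cap.
    Ties are broken in favor of the seller with the lowest index.
    """
    if not prices:
        return None
    cheapest = min(range(len(prices)), key=lambda i: prices[i])
    if prices[cheapest] <= price_cap:
        return cheapest
    return None
```

For example, `buy_box_winner([1.8, 1.5], price_cap=1.6)` features seller 1, while `buy_box_winner([1.9, 1.9], price_cap=1.6)` features no one.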

In a second setting, I study trading and storage decisions by battery operators in electricity power markets. Markets requiring power storage are an example of a multi-agent dynamic storage optimization application in which artificial intelligence (AI) algorithms are actively applied by prosumers (battery operators who both produce and consume power). In this application, agents who correspond to battery operators buy and sell power in a market with producers and price-elastic consumers, and where power can be stored at a negligible cost up to an agent-specific capacity. The use of RL algorithms has been shown in this setting to lead to outcomes in which battery operators arbitrage the market over time and which correspond to tacit collusion. I again appeal to the Stackelberg POMDP framework, and demonstrate success in learning collusion-mitigation policies by a regulator, in particular through the use of a network-flow thresholding intervention.

Keywords

Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
