Combatting Collusion Between Reinforcement Learning Agents in Electricity Markets

Date

2025-05-28

The Harvard community has made this article openly available.

Citation

Zhang, David. 2023. Combatting Collusion Between Reinforcement Learning Agents in Electricity Markets. Bachelors Thesis, Harvard University Engineering and Applied Sciences.

Abstract

When markets are well behaved, we expect firms to produce at the point where marginal revenue equals marginal cost. Collusive behavior, by contrast, arises when firms restrict output below this level, leading to elevated prices, lower social welfare, and higher industry profits.

It is interesting, then, that collusive behavior has been observed between reinforcement learning (RL) agents that set prices for goods across repeated interactions in simple, simulated markets. Here, collusive behavior means convergence toward a market equilibrium with lower social welfare or higher industry profits than what is considered a Nash equilibrium for the RL agents. In this project, I create a simplified model of an electricity market to confirm the collusive behavior of RL agents, comparing theoretical baselines of profit and welfare against the results obtained with Q-Learning agents. I then study the effect of various market interventions in both this simplified model and Abada and Lambin’s model. The interventions I consider are (a) the introduction of a welfare-maximizing agent, (b) limits on battery and output capacity, (c) taxation, and (d) a reward-punishment scheme.
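As a rough illustration of the kind of experiment described above, the following is a minimal sketch of two tabular Q-Learning agents repeatedly choosing output levels in a toy duopoly. It is not the thesis's model: the linear demand parameters `A`, `B`, `C`, the action grid `QUANTITIES`, the choice of state (the opponent's previous action), and the helper names `profit`, `train`, and `greedy_outputs` are all assumptions made for this example.

```python
import random

# Hypothetical market: inverse demand P = A - B * total output, marginal cost C.
A, B, C = 10.0, 1.0, 2.0
QUANTITIES = [2.0, 2.5, 3.0, 3.5, 4.0]  # hypothetical discrete output levels


def profit(q_own, q_other):
    """Per-period profit of a firm producing q_own against q_other."""
    price = max(A - B * (q_own + q_other), 0.0)
    return (price - C) * q_own


def train(episodes=20000, alpha=0.1, gamma=0.9,
          eps_start=1.0, eps_end=0.01, seed=0):
    """Train two independent Q-learners; returns (q_tables, last_actions)."""
    rng = random.Random(seed)
    n = len(QUANTITIES)
    # q_tables[i][state][action]; an agent's state is the opponent's last action.
    q_tables = [[[0.0] * n for _ in range(n)] for _ in range(2)]
    last = [rng.randrange(n), rng.randrange(n)]
    for t in range(episodes):
        # Linearly decaying epsilon-greedy exploration.
        eps = eps_start + (eps_end - eps_start) * t / episodes
        states = [last[1], last[0]]
        actions = []
        for i in range(2):
            if rng.random() < eps:
                actions.append(rng.randrange(n))
            else:
                row = q_tables[i][states[i]]
                actions.append(row.index(max(row)))
        rewards = [
            profit(QUANTITIES[actions[0]], QUANTITIES[actions[1]]),
            profit(QUANTITIES[actions[1]], QUANTITIES[actions[0]]),
        ]
        next_states = [actions[1], actions[0]]
        for i in range(2):
            # Standard Q-learning temporal-difference update.
            best_next = max(q_tables[i][next_states[i]])
            td_target = rewards[i] + gamma * best_next
            old = q_tables[i][states[i]][actions[i]]
            q_tables[i][states[i]][actions[i]] = old + alpha * (td_target - old)
        last = actions
    return q_tables, last


def greedy_outputs(q_tables, last):
    """Output each agent would pick greedily given the last joint action."""
    states = [last[1], last[0]]
    result = []
    for i in range(2):
        row = q_tables[i][states[i]]
        result.append(QUANTITIES[row.index(max(row))])
    return result


if __name__ == "__main__":
    tables, last_actions = train()
    print(greedy_outputs(tables, last_actions))
```

Under these assumed parameters, the one-shot Cournot Nash output is (A - C) / (3B) ≈ 2.67 per firm and the joint-profit-maximizing output is (A - C) / (4B) = 2 per firm, so learned outputs closer to 2 than to 2.67 would point toward the collusive end of the range. The outcome depends on the seed and hyperparameters.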

To assess the suitability of each intervention, a game-theoretic equilibrium is calculated under that intervention and compared to theoretical baselines. These equilibria are computed using quadratic program solvers and SciPy optimization packages. Each intervention is then implemented in an OpenAI Gym environment to confirm or reject the game-theoretic improvements. The welfare-maximizing agent intervention was also implemented on the Abada and Lambin model to explore how agents react to it in a more complex environment.
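To make the equilibrium comparison concrete, here is a minimal sketch of computing a Nash baseline and a collusive baseline with SciPy for a toy Cournot duopoly. It is not the thesis's formulation: the linear demand parameters, the best-response iteration, and the helper names `best_response`, `nash_quantities`, `collusive_quantity`, and `welfare` are assumptions made for this example. It uses `scipy.optimize.minimize_scalar` where the thesis mentions quadratic program solvers.

```python
from scipy.optimize import minimize_scalar

# Hypothetical market: inverse demand P = A - B * total output, marginal cost C.
A, B, C = 10.0, 1.0, 2.0
Q_MAX = A / B  # output at which the price reaches zero


def price(total_q):
    return max(A - B * total_q, 0.0)


def profit(q_own, q_other):
    return (price(q_own + q_other) - C) * q_own


def best_response(q_other):
    """Profit-maximizing output against a fixed opponent output."""
    res = minimize_scalar(lambda q: -profit(q, q_other),
                          bounds=(0.0, Q_MAX), method="bounded")
    return res.x


def nash_quantities(iters=200):
    """Iterate best responses until they settle at a fixed point."""
    q1 = q2 = 1.0
    for _ in range(iters):
        q1 = best_response(q2)
        q2 = best_response(q1)
    return q1, q2


def collusive_quantity():
    """Total output that maximizes joint industry profit."""
    res = minimize_scalar(lambda q: -(price(q) - C) * q,
                          bounds=(0.0, Q_MAX), method="bounded")
    return res.x


def welfare(total_q):
    """Consumer surplus plus industry profit (valid while price > 0)."""
    return A * total_q - 0.5 * B * total_q ** 2 - C * total_q


if __name__ == "__main__":
    q1, q2 = nash_quantities()
    q_col = collusive_quantity()
    print("Nash total output:", q1 + q2, "welfare:", welfare(q1 + q2))
    print("Collusive total output:", q_col, "welfare:", welfare(q_col))
```

With these assumed parameters the Nash outcome has higher total output and higher welfare than the collusive one. An intervention could then be judged by where its equilibrium falls relative to these two baselines.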

A first result, in both the simplified model and Abada and Lambin’s model, is that introducing a welfare-maximizing agent fails to deliver the desired improvement in social welfare. Likewise, restricting battery and output capacity fails to deliver the desired improvement. Rather, I show that a promising direction is a suitable taxation or reward-punishment scheme, which is able to improve social welfare in both models.

Keywords

collusion, electricity markets, game theory, Q-Learning, reinforcement learning, Computer science, Economics, Applied mathematics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
