On Learnable Mechanism Design

Computation is increasingly distributed across open networks, and performed by self-interested autonomous agents that represent individuals and businesses. Given that these computational agents are often in situations of strategic interaction, it is natural to turn to economics for ideas to control these systems. Mechanism design is particularly attractive in this setting, given its focus on the design of optimal rules to implement good system-wide outcomes despite individual self-interest. Yet, these rich computational environments present new challenges for mechanism design, for example because of system dynamics and because the computational cost of implementing particular equilibrium outcomes is also important. We discuss some of these challenges and provide a reinterpretation of the mathematics of collective intelligence in terms of learnable mechanism design for bounded-rational agents.


Introduction
Far from being confined to a desktop machine, computation is now ubiquitous, and is performed by heterogeneous and networked computers, many of which are operated by and used on behalf of self-interested users. As such, computational systems increasingly exhibit the properties of economies, and it is natural to turn to incentive-based methods to construct systems with useful system-wide behaviors. Indeed, as an alternative to traditional cooperative or adversarial assumptions in computer science, a reasonable design principle for many distributed systems is that computational devices will be programmed to follow selfish, self-interested behaviors [13,52].
The framework of economic mechanism design (e.g. [27]) provides a rich theoretical backdrop for an economically-motivated approach to the control of decentralized computational systems. Mechanism design was first proposed as a method to implement social choices (e.g. [7,23]). More recently, mechanism design has been adapted to implement outcomes that are individually-optimal for a single agent, for example in the setting of optimal auction design [41].
The key problem addressed within mechanism design is the information problem, in which agents have private information that the mechanism must elicit in order to implement a good system-wide outcome. Participants must receive appropriate incentives to reveal information truthfully. This incentive-compatibility requirement limits the outcomes that can be implemented in a game-theoretic equilibrium (e.g. [42]). However, in certain contexts it has been possible to propose strategyproof mechanisms in which no participant can manipulate an outcome to their own benefit whatever the strategy of any other agent [56].
In this chapter, we introduce some of the basic methods of mechanism design, and briefly discuss some of the new challenges in the application of ideas from mechanism design to distributed computational systems. We focus here on the particular problem of mechanism design in the presence of bounded-rational agents that employ simple learning methods to adjust towards an equilibrium. We suggest a collective-intelligence inspired approach for mechanism design in repeated games with these simple adaptive agents. No background is assumed in either mechanism design or collective intelligence.
We first review the mathematics of mechanism design and the mathematics of collective intelligence. Then, we reinterpret the factoredness and informative methods of collective intelligence in the context of incentive-compatibility (truth-revelation in equilibrium) and learnable mechanism design, in which payments are selected to enable computational agents to learn equilibrium strategies. In particular, we introduce a VCG-WLU mechanism, which is a hybrid of the Groves [23] family of mechanisms and the wonderful life local utility [58] of collective intelligence. We present experimental results, for a simple auction problem and a simple congestion problem, to demonstrate the advantages of this approach to mechanism design. In closing, we consider a broader agenda of learnable mechanism design in which mechanisms are explicitly constructed to maximize their performance with respect to a model of limited agent-rationality.

Mechanism Design
We begin with a formal introduction to mechanism design theory, and then outline some problems with the approach as applied to distributed and dynamic computational systems. The problems arise because of practical limits on the amount of computation and communication that is reasonable within a system. For a more complete review of mechanism design, see Jackson [27], Chapter 23 of Mas-Colell et al. [37], or Chapter 2 of Parkes [46].
The mechanism design (MD) approach to solving problems in decentralized systems with self-interested agents is to formulate the design problem as an optimization problem. The parameters of the design are the kinds of strategies that will be available to agents, specified for example by the bidding language in an auction, and the rules used to determine an outcome based on agent actions, for example rules to determine the winning bids in an auction.
Perhaps the most successful application of MD has been to the theory of auctions. In recent years auction theory has been applied to the design of a number of real-world markets [38].
The fundamental assumption made in MD is that participants in a mechanism will follow a game-theoretic equilibrium strategy. Simply stated, each agent is expected to select actions that maximize its expected utility, given the strategies of the other agents and the rules of the game, as specified by the mechanism. As such, MD makes a strong behavioral assumption about agents and about the information available to agents, and then selects the mechanism that will maximize some system-wide performance measure, such as seller revenue, with respect to the behavioral assumption.
As an example, suppose that a number of users are competing for access to the data staging capabilities on a data server located in Times Square. In MD the valuations of users for different amounts of disk space are private, and the problem is to design incentives, for example through the calculation of appropriate payments, to allocate disk space to maximize the total value across users. One cannot simply request that users report their valuations and then make an allocation based on reported valuations, because self-interested agents can be expected to overstate their values. Rather, a useful mechanism must design the rules of the game to provide incentives for agents to announce truthful information about their preferences for different outcomes.

Mechanism Design: Preliminaries
A mechanism defines a set of feasible strategies, which restricts the kinds of messages that agents can send to the mechanism. A mechanism also fixes a particular outcome rule, which selects an outcome based on agent strategies (see footnote 1). Game-theoretic methods are used to analyze the properties of a mechanism, under the assumption that agents are rational and will follow expected-utility maximizing strategies in equilibrium, given beliefs about the strategies of other agents and knowledge of the rules of the mechanism.
Formally, a mechanism, $\mathcal{M} = (\Sigma_1, \ldots, \Sigma_N, g)$, defines the set of strategies, $\Sigma_i$, available to each agent $i$, and an outcome rule, $g : \Sigma_1 \times \cdots \times \Sigma_N \to \mathcal{O}$, that selects an outcome, $g(s) \in \mathcal{O}$, for each strategy profile, $s = (s_1, \ldots, s_N)$. A strategy defines an action for an agent in all possible states of the mechanism. For example, a strategy in an ascending-price auction defines when an agent will bid, and what price an agent will bid, for all possible states of the auction. The outcome rule in the auction takes these strategies, and then implements the choice (an allocation and payments) based on the strategies. An agent, $i$, has a type, $\theta_i \in \Theta_i$. This captures all the private information about an agent that is relevant to its preferences for different outcomes. Formally, a type, $\theta_i \in \Theta_i$, defines an agent's utility, $u_i(o, \theta_i)$, for each possible outcome, $o \in \mathcal{O}$. A social choice function (SCF), $f : \Theta_1 \times \cdots \times \Theta_N \to \mathcal{O}$, selects a desired outcome, $f(\theta)$, for each possible type profile, $\theta = (\theta_1, \ldots, \theta_N)$. Given a strategy profile, $s^* = (s_1^*, \ldots, s_N^*)$, mechanism $\mathcal{M}$ implements SCF $f$ if $g(s_1^*(\theta_1), \ldots, s_N^*(\theta_N)) = f(\theta)$ for all $\theta$, and $s^*$ is an equilibrium of the mechanism. A number of different equilibrium solution concepts are typically considered, including Bayesian-Nash (BNE), ex post Nash, and dominant strategy (see footnote 2). Dominant strategy implementations are particularly robust because they do not require agents to have correct beliefs about the types of other agents, or beliefs that other agents will play an equilibrium strategy. Ex post Nash equilibria also have useful robustness properties.
Footnote 1. A mechanism must be able to make a commitment to use these rules. Without this commitment ability the equilibrium of a mechanism can quickly unravel. For example, if an auctioneer in a second-price auction cannot commit to selling the item at the second price, then the auction looks more like a first-price auction [50].
Two important classes of mechanisms are direct revelation and indirect revelation mechanisms. A mechanism is a direct revelation mechanism (DRM) if the strategy space available to an agent is restricted to its space of types, i.e. $\Sigma_i = \Theta_i$. In words, an agent can only make a claim to the mechanism about its preferences. An indirect mechanism is any mechanism in which the strategy space is something other than the type space of an agent. Computational concerns aside, the revelation principle [21,22] states that it is possible to restrict attention to DRMs, with no loss in implementation power. In particular, anything that can be implemented in some complex mechanism can also be implemented in a DRM. The intuition is that if a mechanism, $\mathcal{M}$, implements a social choice function, $f$, in equilibrium, then we can construct a DRM, $\mathcal{M}'$, that implements the same function by simulating the agents' equilibrium strategies and mechanism $\mathcal{M}$ within mechanism $\mathcal{M}'$. Moreover, the revelation principle states that it is sufficient to consider mechanisms in which agents will choose to announce truthful information about their preferences in equilibrium. These truthful and direct-revelation mechanisms are referred to as incentive-compatible mechanisms.
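The simulation argument behind the revelation principle can be illustrated with a minimal sketch. It assumes a first-price sealed-bid auction as the indirect mechanism and the standard symmetric equilibrium bid for bidders with independent uniform values on [0, 1]; the function names (`first_price_auction`, `equilibrium_bid`, `make_direct_mechanism`) are hypothetical and chosen only for this illustration.

```python
from typing import Callable, List, Tuple

Outcome = Tuple[int, float]  # (index of the winning agent, price paid by the winner)


def first_price_auction(bids: List[float]) -> Outcome:
    """Indirect mechanism: the highest bid wins and pays its own bid."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    return winner, bids[winner]


def equilibrium_bid(value: float, n: int) -> float:
    """Symmetric Bayesian-Nash bid with n bidders and i.i.d. uniform[0, 1] values
    (an assumption of this sketch, not a setting taken from the text)."""
    return (n - 1) / n * value


def make_direct_mechanism(
    mechanism: Callable[[List[float]], Outcome],
    strategy: Callable[[float, int], float],
) -> Callable[[List[float]], Outcome]:
    """Build a direct-revelation mechanism that asks for types, simulates the
    equilibrium strategies on behalf of the agents, and runs the original mechanism."""

    def direct(reported_types: List[float]) -> Outcome:
        n = len(reported_types)
        simulated_bids = [strategy(t, n) for t in reported_types]
        return mechanism(simulated_bids)

    return direct


values = [0.9, 0.6, 0.3]
drm = make_direct_mechanism(first_price_auction, equilibrium_bid)

# Outcome when agents play the equilibrium of the indirect mechanism themselves.
indirect_outcome = first_price_auction([equilibrium_bid(v, len(values)) for v in values])
# Outcome when agents report their values truthfully to the direct mechanism.
direct_outcome = drm(values)
```

Under these assumptions the two outcomes coincide, because the direct mechanism performs on behalf of each agent exactly the computation that the agent would have performed in the indirect mechanism.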
The revelation principle allows an analytic solution to the MD problem in a number of interesting design problems. Examples of solved MD problems include: the bargaining problem [42,6] in which there is a single buyer and a single seller; the single-item optimal auction design problem [41] in which a seller wishes to maximize her expected revenue; and the efficient mechanism design problem in which the objective is to allocate resources to maximize the total value across agents.
Footnote 2. A strategy profile, $s^*$, is a Bayesian-Nash equilibrium if it maximizes the expected utility to every agent, given the strategies of the other agents and given beliefs about the distribution over agent types. Every agent is assumed to have probabilistic information about the types of the other agents. All agents have the same information, and this is common knowledge. A strategy profile, $s^*$, is an ex post Nash equilibrium if it is a utility-maximizing strategy for every agent given the strategies of the other agents, and whatever the types of the other agents. In a dominant strategy equilibrium, strategy $s_i^*(\theta_i)$, for every $i$, is a utility-maximizing strategy whatever the strategies and whatever the types of the other agents.

Example: Efficient Mechanism Design
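It is illustrative to present the well-known Vickrey-Clarke-Groves (VCG) mechanism [56,7,23]. The VCG mechanism is an incentive-compatible DRM for the efficient allocation problem. In this setting, the outcome space, $\mathcal{O}$, is defined in terms of a choice, $k \in \mathcal{K}$, and a payment, $p_i$, by each agent $i$. Agents have quasi-linear utility, $u_i(k, p_i, \theta_i) = v_i(k, \theta_i) - p_i$, where $v_i(k, \theta_i)$ is the value of agent $i$ with type $\theta_i$ for choice $k$. Given reported types, $\hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_N)$, the efficient choice rule selects $k^*(\hat{\theta}) = \arg\max_{k \in \mathcal{K}} \sum_i v_i(k, \hat{\theta}_i)$, for all $\hat{\theta} \in \Theta$. The remaining problem is to choose a payment rule that provides incentive-compatibility.
A particularly strong version of incentive-compatibility is that of strategyproofness, in which truth-revelation is a dominant strategy, i.e. utility-maximizing whatever the strategies or preferences of other agents. Formally, strategyproofness requires that the allocation and payment rules satisfy the following constraints:
$$v_i(k^*(\theta_i, \hat{\theta}_{-i}), \theta_i) - p_i(\theta_i, \hat{\theta}_{-i}) \ge v_i(k^*(\hat{\theta}_i, \hat{\theta}_{-i}), \theta_i) - p_i(\hat{\theta}_i, \hat{\theta}_{-i}), \quad \forall i, \ \forall \theta_i, \ \forall \hat{\theta}_i, \ \forall \hat{\theta}_{-i}.$$
Strategyproofness is a useful property because agents can play their equilibrium strategy without game-theoretic modeling or counterspeculation about other agents.
The Groves [23] mechanisms completely characterize the class of efficient and strategyproof mechanisms [22]. The payment rule in a Groves mechanism is defined as:
$$p_{\mathrm{groves},i}(\hat{\theta}) = h_i(\hat{\theta}_{-i}) - \sum_{j \ne i} v_j(k^*(\hat{\theta}), \hat{\theta}_j),$$
where $h_i(\cdot)$ is an arbitrary function on the reported types of every agent except $i$, or simply a constant. The Groves payment rule internalizes the externality placed on the other agents in the system by the reported preferences of agent $i$, and aligns each agent's incentives with the system-wide goal of allocative efficiency.
To understand the strategyproofness of the Groves mechanisms, consider the utility of agent $i$ with type $\theta_i$ when it reports $\hat{\theta}_i$:
$$u_i(\hat{\theta}_i) = v_i(k^*(\hat{\theta}_i, \hat{\theta}_{-i}), \theta_i) + \sum_{j \ne i} v_j(k^*(\hat{\theta}_i, \hat{\theta}_{-i}), \hat{\theta}_j) - h_i(\hat{\theta}_{-i}).$$
Truth revelation maximizes the sum of the first two terms by construction, and the final term is independent of the reported type. This holds for all reported types from other agents, and strategyproofness follows.
From within the class of Groves mechanisms, the Vickrey-Clarke-Groves (VCG) mechanism is especially important because it minimizes the expected total payments by the mechanism to the agents, across all incentive-compatible, efficient, and individual-rational (IR) mechanisms [33]. An IR mechanism is one in which participation is voluntary and agents can choose not to participate. In the VCG mechanism, the payment, $p_{\mathrm{vcg},i}(\hat{\theta})$, is computed as:
$$p_{\mathrm{vcg},i}(\hat{\theta}) = \sum_{j \ne i} v_j(k^*_{-i}(\hat{\theta}_{-i}), \hat{\theta}_j) - \sum_{j \ne i} v_j(k^*(\hat{\theta}), \hat{\theta}_j),$$
where $k^*_{-i}(\hat{\theta}_{-i})$ is the efficient allocation as computed with agent $i$ removed from the system. For a single item allocation problem, this VCG mechanism reduces to the well-known Vickrey [56] auction, which is a second-price sealed-bid auction.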
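The VCG payment rule can be made concrete with a small sketch. It assumes a finite set of choices and reported valuations given as dictionaries from choices to values; the helper names (`efficient_choice`, `vcg`, `single_item_reports`, `truthful_is_dominant`) are hypothetical. Each agent pays the total reported value that the other agents would obtain without it, minus the total reported value they obtain with it. The last function checks by brute force, on a small grid of values only, that truthful reporting is never worse than misreporting in a two-agent single-item problem.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Valuation = Dict[int, float]  # reported value of one agent for each choice


def efficient_choice(choices: Sequence[int], reports: List[Valuation]) -> int:
    """Choice that maximizes the total reported value (ties go to the first choice)."""
    return max(choices, key=lambda k: sum(v[k] for v in reports))


def vcg(choices: Sequence[int], reports: List[Valuation]) -> Tuple[int, List[float]]:
    """Return the efficient choice and the VCG payment of every agent."""
    k_star = efficient_choice(choices, reports)
    payments = []
    for i in range(len(reports)):
        others = reports[:i] + reports[i + 1:]
        k_without_i = efficient_choice(choices, others)
        value_without_i = sum(v[k_without_i] for v in others)
        value_with_i = sum(v[k_star] for v in others)
        payments.append(value_without_i - value_with_i)
    return k_star, payments


def single_item_reports(values: List[float]) -> Tuple[List[int], List[Valuation]]:
    """Single-item problem: choice k means 'agent k receives the item'."""
    n = len(values)
    choices = list(range(n))
    reports = [{k: (v if k == i else 0.0) for k in choices} for i, v in enumerate(values)]
    return choices, reports


def truthful_is_dominant(grid: Sequence[float]) -> bool:
    """Brute-force check for agent 0 in a two-agent single-item problem."""
    for true_value, lie, other in product(grid, repeat=3):
        choices, truthful = single_item_reports([true_value, other])
        _, misreport = single_item_reports([lie, other])
        true_valuation = truthful[0]
        k_t, pay_t = vcg(choices, truthful)
        k_m, pay_m = vcg(choices, misreport)
        if true_valuation[k_t] - pay_t[0] < true_valuation[k_m] - pay_m[0] - 1e-9:
            return False
    return True


choices, reports = single_item_reports([10.0, 7.0, 3.0])
winner, payments = vcg(choices, reports)
grid_check = truthful_is_dominant([float(x) for x in range(6)])
```

In the single-item example the highest-value agent wins and pays the second-highest reported value, which is the Vickrey auction outcome. The grid check is only an illustration on a finite grid, not a proof.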
The VCG mechanism has received considerable attention within computer science and operations research in recent years, in particular in application to distributed optimization problems with self-interested agents (e.g. [43,46]). Later, in Section 1.4, we will draw some connections between the payments in the VCG mechanism and the mathematics of collective intelligence.

Computational Considerations
Computational and informational considerations are largely missing from economic mechanism design. Simply stated, mechanism design assumes that all equilibria have the same cost. In fact, a pair of otherwise equivalent mechanisms can have equilibrium solutions with vastly different computational properties.
Briefly, some computational considerations that can impact the choice of a mechanism in practice include:
The computational cost, both to agents to compute an equilibrium strategy and to the mechanism to compute the outcome based on agent strategies. Strategyproof mechanisms are one compelling class of mechanisms that do not place an unreasonable game-theoretic burden on participants. Approximations to strategyproof mechanisms that are intractable to implement have been suggested for some problems, where the goal is to retain strategyproofness but also simplify the problem facing the mechanism infrastructure (e.g. [35,43]).
The informational cost of computing an equilibrium. Direct revelation mechanisms require that agents submit complete and exact information about their types, to enable the mechanism to compute optimal outcomes for all possible reports from other agents. This can be unreasonable, for example when the preference elicitation problem involves collecting information from users, or solving hard optimization problems to evaluate different outcomes, of which there can be exponentially many. In comparison, agents can often compute equilibrium strategies in indirect mechanisms with approximate information about their own types [45,49,8]. For example, in an ascending-price auction an agent can bid with upper and lower bounds on its value for the item. A number of recent studies consider the design of mechanisms in settings with costly preference elicitation [25,47,9], and the communication complexity of mechanisms [24,44].
The basic idea in all of the aforementioned approaches to computational MD is to reduce the complexity of the proposed solution until the equilibrium of the mechanism is computable.
A fundamentally different approach is to perform MD with respect to a model of the bounded rationality of an agent, for example to determine the optimal mechanism given a model of satisficing agent behavior, such as myopic best-response to prices or simple reinforcement-learning behavior. Although some studies have considered the performance of different mechanisms for models of bounded-rational agents [49,8,48], there are currently no theories for the design of optimal mechanisms with respect to an explicit model of bounded-rational agent behavior. It is an intriguing challenge to develop an analytically tractable, yet meaningful, model of agent bounded-rationality.
The mathematics of collective intelligence provides some useful insights into the challenging issues that agent bounded-rationality introduces into mechanism design. In particular, the methods of collective intelligence suggest a third approach, in which mechanisms are designed to explicitly support convergence towards equilibrium outcomes by simple adaptive agents. We pick up on this in more detail in Section 1.4.

Dynamic Considerations
Another place where traditional MD breaks down is in its assumption that all agents are present in a system, and able to reveal their type information simultaneously within the mechanism. This assumption is often unreasonable in dynamic systems, for example in open network environments such as the Internet, where multi-period and asynchronous interactions across anonymous and shifting agent populations are a more reasonable model [17].
It is interesting to consider the online mechanism design problem, in which agents are continually entering and leaving a system and the goal is to make both truth-revelation of valuations and truthful announcements of arrival an equilibrium of the system (see Friedman & Parkes [18]). Online MD is interesting, for example, in the context of ad hoc network formation across peer-to-peer WiFi networks in which services maintain robust overlay networks as nodes enter, leave, and fail.
Once an agent's arrival and departure times are introduced into its type, the revelation principle continues to apply and reduces the problem to the space of mechanisms in which agents truthfully announce their type, and therefore also their arrival time, in equilibrium. However, standard approaches, such as the family of Groves mechanisms, only provide strategyproof solutions in combination with optimal online algorithms that are able to make optimal sequential allocation decisions as agents arrive and announce their type. Relaxing to Bayesian-optimal online algorithms corresponds to Bayesian-Nash incentive-compatible mechanisms [18]. A simpler special case of the online MD problem occurs when it is reasonable to assume truthful arrivals, which can be justified if agents are myopic and react to the current state of the system as soon as they arrive. The online/truthful problem remains interesting because the sequential-decision aspect of the problem is retained.
This problem of online mechanism design with myopic agents is closely related to the problem of endogenous network and group formation in economics (e.g. [11,26,10,28]). In this work, agents arrive into a system and choose an action to maximize their myopic utility. The focus is on identifying which network structures can emerge from a sequence of myopic decisions by self-interested agents. A typical setting is one of network formation, in which agents choose how to add a new link into a network. Recently, there has also been some attention to a stochastic model of network formation, in which agents arrive by some random arrival process and then take a myopic decision (e.g. [57]). Dutta & Jackson [11] identify the mechanism design approach as an important future direction for the study of network formation, where payoff division rules are imposed by a designer in order to promote the emergence of useful networks.
Finally, there is an interesting comparison to be drawn between the online mechanism design problem and the recent literature on utility-based models to explain the existence and generation of complex networks with particular statistical properties (see e.g. [1]). On one hand, one can view the structure and statistics of complex networks from the perspective of solving a constrained global optimization problem [5]. On the other hand, recent models have provided local optimization-based models to explain the existence of complex networks [12]. It is interesting to ask whether there is an opportunity to leverage mechanism design in the positioning of appropriate local incentives to align local agent decisions with optimal system-wide performance, in order to encourage the emergence of complex networks with desirable topologies and statistical properties.

Collective Intelligence
In this section we review the mathematical theory of collectives (see Chapter ??Wolpert?? for an introduction to collective intelligence). We provide a simple treatment that will enable a useful comparison to be drawn with the mathematics of economic mechanism design. Although there are a number of important differences between collectives and the traditional setting of mechanism design, it will nevertheless be useful to draw parallels between mechanism design and collective intelligence.
The main distinction between collective intelligence (COIN) and MD is that the approach of collective intelligence is fundamentally an indirect paradigm. Rather than focusing on direct-revelation mechanisms in which agents reveal private information to the mechanism, which then implements a particular outcome, the COIN approach implements an optimal system-wide outcome through the direct actions of agents.
Another distinction between COIN and MD is that in COIN it is assumed to be possible to directly set the utility functions of agents in order to guide their choices. In comparison, MD is more constrained because a designer can only indirectly influence agent payoffs via her choice of the payment and outcome rules.
Additional distinctions between COIN and MD include:
No private information. The center in COIN is assumed to have enough information about local agent preferences to compute the system-wide value for a particular state.
Limited rationality. The theory of COIN allows agents to have limited rationality, including agents that follow simple best-response dynamics in response to an evolving multiagent system.
Finally, COIN has typically been implemented in multiple-period problems, in which agents are able to adjust their strategies across time. In comparison, MD has traditionally been applied to static environments.
The ability to adjust payoffs directly, coupled with the absence of private information, would render the COIN problem trivial from an MD perspective, because there are no incentive issues to solve. However, COIN focuses instead on computational goals, such as providing payoffs to enable a system of bounded-rational agents to quickly adjust to good system-wide outcomes. This focus in COIN nicely complements the obsession with incentives, but relative complacency about computational issues, within MD.

Collective Intelligence: Preliminaries
We define in this section a standard and simple COIN model. To maintain notational consistency with MD, our notation will necessarily depart from the notation in Chapter ??Wolpert??.
Let $t$ denote the time period, and let $\theta_i^t$ denote the type of agent $i$ in period $t$. The type in COIN is used analogously to the type in MD, and is intended to capture any information that is relevant to the behavior and preferences of an agent. Since agents in COIN may be bounded-rational, and not necessarily game-theoretic, the type in COIN captures information that relates to an agent's computational and belief state, in addition to its preferences. Furthermore, the type is indexed with time period, $t$, and allowed to change across periods to reflect changing preferences and changing computational state. Finally, in the standard COIN model the type can be directly controlled by the system designer.
An agent follows a strategy, $s_i^t$, which captures information about the actions taken by the agent in period $t$. Taken together, the strategy and the type of an agent define its state, which is denoted $\zeta_i^t = (s_i^t, \theta_i^t)$. The state of all agents across all periods is denoted $\zeta$, and shorthand, $\zeta^{\le t}$, is used to denote the restriction to states up to and including period $t$. The designer's goal is to maximize the system-wide utility across all states, denoted $G(\zeta)$. Shorthand, $G(\zeta^{\le t})$, is used to denote the restriction to the social value for time periods up to and including $t$. The system-wide utility can depend on both the strategies and types of agents, and need not be the sum over the individual agent utilities, but can be quite general.
The central solution concept in COIN is that of a factored system. A system is factored if the individual utility, $u_i(\zeta)$, of every agent is always aligned with the system-wide utility, $G(\zeta)$:
$$u_i(\zeta) \ge u_i(\zeta') \iff G(\zeta) \ge G(\zeta'), \quad \forall i, \ \forall \zeta, \zeta' \text{ that differ only in the state of agent } i. \qquad \text{(FAC)}$$
The factoredness of a system is a stronger concept than the game-theoretic implementation concept in indirect MD. The main difference is that every agent should always want to select its own strategy to maximize the social value, $G(\zeta)$, whatever the state of the system. In comparison, an incentive-compatible mechanism that implements a particular social-choice function must only ensure that an agent's incentives are aligned with the social good when other agents also play an equilibrium strategy. In this aspect, the factoredness solution concept of COIN is reminiscent of the strategyproof solution concept of MD (although applied here to indirect mechanisms), in that it does not require on-equilibrium play of other agents.
We can readily verify that factoredness is sufficient to implement the choice that maximizes the system-wide utility, $G(\zeta)$, in a dominant-strategy equilibrium. Let $\zeta^*$ denote the optimal social choice. Consider agent $i$: by (FAC), whatever the states of the other agents, a state that maximizes the utility of agent $i$ also maximizes the system-wide utility. However, we observe that full factoredness is a strong property, and is not necessary for a system of agents to maximize the value of $G$ in equilibrium.
As an example, suppose that each agent selects an action [...] is a Gaussian function with mean [...] and variance 1. Now, suppose that the system is factored for agent 1, with [...] otherwise, for some small $\epsilon > 0$. Despite this failure of factoredness, the system is factored in the neighborhood of the optimal system-wide outcome, and this is a stable outcome.
Factoredness is a useful property for a dynamic system because it implies that a rational agent will always play to maximize the social-value of the choice whatever the actions of other agents, and therefore gives useful robustness and stability properties in environments with bounded-rational, or faulty, agents.

Informative Local Utilities
The key challenge in COIN is to select factored local utilities to promote good convergence to a desired state. COIN agents are modeled as simple bounded-rational agents, rather than as traditional game-theoretic agents. In particular, the standard COIN model assumes that agents play a myopic best-response strategy as the state of a system evolves, given their current beliefs about payoffs for different strategies. Given a particular system-wide utility, $G(\zeta)$, the designer in COIN retains some flexibility in her choice of factored utility functions for individual agents. Specifically, any local utility of the form $u_i(\zeta) = G(\zeta) - c_{-i}(\zeta_{-i})$ satisfies (FAC), where $\zeta_{-i}$ denotes the states that are independent of agent $i$ and the function $c_{-i}$ is chosen by the designer. The theory of collective intelligence demonstrates that utility functions that are better able to isolate the effect of an individual agent's strategy on the system-wide utility are more informative for simple learning agents, and improve the rate of convergence to optimal system-wide agent strategies.
An example of a factoring choice with poor information properties is to set up a team game [40], with $u_{\mathrm{team},i}(\zeta) = G(\zeta)$, where $G(\zeta)$ is the system-wide utility. This is clearly factored, because every agent's utility is exactly that of the overall system. However, team game utilities are not very informative because the marginal effect of an agent's own action on the system utility is likely to be masked by the effect of the actions of the other agents.
In developing a factored utility, it is important that any term, $c_{-i}(\zeta_{-i})$, that is subtracted from the system-wide utility depends only on states that are independent of the state of agent $i$. The states that are affected by agent $i$ form its effect set, and the independent states are denoted with shorthand $\zeta_{-i}$. Another idea for a factored local utility is presented by the aristocratic utility (AU), in which:
$$u_{\mathrm{AU},i}(\zeta) = G(\zeta) - E[G(\zeta) \mid \zeta_{-i}].$$
The expectation is defined with respect to current beliefs over the probability of different states in the effect set of an agent (see footnote 5). This aristocratic utility is maximally informative, with respect to a reasonable definition of the information provided by a reinforcement signal [58].
In the special case of a repeated single-stage game the effect set of agent $i$ is simply its own state, and the aristocratic utility simplifies to:
$$u_{\mathrm{AU},i}(s) = G(s) - \sum_{a \in A_i} \Pr(a) \, G(a, s_{-i}),$$
where the time index, $t$, is dropped, set $A_i$ denotes the space of legal actions for agent $i$, and the expectation is taken with respect to the distribution of actions played by agent $i$.
Another idea for a factored local utility is presented by the wonderful life utility (WLU). This can be a useful approximation to AU, in particular when AU is itself difficult to compute. In WLU the states in the effect set of agent $i$ are replaced with a single "clamped" set of states:
$$u_{\mathrm{WLU},i}(\zeta) = G(\zeta) - G(\zeta_{-i}, CL_i),$$
where $CL_i$ is the clamping factor. This clamping defines a fixed strategy for every agent in the effect set of agent $i$ across all periods.
Footnote 5. It is important to notice in this construction that the dynamics of the combined state do not need to be consistent with the dynamical laws of the system. They are purely used as a counterfactual operator [54].
In the special case of an iterated single-stage game the wonderful life utility simplifies to:
$$u_{\mathrm{WLU},i}(s) = G(s) - G(c_i, s_{-i}),$$
where the time index, $t$, is dropped, and $c_i \in A_i$ is some fixed strategy for agent $i$. This is factored for any choice of clamping factor.
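The three local utilities for an iterated single-stage game can be sketched as follows, assuming a toy system-wide utility over binary actions that is chosen only for illustration. All function names are hypothetical. The check at the end verifies, by enumeration, that a unilateral change of action by one agent changes its local utility by exactly the same amount as it changes the system-wide utility. This is a sufficient condition for factoredness, and it holds here for the team game, for the wonderful life utility with either clamp, and for the aristocratic utility under a fixed distribution over the agent's actions.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

SystemUtility = Callable[[Sequence[int]], float]
LocalUtility = Callable[[SystemUtility, Sequence[int], int], float]


def system_utility(s: Sequence[int]) -> float:
    """Toy system-wide utility (an assumption): best when exactly two agents act."""
    return -float((sum(s) - 2) ** 2)


def team_utility(G: SystemUtility, s: Sequence[int], i: int) -> float:
    """Team game: every agent receives the system-wide utility."""
    return G(s)


def wonderful_life_utility(G: SystemUtility, s: Sequence[int], i: int, clamp: int) -> float:
    """WLU: system utility minus the utility with agent i's action clamped."""
    clamped = list(s)
    clamped[i] = clamp
    return G(s) - G(clamped)


def aristocratic_utility(
    G: SystemUtility, s: Sequence[int], i: int, action_probs: Dict[int, float]
) -> float:
    """AU: system utility minus its expectation over agent i's own actions."""
    expected = 0.0
    for action, prob in action_probs.items():
        counterfactual = list(s)
        counterfactual[i] = action
        expected += prob * G(counterfactual)
    return G(s) - expected


def is_factored(utility: LocalUtility, G: SystemUtility, actions: List[int], n: int) -> bool:
    """Check that unilateral deviations change u_i and G by the same amount."""
    for profile in product(actions, repeat=n):
        for i in range(n):
            for action in actions:
                deviation = list(profile)
                deviation[i] = action
                du = utility(G, deviation, i) - utility(G, profile, i)
                dG = G(deviation) - G(profile)
                if abs(du - dG) > 1e-9:
                    return False
    return True


actions = [0, 1]
uniform = {0: 0.5, 1: 0.5}
checks = {
    "team": is_factored(team_utility, system_utility, actions, 4),
    "wlu_clamp_0": is_factored(
        lambda G, s, i: wonderful_life_utility(G, s, i, 0), system_utility, actions, 4),
    "wlu_clamp_1": is_factored(
        lambda G, s, i: wonderful_life_utility(G, s, i, 1), system_utility, actions, 4),
    "au_uniform": is_factored(
        lambda G, s, i: aristocratic_utility(G, s, i, uniform), system_utility, actions, 4),
}
```

The check says nothing about informativeness: all four utilities pass it, and they differ only in how much of the variation in the signal is due to the agent's own action.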

Back to VCG Mechanisms
Interestingly, the WLU factored local utility function brings us full circle, back to the Vickrey-Clarke-Groves mechanism that was introduced in Section 1.2 as a canonical example of a strategyproof solution in economic mechanism design. Consider the special case of a WLU with clamping factor, $c_i = \vec{0}$, to represent a null move by agent $i$. In this case an agent's utility for some strategy is equal to the marginal contribution of its own strategy to the system-wide utility. Thus, WLU with a null clamping factor implements the same equilibrium outcome as the VCG mechanism in the special case of a revelation game in which agent strategies define the revelation of type information and the system-wide utility is the sum of individual agent utilities.
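A minimal sketch of this correspondence, assuming a single-item revelation game in which the system-wide utility is the total reported value of the efficient allocation (here simply the highest report) and a null move is represented by `None`. The names `system_utility`, `wlu_null`, and `vickrey_utility` are hypothetical.

```python
from typing import List, Optional


def system_utility(reports: List[Optional[float]]) -> float:
    """Total value of the efficient single-item allocation; None is a null move."""
    active = [r for r in reports if r is not None]
    return max(active, default=0.0)


def wlu_null(reports: List[Optional[float]], i: int) -> float:
    """WLU of agent i with a null clamp: its marginal contribution to the system."""
    clamped = list(reports)
    clamped[i] = None
    return system_utility(reports) - system_utility(clamped)


def vickrey_utility(values: List[float], i: int) -> float:
    """Utility of agent i in a truthful second-price sealed-bid auction."""
    winner = max(range(len(values)), key=lambda j: values[j])
    if i != winner:
        return 0.0
    others = values[:i] + values[i + 1:]
    return values[i] - max(others, default=0.0)


values = [10.0, 7.0, 3.0]
wlu_values = [wlu_null(values, i) for i in range(len(values))]
vcg_values = [vickrey_utility(values, i) for i in range(len(values))]
```

With truthful reports, the null-clamp WLU of each agent coincides with its utility in the Vickrey auction: the winner receives its value less the second-highest value, and every other agent receives zero.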
Looking at COIN, both theoretical and experimental results suggest that it is better to use a WLU with a clamping factor that is equal to an agent's expected move. This provides a mean-field approximation to the AU, which itself has provably optimal informative properties. This suggests that the informativeness of the payments in an economic mechanism will be of interest in environments in which simple but self-interested agents are learning optimal strategies through best-response dynamics, and should be considered in selecting the appropriate mechanism from the class of Groves mechanisms. We pick up this theme in Section 1.4, where we introduce a COIN-inspired approach to MD for repeated games with simple adaptive agents.

Example: A Congestion Game
In this section, we present an example of COIN in the setting of a congestion game and review experimental results from Wolpert & Tumer [58]. Consider a simple variation on the El Farol bar problem [2]. Each player is deciding which night in the week to attend a bar: if too few people attend, the bar is boring, but if too many people attend, the bar is too crowded. This is a simple coordination game, in which the system-wide goal is to spread the attendance evenly across the nights.
Formally, there are $N$ players and 7 nights, and every player wants to attend for the same number of nights, $l$. The system-wide utility is $G(s) = \sum_{k=1}^{7} \gamma(x_k)$, where $x_k$ is the total attendance on night $k$. The congestion function, $\gamma(x) = x e^{-x/c}$, for parameter $c$ and attendance $x$, captures the idea that the bar should not be too empty or too crowded on any single night. The function is maximized for an attendance of exactly $c$. The optimal system-wide outcome has players making a coordinated decision, with the same number of players attending on each night.
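The system-wide utility of the congestion game can be sketched as follows. The sketch assumes that the congestion function has the form x * exp(-x / c), which is consistent with a maximum at an attendance of exactly c; the function names and the example schedules are hypothetical.

```python
import math
from typing import List, Sequence

NIGHTS = 7


def congestion(x: float, c: float) -> float:
    """Assumed congestion function: low when the bar is empty or crowded."""
    return x * math.exp(-x / c)


def attendance(schedules: Sequence[Sequence[int]]) -> List[int]:
    """Total attendance per night, given the nights chosen by each player."""
    totals = [0] * NIGHTS
    for nights_attended in schedules:
        for k in nights_attended:
            totals[k] += 1
    return totals


def system_utility(schedules: Sequence[Sequence[int]], c: float) -> float:
    """Sum of the congestion function over the seven nights."""
    return sum(congestion(x, c) for x in attendance(schedules))


# 14 players who each attend one night: spread evenly versus all on night 0.
balanced = [[k % NIGHTS] for k in range(14)]
crowded = [[0] for _ in range(14)]

g_balanced = system_utility(balanced, c=2.0)
g_crowded = system_utility(crowded, c=2.0)

# The integer attendance that maximizes the congestion function for c = 6.
best_attendance = max(range(50), key=lambda x: congestion(x, 6.0))
```

Under this assumed functional form, the balanced schedule scores higher than the crowded one, and the best attendance on a single night is equal to the parameter c.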
Notice that this problem has characteristics that are consistent with the COIN approach, but different from the MD approach. First, the system-wide utility function is known to the mechanism. Second, the protocol is an indirect mechanism in which the outcome is determined directly by the strategies of the players.
Experimental results [58] compare the performance of different WLUs, for different clamping factors. This is an iterated single-stage game, so the effect set for player $i$ is simply its strategy in period $t$, and the clamping factor defines a static strategy.
The mean-field approximation clamping factor (i.e. with each player assuming that the nights are well-balanced) is shown to outperform the other WLUs, with the $\vec{0}$ clamp outperforming the $\vec{1}$ clamp except when the $\vec{1}$ clamp is a good approximation to the mean-field clamp. This can occur as congestion increases and the number of nights that a player wants to attend approaches 6. The experimental results also demonstrate that all difference methods, in which a clamped system-wide utility is subtracted from the system-wide utility, are more effective than team game methods, with $g_i = G$.
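The comparison between clamping factors can be made concrete with a small sketch. The functional form of the congestion function, `x * exp(-x / c)`, is an assumption chosen only because it peaks at an attendance of exactly `c`. The uniform mean-field clamp (the number of nights attended divided by 7, on every night) is one reading of "each player assuming that the nights are well-balanced". All function names are hypothetical.

```python
import math

NIGHTS = 7

def congestion(x, c):
    """Assumed congestion function x * exp(-x / c); it peaks at x == c."""
    return x * math.exp(-x / c)

def world_utility(attendance, c):
    """System-wide utility G: the sum of per-night congestion values."""
    return sum(congestion(x, c) for x in attendance)

def total_attendance(strategies):
    """Add up the 0/1 attendance vectors of all players, night by night."""
    return [sum(s[k] for s in strategies) for k in range(NIGHTS)]

def wlu(strategies, i, clamp, c):
    """WLU for player i: G(actual) minus G(with player i's move replaced by clamp)."""
    actual = total_attendance(strategies)
    clamped = [a - s + cl for a, s, cl in zip(actual, strategies[i], clamp)]
    return world_utility(actual, c) - world_utility(clamped, c)

def clamps(nights_attended):
    """Three candidate clamping vectors: null move, all ones, and mean-field."""
    return {
        "null": [0.0] * NIGHTS,
        "ones": [1.0] * NIGHTS,
        "mean_field": [nights_attended / NIGHTS] * NIGHTS,
    }
```

In this sketch a team game utility would simply be `world_utility(total_attendance(strategies), c)` for every player, whereas each WLU subtracts a term that does not depend on the player's own move.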

Learnable Mechanism Design for Episodic Systems
One approach to the problem of mechanism design with bounded-rational agents is to consider agents with simple learning algorithms that can adjust towards Nash equilibrium strategies [51,17]. Indeed, there is an established literature that considers the ability of agents to learn to play equilibria in games [19,20,31,53]. The emphasis is on simple learners that can adjust, for example through myopic best-response, towards equilibrium strategies. A useful learnable mechanism would provide information, for example via price signals, to maximize the effectiveness with which individual agents can learn equilibrium strategies. Given the focus in COIN on methods to address agents with limited rationality, it is interesting to reinterpret the methods of COIN in terms of designing learnable mechanisms for episodic systems, such as iterated single-stage games. The methods of COIN suggest a new criterion for selecting a mechanism from within the Groves family of mechanisms, namely to provide informative signals to agents that are using simple learning methods to adjust towards equilibrium play.

The VCG-WLU Mechanism
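Consider the following multi-period MD problem. Each period, $t$, is a single-stage game that is repeated across periods. There is a set of choices, $K$, to implement in each period, and the goal is to implement the efficient choice, $k^*(\theta)$, which maximizes $\sum_i v_i(k, \theta_i)$. Given this choice rule, the system-wide utility function from COIN is

$$G(\hat\theta) = \sum_i v_i(k^*(\hat\theta), \theta_i).$$

Of course, in MD, there is uncertainty about this system-wide utility because types, $\theta_i$, are private to agents. Consider a Groves mechanism, with payment

$$p_i(\hat\theta) = \sum_{j \neq i} v_j(k^*(\hat\theta), \hat\theta_j) - h_i(\hat\theta_{-i})$$

to agent $i$ given reported types $\hat\theta$, and some arbitrary function, $h_i$, on the reported types of all agents except $i$. These payments, combined with outcome rule, $k^*$, induce individual utility functions on agents:

$$u_i(\hat\theta) = v_i(k^*(\hat\theta), \theta_i) + \sum_{j \neq i} v_j(k^*(\hat\theta), \hat\theta_j) - h_i(\hat\theta_{-i}).$$

These individual agent utility functions are not always factored. However, they are factored when other agents, $j \neq i$, follow equilibrium strategies and $\hat\theta_{-i} = \theta_{-i}$. This follows from the incentive-compatibility of Groves mechanisms:

$$u_i(\hat\theta_i, \theta_{-i}) = G(\hat\theta_i, \theta_{-i}) - h_i(\theta_{-i}).$$

Although the Groves payments do not make the agent utilities factored out of equilibrium, they are still sufficient to provide convergence towards equilibrium. For any strategy, $\hat\theta_{-i}$, from agents $j \neq i$, the best-response of agent $i$ is to report its true type, $\hat\theta_i = \theta_i$. The VCG-WLU mechanism is the member of the Groves family in which $h_i$ is computed from the solution obtained with the reported type of agent $i$ replaced by the average type, rather than with agent $i$ removed as in VCG.

In the following sections we present experimental results to compare the effectiveness of the VCG-WLU mechanism with the VCG mechanism in an auction problem with simple adaptive agents, and also for a variant of the congestion game that was introduced in Section 1.3.4.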
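The Groves family can be sketched as one payment routine with a pluggable term that depends only on the other agents' reports. The sketch below is illustrative and rests on assumptions: all names are hypothetical, `value(j, k, theta)` stands for the valuation of agent `j` for choice `k`, the team game (TG) variant is taken to be the Groves mechanism whose extra term is zero, and the VCG-WLU term is read as the others' value at the choice that would be efficient if agent `i` had reported the average type. `choices` must be re-iterable, such as a list or a range.

```python
def efficient_choice(choices, value, types):
    """Choice k that maximizes the total value sum_j value(j, k, types[j])."""
    return max(choices, key=lambda k: sum(value(j, k, t) for j, t in enumerate(types)))

def others_value(i, k, value, types):
    """Total value of choice k to every agent except i."""
    return sum(value(j, k, t) for j, t in enumerate(types) if j != i)

def h_team_game(i, choices, value, reported):
    """Assumed TG variant: no correction term at all."""
    return 0.0

def h_vcg(i, choices, value, reported):
    """VCG: the best total value the others could obtain with agent i ignored."""
    return max(others_value(i, k, value, reported) for k in choices)

def make_h_wlu(average_type):
    """Assumed VCG-WLU variant: clamp agent i's report to the average type."""
    def h_wlu(i, choices, value, reported):
        clamped = list(reported)
        clamped[i] = average_type
        k_clamped = efficient_choice(choices, value, clamped)
        return others_value(i, k_clamped, value, reported)
    return h_wlu

def groves_payment(i, choices, value, reported, h):
    """Payment TO agent i: others' reported value at the chosen outcome, minus h."""
    k_star = efficient_choice(choices, value, reported)
    return others_value(i, k_star, value, reported) - h(i, choices, value, reported)

def item_value(j, k, theta):
    """Single-item example: choice k gives the item to agent k."""
    return theta if j == k else 0.0
```

For a single item with reports `[0.9, 0.4, 0.3]`, `groves_payment(0, range(3), item_value, [0.9, 0.4, 0.3], h_vcg)` charges the winner the second-highest report. None of the three correction terms depends on agent `i`'s own report, which is what keeps each variant inside the Groves family.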

Example: Auction Game
Consider a simple allocation problem, in which there is a single item to allocate among $N$ agents, each with type, $\theta_i$, uniformly distributed between 0 and 1. The type of an agent defines its value for the item. Let $x_i \in \{0, 1\}$ indicate whether agent $i$ receives the item. The social choice function is to maximize the allocative efficiency of the outcome, $\sum_i x_i \theta_i$, for feasible allocation $x$. A Groves mechanism for this single-item allocation problem first asks each agent to report its type, and then implements the outcome that allocates the item to the agent with the highest reported type. In the VCG-WLU variation, the payment to each agent is computed with that agent's type replaced by $\bar\theta$, where $\bar\theta$ is the average type, in this example equal to 0.5. Putting this together, and assuming that agent 1 announces the highest type and agent 2 announces the second-highest type, the payment rules simplify. Under VCG, agent 1 pays the second-highest reported type, $\hat\theta_2$, and the other agents pay nothing.

In our experiments we adopt the approach in Wolpert & Tumer [58], and use a simple reinforcement learning algorithm for individual agents. Each agent maintains an estimate of the utility of each strategy, computed as a weighted average of past utilities with exponentially decaying weights, and selects a strategy according to the Boltzmann distribution induced by those estimates. (The use of this weighted average over an exponential decay function reflects the fact that the environment is non-stationary.) In all experiments the initial temperature is equated to the average utility during the initial training period, and the decay rate for the learning-rate is set to fix the learning-rate at the end of the final period to 0.001. We experimentally optimized the choice of the initial learning-rate and the temperature decay rate for each different choice of design, with a logarithmic search in parameter space to select parameters that maximized the average efficiency and minimized the average distance to the equilibrium strategy, across all periods.
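A learner of the kind described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the class and parameter names are hypothetical, strategies are assumed to form a finite set, the learning-rate and temperature are assumed to decay multiplicatively once per update, and utility estimates start at zero as a stand-in for an uninformative prior.

```python
import math
import random

def decay_to_reach(initial, final, periods):
    """Per-period multiplicative decay that takes `initial` to `final` after `periods` steps."""
    return (final / initial) ** (1.0 / periods)

class BoltzmannLearner:
    """Keeps a utility estimate per strategy and samples strategies from a Boltzmann distribution."""

    def __init__(self, strategies, learning_rate, temperature,
                 learning_decay=1.0, temperature_decay=1.0, rng=None):
        self.strategies = list(strategies)
        self.estimates = [0.0] * len(self.strategies)  # assumed uninformative start
        self.learning_rate = learning_rate
        self.temperature = temperature
        self.learning_decay = learning_decay
        self.temperature_decay = temperature_decay
        self.rng = rng or random.Random()

    def probabilities(self):
        """Boltzmann probabilities; the max is subtracted for numerical stability."""
        top = max(self.estimates)
        temp = max(self.temperature, 1e-12)  # guard against division by zero
        weights = [math.exp((u - top) / temp) for u in self.estimates]
        total = sum(weights)
        return [w / total for w in weights]

    def choose(self):
        """Return the index of the strategy to play in this period."""
        indices = range(len(self.strategies))
        return self.rng.choices(indices, weights=self.probabilities())[0]

    def update(self, index, utility):
        """Exponentially weighted update of the played strategy's estimate, then decay."""
        self.estimates[index] += self.learning_rate * (utility - self.estimates[index])
        self.learning_rate *= self.learning_decay
        self.temperature *= self.temperature_decay
```

With this form, `decay_to_reach(initial_rate, 0.001, periods)` gives a learning-rate decay that ends the final period at 0.001. Because recent utilities carry more weight than old ones, the estimates can track an environment that shifts as the other agents learn.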
Experimental Results.
In the first set of experiments we considered an auction with 5 agents, and set the initial training period to 200 periods and the learning period to 2000 periods. All results are averaged over 40 runs. In the second set of experiments we considered an auction with 3 agents, having noticed that the payments in VCG approximate those in VCG-WLU, and are different only when the average value of an agent is greater than the second-highest reported value. For these experiments the initial training period was 200 periods, and the learning period was 1000 periods. As with 5 agents, the results are averaged over 40 runs. Figures 1.1 and 1.2 compare the performance of the mechanisms in the 5-agent and 3-agent settings. First, we plot the convergence of agent strategy to the equilibrium strategy, which is truth-revelation in all auctions. Second, we plot the efficiency of the outcome, measured relative to the maximum value. In both cases, we find it convenient to plot the moving averages (with a window size of 100). This smooths out random fluctuations from period to period due to Boltzmann learning. Also, the first 200 periods represent the initialization period, in which agents select random strategies.
As expected, the performance of the VCG and VCG-WLU auctions dominates that of the TG auction with simple learning agents. Moreover, the VCG-WLU method appears to slightly outperform the VCG method during the early learning periods with 5 agents, both in terms of the accuracy of agent strategies and the average efficiency. In addition, the effect of VCG-WLU is more striking with 3 agents, with the COIN-inspired VCG payments providing a clear performance advantage over the regular VCG payments.

Example: Congestion Game
We illustrate the VCG-WLU mechanism on a direct-revelation variation of the congestion game. Consider a simple variation on Arthur's El Farol bar problem. There are $N$ players, one bar, and one night. The problem is interesting because the type of each player defines its tolerance for congestion, and is private to each player. Moreover, it is not certain which players will attend the bar.
On any night, the problem is to decide which players attend the bar. The goal is to strike a compromise between the different preferences of players for the level of crowding in the bar. For example, if enough players prefer a crowded bar then it can be beneficial from the system-wide perspective to make players attend even if that makes it more crowded than desirable for other players. The difference from the earlier congestion game (in Section 1.3.4) is that the mechanism itself implements a particular attendance profile, and must elicit information about player types to implement an optimal solution. The Groves mechanism for this problem first asks players to report their type, then implements the attendance profile that maximizes the total reported value, and makes Groves payments in which, as before, $h_i$ is some arbitrary function on the reported types of the other players. We again consider the particular variations that implement the VCG, VCG-WLU, and TG payments. The VCG payments require that the mechanism computes an alternative solution without each player in attendance, while the VCG-WLU payments require that the mechanism computes an alternative solution with the type of each player replaced, in turn, with the average player type.

As in the auction example, each player uses a simple reinforcement learning algorithm and maintains an estimate of the utility for each strategy. This induces a Boltzmann distribution to define a probability with which the player selects a particular strategy. Utility estimates are again initially set to an uninformative prior, during an initial training period in which all players follow random strategies. We experimentally optimized the choice of initial learning-rate and temperature decay rate for each different choice of design. In the 8-player variation, we set the initial training period to 200 periods and the learning period to 1000 periods. In the 4-player variation, we set the initial training period to 100 periods and the learning period to 500 periods. The 8-player results are averaged over 20 runs and the 4-player results are averaged over 60 runs.
Again, we were interested in comparing the 8-player and the 4-player variations, because we expected the difference between VCG-WLU and VCG to be more noticeable with fewer players. Figures 1.3 and 1.4 compare the performance of the mechanisms in the 8-player and 4-player settings. We plot the mean absolute error between the player strategies and truthful strategies in each period, and the efficiency of the outcome in each period, which in this problem is measured as the ratio of the total value of the implemented outcome to the total value of the optimal outcome. We plot the moving averages (window size of 50), to smooth out the random fluctuations due to Boltzmann learning.
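The two per-period measures and the smoothing step are simple enough to state in code. The sketch below is illustrative: the function names are hypothetical, the moving average is assumed to be a trailing window that averages over whatever is available in the first few periods, and an optimal value of zero is assumed to count as fully efficient.

```python
def mean_absolute_error(reports, types):
    """Average distance between reported types and true types in one period."""
    return sum(abs(r - t) for r, t in zip(reports, types)) / len(types)

def efficiency(implemented_value, optimal_value):
    """Ratio of the implemented outcome's total value to the optimal total value."""
    if optimal_value == 0:
        return 1.0  # assumption: nothing to gain, so treat as fully efficient
    return implemented_value / optimal_value

def moving_average(values, window):
    """Trailing moving average; early entries average over the periods seen so far."""
    smoothed = []
    running = 0.0
    for t, v in enumerate(values):
        running += v
        if t >= window:
            running -= values[t - window]
        smoothed.append(running / min(t + 1, window))
    return smoothed
```

A per-period series of either measure can then be smoothed with, for example, `moving_average(series, 50)` before plotting.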
Just as in the auction example, the performance of the TG mechanism with simple learning agents, in terms of both the speed of convergence towards an equilibrium strategy and the overall efficiency across periods, is dominated by both the VCG and the VCG-WLU mechanisms. Most striking in this congestion game example is that the performance of the VCG-WLU mechanism itself dominates that of the VCG mechanism, both for the 8-player and the 4-player variations. This provides some experimental justification for a collective-intelligence inspired approach to mechanism design in the presence of bounded-rational agents.

Summary: Towards Learnable Mechanism Design
The integration of methods from COIN into methods in mechanism design can be viewed as a first step towards learnable mechanism design. Learnable mechanism design is a natural direction to take mechanism design in complex decentralized settings. Classic mechanism design formulates an explicit normative model of the equilibrium behavior of an agent, and selects mechanism rules that are optimal with respect to that model. In particular, the Myerson program formulates the problem as a constrained optimization problem, in which one selects an outcome rule that maximizes a set of desiderata subject to incentive-compatibility constraints. In contrast, the idea in learnable mechanism design is to design a mechanism that is optimal with respect to a behavioral model of bounded-rational agents, and in particular to worry about the performance along the path towards equilibrium as well as in equilibrium itself.
The methods in this chapter assume simple Boltzmann learners, and adopt the idea of informative utilities from COIN to select an instance of the Groves family of mechanisms in which payments to agents are especially informative in the feedback they provide about the effect of an agent's choice of strategy on her individual utility. This opens up many interesting questions. In particular, there has been an explosion of research into algorithms to compute Nash equilibria (and special classes of equilibria such as correlated equilibria) in game-theoretic settings (e.g. [32,55,29]), and also to identify tractable special cases of the equilibrium-computation problem (e.g. [36]). Many of the algorithms have a best-response/learning flavor (e.g. [16,14,15,39,31]). A very natural question arises: can we design mechanisms that induce games that have computable equilibria, or equilibria that are readily computed by simple learning agents?
In addition to identifying classes of mechanisms that induce game-theoretic situations with good computational properties, we can also consider whether there is a role for automated mechanism design in which the rules of mechanisms are automatically adjusted online to provide robustness against unmodeled properties of a real system, such as those due to the limited rationality of participants.