Payment rules through discriminant-based classifiers

In mechanism design it is typical to impose incentive compatibility and then derive an optimal mechanism subject to this constraint. By replacing the incentive compatibility requirement with the goal of minimizing expected ex post regret, we are able to adapt statistical machine learning techniques to the design of payment rules. This computational approach to mechanism design is applicable to domains with multi-dimensional types and situations where computational efficiency is a concern. Specifically, given an outcome rule and access to a type distribution, we train a support vector machine with a special discriminant function structure such that it implicitly establishes a payment rule with desirable incentive properties. We discuss applications to a multi-minded combinatorial auction with a greedy winner-determination algorithm and to an assignment problem with egalitarian outcome rule. Experimental results demonstrate both that the construction produces payment rules with low ex post regret, and that penalizing classification errors is effective in preventing failures of ex post individual rationality.


INTRODUCTION
Mechanism design studies situations where a set of agents each hold private information about their preferences over different outcomes. The designer chooses a center that receives claims about such preferences, selects and enforces an outcome, and optionally collects payments. The classical approach is to impose incentive compatibility, ensuring that agents truthfully report their preferences in strategic equilibrium. Subject to this constraint, the goal is to identify a mechanism, i.e., a way of choosing an outcome and payments based on agents' reports, that optimizes a given design objective like social welfare, revenue, or some notion of fairness.
There are, however, significant challenges associated with this classical approach. First of all, it can be analytically cumbersome to derive optimal mechanisms for domains that are "multi-dimensional" in the sense that each agent's private information is described through more than a single number, and few results are known in this case. Second, incentive compatibility can be costly, in that adopting it as a hard constraint can preclude mechanisms with useful economic properties. For example, imposing the strongest form of incentive compatibility, truthfulness in a dominant strategy equilibrium or strategyproofness, necessarily leads to poor revenue, vulnerability to collusion, and vulnerability to false-name bidding in combinatorial auctions where valuations exhibit complementarities among items [Ausubel and Milgrom 2006; Rastegari et al. 2011]. A third difficulty occurs when the optimal mechanism has an outcome or payment rule that is computationally intractable.
In the face of these difficulties, we adopt statistical machine learning to automatically infer mechanisms with good incentive properties. Rather than imposing incentive compatibility as a hard constraint, we start from a given outcome rule and use machine learning techniques to identify a payment rule that minimizes agents' expected ex post regret relative to this outcome rule. Here, the ex post regret an agent has for truthful reporting in a given instance is the amount by which its utility could be increased through a misreport. While a mechanism with zero ex post regret for all inputs is obviously strategyproof, we are not aware of any additional direct implication in terms of equilibrium properties. 2 Support for expected ex post regret as a quantifiable target for mechanism design rather comes from a simple model of manipulation where agents face a certain cost for strategic behavior. If this cost is higher than the expected gain, agents can be assumed to behave truthfully. We do insist on mechanisms in which the price to an agent, conditioned on an outcome, is independent of its report. This provides additional robustness against manipulation in the sense that there is no local price sensitivity. 3 Our approach is applicable to domains that are multi-dimensional or for which the computational efficiency of outcome rules is a concern. Given the implied relaxation of incentive compatibility, the intended application is to domains in which incentive compatibility is unavailable or undesirable for outcome rules that meet certain economic and computational desiderata. The payment rule is learned on the basis of a given outcome rule, and as such the framework is most meaningful in domains where revenue considerations are secondary to outcome considerations.
The essential insight is that the payment rule of a strategyproof mechanism can be thought of as a classifier for predicting the outcome: the payment rule implies a price to each agent for each outcome, and the selected outcome must be one that simultaneously maximizes reported value minus price for every agent. By limiting classifiers to discriminant functions 4 with this "value-minus-price" structure, where the price can be an arbitrary function of the outcome and the reports of other agents, we obtain a remarkably direct connection between multi-class classification and mechanism design.
2 The expected ex post regret given a distribution over types provides an upper bound on the expected regret of an agent who knows its own type but has only distributional information on the types of other agents. The latter metric is also appealing, but does not seem to fit well with the generalization error of statistical machine learning. An emerging literature is developing various regret-based metrics for quantifying the incentive properties of mechanisms [Parkes et al. 2001; Day and Milgrom 2008; Lubin 2010; Carroll 2011], and there also exists experimental support for a quantifiable measure of the divergence between the distribution on payoffs in a mechanism and that in a strategyproof reference mechanism like the VCG mechanism [Lubin and Parkes 2009]. An earlier literature had looked for approximate incentive compatibility or incentive compatibility in the large-market limit; see, e.g., the recent survey by Carroll [2011]. Related to the general theme of relaxing incentive compatibility is work of Pathak and Sönmez [2010] that provides a qualitative ranking of different mechanisms in terms of the number of manipulable instances, and work of Budish [2010] that introduces an asymptotic, binary, design criterion regarding incentive properties in a large replica economy limit. Whereas the present work is constructive, the latter seek to explain which mechanisms are adopted in practice.
3 Erdil and Klemperer [2010] consider a metric that emphasizes this property.
4 A discriminant function can be thought of as a way to distinguish between different outcomes for the purpose of making a prediction.
For an appropriate loss function, the discriminant function of a classifier that minimizes generalization error over a hypothesis class has a corresponding payment rule that minimizes expected ex post regret among all payment rules corresponding to classifiers in this class. Conveniently, an appropriate method exists for multi-class classification with large outcome spaces that supports the specific structure of the discriminant function, namely the method of structural support vector machines [Tsochantaridis et al. 2005; Joachims et al. 2009]. Just like standard support vector machines, it allows us to adopt non-linear kernels, thus enabling price functions that depend in a non-linear way on the outcome and on the reported types of other agents.
In illustrating the framework, we focus on two situations where strategyproof payment rules are not available: a greedy outcome rule for a multi-minded combinatorial auction in which each agent is interested in a constant number of bundles, and an assignment problem with an egalitarian outcome rule, i.e., an outcome rule that maximizes the minimum value of any agent. The experimental results we obtain are encouraging, in that they demonstrate low expected ex post regret even when the 0/1 classification accuracy is only moderately good, and in particular better regret properties than those obtained through simple VCG-based payment rules that we adopt as a baseline. In addition, we give special consideration to the failure of ex post individual rationality, and introduce methods to bias the classifier to avoid these kinds of errors as well as post hoc adjustments that eliminate them. As far as scalability is concerned, we emphasize that the computational cost associated with our approach occurs offline during training. The learned payment rules have a succinct description and can be evaluated quickly in a deployed mechanism.
Due to space constraints, we omit some extended examples, proofs, and tables, and refer the reader to the full version of the paper for details.

Related Work
Conitzer and Sandholm [2002] introduced the agenda of automated mechanism design (AMD), which formulates mechanism design as an optimization problem. The output is the description of a mechanism, i.e., an explicit mapping from types to outcomes and payments. AMD is intractable in general, as the type space can be exponential in both the number of agents and the number of items, but progress has recently been made in finding approximate solutions for domains with additive value structure and symmetry assumptions, and adopting Bayes-Nash incentive compatibility (BIC) as the goal [Cai et al. 2012]. Another approach is to search through a parameterized space of incentive-compatible mechanisms [Guo and Conitzer 2010].
A parallel literature allows outcome rules to be represented by algorithms, like our work, and thus extends to richer domains. Lavi and Swamy [2005] employ LP relaxation to obtain mechanisms satisfying BIC for set-packing problems, achieving worst-case approximation guarantees for combinatorial auctions. Hartline and Lucier [2010] and Hartline et al. [2011] propose a general approach, applicable to both single-parameter and multi-parameter domains, for converting any approximation algorithm into a mechanism satisfying BIC that has essentially the same approximation factor with respect to social welfare. This approach differs from ours in that it adopts BIC as a target rather than the minimization of expected ex post regret. In addition, it evaluates the outcome rule on a number of randomly perturbed replicas of the instance that is polynomial in the size of a discrete type space, which is infeasible for combinatorial auctions where this size is exponential in the number of items. The computational requirements of our trained rule are equivalent to those of the original outcome rule. Lahaie [2009, 2010] also adopts a kernel-based approach for combinatorial auctions, but focuses not on learning a payment rule for a given outcome rule but rather on solving the winner determination and pricing problem for a given instance of a combinatorial auction. Lahaie introduces the use of kernel methods to compactly represent non-linear price functions, which is also present in our work, but obtains incentive properties more indirectly through a connection between regularization and price sensitivity.

PRELIMINARIES
A mechanism design problem is given by a set $N = \{1, 2, \ldots, n\}$ of agents that interact to select an element from a set $\Omega \subseteq \prod_{i \in N} \Omega_i$ of outcomes, where $\Omega_i$ denotes the set of possible outcomes for agent $i \in N$. Agent $i \in N$ is associated with a type $\theta_i$ from a set $\Theta_i$ of possible types, corresponding to the private information available to this agent. We write $\theta = (\theta_1, \ldots, \theta_n)$ for a profile of types for the different agents, $\Theta = \prod_{i \in N} \Theta_i$ for the set of possible type profiles, and $\theta_{-i} \in \Theta_{-i}$ for a profile of types for all agents but $i$. Each agent $i \in N$ is further assumed to have preferences over $\Omega_i$, represented by a valuation function $v_i : \Theta_i \times \Omega_i \to \mathbb{R}$. We assume that for all $i \in N$ and $\theta_i \in \Theta_i$ there exists an outcome $o \in \Omega$ with $v_i(\theta_i, o_i) = 0$.
A (direct) mechanism is a pair $(g, p)$ of an outcome rule $g : \Theta \to \prod_{i \in N} \Omega_i$ and a payment rule $p : \Theta \to \mathbb{R}^n_{\ge 0}$. The intuition is that the agents reveal to the mechanism a type profile $\theta \in \Theta$, possibly different from their true types, and the mechanism chooses outcome $g(\theta)$ and charges each agent $i$ a payment of $p_i(\theta) = (p(\theta))_i$. We assume quasilinear preferences, so the utility of agent $i$ with type $\theta_i \in \Theta_i$ given a profile $\theta' \in \Theta$ of revealed types is
$$u_i(\theta', \theta_i) = v_i(\theta_i, g_i(\theta')) - p_i(\theta'),$$
where $g_i(\theta) = (g(\theta))_i$ denotes the outcome for agent $i$. A crucial property of mechanism $(g, p)$ is that its outcome rule is feasible, i.e., that $g(\theta) \in \Omega$ for all $\theta \in \Theta$.
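Outcome rule $g$ satisfies consumer sovereignty if for all $i \in N$, $o_i \in \Omega_i$, and $\theta_{-i} \in \Theta_{-i}$, there exists $\theta_i \in \Theta_i$ such that $g_i(\theta_i, \theta_{-i}) = o_i$; it satisfies reachability of the null outcome if for all $i \in N$, $\theta_i \in \Theta_i$, and $\theta_{-i} \in \Theta_{-i}$, there exists $\theta'_i \in \Theta_i$ such that $v_i(\theta_i, g_i(\theta'_i, \theta_{-i})) = 0$. Mechanism $(g, p)$ is dominant strategy incentive compatible, or strategyproof, if each agent maximizes its utility by reporting its true type, irrespective of the reports of the other agents, i.e., if for all $i \in N$, $\theta_i \in \Theta_i$, and $\theta' = (\theta'_i, \theta'_{-i}) \in \Theta$,
$$u_i((\theta_i, \theta'_{-i}), \theta_i) \ge u_i((\theta'_i, \theta'_{-i}), \theta_i);$$
it satisfies individual rationality (IR) if agents reporting their true types are guaranteed non-negative utility, i.e., if for all $i \in N$, $\theta_i \in \Theta_i$, and $\theta'_{-i} \in \Theta_{-i}$,
$$u_i((\theta_i, \theta'_{-i}), \theta_i) \ge 0.$$
Observe that given reachability of the null outcome, strategyproofness implies individual rationality.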
It is known that a mechanism $(g, p)$ is strategyproof if and only if the payment of an agent is independent of its reported type and the chosen outcome simultaneously maximizes the utility of all agents, i.e., if for every $\theta \in \Theta$,
$$p_i(\theta) = t_i(\theta_{-i}, g_i(\theta)) \quad \text{for all } i \in N, \text{ and} \qquad (1)$$
$$g_i(\theta) \in \arg\max_{o'_i \in \Omega_i} \big( v_i(\theta_i, o'_i) - t_i(\theta_{-i}, o'_i) \big) \quad \text{for all } i \in N, \qquad (2)$$
for a price function $t_i : \Theta_{-i} \times \Omega_i \to \mathbb{R}$. This simple characterization is crucial for the main results in the present paper, providing the basis with which the discriminant function of a classifier can be used to induce a payment rule. In addition, a direct characterization of strategyproofness in terms of monotonicity properties of outcome rules explains which outcome rules can be associated with a payment rule in order to be "implementable" within a strategyproof mechanism [Saks and Yu 2005; Ashlagi et al. 2010]. These monotonicity properties provide a fundamental constraint on when our machine learning framework can hope to identify a payment rule that provides full strategyproofness.
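We quantify the degree of strategyproofness of a mechanism in terms of the regret experienced by an agent when revealing its true type, i.e., the potential gain in utility by revealing a different type instead. Formally, the ex post regret of agent $i \in N$ in mechanism $(g, p)$, given true type $\theta_i \in \Theta_i$ and reported types $\theta'_{-i} \in \Theta_{-i}$ of the other agents, is
$$\mathrm{rgt}_i(\theta_i, \theta'_{-i}) = \max_{\theta'_i \in \Theta_i} u_i((\theta'_i, \theta'_{-i}), \theta_i) - u_i((\theta_i, \theta'_{-i}), \theta_i).$$
Analogously, the ex post violation of individual rationality of agent $i \in N$ in mechanism $(g, p)$, given true type $\theta_i \in \Theta_i$ and reported types $\theta'_{-i} \in \Theta_{-i}$ of the other agents, is
$$\big| \min\big( u_i((\theta_i, \theta'_{-i}), \theta_i),\, 0 \big) \big|.$$
We consider situations where types are drawn from a distribution with probability density function $D : \Theta \to \mathbb{R}$ such that $D(\theta) \ge 0$ and $\int_{\theta \in \Theta} D(\theta)\, d\theta = 1$. Given such a distribution, and assuming that all agents report their true types, the expected ex post regret of agent $i \in N$ in mechanism $(g, p)$ is $E_{\theta \sim D}[\mathrm{rgt}_i(\theta_i, \theta_{-i})]$.
Outcome rule $g$ is agent symmetric if for every permutation $\pi$ of $N$ and all types $\theta, \theta' \in \Theta$ such that $\theta_i = \theta'_{\pi(i)}$ for all $i \in N$, $g_i(\theta) = g_{\pi(i)}(\theta')$ for all $i \in N$. Given agent symmetry, a price function $t_1 : \Theta_{-1} \times \Omega_1 \to \mathbb{R}$ for agent 1 can be used to generate the payment rule $p$ for a mechanism $(g, p)$, with
$$p(\theta) = \big( t_1(\theta_{-1}, g_1(\theta)),\ t_1(\theta_{-2}, g_2(\theta)),\ \ldots,\ t_1(\theta_{-n}, g_n(\theta)) \big),$$
so that the expected ex post regret is the same for every agent.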
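To make the regret and IR-violation definitions above concrete, the following minimal sketch evaluates them on a finite grid of misreports. The function names, the grid-based maximization, and the two toy single-item payment rules are illustrative assumptions and do not come from the paper.

```python
def utility(value, g, p, agent, true_type, reports):
    """Quasilinear utility: v_i(true_type, g_i(reports)) - p_i(reports)."""
    return value(true_type, g(reports)[agent]) - p(reports)[agent]

def with_report(others, agent, report):
    """Insert the agent's report into the profile of the other agents."""
    profile = list(others)
    profile.insert(agent, report)
    return tuple(profile)

def ex_post_regret(value, g, p, agent, true_type, others, type_grid):
    """Largest utility gain from a misreport in type_grid (assumed finite)."""
    truthful = utility(value, g, p, agent, true_type,
                       with_report(others, agent, true_type))
    best = max(utility(value, g, p, agent, true_type,
                       with_report(others, agent, r)) for r in type_grid)
    return max(0, best - truthful)

def ir_violation(value, g, p, agent, true_type, others):
    """Amount by which truthful utility falls below zero."""
    truthful = utility(value, g, p, agent, true_type,
                       with_report(others, agent, true_type))
    return max(0, -truthful)

# Toy single-item setting (hypothetical): outcome 1 means "wins the item".
value = lambda theta, o: theta * o

def highest_bid_wins(reports):
    winner = max(range(len(reports)), key=lambda i: reports[i])  # ties: lowest index
    return tuple(1 if i == winner else 0 for i in range(len(reports)))

def first_price(reports):
    return tuple(r * a for r, a in zip(reports, highest_bid_wins(reports)))

def second_price(reports):
    alloc = highest_bid_wins(reports)
    return tuple(max(reports[:i] + reports[i + 1:]) * a for i, a in enumerate(alloc))

regret_fp = ex_post_regret(value, highest_bid_wins, first_price, 0, 5, (3,), range(6))
regret_sp = ex_post_regret(value, highest_bid_wins, second_price, 0, 5, (3,), range(6))
```

With a true value of 5 against a competing bid of 3, the first-price rule leaves positive regret (the agent gains by shading its bid), whereas the second-price rule leaves none.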
We assume agent symmetry in the sequel, which precludes outcome rules that break ties based on agent identity, but obviates the need to train a separate classifier for each agent while also providing some benefits in terms of presentation. Because ties occur only with negligible probability in our experimental framework, the experimental results are not affected by this assumption.

PAYMENT RULES FROM MULTI-CLASS CLASSIFIERS
A multi-class classifier is a function h : X → Y , where X is an input domain and Y is a discrete output domain. One could imagine, for example, a multi-class classifier that labels a given image as that of a dog, a cat, or some other animal. In the context of mechanism design, we will be interested in classifiers that take as input a type profile and output an outcome. What distinguishes this from an outcome rule is that we will impose restrictions on the form the classifier can take.
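Classification typically assumes an underlying target function $h^* : X \to Y$, and the goal is to learn a classifier $h$ that minimizes disagreements with $h^*$ on a given input distribution $D$ on $X$, based only on a finite set of training data $\{(x^1, y^1), \ldots, (x^\ell, y^\ell)\} = \{(x^1, h^*(x^1)), \ldots, (x^\ell, h^*(x^\ell))\}$ with $x^1, \ldots, x^\ell$ drawn from $D$. This may be challenging because the amount of training data is limited, or because $h$ is restricted to some hypothesis class $H$ with a certain simple structure, e.g., linear threshold functions. If $h(x) = h^*(x)$ for all $x \in X$, we say that $h$ is a perfect classifier for $h^*$.
We consider classifiers that are defined in terms of a discriminant function $f : X \times Y \to \mathbb{R}$, such that
$$h(x) \in \arg\max_{y \in Y} f(x, y)$$
for all $x \in X$. More specifically, we will be concerned with linear discriminant functions of the form
$$f_w(x, y) = w^T \psi(x, y)$$
for a weight vector $w \in \mathbb{R}^m$ and a feature map $\psi : X \times Y \to \mathbb{R}^m$. The function $\psi$ maps input and output into an $m$-dimensional space, which generally allows non-linear features to be expressed.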

Mechanism Design as Classification
Assume that we are given an outcome rule g and access to a distribution D over type profiles, and want to design a corresponding payment rule p that gives the mechanism (g, p) the best possible incentive properties. Assuming agent symmetry, we focus on a partial outcome rule g 1 : Θ → Ω 1 and train a classifier to predict the outcome to agent 1. To train a classifier, we generate examples by drawing a type profile θ ∈ Θ from distribution D and applying outcome rule g to obtain the target class g 1 (θ) ∈ Ω 1 .
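We impose a special structure on the hypothesis class. A classifier $h_w : \Theta \to \Omega_1$ is admissible if it is defined in terms of a discriminant function $f_w$ of the form
$$f_w(\theta, o_1) = w_1 v_1(\theta_1, o_1) + w_{-1}^T \psi(\theta_{-1}, o_1)$$
for weights $w$ such that $w_1 \in \mathbb{R}_{>0}$ and $w_{-1} \in \mathbb{R}^m$, and a feature map $\psi : \Theta_{-1} \times \Omega_1 \to \mathbb{R}^m$. The first term of $f_w(\theta, o_1)$ only depends on the type of agent 1 and increases in its valuation for outcome $o_1$, while the remaining terms ignore $\theta_1$ entirely. This restriction allows us to directly infer agent-independent prices from a trained classifier. For this, define the associated price function of an admissible classifier $h_w$ as
$$t_w(\theta_{-1}, o_1) = -\frac{1}{w_1} w_{-1}^T \psi(\theta_{-1}, o_1),$$
where we again focus on agent 1 for concreteness. By agent symmetry, we obtain the mechanism $(g, p_w)$ corresponding to classifier $h_w$ by letting
$$p_w(\theta) = \big( t_w(\theta_{-1}, g_1(\theta)),\ t_w(\theta_{-2}, g_2(\theta)),\ \ldots,\ t_w(\theta_{-n}, g_n(\theta)) \big).$$
Even with admissibility, appropriate choices for the feature map $\psi$ will produce rich families of classifiers, and thus ultimately useful payment rules. Moreover, this form is compatible with structural support vector machines, discussed in Section 4.1.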

Example: Single-Item Auction
Before proceeding further, we illustrate the ideas developed so far in the context of a single-item auction. In a single-item auction, the type of each agent is a single number, corresponding to its value for the item being auctioned, and there are two possible allocations from the point of view of agent 1: one where it receives the item, and one where it does not. Formally, Θ = R n and Ω 1 = {0, 1}.
Consider a setting with three agents and a training set $(\theta^1, o_1^1) = ((1, 3, 5), 0)$, $(\theta^2, o_1^2) = ((5, 4, 3), 1)$, $(\theta^3, o_1^3) = ((2, 3, 4), 0)$, and note that this training set is consistent with an optimal outcome rule, i.e., one that assigns the item to an agent with maximum value. Our goal is to learn an admissible classifier that performs well on the training set. Since there are only two possible outcomes, the outcome chosen by $h_w$ is simply the one with the larger discriminant. A classifier that is perfect on the training data must therefore satisfy the following constraints:
$$f_w((1, 3, 5), 0) > f_w((1, 3, 5), 1), \quad f_w((5, 4, 3), 1) > f_w((5, 4, 3), 0), \quad f_w((2, 3, 4), 0) > f_w((2, 3, 4), 1).$$
This can for example be achieved by setting $w_1 = 1$ and
$$w_{-1}^T \psi((\theta_2, \theta_3), o_1) = \begin{cases} -\max(\theta_2, \theta_3) & \text{if } o_1 = 1, \\ 0 & \text{if } o_1 = 0. \end{cases}$$
Recalling our definition of the price function as $t_w(\theta_{-1}, o_1) = -(1/w_1)\, w_{-1}^T \psi(\theta_{-1}, o_1)$, we see that this choice of $w$ and $\psi$ corresponds to the second-price payment rule. We will see in the next section that this relationship is not a coincidence.
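The single-item example can be checked mechanically. The sketch below hard-codes one admissible discriminant, with $w_1 = 1$ and the remaining term equal to minus the highest competing value when agent 1 wins, and confirms that it classifies the three training examples correctly and induces second-price payments. The function names are illustrative assumptions.

```python
def other_term(theta_others, o1):
    """w_{-1}^T psi(theta_{-1}, o1): minus the highest competing value if agent 1 wins."""
    return -max(theta_others) if o1 == 1 else 0.0

def discriminant(theta, o1, w1=1.0):
    """f_w(theta, o1) = w1 * v1(theta_1, o1) + w_{-1}^T psi(theta_{-1}, o1)."""
    value = theta[0] * o1  # value of agent 1 is theta_1 if it wins, 0 otherwise
    return w1 * value + other_term(theta[1:], o1)

def classify(theta):
    """h_w(theta): the outcome for agent 1 with the larger discriminant."""
    return max((0, 1), key=lambda o1: discriminant(theta, o1))

def price(theta_others, o1, w1=1.0):
    """t_w(theta_{-1}, o1) = -(1/w1) * w_{-1}^T psi(theta_{-1}, o1)."""
    return max(theta_others) / w1 if o1 == 1 else 0.0

training_set = [((1, 3, 5), 0), ((5, 4, 3), 1), ((2, 3, 4), 0)]
predictions = [classify(theta) for theta, _ in training_set]
winning_price = price((4, 3), 1)  # price agent 1 pays when it wins against (4, 3)
```

Here agent 1 wins only in the second example and is charged the highest competing value, 4, which is exactly the second-price payment.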

Perfect Classifiers and Implementable Outcome Rules
We now formally establish a connection between implementable outcome rules and perfect classifiers.
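THEOREM 3.1. Let $(g, p)$ be a strategyproof mechanism with an agent symmetric outcome rule $g$, and let $t_1$ be the corresponding price function. Then, a perfect admissible classifier $h_w$ for partial outcome rule $g_1$ exists if $\arg\max_{o_1 \in \Omega_1} (v_1(\theta_1, o_1) - t_1(\theta_{-1}, o_1))$ is unique for every type profile $\theta$.
PROOF. By the first characterization of strategyproof mechanisms, $g$ must select an outcome that maximizes the utility of agent 1 at the current prices, i.e.,
$$g_1(\theta) \in \arg\max_{o_1 \in \Omega_1} \big( v_1(\theta_1, o_1) - t_1(\theta_{-1}, o_1) \big).$$
Consider the admissible discriminant $f_{(1,1)}(\theta, o_1) = v_1(\theta_1, o_1) - t_1(\theta_{-1}, o_1)$, which uses the price function $t_1$ as its feature map. Clearly, the corresponding classifier $h_{(1,1)}$ maximizes the same quantity as $g_1$, and the two must agree if there is a unique maximizer.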
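The relationship also works in the opposite direction: a perfect, admissible classifier $h_w$ for outcome rule $g$ can be used to construct a payment rule that turns $g$ into a strategyproof mechanism.
THEOREM 3.2. Let $g$ be an agent symmetric outcome rule, $h_w : \Theta \to \Omega_1$ an admissible classifier, and $p_w$ the payment rule corresponding to $h_w$. If $h_w$ is a perfect classifier for the partial outcome rule $g_1$, then the mechanism $(g, p_w)$ is strategyproof.
We prove this result by expressing the regret of an agent in mechanism $(g, p_w)$ in terms of the discriminant function $f_w$. Let $\Omega_i(\theta_{-i}) \subseteq \Omega_i$ denote the set of partial outcomes for agent $i$ that can be obtained under $g$ given reported types $\theta_{-i}$ from all agents but $i$, keeping the dependence on $g$ silent for notational simplicity.
LEMMA 3.3. Suppose that agent 1 has type $\theta_1$ and that the other agents report types $\theta_{-1}$. Then the regret of agent 1 for bidding truthfully in mechanism $(g, p_w)$ is
$$\frac{1}{w_1} \Big( \max_{o_1 \in \Omega_1(\theta_{-1})} f_w(\theta, o_1) - f_w(\theta, g_1(\theta)) \Big).$$
The expression follows because $f_w(\theta, o_1) = w_1 \big( v_1(\theta_1, o_1) - t_w(\theta_{-1}, o_1) \big)$, so the best utility that agent 1 can reach through a misreport is $1/w_1$ times the largest discriminant value over the reachable outcomes.
PROOF OF THEOREM 3.2. If $h_w$ is a perfect classifier for $g_1$, then $g_1(\theta)$ maximizes $f_w(\theta, \cdot)$ over $\Omega_1$, and hence over $\Omega_1(\theta_{-1}) \subseteq \Omega_1$, for every $\theta \in \Theta$. By Lemma 3.3, the regret of agent 1 for bidding truthfully in mechanism $(g, p_w)$ is always zero, which means that the mechanism is strategyproof.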
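The relation between regret and the discriminant can be illustrated numerically. The following sketch assumes the regret of agent 1 equals $1/w_1$ times the gap between the best discriminant value over reachable outcomes and the discriminant value of the outcome actually chosen by $g$; the deliberately mispriced discriminant and all names are hypothetical.

```python
def regret_from_discriminant(f, w1, theta, reachable, realized):
    """(1/w1) * (max over reachable outcomes of f(theta, o) - f(theta, realized))."""
    return (max(f(theta, o) for o in reachable) - f(theta, realized)) / w1

# Hypothetical imperfect classifier for a two-agent single-item auction:
# w1 = 2 and winning term -max(others), i.e. an induced price of half the rival bid.
W1 = 2.0

def f(theta, o1):
    return W1 * theta[0] * o1 - (max(theta[1:]) if o1 == 1 else 0.0)

def g1(theta):
    """Outcome for agent 1 under 'highest value wins'."""
    return 1 if theta[0] >= max(theta[1:]) else 0

# Agent 1 loses with value 3 against 4, but would gladly win at the induced price 2.
losing_regret = regret_from_discriminant(f, W1, (3, 4), (0, 1), g1((3, 4)))
# Agent 1 wins with value 5 against 4; the classifier agrees with g, so no regret.
winning_regret = regret_from_discriminant(f, W1, (5, 4), (0, 1), g1((5, 4)))
```

In the first profile the classifier disagrees with the outcome rule and the regret is positive; in the second it agrees and the regret vanishes, in line with Theorem 3.2.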
It bears emphasis that classifier h w is only used to derive the payment rule p w , while the outcome is still selected according to g. In principle, classifier h w could be used to obtain an agent symmetric outcome rule g w and, since h w is a perfect classifier for itself, a strategyproof mechanism (g w , p w ). Unfortunately, outcome rule g w is not in general feasible. Mechanism (g, p w ), on the other hand, is not strategyproof when h w fails to be a perfect classifier for g. While payment rule p w always satisfies the agent-independence property (1) required for strategyproofness, the "optimization" property (2) might be violated when h w (θ) ≠ g 1 (θ).

Approximate Classification and Approximate Strategyproofness
A perfect admissible classifier for outcome rule g leads to a payment rule that turns g into a strategyproof mechanism. We now show that this result extends gracefully to situations where no such payment rule is available, by relating the expected ex post regret of a mechanism (g, p) to a measure of the generalization error of a classifier for g.
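Fix a feature map $\psi$, and denote by $H_\psi$ the space of all admissible classifiers with this feature map. The discriminant loss of a classifier $h_w \in H_\psi$ with respect to a type profile $\theta$ and an outcome $o_1 \in \Omega_1$ is given by
$$\Delta_w(o_1, \theta) = \frac{1}{w_1} \big( f_w(\theta, h_w(\theta)) - f_w(\theta, o_1) \big).$$
Intuitively the discriminant loss measures how far, in terms of the normalized discriminant, $h_w$ is from predicting the correct outcome for type profile $\theta$, assuming the correct outcome is $o_1$. Note that $\Delta_w(o_1, \theta) \ge 0$ for all $o_1 \in \Omega_1$ and $\theta \in \Theta$, and $\Delta_w(o_1, \theta) = 0$ if $o_1 = h_w(\theta)$. Note further that $h_w(\theta) = h_{w'}(\theta)$ does not imply that $\Delta_w(o_1, \theta) = \Delta_{w'}(o_1, \theta)$ for all $o_1 \in \Omega_1$: even if two classifiers predict the same outcome, one of them may still be closer to predicting the correct outcome $o_1$. The generalization error of classifier $h_w \in H_\psi$ with respect to a type distribution $D$ and a partial outcome rule $g_1 : \Theta \to \Omega_1$ is then given by
$$R_w(D, g) = \int_{\theta \in \Theta} \Delta_w(g_1(\theta), \theta)\, D(\theta)\, d\theta.$$
The following result establishes a connection between the generalization error and the expected ex post regret of the corresponding mechanism.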
THEOREM 3.4. Consider an outcome rule g, a space H ψ of admissible classifiers, and a type distribution D. Let h w * ∈ H ψ be a classifier that minimizes generalization error with respect to D and g among all classifiers in H ψ . Then the following holds: (1) If g satisfies consumer sovereignty, then (g, p w * ) minimizes expected ex post regret with respect to D among all mechanisms (g, p w ) corresponding to classifiers h w ∈ H ψ . (2) Otherwise, (g, p w * ) minimizes an upper bound on expected ex post regret with respect to D amongst all mechanisms (g, p w ) corresponding to classifiers h w ∈ H ψ .
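PROOF. For the second property, observe that
$$\Delta_w(g_1(\theta), \theta) = \frac{1}{w_1} \Big( \max_{o_1 \in \Omega_1} f_w(\theta, o_1) - f_w(\theta, g_1(\theta)) \Big) \ge \frac{1}{w_1} \Big( \max_{o_1 \in \Omega_1(\theta_{-1})} f_w(\theta, o_1) - f_w(\theta, g_1(\theta)) \Big) = \mathrm{rgt}_1(\theta_1, \theta_{-1}),$$
where the last equality holds by Lemma 3.3. If $g$ satisfies consumer sovereignty, then $\Omega_1(\theta_{-1}) = \Omega_1$ and the inequality holds with equality, so the first property follows as well.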
Minimization of expected regret itself, rather than an upper bound, can also be achieved if the learner has access to the set Ω 1 (θ −1 ) for every θ −1 ∈ Θ −1 .

A SOLUTION USING STRUCTURAL SUPPORT VECTOR MACHINES
In this section we discuss the method of structural support vector machines (structural SVMs) [Tsochantaridis et al. 2005; Joachims et al. 2009], and show how it can be adapted for the purpose of learning classifiers with admissible discriminant functions.

Structural SVMs
Given an input space $X$, a discrete output space $Y$, a target function $h^* : X \to Y$, and a set of training examples $\{(x^1, h^*(x^1)), \ldots, (x^\ell, h^*(x^\ell))\} = \{(x^1, y^1), \ldots, (x^\ell, y^\ell)\}$, structural SVMs learn a multi-class classifier $h$ that on input $x \in X$ selects an output $y \in Y$ that maximizes $f_w(x, y) = w^T \psi(x, y)$. For a given feature map $\psi$, the training problem is to find a vector $w$ for which $h_w$ has low generalization error. Given examples $\{(x^1, y^1), \ldots, (x^\ell, y^\ell)\}$, training is achieved by solving the following convex optimization problem:
$$\min_{w,\, \xi \ge 0} \ \frac{1}{2} w^T w + \frac{C}{\ell} \sum_{k=1}^{\ell} \xi^k \qquad \text{(Training Problem 1)}$$
$$\text{s.t.} \quad w^T \big( \psi(x^k, y^k) - \psi(x^k, y) \big) \ge L(y^k, y) - \xi^k \quad \text{for all } k = 1, \ldots, \ell \text{ and } y \in Y.$$
The goal is to find a weight vector $w$ and slack variables $\xi^k$ such that the objective function is minimized while satisfying the constraints. The learned weight vector $w$ parameterizes the discriminant function $f_w$, which in turn defines the classifier $h_w$. The $k$th constraint states that the value of the discriminant function on $(x^k, y^k)$ should exceed the value of the discriminant function on $(x^k, y)$ by at least $L(y^k, y)$, where $L$ is a loss function that penalizes misclassification, with $L(y, y) = 0$ and $L(y, y') \ge 0$ for all $y, y' \in Y$. We generally use a 0/1 loss function, but consider an alternative in Section 4.2.2 to improve ex post IR properties. Positive values for the slack variables $\xi^k$ allow the weight vector to violate some of the constraints.
The other term in the objective, the squared norm of $w$, penalizes scaling of $w$. This is necessary because scaling of $w$ can arbitrarily increase the margin between $f_w(x^k, y^k)$ and $f_w(x^k, y)$ and make the constraints easier to satisfy. Smaller values of $w$, on the other hand, increase the ability of the learned classifier to generalize by decreasing the propensity to over-fit to the training data. Parameter $C$ is therefore a regularization parameter: larger values of $C$ encourage small $\xi^k$ and larger $w$, such that more points are classified correctly, but with a smaller margin.

4.1.1. The Feature Map and the Kernel Trick.
Given a feature map ψ, the feature vector ψ(x, y) for x ∈ X and y ∈ Y provides an alternate representation of the input-output pair (x, y). It is useful to consider feature maps ψ for which ψ(x, y) = φ(χ(x, y)), where χ : X × Y → R s for some s ∈ N is an attribute map that combines x and y into a single attribute vector χ(x, y) compactly representing the pair, and φ : R s → R m for m > s maps the attribute vector to a higher-dimensional space in a non-linear way. In this way, SVMs can achieve non-linear classification in the original space.
While we work hard to keep s small, the so-called kernel trick means that we do not have the same problem with m: it turns out that in the dual of Training Problem 1, ψ(x, y) only appears in an inner product of the form ⟨ψ(x, y), ψ(x′, y′)⟩, or, for a decomposable feature map, ⟨φ(z), φ(z′)⟩ where z = χ(x, y) and z′ = χ(x′, y′). For computational tractability it therefore suffices that this inner product can be computed efficiently, and the "trick" is to choose φ such that ⟨φ(z), φ(z′)⟩ = K(z, z′) for a simple closed-form function K, known as the kernel.
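In this paper, we consider polynomial kernels $K_{\mathrm{poly}d}$, parameterized by $d \in \mathbb{N}^+$, and radial basis function (RBF) kernels $K_{\mathrm{RBF}}$, parameterized by $\gamma = 1/(2\sigma^2)$ for $\sigma \in \mathbb{R}^+$:
$$K_{\mathrm{poly}d}(z, z') = (z \cdot z')^d, \qquad K_{\mathrm{RBF}}(z, z') = \exp\big( -\gamma ( \|z\|^2 + \|z'\|^2 - 2\, z \cdot z' ) \big).$$
Both polynomial and RBF kernels use the standard inner product of their arguments, so their efficient computation requires that $\chi(x, y) \cdot \chi(x', y')$ can be computed efficiently.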
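As a small illustration of the kernel trick, the sketch below implements both kernels using only inner products of the attribute vectors. It assumes the homogeneous polynomial form $(z \cdot z')^d$ and the RBF form $\exp(-\gamma \|z - z'\|^2)$ expanded into inner products; the explicit degree-2 feature map `phi2` is a standard textbook construction for two-dimensional inputs and is included only to check that the kernel matches an inner product in the higher-dimensional space.

```python
import math

def dot(z, zp):
    """Standard inner product of two attribute vectors."""
    return sum(a * b for a, b in zip(z, zp))

def poly_kernel(z, zp, d):
    """Homogeneous polynomial kernel of degree d (assumed form)."""
    return dot(z, zp) ** d

def rbf_kernel(z, zp, gamma):
    """RBF kernel written purely in terms of inner products."""
    return math.exp(-gamma * (dot(z, z) + dot(zp, zp) - 2.0 * dot(z, zp)))

def phi2(z):
    """Explicit feature map whose inner product equals poly_kernel(., ., 2) in 2-D."""
    return (z[0] ** 2, math.sqrt(2.0) * z[0] * z[1], z[1] ** 2)

z, zp = (1.0, 2.0), (3.0, 4.0)
implicit = poly_kernel(z, zp, 2)      # computed in the 2-dimensional attribute space
explicit = dot(phi2(z), phi2(zp))     # computed in the 3-dimensional feature space
```

The two quantities agree up to floating-point error, while the kernel never materializes the higher-dimensional vectors; for the RBF kernel the corresponding feature space is infinite-dimensional, so only the implicit route is available.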

Dealing with an Exponentially Large Output Space.
Training Problem 1 has Ω(|Y|ℓ) constraints, where Y is the output space and ℓ the number of training instances, and enumerating all of them is computationally prohibitive when Y is large. Joachims et al. [2009] address this issue for structural SVMs through constraint generation: starting from an empty set of constraints, this technique iteratively adds a constraint that is maximally violated by the current solution until that violation is below a desired threshold ε. Joachims et al. show that this will happen after no more than O(C/ε) iterations, each of which requires O(ℓ) time and memory. However, this approach assumes the existence of an efficient separation oracle, which given a weight vector w and an input x finds an output y ∈ arg max y∈Y f w (x, y). The existence of such an oracle remains an open question in application to combinatorial auctions; see Section 5.1.3 for additional discussion.

Required Information.
In summary, the use of structural SVMs requires specification of the following:
(1) The input space X, the discrete output space Y, and examples of input-output pairs.
(2) An attribute map χ : X × Y → R s . This function generates an attribute vector that combines the input and output data into a single object.
(3) A kernel function K(z, z′), typically chosen from a well-known set of candidates, e.g., polynomial or RBF. The kernel implicitly calculates the inner product ⟨φ(z), φ(z′)⟩, e.g., between a mapping of the inputs into a high dimensional space.
(4) If the space Y is prohibitively large, a routine that allows for efficient separation, i.e., a function that computes arg max y∈Y f w (x, y) for a given w, x.
In addition, the user needs to stipulate particular training parameters, such as the regularization parameter C, and the kernel parameter γ if the RBF kernel is being used.
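To show how these ingredients fit together, the following sketch evaluates a structural SVM training objective for a fixed weight vector and an explicit (non-kernelized) feature map. It assumes the standard margin-rescaling objective $\frac{1}{2} w^T w + \frac{C}{\ell} \sum_k \xi^k$, with each slack set to the smallest value that satisfies the constraints of its example; the toy single-item feature map and all names are hypothetical, and no actual optimization is performed.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def zero_one_loss(y_true, y_pred):
    return 0.0 if y_true == y_pred else 1.0

def slack(w, psi, loss, x, y_true, outputs):
    """Smallest xi >= 0 with w.(psi(x,y_true) - psi(x,y)) >= loss(y_true,y) - xi for all y."""
    margin_gaps = (loss(y_true, y) - (dot(w, psi(x, y_true)) - dot(w, psi(x, y)))
                   for y in outputs)
    return max(0.0, max(margin_gaps))

def objective(w, psi, loss, examples, outputs, C):
    """0.5 * ||w||^2 + (C / l) * sum of slacks (assumed form of the training objective)."""
    total_slack = sum(slack(w, psi, loss, x, y, outputs) for x, y in examples)
    return 0.5 * dot(w, w) + C / len(examples) * total_slack

# Toy single-item instance: features are (own value if winning, highest rival value if winning).
def psi(theta, o1):
    return (theta[0] * o1, max(theta[1:]) * o1)

examples = [((1, 3, 5), 0), ((5, 4, 3), 1), ((2, 3, 4), 0)]
outputs = (0, 1)
obj_full_margin = objective((1.0, -1.0), psi, zero_one_loss, examples, outputs, C=3.0)
obj_small_w = objective((0.5, -0.5), psi, zero_one_loss, examples, outputs, C=3.0)
```

The first weight vector separates all three examples with margin at least one and pays only the regularization term; the second has a smaller norm but incurs slack on the second example, which illustrates the trade-off governed by C.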

Structural SVMs for Mechanism Design
We now specialize structural SVMs such that their learned discriminant function will manifest as a payment rule for a given symmetric outcome function $g$ and distribution $D$. In this application, the input domain $X$ is the space of type profiles $\Theta$, and the output domain $Y$ is the space $\Omega_1$ of outcomes for agent 1. Thus we construct training data by sampling $\theta \sim D$ and applying $g$ to these inputs: $\{(\theta^1, g_1(\theta^1)), \ldots, (\theta^\ell, g_1(\theta^\ell))\} = \{(\theta^1, o_1^1), \ldots, (\theta^\ell, o_1^\ell)\}$. For admissibility of the learned hypothesis $h_w(\theta) = \arg\max_{o_1 \in \Omega_1} w^T \psi(\theta, o_1)$, we require that $\psi(\theta, o_1) = (v_1(\theta_1, o_1), \psi'(\theta_{-1}, o_1))$. When learning payment rules, we therefore use an attribute map $\chi' : \Theta_{-1} \times \Omega_1 \to \mathbb{R}^s$ rather than $\chi : \Theta \times \Omega_1 \to \mathbb{R}^s$, and the kernel $\varphi'$ we specify will only be applied to the output of $\chi'$. This results in the following more specialized training problem:
$$\min_{w,\, \xi \ge 0} \ \frac{1}{2} w^T w + \frac{C}{\ell} \sum_{k=1}^{\ell} \xi^k \qquad \text{(Training Problem 2)}$$
$$\text{s.t.} \quad \big( w_1 v_1(\theta_1^k, o_1^k) + w_{-1}^T \psi'(\theta_{-1}^k, o_1^k) \big) - \big( w_1 v_1(\theta_1^k, o_1) + w_{-1}^T \psi'(\theta_{-1}^k, o_1) \big) \ge L(o_1^k, o_1) - \xi^k \quad \text{for all } k = 1, \ldots, \ell \text{ and } o_1 \in \Omega_1.$$
If $w_1 > 0$ then the weights $w$ together with the feature map $\psi'$ define a price function $t_w(\theta_{-1}, o_1) = -(1/w_1)\, w_{-1}^T \psi'(\theta_{-1}, o_1)$ that can be used to define payments $p_w(\theta)$, as described in Section 3.1. In this case, we can also relate the regret in the induced mechanism $(g, p_w)$ to the classification error as described in Section 3.3. Specifically, we can show that on any type profile $\theta^k$ of the training data, $\mathrm{rgt}_1(\theta^k) \le \xi^k / w_1$; see the full version of the paper for details.
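The construction of training data described above can be sketched as follows: sample type profiles from the distribution, apply the outcome rule, and keep the outcome for agent 1 as the label. The uniform type distribution, the toy outcome rule, and the function names are illustrative assumptions.

```python
import random

def sample_training_set(sample_profile, g1, num_examples, seed=0):
    """Draw type profiles and label each with the outcome g assigns to agent 1."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    data = []
    for _ in range(num_examples):
        theta = sample_profile(rng)
        data.append((theta, g1(theta)))
    return data

# Hypothetical instance: three agents with i.i.d. uniform values for a single item.
def uniform_profile(rng, num_agents=3):
    return tuple(rng.random() for _ in range(num_agents))

def highest_value_wins_1(theta):
    """Partial outcome rule g_1: 1 if agent 1 holds the highest value, else 0."""
    return 1 if theta[0] >= max(theta[1:]) else 0

training_data = sample_training_set(uniform_profile, highest_value_wins_1, 100)
```

Because only the label for agent 1 is recorded, a single classifier suffices under agent symmetry; payments for the other agents follow by permuting the roles of the agents.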
We choose not to enforce $w_1 > 0$ explicitly in Training Problem 2, because adding this constraint leads to a dual problem that references $\psi'$ outside of an inner product and thus makes computation of all but linear or low-dimensional polynomial kernels prohibitively expensive. Instead, in our experiments we simply discard hypotheses where the result of training is $w_1 \le 0$. This is sensible since the discriminant function value should increase as an agent's value increases, and negative values of $w_1$ typically mean that the training parameter $C$ or the kernel parameter $\gamma$ is poorly chosen. It turns out that $w_1$ is indeed positive most of the time, and for every experiment a majority of the choices of $C$ and $\gamma$ yield positive $w_1$ values. For this reason, we do not expect the requirement that $w_1 > 0$ to be a problem in practice.

4.2.1. Payment Normalization.
One issue with the framework as stated is that the payments $p_w$ computed from the solution to Training Problem 2 could be negative.
We solve this problem by normalizing payments, using a baseline outcome $o_b$: if there exists an outcome $o'$ such that $v_1(\theta_1, o') = 0$ for every $\theta_1$, this "null outcome" is used as the baseline; otherwise, we use the outcome with the lowest payment. Let $t_w(\theta_{-1}, o_1)$ be the price function corresponding to the solution $w$ to Training Problem 2. Adopting the baseline outcome, the normalized payments $t'_w(\theta_{-1}, o_1)$ are defined relative to the price of $o_b$, with negative values set to 0. Note that $o_b$ is only a function of $\theta_{-1}$, even when there is no null outcome, so $t'_w$ is still only a function of $\theta_{-1}$ and $o_1$.
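The normalization step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `normalized_prices` is hypothetical, and the specific form `max(0, t - t_baseline)` is our reading of "normalizing relative to a baseline, with negative payments set to 0", not a formula quoted from the text.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional

Outcome = Hashable


def normalized_prices(price: Callable[[Outcome], float],
                      outcomes: Iterable[Outcome],
                      null_outcome: Optional[Outcome] = None) -> Dict[Outcome, float]:
    """Normalize prices t_w(theta_{-1}, .) for one fixed theta_{-1}.

    `price` maps an outcome to its raw learned price. The baseline is the
    null outcome if one is given, otherwise the outcome with the lowest
    raw price. Assumed form: max(0, price - baseline price).
    """
    raw = {o: price(o) for o in outcomes}
    baseline = null_outcome if null_outcome is not None else min(raw, key=raw.get)
    base_price = raw[baseline]
    return {o: max(0.0, p - base_price) for o, p in raw.items()}
```

Because the baseline is chosen from the prices alone, the result depends only on $\theta_{-1}$ and the outcome, in line with the remark above.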

Individual Rationality Violation.
Even after normalization, the learned payment rule $p_w$ may not satisfy IR. We offer three solutions to this problem, which can be used in combination.
Payment offsets. One way to decrease the rate of IR violation is to add a payment offset, which decreases all payments (for all type reports) by a given amount. We apply this payment offset to all bundles other than $o_b$; as with payment normalization, the adjusted payment is set to 0 if it is negative. 8 Note that payment offsets decrease IR violation, but may increase regret.
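As a minimal sketch of the offset step, assuming prices are already normalized and stored per outcome, the following Python function subtracts a fixed offset from every outcome except the baseline and clips at zero. The name `apply_payment_offset` and the dictionary representation are illustrative choices, not part of the framework.

```python
from typing import Dict, Hashable


def apply_payment_offset(prices: Dict[Hashable, float],
                         baseline: Hashable,
                         offset: float) -> Dict[Hashable, float]:
    """Lower every price except the baseline's by `offset`, clipping at 0."""
    return {
        outcome: (p if outcome == baseline else max(0.0, p - offset))
        for outcome, p in prices.items()
    }
```

For example, with an offset of 0.29 a normalized price of 0.5 becomes 0.21, while a price of 0.1 is clipped to 0.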
Adjusting the loss function L. We incur an IR violation when there is a null outcome $o_{null}$ such that $g_1(\theta) \neq o_{null}$ and $f_w(\theta, o_{null}) > f_w(\theta, g_1(\theta))$ for some type $\theta$, assuming truthful reports. This happens because $f_w(\theta, o_1)$ is a scaled version of the agent's utility for outcome $o_1$ under payments $p_w$. If the utility for the null outcome is greater than the utility for $g_1(\theta)$, then the payment $t_w(\theta_{-1}, g_1(\theta))$ must be greater than $v_1(\theta_1, g_1(\theta))$, causing an IR violation. We can discourage these types of errors by modifying the constraints of Training Problem 2: when $o_1^k \neq o_{null}$ and $o_1 = o_{null}$, we can increase $L(o_1^k, o_1)$ to heavily penalize misclassifications of this type. With a larger $L(o_1^k, o_1)$, a larger $\xi^k$ will be required if $f_w(\theta, o_1^k) < f_w(\theta, o_{null})$. As with payment offsets, this technique will decrease IR violations but is not guaranteed to eliminate all of them. In our experimental results, we refer to this as the null loss fix, and the null loss refers to the value we choose for $L(o_1^k, o_{null})$ where $o_1^k \neq o_{null}$.
Deallocation. In settings that have a null outcome and are downward closed (i.e., settings where a feasible outcome $o$ remains feasible if $o_i$ is replaced with the null outcome), we modify the function $g$ to allocate the null outcome whenever the price function $t_w$ creates an IR violation. This reduces ex post regret and in particular ensures ex post IR. On the other hand, the total value to the agents necessarily decreases under the modified allocation. In our experimental results, we refer to this as the deallocation fix.
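The null loss fix and the deallocation fix can be illustrated with two small Python functions. This is a sketch under assumptions: the base loss is taken to be a 0/1 loss (the text does not fix the base loss here), and the names `loss` and `deallocate_if_ir_violated` are hypothetical.

```python
from typing import Hashable


def loss(o_true: Hashable, o_pred: Hashable,
         null_outcome: Hashable, null_loss: float = 1.0) -> float:
    """Assumed 0/1 loss, except that predicting the null outcome for an
    agent who was actually allocated something costs `null_loss`."""
    if o_pred == o_true:
        return 0.0
    if o_true != null_outcome and o_pred == null_outcome:
        return null_loss
    return 1.0


def deallocate_if_ir_violated(value: float, price: float,
                              allocated: Hashable,
                              null_outcome: Hashable) -> Hashable:
    """Deallocation fix: fall back to the null outcome whenever the price
    of the allocated outcome exceeds the agent's value for it."""
    return null_outcome if price > value else allocated
```

Raising `null_loss` above 1 makes the margin constraints for "allocated but classified as null" examples harder to violate cheaply, which is the effect described above; the deallocation function presupposes a downward-closed setting so that the modified outcome stays feasible.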

APPLYING THE FRAMEWORK
In this section, we discuss the application of our framework to two domains: multi-minded combinatorial auctions and egalitarian welfare in the assignment problem.

Multi-Minded Combinatorial Auctions
A combinatorial auction allocates items $\{1, \ldots, r\}$ among $n$ agents, such that each agent receives a possibly empty subset of the items. The outcome space $\Omega_i$ for agent $i$ thus is the set of all subsets of the $r$ items, and the type of agent $i$ can be represented by a vector $\theta_i \in \Theta_i = \mathbb{R}^{2^r}$ that specifies its value for each possible bundle. The set of possible type profiles is then $\Theta = \mathbb{R}^{2^r n}$, and the value $v_i(\theta_i, o_i)$ of agent $i$ for bundle $o_i$ is equal to the entry in $\theta_i$ corresponding to $o_i$. We require that valuations are monotone, such that $v_i(\theta_i, o_i) \geq v_i(\theta_i, o_i')$ for all $o_i' \subseteq o_i$, and normalized such that $v_i(\theta_i, \emptyset) = 0$. Assuming agent symmetry and adopting the view of agent 1, the partial outcome rule $g_1 : \Theta \to \Omega_1$ specifies the bundle $g_1(\theta)$ allocated to agent 1; we require feasibility, so that no item is allocated more than once.
In a multi-minded CA, each agent is interested in at most b bundles for some constant b. The special case where b = 1 is called a single-minded CA. In our framework, the restriction to multi-minded CAs leads to a number of computational advantages. First, valuation profiles and thus the training data can be represented in a compact way, by explicitly writing down the valuations for the constant number of bundles each agent is interested in. Second, inner products between valuation profiles, which are required to apply the kernel trick, can be computed in constant time.
5.1.1. Attribute Maps. To apply structural SVMs to multi-minded CAs, we need to specify an appropriate attribute map $\chi$. In our experiments we use two attribute maps, $\chi_1 : \Theta_{-1} \times \Omega_1 \to \mathbb{R}^{2^r (2^r (n-1))}$ and $\chi_2 : \Theta_{-1} \times \Omega_1 \to \mathbb{R}^{2^r (n-1)}$. Their definitions use a decimal index of bundle $o_1$, $dec(o_1) = \sum_{j=1}^{r} 2^{j-1} I_{j \in o_1}$, where $I_{j \in o_1} = 1$ if $j \in o_1$ and $I_{j \in o_1} = 0$ otherwise. Attribute map $\chi_1$ stacks the vector $\theta_{-1}$, which represents the valuations of all agents except agent 1, with zero vectors of the same dimension, where the position of $\theta_{-1}$ is determined by the index of bundle $o_1$. The resulting attribute vector is simple but potentially restrictive. It precludes two instances with different allocated bundles from sharing attributes, which is an obstacle to generalization of the discriminant function across bundles. Attribute map $\chi_2$ stacks vectors $\theta_i \setminus o_1$, which are obtained from $\theta_i$ by setting the entries for all bundles that intersect with $o_1$ to 0. This captures the fact that agent $i$ cannot be allocated any of the bundles that intersect with $o_1$ if $o_1$ is allocated to agent 1. 9
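The verbal description of the two attribute maps can be made concrete with a small Python sketch. Several details are assumptions made for illustration: each $\theta_i$ is stored as a full list of length $2^r$ indexed by $dec(\cdot)$, the block holding $\theta_{-1}$ in $\chi_1$ is placed at block offset $dec(o_1)$, and the function names `dec`, `chi1`, and `chi2` are ours.

```python
from typing import Iterable, List, Sequence


def dec(bundle: Iterable[int]) -> int:
    """Decimal index of a bundle of items numbered 1..r."""
    return sum(2 ** (j - 1) for j in set(bundle))


def chi1(theta_others: Sequence[Sequence[float]],
         bundle: Iterable[int], r: int) -> List[float]:
    """Place the stacked valuations of agents 2..n in the block selected by
    dec(bundle); all other blocks are zero (assumed block layout)."""
    flat = [v for theta_i in theta_others for v in theta_i]
    out = [0.0] * (2 ** r * len(flat))
    start = dec(bundle) * len(flat)
    out[start:start + len(flat)] = flat
    return out


def chi2(theta_others: Sequence[Sequence[float]],
         bundle: Iterable[int]) -> List[float]:
    """Stack the valuations of agents 2..n, zeroing every entry whose
    bundle shares an item with agent 1's bundle."""
    mask = dec(bundle)  # bit j-1 is set iff item j is in agent 1's bundle
    return [0.0 if (b & mask) else v
            for theta_i in theta_others
            for b, v in enumerate(theta_i)]
```

With $r = 2$ and a single other agent, `chi1` returns a vector of length 16 and `chi2` a vector of length 4, matching the dimensions $2^r (2^r (n-1))$ and $2^r (n-1)$ stated above.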

Efficient Computation of Inner Products.
Inner products can be computed efficiently for both $\chi_1$ and $\chi_2$. A detailed discussion can be found in the full version of the paper.

Dealing with an Exponentially Large Output Space.
Recall that Training Problems 1 and 2 have constraints for every training example $(\theta^k, o_1^k)$ and every possible bundle of items $o_1 \in \Omega_1$, of which there are exponentially many in the number of items in the case of CAs. In lieu of an efficient separation oracle, a workaround exists when the discriminant function has additional structure, such that the induced payment weakly increases as items are added to a bundle. Given this item monotonicity, it would suffice to include constraints for bundles that have a strictly larger value to the agent than any of their respective subsets. We further discuss this issue in the full version of the paper. 10
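To illustrate the constraint reduction described above, the following Python sketch lists the bundles whose value strictly exceeds the value of every proper subset. It assumes, purely for illustration, that a valuation is stored as a full list of length $2^r$ indexed by the bitmask of the bundle, and the helper names are hypothetical. The enumeration is still exponential in $r$, so it shows the filtering rule rather than an efficient procedure.

```python
from typing import Iterator, List, Sequence


def proper_submasks(b: int) -> Iterator[int]:
    """Yield every proper subset of bitmask b (nothing for the empty set)."""
    s = (b - 1) & b
    while b:
        yield s
        if s == 0:
            return
        s = (s - 1) & b


def relevant_bundles(theta: Sequence[float]) -> List[int]:
    """Bundles (as bitmasks) valued strictly above all of their proper
    subsets; the empty bundle qualifies vacuously."""
    return [b for b in range(len(theta))
            if all(theta[s] < theta[b] for s in proper_submasks(b))]
```

For a monotone single-minded valuation on two items that values item 1 at 1, stored as `[0, 1, 0, 1]`, only the empty bundle and the bundle `{1}` remain.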

The Assignment Problem
In the assignment problem, we are given a set of $n$ agents and a set $\{1, \ldots, n\}$ of items, and wish to assign each item to exactly one agent. The outcome space of agent $i$ is thus $\Omega_i = \{1, \ldots, n\}$, and its type can be represented by a vector $\theta_i \in \Theta_i = \mathbb{R}^n$. The set of possible type profiles is then $\Theta = \mathbb{R}^{n^2}$. We consider an outcome rule that maximizes egalitarian welfare in a lexicographic manner: first, the minimum value of any agent is maximized; if more than one outcome achieves this maximal minimum, the second lowest value is maximized, and so forth. This outcome rule can be computed by solving a sequence of integer programs. As before, we assume agent symmetry and adopt the view of agent 1.
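For small $n$, the lexicographic egalitarian rule can be illustrated by brute force. The sketch below deliberately swaps the sequence of integer programs mentioned above for an enumeration of all $n!$ assignments, so it is only suitable for toy instances; the function name and the 0-indexed item convention are assumptions.

```python
from itertools import permutations
from typing import Sequence, Tuple


def egalitarian_assignment(values: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """values[i][j] is agent i's value for item j (0-indexed).

    Returns a tuple a with a[i] the item assigned to agent i, chosen so
    that the ascending-sorted vector of agent values is lexicographically
    largest (maximize the minimum, then the second lowest, and so on).
    """
    n = len(values)

    def sorted_profile(assignment: Tuple[int, ...]) -> list:
        return sorted(values[i][assignment[i]] for i in range(n))

    return max(permutations(range(n)), key=sorted_profile)
```

Comparing sorted value vectors as Python lists gives exactly the lexicographic order on (minimum, second lowest, ...), which is what the rule requires; ties between assignments are broken by enumeration order.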
To complete our specification of the structural SVM framework for this problem, we need to define an attribute map $\chi_3 : \mathbb{R}^{n^2 - n} \times \mathbb{N} \to \mathbb{R}^s$, where the first argument is the type profile of all agents but agent 1, the second argument is the item assigned to agent 1, and $s$ is a dimension of our choosing. A natural choice for $\chi_3$ is to set $\chi_3(\theta_{-1}, j) = (\theta_2[-j], \theta_3[-j], \ldots, \theta_n[-j])$, where $\theta_i[-j]$ denotes the vector obtained from $\theta_i$ by removing the $j$th entry. The attribute map thus reflects the agents' values for all items except item $j$, capturing the fact that the item assigned to agent 1 cannot be assigned to any other agent. Since the outcome space is very small, we choose not to use a non-linear kernel on top of this attribute vector.
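A minimal Python sketch of such an attribute map, assuming each $\theta_i$ is a list of $n$ item values and items are 0-indexed (the function name `chi3` is ours):

```python
from typing import List, Sequence


def chi3(theta_others: Sequence[Sequence[float]], item: int) -> List[float]:
    """Concatenate the valuations of agents 2..n with the entry for the
    item assigned to agent 1 removed from each."""
    return [v
            for theta_i in theta_others
            for k, v in enumerate(theta_i)
            if k != item]
```

For $n = 3$ the result has $(n-1)^2 = 4$ entries, for example `chi3([[1, 2, 3], [4, 5, 6]], 1)` gives `[1, 3, 4, 6]`.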

EXPERIMENTAL EVALUATION
We perform a series of experiments to test our theoretical framework. To run our experiments, we use the SVM$^{struct}$ package [Joachims et al. 2009], which allows for the use of custom kernel functions, attribute maps, and separation oracles.

Setup
We begin by briefly discussing our experimental methodology, performance metrics, and optimizations used to speed up the experiments.
6.1.1. Methodology. For each of the settings we consider, we generate three data sets: a training set, a validation set, and a test set. The training set is used as input to Training Problem 2, which in turn yields classifiers h w and corresponding payment rules p w . For each choice of the parameter C of Training Problem 2, and the parameter γ if the RBF kernel is used, a classifier h w is learned based on the training set and evaluated based on the validation set. The classifier with the highest accuracy on the validation set is then chosen and evaluated on the test set. During training, we take the perspective of agent 1, so a training set size of ℓ means that we train an SVM on ℓ examples. Once a partial outcome rule has been learned, however, it can be used to infer payments for all agents. We exploit this fact during testing, and report performance metrics across all agents for a given instance in the test set.
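The selection procedure can be sketched as a small grid search. The sketch is schematic: `train` and `validate` are hypothetical callables standing in for the actual training and evaluation code (they are not functions of any particular package), and the step that skips hypotheses with $w_1 \leq 0$ follows the discard rule used in our experiments.

```python
from itertools import product
from typing import Any, Callable, Iterable, Optional, Tuple


def select_hypothesis(train: Callable[[float, float], Tuple[float, Any]],
                      validate: Callable[[Any], float],
                      Cs: Iterable[float],
                      gammas: Iterable[float]) -> Optional[Tuple[float, float, float, Any]]:
    """Grid search over (C, gamma).

    train(C, gamma) returns (w1, model); validate(model) returns the
    accuracy on the validation set. Hypotheses with w1 <= 0 are discarded.
    Returns (accuracy, C, gamma, model) for the best remaining hypothesis,
    or None if every hypothesis was discarded.
    """
    best = None
    for C, gamma in product(list(Cs), list(gammas)):
        w1, model = train(C, gamma)
        if w1 <= 0:
            continue  # no valid price function can be derived
        accuracy = validate(model)
        if best is None or accuracy > best[0]:
            best = (accuracy, C, gamma, model)
    return best
```

The selected model would then be evaluated once on the held-out test set.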
6.1.2. Metrics. We employ three metrics to measure the performance of the learned classifiers. These metrics are computed over the test set $\{(\theta^k, o^k)\}_{k=1}^{\ell}$.
Classification accuracy measures the accuracy of the trained classifier in predicting the outcome. Each of the $\ell$ instances has $n$ agents, so in total we measure accuracy over $n\ell$ instances. 11
Ex post regret sums over the ex post regret experienced by all agents in each of the $\ell$ instances in the test set.
Individual rationality violation measures the fraction of individual rationality violations across all agents.
6.1.3. Optimizations. In the case of multi-minded CAs we map the inputs $\theta_{-1}$ into a smaller space, which allows us to learn more effectively with smaller amounts of data. 12 We use instance-based normalization, which normalizes the values in $\theta_{-1}$ by the highest observed value and then rescales the computed payment appropriately, and sorting, which orders agents based on bid values. These techniques are explained in more detail in the full version of the paper.
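One way to compute the three metrics from per-agent test records is sketched below. The record layout and the function name are assumptions made for illustration, as are three conventions: accuracy is reported as a percentage, regret is averaged over all $n\ell$ (instance, agent) pairs, and an IR violation is counted whenever the payment exceeds the agent's value for the allocated outcome.

```python
from typing import Dict, Sequence, Tuple


def evaluate(records: Sequence[Dict]) -> Tuple[float, float, float]:
    """Each record describes one (instance, agent) pair with the keys
    'predicted', 'actual', 'regret', 'value', and 'payment'.

    Returns (accuracy in percent, average ex post regret,
    fraction of IR violations).
    """
    total = len(records)
    accuracy = 100.0 * sum(r["predicted"] == r["actual"] for r in records) / total
    regret = sum(r["regret"] for r in records) / total
    ir_violation = sum(r["payment"] > r["value"] for r in records) / total
    return accuracy, regret, ir_violation
```

The regret values themselves would have to be computed separately, for example by searching over an agent's possible misreports.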

Single-Item Auction
As a sanity check, we perform experiments on the single-item auction with the optimal outcome rule, where the agent with the highest bid receives the item. We obtain excellent accuracy and very close approximation to the second-price payment rule. The framework is able to automatically learn the payment rule of Vickrey's auction. The complete results are deferred to the full version of the paper.

Multi-Minded CAs
We give a high-level overview of the type distribution and the two outcome rules used in the experiments; details can again be found in the full version of the paper.
The type distribution is inspired by Sandholm's decay distribution for single-minded CAs [Sandholm 2002], and is parameterized by two variables $\beta$ and $\zeta$: $\beta$ controls the level of correlation between values of different agents, and $\zeta$ controls the degree of complementarity between items.
The first outcome rule is the optimal rule $g_{opt}$, which selects a feasible allocation with maximum total value. It is well known that this outcome rule can be turned into a strategyproof mechanism $(g_{opt}, p_{vcg})$ by means of the Vickrey-Clarke-Groves payment rule $p_{vcg}$. The second outcome rule we experiment with is a generalization of the greedy allocation rule for single-minded CAs [Lehmann et al. 2002], which attempts to find an allocation with good welfare by greedily allocating bundles to agents based on a heuristic score. This rule can be made strategyproof in the special case of single-minded CAs, but not in the general multi-minded case.
We experiment with training sets of sizes 100, 300, and 500, and validation and test sets of size 1000. All experiments we report on are for a setting with 5 agents, 5 items, and 3 bundles per agent, and use $\beta = 0.5$, the RBF kernel, and parameters $C \in \{10^4, 10^5\}$ and $\gamma \in \{0.01, 0.1, 1\}$. Additional experimental results can be found in the full version of the paper.
Tables I and II present the basic results for multi-minded CAs with optimal and greedy outcome rules, respectively. For the greedy outcome rule we also present results for $p_{vcg}$ as a baseline. 13 These results are not shown for the optimal outcome rule, where $p_{vcg}$ has accuracy 100, regret 0, and IR violation 0.
As expected, accuracy across all instances is negatively correlated with regret and ex post IR violation. Both the outcome rule and the degree of complementarity between items, controlled by $\zeta$, have a major effect on the results. Regret is higher for $\zeta = 0.5$, i.e., when complementarity between items is low, and $\chi_2$ performs better for the optimal outcome rule while $\chi_1$ performs better for the greedy outcome rule. For high complementarity, the greedy outcome rule tends to allocate all items to a single agent, and the learned payment rule sets high prices for small bundles to capture this property. For low complementarity, the allocation tends to be split and less predictable. Still, the best classifiers achieve average ex post regret of less than 0.032, for values normalized to [0, 1], even though the corresponding prediction accuracy can be as low as 63%. For the greedy outcome rule, the performance of the learned payment rule is comparable to that of $p_{vcg}$ when $\zeta \in \{1.0, 1.5\}$, and superior in the case $\zeta = 0.5$, where the difference between the greedy outcome rule and the optimal one is bigger.
6.3.2. Effect of Training Set Size. As expected, increasing the training set size from 100 to 300 to 500 leads to better results with higher accuracy and lower regret. Detailed results can be found in the full version of the paper.
Table III. Impact of payment offset and null loss fix for $\zeta = 0.5$ and greedy outcome rule, training set size 300. All results are for $\chi_2$; null loss values appear in the second row.
Table III summarizes our results regarding the various fixes to IR violations, for the particularly challenging case of the greedy outcome rule and $\zeta = 0.5$. The extent of IR violation decreases with larger payment offset and null loss. Regret tends to move in the opposite direction, but there are cases where IR violation and regret both decrease. The three rightmost columns of Table III list the average ratio between welfare after and before the deallocation fix, across the instances in the test set. With a payment offset of 0, a large welfare hit is incurred if we deallocate agents with IR violations. However, this penalty decreases with increasing payment offsets and increasing null loss. At the most extreme payment offset and null loss adjustment, the IR violation is as low as 2%, and the deallocation fix incurs a welfare loss of only 7%.
Figure 1 shows a graphical representation of the impact of payment offsets and null losses. Each line in the plot corresponds to a payment rule learned with a different null loss, and each point on a line corresponds to a different payment offset. The payment offset is zero for the top-most point on each line, and equal to 0.29 for the lowest point on each line. Increasing the payment offset always decreases the rate of IR violation, but may decrease or increase regret. Increasing null loss lowers the top-most point on a given line, but arbitrarily increasing null loss can be harmful. Indeed, a null loss of 1.5 results in a slightly higher top-most point but significantly lower regret at this top-most point compared to a null loss of 2.0.

The Assignment Problem
In the assignment problem, agents' values for the items are sampled uniformly and independently from [0, 1]. We use a training set of size 600, validation and test sets of size 1000, and the RBF kernel with parameters C ∈ {10, 1000, 100000} and γ ∈ {0.1, 0.5, 1.0}.
The performance of the learned payment rules is compared to that of three VCG-based payment rules. Let $W$ be the total welfare of all agents other than $i$ under the outcome chosen by $g$, and $W_{eg}$ be the minimum value any agent other than $i$ receives under this outcome. We then consider the following payment rules: (1) the vcg payment rule, where agent $i$ pays the difference between the maximum total welfare of the other agents under any allocation and $W$; (2) the tot-vcg payment rule, where agent $i$ pays the difference between the total welfare of the other agents under the allocation maximizing egalitarian welfare and $W$; and (3) the eg-vcg payment rule, where agent $i$ pays the difference between the minimum value of any agent under the allocation maximizing egalitarian welfare and $W_{eg}$.
The results for attribute map $\chi_3$ are shown in Table IV. We see that the learned payment rule $p_w$ yields significantly lower regret than any of the VCG-based payment rules, and average ex post regret less than 0.074 for values normalized to [0, 1]. Since we are not maximizing the sum of values of the agents, it is not very surprising that VCG-based payment rules perform rather poorly. The learned payment rule $p_w$ can adjust to the outcome rule, and also achieves a low fraction of ex post IR violation of at most 3%.

CONCLUSIONS
We have introduced a new paradigm for computational mechanism design in which statistical machine learning is adopted to design payment rules for given algorithmically specified outcome rules, and have shown encouraging experimental results. Future directions of interest include (1) an alternative formulation of the problem as a regression rather than classification problem, (2) constraints on properties of the learned payment rule, concerning for example the core or budgets, (3) methods that learn classifiers more likely to induce feasible outcome rules, so that these learned outcome rules can be used, (4) optimistically assuming item monotonicity and dropping constraints implied by it, thereby allowing for better scaling of training time with training set size at the expense of optimizing against a subset of the full constraints in the training problem, and (5) an investigation of the extent to which alternative goals such as regret percentiles or interim regret can be achieved through machine learning.