Payment Rules through Discriminant-Based Classifiers

In mechanism design it is typical to impose incentive compatibility and then derive an optimal mechanism subject to this constraint. By replacing the incentive compatibility requirement with the goal of minimizing expected ex post regret, we are able to adapt statistical machine learning techniques to the design of payment rules. This computational approach to mechanism design is applicable to domains with multi-dimensional types and situations where computational efficiency is a concern. Specifically, given an outcome rule and access to a type distribution, we train a support vector machine with a specific structure imposed on the discriminant function, such that it implicitly learns a corresponding payment rule with desirable incentive properties. We extend the framework to adopt succinct k-wise dependent valuations, leveraging a connection with maximum a posteriori assignment on Markov networks to enable training to scale up to settings with a large number of items; we evaluate this construction in the case where k=2. We present applications to multiparameter combinatorial auctions with approximate winner determination, and the assignment problem with an egalitarian outcome rule. Experimental results demonstrate that the construction produces payment rules with low ex post regret, and that penalizing classification error is effective in preventing failures of ex post individual rationality.


INTRODUCTION
Mechanism design studies situations where a set of agents each hold private information regarding their preferences over different outcomes. A mechanism receives claims about agent preferences, selects and enforces an outcome, and optionally collects payments. The classical approach is to impose incentive compatibility on the design, ensuring that agents truthfully report their preferences in equilibrium. Subject to this incentive constraint, the goal is to identify a mechanism, that is, a way of choosing an outcome and payments based on agents' reports, that optimizes a given design objective such as welfare or revenue.
There are, however, significant challenges associated with this classical approach. First, it can be analytically cumbersome to derive optimal mechanisms for domains that are multidimensional, in the sense that each agent's private information is described through more than a single number, and few results are known in this case. An example of a multidimensional domain is a combinatorial auction, where an agent's preferences are described by a value for each of several different bundles of items. Second, incentive compatibility can be costly, in that adopting it as a hard constraint can preclude mechanisms with other desirable properties. For example, imposing the strongest form of incentive compatibility, truthfulness in a dominant strategy equilibrium or strategyproofness, necessarily leads to poor revenue, vulnerability to collusion, and vulnerability to false-name bidding in combinatorial auctions where valuations exhibit complementarities among items [Ausubel and Milgrom 2006; Rastegari et al. 2011; Yokoo et al. 2004].¹ A third difficulty occurs when the optimal mechanism has an outcome or payment rule that is computationally intractable. In this case, the challenge is to simultaneously handle both the incentive constraints and the requirements of worst-case tractability in the design.

Our Approach
In the face of these difficulties, we adopt statistical machine learning to automatically infer mechanisms with good incentive properties. Rather than imposing incentive compatibility as a hard constraint, we start from a given outcome rule, typically expressed as an algorithm, and then use machine learning techniques to identify a payment rule that minimizes agents' expected ex post regret. The ex post regret of an agent for truthful reporting in a given instance, or just regret where it causes no confusion, is the maximum amount by which its utility could increase through a misreport, while holding constant the reports of others. The expected ex post regret is the average ex post regret over all agents and all preference types, calculated with respect to a distribution on types. Our approach is applicable to domains that are multidimensional, and to domains for which the computational efficiency of outcome rules is a concern. The methodology seeks a payment rule that obtains the best possible ex post incentive properties, and treats all other aspects related to payments, such as revenue, as secondary. To achieve other payment desiderata, a designer would instead need to experiment with modified outcome rules.
In place of ex post incentives, an alternative design stance would adopt the goal of minimizing interim regret. The interim regret for an agent with a particular type is the maximum amount by which the agent's expected utility, given the conditional distribution on types of others, could increase through a misreport. The expected interim regret of a mechanism would average interim regret over all possible types. Whereas a mechanism with zero expected ex post regret is strategyproof, a mechanism with zero expected interim regret is Bayes-Nash incentive compatible, both with the exception of a set of types with measure zero. In this sense, our design stance, being about ex post incentives, is more in the spirit of strategyproofness than Bayes-Nash incentive compatibility.
1 By the revelation principle, this weakness should be ascribed to insisting on mechanisms that are analyzed in equilibrium (dominant-strategy or otherwise), and not to the imposition of incentive constraints per se.
Our approach seeks approximate incentive compatibility. We do not study the incentive compatible analogues of the mechanisms that are designed through discriminant-based payment rules, nor do we study the equilibrium properties of these mechanisms. Still, it is important to emphasize that the interesting application of our approach is to settings in which no payment rule can provide strategyproofness for the given outcome rule. We thus depart in a significant way from the typical approach to mechanism design, which assumes equilibrium behavior on the part of agents: we will not achieve an expected ex post regret of zero, will therefore not obtain strategyproof designs, and do not provide an equilibrium analysis. This noted, we can make two observations that start from ex post incentive properties but also yield interim incentive properties in the usual way.
First, our formulation ensures that an agent's payment, conditioned on an outcome, is independent of its report. Because of this, the only way an agent can improve its utility is by changing its report in a way that changes the outcome. Generically, this provides mechanisms where there is no gain in utility from infinitesimal changes in reports around an agent's true report, that is, no marginal benefit from misreports; this holds ex post, and thus also ex interim for an agent with knowledge only of its own type. This local stability property occurs in practice in the generalized second-price (GSP) auction used for sponsored search, and has also been emphasized by Erdil and Klemperer [2010] in the context of combinatorial auction design.
Second, if the expected ex post regret is bounded from above by some constant $\varepsilon_1 > 0$, then for a setting with finite types this immediately implies a bound of the following form: the expected ex post regret for an agent with a type sampled from the type distribution is at most $\varepsilon_2 > 0$ with probability at least $1 - \delta$, for some $\delta > 0$, where $\varepsilon_2 \delta \le \varepsilon_1$. Since the interim regret is bounded from above by the expected ex post regret of an agent with the same type, we additionally have a bound of the following form: the interim regret for an agent with a type sampled from the type distribution is at most $\varepsilon_2 > 0$ with probability at least $1 - \delta$, for some $\delta > 0$. The effect of this is that an agent for whom strategic behavior is costly, and with interim beliefs about others' types, would be truthful when the cost is greater than the interim regret; in particular, if the cost is greater than $\varepsilon_2$, then truthful behavior would be optimal for the agent with probability at least $1 - \delta$.
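The step from a bound on expected ex post regret to a high-probability bound is an application of Markov's inequality. As a sketch, write $\bar r(\theta_i)$ (notation introduced here only for illustration) for the expected ex post regret of an agent with type $\theta_i$, averaged over the types of the other agents. Since regret is nonnegative and $\mathbb{E}_{\theta_i}[\bar r(\theta_i)]$ equals the overall expected ex post regret:

```latex
\Pr_{\theta_i}\big[\bar r(\theta_i) > \varepsilon_2\big]
  \;\le\; \frac{\mathbb{E}_{\theta_i}[\bar r(\theta_i)]}{\varepsilon_2}
  \;\le\; \frac{\varepsilon_1}{\varepsilon_2}
  \;=:\; \delta .
```

One may therefore take $\delta = \varepsilon_1 / \varepsilon_2$, which satisfies $\varepsilon_2 \delta \le \varepsilon_1$. The bound is informative only when $\varepsilon_2 > \varepsilon_1$.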
Returning now to the main theme, the approach that we take to the design of payment rules is to recognize that the payment rule of a strategyproof mechanism can be thought of as a classifier for predicting the outcome. In particular, the payment rule implies a price to each agent for each outcome, and the selected outcome must simultaneously maximize the reported value minus price for every agent. By limiting ourselves to discriminant-based classifiers, which use a discriminant function to score different outcomes and predict the outcome with the highest score, and in particular to discriminant functions with "value-minus-price" structure where the price can be an arbitrary function of the outcome and the reports of other agents, we obtain a remarkably direct connection between multiclass classification and mechanism design.
For an appropriate loss function, the discriminant function of a classifier that minimizes generalization error over a hypothesis class has a corresponding payment rule that minimizes expected ex post regret among all payment rules corresponding to classifiers in this class. Conveniently, an appropriate method exists for multiclass classification with large outcome spaces that supports the specific structure of the discriminant function, namely the method of structural support vector machines [Joachims et al. 2009;Tsochantaridis et al. 2005]. While use of this method restricts us to learning discriminant functions that are linear in feature vectors depending on agents' reported types, the restriction is not severe: the feature vectors can be nonlinear functions of the reported types, and as with standard support vector machines it is possible to adopt nonlinear kernels. This ultimately enables discriminant functions and thus price functions that depend in a nonlinear way on the outcome and the reported types of agents.
The computational cost associated with our approach occurs offline during training, when a payment rule is learned for a given outcome rule. The learned payment rules have a succinct representation, through the standard support vector machine approach, and are fast to evaluate at run-time in the context of a deployed mechanism. A challenge in the context of structural support vector machines is to handle the large number of possible outcomes, or labels of the classification problem, during training. One way to address this in our setting is to work with valuation functions for which training can be formulated as a succinct convex optimization problem. In particular, we adopt k-wise dependent valuations [Conitzer et al. 2005] and leverage a connection with maximum a posteriori assignment on Markov networks to scale up our framework in application to combinatorial auctions.

Evaluation
In illustrating the framework, we focus on three situations where strategyproof payment rules are not available: (i) multi-minded combinatorial auctions, in which each agent is interested in a constant number of bundles, and where winner determination is provided through a greedy allocation rule, (ii) an assignment problem with multiple distinct items, agents with unit-demand valuations, and an egalitarian outcome rule, that is, an outcome rule that maximizes the minimum value of any agent, and (iii) combinatorial auctions with k-wise dependent valuations, in which each agent's valuation has a graphical representation, and where winner determination is provided through a greedy allocation rule.
The egalitarian rule, also referred to as max-min fairness, has been used by others to illustrate the challenge of truthful mechanism design, for instance by Lavi et al. [2003], who motivate it in the context of Rawls's theory of justice. Although one might also wish to define fairness with regard to utility, that is, including payments, we follow others in adopting the egalitarian rule as a canonical example of a nonimplementable outcome rule.
Our experimental results demonstrate low expected regret even when the 0/1 classification accuracy is only moderately good, and better regret properties than those obtained through the simple Vickrey-Clarke-Groves (VCG) based payment rules that we adopt as a baseline. In addition, we give special consideration to the failure of ex post individual rationality (IR), and introduce methods to bias the classifier to avoid these kinds of errors and also post hoc methods to adjust trained payments, or even allocations, to reduce or eliminate them.
For setting (i), we find that our learned rules perform similarly to VCG-based rules. In setting (ii), our learned rules perform significantly better than VCG-based rules, which is understandable given that the egalitarian objective is quite different from the welfare maximization objectives for which the VCG idea is designed. In setting (iii), our learned rules provide better regret properties than VCG-based rules for large numbers of items, and allow us to trade off IR violation and regret more effectively than VCG-based rules. We are able to scale to instances with tens of items in setting (iii), as our training problem is polynomial in the number of items even though we are running a combinatorial auction.

Related Work
Conitzer and Sandholm [2002] introduced the agenda of automated mechanism design (AMD) and formulated mechanism design as the search for an allocation rule and a payment rule among a class of rules satisfying incentive constraints. While the basic idea of optimal design is familiar from the seminal work of Myerson [1981], a novel aspect of AMD is its formulation as a search problem over the space of all possible mappings from discrete type profiles to outcomes and payments. AMD is intractable when an explicit representation of the outcome and payment rules is used, because the type space is exponentially large in the number of agents.
One way to make AMD more tractable is to search through a parameterized space of incentive-compatible mechanisms [Guo and Conitzer 2010]. More recently, advances in AMD have been made by considering domains with additive valuations and symmetry among agents, and by adopting Bayes-Nash incentive compatibility (BIC) rather than strategyproofness [Cai et al. 2012]. Still, these approaches seem limited to domains in which the outcome rule can be succinctly represented, which likely is not the case for the kinds of combinatorial auction problems we consider. Lavi and Swamy [2005] describe a method that takes any approximation algorithm for a set packing problem with a matching integrality gap and turns it into a mechanism with the same approximation guarantee that is strategyproof in expectation. Set packing includes combinatorial auctions as a special case. Bei and Huang [2011] and Hartline et al. [2011] describe an approach for turning an allocation rule into a mechanism that yields essentially the same expected amount of social welfare or social surplus and satisfies BIC. The approach computes an allocation and prices based on types sampled from probability distributions derived from the revealed types, and is applicable to both single-parameter and multiparameter domains.
The target of minimizing expected ex post regret and the imposition of agent-independent prices make the incentive properties of mechanisms designed through our approach incomparable to BIC. On one hand, we are interested in minimizing statistics of ex post regret, and in this respect target a stronger notion than BIC, which is an interim property. On the other hand, we do not guarantee zero expected regret (which would imply strategyproofness, and therefore BIC). Another distinction is that our approach can accommodate objectives that are nonseparable across agents, such as in the egalitarian assignment problem.
In addition, in determining the outcome and payments for a given instance, the approach of Bei and Huang and Hartline et al. evaluates the outcome rule on a number of randomly perturbed replicas of that instance that is polynomial in the number of agents, the desired approximation ratio, and a notion capturing the complexity of the type spaces. When type spaces are large, as in the case of combinatorial auctions, this may become intractable. By contrast, our approach evaluates the outcome rule and the trained payment rule once for a given instance and incurs additional computational costs only during training.
The work of Lahaie [2009, 2010] precedes our work in adopting a kernel-based approach for combinatorial auctions, but focuses not on learning a payment rule for a given outcome rule but rather on solving the winner determination and pricing problem for a given instance of a combinatorial auction. Lahaie introduces the use of kernel methods to compactly represent nonlinear price functions, which is also present in our work, but obtains incentive properties more indirectly through a connection between regularization and price sensitivity. The main distinction between the two lines of work is that Lahaie focuses on the design of scalable methods for clearing and pricing approximately welfare-maximizing combinatorial auctions, while we advance a framework for the automated design of payment rules that provide good incentive properties for a given outcome rule, which need not be welfare-maximizing.
Our discussion of k-wise dependent valuations builds on valuation structure for combinatorial auctions introduced by Conitzer et al. [2005] and Abraham et al. [2012]. Our tractable training results rely on connections between k-wise dependent valuations and associative Markov networks [Taskar et al. 2004]. Carroll [2011] and Lubin and Parkes [2012] provide surveys of related work on approximate incentive compatibility, or incentive compatibility in the large-market limit. A fair amount of attention has been devoted to regret-based metrics for quantifying the incentive properties of mechanisms [e.g., Carroll 2011;Day and Milgrom 2008;Lubin 2010;Parkes et al. 2001]. Pathak and Sönmez [2013] provide a qualitative ranking of different mechanisms without payments in terms of the number of manipulable instances. Budish [2011] introduces an asymptotic, absolute design criterion regarding incentive properties in a large replica economy limit. Lubin and Parkes [2009] provide experimental support that relates the divergence between the payoffs in a mechanism and the payoffs in a strategyproof "reference" mechanism to the amount by which agents deviate from truthful bidding in the Bayes-Nash equilibrium of a mechanism.

PRELIMINARIES
A mechanism design problem is given by a set $N = \{1, 2, \ldots, n\}$ of agents that interact to select an element from a set $\Omega \subseteq \prod_{i \in N} \Omega_i$ of outcomes, where $\Omega_i$ denotes the set of possible outcomes for agent $i \in N$. Agent $i \in N$ is associated with a type $\theta_i$ from a set $\Theta_i$ of possible types, corresponding to the private information available to this agent. We write $\theta = (\theta_1, \ldots, \theta_n)$ for a profile of types for the different agents, $\Theta = \prod_{i \in N} \Theta_i$ for the set of possible type profiles, and $\theta_{-i} \in \Theta_{-i}$ for a profile of types for all agents but $i$. Each agent $i \in N$ is further assumed to have preferences over $\Omega_i$, represented by a valuation function $v_i : \Theta_i \times \Omega_i \to \mathbb{R}$. We assume that for all $i \in N$ and …
A (direct) mechanism is a pair $(g, p)$ of an outcome rule $g : \Theta \to \prod_{i \in N} \Omega_i$ and a payment rule $p : \Theta \to \mathbb{R}^n_{\ge 0}$. The intuition is that the agents reveal to the mechanism a type profile $\theta' \in \Theta$, possibly different from their true types, and the mechanism chooses outcome $g(\theta')$ and charges each agent $i$ a payment of $p_i(\theta') = (p(\theta'))_i$. We assume quasilinear preferences, so the utility of agent $i$ with type $\theta_i \in \Theta_i$ given a profile $\theta' \in \Theta$ of revealed types is $u_i(\theta', \theta_i) = v_i(\theta_i, g_i(\theta')) - p_i(\theta')$, where $g_i(\theta') = (g(\theta'))_i$ denotes the outcome for agent $i$. A crucial property of mechanism $(g, p)$ is that its outcome rule is feasible, that is, $g(\theta) \in \Omega$ for all $\theta \in \Theta$.
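Outcome rule $g$ satisfies consumer sovereignty if for all $i \in N$, $o_i \in \Omega_i$, and $\theta_{-i} \in \Theta_{-i}$, there exists $\theta_i \in \Theta_i$ such that $g_i(\theta_i, \theta_{-i}) = o_i$; and reachability of the null outcome if for …
A mechanism $(g, p)$ is dominant strategy incentive compatible, or strategyproof, if each agent maximizes its utility by reporting its true type, irrespective of the reports of the other agents, that is, if for all $i \in N$, $\theta \in \Theta$, and $\theta'_i \in \Theta_i$,
$$v_i(\theta_i, g_i(\theta)) - p_i(\theta) \;\ge\; v_i(\theta_i, g_i(\theta'_i, \theta_{-i})) - p_i(\theta'_i, \theta_{-i}).$$
Observe that given reachability of the null outcome, strategyproofness implies individual rationality.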
A mechanism $(g, p)$ is strategyproof if and only if the payment of an agent is independent of its reported type and the chosen outcome simultaneously maximizes the utility of all agents, that is, if for every type profile $\theta \in \Theta$ and every agent $i \in N$,
$$p_i(\theta) = t_i(\theta_{-i}, g_i(\theta)), \tag{1}$$
$$g_i(\theta) \in \arg\max_{o_i \in \Omega_i} \big(v_i(\theta_i, o_i) - t_i(\theta_{-i}, o_i)\big), \tag{2}$$
for a price function $t_i : \Theta_{-i} \times \Omega_i \to \mathbb{R}$. This simple characterization is crucial for our approach, providing the basic insight into how to utilize the discriminant function of a classifier as a payment rule. The first property is the agent-independent property: conditioned on reports of others, and an outcome, an agent's payment is independent of its own report. The second property is the agent-optimizing property: the outcome should maximize an agent's utility given these agent-independent prices and its reported valuation. The first property is necessary since if it is violated, there exists some type vector $\theta$ where an agent $i$ can misreport its type and receive the same outcome but a lower payment. The second property is necessary since if it is violated, there exists some type vector $\theta$ where an agent $i$ can obtain a preferred outcome and payment pair by misreporting its type. Strategyproofness can also be characterized in terms of necessary and sufficient properties of outcome rules alone, in particular through monotonicity properties. These properties characterize those outcome rules for which there exists a payment rule such that the outcome rule and payment rule form a strategyproof mechanism [Ashlagi et al. 2010; Saks and Yu 2005]. These monotonicity properties constrain the space of outcome rules for which it is possible to learn a payment rule that provides full strategyproofness to a designed mechanism.
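We quantify the degree of strategyproofness of a mechanism in terms of the regret experienced by an agent when revealing its true type. The ex post regret of agent $i \in N$ in mechanism $(g, p)$, given true type $\theta_i \in \Theta_i$ and reported types $\theta_{-i} \in \Theta_{-i}$ of the other agents, is
$$\mathrm{rgt}_i(\theta_i, \theta_{-i}) = \max_{\theta'_i \in \Theta_i} \big(v_i(\theta_i, g_i(\theta'_i, \theta_{-i})) - p_i(\theta'_i, \theta_{-i})\big) - \big(v_i(\theta_i, g_i(\theta_i, \theta_{-i})) - p_i(\theta_i, \theta_{-i})\big).$$
This is the maximum gain in utility the agent could achieve through a misreport $\theta'_i$, holding the reports of others fixed. Analogously, the ex post violation of individual rationality of agent $i \in N$ in mechanism $(g, p)$, given true type $\theta_i \in \Theta_i$ and reported types $\theta_{-i} \in \Theta_{-i}$ of the other agents, is
$$\mathrm{irv}_i(\theta_i, \theta_{-i}) = \max\big(0,\; p_i(\theta_i, \theta_{-i}) - v_i(\theta_i, g_i(\theta_i, \theta_{-i}))\big).$$
This quantity is zero when there is no violation of individual rationality (IR) for the agent at this type profile, and positive when the agent's utility for the outcome and payment is negative.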
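To make the two definitions concrete, the following is a minimal sketch that evaluates $\mathrm{rgt}_i$ and $\mathrm{irv}_i$ by brute force. It assumes a hypothetical single-item auction in which each type is a scalar value, the outcome rule gives the item to the highest report (ties broken toward the lowest index, purely for illustration), and the maximum over $\theta'_i$ is approximated by a finite grid of misreports. All function names are illustrative and not part of the framework.

```python
from typing import Callable, Sequence, Tuple

Profile = Tuple[float, ...]
OutcomeRule = Callable[[Profile], Tuple[int, ...]]
PaymentRule = Callable[[Profile], Tuple[float, ...]]


def highest_bid_wins(bids: Profile) -> Tuple[int, ...]:
    """Hypothetical outcome rule: 1 for the highest report, 0 otherwise (ties to lowest index)."""
    winner = max(range(len(bids)), key=lambda k: bids[k])
    return tuple(int(k == winner) for k in range(len(bids)))


def first_price(bids: Profile) -> Tuple[float, ...]:
    """Winner pays its own report, so its payment depends on its own type."""
    alloc = highest_bid_wins(bids)
    return tuple(b * a for b, a in zip(bids, alloc))


def second_price(bids: Profile) -> Tuple[float, ...]:
    """Winner pays the highest competing report."""
    alloc = highest_bid_wins(bids)
    return tuple(max(bids[:k] + bids[k + 1:]) * a for k, a in enumerate(alloc))


def utility(i: int, true_value: float, reported: Profile,
            g: OutcomeRule, p: PaymentRule) -> float:
    """Quasilinear utility v_i(theta_i, g_i(reported)) - p_i(reported), with v_i = value * allocation."""
    return true_value * g(reported)[i] - p(reported)[i]


def ex_post_regret(i: int, theta: Profile, g: OutcomeRule, p: PaymentRule,
                   misreports: Sequence[float]) -> float:
    """rgt_i: best utility over the truthful report and a grid of misreports, minus truthful utility."""
    truthful = utility(i, theta[i], theta, g, p)
    best = truthful
    for r in misreports:
        deviated = theta[:i] + (r,) + theta[i + 1:]  # others' reports held fixed
        best = max(best, utility(i, theta[i], deviated, g, p))
    return best - truthful


def ir_violation(i: int, theta: Profile, g: OutcomeRule, p: PaymentRule) -> float:
    """irv_i: the amount by which truthful utility falls below zero."""
    return max(0.0, -utility(i, theta[i], theta, g, p))


grid = [0.5 * k for k in range(13)]  # candidate misreports 0.0, 0.5, ..., 6.0
theta = (5.0, 4.0, 3.0)
print(ex_post_regret(0, theta, highest_bid_wins, first_price, grid))   # 1.0
print(ex_post_regret(0, theta, highest_bid_wins, second_price, grid))  # 0.0
```

At this profile, first price has positive regret because the winner's payment depends on its own report, which violates the agent-independent property (1). Second price charges an agent-independent price and has zero regret.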
We consider situations where type profiles $\theta$ are drawn from a distribution with probability density function $D : \Theta \to \mathbb{R}$, such that $D(\theta) \ge 0$ and $\sum_{\theta \in \Theta} D(\theta) = 1$. Given such a distribution, and assuming that all agents report their true types, the expected ex post regret of agent $i \in N$ in mechanism $(g, p)$ is $\mathbb{E}_{\theta \sim D}\big[\mathrm{rgt}_i(\theta_i, \theta_{-i})\big]$.
Outcome rule $g$ is agent symmetric if for every permutation $\pi$ of agents $N$, and all types $\theta, \theta' \in \Theta$ such that $\theta_i = \theta'_{\pi(i)}$ for all $i \in N$, $g_i(\theta) = g_{\pi(i)}(\theta')$ for all $i \in N$. This specifically requires that $\Theta_i = \Theta_j$ and $\Omega_i = \Omega_j$ for all $i, j \in N$. Similarly, type distribution $D$ is agent symmetric if $D(\theta) = D(\theta')$ for every permutation $\pi$ of $N$ and all types $\theta, \theta' \in \Theta$ such that $\theta_i = \theta'_{\pi(i)}$ for all $i \in N$. Given agent symmetry, a price function $t_1 : \Theta_{-1} \times \Omega_1 \to \mathbb{R}$ for agent 1 can be used to generate the payment rule $p$ for a mechanism $(g, p)$, with
$$p(\theta) = \big(t_1(\theta_{-1}, g_1(\theta)),\; t_1(\theta_{-2}, g_2(\theta)),\; \ldots,\; t_1(\theta_{-n}, g_n(\theta))\big),$$
so that the expected ex post regret is the same for every agent.
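Under agent symmetry, a single price function suffices to charge every agent, and the expected ex post regret of agent 1 can be estimated by sampling. The sketch below is a Monte Carlo estimate under stated assumptions: a hypothetical type distribution with i.i.d. uniform values on [0, 1], a highest-value-wins outcome rule, and a finite grid of misreports standing in for the maximum over $\Theta_1$. The names `symmetric_payments`, `expected_regret`, and the two example price functions are illustrative.

```python
import random
from typing import Callable, Sequence, Tuple

Profile = Tuple[float, ...]
PriceFn = Callable[[Profile, int], float]  # t_1(theta_{-1}, o_1)


def highest_value_wins(theta: Profile) -> Tuple[int, ...]:
    """Outcome rule g: item to the highest report (ties to the lowest index)."""
    winner = max(range(len(theta)), key=lambda k: theta[k])
    return tuple(int(k == winner) for k in range(len(theta)))


def symmetric_payments(t1: PriceFn, theta: Profile) -> Tuple[float, ...]:
    """p(theta) = (t_1(theta_{-1}, g_1(theta)), ..., t_1(theta_{-n}, g_n(theta)))."""
    outcome = highest_value_wins(theta)
    return tuple(t1(theta[:i] + theta[i + 1:], outcome[i]) for i in range(len(theta)))


def expected_regret(t1: PriceFn, n: int, misreports: Sequence[float],
                    samples: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E_{theta ~ D}[rgt_1(theta_1, theta_{-1})] for agent 1 (index 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        theta = tuple(rng.random() for _ in range(n))

        def u(reported: Profile) -> float:
            # Utility of agent 1 with true value theta[0] when `reported` is submitted.
            return theta[0] * highest_value_wins(reported)[0] - symmetric_payments(t1, reported)[0]

        truthful = u(theta)
        best = max([truthful] + [u((r,) + theta[1:]) for r in misreports])
        total += best - truthful
    return total / samples


def second_price_t1(others: Profile, o1: int) -> float:
    return max(others) if o1 == 1 else 0.0


def posted_price_t1(others: Profile, o1: int) -> float:
    return 0.5 if o1 == 1 else 0.0  # hypothetical fixed price that ignores others' reports


grid = [k / 10 for k in range(11)]
print(expected_regret(second_price_t1, 3, grid))
print(expected_regret(posted_price_t1, 3, grid))
```

Both price functions are agent-independent, but only the second-price one makes the highest-value-wins outcome utility-maximizing for every agent, so only it yields zero estimated regret.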
We assume agent symmetry going forward, which precludes outcome rules that break ties based on agent identity, but obviates the need to train a separate classifier for each agent while also providing some benefits in terms of simplifying the presentation of our results.²

PAYMENT RULES FROM MULTICLASS CLASSIFIERS
A multiclass classifier is a function h : X → Y, where X is an input domain and Y is a discrete output domain. One could imagine, for example, a multiclass classifier that labels a given image as a dog, cat, or some other animal. In the context of mechanism design, we will be interested in classifiers that take as input a type profile and output an outcome. What distinguishes this from an outcome rule is that we will impose restrictions on the form the classifier can take.
Classification typically assumes an underlying target function $h^* : X \to Y$, and the goal is to learn a classifier $h$ that minimizes disagreements with $h^*$ on an input distribution $D_X$ on $X$, based only on a finite set of training data $\{(x^1, y^1), \ldots, (x^\ell, y^\ell)\}$, where $x^1, \ldots, x^\ell$ are drawn from $D_X$ and $y^k = h^*(x^k)$. This may be challenging because the amount of training data is limited, or because $h$ is restricted to some hypothesis class $H$ with a certain simple structure, for instance, linear threshold functions. If $h(x) = h^*(x)$ for all $x \in X$, we say that $h$ is a perfect classifier for $h^*$.
We consider classifiers that are defined in terms of a discriminant function $f : X \times Y \to \mathbb{R}$, such that
$$h(x) \in \arg\max_{y \in Y} f(x, y)$$
for all $x \in X$. More specifically, we will be concerned with linear discriminant functions of the form
$$f_w(x, y) = w^T \psi(x, y)$$
for a weight vector $w \in \mathbb{R}^m$ and a feature map $\psi : X \times Y \to \mathbb{R}^m$. The function $\psi$ maps input and output into an $m$-dimensional space, which allows nonlinear features to be expressed. In general, we allow $w$ to have infinite dimension, while requiring the inner product between $w$ and $\psi(x, y)$ to remain well-defined. Computationally, the infinite-dimensional case is handled through kernels, as described in Section 4.1.1.
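As a minimal sketch of a linear discriminant-based classifier over a finite label set, the following assumes a hypothetical feature map that gives each label its own slope and intercept on a scalar input. The names `make_classifier` and `psi` and the weights are illustrative only, and ties in the argmax are broken in favor of the label listed first.

```python
from typing import Callable, List, Sequence

FeatureMap = Callable[[float, int], List[float]]


def dot(w: Sequence[float], phi: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, phi))


def make_classifier(w: Sequence[float], psi: FeatureMap,
                    labels: Sequence[int]) -> Callable[[float], int]:
    """Return h(x) = argmax_y f_w(x, y), where f_w(x, y) = w^T psi(x, y)."""
    def h(x: float) -> int:
        return max(labels, key=lambda y: dot(w, psi(x, y)))
    return h


LABELS = (0, 1, 2)


def psi(x: float, y: int) -> List[float]:
    # Block structure: [x * 1{y = k} for each k] + [1{y = k} for each k].
    return [x if y == k else 0.0 for k in LABELS] + [1.0 if y == k else 0.0 for k in LABELS]


# With these weights the scores are: label 0 -> x, label 1 -> 0.5, label 2 -> -x.
h = make_classifier([1.0, 0.0, -1.0, 0.0, 0.5, 0.0], psi, LABELS)
print([h(x) for x in (2.0, 0.1, -3.0)])  # [0, 1, 2]
```

The discriminant is linear in $w$ even though it is nonlinear in the pair $(x, y)$, which is the property that structural support vector machines exploit.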

Mechanism Design as Classification
Given an outcome rule g and access to a distribution D over type profiles, our goal is to design a payment rule p that gives the mechanism (g, p) the best possible incentive properties, in the sense of expected regret.
Assuming agent symmetry, we focus on a partial outcome rule $g_1 : \Theta \to \Omega_1$ and train a classifier to predict the outcome to agent 1. To train a classifier, we generate examples by drawing a type profile $\theta \in \Theta$ from distribution $D$ and applying outcome rule $g$ to obtain the target class $g_1(\theta) \in \Omega_1$.
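Generating training examples amounts to sampling type profiles and labeling them with the partial outcome rule. This sketch assumes a hypothetical type distribution with i.i.d. uniform values on [0, 1] and a highest-value-wins rule for a single item. The function names are illustrative.

```python
import random
from typing import List, Tuple

Profile = Tuple[float, ...]


def g1(theta: Profile) -> int:
    """Partial outcome rule for agent 1 (index 0): 1 if it has the highest value, else 0."""
    return int(max(range(len(theta)), key=lambda k: theta[k]) == 0)


def sample_training_set(n_agents: int, size: int, seed: int = 0) -> List[Tuple[Profile, int]]:
    """Draw theta ~ D (here: i.i.d. uniform values) and label it with g_1(theta)."""
    rng = random.Random(seed)
    examples = []
    for _ in range(size):
        theta = tuple(rng.random() for _ in range(n_agents))
        examples.append((theta, g1(theta)))
    return examples


train = sample_training_set(n_agents=3, size=1000)
print(sum(label for _, label in train) / len(train))  # fraction of examples in which agent 1 wins
```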
2 Technically, agent symmetric outcome rules would need to break ties either by randomization or by not allocating anything to agents that are tied. In the former case, the outcome rule would then map to a distribution over outcomes rather than a single outcome. The relation between multiclass classification and mechanism design still holds in this setting, but perfect classification is no longer possible because the outcome rule is not deterministic. We adopt agent symmetry to simplify the presentation of our results, but without agent symmetry, we can still use the same methods to train a separate classifier for each agent based on an agent-specific outcome rule. We also assume agent symmetry and train a single classifier for all agents in the experimental results, as ties occur with negligible probability for the settings and outcome rules we study.
We impose a special structure on the hypothesis class. A classifier $h_w : \Theta \to \Omega_1$ is admissible if it is defined in terms of a discriminant function $f_w$ of the form
$$f_w(\theta, o_1) = w_1 v_1(\theta_1, o_1) + w_{-1}^T \psi(\theta_{-1}, o_1)$$
for weights $w$ such that $w_1 \in \mathbb{R}_{>0}$ and $w_{-1} \in \mathbb{R}^m$, and a feature map $\psi : \Theta_{-1} \times \Omega_1 \to \mathbb{R}^m$ for $m \in \mathbb{N} \cup \{\infty\}$. The first term of $f_w(\theta, o_1)$ only depends on the type of agent 1, and increases in its valuation for outcome $o_1$, while the remaining terms ignore $\theta_1$ entirely. This restriction to admissible discriminant functions is crucial because it allows us to directly infer agent-independent prices from the discriminant function of a trained classifier. For this, define the associated price function of an admissible classifier $h_w$ as
$$t_w(\theta_{-1}, o_1) = -\frac{1}{w_1} w_{-1}^T \psi(\theta_{-1}, o_1),$$
where we again focus on agent 1 for concreteness. By agent symmetry, we obtain the mechanism $(g, p_w)$ corresponding to classifier $h_w$ by defining the payment rule
$$p_w(\theta) = \big(t_w(\theta_{-1}, g_1(\theta)),\; t_w(\theta_{-2}, g_2(\theta)),\; \ldots,\; t_w(\theta_{-n}, g_n(\theta))\big).$$
Even requiring admissibility, the hope is that appropriate choices for the feature map $\psi$ can produce rich function spaces, and thus ultimately useful payment rules. Moreover, this admissibility structure can be adopted in the context of structural support vector machines, as discussed in Section 4.1.
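The associated price function is chosen so that $f_w(\theta, o_1)/w_1 = v_1(\theta_1, o_1) - t_w(\theta_{-1}, o_1)$. In other words, maximizing the admissible discriminant is the same as maximizing agent 1's utility at prices $t_w$. The following sketch checks this identity numerically. It assumes a hypothetical single-item setting with $v_1(\theta_1, o_1) = \theta_1 o_1$, a two-dimensional feature map, and arbitrary illustrative weights.

```python
from typing import List, Sequence, Tuple

Profile = Tuple[float, ...]


def dot(w: Sequence[float], phi: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, phi))


def v1(theta1: float, o1: int) -> float:
    return theta1 * o1  # value for the item if allocated (o1 = 1)


def psi(theta_minus1: Profile, o1: int) -> List[float]:
    # Hypothetical features of the others' reports and agent 1's outcome.
    return [o1 * max(theta_minus1), float(o1)]


W1, W_REST = 2.0, [-2.0, -1.0]  # admissibility requires w_1 > 0


def f_w(theta: Profile, o1: int) -> float:
    """Admissible discriminant: w_1 v_1(theta_1, o_1) + w_{-1}^T psi(theta_{-1}, o_1)."""
    return W1 * v1(theta[0], o1) + dot(W_REST, psi(theta[1:], o1))


def t_w(theta_minus1: Profile, o1: int) -> float:
    """Associated price function: -(1 / w_1) w_{-1}^T psi(theta_{-1}, o_1)."""
    return -dot(W_REST, psi(theta_minus1, o1)) / W1


theta = (6.0, 4.0, 3.0)
for o1 in (0, 1):
    # Normalized discriminant versus value minus price: the two columns agree.
    print(o1, f_w(theta, o1) / W1, v1(theta[0], o1) - t_w(theta[1:], o1))
```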

Example: Single-Item Auction
Before proceeding further, we illustrate the ideas developed so far in the context of a single-item auction. In a single-item auction, the type of each agent is a single number, corresponding to its value for the item, and there are two possible allocations from the point of view of an agent: one where it receives the item, and one where it does not. Formally, $\Theta = \mathbb{R}^n$ and $\Omega_1 = \{0, 1\}$ (agent 1 is allocated, or it is not), with $v_1(\theta_1, o_1) = \theta_1 o_1$.
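Consider a setting with three agents and a training set
$$\big(\theta^1, o_1^1\big) = \big((1, 3, 5), 0\big),\quad \big(\theta^2, o_1^2\big) = \big((5, 4, 3), 1\big),\quad \big(\theta^3, o_1^3\big) = \big((2, 3, 4), 0\big),$$
and note that this training set is consistent with an optimal outcome rule, that is, one that assigns the item to an agent with maximum value. Our goal is to learn an admissible classifier
$$h_w(\theta) = \arg\max_{o_1 \in \{0, 1\}} \big(w_1 \theta_1 o_1 + w_{-1}^T \psi((\theta_2, \theta_3), o_1)\big)$$
that performs well on the training set. Since there are only two possible outcomes, the outcome chosen by $h_w$ is simply the one with the larger discriminant. A classifier that is perfect on the training data must therefore satisfy the following constraints:
$$f_w((1,3,5), 0) > f_w((1,3,5), 1),\quad f_w((5,4,3), 1) > f_w((5,4,3), 0),\quad f_w((2,3,4), 0) > f_w((2,3,4), 1).$$
This can, for example, be achieved by setting $w_1 = 1$ and
$$w_{-1}^T \psi((\theta_2, \theta_3), o_1) = \begin{cases} -\max(\theta_2, \theta_3) & \text{if } o_1 = 1, \\ 0 & \text{if } o_1 = 0. \end{cases}$$
Recalling our definition of the price function as $t_w(\theta_{-1}, o_1) = -(1/w_1) w_{-1}^T \psi(\theta_{-1}, o_1)$, we see that this choice of $w$ and $\psi$ corresponds to the second-price payment rule: agent 1 pays $\max(\theta_2, \theta_3)$ if it wins the item, and nothing otherwise.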
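The worked example can be checked mechanically. The sketch below assumes the second-price choice $w_1 = 1$ and $w_{-1}^T \psi((\theta_2, \theta_3), o_1) = -\max(\theta_2, \theta_3)\, o_1$ together with $v_1(\theta_1, o_1) = \theta_1 o_1$. It verifies that the resulting classifier is perfect on the three training examples and reads off the implied prices. The helper names are hypothetical.

```python
from typing import Tuple

Profile = Tuple[float, ...]
TRAIN = [((1, 3, 5), 0), ((5, 4, 3), 1), ((2, 3, 4), 0)]
W1 = 1.0


def rest_score(theta_minus1: Profile, o1: int) -> float:
    """w_{-1}^T psi(theta_{-1}, o_1): minus the highest competing value if agent 1 wins."""
    return -float(max(theta_minus1)) * o1


def f_w(theta: Profile, o1: int) -> float:
    return W1 * theta[0] * o1 + rest_score(theta[1:], o1)


def h_w(theta: Profile) -> int:
    return max((0, 1), key=lambda o1: f_w(theta, o1))


def t_w(theta_minus1: Profile, o1: int) -> float:
    return -rest_score(theta_minus1, o1) / W1


for theta, label in TRAIN:
    # Profile, target outcome, predicted outcome, agent 1's price for winning.
    print(theta, label, h_w(theta), t_w(theta[1:], 1))
```

In every case agent 1's price for winning is the highest competing value, which is the second price.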
In practice, we are limited to hypotheses that are linear in features $\psi((\theta_2, \theta_3), o_1)$, and should not expect that the classifier is exact on the training data or generally on the distribution of inputs. Nevertheless, we will see in Section 4.1.1 that through the use of kernels we can adopt choices of $\psi$ that allow for rich, nonlinear discriminant functions.

Perfect Classifiers and Implementable Outcome Rules
We now formally establish a connection between mechanism design and multiclass classification.
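THEOREM 3.1. Let $(g, p)$ be a strategyproof mechanism with an agent symmetric outcome rule $g$, and let $t_1$ be the corresponding price function. Then, a perfect admissible classifier $h_w$ for partial outcome rule $g_1$ exists if, for every $\theta \in \Theta$, the maximizer of $v_1(\theta_1, o_1) - t_1(\theta_{-1}, o_1)$ over $o_1 \in \Omega_1$ is unique.
PROOF. By the first characterization of strategyproof mechanisms, $g$ must select an outcome that maximizes the utility of agent 1 at the current prices, that is,
$$g_1(\theta) \in \arg\max_{o_1 \in \Omega_1} \big(v_1(\theta_1, o_1) - t_1(\theta_{-1}, o_1)\big).$$
Consider the admissible discriminant $f_{(1,1)}(\theta, o_1) = v_1(\theta_1, o_1) - t_1(\theta_{-1}, o_1)$, which has $w = (1, 1)$ and uses the (negated) price function $-t_1$ as its feature map. Clearly, the corresponding classifier $h_{(1,1)}$ maximizes the same quantity as $g_1$, and the two must agree if there is a unique maximizer.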
The relationship also works in the opposite direction: a perfect, admissible classifier $h_w$ for outcome rule $g$ can be used to construct a payment rule that turns $g$ into a strategyproof mechanism.
THEOREM 3.2. Let $g$ be an agent symmetric outcome rule, $h_w : \Theta \to \Omega_1$ an admissible classifier, and $p_w$ the payment rule corresponding to $h_w$. If $h_w$ is a perfect classifier for the partial outcome rule $g_1$, then mechanism $(g, p_w)$ is strategyproof.
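We prove this result by expressing the regret of an agent in mechanism $(g, p_w)$ in terms of the discriminant function $f_w$. Let $\Omega_i(\theta_{-i}) \subseteq \Omega_i$ denote the set of partial outcomes for agent $i$ that can be obtained under $g$ given reported types $\theta_{-i}$ from all agents but $i$, keeping the dependence on $g$ silent for notational simplicity.
LEMMA 3.3. Suppose that agent 1 has type $\theta_1$ and that the other agents report types $\theta_{-1}$. Then the regret of agent 1 for bidding truthfully in mechanism $(g, p_w)$ is
$$\max_{o_1 \in \Omega_1(\theta_{-1})} \frac{1}{w_1} \big(f_w(\theta, o_1) - f_w(\theta, g_1(\theta))\big).$$
PROOF OF THEOREM 3.2. If $h_w$ is a perfect classifier for $g_1$, then $g_1(\theta) = h_w(\theta) \in \arg\max_{o_1 \in \Omega_1} f_w(\theta, o_1)$ for every $\theta \in \Theta$, so $f_w(\theta, o_1) \le f_w(\theta, g_1(\theta))$ for all $o_1 \in \Omega_1(\theta_{-1}) \subseteq \Omega_1$, with equality at $o_1 = g_1(\theta) \in \Omega_1(\theta_{-1})$. By Lemma 3.3, the regret of agent 1 for bidding truthfully in mechanism $(g, p_w)$ is therefore always zero, and by agent symmetry the same holds for every agent, which means that the mechanism is strategyproof.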
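For completeness, here is one way to see Lemma 3.3, offered as a sketch of the argument rather than a verbatim proof. It uses only two facts. First, agent 1's payment under $p_w$ is $t_w(\theta_{-1}, \cdot)$ evaluated at its outcome. Second, the definition of the associated price function gives $v_1(\theta_1, o_1) - t_w(\theta_{-1}, o_1) = \frac{1}{w_1} f_w(\theta, o_1)$. Misreports $\theta'_1$ reach exactly the outcomes in $\Omega_1(\theta_{-1})$, so:

```latex
\begin{aligned}
\mathrm{rgt}_1(\theta_1, \theta_{-1})
&= \max_{\theta'_1 \in \Theta_1} \big[v_1(\theta_1, g_1(\theta'_1, \theta_{-1})) - t_w(\theta_{-1}, g_1(\theta'_1, \theta_{-1}))\big]
   - \big[v_1(\theta_1, g_1(\theta)) - t_w(\theta_{-1}, g_1(\theta))\big] \\
&= \max_{o_1 \in \Omega_1(\theta_{-1})} \big[v_1(\theta_1, o_1) - t_w(\theta_{-1}, o_1)\big]
   - \big[v_1(\theta_1, g_1(\theta)) - t_w(\theta_{-1}, g_1(\theta))\big] \\
&= \max_{o_1 \in \Omega_1(\theta_{-1})} \frac{1}{w_1}\big(f_w(\theta, o_1) - f_w(\theta, g_1(\theta))\big).
\end{aligned}
```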
It bears emphasis that classifier $h_w$ is only used to derive the payment rule $p_w$, while the outcome is still selected according to $g$.
We might ask whether classifier $h_w$ could itself be used to obtain an agent symmetric outcome rule $g_w$ and, since $h_w$ is trivially a perfect classifier for the outcome rule it induces, a strategyproof mechanism $(g_w, p_w)$. In particular, for each agent $i$, the outcome rule $g_w$ would select the partial outcome $o^*_i$ obtained by applying $h_w$ with agent $i$ in the role of agent 1, that is, $o^*_i \in \arg\max_{o_i \in \Omega_i} f_w((\theta_i, \theta_{-i}), o_i)$. The problem is that the resulting profile $o^* = (o^*_1, \ldots, o^*_n)$ need not be feasible. For example, in the context of an auction, the outcome rule $g_w$ implied by the trained classifier might seek to give the same item to more than one agent.
The mechanism that we adopt, namely $(g, p_w)$, has in some sense the opposite problem: it is guaranteed to be feasible because outcome rule $g$ is feasible, but it is only strategyproof if $h_w$ is a perfect classifier for $g$. While the learned payment rule $p_w$ always satisfies the agent-independent property (1), the agent-maximizing property (2) (the second requirement for strategyproofness) can be violated when $h_w(\theta) \neq g_1(\theta)$.

Approximate Classification and Approximate Strategyproofness
A perfect admissible classifier for outcome rule g provides a payment rule for a strategyproof mechanism. We now show that this result extends gracefully to situations where no such payment rule is available, by relating the expected ex post regret of a mechanism (g, p) to a measure of the generalization error of a classifier for outcome rule g.
Fix a feature map $\psi$, and denote by $H_\psi$ the space of all admissible classifiers with this feature map. The discriminant loss of a classifier $h_w \in H_\psi$ with respect to a type profile $\theta$ and an outcome $o_1 \in \Omega_1$ is given by
$$\Delta_w(o_1, \theta) = \frac{1}{w_1}\Big(\max_{o'_1 \in \Omega_1} f_w(\theta, o'_1) - f_w(\theta, o_1)\Big).$$
Intuitively, the discriminant loss measures how far, in terms of the normalized discriminant, $h_w$ is from predicting the correct outcome for type profile $\theta$, assuming the correct outcome is $o_1$. Note that $\Delta_w(o_1, \theta) \geq 0$ for all $o_1 \in \Omega_1$ and $\theta \in \Theta$, with equality when $h_w(\theta) = o_1$. Unlike a 0/1 loss, the discriminant loss is graded: even if two classifiers predict the same incorrect outcome, one of them may still be closer to predicting the correct outcome $o_1$. The generalization error of classifier $h_w \in H_\psi$ with respect to a type distribution $D$ and a partial outcome rule $g_1 : \Theta \to \Omega_1$ is given by
$$R_w(D, g_1) = \mathbb{E}_{\theta \sim D}\big[\Delta_w(g_1(\theta), \theta)\big].$$
The following result establishes a connection between the generalization error and the expected ex post regret of the corresponding mechanism.
THEOREM 3.4. Consider an outcome rule $g$, a space $H_\psi$ of admissible classifiers, and a type distribution $D$. Let $h_{w^*} \in H_\psi$ be a classifier that minimizes generalization error with respect to $D$ and $g$ among all classifiers in $H_\psi$. Then the following holds.
(1) If $g$ satisfies consumer sovereignty, then $(g, p_{w^*})$ minimizes expected ex post regret with respect to $D$ among all mechanisms $(g, p_w)$ corresponding to classifiers $h_w \in H_\psi$.
(2) Otherwise, $(g, p_{w^*})$ minimizes an upper bound on expected ex post regret with respect to $D$ among all mechanisms $(g, p_w)$ corresponding to classifiers $h_w \in H_\psi$.
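PROOF. For the second property, observe that the generalization error $R_w(D, g_1)$ satisfies
$$R_w(D, g_1) = \mathbb{E}_{\theta \sim D}\Big[\tfrac{1}{w_1}\big(\max_{o_1 \in \Omega_1} f_w(\theta, o_1) - f_w(\theta, g_1(\theta))\big)\Big] \geq \mathbb{E}_{\theta \sim D}\Big[\tfrac{1}{w_1}\big(\max_{o_1 \in \Omega_1(\theta_{-1})} f_w(\theta, o_1) - f_w(\theta, g_1(\theta))\big)\Big] = \mathbb{E}_{\theta \sim D}\big[\mathrm{rgt}_1(\theta)\big],$$
where $\mathrm{rgt}_1(\theta)$ is the regret of agent 1 for bidding truthfully, the inequality holds because $\Omega_1(\theta_{-1}) \subseteq \Omega_1$, and the last equality holds by Lemma 3.3. Hence $h_{w^*}$ minimizes an upper bound on expected ex post regret. If $g$ satisfies consumer sovereignty, then $\Omega_1(\theta_{-1}) = \Omega_1$, the inequality holds with equality, and the first property follows as well.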
Minimization of expected regret itself, rather than of an upper bound, can also be achieved without consumer sovereignty (although consumer sovereignty holds for all the outcome rules studied in this article), provided the learner has access to the set $\Omega_1(\theta_{-1})$ of achievable outcomes for every $\theta_{-1} \in \Theta_{-1}$.
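The inequality in the proof of Theorem 3.4 can be checked numerically. The sketch below, with hypothetical discriminant values and illustrative names, computes the discriminant loss (maximizing over all of $\Omega_1$) and the regret (maximizing only over the achievable set $\Omega_1(\theta_{-1})$), and averages each over a small sample as an empirical stand-in for the expectations under $D$.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Outcome = Hashable
Sample = Tuple[Dict[Outcome, float], Outcome, Iterable[Outcome]]


def discriminant_loss(f: Dict[Outcome, float], target: Outcome, w1: float) -> float:
    """Delta_w(o_1, theta): normalized gap between the best discriminant value and f_w(theta, o_1)."""
    return (max(f.values()) - f[target]) / w1


def regret(f: Dict[Outcome, float], target: Outcome, w1: float, achievable: Iterable[Outcome]) -> float:
    """Regret via Lemma 3.3, maximizing only over outcomes achievable given theta_-1."""
    return (max(f[o] for o in achievable) - f[target]) / w1


def empirical_errors(samples: List[Sample], w1: float) -> Tuple[float, float]:
    """Sample averages of regret and discriminant loss; the former never exceeds the latter."""
    regrets = [regret(f, g1, w1, ach) for f, g1, ach in samples]
    losses = [discriminant_loss(f, g1, w1) for f, g1, _ in samples]
    return sum(regrets) / len(samples), sum(losses) / len(samples)


samples: List[Sample] = [
    # (f_w(theta, .) over Omega_1, g_1(theta), Omega_1(theta_-1))
    ({"a": 3.0, "b": 5.0, "c": 4.0}, "c", ["a", "c"]),       # "b" not achievable: regret 0, loss 1
    ({"a": 2.0, "b": 1.0, "c": 0.0}, "b", ["a", "b", "c"]),  # every outcome achievable: regret = loss = 1
]
avg_regret, avg_loss = empirical_errors(samples, w1=1.0)
print(avg_regret, avg_loss)  # 0.5 1.0
```

The first sample violates consumer sovereignty, so its regret is strictly below its discriminant loss; the second satisfies it, and the two coincide, mirroring the two cases of the theorem.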

A SOLUTION USING STRUCTURAL SUPPORT VECTOR MACHINES
In this section we discuss the method of structural support vector machines (structural SVMs) [Tsochantaridis et al. 2005; Joachims et al. 2009]. In particular, we show how structural SVMs can be adapted for the purpose of learning classifiers with admissible discriminant functions.

Structural SVMs
Given an input space $X$, a discrete output space $Y$, a target function $h^* : X \to Y$, and a feature map $\psi : X \times Y \to \mathbb{R}^m$, structural SVMs learn a multiclass classifier $h$ that, given input $x \in X$, selects an output $y \in Y$ to maximize $f_w(x, y) = w^T \psi(x, y)$. For a given feature map $\psi$, the training problem is to find a vector $w$ for which $h_w$ has low generalization error. For readers familiar with SVMs for binary classification, structural SVMs apply similar insights to solve a multiclass classification problem. While SVMs seek a boundary that separates the positive examples from the negative examples and maximizes the minimum distance (or margin) to any example, structural SVMs seek weights that separate, as much as possible, the discriminant value for the correct class from the discriminant values for all other classes. By maximizing this redefined notion of margin, structural SVMs attempt to learn weights that induce discriminant functions with low generalization error.
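Given examples $\{(x^1, y^1), \ldots, (x^\ell, y^\ell)\}$, training is achieved by solving the following convex optimization problem (Training Problem 1):
$$\min_{w,\, \xi \geq 0} \ \frac{1}{2} w^T w + \frac{C}{\ell} \sum_{k=1}^{\ell} \xi_k \quad \text{s.t.} \quad w^T\big(\psi(x^k, y^k) - \psi(x^k, y)\big) \geq L(y^k, y) - \xi_k \quad \text{for all } k = 1, \ldots, \ell \text{ and } y \in Y.$$
The goal is to find a weight vector $w$ and slack variables $\xi_k$ that minimize the objective function while satisfying the constraints. The learned weight vector $w$ parameterizes the discriminant function $f_w$, which in turn defines the classifier $h_w$.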
The constraints for the $k$th example state that the value of the discriminant function on $(x^k, y^k)$ should exceed its value on $(x^k, y)$ by at least $L(y^k, y)$, up to the slack $\xi_k$, where $L$ is a loss function that penalizes misclassification, with $L(y, y) = 0$ and $L(y, y') \geq 0$ for all $y, y' \in Y$. The loss function is optional (it can be set to 0 everywhere), but it is a useful tool for tuning the classifiers that are learned. We generally use a 0/1 loss function, but consider an alternative in Section 4.2.2 to improve ex post IR properties. Positive values of the slack variables $\xi_k$ allow the weight vector to violate some of the constraints. The other term in the objective, the squared norm of the weights, penalizes larger weight vectors. Without it, scaling up the weight vector $w$ could arbitrarily increase the margin between $f_w(x^k, y^k)$ and $f_w(x^k, y)$ and make the constraints easier to satisfy. Smaller weight vectors, on the other hand, increase the ability of the learned classifier to generalize by decreasing its propensity to overfit the training data.
Parameter $C \geq 0$ is a regularization parameter: larger values of $C$ encourage small $\xi_k$ and larger $w$, such that more points are classified correctly, but with a smaller margin (and thus perhaps with less generalization power).
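To make the role of the slacks and of $C$ concrete, the following sketch evaluates the Training Problem 1 objective for a fixed $w$, setting each $\xi_k$ to the smallest value that satisfies all of example $k$'s constraints (a structured hinge loss with margin rescaling and 0/1 loss). It does not solve the optimization problem; the block-structured multiclass feature map, the data, and the two weight vectors are hypothetical.

```python
from typing import Hashable, List, Sequence

Label = Hashable


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def zero_one_loss(y_true: Label, y: Label) -> float:
    return 0.0 if y == y_true else 1.0


def slack(w, psi, x, y_true, outputs, loss=zero_one_loss) -> float:
    """Smallest xi_k >= 0 with w.(psi(x, y_true) - psi(x, y)) >= L(y_true, y) - xi_k for all y."""
    f_true = dot(w, psi(x, y_true))
    return max(0.0, max(loss(y_true, y) - (f_true - dot(w, psi(x, y))) for y in outputs))


def objective(w, psi, data, outputs, C: float) -> float:
    """Training Problem 1 objective with each slack at its smallest feasible value."""
    total_slack = sum(slack(w, psi, x, y, outputs) for x, y in data)
    return 0.5 * dot(w, w) + C / len(data) * total_slack


def block_psi(x: Sequence[float], y: int, n_classes: int = 3) -> List[float]:
    """Standard multiclass joint feature map: copy x into the block for class y."""
    out = [0.0] * (len(x) * n_classes)
    out[y * len(x):(y + 1) * len(x)] = x
    return out


data = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
outputs = [0, 1, 2]
w_wide = [2.0, 0.0, 0.0, 2.0, 0.0, 0.0]   # all constraints met with zero slack, larger norm
w_small = [0.5, 0.0, 0.0, 0.5, 0.0, 0.0]  # smaller norm, but each example needs slack 0.5
for C in (1.0, 10.0):
    print(C, objective(w_wide, block_psi, data, outputs, C),
          objective(w_small, block_psi, data, outputs, C))
# prints "1.0 4.0 0.75", then "10.0 4.0 5.25"
```

With $C = 1$ the smaller weight vector, which relies on positive slacks, has the lower objective; with $C = 10$ the zero-slack, larger-norm vector is preferred, which is the trade-off described above.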

The Feature Map and the Kernel Trick.
Given a feature map $\psi$, the feature vector $\psi(x, y)$ for $x \in X$ and $y \in Y$ provides an alternate representation of the input-output pair $(x, y)$. It is useful to consider feature maps for which $\psi(x, y) = \phi(\chi(x, y))$, where $\chi : X \times Y \to \mathbb{R}^s$ for some $s \in \mathbb{N}$ is an attribute map that combines $x$ and $y$ into a single attribute vector $\chi(x, y)$ that compactly represents the pair. The function $\phi : \mathbb{R}^s \to \mathbb{R}^m$, for $m > s$, then maps the attribute vector to a higher-dimensional space and can introduce additional nonlinear interactions between attributes. In this way, SVMs can achieve nonlinear classification in the attribute space.
What is commonly described as "feature engineering" occurs in our setting through a combination of designing the attribute map χ and the function φ.
The use of kernels allows for a large (even unbounded) $m$, because $\psi(x, y)$ appears in the dual of Training Problem 1 only within inner products of the form $\langle \psi(x, y), \psi(x', y') \rangle$ or, for a decomposable feature map, $\langle \phi(q), \phi(q') \rangle$ where $q = \chi(x, y)$ and $q' = \chi(x', y')$ (see Joachims et al. [2009] for a complete derivation of the dual). For computational tractability it suffices that this inner product can be computed efficiently, and the kernel "trick" is to choose $\phi$ such that $\langle \phi(q), \phi(q') \rangle = K(q, q')$ for a simple closed-form function $K$, which is known as the kernel.
Two common kernels are the polynomial kernel $K_{\text{poly}_d}$, which is parameterized by degree $d \in \mathbb{N}^+$, and the radial basis function (RBF) kernel $K_{\text{RBF}}$, which is parameterized by $\gamma = 1/(2\sigma^2)$ for $\sigma \in \mathbb{R}^+$:
$$K_{\text{poly}_d}(q, q') = (q \cdot q' + 1)^d, \qquad K_{\text{RBF}}(q, q') = \exp\big(-\gamma\,(q \cdot q - 2\, q \cdot q' + q' \cdot q')\big).$$
Both kernels depend on their arguments only through standard inner products, so their efficient computation requires only that $\chi(x, y) \cdot \chi(x', y')$ can be computed efficiently. In our experimental results we adopt the RBF kernel for part of our study on combinatorial auctions, but develop our other experimental results without making use of the kernel trick.
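Both kernels can be evaluated from inner products of attribute vectors alone. This sketch assumes the inhomogeneous polynomial form $(q \cdot q' + 1)^d$ and checks, for $d = 2$ and two-dimensional attributes, that the kernel equals an inner product of explicit feature vectors; `phi_poly2` is an illustrative helper, not part of any library, and the vectors and $\sigma$ are arbitrary.

```python
import math
from typing import List, Sequence


def dot(q: Sequence[float], r: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(q, r))


def k_poly(q: Sequence[float], r: Sequence[float], d: int) -> float:
    """Inhomogeneous polynomial kernel (q . r + 1)^d (assumed form)."""
    return (dot(q, r) + 1.0) ** d


def k_rbf(q: Sequence[float], r: Sequence[float], gamma: float) -> float:
    """RBF kernel exp(-gamma ||q - r||^2), with the squared norm expanded into inner products."""
    return math.exp(-gamma * (dot(q, q) - 2.0 * dot(q, r) + dot(r, r)))


def phi_poly2(q: Sequence[float]) -> List[float]:
    """Explicit feature map for d = 2 and 2-dimensional attributes, used only to check the kernel trick."""
    s = math.sqrt(2.0)
    return [1.0, s * q[0], s * q[1], q[0] ** 2, q[1] ** 2, s * q[0] * q[1]]


q, r = [1.0, 2.0], [0.5, -1.0]
sigma = 1.5
gamma = 1.0 / (2.0 * sigma ** 2)
print(k_poly(q, r, 2), dot(phi_poly2(q), phi_poly2(r)))
print(k_rbf(q, r, gamma), math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(q, r))))
```

The explicit map has six coordinates for two attributes; the kernel avoids constructing it, which is what makes large (or, for RBF, infinite) $m$ affordable.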

Dealing with an Exponentially Large Output Space.
Training Problem 1 has $\ell \cdot |Y|$ constraints, where $Y$ is the output space and $\ell$ the number of training instances, and enumerating all of them is computationally prohibitive when $Y$ is large. Joachims et al. [2009] address this issue for structural SVMs through constraint generation: starting from an empty set of constraints, this technique iteratively adds a constraint that is maximally violated by the current solution until the violation is below a desired threshold $\epsilon$. Joachims et al. show that this happens after at most $O(C/\epsilon)$ iterations, each of which requires $O(\ell)$ time and memory with a linear kernel, or $O(\ell^2)$ with polynomial or RBF kernels.
However, this approach assumes the existence of an efficient separation oracle which, given a weight vector $w$, an input $x^k$, and a target $y^k$, finds an output $\hat{y} \in \arg\max_{y \in Y} \big(f_w(x^k, y) + L(y^k, y)\big)$. The subproblem solved by this separation oracle is referred to as the separation problem. Whether such an oracle exists for multi-minded combinatorial auctions remains an open question; see Section 5.1.3 for additional discussion.
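A brute-force separation oracle for a combinatorial output space, of the inefficient kind that simply iterates over all bundles: it enumerates all $2^r$ subsets of $r$ items, so its cost is exponential in $r$. The sketch, with hypothetical discriminant values and illustrative function names, returns the loss-augmented maximizer together with its hinge value $L(y^k, \hat{y}) - (f_w(x^k, y^k) - f_w(x^k, \hat{y}))$, and applies the constraint-generation test of adding $\hat{y}$ only if that value exceeds the current slack by more than $\epsilon$.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator, Tuple

Bundle = FrozenSet[int]


def all_bundles(r: int) -> Iterator[Bundle]:
    """All 2^r subsets of the items {1, ..., r}."""
    items = range(1, r + 1)
    for size in range(r + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def separation_oracle(f: Callable[[Bundle], float], y_true: Bundle, r: int,
                      loss: Callable[[Bundle, Bundle], float]) -> Tuple[Bundle, float]:
    """Loss-augmented argmax y_hat of f(y) + L(y_true, y), plus its hinge value."""
    y_hat = max(all_bundles(r), key=lambda y: f(y) + loss(y_true, y))
    return y_hat, loss(y_true, y_hat) - (f(y_true) - f(y_hat))


def needs_new_constraint(violation: float, xi_k: float, eps: float) -> bool:
    """Constraint generation adds y_hat only if it is violated by more than eps."""
    return violation > xi_k + eps


def zero_one(a: Bundle, b: Bundle) -> float:
    return 0.0 if a == b else 1.0


# Hypothetical discriminant values f_w(x^k, y) for r = 2 items.
f_vals = {frozenset(): 0.0, frozenset({1}): 1.0, frozenset({2}): 0.5, frozenset({1, 2}): 1.2}
y_hat, viol = separation_oracle(f_vals.__getitem__, frozenset({1, 2}), 2, zero_one)
print(sorted(y_hat), round(viol, 6), needs_new_constraint(viol, xi_k=0.0, eps=0.01))  # [1] 0.8 True
```

Here the correct bundle $\{1, 2\}$ beats $\{1\}$ by only 0.2 in discriminant value, less than the unit loss, so the constraint for $\{1\}$ is violated by 0.8 and would be added to the working set.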
Sometimes the separation problem can be written as a polynomially sized linear program of a particular form. We will see this in the context of succinct, graphical representations of agent valuations in the combinatorial auction domain. In this case, we can modify Training Problem 1 so that constraint generation is not needed, even when the output space is exponential in the problem size [Taskar et al. 2004]. Indeed, adopting the approach of Taskar et al. [2004], suppose that we can write $\max_{y \in Y} \big(f_w(x^k, y) + L(y^k, y)\big)$ as a linear program of the form
$$\max_{\mu \geq 0} \ (B^T w + c)^T \mu \quad \text{s.t.} \quad A \mu \leq b,$$
where $A$, $B$, $b$ are functions of $x^k$, and the vector $c$ encodes the loss $L(y^k, \cdot)$. Assuming that this program is feasible and bounded, its dual linear program attains the same objective value:
$$\min_{z \geq 0} \ b^T z \quad \text{s.t.} \quad A^T z \geq B^T w + c.$$
In this case, we can rewrite Training Problem 1 by replacing the many constraints for a single training example with a single constraint that uses a max function:
$$\min_{w,\, \xi \geq 0} \ \frac{1}{2} w^T w + \frac{C}{\ell} \sum_{k=1}^{\ell} \xi_k \quad \text{s.t.} \quad w^T \psi(x^k, y^k) + \xi_k \geq \max_{y \in Y} \big(f_w(x^k, y) + L(y^k, y)\big) \quad \text{for all } k.$$
We can now apply the LP formulation for finding the maximum value of $f_w(x^k, y) + L(y^k, y)$, and, by LP duality, replace the max linear program with the min linear program. Writing $A_k$, $B_k$, $b_k$, $c_k$ for the LP data of the $k$th example, the constraints become
$$w^T \psi(x^k, y^k) + \xi_k \geq \min_{z_k \geq 0:\ A_k^T z_k \geq B_k^T w + c_k} b_k^T z_k \quad \text{for all } k.$$
We can now drop the min on the right-hand side and add the constraints under the min directly to the program. This is valid because $z_k$ occurs only on the right-hand side of these constraints (and in its own feasibility constraints): the constraint with the min is satisfied if and only if some feasible $z_k$ satisfies it. As a result, even without explicitly minimizing, the optimization will choose a value of $z_k$ that allows for the most flexibility on the left-hand side of these constraints.
We therefore have a single, succinct primal convex program, even though the number of original constraints was exponentially large:
$$\min_{w,\, \xi \geq 0,\, z \geq 0} \ \frac{1}{2} w^T w + \frac{C}{\ell} \sum_{k=1}^{\ell} \xi_k \quad \text{s.t.} \quad w^T \psi(x^k, y^k) + \xi_k \geq b_k^T z_k, \quad A_k^T z_k \geq B_k^T w + c_k \quad \text{for all } k.$$
We apply these ideas in Section 5.2 to combinatorial auctions where agents have succinct, graph-based value representations. This allows us to have a scalable training problem even though the winner determination problem remains NP-hard.
Though we work directly with the features $\psi(x^k, y^k)$ in our experiments, it is still possible to use kernels in conjunction with the succinct formulation of the convex program. This would require working with the dual of the succinct primal convex program.

Required Information.
In summary, the use of structural SVMs requires specification of the following.
(1) The input space $X$, the discrete output space $Y$, and examples of input-output pairs.
(2) An attribute map $\chi : X \times Y \to \mathbb{R}^s$. This function generates an attribute vector that combines the input and output data into a single object.
(3) A kernel function $K(q, q')$, typically chosen from a well-known set of candidates, for instance, polynomial or RBF. The kernel implicitly computes the inner product $\langle \phi(q), \phi(q') \rangle$ of two attribute vectors after they have been mapped into a high-dimensional space.
(4) If the space $Y$ is prohibitively large, either a routine that allows for efficient separation, i.e., a function that computes $\arg\max_{y \in Y} \big(f_w(x, y) + L(y^k, y)\big)$ for a given $w$, input $x$, and target $y^k$, or a compact representation of the separation problem, enabling a succinct formulation of the training problem as a convex optimization problem.
In addition, the user needs to stipulate particular training parameters, such as the regularization parameter C, and the kernel parameter γ if the RBF kernel is being used.

Structural SVMs for Mechanism Design
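We now specialize structural SVMs such that the learned discriminant function provides a payment rule, for a given agent symmetric outcome rule $g$ and type distribution $D$. In this application, the input domain $X$ is the space of type profiles $\Theta$, and the output domain $Y$ is the space $\Omega_1$ of outcomes for agent 1.
We construct training data by sampling $\theta \sim D$ and applying $g$ to these inputs, giving examples $\{(\theta^1, o_1^1), \ldots, (\theta^\ell, o_1^\ell)\}$ with $o_1^k = g_1(\theta^k)$. The discriminant function must be admissible, that is, of the form $f_w(\theta, o_1) = w_1 v_1(\theta_1, o_1) + w_{-1}^T \psi(\theta_{-1}, o_1)$, so that agent 1's own type enters only through its value. For this reason, we must use an attribute map $\chi : \Theta_{-1} \times \Omega_1 \to \mathbb{R}^s$ rather than $\chi : \Theta \times \Omega_1 \to \mathbb{R}^s$, and the map $\phi$ (and hence the kernel) that we specify is only applied to the output of $\chi$. This results in the following, more specialized training problem (Training Problem 2):
$$\min_{w,\, \xi \geq 0} \ \frac{1}{2} w^T w + \frac{C}{\ell} \sum_{k=1}^{\ell} \xi_k \quad \text{s.t.} \quad w_1\big(v_1(\theta_1^k, o_1^k) - v_1(\theta_1^k, o_1)\big) + w_{-1}^T\big(\psi(\theta_{-1}^k, o_1^k) - \psi(\theta_{-1}^k, o_1)\big) \geq L(o_1^k, o_1) - \xi_k \quad \text{for all } k,\ o_1 \in \Omega_1.$$
If $w_1 > 0$, then the weights $w$ together with the feature map $\psi$ define a price function $t_w(\theta_{-1}, o_1) = -w_{-1}^T \psi(\theta_{-1}, o_1) / w_1$ that can be used to define payments $p_w(\theta)$, as described in Section 3.1. In this case, we can also relate the regret in the induced mechanism $(g, p_w)$ to the classification error, as described in Section 3.3.
THEOREM 4.1. Consider training data $(\theta^1, o_1^1), \ldots, (\theta^\ell, o_1^\ell)$. Let $g$ be an outcome rule such that $g_1(\theta^k) = o_1^k$ for all $k$. Let $w$ and $\xi_k$ be the weight vector and slack variables output by Training Problem 2, with $w_1 > 0$, and consider the corresponding mechanism $(g, p_w)$. Then, for each type profile $\theta^k$ in the training data, the regret $\mathrm{rgt}_1(\theta^k)$ of agent 1 for bidding truthfully satisfies $\mathrm{rgt}_1(\theta^k) \leq \xi_k / w_1$.
PROOF. Consider input $\theta^k$. The constraints in the training problem impose that, for every outcome $o_1 \in \Omega_1$,
$$f_w(\theta^k, o_1) - f_w(\theta^k, o_1^k) \leq \xi_k - L(o_1^k, o_1).$$
This inequality holds for every $o_1 \in \Omega_1$, and in particular for every $o_1 \in \Omega_1(\theta_{-1}^k) \subseteq \Omega_1$, so
$$\mathrm{rgt}_1(\theta^k) = \frac{1}{w_1} \max_{o_1 \in \Omega_1(\theta_{-1}^k)} \big(f_w(\theta^k, o_1) - f_w(\theta^k, o_1^k)\big) \leq \frac{1}{w_1} \max_{o_1 \in \Omega_1} \big(\xi_k - L(o_1^k, o_1)\big) \leq \frac{\xi_k}{w_1},$$
where the equality follows from Lemma 3.3 (using $g_1(\theta^k) = o_1^k$), and the last inequality holds because $L(o_1^k, o_1) \geq 0$. This completes the proof.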
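Theorem 4.1 can be checked on a single-item example with three agents, assuming the efficient outcome rule (ties broken against agent 1) and the one-dimensional feature $\psi((\theta_2, \theta_3), o_1) = \max(\theta_2, \theta_3)$ for $o_1 = 0$ and 0 for $o_1 = 1$. For a fixed $w$, the sketch computes the smallest feasible slack under 0/1 loss and the regret from Lemma 3.3 (both outcomes are achievable), and asserts $\mathrm{rgt}_1(\theta^k) \leq \xi_k / w_1$. The weights are chosen by hand rather than by training, and the profiles and names are hypothetical.

```python
from typing import List, Tuple

Profile = Tuple[float, float, float]


def outcome_rule(theta: Profile) -> int:
    """Efficient single-item rule for agent 1: 1 if agent 1 wins (ties go to others), else 0."""
    return 1 if theta[0] > max(theta[1], theta[2]) else 0


def f_w(theta: Profile, o1: int, w1: float, w2: float) -> float:
    """Admissible discriminant w_1 v_1(theta_1, o_1) + w_-1 psi(theta_-1, o_1) with the assumed psi."""
    value = theta[0] if o1 == 1 else 0.0
    psi = max(theta[1], theta[2]) if o1 == 0 else 0.0
    return w1 * value + w2 * psi


def min_slack(theta: Profile, w1: float, w2: float) -> float:
    """Smallest xi_k satisfying the Training Problem 2 constraints under 0/1 loss."""
    ok = outcome_rule(theta)
    return max(0.0, max((0.0 if o == ok else 1.0) - (f_w(theta, ok, w1, w2) - f_w(theta, o, w1, w2))
                        for o in (0, 1)))


def regret(theta: Profile, w1: float, w2: float) -> float:
    """Lemma 3.3 with both outcomes achievable (consumer sovereignty)."""
    ok = outcome_rule(theta)
    return (max(f_w(theta, o, w1, w2) for o in (0, 1)) - f_w(theta, ok, w1, w2)) / w1


profiles: List[Profile] = [(5.0, 7.0, 3.0), (8.0, 2.0, 6.0), (6.0, 7.0, 1.0), (3.0, 4.0, 4.0)]
for w2 in (1.0, 0.8):  # w2 = 1 gives the exact second-price classifier; 0.8 shades the price
    for theta in profiles:
        rgt, xi = regret(theta, 1.0, w2), min_slack(theta, 1.0, w2)
        assert rgt <= xi / 1.0 + 1e-12  # Theorem 4.1 with w_1 = 1
        print(w2, theta, round(rgt, 3), round(xi, 3))
```

For $w = (1, 0.8)$ and $\theta = (6, 7, 1)$, the shaded price $0.8 \cdot 7 = 5.6$ is below agent 1's value of 6, so losing the item costs regret 0.4, comfortably within the slack of 1.4.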
We choose not to enforce $w_1 > 0$ explicitly in Training Problem 2, because adding this constraint leads to a dual problem that references $\psi$ outside of an inner product, and thus makes computation with all but linear or low-dimensional polynomial kernels prohibitively expensive. Instead, in our experiments we simply discard hypotheses where training results in $w_1 \leq 0$. This is sensible since the discriminant function value should increase as an agent's value increases, and negative values of $w_1$ typically mean that the training parameter $C$ or the kernel parameter $\gamma$ (if the RBF kernel is used) is poorly chosen.
Looking forward to our experiments, this requirement of positive $w_1$ did not present a practical concern. For example, for multi-minded combinatorial auctions, 1049 of 1080 trials (more than 97%) had positive $w_1$ for the trained classifier, and for the egalitarian assignment problem all of the trained classifiers had $w_1 > 0$.

Payment Normalization.
One issue with the framework as stated is that the payments $p_w$ computed from the solution to Training Problem 2 could be negative. We solve this problem by normalizing payments, using a baseline outcome $o_b$. If there exists a null outcome $o_{\text{null}}$ such that $v_1(\theta_1, o_{\text{null}}) = 0$ for every $\theta_1$, then this outcome provides the baseline. Otherwise, we adopt as the baseline the outcome $o_b \in \arg\min_{o_1 \in \Omega_1} t_w(\theta_{-1}, o_1)$ with the lowest price to agent 1 for the given types of the other agents, where $t_w(\theta_{-1}, o_1)$ is the price function corresponding to the solution $w$ to Training Problem 2. Adopting the baseline outcome $o_b$, the normalized prices $\bar{t}_w(\theta_{-1}, o_1)$ are defined as
$$\bar{t}_w(\theta_{-1}, o_1) = t_w(\theta_{-1}, o_1) - t_w(\theta_{-1}, o_b).$$
Even when the baseline outcome is defined as the one with the lowest price, it is still only a function of the types $\theta_{-1}$ of the other agents, and so the prices $\bar{t}_w$ remain a function of $\theta_{-1}$ and $o_1$ and are still agent independent.
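A minimal sketch of payment normalization for a fixed $\theta_{-1}$, assuming the raw prices $t_w(\theta_{-1}, o_1)$ have already been computed for every $o_1$ and are stored in a dictionary; the function name and outcome labels are hypothetical.

```python
from typing import Dict, Hashable, Optional

Outcome = Hashable


def normalize_prices(prices: Dict[Outcome, float],
                     null_outcome: Optional[Outcome] = None) -> Dict[Outcome, float]:
    """Subtract the baseline price t_w(theta_-1, o_b) from every price (theta_-1 fixed).
    The baseline is the null outcome if one is given, otherwise the cheapest outcome."""
    baseline = null_outcome if null_outcome is not None else min(prices, key=prices.get)
    return {o: p - prices[baseline] for o, p in prices.items()}


# Single item: raw prices derived from a discriminant, with null outcome "none".
print(normalize_prices({"none": -7.0, "item": 0.0}, null_outcome="none"))  # {'none': 0.0, 'item': 7.0}
# No null outcome: the cheapest bundle ("B") becomes the baseline.
print(normalize_prices({"A": 2.0, "B": -1.0, "AB": 3.0}))  # {'A': 3.0, 'B': 0.0, 'AB': 4.0}
```

The first call recovers a second-price payment for the single-item case: the raw prices $-7$ and 0 become 0 for the null outcome and 7 for the item.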

Individual Rationality Violation.
Even after normalization, the learned payment rule p w may not satisfy individual rationality (IR). Recall that this requires that an agent's payment is no greater than its reported value for the outcome. We offer three solutions to this problem, which can also be used in combination.
Payment Offsets. One way to reduce IR violations is to make an additional, uniform downward adjustment to prices across all type reports. In particular, for a given offset $\mathit{off} > 0$ and normalized prices $\bar{t}_w$, we obtain the final prices $\tilde{t}_w(\theta_{-1}, o_1) = \max\big(0, \bar{t}_w(\theta_{-1}, o_1) - \mathit{off}\big)$. The effect is to leave the price of the baseline outcome unchanged (since its price was already normalized to zero), while applying the offset where possible to other outcomes.
Although the use of a payment offset decreases IR violations, it might increase regret because of the nonlinearity introduced by taking the max with zero. For instance, suppose there are only two outcomes $o_{11}$ and $o_{12}$, where $o_{12}$ is the null outcome. Suppose agent 1 values $o_{11}$ at 5 and receives the null outcome if he reports truthfully. Suppose further that the normalized prices are 7 for $o_{11}$ and 0 for the null outcome. With no payment offset, the agent experiences no regret, since he receives utility 0 from the null outcome but would receive negative utility ($5 - 7 = -2$) from $o_{11}$. However, if the payment offset is greater than 2, the price of $o_{11}$ falls below 5 and the agent's regret becomes positive (assuming consumer sovereignty), because he could have reported differently, received $o_{11}$, and obtained positive utility.
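The two-outcome example above can be reproduced in a few lines. This sketch applies the offset $\max(0, \bar{t}_w - \mathit{off})$ to the normalized prices and computes agent 1's regret when the outcome rule assigns the null outcome, assuming consumer sovereignty so that both outcomes are achievable; the names are illustrative.

```python
from typing import Dict


def offset_prices(normalized: Dict[str, float], off: float) -> Dict[str, float]:
    """Final prices max(0, normalized price - off); the baseline (price 0) is unchanged."""
    return {o: max(0.0, p - off) for o, p in normalized.items()}


def regret(values: Dict[str, float], prices: Dict[str, float], chosen: str) -> float:
    """Utility of the best outcome minus utility of the chosen outcome (all outcomes achievable)."""
    util = {o: values[o] - prices[o] for o in values}
    return max(util.values()) - util[chosen]


values = {"o11": 5.0, "null": 0.0}      # agent 1 values o11 at 5
normalized = {"o11": 7.0, "null": 0.0}  # normalized prices from the example
for off in (0.0, 1.5, 3.0):
    print(off, regret(values, offset_prices(normalized, off), chosen="null"))
# prints "0.0 0.0", "1.5 0.0", "3.0 1.0"
```

Regret stays at zero while the offset is at most 2 and becomes positive beyond that (1.0 for an offset of 3), even though the same offset would have reduced any IR violation on $o_{11}$.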
Adjusting the Loss Function L. We incur an IR violation when there is a null outcome $o_{\text{null}}$ (for example, allocating no items to an agent in a combinatorial auction) such that $g_1(\theta) \neq o_{\text{null}}$ and $f_w(\theta, o_{\text{null}}) > f_w(\theta, g_1(\theta))$ for some type profile $\theta$; that is, the discriminant value of the null outcome is greater than that of the outcome actually selected by the outcome rule. This is an IR violation because the discriminant $f_w(\theta, o_1)$ is a scaled version of the agent's utility for outcome $o_1$ under payments $p_w$. If the utility for the null outcome is greater than the utility for $g_1(\theta)$, and the payment for the null outcome is zero, then the payment $t_w(\theta_{-1}, g_1(\theta))$ must exceed $v_1(\theta_1, g_1(\theta))$ (so that the discriminant value $f_w(\theta, g_1(\theta))$ is negative), causing an IR violation.
Recognizing this, we can discourage these types of errors by modifying the constraints of Training Problem 2 that involve the null outcome, using a larger loss $L(o_1^k, o_{\text{null}})$ whenever $o_1^k \neq o_{\text{null}}$, so that the correct outcome is separated from the null outcome by a wider margin.
Deallocation. In settings that have a null outcome and are downward closed (i.e., settings where a feasible outcome $o$ remains feasible if $o_i$ is replaced with the null outcome), we can also choose to modify the function $g$ to allocate the null outcome whenever the price function $t_w$ creates an IR violation. This reduces ex post regret and, in particular, ensures ex post IR for all instances. On the other hand, the total value to the agents can only decrease under the modified allocation, and we begin to deviate from the intended outcome rule. In our experimental results, we refer to this as the deallocation fix.
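A sketch of the deallocation fix, assuming a downward-closed setting, prices that are already normalized so that the null outcome costs 0, and per-agent dictionaries of reported values and prices; the function name and the two-agent instance are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Agent = int
Outcome = Hashable


def deallocation_fix(allocation: Dict[Agent, Outcome],
                     values: Dict[Agent, Dict[Outcome, float]],
                     prices: Dict[Agent, Dict[Outcome, float]],
                     null: Outcome) -> Tuple[Dict[Agent, Outcome], Dict[Agent, float]]:
    """Give the null outcome (at price 0) to every agent whose price exceeds its reported value.
    Only valid in downward-closed settings, where the modified allocation stays feasible."""
    new_alloc: Dict[Agent, Outcome] = {}
    payments: Dict[Agent, float] = {}
    for i, o in allocation.items():
        if o != null and prices[i][o] > values[i][o]:
            new_alloc[i], payments[i] = null, 0.0  # ex post IR restored for agent i
        else:
            new_alloc[i], payments[i] = o, (0.0 if o == null else prices[i][o])
    return new_alloc, payments


alloc = {1: "A", 2: "B"}
vals = {1: {"A": 4.0}, 2: {"B": 6.0}}
prices = {1: {"A": 5.0}, 2: {"B": 3.0}}
print(deallocation_fix(alloc, vals, prices, null="none"))
# ({1: 'none', 2: 'B'}, {1: 0.0, 2: 3.0})
```

Agent 1's price of 5 exceeds its value of 4, so it is moved to the null outcome and pays nothing, while agent 2 keeps its bundle. Total value falls from 10 to 6, illustrating the deviation from the intended outcome rule.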

APPLYING THE FRAMEWORK
In this section, we discuss the application of our framework to three domains: multi-minded combinatorial auctions, combinatorial auctions with k-wise dependent valuations, and egalitarian welfare in the assignment problem.

Multi-Minded Combinatorial Auctions
A combinatorial auction allocates items $\{1, \ldots, r\}$ among $n$ agents, such that each agent receives a possibly empty subset of the items. The outcome space $\Omega_i$ for agent $i$ is the set of all subsets of the $r$ items, and the type of agent $i$ can be represented by a vector $\theta_i \in \Theta_i = \mathbb{R}^{2^r}$ that specifies its value for each possible bundle. The set of possible type profiles is then $\Theta = \mathbb{R}^{2^r n}$, and the value $v_i(\theta_i, o_i)$ of agent $i$ for bundle $o_i$ is equal to the entry in $\theta_i$ corresponding to $o_i$.
We require that valuations are monotone, such that $v_i(\theta_i, o_i) \leq v_i(\theta_i, o'_i)$ for all $\theta_i \in \Theta_i$ and all bundles $o_i \subseteq o'_i$. Assuming agent symmetry and adopting the view of agent 1, the partial outcome rule $g_1 : \Theta \to \Omega_1$ specifies the bundle $g_1(\theta)$ allocated to agent 1. We require feasibility of outcome rules, so that no item is allocated more than once.
In a multi-minded CA, each agent is interested in at most $\kappa$ bundles, for some constant $\kappa$. The special case $\kappa = 1$ is the well-studied problem of single-minded CAs. We study multi-minded rather than single-minded CAs because they provide an example for which truthful, algorithmic mechanism design is not well understood. Among multiparameter mechanism design problems, we choose multi-minded CAs because the valuation profiles, and thus the training data, can be represented compactly, by explicitly writing down each agent's values for the bundles in which it is interested. In addition, the inner products between valuation profiles, which are required to apply the kernel trick, can be computed in constant time.

Attribute Maps.
To apply structural SVMs to multi-minded CAs, we need to specify an appropriate attribute map χ . The approach that we take in choosing an attribute map is to recognize that the attributes should expose to the classifier enough information about the valuations of agents 2 through n to allow the discriminant function to accurately rank the different bundles that could be allocated to agent 1. In particular, the classifier is accurate when the discriminant function assigns the highest score to the bundle that is allocated to agent 1 by the outcome rule.
Conceptually, we find it helpful in this exercise in feature engineering to view an outcome rule as maximizing some objective function $\hat{f}(\theta, o)$, for type profile $\theta \in \Theta$ and outcome $o \in \Omega$, where this objective function might approximate social welfare or max-min value. Given this, and the structure of discriminant function (3), the attribute map should expose attributes that allow for accurate estimation of the optimal objective value when the outcome is restricted to assign bundle $o_1$ to agent 1 (see footnote 3). In this sense, a good attribute map represents, perhaps in summary form, the valuation functions of the other agents given that bundle $o_1$ has been assigned to agent 1.
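With this in mind, we adopt two simple attribute maps in our experiments, $\chi_1 : \Theta_{-1} \times \Omega_1 \to \mathbb{R}^{2^r \cdot 2^r (n-1)}$ and $\chi_2 : \Theta_{-1} \times \Omega_1 \to \mathbb{R}^{2^r (n-1)}$, defined as follows:
$$\chi_1(\theta_{-1}, o_1) = \big(\underbrace{0, \ldots, 0}_{\mathrm{dec}(o_1) \cdot 2^r (n-1)},\ \theta_{-1},\ \underbrace{0, \ldots, 0}_{(2^r - \mathrm{dec}(o_1) - 1) \cdot 2^r (n-1)}\big), \qquad \chi_2(\theta_{-1}, o_1) = \big(\theta_2 \setminus o_1,\ \ldots,\ \theta_n \setminus o_1\big),$$
where $\mathrm{dec}(o_1) = \sum_{j=1}^{r} 2^{j-1} I_{j \in o_1}$ is a decimal index of bundle $o_1$, with $I_{j \in o_1} = 1$ if $j \in o_1$ and $I_{j \in o_1} = 0$ otherwise, and $\theta_i \setminus o_1$ denotes the valuation function obtained by setting the value of all bundles that are nondisjoint with $o_1$ to zero.
Footnote 3: In the single-item example in Section 3.2, we could have obtained an exact classifier by setting $w_{-1}^T \psi((\theta_2, \theta_3), o_1) = 0$ if $o_1 = 1$ and $\max(\theta_2, \theta_3)$ if $o_1 = 0$, and then obtaining the second-price payment rule by normalizing the payment rule as described in Section 4.2.1. In this way, the discriminant function $f_w(\theta, o_1)$ would be exactly the objective value of the optimal assignment that is constrained to assign $o_1$ to agent 1.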
Attribute map $\chi_1$ stacks the vector $\theta_{-1}$ (with $2^r(n-1)$ entries), which represents the valuations of all agents except agent 1, with zero vectors of the same dimension, where the position of $\theta_{-1}$ is determined by the index of bundle $o_1$.
We view attribute map $\chi_1$ as providing a baseline, since no effort is made to encode the effect of assigning bundle $o_1$ to agent 1 on the valuations of the other agents. Rather, the choice of bundle $o_1$ is encoded only indirectly, through the position of valuations $\theta_{-1}$ in the attribute vector. An undesirable side effect is that two training instances in which bundle $o_1$ differs only slightly will have completely distinct sets of nonzero attributes. We might expect this to reduce the generalization power of the classifier.
In comparison, attribute map $\chi_2$ is designed to encode very explicitly the effect of assigning bundle $o_1$ to agent 1 on the valuations of the other agents. In particular, this attribute vector stacks vectors $\theta_i \setminus o_1$, which are obtained from valuation type $\theta_i$ by setting the entries for all bundles that share one or more items with bundle $o_1$ to zero. This captures the fact that another agent cannot be allocated a bundle that intersects with $o_1$.
Both $\chi_1$ and $\chi_2$ are defined for a particular number of items and agents, and in our experiments we train a different classifier for each number of agents and items. An attractive alternative in practice would be to pad out items and agents by setting bids to zero, allowing a single classifier to be trained.
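A sketch of the two attribute maps, assuming each valuation $\theta_i$ is stored as a dense list of $2^r$ values whose entry $s$ is the value of the bundle with index $\mathrm{dec}(\cdot) = s$ (items numbered from 1). Dense storage is an assumption made for clarity; multi-minded bids are sparse, and a practical implementation would exploit that. The function names and valuations are illustrative.

```python
from typing import Iterable, List, Sequence


def dec(bundle: Iterable[int]) -> int:
    """dec(o_1) = sum_j 2^(j-1) * I[j in o_1], for items numbered 1..r."""
    return sum(2 ** (j - 1) for j in set(bundle))


def chi1(theta_rest: Sequence[Sequence[float]], bundle: Iterable[int], r: int) -> List[float]:
    """Place theta_-1 (length 2^r (n-1)) in block dec(o_1) of 2^r zero-padded blocks."""
    flat = [v for theta_i in theta_rest for v in theta_i]
    out = [0.0] * (2 ** r * len(flat))
    start = dec(bundle) * len(flat)
    out[start:start + len(flat)] = flat
    return out


def chi2(theta_rest: Sequence[Sequence[float]], bundle: Iterable[int]) -> List[float]:
    """Stack each other agent's valuation with o_1 removed: zero every bundle that shares
    an item with o_1 (entry s of theta_i is the value of the bundle with index s)."""
    mask = dec(bundle)
    return [0.0 if s & mask else v for theta_i in theta_rest for s, v in enumerate(theta_i)]


# r = 2 items, n = 3 agents; entries are values for {}, {1}, {2}, {1, 2}.
theta_2 = [0.0, 3.0, 1.0, 5.0]
theta_3 = [0.0, 2.0, 4.0, 6.0]
print(chi2([theta_2, theta_3], {1}))  # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 4.0, 0.0]
c1 = chi1([theta_2, theta_3], {1}, r=2)
print(len(c1), c1[8:16])  # 32 [0.0, 3.0, 1.0, 5.0, 0.0, 2.0, 4.0, 6.0]
```

For $o_1 = \{1\}$, $\chi_2$ zeroes the entries for $\{1\}$ and $\{1, 2\}$ in each other agent's valuation, while $\chi_1$ copies $\theta_{-1}$ unchanged into block $\mathrm{dec}(\{1\}) = 1$ of a 32-entry vector.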

Efficient Computation of Inner Products.
Efficient computation of inner products is possible for both $\chi_1$ and $\chi_2$. A full discussion of the approach that we take for this is provided in Appendix A.

Dealing with an Exponentially Large Output Space.
Recall that Training Problems 1 and 2 have constraints for every training example $(\theta^k, o_1^k)$ and every possible bundle of items $o_1 \in \Omega_1$. For CAs, there are exponentially many such bundles. In lieu of an efficient separation oracle, a workaround exists when the discriminant function ensures that the induced prices weakly increase as items are added to a bundle. Given this property of item monotonicity, it suffices to include constraints for bundles that have a strictly larger value to the agent than any of their respective subsets. Coupled with the assumption that valuations in CAs are monotone, and the admissibility of the discriminant function, no other bundle can have a greater discriminant value than these bundles.
But we are not able to impose item monotonicity directly on the training problem with a small number of constraints (see footnote 4). For this reason, the baseline experimental results in Section 6 do not assume item monotonicity, and instead use an inefficient separation oracle, which simply iterates over all possible bundles $o_1 \in \Omega_1$.
An alternative that we have also studied is to optimistically assume item monotonicity, and to include only the constraints associated with bundles that are explicit in agent valuations. We also present experimental results that test this optimistic approach; while there is a degradation in performance, the results are mostly comparable. This provides a useful approach to scaling up training for representation languages, such as the XOR representation adopted for multi-minded CAs, for which it is simple to identify the small set of bundles that are candidates for maximizing the discriminant function (i.e., agent utility).
Footnote 4: For polynomial kernels and certain attribute maps, a possible sufficient condition for item monotonicity is to force the weights $w_{-1}$ to be negative. However, as with enforcing $w_1 > 0$ directly, these weight constraints do not dualize conveniently, and they result in a dual formulation that no longer operates only on inner products $\langle \psi(\theta_{-1}, o_1), \psi(\theta'_{-1}, o'_1) \rangle$. As a result, we would be forced to work in the primal and incur extra computational overhead that increases polynomially with the kernel degree $d$. We have performed some preliminary experiments with polynomial kernels, but we have not looked into reformulating the primal to enforce item monotonicity.
Fig. 1. An example of a 2-wise dependent valuation. The values listed in the nodes give the agent's weights for the corresponding items. Each item has some small value on its own, but complementarities exist between pairs of items that give added utility to the agent. Note that while this graph is complete, this is not necessary. Absent edges are assumed to have weight 0.

Combinatorial Auctions with Positive k-wise Dependent Valuations
We also study combinatorial auctions where agents have positive k-wise dependent valuations [Conitzer et al. 2005]. This setting allows us to apply the ideas discussed in Section 4.1.2 to attain a polynomial-time training formulation despite the exponential size of $\Omega_1$.
When an agent has a k-wise dependent valuation, the agent's valuation is described by a hypergraph $G = (R, E)$, where $R$ is a set of nodes and $E$ is a set of hyperedges of size at most $k$. The nodes in the graph correspond to the items being auctioned (which is why we use $j \in R$ to refer to both nodes and items), and the hyperedges to groups of these items. These nodes and hyperedges are assigned weights $z_1(j)$ and $z_1(e)$, respectively. An agent's value for a subset of items $o_1 \in \Omega_1$ is the sum of the weights of the nodes and hyperedges contained in $o_1$, that is, $\sum_{j \in R,\, j \in o_1} z_1(j) + \sum_{e \in E,\, e \subseteq o_1} z_1(e)$. Figure 1 gives a pictorial view of a simple 2-wise dependent valuation over 3 items.
A positive k-wise dependent valuation adds the restriction that hyperedge weights are positive. This restriction is required for our results, and is also studied by Abraham et al. [2012]. It forces agent valuations to be supermodular, that is, $v_1(\theta_1, o \cup o') + v_1(\theta_1, o \cap o') \geq v_1(\theta_1, o) + v_1(\theta_1, o')$ for all sets of items $o, o' \in \Omega_1$. When we have multiple agents, we use $z_i(j)$ and $z_i(e)$ to denote the weights that agent $i$ assigns to node $j$ and edge $e$. For convenience, $z_i(e)$ is defined to be 0 for an edge not in the agent's edge set. If we are given the agent's type $\theta_i$, then it can be convenient to write $z_i(\theta_i, j)$ or $z_i(\theta_i, e)$ for the weights in the agent's underlying graph when its valuation function is $\theta_i$. Though these valuations are very different from the multi-minded valuations we discussed earlier, the winner determination problem for positive k-wise dependent valuations is still NP-hard when $k = 2$, and hence for any $k > 1$ [Conitzer et al. 2005]. Because the winner determination problem is NP-hard, we seek to learn a payment rule for a greedy allocation algorithm.
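The value of a k-wise dependent valuation, and the supermodularity implied by positive hyperedge weights, can be checked by brute force on small instances. The weights below are hypothetical, the names are illustrative, and the exhaustive check is exponential in the number of items, so it serves only as a sanity check.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, ...]


def kwise_value(node_w: Dict[int, float], edge_w: Dict[Edge, float], bundle: Iterable[int]) -> float:
    """Sum of node weights for items in the bundle plus weights of hyperedges contained in it."""
    b = set(bundle)
    return (sum(w for j, w in node_w.items() if j in b)
            + sum(w for e, w in edge_w.items() if set(e) <= b))


def is_supermodular(node_w: Dict[int, float], edge_w: Dict[Edge, float], items: List[int]) -> bool:
    """Brute-force check of v(o | o') + v(o & o') >= v(o) + v(o') over all pairs of bundles."""
    bundles = [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]
    v = {s: kwise_value(node_w, edge_w, s) for s in bundles}
    return all(v[a | b] + v[a & b] >= v[a] + v[b] - 1e-9 for a in bundles for b in bundles)


items = [1, 2, 3]
nodes = {1: 1.0, 2: 1.0, 3: 2.0}                 # hypothetical node weights
edges = {(1, 2): 3.0, (1, 3): 2.0, (2, 3): 4.0}  # positive 2-wise dependent
print(kwise_value(nodes, edges, {1, 2}), is_supermodular(nodes, edges, items))  # 5.0 True
print(is_supermodular(nodes, {**edges, (1, 2): -3.0}, items))                   # False
```

Making the weight of edge (1, 2) negative breaks supermodularity: with $o = \{1\}$ and $o' = \{2\}$, the left-hand side is $-1$ while the right-hand side is 2.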
Going forward, we specialize to the case of k = 2, which represents the case where the agent's hypergraph is just a graph. For this case, it is possible to make Training Problem 1 tractable by carefully choosing the attribute map. We discuss extensions to k > 2 at the end of Section 5.2.3.

A Greedy Algorithm.
We first introduce a simple greedy algorithm, which tries to find an allocation with good welfare. We use this greedy algorithm both in defining the attribute map and as an outcome rule in our experimental results.
Let $R = \{1, \ldots, r\}$ denote the set of all items. Given some subset of items $R' \subseteq R$, the greedy algorithm orders the items by index and assigns them incrementally. At each step, the algorithm computes the gain in welfare of assigning the current item to each agent, and chooses the agent that provides the maximum gain. Note that if an item $j$ has been assigned to an agent $i$, then when considering the assignment of a later item $j'$, the gain in welfare of assigning it to agent $i$ includes agent $i$'s node weight for item $j'$ as well as agent $i$'s edge weight for edge $(j, j')$ (if the edge exists in the agent's valuation graph). We let $\mathrm{GREEDY}_i(R')$ denote agent $i$'s allocation when this greedy algorithm is run on $R'$.

A Concrete Example.
To clarify the construction, we introduce a simple example where agents have 2-wise dependent valuations. Consider a setting with 3 agents and 3 items. We denote both agents and items using indices 1, 2, 3; the association should be clear from context. The agents have the following 2-wise dependent valuations:
Applied to this example, the greedy algorithm first considers the assignment of item 1. Agent 3 has the highest value for it, so item 1 goes to agent 3. We then consider item 2. The gain from giving it to agent 1 is 4, the gain for agent 2 is 6, and the gain for agent 3 is $3 + 2 = 5$ (for agent 3, we add both $z_3(2)$ and $z_3(1, 2)$, since item 1 was given to agent 3). As a result, agent 2 has the highest gain and receives item 2. Then, for item 3, the gains are 2, $2 + 3 = 5$, and $1 + 7 = 8$ for agents 1, 2, and 3, respectively. As a result, item 3 is assigned to agent 3.
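A sketch of the greedy outcome rule for 2-wise dependent valuations. The node and edge weights for items 2 and 3 follow the walkthrough above, while the item-1 node weights (1, 2, and 5 for agents 1, 2, and 3) are hypothetical values chosen only so that agent 3 wins item 1; breaking ties toward the lowest agent index is also an assumption. Passing a subset of items computes $\mathrm{GREEDY}_i(R')$ for every agent at once.

```python
from typing import Dict, Iterable, Set, Tuple

NodeW = Dict[int, Dict[int, float]]              # agent -> item -> weight
EdgeW = Dict[int, Dict[Tuple[int, int], float]]  # agent -> (j1, j2) with j1 < j2 -> weight


def gain(i: int, j: int, alloc: Dict[int, Set[int]], node_w: NodeW, edge_w: EdgeW) -> float:
    """Welfare gain of giving item j to agent i: node weight plus edges to items i already holds."""
    return node_w[i].get(j, 0.0) + sum(
        edge_w[i].get((min(h, j), max(h, j)), 0.0) for h in alloc[i])


def greedy(items: Iterable[int], node_w: NodeW, edge_w: EdgeW) -> Dict[int, Set[int]]:
    """Assign items in index order, each to the agent with the largest gain
    (ties go to the lowest-indexed agent, an assumption)."""
    alloc: Dict[int, Set[int]] = {i: set() for i in node_w}
    for j in sorted(items):
        best = max(sorted(alloc), key=lambda i: gain(i, j, alloc, node_w, edge_w))
        alloc[best].add(j)
    return alloc


node_w = {1: {1: 1.0, 2: 4.0, 3: 2.0}, 2: {1: 2.0, 2: 6.0, 3: 2.0}, 3: {1: 5.0, 2: 3.0, 3: 1.0}}
edge_w = {1: {}, 2: {(2, 3): 3.0}, 3: {(1, 2): 2.0, (1, 3): 7.0}}
print(greedy([1, 2, 3], node_w, edge_w))  # {1: set(), 2: {2}, 3: {1, 3}}
```

Running it on $R' = \{2, 3\}$ instead gives both items to agent 2, since without item 1 agent 3 no longer benefits from its edge weights.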

Attribute Map.
Before defining our attribute map, we provide some intuition for our particular choice. Our goal is to apply the techniques described in Section 4.1.2, which require us to write max_{o′_1 ∈ Ω_1} f_w(θ, o′_1) + L(o_1, o′_1) as a linear program. Going forward, we assume that L(o_1, o′_1) is 0 everywhere to simplify the presentation.⁵ At first glance this looks difficult, since Ω_1 has size exponential in the number of items being allocated. It may be possible to write an integer program that solves this maximization, but we require a linear program to apply the ideas of Section 4.1.2. We rely on a result of Taskar et al. [2004] for Markov random fields which, when translated to our setting, shows that if a single agent has what we call a semipositive 2-wise dependent valuation, then it is possible to find the bundle of items that maximizes the agent's value by solving a linear program. Of course, if an agent's valuation is positive 2-wise dependent, then the value-maximizing bundle is always the set of all items. A semipositive 2-wise dependent valuation allows an agent to have a weight for not receiving an item, or for receiving neither item of a given pair, and the weights associated with receiving and not receiving a single item can be negative (though the weights for receiving both items in a pair, or receiving neither item in a pair, must be nonnegative). More concretely, a semipositive 2-wise dependent valuation can be written as

$$v(\theta_1, o_1) = \sum_{j \in o_1} z_1(j) + \sum_{j \notin o_1} \bar{z}_1(j) + \sum_{\{j_1, j_2\} \subseteq o_1} z_1(j_1, j_2) + \sum_{\{j_1, j_2\} \cap o_1 = \emptyset} \bar{z}_1(j_1, j_2),$$

where z_1(j) and z̄_1(j) can take on negative values, and z_1(j_1, j_2) and z̄_1(j_1, j_2) are nonnegative.
While z_1 is active when an item or a pair of items is included in o_1, z̄_1 is active when an item is not in o_1 or when neither item of a pair is in o_1. This flexibility allows for richer valuation functions than are permitted with positive 2-wise dependent valuations.
⁵ As will become clear in the following discussion, we only require that L(o_1, o′_1) be expressed as a sum of products, where each product consists of a weight multiplied by (i) an indicator of whether o′_1 contains a given subset of items, or (ii) an indicator of whether o′_1 does not intersect a given subset of items. As a result, we can adjust the null loss by using an indicator for o′_1 not intersecting the entire set of items.
If an agent's valuation is semipositive 2-wise dependent, then even though Ω_1 is of exponential size in the number of items, there is a polynomially sized linear program that finds the bundle o_1 that maximizes v(θ_1, o_1).
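To make the definition concrete, the sketch below evaluates a semipositive 2-wise dependent valuation (writing z̄ as `zbar`) and finds the value-maximizing bundle by brute force. The search enumerates all 2^r bundles, so it is only a reference for checking small instances. The point of the construction in this section is that the same maximization can be solved with a polynomial-size linear program instead. Function names and the instance are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable


def semipositive_value(bundle: FrozenSet[int], items: Iterable[int],
                       z: Dict[int, float], zbar: Dict[int, float],
                       z2: Dict[FrozenSet[int], float],
                       zbar2: Dict[FrozenSet[int], float]) -> float:
    """z, zbar: node weights (may be negative); z2, zbar2: pair weights (>= 0)."""
    total = 0.0
    for j in items:
        # z(j) if the item is received, zbar(j) if it is not
        total += z.get(j, 0.0) if j in bundle else zbar.get(j, 0.0)
    for e, w in z2.items():
        if e <= bundle:          # both items of the pair are received
            total += w
    for e, w in zbar2.items():
        if not (e & bundle):     # neither item of the pair is received
            total += w
    return total


def best_bundle_brute_force(items, z, zbar, z2, zbar2) -> FrozenSet[int]:
    """Exponential-time reference: try all 2^r bundles (small r only)."""
    items = list(items)
    candidates = (frozenset(c) for k in range(len(items) + 1)
                  for c in combinations(items, k))
    return max(candidates,
               key=lambda b: semipositive_value(b, items, z, zbar, z2, zbar2))


# Hypothetical instance: negative node weights are allowed, pair weights are not.
z = {1: -1.0, 2: 2.0, 3: -0.5}
zbar = {1: 0.5, 2: 0.0, 3: 0.2}
z2 = {frozenset((1, 2)): 2.0}
zbar2 = {frozenset((1, 3)): 0.4}
best = best_bundle_brute_force([1, 2, 3], z, zbar, z2, zbar2)
print(sorted(best))  # [1, 2]
```

Here the bundle {1, 2} (value 3.2) narrowly beats {2} (value 3.1): taking item 1 costs 1.5 in node terms but unlocks the pair weight z(1, 2) = 2.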
Our goal in defining an attribute map χ3 is therefore to convert f_w(θ, o_1) into a semipositive 2-wise dependent valuation. It is informative to write out f_w(θ, o_1) when specialized to the case where agents have positive 2-wise dependent valuations. The part representing agent 1's value for o_1 already has the structure of a positive 2-wise dependent valuation (after all, we have assumed that all agents have this valuation structure). What remains is to design χ3 so that the entire expression, when summed together, resembles a semipositive 2-wise dependent valuation. If we can accomplish this, then it will be possible to write the maximization max_{o_1 ∈ Ω_1} f_w(θ_1, o_1) as a linear program. The trick is to construct the attribute map so that the right-hand side of the expression decomposes into a sum of individual terms, each of which corresponds to a weight for a node or an edge in a semipositive 2-wise dependent valuation.
In addition to ensuring this structure for our attribute map, we still want the attribute map to capture the effect of an assignment of bundle o 1 to agent 1 on the valuations of the other agents, so that the total value of the outcome rule constrained to allocate o 1 to agent 1 can be estimated. In this case, we use the greedy outcome rule in an explicit way to define our attribute map.
The attribute map χ3(θ_{−1}, o_1) maps from Θ_{−1} × Ω_1 to R^{2r + r(r−1)}. For each item j ∈ {1, . . . , r} that can be in bundle o_1, we include two entries in the attribute vector χ3(θ_{−1}, o_1), namely μ_j(0)·I[j ∉ o_1] and μ_j(1)·I[j ∈ o_1], where I[·] is an indicator variable. The value μ_j(0) is designed to represent the "gain" to others of not allocating item j to agent 1. We calculate this directly as z_i(j) in the case that item j is allocated to some agent i ≠ 1, that is, j ∈ GREEDY_i(R); otherwise we set μ_j(0) to 1. The value μ_j(1) does the opposite, approximating the "cost" to others of allocating item j to agent 1. We calculate this as welfare(GREEDY_{−1}(R)) − welfare(GREEDY_{−1}(R \ {j})), where welfare returns the total value of an assignment and GREEDY_{−1} is GREEDY with agent 1 removed. We also include two entries in the attribute vector χ3(θ_{−1}, o_1) for each pair of items j_1, j_2: one associated with μ_{j_1,j_2}(0), which is active when agent 1 is assigned neither j_1 nor j_2, and one associated with μ_{j_1,j_2}(1), which is active when agent 1 is assigned both j_1 and j_2. These values indicate the value to the other agents in each of the two cases. Note that we do not specify values for the cases where exactly one of j_1, j_2 is contained in o_1, as these types of weights are not permitted in a semipositive 2-wise dependent valuation. The values μ_{j_1,j_2} are obtained by considering the allocation of GREEDY_{−1} when applied to all items, and checking whether items j_1 and j_2 are assigned to the same agent i. If they are not, both values are set to zero. Otherwise, both are set to −z_i(j_1, j_2). The negative sign here is important for tractability, since only nonnegative edge weights are permitted in a semipositive 2-wise dependent valuation. The intuition for why we do not make μ_{j_1,j_2}(1) equal to the "cost" of allocating items j_1, j_2 is that if {j_1, j_2} ⊆ o_1, then this cost is already accounted for in μ_{j_1}(1) and μ_{j_2}(1).
In fact, the cost is double-counted since in both GREEDY −1 (R \ {j 1 }) and GREEDY −1 (R \ {j 2 }) no agents can derive value from edge (j 1 , j 2 ) since one of the items is missing in both cases.
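The μ values of χ3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. Since χ3 depends only on θ_{−1}, the sketch runs the greedy algorithm on the other agents only (GREEDY_{−1}) when deciding which agent owns item j for μ_j(0); ties go to the lowest agent index; and the instance reuses agents 2 and 3 from the concrete example, with their weights for item 1 being hypothetical placeholders. The printed numbers therefore illustrate the mechanics and should not be read as the attribute values of the example in the text.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: (node weights z_i(j), edge weights z_i({j1, j2})).
Valuation = Tuple[Dict[int, float], Dict[FrozenSet[int], float]]


def greedy(vals: Dict[int, Valuation], items) -> Dict[int, List[int]]:
    """Greedy rule: items in index order, each to the agent with the largest gain."""
    alloc: Dict[int, List[int]] = {i: [] for i in vals}
    for j in sorted(items):
        def gain(i: int) -> float:
            nodes, edges = vals[i]
            return nodes.get(j, 0.0) + sum(
                edges.get(frozenset((k, j)), 0.0) for k in alloc[i])
        alloc[max(sorted(vals), key=gain)].append(j)  # ties -> lowest agent index
    return alloc


def welfare(vals: Dict[int, Valuation], alloc: Dict[int, List[int]]) -> float:
    total = 0.0
    for i, bundle in alloc.items():
        nodes, edges = vals[i]
        s = set(bundle)
        total += sum(nodes.get(j, 0.0) for j in s)
        total += sum(w for e, w in edges.items() if e <= s)
    return total


def chi3_mu(others: Dict[int, Valuation], items):
    """mu values of chi_3 from theta_{-1} only (agent 1 is not in `others`)."""
    R = sorted(items)
    base = greedy(others, R)                        # GREEDY_{-1}(R)
    base_welfare = welfare(others, base)
    owner = {j: i for i, bundle in base.items() for j in bundle}
    mu0, mu1, mu_pair = {}, {}, {}
    for j in R:
        # "gain" to others of not giving j to agent 1; 1 if no other agent gets j
        mu0[j] = others[owner[j]][0].get(j, 0.0) if j in owner else 1.0
        # "cost" to others of giving j to agent 1: greedy welfare lost without j
        rest = [k for k in R if k != j]
        mu1[j] = base_welfare - welfare(others, greedy(others, rest))
    for a in R:
        for b in R:
            if a < b:
                same = a in owner and b in owner and owner[a] == owner[b]
                z = others[owner[a]][1].get(frozenset((a, b)), 0.0) if same else 0.0
                mu_pair[(a, b)] = -z  # used for both p = 0 and p = 1; always <= 0
    return mu0, mu1, mu_pair


others = {  # agents 2 and 3 of the example; item-1 weights are hypothetical
    2: ({1: 3.0, 2: 6.0, 3: 2.0}, {frozenset((2, 3)): 3.0}),
    3: ({1: 5.0, 2: 3.0, 3: 1.0}, {frozenset((1, 2)): 2.0, frozenset((1, 3)): 7.0}),
}
mu0, mu1, mu_pair = chi3_mu(others, [1, 2, 3])
print(mu0, mu1)  # {1: 5.0, 2: 6.0, 3: 1.0} {1: 8.0, 2: 6.0, 3: 8.0}
```

Each μ_j(1) needs one extra greedy run on R \ {j}, so computing all node attributes costs r + 1 greedy runs over the other agents.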
Returning to our example from Section 5.2.2, recall that when run on all agents, the greedy algorithm gives items 1 and 3 to agent 3 and item 2 to agent 2. The total welfare in this case is 19. The total value to agents 2 and 3 is also 19 since agent 1 does not receive any items. In this case, we then have the following attribute values.
A useful way to think of this attribute map χ 3 is that it modifies agent 1's 2-wise valuation, while still maintaining the structure of a semipositive 2-wise dependent valuation. Pictorially, we can think of this as combining agent 1's valuation graph with the valuation graph induced by the feature map. See Figure 2 for an illustration. While we specialize to the case of positive 2-wise dependent valuations, the same approach should be applicable to positive k-wise dependent valuations. Semipositive k-wise dependent valuations, where an agent can have weights for hyperedges, can also be tractably optimized, and a similar approach can be applied where the attribute vector is carefully chosen so that f w (θ 1 , o 1 ) looks like a semipositive k-wise dependent valuation.

A Tractable Training Problem.
We have now given the attribute map χ3, with an eye on making f_w(θ_1, o_1) look like a semipositive 2-wise dependent valuation. Before proving the main result of this section, namely that we can solve max_{o_1 ∈ Ω_1} f_w(θ_1, o_1) in polynomial time, we need to impose nonnegativity constraints on certain elements of the vector w. This restriction of the space of possible weights ensures that f_w(θ_1, o_1) is a semipositive 2-wise dependent valuation, at the possible loss of some pricing accuracy. It also prevents us from using the kernel trick over the positive-restricted weights, although we can still use kernels on the rest. In the present analysis we choose not to add this complexity and work with a linear kernel only.
Fig. 2. A pictorial representation of the attribute map χ3 for our concrete example. We make the attribute map χ3 resemble a 2-wise dependent valuation (with weights for not being assigned an item and for not being assigned either item of an edge) so that, when combined with agent 1's valuation, we have a modified single-agent problem. The weights are not shown here, but they would be multiplied by the values in χ3 as illustrated in the graph.
Before proving our theorem, we formally show that we can find an agent's value-maximizing bundle if the agent has a semipositive 2-wise dependent valuation.

LEMMA 5.1. If an agent has a semipositive 2-wise dependent valuation, then it is possible to find the agent's value-maximizing bundle in time polynomial in r, the number of items in the auction.
PROOF. Our proof relies on a connection between semipositive 2-wise dependent valuations and Markov networks. Finding the value-maximizing bundle for an agent with a semipositive 2-wise dependent valuation is equivalent to finding a maximum a posteriori assignment on a suitably defined Markov network. Consider a Markov network with a node for each item and edges between pairs of nodes. Let each node be a binary variable, and let the potentials be defined based on the coefficients in the maximization. Let the potential values for setting node j to 0 or 1 be exp(z̄_1(j)) and exp(z_1(j)), respectively, and let the potential values for edge (j_1, j_2) being set to (0, 0) and (1, 1) be exp(z̄_1(j_1, j_2)) and exp(z_1(j_1, j_2)), respectively. The potential values for setting an edge to (0, 1) or (1, 0) are set to 1 = exp(0). Finding the value-maximizing bundle for an agent with a semipositive 2-wise dependent valuation defined by z_1 and z̄_1 is then equivalent to finding a maximum a posteriori assignment in this Markov network, which can be written as the following integer program:

$$\begin{aligned}
\max_{I}\quad & \sum_{j} \big( \bar{z}_1(j)\, I_{j,0} + z_1(j)\, I_{j,1} \big) + \sum_{j_1 < j_2} \big( \bar{z}_1(j_1, j_2)\, I_{j_1,j_2,0} + z_1(j_1, j_2)\, I_{j_1,j_2,1} \big) \\
\text{s.t.}\quad & I_{j,0} + I_{j,1} = 1 \quad \forall j, \\
& I_{j_1,j_2,p} \le I_{j_1,p},\ \ I_{j_1,j_2,p} \le I_{j_2,p} \quad \forall j_1 < j_2,\ p \in \{0, 1\}, \\
& I \in \{0, 1\}.
\end{aligned}$$
The first set of constraints ensures that exactly one of I_{j,0} and I_{j,1} is active. The second set of constraints ensures that I_{j_1,j_2,p} is active if and only if I_{j_1,p} and I_{j_2,p} are both active. The "only if" direction follows directly from the constraints; the "if" direction follows because the objective coefficients of I_{j_1,j_2,p} are nonnegative, so at an optimum the second set of constraints will be tight whenever I_{j_1,p} and I_{j_2,p} are both set to 1. Therefore, the value of the objective corresponds to the single agent's value for o, where o consists of the items j for which I_{j,1} is set to one.
To complete the proof, we apply Theorem 3.1 of Taskar et al. [2004] to show that the LP relaxation of this integer program is integral when z_1(j_1, j_2) and z̄_1(j_1, j_2) are nonnegative.
THEOREM 5.2. When agents have positive 2-wise dependent valuations and we use attribute map χ3 without a kernel, we can solve the structural SVM training problem (with the added constraints on the weight vector w discussed earlier) in time polynomial in r, the number of items in the auction, and n, the number of agents.
PROOF. To prove this theorem, we just need to show that f w (θ 1 , o 1 ) can be written as v (θ 1 , o 1 ) where v is a semipositive 2-wise dependent valuation.
We observe that χ3(θ_{−1}, o_1) is a vector with 2r + r(r − 1) elements. Therefore, the weight vector w_{−1} has the same number of elements. We index elements of these vectors using notation similar to the notation we use for χ3(θ_{−1}, o_1). That is, we let w_j(p) correspond to the attribute term that includes μ_j(p), where p ∈ {0, 1}. Similarly, we let w_{j_1,j_2}(p) correspond to the attribute term that includes μ_{j_1,j_2}(p), where p ∈ {0, 1}.
In the primal formulation of Training Problem 1, we add the constraints that w_{j_1,j_2}(p) ≥ 0 for p ∈ {0, 1} and all j_1, j_2. While not strictly necessary, we also impose that w_1 = 1 (as we are working with the primal formulation, the enforcement of such a constraint is available to us; alternatively, we could forgo this constraint and operate in the dual, enabling the use of kernels over the unconstrained attributes of the feature map).
We first show that the coefficients of the indicator variables for the edges (j_1, j_2) are nonnegative. Recall from Section 5.2.3 that μ_{j_1,j_2}(0) = μ_{j_1,j_2}(1) ≤ 0. Combining this with the assumption that z_1(θ_1, (j_1, j_2)) ≥ 0 and our constraint that w_{j_1,j_2}(p) ≥ 0, we see that the coefficients of the indicator variables for the edges (j_1, j_2) are nonnegative. Given this, it is straightforward to conclude that there exists a semipositive 2-wise dependent valuation v such that f_w(θ_1, o_1) = v(θ_1, o_1).
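We can now draw a connection to Markov networks. The k-wise Maximization Problem is equivalent to finding a maximum a posteriori assignment on a suitably defined Markov network. Consider a Markov network with a node for each item and edges between pairs of nodes. Let each node be a binary variable, and let the potentials be defined based on the coefficients in the maximization. The potential values for setting node j to 0 or 1 are exp(w_j(0)μ_j(0)) and exp(z_1(θ_1, j) + w_j(1)μ_j(1)), respectively. The potential values for edge (j_1, j_2) being set to (0, 0) and (1, 1) are exp(−w_{j_1,j_2}(0)μ_{j_1,j_2}(0)) and exp(z_1(θ_1, (j_1, j_2)) − w_{j_1,j_2}(1)μ_{j_1,j_2}(1)), respectively. The potential values for setting an edge to (0, 1) or (1, 0) are set to 1 = exp(0). Solving the k-wise Maximization Problem is equivalent to finding a maximum a posteriori assignment in this Markov network, which can be written as the following integer program:

$$\begin{aligned}
\max_{I}\quad & \sum_{j} \Big( w_j(0)\,\mu_j(0)\, I_{j,0} + \big(z_1(\theta_1, j) + w_j(1)\,\mu_j(1)\big)\, I_{j,1} \Big) \\
& + \sum_{j_1 < j_2} \Big( -w_{j_1,j_2}(0)\,\mu_{j_1,j_2}(0)\, I_{j_1,j_2,0} + \big(z_1(\theta_1, (j_1, j_2)) - w_{j_1,j_2}(1)\,\mu_{j_1,j_2}(1)\big)\, I_{j_1,j_2,1} \Big) \\
\text{s.t.}\quad & I_{j,0} + I_{j,1} = 1 \quad \forall j, \\
& I_{j_1,j_2,p} \le I_{j_1,p},\ \ I_{j_1,j_2,p} \le I_{j_2,p} \quad \forall j_1 < j_2,\ p \in \{0, 1\}, \\
& I \in \{0, 1\}.
\end{aligned}$$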
The first set of constraints ensures that exactly one of I_{j,0} and I_{j,1} is active. The second set of constraints ensures that I_{j_1,j_2,p} is active if and only if I_{j_1,p} and I_{j_2,p} are both active. The "only if" direction follows directly from the constraints; the "if" direction follows because the objective coefficients of I_{j_1,j_2,p} are nonnegative, so at an optimum the second set of constraints will be tight whenever I_{j_1,p} and I_{j_2,p} are both set to 1. Therefore, the value of the objective corresponds to f_w(θ, o), where o consists of the items j for which I_{j,1} is set to one.
To complete the proof, we apply Theorem 3.1 of Taskar et al. [2004] to show that the LP relaxation of this integer program is integral.
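The proof combines agent 1's weights with the weighted attributes into a single set of log-potentials. The sketch below builds those log-potentials exactly as the potentials above are written (with w_1 = 1, a plus sign on the node terms, and a minus sign on the edge terms), checks that the edge weights are nonnegative, and evaluates the resulting f_w for a bundle. The dictionary layout, function names, and numbers are hypothetical.

```python
def combined_coefficients(z_node, z_edge, mu_node, mu_edge, w_node, w_edge):
    """Log-potentials of the Markov network in the proof (w_1 fixed to 1).

    z_node[j], z_edge[(j1, j2)]            agent 1's positive 2-wise weights
    mu_node[(j, p)], mu_edge[(j1, j2, p)]  attribute values, p in {0, 1}
    w_node[(j, p)], w_edge[(j1, j2, p)]    learned weights (w_edge constrained >= 0)
    Returns node[(j, p)] for x_j = p and edge[(j1, j2, p)] for x_j1 = x_j2 = p.
    """
    node = {(j, p): w_node[(j, p)] * mu + (z_node.get(j, 0.0) if p == 1 else 0.0)
            for (j, p), mu in mu_node.items()}
    edge = {(j1, j2, p): -w_edge[(j1, j2, p)] * mu
            + (z_edge.get((j1, j2), 0.0) if p == 1 else 0.0)
            for (j1, j2, p), mu in mu_edge.items()}
    return node, edge


def is_semipositive(edge) -> bool:
    """Agreement (edge) weights must be nonnegative; node weights are unrestricted."""
    return all(v >= 0.0 for v in edge.values())


def f_w(bundle, items, node, edge) -> float:
    """Value of the combined valuation when agent 1 receives `bundle`."""
    x = {j: int(j in bundle) for j in items}
    return (sum(node[(j, x[j])] for j in items)
            + sum(v for (j1, j2, p), v in edge.items() if x[j1] == x[j2] == p))


node, edge = combined_coefficients(
    z_node={1: 1.0, 2: 2.0}, z_edge={(1, 2): 0.5},
    mu_node={(1, 0): 5.0, (1, 1): 8.0, (2, 0): 6.0, (2, 1): 6.0},
    mu_edge={(1, 2, 0): -7.0, (1, 2, 1): -7.0},   # mu_pair <= 0, as in chi_3
    w_node={(1, 0): 0.5, (1, 1): -0.5, (2, 0): 0.5, (2, 1): -0.5},
    w_edge={(1, 2, 0): 0.5, (1, 2, 1): 0.25},     # nonnegative by constraint
)
print(is_semipositive(edge), f_w(set(), [1, 2], node, edge))  # True 9.0
```

The node weights may come out negative (here node[(1, 1)] = −3.0), which a semipositive valuation allows; only the edge weights need the sign argument of the proof.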

The Assignment Problem
In the assignment problem, we are given a set of n agents and a set {1, . . . , n} of items, and wish to assign each item to exactly one agent. The outcome space of agent i is thus Ω_i = {1, . . . , n}, and its type can be represented by a vector θ_i ∈ Θ_i = R^n. The set of possible type profiles is Θ = R^{n^2}.
We consider an outcome rule that maximizes egalitarian welfare in a lexicographic manner: first, the minimum value of any agent is maximized; if more than one outcome achieves this maximum, the second lowest value is maximized, and so forth. This outcome rule can be computed by solving a sequence of integer programs. As such, our focus in this application is not on studying the framework for tractable outcome rules, but rather on understanding its performance on an objective that is far from welfare maximizing. We continue to assume agent symmetry and adopt the view of agent 1.
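For small n, the lexicographic egalitarian (leximin) rule can be computed by brute force: sort each assignment's agent values in ascending order and pick the assignment whose sorted vector is lexicographically largest. The sketch below does this over all n! permutations. It is a stand-in for illustration only, not the sequence of integer programs described above, and the instance is hypothetical.

```python
from itertools import permutations
from typing import Sequence, Tuple


def leximin_assignment(theta: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Return perm where agent i (0-indexed) receives item perm[i].

    Python compares lists lexicographically, so maximizing the ascending-sorted
    value vector maximizes the minimum value first, then the second lowest, etc.
    Ties between equal sorted vectors go to the first permutation generated.
    """
    n = len(theta)
    return max(permutations(range(n)),
               key=lambda perm: sorted(theta[i][perm[i]] for i in range(n)))


theta = [[0.9, 0.1, 0.5],   # theta[i][j]: agent i's value for item j
         [0.8, 0.7, 0.2],
         [0.3, 0.6, 0.4]]
print(leximin_assignment(theta))  # (2, 0, 1)
```

In this instance the utilitarian optimum (0, 1, 2) has total value 2.0 but a minimum of 0.4, while the leximin rule picks (2, 0, 1), with total value 1.9 and minimum 0.5. This illustrates how the egalitarian objective departs from welfare maximization.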
To complete our specification of the structural SVM framework for this application, we again need to define an attribute map. In this case, we follow the same approach as in the definition of attribute map χ2 for the multi-minded combinatorial auction application. The attribute map χ4(θ_{−1}, j), where the second argument indexes the item assigned to agent 1, is constructed from the other agents' reduced types θ_i[−j], where θ_i[−j] denotes the vector obtained by removing the jth entry from valuation type θ_i. The attribute map reflects the effect of assigning item j to agent 1 on the valuations of the other agents, capturing the fact that the item cannot be assigned to any other agent. For this set of experiments, we choose not to apply nonlinear kernels on top of this attribute vector in order to evaluate the effect of a simple attribute map.

EXPERIMENTAL EVALUATION
We perform a series of experiments to test our theoretical framework. To run our experiments, we use the SVM struct package [Joachims et al. 2009], which allows for the use of custom kernel functions, attribute maps, and separation oracles.

Setup
We begin by briefly discussing our experimental methodology, performance metrics, and optimizations used to speed up the experiments.
6.1.1. Methodology. For each of the settings we consider, we generate three data sets: a training set, a validation set, and a test set. The training set is used as input to Training Problem 2, which in turn yields classifiers h_w and corresponding payment rules p_w. For each choice of the parameter C of Training Problem 2, and of the parameter γ if the RBF kernel is used, a classifier h_w is learned on the training set and evaluated on the validation set. The classifier with the highest accuracy on the validation set is then chosen and evaluated on the test set. During training, we take the perspective of agent 1, so a training set size of ℓ means that we train an SVM on ℓ examples. Once a partial outcome rule has been learned, however, it can be used to infer payments for all agents. We exploit this fact during testing, and report performance metrics across all agents for each instance in the test set.
6.1.2. Metrics. We employ several metrics to measure the performance of the learned classifiers. These metrics are computed over the test set.
Classification Accuracy. Classification accuracy measures the accuracy of the trained classifier in predicting the outcome. Each test instance has n agents, so in total we measure accuracy over n predictions per test instance.
Ex Post Regret. We measure ex post regret by summing over the ex post regret experienced by all agents in each of the instances in the test set.
Individual Rationality Violation. This metric measures the fraction of agents, across all test instances, whose individual rationality is violated, that is, who receive negative utility.
Expected Individual Rationality Violation. This metric measures the expected amount of individual rationality violation across all agents. We also measure exp-cond-ir-viol, which averages only over agents with negative utility.
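Assuming quasi-linear utilities (value minus payment), the IR metrics above can be computed from the list of ex post utilities of all agents across the test set, as in the sketch below. The key exp-ir-viol is a hypothetical name for the expected IR violation; the other two keys follow the metric names used in the text and tables.

```python
from typing import Dict, Sequence


def ir_metrics(utilities: Sequence[float]) -> Dict[str, float]:
    """utilities: ex post utility of every agent in every test instance."""
    violations = [-u for u in utilities if u < 0]   # size of each IR violation
    n = len(utilities)
    return {
        # fraction of agents with negative utility
        "ir-violation": len(violations) / n,
        # expected violation, i.e. mean of max(0, -u) over all agents
        "exp-ir-viol": sum(violations) / n,
        # expected violation conditional on a violation occurring
        "exp-cond-ir-viol": sum(violations) / len(violations) if violations else 0.0,
    }


print(ir_metrics([0.5, -0.25, 0.0, -0.5]))
# {'ir-violation': 0.5, 'exp-ir-viol': 0.1875, 'exp-cond-ir-viol': 0.375}
```

An agent with utility exactly 0 is counted as satisfying individual rationality.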
Individual Rationality Violation Percentiles. ir-viol-95 measures the threshold at which 95% of agents have utility at least the negative of this value. So if this metric is 0.2, 95% of agents have utility at least −0.2. Similarly, we measure cond-ir-viol-95, which provides the same threshold but limited to agents with negative utility.
6.1.3. Optimizations. In the case of multi-minded CAs, we first map the inputs θ_{−1} into a smaller space, which allows us to learn more effectively with smaller amounts of data. For this step, we use instance-based normalization, which normalizes the values in θ_{−1} by the highest observed value and then rescales the computed payment appropriately, and sorting, which orders agents based on bid values.
Before examples θ are passed to the learning algorithm or the learned classifier, they are normalized by a positive multiplier so that the value of the highest bid by agents other than agent 1 is exactly 1. The values and the solution are then transformed back to the original scale before computing the payment rule p_w. This technique of instance-based normalization leverages the observation that agent 1's allocation depends only on the relative values of the other agents' reports, so that scaling all reports by a positive factor does not affect the outcome chosen. We apply this to multi-minded CAs and the assignment problem, but not to our experiments on CAs with positive k-wise dependent valuations.
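Instance-based normalization can be sketched as a wrapper around a learned price function. In this sketch `predict` is a hypothetical stand-in for the learned classifier's price on normalized inputs, and rescaling only θ_{−1} (not agent 1's own report) is an assumption of the sketch.

```python
from typing import Callable, List, Sequence


def normalized_payment(theta_minus1: Sequence[Sequence[float]],
                       predict: Callable[[List[List[float]]], float]) -> float:
    """Scale theta_{-1} so the highest bid is exactly 1, query the (hypothetical)
    learned price `predict`, then undo the scaling on the returned payment."""
    scale = max(max(bids) for bids in theta_minus1)
    if scale <= 0:
        raise ValueError("normalization needs a positive highest bid")
    scaled = [[b / scale for b in bids] for bids in theta_minus1]
    return scale * predict(scaled)


# Usage with a toy price (highest competing single-item bid) as `predict`.
def highest_bid(t: List[List[float]]) -> float:
    return max(row[0] for row in t)


print(normalized_payment([[2.0], [4.0]], highest_bid))  # 4.0
```

Because the learner only ever sees inputs whose largest entry is 1, it does not need to learn the same price function separately at every scale.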
In the sorting step, instead of choosing an arbitrary ordering of agents in θ_{−i}, we choose a specific ordering based on the maximum value each agent reports. For example, in a single-item setting, this amounts to ordering agents by their value. In the multi-minded CA setting, agents are ordered by the value they report for their most desired bundle. The intuition behind sorting is that we can again decrease the space of possible θ_{−i} reports the learner sees and learn more quickly. In the single-item case, we know that the second-price payment rule only depends on the maximum value across all other agents, and sorting places this value in the first coordinate of θ_{−i}. We apply sorting to the assignment problem by ordering agents by their maximum value for any item. We do not apply sorting to our experiments with k-wise dependent valuations in CAs.
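Sorting can be sketched in one line: order the other agents' report vectors by their maximum reported value, highest first. For a single item this puts the highest competing bid, which is all the second-price rule depends on, in the first coordinate. The function name is hypothetical.

```python
from typing import List, Sequence


def sort_agents(theta_minus_i: Sequence[Sequence[float]]) -> List[Sequence[float]]:
    """Order other agents' reports by their maximum value, highest first (stable)."""
    return sorted(theta_minus_i, key=max, reverse=True)


print(sort_agents([[0.3], [0.8], [0.5]]))  # [[0.8], [0.5], [0.3]]
```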

Single-Item Auction
As a sanity check, we first apply the framework to a single-item auction with the efficient outcome rule, where the agent with the highest bid receives the item. For the distribution D on value profiles, we simply draw each agent's value independently from the uniform distribution on [0, 1]. The outcome rule g allocates the item to the agent with the highest value. We use a training set size of 300, and validation and test set sizes of 1000. We use an RBF kernel and parameters C ∈ {10^4, 10^5} and γ ∈ {0.01, 0.1, 1}.
In this case, we know that the associated payment function that makes (g, p) strategyproof is the second price payment rule.
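For reference, the target rule in this sanity check is easy to state in code: the highest bidder wins and pays the highest competing bid, and everyone else pays nothing. This sketch assumes at least two agents and breaks ties toward the lowest index; the function name is hypothetical.

```python
from typing import Sequence


def second_price_payment(values: Sequence[float], i: int) -> float:
    """Payment of agent i under the efficient single-item rule with Vickrey pricing."""
    winner = max(range(len(values)), key=lambda k: values[k])  # ties: lowest index
    others = [v for k, v in enumerate(values) if k != i]
    return max(others) if i == winner else 0.0


print(second_price_payment([0.3, 0.9, 0.6], 1))  # 0.6
```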
The results reported in Table I and Figure 3 are for attribute maps χ1 and χ2, which can be applied to this setting by observing that single-item auctions are a special case of multi-minded CAs. In particular, letting 0 be the zero vector of dimension n − 1,
For both choices of the attribute map we obtain excellent accuracy and a very close approximation to the second-price payment rule. This shows that the framework is able to automatically learn the payment rule of Vickrey's auction.

6.3. Multi-Minded CAs
6.3.1. Type Distribution. Recall that in a multi-minded setting, there are r items, and each agent is interested in exactly κ > 1 bundles. For each bundle, we use the following procedure to determine which items are included in the bundle. We first assign an item to the bundle uniformly at random. Then with probability α, we add another random item (chosen uniformly from the remaining items), and with probability 1 − α we stop. We continue this procedure until we stop or have exhausted the items. This procedure is inspired by Sandholm's decay distribution for the single-minded setting [Sandholm 2002], and we use α = 0.75 to be consistent with that setting, where this parameter value generated harder instances of the winner determination problem.
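The decay procedure for generating a bundle can be sketched as follows, using Python's standard `random` module. Names and the seed handling are assumptions. As written, the sketch does not prevent two bundles of the same agent from coinciding, a detail the text does not address.

```python
import random
from typing import List, Set


def decay_bundle(r: int, alpha: float, rng: random.Random) -> Set[int]:
    """One bundle: a first uniform item, then keep adding a new uniform item w.p. alpha."""
    remaining = list(range(r))
    rng.shuffle(remaining)   # popping from a shuffled list = uniform, no replacement
    bundle = {remaining.pop()}
    while remaining and rng.random() < alpha:
        bundle.add(remaining.pop())
    return bundle


def multi_minded_bundles(r: int, kappa: int, alpha: float = 0.75,
                         seed: int = 0) -> List[Set[int]]:
    """The kappa desired bundles of one agent."""
    rng = random.Random(seed)
    return [decay_bundle(r, alpha, rng) for _ in range(kappa)]


print(multi_minded_bundles(r=5, kappa=3))
```

With α = 0.75 a bundle's size is geometric-like with mean around 4 before truncation at r, so with 5 items many bundles are large, which is part of what makes winner determination hard in this regime.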
Once the bundle identities have been determined, we sample values for these bundles. Let c be an r-dimensional vector with entries chosen uniformly from (0, 1]. For each agent i, let d_i be an r-dimensional vector with entries chosen uniformly from (0, 1]. Each entry of c denotes the common value of a specific item, while each entry of d_i denotes the private value of a specific item for agent i. The value of bundle S_ij is then computed from c and d_i using parameters β ∈ [0, 1] and ζ > 0. The inner product in the numerator corresponds to a sum over values of items, where common and private values for each item are weighted with β and (1 − β), respectively. The denominator normalizes all valuations to the interval (0, 1]. Parameter ζ controls the degree of complementarity among items: ζ > 1 implies that goods are complements, whereas ζ < 1 means that goods are substitutes. Choosing the minimum over bundles S_ij′ contained in S_ij finally ensures that the resulting valuations are monotonic.
6.3.2. Outcome Rules. We use two outcome rules in our experiments on multi-minded CAs. For the optimal outcome rule g_opt, the payment rule p_vcg makes the mechanism (g_opt, p_vcg) strategyproof. Under this payment rule, agent i pays the externality it imposes on the other agents, that is, the difference between the maximum total value the other agents could obtain if agent i were absent and the total value they obtain under the outcome chosen by g_opt. The second outcome rule with which we experiment is a generalization of the greedy outcome rule for single-minded CAs [Lehmann et al. 2002]. Our generalization of the greedy rule is as follows. Let θ be the agent valuations and o_i(j) denote the jth bundle desired by agent i.
The greedy outcome rule orders the desired bundles by this score, and takes the bundle o_i(j) with the next highest score as long as agent i has not already been allocated a bundle and o_i(j) does not contain any items already allocated. While this greedy outcome rule has an associated payment rule that makes it strategyproof in the single-minded case, it is not implementable in the multi-minded case, as evidenced by the example in Appendix B.
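Since the score used to order bundles is not reproduced here, the sketch below takes the score as a parameter. The greedy scan follows the description above: visit bids in order of decreasing score, and accept a bid if its agent has no bundle yet and the bundle is disjoint from all allocated items. The example score value/√|S| is borrowed from the single-minded rule of Lehmann et al. [2002] purely as an assumption; names and the instance are hypothetical.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Tuple

Bid = Tuple[int, FrozenSet[int], float]  # (agent, desired bundle, reported value)


def multi_minded_greedy(bids: List[Bid],
                        score: Callable[[FrozenSet[int], float], float]
                        ) -> Dict[int, FrozenSet[int]]:
    """Accept bids by decreasing score if the agent is still unserved and the
    bundle is disjoint from the items allocated so far."""
    allocated_items: set = set()
    winners: Dict[int, FrozenSet[int]] = {}
    for agent, bundle, value in sorted(bids, key=lambda b: score(b[1], b[2]),
                                       reverse=True):
        if agent not in winners and not (bundle & allocated_items):
            winners[agent] = bundle
            allocated_items |= bundle
    return winners


def sqrt_size_score(bundle: FrozenSet[int], value: float) -> float:
    return value / math.sqrt(len(bundle))   # assumed score: value / sqrt(|S|)


bids = [(1, frozenset({0, 1}), 1.0), (1, frozenset({2}), 0.6),
        (2, frozenset({1}), 0.8), (2, frozenset({0, 2}), 0.9)]
print(multi_minded_greedy(bids, sqrt_size_score))  # {2: frozenset({1}), 1: frozenset({2})}
```

In the example, agent 2's bid on {1} has the top score (0.8), which blocks agent 1's bid on {0, 1}; agent 1 then wins its lower-scored bundle {2}.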

Description of Experiments.
We experiment with training sets of sizes 100, 300, and 500, and validation and test sets of size 1000. All experiments we report on are for a setting with 5 agents, 5 items, and 3 bundles per agent, and use β = 0.5, the RBF kernel, and parameters C ∈ {10^4, 10^5} and γ ∈ {0.01, 0.1, 1}.
Table II presents the basic results for multi-minded CAs with the optimal and greedy outcome rules. For both outcome rules, we present the results for p_vcg as a baseline. Because p_vcg is the strategyproof payment rule for the optimal outcome rule, p_vcg always has accuracy 100%, regret 0, and IR violation 0 for the optimal outcome rule. The main findings are that our learned payment rule has low regret for the optimal outcome rule, and regret that is about the same as or better than that of p_vcg when the outcome rule is greedy. Given that greedy winner determination seeks to maximize total welfare, it is natural that VCG-based payments perform reasonably well in this environment.
Across all instances, as expected, accuracy is negatively correlated with regret and ex post IR violation. The degree of complementarity between items, ζ, as well as the outcome rule chosen, has a major effect on the results. Instances with low complementarity (ζ = 0.5) yield payment rules with higher regret, and χ1 performs better on the greedy outcome rule while χ2 performs better on the optimal outcome rule. For high complementarity between items, the greedy outcome rule tends to allocate all items to a single agent, and the learned price function sets high prices for small bundles to capture this property. For low complementarity, the allocation tends to be split and less predictable. Still, the best classifiers achieve average ex post regret of less than 0.032 (for values normalized to [0, 1]) even though the corresponding prediction accuracy can be as low as 67%.
For the greedy outcome rule, the performance of p vcg is comparable for ζ ∈ {1.0, 1.5} but worse than the payment rule learned in our framework in the case of ζ = 0.5, where the greedy outcome rule becomes less optimal.
6.3.5. Effect of Training Set Size. Table III charts performance as the training set size is varied for the greedy outcome rule. While training data is readily available (we can …
Table III shows that regret decreases with larger training sets, and for a training set size of 500, the best of χ1 and χ2 outperforms p_vcg for ζ = 0.5 and is comparable to p_vcg for ζ ∈ {1.0, 1.5}.
Table IV. Impact of payment offset and null loss fix for ζ = 0.5 and greedy outcome rule, training set size 300. All results are for χ2, null loss values across columns.
6.3.6. IR Fixes. Tables IV and V summarize our results regarding the various fixes to IR violations, for the particularly challenging case of the greedy outcome rule and ζ = 0.5. The extent of IR violation decreases with larger payment offset and null loss. Regret tends to move in the opposite direction, but there are cases where IR violation and regret both decrease. The three rightmost columns of Table IV list the average ratio between welfare after and before the deallocation fix, across the instances in the test set. With a payment offset of 0, a large welfare hit is incurred if we deallocate agents with IR violations. However, this penalty decreases with increasing payment offsets and increasing null loss. At the most extreme payment offset and null loss adjustment, the IR violation is as low as 2%, and the deallocation fix incurs a welfare loss of only 7%. Table V provides detail on the amount by which IR is violated. The expected amount of IR violation is low, though this becomes more significant when we condition on agents with IR violations. Larger payment offsets decrease all of these metrics as expected. Figure 4 shows a graphical representation of the impact of payment offsets and null losses. Each line in the plot corresponds to a payment rule learned with a different null loss, and each point on a line corresponds to a different payment offset. The payment offset is zero for the topmost point on each line, and equal to 0.29 for the lowest point on each line. Increasing the payment offset always decreases the rate of IR violation, but may decrease or increase regret. Increasing null loss lowers the topmost point on a given line, but arbitrarily increasing null loss can be harmful. Indeed, in the figure on the left, a null loss of 1.5 results in a slightly higher topmost point but significantly lower regret at this topmost point compared to a null loss of 2.0. 
It is also interesting to note that these adjustments have much more impact on the hardest distribution, with ζ = 0.5.
6.3.7. Item Monotonicity. Table VI presents a comparison of a payment rule learned with explicit enumeration of all bundle constraints (the default that we have been using for our other results) and a payment rule learned by optimistically assuming item monotonicity (see Section 5.1.3). Performance is affected when we drop constraints and optimistically assume item monotonicity, although the effects are small for ζ ∈ {1.0, 1.5} and larger for ζ = 0.5. Because item monotonicity allows the training problem to be succinctly specified, we may be able to train on more data, and this seems a very promising avenue for further consideration (perhaps coupled with heuristic methods to add additional constraints to the training problem).
Table VI. Comparison of performance with and without optimistically assuming item monotonicity. (i-mon) indicates a payment rule learned by optimistically assuming item monotonicity. Greedy outcome rule. Training set size 300. Columns: n, ζ, accuracy, regret, ir-violation.

Combinatorial Auctions with Positive k-wise Dependent Valuations
We experiment with our framework on combinatorial auctions with positive k-wise dependent valuations. We find that our learned payment rules can outperform VCG-based payment rules in terms of regret for settings with large numbers of items, and also in terms of the trade-off between IR violation and regret. Because we have an efficient separation oracle, as discussed in Section 5.2.4, we are able to train payment rules and compute regret for larger instances.
In order to experiment with positive k-wise dependent valuations in combinatorial auctions, we need a way to generate such valuations. To construct agent i's valuation, we first specify the nodes and edges of a graph (R, E), and then assign weights z_i(j) and z_i(e) to the nodes j ∈ R and edges e ∈ E. For every possible edge (j_1, j_2), we add the edge to the agent's graph with probability ρ. Each node weight z_i(j) is sampled uniformly at random from [0, 1], and the weight of each added edge is also sampled uniformly at random from [0, 1]. With this setup, the edge probability parameter ρ lets us generate test instances of varying edge density. So that our regret numbers are comparable across instances of different sizes, we normalize each agent's weights by the expected value of the set of all items.
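The valuation generator can be sketched as below. The normalization constant is the expected value of the full item set under this process, r/2 + ρ·r(r−1)/4, since each of the r nodes contributes 1/2 in expectation and each of the r(r−1)/2 possible edges contributes ρ/2. Names are hypothetical, and `random.random()` draws from [0, 1) rather than [0, 1], which makes no practical difference.

```python
import random
from itertools import combinations
from typing import Dict, FrozenSet, Tuple


def sample_positive_2wise(r: int, rho: float, rng: random.Random
                          ) -> Tuple[Dict[int, float], Dict[FrozenSet[int], float]]:
    """Nodes get U[0,1) weights; each pair becomes an edge w.p. rho with a U[0,1) weight.
    All weights are divided by the expected value of the full item set."""
    expected_full = 0.5 * r + 0.5 * rho * r * (r - 1) / 2
    nodes = {j: rng.random() / expected_full for j in range(r)}
    edges = {}
    for e in combinations(range(r), 2):
        if rng.random() < rho:
            edges[frozenset(e)] = rng.random() / expected_full
    return nodes, edges


rng = random.Random(0)
nodes, edges = sample_positive_2wise(r=5, rho=0.5, rng=rng)
print(len(nodes), len(edges))
```

After normalization, an agent's value for the full item set is 1 in expectation regardless of r and ρ, which is what makes regret comparable across instance sizes.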
The outcome rule we use is the greedy outcome rule outlined in Section 5.2.1. We use a training set size of 1000 and validation and test sets of size 500. We compare against a VCG-based payment rule, which runs the greedy allocation rule on all agents and on all agents excluding agent i, and charges agent i the difference in value to agents other than i between the two allocations. Tables VII and VIII and Figure 5 compare our learned payment rules (with zero null loss) to the VCG-based payment rule for ρ = 0.1 and ρ = 0.9. The learned payment rule has better regret, despite having worse accuracy. However, the learned payment rule incurs significant IR violation. We examine the IR violation issue in Figure 6. Here we implement the two IR fixes of increasing the null loss value and applying payment offsets. We see that across all instances, we can find settings of the null loss for which our IR/regret curve lies beneath that of the VCG-based payment rule, indicating that we have settings with both better regret and lower IR violation. We also see that despite significant IR failures when there is no payment offset, we can substantially decrease IR violation at the cost of a small increase in regret by using a payment offset.
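The VCG-based baseline can be sketched by running the greedy rule twice, as described above. The sketch restates a compact version of the greedy algorithm so that it is self-contained. Names, tie-breaking (lowest agent index), and the instance (the concrete example from Section 5.2.2, with hypothetical weights for item 1) are assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple

Valuation = Tuple[Dict[int, float], Dict[FrozenSet[int], float]]  # (node, edge) weights


def greedy(vals: Dict[int, Valuation], items) -> Dict[int, List[int]]:
    alloc: Dict[int, List[int]] = {i: [] for i in vals}
    for j in sorted(items):
        def gain(i: int) -> float:
            nodes, edges = vals[i]
            return nodes.get(j, 0.0) + sum(
                edges.get(frozenset((k, j)), 0.0) for k in alloc[i])
        alloc[max(sorted(vals), key=gain)].append(j)  # ties -> lowest agent index
    return alloc


def welfare(vals: Dict[int, Valuation], alloc: Dict[int, List[int]]) -> float:
    total = 0.0
    for i, bundle in alloc.items():
        nodes, edges = vals[i]
        s = set(bundle)
        total += sum(nodes.get(j, 0.0) for j in s)
        total += sum(w for e, w in edges.items() if e <= s)
    return total


def vcg_based_payment(vals: Dict[int, Valuation], items, i: int) -> float:
    """Others' greedy welfare without agent i minus their value when i is present."""
    others = {k: v for k, v in vals.items() if k != i}
    with_i = greedy(vals, items)        # greedy run on all agents
    without_i = greedy(others, items)   # greedy run with agent i removed
    others_with_i = welfare(others, {k: b for k, b in with_i.items() if k != i})
    return welfare(others, without_i) - others_with_i


example = {  # item-1 weights of agents 1 and 2 are hypothetical
    1: ({1: 1.0, 2: 4.0, 3: 2.0}, {}),
    2: ({1: 3.0, 2: 6.0, 3: 2.0}, {frozenset((2, 3)): 3.0}),
    3: ({1: 5.0, 2: 3.0, 3: 1.0}, {frozenset((1, 2)): 2.0, frozenset((1, 3)): 7.0}),
}
print([vcg_based_payment(example, [1, 2, 3], i) for i in (1, 2, 3)])  # [0.0, 5.0, 8.0]
```

With these weights, agent 3 is charged 8, agent 2 is charged 5, and agent 1, who receives nothing, pays 0. Because greedy is not welfare-maximizing, nothing in this construction guarantees that such payments are incentive compatible, which is why the text evaluates their regret empirically.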

[Figure: Regret v. Number of Items for the learned payment rule and VCG-based payment rules. For 10 and 20 items, we do not have regret numbers for the VCG-based rules because computing regret requires enumeration over all possible bundles. In this case, the regret reported for the learned payment rules with 10 and 20 items is an upper bound on the true regret, obtained by applying our tractable separation oracle.]
Fig. 6. Regret v. IR Violation trade-off for the learned payment rule and VCG-based payment rule for k-wise dependent valuations. We do not have regret numbers for the VCG-based rule with 10 and 20 items because computing regret requires brute-force enumeration over all possible bundles. In this case, the regret numbers for the learned payment rule are an upper bound on regret, obtained by using our tractable separation oracle.

The Egalitarian Assignment Problem
In the assignment problem, agents' values for the items are sampled uniformly and independently from [0, 1]. We use a training set of size 600, validation and test sets of size 1000, and the RBF kernel with parameters C ∈ {10, 1000, 100000} and γ ∈ {0.1, 0.5, 1.0}. We find that our learned payment rules have significantly better accuracy and regret than VCG-based payment rules. We explain the improvement over VCG-based payments by observing that the egalitarian rule does not maximize total welfare, and is thus not compatible in this sense with VCG-based ideas.
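A minimal brute-force sketch of the egalitarian outcome rule for the assignment problem, assuming each agent receives exactly one item and there are at least as many items as agents. The function name and the tie-breaking by enumeration order are our own choices, not taken from the text. Enumeration is exponential in the number of agents, so this is only meant for small instances.

```python
import itertools
from typing import List, Optional, Tuple

def egalitarian_assignment(values: List[List[float]]) -> Optional[Tuple[int, ...]]:
    """Return an assignment (agent k gets item result[k]) that maximizes the
    minimum value any agent receives.

    values[k][j] is agent k's value for item j. Brute force over all
    one-to-one assignments; ties keep the first assignment in
    itertools.permutations order (a hypothetical tie-breaking choice).
    Returns None if there are fewer items than agents.
    """
    n_agents, n_items = len(values), len(values[0])
    best, best_min = None, float("-inf")
    for perm in itertools.permutations(range(n_items), n_agents):
        worst = min(values[k][perm[k]] for k in range(n_agents))
        if worst > best_min:
            best, best_min = perm, worst
    return best
```

For values [[1.0, 0.4], [0.8, 0.3]], the egalitarian rule gives agent 0 item 1 and agent 1 item 0 (minimum value 0.4, total 1.2), whereas the welfare-maximizing assignment (total 1.3) leaves agent 1 with only 0.3. This gap between the two objectives is what makes VCG-style payments a poor fit for the egalitarian rule.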
The performance of the learned payment rules is compared to that of three VCG-based payment rules. For this, let $W$ be the total welfare of all agents other than i under the outcome chosen by $g$, and $W_{eg}$ be the minimum value any agent other than i receives under this outcome. We consider the following payment rules: (1) the vcg payment rule, where agent i pays the difference between the maximum total welfare of the other agents under any allocation and $W$; (2) the tot-vcg payment rule, where agent i pays the difference between the total welfare of the other agents under the allocation maximizing egalitarian welfare and $W$; and (3) the eg-vcg payment rule, where agent i pays the difference between the minimum value of any agent under the allocation maximizing egalitarian welfare and $W_{eg}$.
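The three benchmark rules can be sketched by brute force for small instances. Assumptions, labeled here and in the code: the "allocation maximizing egalitarian welfare" is computed over the agents other than i only (in the spirit of a Clarke pivot), ties are broken by enumeration order, each agent receives one item, and there are at least two agents. The helper names are hypothetical.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

Assignment = Dict[int, int]  # agent -> item

def _assignments(values: List[List[float]], agents: List[int]) -> Iterator[Assignment]:
    """All one-to-one assignments of the given agents to distinct items."""
    n_items = len(values[0])
    for perm in itertools.permutations(range(n_items), len(agents)):
        yield dict(zip(agents, perm))

def _total(values: List[List[float]], assign: Assignment) -> float:
    return sum(values[k][j] for k, j in assign.items())

def _egal(values: List[List[float]], assign: Assignment) -> float:
    return min(values[k][j] for k, j in assign.items())

def vcg_style_payments(values: List[List[float]], i: int,
                       outcome: Assignment) -> Tuple[float, float, float]:
    """Return (vcg, tot_vcg, eg_vcg) payments for agent i, given the outcome
    chosen by the egalitarian rule g. Requires at least two agents."""
    others = [k for k in range(len(values)) if k != i]
    under_g = {k: outcome[k] for k in others}
    W, W_eg = _total(values, under_g), _egal(values, under_g)
    allocs = list(_assignments(values, others))
    best_total = max(_total(values, a) for a in allocs)
    # Assumption: egalitarian optimum over the other agents only;
    # max() keeps the first maximizer in enumeration order.
    egal_alloc = max(allocs, key=lambda a: _egal(values, a))
    return (best_total - W,
            _total(values, egal_alloc) - W,
            _egal(values, egal_alloc) - W_eg)
```

Enumerating all assignments is exponential, so this is a reference implementation for checking small examples rather than a practical evaluator.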
The results are shown in Table IX. The learned payment rule $p_w$ yields significantly lower regret than any of the VCG-based payment rules, with average ex post regret below 0.074 for values normalized to [0, 1]. Since the outcome rule does not maximize the sum of the agents' values, it is not surprising that VCG-based payment rules perform rather poorly. The learned payment rule $p_w$ can adjust to the outcome rule, and also achieves a low rate of ex post IR violation, at most 3%.

CONCLUSIONS
We have introduced a new paradigm for computational mechanism design, in which statistical machine learning is adopted to design payment rules for outcome rules, and have shown encouraging experimental results. The mechanism design domain can be multiparameter, and the outcome rules can be specified algorithmically and need not be designed for objectives that are separable across agents. Central to our approach is relaxing incentive compatibility as a hard constraint on mechanism design, adopting in its place the goal of minimizing expected regret while requiring agent-independent prices. Future directions of interest include: (1) considering alternative learning paradigms, including formulations of the problem as a regression rather than a classification problem; (2) developing formulations that can impose constraints on properties of the learned payment rule, concerning, for example, the core, budgets, or individual-rationality properties; (3) developing methods that learn classifiers that induce feasible outcome rules, so that these learned outcome rules can be used directly; (4) extending the approach to domains without money by developing a structure on discriminant functions appropriate to the incentive considerations facing rational self-interested agents in such domains; (5) investigating the extent to which alternative goals can be achieved through machine learning, such as regret percentiles (maximizing the probability that the ex post regret is no greater than some amount λ > 0), or directly minimizing the expected interim regret; and (6) exploring alternate attribute maps (e.g., it would be interesting to adopt attributes that encode economic concepts such as the total externality imposed on others by assigning a bundle of items to agent 1), kernels, and succinct valuation representations.

A. EFFICIENT COMPUTATION OF INNER PRODUCTS
For both $\chi_1$ and $\chi_2$, computing inner products reduces to the question of whether inner products between valuation profiles are efficiently computable. For $\chi_1$, we have that
We next develop efficient methods for computing the inner products $\langle \theta_i, \theta'_i \rangle$ on compactly represented valuation functions. The computation of $\langle \theta_i \setminus o_1, \theta'_i \setminus o_1 \rangle$ can be done through similar methods.
In the single-minded setting, let $\theta_i$ correspond to a bundle $S_i \subseteq \{1, \ldots, r\}$ of items with value $v_i$, and let $\theta'_i$ correspond to a set $S'_i \subseteq \{1, \ldots, r\}$ of items valued at $v'_i$.
Each set containing both $S_i$ and $S'_i$ contributes $v_i v'_i$ to $\theta_i^{\top} \theta'_i$, while all other sets contribute 0. Since there are exactly $2^{r - |S_i \cup S'_i|}$ sets containing both $S_i$ and $S'_i$, we have
$$\theta_i^{\top} \theta'_i = 2^{\,r - |S_i \cup S'_i|} \cdot v_i v'_i .$$
This is a special case of the formula for the multi-minded case.
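A quick numerical check of the single-minded formula, as a minimal sketch: the valuation vector is materialized over all $2^r$ bundles, encoded as bitmasks over items $0, \ldots, r-1$ (an indexing choice of ours rather than the text's $1, \ldots, r$), and the brute-force inner product is compared with the closed form. Function names are hypothetical.

```python
from typing import List, Set

def single_minded_vector(r: int, S: Set[int], v: float) -> List[float]:
    """Entry for bundle B (a bitmask over r items) is v if S is a subset of B, else 0."""
    mask = sum(1 << j for j in S)
    return [v if (B & mask) == mask else 0.0 for B in range(1 << r)]

def single_minded_inner_brute(r: int, S: Set[int], v: float,
                              S2: Set[int], v2: float) -> float:
    """Inner product computed by summing over all 2^r bundles."""
    x, y = single_minded_vector(r, S, v), single_minded_vector(r, S2, v2)
    return sum(a * b for a, b in zip(x, y))

def single_minded_inner_closed_form(r: int, S: Set[int], v: float,
                                    S2: Set[int], v2: float) -> float:
    """2^(r - |S union S2|) * v * v2, as in the formula above."""
    return 2 ** (r - len(set(S) | set(S2))) * v * v2
```

With $r = 5$, $S_i = \{0, 1\}$, $v_i = 2$, $S'_i = \{1, 3\}$ and $v'_i = 1.5$, both functions return 12.0: four bundles contain $\{0, 1, 3\}$, each contributing $2 \cdot 1.5 = 3$.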
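LEMMA A.1. Consider a multi-minded CA and two bid vectors $x_1$ and $x'_1$ corresponding to sets $\mathcal{S} = \{S_1, \ldots, S_s\}$ and $\mathcal{S}' = \{S'_1, \ldots, S'_t\}$, with associated values $v_1, \ldots, v_s$ and $v'_1, \ldots, v'_t$. Then,
$$\langle x_1, x'_1 \rangle = \sum_{\emptyset \neq T \subseteq \mathcal{S}} \; \sum_{\emptyset \neq T' \subseteq \mathcal{S}'} (-1)^{|T| + |T'|} \cdot 2^{\,r - \left| \left( \bigcup_{S_i \in T} S_i \right) \cup \left( \bigcup_{S'_j \in T'} S'_j \right) \right|} \cdot \min_{S_i \in T} v_i \cdot \min_{S'_j \in T'} v'_j . \qquad (6)$$

PROOF. The contribution of a particular bundle $B$ of items to the inner product is $\left( \max_{S_i \in \mathcal{S},\, S_i \subseteq B} v_i \right) \cdot \left( \max_{S'_j \in \mathcal{S}',\, S'_j \subseteq B} v'_j \right)$, where a maximum over an empty set is 0, and thus
$$\langle x_1, x'_1 \rangle = \sum_{B \subseteq \{1, \ldots, r\}} \left( \max_{S_i \in \mathcal{S},\, S_i \subseteq B} v_i \right) \cdot \left( \max_{S'_j \in \mathcal{S}',\, S'_j \subseteq B} v'_j \right).$$
By the maximum-minimums identity, which asserts that for any set $X = \{x_1, \ldots, x_n\}$ of $n$ numbers, $\max\{x_1, \ldots, x_n\} = \sum_{\emptyset \neq Z \subseteq X} (-1)^{|Z|+1} \cdot \min_{x_i \in Z} x_i$, the inner product can be written as
$$\langle x_1, x'_1 \rangle = \sum_{B \subseteq \{1, \ldots, r\}} \; \sum_{\substack{\emptyset \neq T \subseteq \mathcal{S} \\ \bigcup_{S_i \in T} S_i \subseteq B}} \; \sum_{\substack{\emptyset \neq T' \subseteq \mathcal{S}' \\ \bigcup_{S'_j \in T'} S'_j \subseteq B}} (-1)^{|T| + |T'|} \cdot \min_{S_i \in T} v_i \cdot \min_{S'_j \in T'} v'_j .$$
Finally, for given $T \subseteq \mathcal{S}$ and $T' \subseteq \mathcal{S}'$, there exist exactly $2^{r - |(\bigcup_{S_i \in T} S_i) \cup (\bigcup_{S'_j \in T'} S'_j)|}$ bundles $B$ such that $\bigcup_{S_i \in T} S_i \subseteq B$ and $\bigcup_{S'_j \in T'} S'_j \subseteq B$; exchanging the order of summation, we obtain (6).

If $\mathcal{S}$ and $\mathcal{S}'$ have constant size, then the sum on the right-hand side of (6) ranges over a constant number of sets and can be computed efficiently.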
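To make Lemma A.1 concrete, the sketch below evaluates the right-hand side of (6) by explicit inclusion-exclusion and compares it with a brute-force sum over all $2^r$ bundles. Assumptions: items are indexed $0, \ldots, r-1$, bundles are encoded as bitmasks, each bid vector is a list of (set of items, value) pairs, a bundle's value is the largest value among the bids it contains (0 if none), and all function names are hypothetical. The inclusion-exclusion loop visits $(2^s - 1)(2^t - 1)$ pairs of subsets regardless of $r$, which is where the efficiency for constant $s$ and $t$ comes from.

```python
import itertools
from typing import Iterator, List, Set, Tuple

Bids = List[Tuple[Set[int], float]]   # hypothetical: list of (bundle, value)

def _mask(S: Set[int]) -> int:
    return sum(1 << j for j in S)

def xor_value(bids: Bids, B: int) -> float:
    """Multi-minded value of bundle B: best bid contained in B, 0 if none."""
    return max((v for S, v in bids if (_mask(S) & B) == _mask(S)), default=0.0)

def xor_inner_brute(r: int, bids1: Bids, bids2: Bids) -> float:
    """Left-hand side of (6): sum over all 2^r bundles."""
    return sum(xor_value(bids1, B) * xor_value(bids2, B) for B in range(1 << r))

def _nonempty_subsets(seq: Bids) -> Iterator[Tuple]:
    for k in range(1, len(seq) + 1):
        yield from itertools.combinations(seq, k)

def xor_inner_lemma(r: int, bids1: Bids, bids2: Bids) -> float:
    """Right-hand side of (6): inclusion-exclusion over nonempty T, T'."""
    total = 0.0
    for T in _nonempty_subsets(bids1):
        for T2 in _nonempty_subsets(bids2):
            union = set().union(*(S for S, _ in T), *(S for S, _ in T2))
            sign = (-1) ** (len(T) + len(T2))
            total += (sign * 2 ** (r - len(union))
                      * min(v for _, v in T) * min(v for _, v in T2))
    return total
```

With a single bid on each side, `xor_inner_lemma` reduces to the single-minded formula $2^{r - |S_i \cup S'_i|} v_i v'_i$.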

B. GREEDY ALLOCATION RULE IS NOT WEAKLY MONOTONE
Consider a setting with a single agent and four items.
If the valuations $\theta_1$ of the agent are