Specification faithfulness in networks with rational nodes

It is useful to prove that an implementation correctly follows a specification. But even with a provably correct implementation, given a choice, would a node choose to follow it? This paper explores how to create distributed system specifications that will be faithfully implemented in networks with rational nodes, so that no node will choose to deviate. Given a strategyproof centralized mechanism, and given a network of nodes modeled as having rational-manipulation faults, we provide a proof technique to establish the incentive-, communication-, and algorithm-compatibility properties that guarantee that participating nodes are faithful to a suggested specification. As a case study, we apply our methods to extend the strategyproof interdomain routing mechanism proposed by Feigenbaum, Papadimitriou, Sami, and Shenker (FPSS) [7], defining a faithful implementation.


OVERVIEW
This paper considers how to create provably faithful specifications that are implemented on networks of rational nodes. In these networks, a node acts in self-interested fashion to better its outcome in some distributed mechanism.
It is not hard to find evidence of rational behavior in existing distributed systems. Internet users can game their TCP settings to obtain better service at the expense of others [27]. Users cheat in distributed computations in order to drive up their "contributed computation" credit [15]. There is interesting work documenting the "free rider" problem [1] and the "tragedy of the commons" [12] in a data-centric peer-to-peer setting.
This behavior can be classified as a distinct type of failure, one that stands apart from the other failure types studied in distributed systems. It indicates an underlying incentive problem in a system's design when the system runs on a network with rational nodes. Whereas traditional failure models are overcome by relying on redundancy, rational manipulation can also be overcome with design techniques such as problem partitioning, catch-and-punish, and incentives. In such a network one can state a strong claim about the faithfulness of each node's implementation. This claim of faithfulness, like traditional distributed systems claims (e.g., safety and liveness [35]), is made with particular assumptions about the knowledge available to network participants. Typical distributed systems knowledge assumptions include node failure characteristics: for instance, a given specification might state safety properties on the assumption that no link failures will occur in the network. The knowledge assumptions particularly relevant to our scenario are drawn from economics and known as equilibrium concepts. This paper focuses on (and justifies) the use of ex post Nash (without collusion) as a reasonable knowledge assumption. The ex post Nash solution concept does not require nodes to have any knowledge of the private information of other nodes, but does assume that nodes are rational utility-maximizers and model other participants as such.
When one can prove that a specification will be faithfully followed by rational nodes in a distributed network, one can certify the system to be incentive-, communication-, and algorithm-compatible (IC, CC and AC). Such a system is provably robust against rational manipulation.
We introduce a general decomposition proof technique that splits a distributed algorithm into disjoint phases, each of which can be proven IC, CC, and AC by showing that a node cannot benefit from any combination of deviations from the specification relevant to that phase. To ensure that one phase is disjoint from the next, a phase is certified before the subsequent phase begins. To demonstrate this decomposition technique, we modify a well-defined lowest-cost interdomain routing problem proposed by Feigenbaum, Papadimitriou, Sami, and Shenker (FPSS) [7], to create a specification that is ex post Nash incentive-, communication-, and algorithm-compatible. Unlike the original FPSS, we do not assume that nodes will be faithful in their computation or message passing.

RELATED WORK
The ideal prerequisites and reference reading for this paper are found in the distributed systems and algorithmic mechanism design literature. A "short list" in distributed systems would include a paper characterizing failure models [29], as well as an introduction to specifications and their proof techniques [35]. The economics subfield of mechanism design (MD) studies how to build systems that exhibit good behavior in equilibrium, when self-interested nodes pursue self-interested strategies. Readers unfamiliar with the fundamentals of mechanism design may wish to seek out a concise introduction [24,13].
Algorithmic mechanism design (AMD) [21,22] assumes centralized decision making, with nodes reporting complete private information to a center, but seeks to make the central computation tractable. Indirect mechanisms, on the other hand, provide an increased computational role to nodes, but still remain largely centralized in that nodes are directly connected to a center and are limited to what we term "information-revelation" actions [24].
Distributed algorithmic mechanism design (DAMD) [7,8] considers MD in a network setting with no center, and distributes computation across the self-interested nodes. DAMD is full of new challenges: because rational players control the message paths and the mechanism computation, one can no longer assume an obedient networking and mechanism infrastructure. In concluding their seminal work on AMD, Nisan and Ronen noted the "set of problems" that come in implementing a mechanism in a network [23], suggesting that cryptography and distributed payment handling be considered. Feigenbaum et al. [7] identify the problem in DAMD as "the need to reconcile the strategic model with the computational model." Our work attempts a comprehensive treatment of rational manipulation in distributed systems, and provides a framework and a way of reasoning about faithful behavior in mechanisms. Specifically, we are concerned with bringing all aspects of the distributed algorithm itself into an equilibrium. In a companion paper focused on problems in distributed AI [25], we consider a variant on the model in this paper in which there is still a center and trusted communication with nodes, but in which the goal is to off-load as much of the computation as possible onto nodes. Some of the general principles (the partition, information-revelation, and redundancy principles) in that work also prove useful when dealing with fully-distributed implementations on networks. The specific contribution here is to provide general proof techniques and to extend the ideas to apply to networks without a center.
A number of research projects can be viewed as foreshadowing aspects of achieving general algorithm faithfulness. Some TCP research has focused on modifying the specification and introducing obedient participants to bring faithful computation into line [27]. Other work has assumed trusted communication (or a totally connected communication graph), but no trusted entity to perform computation, and made heavy use of cryptography [3]. In mobile networks, some work has looked at achieving faithful message passing in resource-constrained environments [36,14], through the use of payments and penalties. Message passing has also been studied in the context of an auction over a peer-to-peer network, with a centralized obedient auctioneer but nodes that may wish to drop or change bids from their neighbors [20].

RATIONAL MANIPULATION
Imagine that a designer specifies a leader-election algorithm to select a computation server in a network whose nodes are distributed across many administrative domains. The winner of this leader election is responsible for running some CPU-intensive task. The designer wants the most powerful node to be selected and specifies an algorithm where each node is to submit its true computation power and then come to a distributed consensus as to which node should be leader. The designer provides a correct implementation, but is dismayed to find that in practice, the protocol fails to elect the most powerful node. What has gone wrong?
In this toy election problem, it is possible that nodes (representing users) do not want to participate faithfully in the distributed algorithm. By truthfully revealing its computational power and following the distributed election protocol, a node is in danger of being tasked with a CPU-intensive chore that would take resources away from local jobs. The selfish administrator of that node might like to change the election-protocol code to execute something other than the code provided by the system designer.
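The incentive problem in the toy election can be made concrete with a minimal sketch. The node names, power values, chore cost, and the simple utility model (an elected node bears the chore cost, every other node bears nothing) are illustrative assumptions, not part of the original example.

```python
def elect_leader(reported_power):
    """Suggested rule: the node reporting the highest power is elected."""
    return max(reported_power, key=reported_power.get)


def utility(node, leader, chore_cost):
    """Assumed utility: the elected node bears the chore cost, others bear none."""
    return -chore_cost if node == leader else 0.0


# Hypothetical true computation power of three nodes, and a hypothetical chore cost.
true_power = {"a": 10, "b": 6, "c": 3}
CHORE_COST = 5.0

# Everyone reports truthfully: the most powerful node, "a", is elected.
truthful_leader = elect_leader(true_power)

# Node "a" under-reports its power to avoid the chore.
misreport = dict(true_power, a=0)
manipulated_leader = elect_leader(misreport)

u_truthful = utility("a", truthful_leader, CHORE_COST)
u_manipulated = utility("a", manipulated_leader, CHORE_COST)
print(truthful_leader, manipulated_leader, u_truthful, u_manipulated)
```

Under these assumptions node "a" strictly prefers to misreport, so the correct implementation no longer elects the most powerful node.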
Researchers have previously characterized the nuances of node failure according to observed behavior and failure remedy [28,17,5,9]. Into the traditional taxonomy that ranges from Failstop to Byzantine, it is appropriate to introduce rational manipulation as a class of system failure. Extending the typical distributed systems failure taxonomy to include rational manipulation is justified for several reasons:
• The Internet is already showing evidence of rational manipulation in algorithms that were not designed to handle this type of failure. An anecdotal list appears in previous work [30].
• The behavior of a node that is deviating from a specification for selfish reasons would currently be classified in distributed systems failure taxonomies as a subset of Byzantine behavior. However, rational failures are predictable and motivated because a node will only manipulate in order to increase its own utility in the mechanism. This provides new opportunities for designing against failure, through tools such as incentives and careful problem partitioning.
• It is either suboptimal, or impossible, to use Byzantine Fault Tolerance (BFT) techniques to build systems robust to rational-manipulation failure [30]. BFT requires minimum levels of obedient connectivity and computation to work [18], whereas we might want to design systems in which every participant is capable of rational manipulation. BFT algorithms can be suboptimal in the sense that they require a large processing overhead.

Modeling Node Behavior
Before defining a distributed mechanism specification we need a language for specifications. This language will also make clear the range of behaviors available to a node. We find it useful to consider a mechanism specification expressed in terms of behaviors generated by state machines [35]. A state machine $SM$ consists of the following components:
1. A set $L$ of states, a subset of which are initial states.
Given this state machine $SM$, a specification $s : L \to A$ defines an action $s(l) \in A$ for each state $l \in L$. A node's state captures all relevant information about its role in a mechanism. For instance, the state will include received messages, partial computations, private knowledge about itself, and derived or estimated knowledge about other nodes and the world. External actions in a distributed computational system represent actions with some external effect; these actions generate a message to one or more neighbors. These messages can represent the results of calculations, messages forwarded from other nodes, or simply contain information about this node. Internal actions are those that do not generate a message. Internal actions can eventually cause an external action to occur.

Traditional Mechanism Design
State machines are a good way to describe a mechanism specification. Traditional mechanism design (MD), however, often assumes that the only actions available to a node are information revelation actions. Here, a node is allowed to provide input into a center, which is often described as a function from everybody's information revelation to some mechanism outcome.
In mechanism design language, consider a system with nodes $i \in I$. There are $N$ nodes altogether. Traditional MD considers an implementation problem, in which nodes have private information $\theta_i \in \Theta_i$ (often referred to as the type of a node) and the goal is to implement an outcome $f(\theta) \in O$ with useful properties (as defined by the designer), from a set of feasible outcomes $O$. A node's type defines all relevant information that pertains to the outcome decision, as well as capturing information about its preferences for different outcomes. The notation $\theta_{-i} = (\theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_N)$ denotes the type vector without node $i$.
A centralized (often called a direct-revelation) mechanism $M = (f, \Theta)$ asks nodes to report types $\hat{\theta} \in \Theta = \Theta_1 \times \ldots \times \Theta_N$ to a trusted obedient center that then selects the outcome $f(\hat{\theta})$. Nodes need not be truthful, but are instead modeled as game-theoretic utility-maximizers (rational), with a utility function $u_i(o; \theta_i) \in \mathbb{R}$ that induces a preference ordering such that $u_i(o_1; \theta_i) < u_i(o_2; \theta_i)$ implies that node $i$ prefers $o_2$ to $o_1$. Incentives are provided so that node $i$ chooses to report $\hat{\theta}_i = \theta_i$ in equilibrium.
In this model, the center can collect type information from nodes without interference from other nodes, and then compute the outcome, report the outcome to the nodes, and finally enforce the outcome. But, in problems such as leader election there is no trusted center and we need to involve the self-interested nodes themselves in computing, communicating, and enforcing the outcome of the mechanism.
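As an illustration of the centralized model, the following sketch brute-forces the strategyproofness condition for a small direct-revelation mechanism over a finite type space. The single-item second-price and first-price rules, the type space, and all function names are assumptions chosen for the example; they are not mechanisms discussed in this paper.

```python
from itertools import product


def pick_winner(reports):
    """Highest report wins; ties go to the lowest-indexed node."""
    return max(range(len(reports)), key=lambda i: (reports[i], -i))


def second_price(reports):
    """Outcome rule f: winner pays the highest report among the other nodes."""
    winner = pick_winner(reports)
    price = max(r for j, r in enumerate(reports) if j != winner)
    return winner, price


def first_price(reports):
    """Outcome rule f: winner pays its own report."""
    winner = pick_winner(reports)
    return winner, reports[winner]


def utility(i, outcome, theta_i):
    """u_i(o; theta_i): value minus price for the winner, zero otherwise."""
    winner, price = outcome
    return theta_i - price if winner == i else 0


def is_strategyproof(f, types, n):
    """Check that truthful reporting is a dominant strategy for every node.

    In each profile, entry i is node i's true type and the remaining
    entries are treated as arbitrary reports by the other nodes.
    """
    for profile in product(types, repeat=n):
        for i in range(n):
            truthful = utility(i, f(profile), profile[i])
            for lie in types:
                deviated = profile[:i] + (lie,) + profile[i + 1:]
                if utility(i, f(deviated), profile[i]) > truthful:
                    return False
    return True


print(is_strategyproof(second_price, (0, 1, 2, 3), 3))
print(is_strategyproof(first_price, (0, 1, 2, 3), 3))
```

The check relies on the trusted center computing `f` faithfully, which is exactly the assumption that is lost when no center exists.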

Distributed Mechanism Specification
Distributed mechanism design should encompass actions beyond private information revelation. Rather than discussing a node's reported type $\hat{\theta}_i$, it makes sense to talk of a node's strategy $s(\theta)$, which captures how it behaves in all states of the world. Rather than an outcome rule $f(\hat{\theta}) \in O$ that depends on the reported type information, we now must speak of an outcome rule $g(s(\theta)) \in O$ that depends on the sequence of actions taken by a node. We now present the distributed state machine description in an alternate form, in which the actions, states, and transitions are subsumed by strategy and outcome function. It is helpful to think of the suggested strategy $s_i^m \in \Sigma_i$ as the algorithm that the designer would like node $i$ to follow. Strategy $s_i^m$ is conditioned on the type $\theta_i$ of a node, with $s_i^m(\theta_i)$ defining the specification of the action that node $i$ with type $\theta_i$ should take in each state (within the state-machine model of node behavior). The feasible strategy space, $\Sigma_i$, places no constraints on internal actions but can constrain external actions. Outcome rule $g(s(\theta)) \in O$ describes the outcome when nodes follow strategy $s$ and have types $\theta$. Suggested strategy $s_i^m$ decomposes into three strategies, $s_i^m = (r_i^m, p_i^m, c_i^m)$, with information-revelation strategy $r_i^m$, message-passing strategy $p_i^m$, and computational strategy $c_i^m$. Each sub-strategy is responsible for generating one of three kinds of external actions (those corresponding with the sub-strategy), which we formally define in Section 3.4.
Formally, we can model this as each strategy simulating the entire specification, $s_i^m(\theta_i)$, but only performing its corresponding external actions. Notice that because only one action is taken in each state, no pair of sub-strategies will engage in multiple external actions simultaneously.
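The decomposition of a suggested strategy into three sub-strategies can be sketched as follows. The state names, action payloads, and the tagging of each action with a kind are hypothetical; the sketch only illustrates the idea that each sub-strategy simulates the whole specification and performs only its own kind of external action.

```python
from typing import Callable, Optional, Tuple

# Kinds of external action named in the text; "internal" actions send no message.
KINDS = ("reveal", "pass", "compute")

# Hypothetical suggested specification: one action per state, tagged with its kind.
SUGGESTED_SPEC = {
    "start": ("reveal", "announce own transit cost"),
    "got_neighbor_msg": ("pass", "forward neighbor message"),
    "have_all_costs": ("compute", "send computed routing table"),
    "idle": ("internal", "update local state"),
}


def suggested_spec(state: str) -> Tuple[str, str]:
    """The full specification: maps a state to a (kind, action) pair."""
    return SUGGESTED_SPEC[state]


def sub_strategy(kind: str) -> Callable[[str], Optional[str]]:
    """Simulate the whole specification, but act only on actions of one kind."""
    def act(state: str) -> Optional[str]:
        action_kind, action = suggested_spec(state)
        return action if action_kind == kind else None
    return act


# r, p, c play the roles of the information-revelation, message-passing,
# and computational sub-strategies.
r, p, c = (sub_strategy(kind) for kind in KINDS)

for state in SUGGESTED_SPEC:
    performed = [a for a in (r(state), p(state), c(state)) if a is not None]
    print(state, performed)
```

Because the specification assigns a single action to each state, at most one of the three sub-strategies performs an external action in any state.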
Given a distributed mechanism specification we are interested in understanding whether nodes have any incentive to deviate from the suggested actions.

Action Classification
The leader election example shows the three components of a strategy: information revelation, in providing an input to the election algorithm; message passing between nodes; and computation, in following the consensus algorithm. We now provide a formal classification of the external actions in a mechanism specification.
Information Revelation. A node may be asked to reveal information about its private type, such as its computational power in the leader-election setting. We also extend the notion of type to include semi-private type information, a notion that is useful in distributed systems. The type $\theta_i$ in traditional MD is usually viewed as private information to a node. Alternatively, some information can be common knowledge to all nodes or some subset of nodes. Semi-private type information exists when some subset of the type of a node is known to at least one other node, but not to all nodes. A good example is network topology: we can imagine that the link between node A and node B is common knowledge to both nodes, while other type information (such as node transit cost) remains private, and other nodes need not know about the existence of this link.
Definition 2. External actions $r_i \in EA$ are information-revelation actions when the only effect is to reveal consistent (perhaps partial and untruthful) information about a node's type to some other node(s).
Here, we use consistent to mean that there is a single type $\hat{\theta}_i$ that would have sent these messages in the suggested specification.
We can provide a careful definition of the information-revelation actions in the suggested specification (footnote 2). Suppose for exposition that all nodes except node $i$ follow the suggested specification. Then, the information-revelation actions for node $i$ in $s_i^m(\theta_i)$ are those for which any deviation on any subset of the actions will do no more than implement an outcome that would be selected by following the suggested specification for some (perhaps untruthful) type $\hat{\theta}_i$; that is, $g(s_i'(\theta_i), s_{-i}^m(\theta_{-i})) = g(s_i^m(\hat{\theta}_i), s_{-i}^m(\theta_{-i}))$ for some $\hat{\theta}_i$, for all $s_i'$ that deviate from $s_i^m(\theta_i)$ only in information-revelation actions, and for all $\theta_i$ and all $\theta_{-i}$. Thus, information-revelation actions provide a node with no more power to manipulate than that available to a node in a centralized mechanism (footnote 3).
Message passing. A node may be asked to pass messages as part of the mechanism computation. For instance, if two nodes are communicating over a logical direct connection, there may still be rational nodes acting as part of the physical underlay.
Definition 3. External actions $p_i \in EA$ are message-passing actions when the only effect is to send a message, received from another node, to one (or more) neighbors.

Computation. A node may be asked to take part in a mechanism calculation. For instance, a node in the lowest-cost interdomain-routing setting can be asked to perform part of a distributed computation to determine the lowest-cost path, or to determine payments.
Definition 4. External actions $c_i \in EA$ are computational actions when the action can affect the outcome rule used in the distributed mechanism specification (and when the action is more than simple message-passing).
Computational actions have a wider effect than simply forwarding a message or revealing type information. Unlike information-revelation actions, these computational actions introduce opportunities for a node to affect the outcome of a mechanism that do not exist in centralized mechanisms! By definition, a computational action is one for which there is at least one deviation from the suggested specification that will implement an outcome that would not be selected by $g(s^m(\hat{\theta}_i, \theta_{-i}))$ for any report $\hat{\theta}_i$ by the node.
Footnote 2: Notice that we need to rule out the possibility that an agent can provide inconsistent information about its type, for example different information to different neighbors, because a deviation from any number of information-revelation actions must have no more effect than that of misreporting the agent's type and following the suggested specification.
Footnote 3: The definition excludes actions in which useful computation is also "smuggled" within the message, for example "solve problem P1 (and report the solution) if your value is v1 and solve problem P2 (and report the solution) if your value is v2." We classify these kinds of actions as computational actions when the solutions to P1 or P2 can change the outcome rule $g$, and not just the information that a node reveals during an implementation.

Knowledge Assumptions
A formal definition of rational manipulation requires that we are clear about the knowledge assumptions that we, as designers, make of nodes in a system. In an economic context, these knowledge assumptions must support the equilibrium solution concept that is adopted to model the behavior of rational nodes.
In traditional MD, it is common to design for a dominant-strategy equilibrium. Recall that a traditional mechanism $M = (f, \Theta)$ implements outcome $f(\hat{\theta}) \in O$ based on reports $\hat{\theta}$, perhaps untruthful. A mechanism is strategyproof when truth-revelation is a dominant-strategy equilibrium.
Strategyproofness is particularly useful because it makes an extremely weak knowledge assumption. Nodes need not know the types of other nodes, and nodes need not even believe that other nodes will be rational. In this work we adopt ex post Nash equilibrium as our solution concept, which requires correspondingly stronger knowledge assumptions.
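Formally, a strategy profile $s^* = (s_1^*, \ldots, s_N^*)$ is an ex post Nash equilibrium of a distributed mechanism with outcome rule $g$ when $u_i(g(s_i^*(\theta_i), s_{-i}^*(\theta_{-i})); \theta_i) \ge u_i(g(s_i'(\theta_i), s_{-i}^*(\theta_{-i})); \theta_i)$, for all nodes, for all $s_i' \ne s_i^*$, for all types $\theta_i$, and for all types $\theta_{-i}$ of other nodes.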
In an ex post equilibrium no node would like to deviate from its strategy even if it knows the private type information of the other nodes. Thus, as designers we can be agnostic as to whether or not nodes have any knowledge about the private type of other nodes. The main assumption when adopting ex post Nash is that the rationality of nodes is common knowledge amongst nodes. Although a stronger assumption than required for a strategyproof mechanism, we view this as a necessary cost in moving away from centralized computation on a trusted node. Nodes are now involved in implementing the rules of a mechanism, and it seems unlikely that arbitrary deviations by other nodes will still sustain the appropriate incentives for a node to behave faithfully.
The knowledge assumption in ex post Nash is still much weaker than that required in the more standard Nash equilibrium solution concept, which has received greater attention in recent literature on network games [26]. Adopting this notion of Nash equilibrium in our setting would require a node to have knowledge of other nodes' private information, which is usually unrealistic.
Remark 1. We assume that nodes, although self-interested, are also benevolent in the sense that a node will implement the suggested strategy as long as it does not strictly prefer some other strategy. Thus, a weak ex post Nash equilibrium (in which a node can have other equally good best-responses) is considered sufficient for a faithful implementation.

Remark 2. A distributed mechanism may have multiple equilibria, but we are content to achieve implementation in but one of these equilibria. We agree with Brafman and Tennenholtz [2] that the fact that we are considering computational systems makes this assumption more palatable. The typical problem that arises with multiple equilibria is that of selection: how can nodes select the same equilibrium? By distributing an implementation of a suggested specification, it is reasonable to expect that some nodes will be obedient and follow the suggested specification. This acts as a correlating device, preventing other nodes from playing another equilibrium.
Remark 3. Although truth-revelation may remain a dominant strategy for nodes given that all nodes follow the suggested computational and message-passing actions, in equilibrium a rational node must also reason about whether or not other nodes will follow these suggested computational and message-passing actions. Thus, the equilibrium solution concept must adopt the "lowest common denominator," which is ex post Nash in our model.
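The gap between ex post Nash and dominant-strategy equilibrium, and the possibility of multiple equilibria, can be illustrated with a small brute-force check. The two-node game below is a hypothetical construction: a deviating node gains its private type $\theta_i$, and is caught and penalized only if the other node follows the specification. For simplicity each strategy is a single action that does not depend on type, which is an assumption of the sketch rather than of the model.

```python
from itertools import product

PENALTY = 5                      # hypothetical penalty for a caught deviation
TYPES = (1, 2)                   # hypothetical private gains from deviating
ACTIONS = ("follow", "deviate")


def utility(i, actions, theta_i):
    """A deviator gains theta_i, minus the penalty if the other node follows."""
    if actions[i] == "follow":
        return 0
    caught = actions[1 - i] == "follow"
    return theta_i - PENALTY if caught else theta_i


def replace(actions, i, action):
    """Return a copy of the action profile with node i's action swapped."""
    changed = list(actions)
    changed[i] = action
    return tuple(changed)


def is_ex_post_nash(actions):
    """No node gains by a unilateral deviation, whatever the types turn out to be."""
    for theta in product(TYPES, repeat=2):
        for i in (0, 1):
            for alt in ACTIONS:
                if utility(i, replace(actions, i, alt), theta[i]) > utility(i, actions, theta[i]):
                    return False
    return True


def is_dominant(i, action):
    """The action is a best response to every action of the other node."""
    for theta_i, other in product(TYPES, ACTIONS):
        base = replace((other, other), i, action)
        for alt in ACTIONS:
            if utility(i, replace(base, i, alt), theta_i) > utility(i, base, theta_i):
                return False
    return True


print(is_ex_post_nash(("follow", "follow")))
print(is_dominant(0, "follow"))
print(is_ex_post_nash(("deviate", "deviate")))
```

In this game following the specification is an ex post Nash equilibrium but not a dominant strategy, and mutual deviation is a second equilibrium, which is the selection problem raised in Remark 2.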

Rational Manipulation
We use the term rational node to describe a node that attempts rational manipulation.

Definition 7. A node exhibits rational manipulation if it fails to implement the suggested specification in an attempt to selfishly better its outcome in a distributed mechanism, given a particular knowledge assumption about other nodes.
Formally, if $s_i^m \in \Sigma_i$ is the suggested strategy, and if $s_{-i}$ is the set of strategies that node $i$ believes all other nodes will follow, then a node exhibits rational manipulation if it follows some alternate strategy $s_i \ne s_i^m$ for which $u_i(g(s_i, s_{-i}); \theta_i) > u_i(g(s_i^m, s_{-i}); \theta_i)$. Nodes are actively working to change the actions, transitions, and states in both their internal state machine and in the state machines of other nodes (through the effect of messages sent to these nodes) for selfish reasons.
We are seeking specifications with the following property:
Definition 8. Distributed mechanism specification $dM = (g, \Sigma, s^m)$ is an (ex post) faithful implementation of outcome $g(s^m(\theta)) \in O$ when suggested strategy $s^m$ is an ex post Nash equilibrium.

Useful Properties: IC, CC and AC
We introduce communication compatibility and algorithm compatibility as ways of describing mechanisms tolerant to rational manipulation. We also extend the idea of incentive compatibility, found in the mechanism design literature, to allow for incremental information-revelation.
Each statement can be defined for a particular equilibrium concept, which itself must be justified by a knowledge assumption. To keep these definitions concrete we adopt ex post Nash, which seems to be useful when considering distributed implementations of strategyproof mechanisms.
Definition 9. A distributed mechanism specification $dM = (g, \Sigma, s^m)$ is incentive compatible (IC) when there exists an ex post Nash equilibrium in which node $i$ cannot receive higher utility by deviating from the suggested information-revelation strategy $r_i^m(\theta_i)$, for all nodes $i$ and all types $\theta$.
Most commonly, the suggested information-revelation strategy for a node will expect the node to reveal truthful information about its private type through communication with other nodes. IC means that a rational node will choose to follow these actions.
Definition 10. A distributed mechanism specification $dM = (g, \Sigma, s^m)$ is communication compatible (CC) when there exists an ex post Nash equilibrium in which node $i$ cannot receive higher utility by deviating from the suggested message-passing strategy $p_i^m(\theta_i)$, for all nodes $i$ and all types $\theta$.
CC means that a rational node will choose to participate in the suggested message-passing actions within the distributed-mechanism specification.
Definition 11. A distributed mechanism specification $dM = (g, \Sigma, s^m)$ is algorithm compatible (AC) when there exists an ex post Nash equilibrium in which node $i$ cannot receive higher utility by deviating from the suggested computational strategy $c_i^m(\theta_i)$, for all nodes $i$ and all types $\theta$.
AC means that a rational node will choose to participate in the suggested computational actions within the distributed-mechanism specification. Properties IC, CC and AC are required for a faithful distributed implementation. Moreover, IC, CC and AC are sufficient for a faithful implementation when they hold in the same equilibrium.
Proof. IC, CC and AC provide for the existence of an equilibrium in which nodes will follow suggested information-revelation $r_i^m$, and similarly for message-passing $p_i^m$ and computation $c_i^m$. To achieve faithfulness we simply need that there is an equilibrium that achieves each one of these simultaneously.

Strong AC and Strong CC
This section provides a general proof method to demonstrate specification faithfulness in networks with rational nodes. We define strong-AC and strong-CC, and show that together with the strategyproofness of the corresponding centralized mechanism, strong-AC and strong-CC provide IC, and in turn a faithful implementation.
We reduce the problem of proving (ex post Nash) faithfulness to that of:
1. Demonstrating that the corresponding centralized mechanism is strategyproof.
2. Strong-CC: a rational node should always follow its suggested message-passing strategy (whatever its information-revelation and computational actions).
3. Strong-AC: a rational node should always follow its suggested computational strategy (whatever its information-revelation and message-passing actions).
In fact, we will break up the proof further, by advocating a decomposition into disjoint mechanism phases.
Definition 12. A distributed mechanism specification $dM = (g, \Sigma, s^m)$ is strong-CC if a node cannot receive higher utility by deviating from the suggested message-passing actions $p_i^m$, whatever its computational and information-revelation actions, when other nodes follow the suggested specification.
Definition 13. A distributed mechanism specification $dM = (g, \Sigma, s^m)$ is strong-AC if a node cannot receive higher utility by deviating from the suggested computational actions $c_i^m$, whatever its message-passing and information-revelation actions, when other nodes follow the suggested specification.
Together, strong-CC and strong-AC rule out any useful joint deviations in which a node changes its communication, computational, and its information-revelation actions to gain an advantage.

Proposition 2. A distributed mechanism specification $dM = (g, \Sigma, s^m)$ is a faithful implementation of outcome $g(s^m(\theta))$ when the corresponding centralized mechanism is strategyproof and when the specification is strong-CC and strong-AC.
Proof. To prove that the specification is an ex post Nash equilibrium we first assume that every node except $i$ is following the suggested specification, $s_{-i}^m$. By strong-CC and strong-AC, node $i$ will follow the suggested message-passing and computational actions. (Notice that we can rule out joint deviations of both message-passing and computational actions.) To prove IC, we can now assume that all nodes follow the suggested computational and message-passing actions. Let $f(\theta) = g(s^m(\theta))$ denote the outcome rule in the corresponding centralized mechanism. By definition of information-revelation actions, the space of possible outcomes becomes $g(s_i^m(\hat{\theta}_i), s_{-i}^m(\theta_{-i}))$, but $g(s_i^m(\hat{\theta}_i), s_{-i}^m(\theta_{-i})) = f(\hat{\theta}_i, \theta_{-i})$, and $u_i(f(\theta_i, \theta_{-i}); \theta_i) \ge u_i(f(\hat{\theta}_i, \theta_{-i}); \theta_i)$ for all $\theta_{-i}$, all $\theta_i$, and all $\hat{\theta}_i \ne \theta_i$ by strategyproofness of $g(s^m(\theta)) = f(\theta)$.

Remark 4. In applying Proposition 2 one must be careful to ensure that actions labeled as "information-revelation" within the suggested specification satisfy the technical requirement of consistent information-revelation, which can require consistency checking.
Remark 5. Distributed implementations of mechanisms introduce a new issue not encountered in traditional centralized MD, which is that the outcome computed by nodes must be enforced (we call this the "execution phase" in the interdomain routing example). Typically, the execution actions that correspond with the outcome must themselves be shown to be strong-AC and strong-CC. In interdomain routing, this means that nodes choose to follow lowest-cost paths and choose to respect payments.

A General Proof Technique
For faithful adherence to a specification, we need to demonstrate strong-CC and strong-AC as well as the consistency of information-revelation actions. The following approaches are useful to this end: Break Into Phases. A distributed mechanism can be decomposed into disjoint phases, each of which is proven strong-CC and strong-AC without worrying about joint deviations involving actions in other phases. Phases are separated during runtime with checkpoints where some node(s) certify a phase outcome and start a subsequent phase. One must be sensitive to the added computational and communication complexity in using checkpoints. This decomposition technique is powerful because it can allow an exponential reduction in the number of joint manipulation actions that must be checked in a faithfulness proof. Tools for Strong-AC, Strong-CC and Consistency. Strong-AC, strong-CC, and information-revelation consistency can be achieved within a phase through the use of various techniques. Payments can be used to avoid untruthful information revelation, and also to provide incentives for nodes to perform faithful computation and message passing. Redundancy can be powerful, with computational and message-passing actions that deviate from a specification detected and penalized (via catch and punish). Catch and Punish can also be used without redundancy, when a subset of "checker" nodes can combine forces to completely monitor the behavior of a third node. Another technique is problem partitioning. At one extreme, partitioning could mean mean running the mechanism on nodes that cannot benefit from the mechanism outcome, with nodes split and one half computing the mechanism outcome for the other. More interestingly, we can also structure the computation so that a node is not involved in a calculation where it has a vested interest in the outcome [25]. 
Finally, cryptography can be used to make deviations from a specified algorithm detectable [3], and to make it impossible to rationally change a message, which can be useful for communication compatibility [20] (footnote 4).
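The exponential reduction claimed for phase decomposition can be illustrated with a simplified counting model. The model is an assumption of this sketch, not the paper's formal argument: each phase offers a node some number of deviating actions plus the faithful action, and a joint manipulation picks one action per phase. The function names are hypothetical.

```python
from math import prod

def joint_deviation_count(deviations_per_phase):
    """Joint manipulations to check when phases are NOT separated.

    Each phase contributes (deviations + 1) choices, the +1 being the
    faithful action; the single all-faithful combination is excluded.
    """
    return prod(d + 1 for d in deviations_per_phase) - 1

def per_phase_deviation_count(deviations_per_phase):
    """Manipulations to check when each phase is proven in isolation."""
    return sum(deviations_per_phase)

# Three phases with three deviating actions each (illustrative numbers).
phases = [3, 3, 3]
print(joint_deviation_count(phases))      # 63
print(per_phase_deviation_count(phases))  # 9
```

Under this model the joint count grows as a product over phases while the per-phase count grows as a sum, which is the sense in which checkpoints can shrink the proof obligation.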

EXAMPLE: INTERDOMAIN ROUTING
The remainder of the paper will focus on building a distributed mechanism specification that is faithful. The specification extends an interdomain routing distributed mechanism created by Feigenbaum, Papadimitriou, Sami, and Shenker (FPSS) [7]. FPSS is the first research to combine mechanism design ideas with a common Internet algorithm (the Border Gateway Protocol (BGP) interdomain routing protocol). The purpose of this section is to show how various techniques can be used to prove strong-CC and strong-AC in a real system.

FPSS Interdomain Routing
The Internet is composed of many separate domains known as autonomous systems (ASs) such as Harvard, Berkeley, Microsoft, etc. Each AS can be modeled as a rational node. The goal in FPSS is to maximize network efficiency by routing packets along lowest cost paths (LCPs) for various traffic source-destination pairs. Each node incurs a per-packet transit cost for transiting traffic on behalf of other nodes. The cost represents the additional load imposed by external traffic on the internals of an individual node. It costs nothing for a node to transit a packet originating or terminating at that node.
For instance, Figure 1 shows a network with the LCPs from Z to all other nodes drawn with bold lines. Numbers are the per-packet transit node costs incurred by each node. Assuming that the numbers in this figure represent true transit costs, the total LCP cost of sending a packet from X to Z is 2; the cost of sending a packet from Z to D is 1. The cost of sending a packet from B to D is 0 since there are no transit nodes between B and D.
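As a minimal sketch of the LCP cost model just described, the following computes lowest cost paths where only intermediate (transit) nodes contribute cost. Figure 1 is not reproduced here, so the topology and the transit costs below are assumptions chosen only to agree with the costs quoted in the text; the function name is hypothetical.

```python
import heapq

# Assumed topology and transit costs (NOT taken from Figure 1).
EDGES = [("X", "A"), ("A", "Z"), ("X", "D"), ("D", "C"),
         ("C", "Z"), ("B", "D"), ("B", "X")]
COST = {"X": 3, "A": 5, "Z": 4, "D": 1, "C": 1, "B": 2}

def lowest_cost_path(src, dst, cost, edges):
    """Dijkstra over node costs: a path pays only for its transit nodes."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    heap = [(0, [src])]          # (cost of transit nodes so far, path)
    settled = set()
    while heap:
        total, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            return total, path
        if node in settled:
            continue
        settled.add(node)
        # Leaving `node` makes it a transit node, unless it is the source.
        step = 0 if node == src else cost[node]
        for nxt in neighbors.get(node, []):
            if nxt not in settled:
                heapq.heappush(heap, (total + step, path + [nxt]))
    return None                  # dst unreachable

print(lowest_cost_path("X", "Z", COST, EDGES))  # (2, ['X', 'D', 'C', 'Z'])
print(lowest_cost_path("Z", "D", COST, EDGES))  # (1, ['Z', 'C', 'D'])
print(lowest_cost_path("B", "D", COST, EDGES))  # (0, ['B', 'D'])
```

Source and destination costs never enter the total, which matches the statement that a node pays nothing to transit packets it originates or terminates.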
To compensate a transit node for its routing services, each transit node is given a payment for carrying traffic. FPSS observes, however, that "under many pricing schemes, a node could be better off lying about its costs; such lying would cause traffic to take non-optimal routes and thereby interfere with overall network efficiency."
Example 1: In Figure 1, path X-D-C-Z is the lowest cost path between X and Z; if C declared a cost of 5, X-A-Z would become the X to Z LCP. C can benefit from this manipulation, even if it loses the X to Z traffic, if it can make up the financial loss with higher payments received by transiting D to Z traffic. This damages overall efficiency: packets from X to Z are now being routed over a path whose true cost is higher.
FPSS seeks a pricing scheme that is dominant strategy incentive compatible (strategyproof), meaning that nodes can do no better than to declare their true transit costs. They achieve this by using a Vickrey-Clarke-Groves (VCG) mechanism [34,11,6] where transit nodes are paid based on the utility that they bring to the routing system plus their declared cost. The FPSS algorithm is distributed; lowest cost paths (LCPs) and pricing tables are computed by each node using information from neighbors in an iterative calculation. Following the abstract model of the Border Gateway Protocol (BGP) proposed by Griffin & Wilfong (GW) [10], FPSS assumes a static environment. FPSS also assumes a biconnected graph to ensure that the VCG payments are well-defined. The abstract model of GW is extended to add additional information to the local state stored at nodes and to messages sent.
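The VCG payment described above can be sketched as follows: a transit node k on the LCP is paid its declared cost plus the difference between the cheapest path that avoids k and the LCP itself. This is a centralized, brute-force illustration rather than the distributed FPSS computation, and the topology, costs, and function names are assumptions (Figure 1 is not reproduced here).

```python
from itertools import permutations

# Assumed biconnected topology and true transit costs (not from Figure 1).
EDGES = {frozenset(e) for e in [("X", "A"), ("A", "Z"), ("X", "D"),
                                ("D", "C"), ("C", "Z"), ("B", "D"), ("B", "X")]}
TRUE_COST = {"X": 3, "A": 5, "Z": 4, "D": 1, "C": 1, "B": 2}

def lowest_cost_path(src, dst, cost, avoid=None):
    """Brute force over simple paths; returns (total transit cost, hops)."""
    others = [n for n in cost if n not in (src, dst, avoid)]
    best = None
    for r in range(len(others) + 1):
        for transit in permutations(others, r):
            hops = (src, *transit, dst)
            if all(frozenset(pair) in EDGES for pair in zip(hops, hops[1:])):
                total = sum(cost[n] for n in transit)
                if best is None or total < best[0]:
                    best = (total, hops)
    return best

def vcg_payment(k, src, dst, declared):
    """Per-packet payment to k, or None if k is not on the src-dst LCP."""
    total, hops = lowest_cost_path(src, dst, declared)
    if k not in hops[1:-1]:
        return None
    # Biconnectivity guarantees that a path avoiding k exists.
    detour, _ = lowest_cost_path(src, dst, declared, avoid=k)
    return declared[k] + detour - total

def utility(k, src, dst, declared, true_cost):
    """Payment received minus the true cost of carrying the packet."""
    payment = vcg_payment(k, src, dst, declared)
    return 0 if payment is None else payment - true_cost[k]

for lie in (1, 3, 5):  # C's declared cost; its true cost is 1
    declared = {**TRUE_COST, "C": lie}
    print(lie, utility("C", "X", "Z", declared, TRUE_COST))
```

In this assumed network, C's utility for X to Z traffic is the same whether it declares 1 or 3, and drops to zero once its declaration pushes traffic onto X-A-Z, which illustrates why misreporting does not help under the VCG payment.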
We find it useful to describe FPSS in terms of construction and execution phases. A construction phase deals with mechanism set-up, while execution deals with actual usage. Each node maintains three types of data for the mechanism construction phases: transit-cost information [DATA1], a routing table [DATA2], and a pricing table [DATA3]. In the first construction phase, the transit-cost information [DATA1] is constructed. In the second construction phase, routing and pricing tables are computed and stored as [DATA2] and [DATA3]. Nodes relay any changes of local data to their neighbors. Neighbors, in turn, update their local data with this new information and propagate changes to their neighbors. This continues until the information converges. Nodes can then use the mechanism to route traffic, and FPSS enters the execution phase. FPSS uses one additional type of data for mechanism execution:
• [DATA4] Payment list. Contains the amount of money that this node owes other nodes for having originated traffic that traversed them as transit nodes.
Each node is expected to use the pricing table correctly to calculate and store the payments that it owes transit nodes. This payment is compensation for requiring those nodes to transit packets originated locally. FPSS suggests that this list of total payments ([DATA4]) can be reported to an accounting and charging mechanism, which we call a bank.
This specification could be formalized with a state machine. The external actions already in the original FPSS would be included in this state machine, as follows: declaring the transit cost and providing connectivity information are information-revelation actions; relaying other nodes' transit-cost announcements is a message-passing action; and updating and forwarding routing and pricing tables are computation actions. A node's reporting of payments to the bank is an additional computation action, and message-passing actions exist for those nodes on the path to the bank. Message passing is also used during execution for routing along LCPs.
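As a rough sketch of how this classification might be recorded when formalizing the specification, the table below tags each external action with its type. The action names are hypothetical labels invented for this illustration; only the classification itself comes from the text, and the pairing of each type with the property to be proven follows the terminology used in this paper.

```python
from enum import Enum

class ActionType(Enum):
    INFORMATION_REVELATION = "information-revelation"
    MESSAGE_PASSING = "message-passing"
    COMPUTATION = "computation"

# Hypothetical action names; the classification follows the paragraph above.
EXTERNAL_ACTIONS = {
    "declare_transit_cost": ActionType.INFORMATION_REVELATION,
    "provide_connectivity_info": ActionType.INFORMATION_REVELATION,
    "relay_transit_cost_announcement": ActionType.MESSAGE_PASSING,
    "update_and_forward_routing_table": ActionType.COMPUTATION,
    "update_and_forward_pricing_table": ActionType.COMPUTATION,
    "report_payments_to_bank": ActionType.COMPUTATION,
    "relay_payment_report_toward_bank": ActionType.MESSAGE_PASSING,
    "forward_packet_along_lcp": ActionType.MESSAGE_PASSING,
}

# Property that a faithfulness proof must establish for each action type.
REQUIRED_PROPERTY = {
    ActionType.INFORMATION_REVELATION: "consistent information revelation",
    ActionType.MESSAGE_PASSING: "strong-CC",
    ActionType.COMPUTATION: "strong-AC",
}

def actions_of_type(kind):
    """Names of all external actions with the given type, sorted."""
    return sorted(name for name, t in EXTERNAL_ACTIONS.items() if t is kind)

for kind in ActionType:
    print(REQUIRED_PROPERTY[kind], actions_of_type(kind))
```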

Extending the FPSS Specification
In FPSS there is nothing to prevent nodes from rationally manipulating routing and pricing tables, or lying about connectivity information in the construction phase, or reporting inaccurate payments during execution (indeed, this was not their research goal). In addition, other nodes are in a position to intercept and selfishly modify payment tallies sent to the bank. In building a faithful specification based on FPSS, the key challenge is to prove strong-CC and strong-AC. (The corresponding centralized mechanism is already strategyproof for transit-cost declarations, and manipulations of the semi-private connectivity information will become apparent in the routing table calculations.) In unpublished work, Mitchell et al. [19] have explored the use of cryptographic signing between routing nodes to ensure truthful connectivity declarations, and have suggested that this technique could extend to mechanism computation as well. In our presentation, we will instead favor redundancy, catch and punish, and problem partitioning, and bring the entire specification (message passing, computation, and execution) into equilibrium. We, too, are forced to rely on a small amount of cryptographic signing (in our case, to ensure communication compatibility), but the use of the three other techniques keeps this signing requirement small. Given certain network topologies, it may be possible to eliminate signing altogether (footnote 5).
We introduce a new role for nodes, that of checker nodes. These checker nodes perform redundant computation, which creates the opportunity for a catch-and-punish scheme that provides incentives for rational nodes to be faithful. The assignment of the checker nodes is very important: every neighbor of a node is assigned as a checker for that node. The node that is being checked is known as the principal, to refer to its role in the core distributed algorithm. Every node in the biconnected network plays the role of both a principal node and a checker node for all of its neighbors. The checker nodes execute a redundant computation that mirrors what the principal is computing, and must receive a complete set of the messages received by the principal. Even though some checkers rely on the principal to forward these messages, there is always at least one checker that will catch any attempted deviation from the suggested specification. Ultimately, we rely on the bank to compare state information reported by the principal and checker and penalize nodes for any deviation.
Footnote 5. Two examples when this may happen: The first example is when the network of rational nodes is an overlay that runs on top of an obedient underlay, similar to how many peer-to-peer applications work today. Here, certain messages to nodes outside of the mechanism (such as a bank) can follow the obedient underlay. The second example is when, in a network of rational nodes, a node is able to establish a path to some endpoint where all intermediate nodes are guaranteed to be partitioned from any information contained in the message. A special case of this scenario, assumed by some previous work [3], is when the communication graph is fully connected, and therefore there are no intermediate nodes that could have an interest in the message.
Our bank goes beyond (in FPSS language) "whatever accounting and charging mechanisms [that] are used to enforce the pricing scheme." In our specification, the bank is a trusted and obedient entity that can also perform simple comparisons, and enforce penalties when it detects a problem. The way that problems are detected is phase specific. In the construction phases, this penalization takes the form of not allowing the mechanism to progress to the next phase. In the execution phase, this penalty is a well-defined monetary amount that is epsilon above the value of the attempted deviation (footnote 6). All communication between the bank and a node is signed with acknowledgments to ensure communication compatibility of these messages.
As an example, Figure 2 illustrates the role of checker nodes C1, C2 and C3 in monitoring principal P. Each checker node is asked to perform the internal computation of P based on the copies it receives of P's messages. First, one should establish that the checker will follow this algorithm in equilibrium with a faithful P. This can be argued in our FPSS setting through partitioning: a checker cannot individually benefit from allowing a deviation by P. Second, suppose P is supposed to forward m to nodes C2 and C3 to allow them to replicate its calculations, but deviates and forwards the message as m'. Although this might change the view that C2 and C3 have of P's input, checker C1 was on the incoming path of m and still has the correct view. Also, P cannot deviate in its calculation without deviating in its message passing, because the checkers will perform the correct calculations and, when a check is made of internal state, there will be a discrepancy (footnote 7).
Footnote 6. The bank is not as powerful as the traditional mechanism center. It does not actually perform the distributed mechanism computation, and instead checks results computed by others. The complexity of bank operation is described in the longer technical report [33]. It is an open problem to design a distributed bank that runs on the same network of rational nodes.
Footnote 7. Furthermore, the principal cannot report its true routing table but follow another routing table later, because checks are made during the execution phase against the (correct) routing tables in checker nodes.
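The Figure 2 scenario can be sketched as a toy simulation. Everything here is a simplifying assumption for illustration: the principal's internal computation is replaced by a stand-in function, state is compared through SHA-256 digests, and the bank simply requires all reported digests to agree. The function names are hypothetical.

```python
import hashlib
import json

def digest(state):
    """Stable hash of a JSON-serializable state snapshot."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

def replicate(inputs):
    """Stand-in for P's internal computation (here: just sort the inputs)."""
    return sorted(inputs)

def bank_accepts(reports):
    """The bank lets the phase proceed only if all reported digests agree."""
    return len(set(reports.values())) == 1

def run(forwarded_by_p, used_by_p):
    """C1 sent m to P and knows it; C2 and C3 see only what P forwards."""
    views = {"C1": ["m"], "C2": [forwarded_by_p], "C3": [forwarded_by_p]}
    reports = {name: digest(replicate(view)) for name, view in views.items()}
    reports["P"] = digest(replicate([used_by_p]))
    return bank_accepts(reports)

print(run("m", "m"))    # faithful forwarding and computation: True
print(run("m'", "m"))   # P forwards a modified message: False
print(run("m'", "m'"))  # P also computes on the modified message: False
```

Whatever P reports after forwarding m', the digest held by C1 disagrees with those held by C2 and C3, so no report by P can make all parties agree.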

Faithfulness Proofs
In this section, we show how a faithfulness proof can be constructed for our extension to FPSS. The entire proof cannot be presented due to space constraints. We opt instead to show strong-AC, strong-CC, and consistent information revelation for the second construction phase and leave the rest to the extended version of this paper [33].
For each phase, we must show strong-AC, strong-CC, and consistent information revelation irrespective of a node's behavior in other phases of the mechanism. Once this is shown for each phase, one can use Proposition 2 to show that the entire mechanism is faithful. We assume that every node wishes to make progress in the mechanism, and indeed has a strong negative value when a construction phase does not progress. We further assume that the checkpointing node responsible for halting one phase and "green-lighting" the next phase is the bank node. Keeping with the FPSS specification, we assume a static network in these proofs.
Given strong-AC and strong-CC, the first construction phase terminates, and terminates with common transit cost tables. The second construction phase uses problem partitioning, node redundancy, and catch-and-punish for strong-AC and strong-CC. It needs no cryptography in intra-node messages, but for CC between a node and the bank, we do assume that messages to and from the bank are signed.
Checker nodes completely surround a principal in the network, and act as a clone of the principal in all computation respects. The difference between the principal and the checker is that the checker does not send outputs of computations to neighbors. It is critical to understand the role and limitation of these checkers: the checkers perform the "heavy lifting" of checking a computation, but do not actually catch manipulation problems; this task is left to the checkpointing bank. In this phase, a deviation in calculating routing or pricing tables results in the bank not proceeding to the execution phase.
In fact, the original FPSS formulation already exhibits limited problem partitioning. The price-update rules are specified in a way that prevents a node from increasing its incoming payment through changing the pricing messages (footnote 8). However, problem partitioning alone cannot ensure strong faithfulness properties. There remain possible manipulations of the routing table information (manipulations 1-2) and of the pricing table information (manipulations 3-4), which must be considered jointly. The goal of these manipulations is either to increase incoming payment from other nodes, or to decrease the outgoing payment due to other nodes. It is important that at the end of this phase, the correct LCPs are reflected in the routing tables, and the pricing tables are accurate and reflect the correct per-message prices as defined by the FPSS algorithm. We show that this phase is strong-AC and strong-CC. There is no revelation of private (transit cost) information, and deviations in the revelation of semi-private connectivity information are protected against through strong-AC and strong-CC (because such a deviation would require not forwarding messages along some link or not performing appropriate updates to internal LCP tables).
First, observe that the suggested specification for checkers in this phase is strong-AC and strong-CC. No checker wants to deviate if the other nodes (in their role as principals and checkers) are faithful, because deviation would cause the phase to be restarted.
For any principal-destination pair, one checker must be on the shortest path from principal to destination. That checker has a correct view of the cost of that path (in its other role as a principal in the network), because each node has a local transit cost table by the end of the first phase. Assume the behavior specified in [CHECK1]. Now, all checkers ignore LCP information that is not judged correct against their local transit cost table (which each checker holds in its role as a principal). The result is that a principal has no way to successfully change the LCP information stored by every checker. Any routing table deviation shows up by comparing a hash of [DATA2] between the principal and its multiple checkers. This means that manipulations (1-2 above) are caught by [BANK1].
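One way a checker might judge LCP information against its local transit cost table is sketched below. This is an assumption made for illustration and is not the definition of [CHECK1]: the checker recomputes the cost of an advertised path from its own table and ignores any claim whose stated cost differs. The table contents and function names are hypothetical, and the check covers cost consistency only, not whether the path exists or is in fact the lowest-cost one.

```python
# Hypothetical transit cost table held by a checker after the first phase.
TRANSIT_COST = {"X": 3, "A": 5, "Z": 4, "D": 1, "C": 1, "B": 2}

def claim_is_consistent(path, claimed_cost, transit_cost):
    """True if the claimed cost equals the sum of transit costs on the path.

    Only intermediate nodes count; the endpoints carry no transit cost.
    """
    if len(path) < 2 or any(node not in transit_cost for node in path):
        return False
    expected = sum(transit_cost[node] for node in path[1:-1])
    return claimed_cost == expected

def accepted_claims(claims, transit_cost):
    """Keep only the (path, cost) claims the checker judges consistent."""
    return [(path, cost) for path, cost in claims
            if claim_is_consistent(path, cost, transit_cost)]

claims = [
    (["X", "D", "C", "Z"], 2),  # consistent with the local table
    (["X", "D", "C", "Z"], 1),  # understated cost: ignored
    (["X", "A", "Z"], 5),       # consistent
]
print(accepted_claims(claims, TRANSIT_COST))
```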
Manipulations in the pricing table information (3-4 above) are more subtle, since here a checker node does not have an innate ability to verify messages based on internal information that it already has as a principal. Also notice that while problem partitioning ensures a node has no reason to modify its newly created outgoing pricing update messages (since its utility is not affected by changes caused by these messages), a node might like to change the pricing table that it must use for its own originated traffic. To show how checking nodes make these manipulations detectable, assume the behavior specified in [CHECK2]. Now, consider a principal A and a pair of checker nodes B and C. First, dropping a pricing-table message received from B (i.e., not forwarding it to C) will result in an inconsistency between B and C and therefore a restart by the bank. The same argument holds for changing an incoming message. More difficult is the third case, where a principal can spoof a new fake message. However, this spoof will create an inconsistency in the identity tag information in [DATA3*]. This inconsistency will be caught by [BANK2].
To see strong-AC and strong-CC, and to also see why joint deviations are not possible, notice that the routing and pricing information are two disjoint sets that must be kept in agreement with all checker nodes. A deviation is potentially useful only if the principal can cause every checker to deviate in a way that generates the same routing and/or pricing tables. By the arguments above, this is not possible since each manipulation has a disjoint side effect with every other manipulation. Any such deviation would cause a restart.
Theorem 1. The extended FPSS specification is a faithful implementation of the VCG-based shortest-path interdomain routing mechanism.
Proof. By Proposition 2, and since all phases of this specification are strong-AC, strong-CC, and have consistent information-revelation, and since the corresponding centralized mechanism is strategyproof.

DISCUSSION
While this paper concerns rational manipulation, it has taken a fairly narrow view of rationality. For example, certain nodes may make worsening the outcome of other nodes their main goal, beyond maximizing their own utility. In the real world, companies are willing to take a short-term loss to drive competitors out of business. Such anti-social behavior has been previously characterized [4]. Other nodes might act maliciously, where their rational behavior is described as bringing down a system. Simply introducing other failures, such as general omissions or even failstop, may cause the system to falsely detect and punish manipulation. Further work needs to explore how other failure models affect faithfulness in systems with the rational-manipulation failure model.

ACKNOWLEDGMENTS
Thanks to Joan Feigenbaum, Steven Gortler, Rahul Sami, Margo Seltzer, Danni Tang, and anonymous reviewers for their useful comments. Thanks to Jim Waldo for casting rational manipulation in the role of a canonical system failure. Invaluable feedback on early versions of this work was received from poster session participants at EC-03 [32] and SOSP-03 [31], and by seminar participants at Stanford and MIT. This work is supported in part by NSF grants IIS-0238147 and ACI-0330244.