Yisrael Aumann is 80. To celebrate this occasion, the Center for Rationality at the Hebrew University of Jerusalem organized a two-day feast, where most of Aumann’s students present papers. I would like to write about the work that Itai Arieli, the 14th and youngest student of Aumann, presented, which is a joint work with Yakov Babichenko, a Ph.D. student of Sergiu Hart.
Consider a multiplayer repeated game. Ask yourself: does there exist a simple decentralized algorithm that, if employed by the players, ensures that the long-run average payoff converge to the Pareto frontier of the set of feasible and individually rational payoffs.
Plainly, any feasible payoff is a convex combination of the payoffs in the matrix, and therefore for any desired target payoff one can construct an algorithm that repeats actions in a proper order and achieves the desired payoff as a long-run average payoff. But this algorithm must count where we are in the cycle, and therefore it is considered not simple. We seek a simpler algorithm, with a small memory.
This is the algorithm proposed by Arieli and Babichenko. Suppose that every player has an aspiration level that is stage dependent: at stage n player i would like to receive a payoff at least a(i,n). If his stage payoff at stage n is above his aspiration level at that stage – he is satisfied. Otherwise he is not satisfied.
Fix a natural number k. The algorithm is played in stages, and in the algorithm the players in fact play k copies of the game.
In stage 1, each player randomly selects k actions, one for each copy of the game. The same action can be chosen several times. Calculate the average stage payoff of each player in all the k games. Set the aspiration level of each player to 0 (assuming all payoffs are non-negative).
If the average stage-payoff of player i is above his aspiration level for stage 1 (which is 0), his aspiration level for the second stage is epsilon plus his aspiration level for the first stage, and he will play in the second stage in each copy of the game the same action that he played in that copy at the first stage.
If the average stage-payoff of player i is below his aspiration level for stage 1, in stage 2 he will randmoly choose k new actions for the k copies of the game, and his aspiration level will be his average payoff in all k copies of the game in all stages up to the current stage.
This procedure is repeated ad infinitum.
Arieli and Babichenko proved that, if k is large, the long-run average payoff will be epsilon-close to the Pareto frontier of the set of feasible and individually rational payoffs.
The argument, modulo technical issues, is as follows: because k is large, the set of payoffs that can be supported as the average stage-payoff of k action profiles is epsilon dense in the set of feasible payoffs. As long as the current average stage-payoff is not high for all players, at least one player randomly chooses a new set of k actions. Because the set of actions is finite, there are finite number of k copies of action profiles, and therefore in a bounded time the players will choose, by luck, k action profiles whose corresponding average stage-payoff is above the current aspiration level of all players. The players will then play these action profiles repeatedly, while the aspiration levels increase by epsilon at each stage. When the aspiration level exceeds the current average stage-payoff of at least one player, the players will get into a new phase in which they randomly choose k action profiles, until a new average stage-payoff that exceed the aspiration level is found. This way the average stage-payoff is bound to increase in the long run, and stay around the Pareto frontier.
In a sense, what is done is an exhaustive search, where the aspiration levels keep track of the best payoff vector that was found so far. This algorithm is simple, and decentralized, because each player only cares about his average payoff and about his aspiration level, but it is not efficient: it would take the players an enourmous amount of time to converge to the Pareto frontier.
As is well known, there are simple, efficient and decentralized algorithms that ensure that the play of the players converge to the set of correlated equilibria (for example, the no-regret mechanisms that were developed by Hart and Mas-Colell). Hart and Mansour proved that the only decentralized algorithm that ensures convergence to the set of Nash equilibria ia exhaustive search (more precisely, all decentralized algorithms that converge to the set of Nash equilibria do it in exponential time (or slower), as exhaustive search). A natural question is whether there is an efficient decentralized algorithm that ensures convergence to the Parto frontier of the set of feasible and individual rational payoffs. I hope that by Aumann’s 85 birthday we will have an answer to this question.
4 comments
June 10, 2010 at 4:20 am
noam
“A natural question is whether there is an efficient decentralized algorithm that ensures convergence to the Parto frontier of the set of feasible and individual rational payoffs.”
The communication complexity argument used by Hart and Mansour for pure Nash equilibria has only 0/1 payoffs for the players, and has the pure Nash equilibrium have utility of 1 for all players. This would also be the only Pareto-efficient outcome in the game, so the exponential lower bound applies for finding a Pareto-optimal outcome as well.
June 13, 2010 at 6:17 am
Yakov Babichenko
The comment of Noam is not completely correct.
If we consider the question of finding strongly Pareto outcome (i.e., an outcome such that there is no other feasible outcome that increases the payoff of some players and not decreases the payoff of others), then the comment of Noam is correct. BUT this counter-example is not robust; A perturbation by epsilon of the payoffs changes extremely the Pareto outcomes, and the counter-example fails.
Therefore, in my opinion, one should consider the weakly Pareto outcomes.
The communication complexity of weakly Pareto outcomes (i.e., an outcome such that there is no other feasible outcome that weakly increases the payoffs of all the players) is low: Player 1 can inform the other players which action maximizes his payoff, and we got an efficient outcome.
But, if in addition to efficiency we require individual rationality, then the communication complexity became exponential. Hopefully in few weeks i will publish a DP about it.
June 14, 2010 at 11:47 am
noamnisan
Won’t adding a 3rd strategy for each player where he can unilaterally make sure that everyone gets utility 1/2 make the (1…1) outcome the only weakly-PO, IR option, hence giving the required lower bound?
June 21, 2010 at 4:38 am
Yakov Babichenko
1/2 is not good, but 2/3 (or any c>1/2) will do the job.