Yisrael Aumann is 80. To celebrate this occasion, the Center for Rationality at the Hebrew University of Jerusalem organized a two-day feast, where most of Aumann’s students present papers. I would like to write about the work that Itai Arieli, the 14th and youngest student of Aumann, presented, which is a joint work with Yakov Babichenko, a Ph.D. student of Sergiu Hart.
Consider a multiplayer repeated game. Ask yourself: does there exist a simple decentralized algorithm that, if employed by the players, ensures that the long-run average payoff converge to the Pareto frontier of the set of feasible and individually rational payoffs.
Plainly, any feasible payoff is a convex combination of the payoffs in the matrix, and therefore for any desired target payoff one can construct an algorithm that repeats actions in a proper order and achieves the desired payoff as a long-run average payoff. But this algorithm must count where we are in the cycle, and therefore it is considered not simple. We seek a simpler algorithm, with a small memory.
This is the algorithm proposed by Arieli and Babichenko. Suppose that every player has an aspiration level that is stage dependent: at stage n player i would like to receive a payoff at least a(i,n). If his stage payoff at stage n is above his aspiration level at that stage – he is satisfied. Otherwise he is not satisfied.
Fix a natural number k. The algorithm is played in stages, and in the algorithm the players in fact play k copies of the game.
In stage 1, each player randomly selects k actions, one for each copy of the game. The same action can be chosen several times. Calculate the average stage payoff of each player in all the k games. Set the aspiration level of each player to 0 (assuming all payoffs are non-negative).
If the average stage-payoff of player i is above his aspiration level for stage 1 (which is 0), his aspiration level for the second stage is epsilon plus his aspiration level for the first stage, and he will play in the second stage in each copy of the game the same action that he played in that copy at the first stage.
If the average stage-payoff of player i is below his aspiration level for stage 1, in stage 2 he will randmoly choose k new actions for the k copies of the game, and his aspiration level will be his average payoff in all k copies of the game in all stages up to the current stage.
This procedure is repeated ad infinitum.
Arieli and Babichenko proved that, if k is large, the long-run average payoff will be epsilon-close to the Pareto frontier of the set of feasible and individually rational payoffs.
The argument, modulo technical issues, is as follows: because k is large, the set of payoffs that can be supported as the average stage-payoff of k action profiles is epsilon dense in the set of feasible payoffs. As long as the current average stage-payoff is not high for all players, at least one player randomly chooses a new set of k actions. Because the set of actions is finite, there are finite number of k copies of action profiles, and therefore in a bounded time the players will choose, by luck, k action profiles whose corresponding average stage-payoff is above the current aspiration level of all players. The players will then play these action profiles repeatedly, while the aspiration levels increase by epsilon at each stage. When the aspiration level exceeds the current average stage-payoff of at least one player, the players will get into a new phase in which they randomly choose k action profiles, until a new average stage-payoff that exceed the aspiration level is found. This way the average stage-payoff is bound to increase in the long run, and stay around the Pareto frontier.
In a sense, what is done is an exhaustive search, where the aspiration levels keep track of the best payoff vector that was found so far. This algorithm is simple, and decentralized, because each player only cares about his average payoff and about his aspiration level, but it is not efficient: it would take the players an enourmous amount of time to converge to the Pareto frontier.
As is well known, there are simple, efficient and decentralized algorithms that ensure that the play of the players converge to the set of correlated equilibria (for example, the no-regret mechanisms that were developed by Hart and Mas-Colell). Hart and Mansour proved that the only decentralized algorithm that ensures convergence to the set of Nash equilibria ia exhaustive search (more precisely, all decentralized algorithms that converge to the set of Nash equilibria do it in exponential time (or slower), as exhaustive search). A natural question is whether there is an efficient decentralized algorithm that ensures convergence to the Parto frontier of the set of feasible and individual rational payoffs. I hope that by Aumann’s 85 birthday we will have an answer to this question.