Zero-sum games are usually easier to handle than non-zero-sum games. It is often easier to prove the existence of the value in zero-sum games than to prove the existence of equilibrium payoffs in non-zero-sum games, and it is easier to calculate optimal strategies in zero-sum games than equilibria in non-zero-sum games.

I found it striking that there are cases where the solution of zero-sum games delivers for free the solution of non-zero-sum games. Such examples include Borel games, that I discussed in length in previous posts, differential games, and stopping games in continuous time.

Suppose that we are given a multi-player Borel game: there are n players, who alternately choose actions in a given set A of actions, which can be finite or infinite. The payoff function of each player i is some real valued function u_i(a_1,a_2,…) of the infinite play (so that a_t is the action that was chosen by player t modulo n at stage t). As I mentioned in the post “Borel Games: II”, in two-player zero-sum Borel games, the value exists, and each player has a subgame-perfect epsilon-optimal strategy: a strategy that, in every subgame, guarantees the value of that subgame, up to epsilon. How can we use optimal strategies in some properly crafted two-player zero-sum games to construct an equilibrium in the multi-player game?

For every player i, consider an auxiliary two-player zero-sum game G_i, where player i is the maximizer, his payoff is u_i(a_1,a_2,…) , and all other players try to minimize player i’s payoff. Thus, all other players form a coalition, effectively a single player, who tries to lower player i’s payoff as much as possible. The game G_i has a value, player i has a subgame-perfect epsilon-optimal strategy sigma_i, and the minimizers, all players except of player I, have a subgame-perfect epsilon-optimal strategy tau_i = (tau^j_i)_{j \neq i}. Note that sigma_i is also a strategy of player i in the original game, and tau_i^j is also a strategy of player j in the original game.

Let us now construct a strategy profile in the original game: each player i follows a subgame-perfect epsilon-optimal strategy sigma_i. Because the strategy sigma_i is subgame-perfect epsilon-optimal, when the players follow the strategy profile (sigma_i)_i, the payoff of each player is at least his min-max value minus epsilon. And this is true in every subgame of the game. Suppose that some player, say player i, deviates from sigma_i. Because the game has perfect information, a deviation is detected immediately. Let’s add threat strategies: if some player j deviates, the other players switch to the epsilon-optimal strategy of the minimizers in the auxiliary game G_j. To summarize, the strategy profile that we constructed is: follow (sigma_i)_i as long as no player deviates. Once some player, say player j, deviates, then all other players switch to an epsilon-optimal strategy of the minimizers in the game G_j.

Let us verify that this strategy profile is a 2epsilon-eqilibrium. Consider the game after the history h, and suppose it is player j’s turn to play. If no player deviates, the payoff of player j is at least his min-max value in the subgame that starts at h, v(h), minus epsilon. If player j deviates and play an action a’ that he is not supposed to play after h, then his deviation is immediately detected, and he will be punished by the other players; his payoff will then be at most his min-max value in the subgame that starts at (h,a’), v(h,a’), plus epsilon. But v(h) is at least v(h,a’), because when the other players try to punish him, at h player j will choose the action that leads him to the maximal min-max value: v(h) = sup_{a’’} v(h,a’’). Therefore, the payoff upon deviation is at most v(h,a’)+epsilon, which is smaller than v(h) + epsilon, which is smaller than the payoff if player j does not deviate + two epsilons.

What did we need so that this proof method would work? First, we needed the existence of a subgame-perfect epsilon-optimal strategy. Second, we need that a deviation be detected immediately.

Are there other models where these two requirements are satisfied, so that the solution of zero-sum games yields the solution of non-zero-sum games for free? Yes!

Differential games are games played in continuous time. The state variable is a point in a Euclidean space, and its location is controlled by two players. The game is played on a finite time interval, and the payoff is a sum of two quantities. The first depends on the final location of the state variable, and the second is an integral of “stage payoffs”, that depend on the whole path the state variable did along the game. Under proper continuity conditions, differential games have a subgame-perfect epsilon-optimal strategy, and the value is a continuous function of the state variable and the horizon. Thus, the first requirement of the construction holds. The nature of continuous time implies that deviations are detected in infinitesimal time. Because the value function is continuous, if a player deviates at time t, and his deviation is detected at time t+dt, then the player can be punished approximately at the level of his min-max value at time t. Therefore the second requirement holds as well, and the construction presented above holds.

Another model where the method applies is stopping games in continuous time. In this model at every time instance each player decides whether to stop the game or to continue; the game terminates once at least one player stops, and the payoff is a function of the terminal time and the set of players who decide to stop at the terminal moment. Under proper continuity conditions, zero-sum games have a value and subgame-perfect epsilon-optimal strategies, and the value is right-continuous, so that as for differential game the construction applies.

Are there additional classes of games where this approach works? Are there other solution concepts where it may be useful?

## Recent Comments