Anne and Bill play the following game: at odd stages Anne chooses either 0 or 1, and at even stages Bill chooses either 0 or 1. This way an infinite sequence of bits is generated. This sequence is the binary representation of a unique number x in the interval [0,1]. The payoff is given by some real-valued function u(x); the game is zero-sum, Anne is the maximizer and Bill is the minimizer.
This game is a two-player zero-sum Borel game. As I mentioned in a previous post, the action sets of the two players can in fact be arbitrary sets (finite or infinite), and they may be history dependent. In such a case, the payoff function will be a real-valued function defined over the set of all possible infinite plays.
In a previous post I argued that a two-player zero-sum game admit a value, provided the payoff function is bounded and Borel measurable. That is, there is a real number v, and for every positive epsilon Anne has a strategy sigma_* and Bill has a strategy tau_*, such that:
1) When Anne plays according to sigma_*, then the payoff is at least v-epsilon, whatever Bill plays.
2) When Bill plays according to tau_*, then the payoff is at most v+epsilon, whatever Anne plays.
The strategies sigma_* and tau_* are epsilon-optimal strategies of Anne and Bill.
If Anne and Bill are rational, then the outcome of the game will be close to v.
When Anne plays according to the strategy sigma_*, and Bill plays according to the strategy tau_*, the payoff will be between v-epsilon and v+epsilon. If Anne decides to deviate from sigma_*, while Bill sticks to tau_*, the payoff will be at most v+epsilon, so that Anne’s gain from such a deviation is at most 2epsilon. Similarly, if Bill decides to deviate from tau_*, while Anne sticks to sigma_*, the payoff will be at least v-epsilon, so that Bill’s gain from such a deviation is at most 2epsilon as well.
If the range of the payoff function u is finite, then for epsilon sufficiently small, every epsilon-optimal strategy is actually 0-optimal: because payoffs are discrete, if Anne’s strategy guarantees a payoff at least v-epsilon, then because for epsilon small enough there are no possible outcomes between v-epsilon and v, this strategy in fact guarantees v.
Let’s take a concrete example. Suppose that the function u is determined by the first three bits that Anne and Bill choose:
u(0) = 1
u(10) = 1
u(110) = 1
u(111) = 2
That is, if Anne chooses 0 at the first stage, the payoff is 1;
if Anne chooses 1 at the first stage, and Bill chooses 0 at the second stage, the payoff is 0;
if Anne chooses 1 at the first stage, and Bill chooses 1 at the second stage, and Anne chooses 0 at the third stage, the payoff is 1;
and if Anne chooses 1 at the first stage, Bill chooses 1 at the second stage, and Anne chooses 1 at the third stage, the payoff is 2.
The value of this game is 1: Anne can guarantee obtaining 1 by choosing 0 at the first stage, and Bill can guarantee paying no more than 1 by choosing 0 at the second stage, in the case that Anne chose 1 at the first stage.
Another strategy for Anne that guarantees her a payoff of 1 is to choose 1 at the first stage, and 0 at the second stage. In that case, whatever Bill chooses at the second stage, Anne’s payoff is 1. Call this strategy sigma_1.
Of course, a better strategy for Anne is to choose 1 at the first stage, and 1 at the second stage. With this strategy Anne guarantees for herself at least 1, and, if Bill chooses 1 at the second stage, she actually gets 2. Call this strategy sigma_2.
The strategy sigma_2 is subgame-perfect optimal: it guarantees to Anne the highest quantity that she can guarantee, in every subgame. Here, a subgame corresponds to a finite sequence of actions that Anne and Bill could have played in the first stages of the game. The subgame G(0) corresponds to the continuation game if Anne chose 0 at the first stage; the subgame G(11) corresponds to the continuation game if Anne chose 1 at the first stage, and Bill chose 1 at the second stage; etc.
The value of the game G(0) is 1, as Anne’s payoff is at least 1 whatever the players play, and Bill can guarantee that the payoff is at most 1 (by choosing 0 at the second stage).
The value of the game G(11) is 2, as Anne can guarantee a payoff 2 by choosing 1 at the third stage, and Bill can guarantee that the payoff is at most 2 (because all payoffs in the game are at most 2).
As I mentioned above, by Martin’s (1975) result, every two-player zero-sum game have a value, and both players have epsilon-optimal strategies. Does there exist a subgame-perfect epsilon-optimal strategy? That is, a strategy sigma_* that satisfies the following condition: for every subgame, the strategy sigma_* restricted to that subgame is epsilon-optimal in the subgame?
The answer is positive, and in fact follows quite easily from Martin’s result. To see why, let’s add one more definition: say that a strategy sigma of Anne guarantees an amount w if whatever Bill play, the payoff is at least w. Plainly in such a case w is smaller than the value of the game, because the value is the supermum of all the amounts w that Anne can guarantee.
No for the construction of a subgame-perfect epsilon-optimal strategy: ask Anne to start playing an epsilon/2-optimal strategy sigma_1 in the whole game. At every one of her decision node x, Anne compares the value w(x) that the strategy she plays guarantees, and the value v(x) of the subgame that starts at the current decision node. If the difference v(x)-w(x) is less than epsilon, then sigma_1 guarantees the value (up to epsilon) in the subgame that starts at x, so Anne continues following the strategy sigma_1. If the difference is higher than epsilon, Anne switches to an epsilon/2-optimal strategy sigma_2 in the subgame that starts at x. So that now her strategy guarantees at least v(x)-epsilon/2, which is strictly higher than the amount w(x) that the previous strategy she was following guarantees.
Anne continues to compare the amount guaranteed by the current strategy that she follows to the value of the subgame that starts at the current node, and she switches to a new epsilon/2-optimal strategy whenever the difference is higher than epsilon.
Whenever Anne switches a strategy, the amount that she can guarantee increases by epsilon/2. Because the payoffs are bounded, the number of times she can switch strategies is finite. In particular, from some point on Anne sticks to some epsilon/2-optimal strategy in some subgame, and for every subgame that the play visits, this strategy guarantees at least the value of the subgame minus epsilon. To summarize, this strategy is subgame-perfect epsilon-strategy for Anne.
So we know that a subgame-perfect epsilon-optimal strategy exists. As the next post will show, such a strategy is needed to construct epsilon-equilibria in non-zero-sum Borel games.