Peter Bickel taught me a nifty little (literally) proof of the minimax theorem. Peter says it was inspired by his reading of David Blackwell's vector generalization of the same. Here goes.
Let $A$ be the $m \times n$ payoff matrix of a zero-sum game. Here $a_{ij}$ is the payoff to the row player from playing $i$ when the column player plays $j$. Let $\Delta_k$ be the $k$-simplex.

For each $x \in \Delta_m$, let

$S(x) = \{x^{\top}Ay : y \in \Delta_n\}.$

This is the set of payoffs that the row player can achieve when playing mixed strategy $x$. By convexity, it must be an interval whose endpoints depend on $x$. We will only be interested in the left-most endpoint, call it $f(x)$. Thus, we can assume that (strictly speaking the left-hand side should be a subset)

$S(x) = [f(x), \max_{y \in \Delta_n} x^{\top}Ay].$

Similarly, for each $y \in \Delta_n$, let $T(y) = \{x^{\top}Ay : x \in \Delta_m\}$, and call its right-most endpoint $g(y)$. By the same observations we may suppose that

$T(y) = [\min_{x \in \Delta_m} x^{\top}Ay, g(y)].$

For any $x \in \Delta_m$ and $y \in \Delta_n$ we have

$f(x) \le x^{\top}Ay \le g(y).$

Now, choose

$x^* \in \arg\max_{x \in \Delta_m} f(x),$

which exists by continuity and compactness. Similarly, choose

$y^* \in \arg\min_{y \in \Delta_n} g(y).$

Then, $f(x^*) = g(y^*)$ is the value of the game.
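To make $f$, $g$ and the two optimizations concrete, here is a small numerical sketch (my illustration, not part of the original argument) for matching pennies; a grid search over mixed strategies recovers the value $0$ at $(1/2, 1/2)$:

    import numpy as np

    # Matching pennies: the row player wins 1 when the choices match.
    A = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])

    def f(x):   # left endpoint of {x^T A y}: worst case over pure columns
        return (x @ A).min()

    def g(y):   # right endpoint of {x^T A y}: best case over pure rows
        return (A @ y).max()

    # Brute-force the maximin and minimax over a grid of mixed strategies.
    grid = [np.array([t, 1 - t]) for t in np.linspace(0, 1, 1001)]
    print(max(f(x) for x in grid))   # ~0, attained at x = (1/2, 1/2)
    print(min(g(y) for y in grid))   # ~0, attained at y = (1/2, 1/2)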
No Farkas or separating hyperplane. Now, minimax is equivalent to LP duality, so one might wonder whether such a proof could be used to obtain the duality theorem of linear programming. Assuming that the primal and the dual both have optimal solutions, yes. However, the case where one of them is infeasible or unbounded would cause this kind of proof to break.
PS: Then Olivier Gossner pointed out that, perhaps because I wanted so much to believe it was true, I had overlooked something. See the comments. He was right. I tried to make the proof work but could not. A proof of minimax that avoids Farkas and/or the separating hyperplane is not outlandish. There is Guillermo Owen's proof, via induction, for example. By the way, this was not the first such elementary proof. There was a flurry of elementary proofs of the minimax theorem in the 40s and 50s in response to a challenge of von Neumann. Indeed, Peck (1958) writes in the introduction to his paper:
“There are so many proofs of this theorem in the literature, that an excuse is necessary before exhibiting another. Such may be found by examining the proof given below for the following: it uses no matrices, almost no topology and makes little use of the geometry of convex sets; it applies equally well to the case where only one of the pure strategy spaces is finite; also there is no assumption that the payoff function is bounded.”
That an induction proof should exist is unsurprising given David Gale's inductive proof of the Farkas lemma. In fact, Gale's proof can be seen as an instantiation of Fourier's method for solving linear programs; again, no Farkas or separating hyperplane.
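For readers who have not seen Fourier's method: it solves a system of linear inequalities by eliminating one variable at a time, pairing every inequality that bounds the variable from above with every one that bounds it from below. A minimal sketch (my own illustration; the function name is hypothetical):

    # Fourier-Motzkin step: eliminate variable j from the system A x <= b,
    # returning an equivalent system over the remaining variables.
    def eliminate(A, b, j):
        pos = [(r, c) for r, c in zip(A, b) if r[j] > 0]   # upper bounds on x_j
        neg = [(r, c) for r, c in zip(A, b) if r[j] < 0]   # lower bounds on x_j
        out = [(r, c) for r, c in zip(A, b) if r[j] == 0]  # x_j does not appear
        for rp, cp in pos:
            for rn, cn in neg:
                # Scale the x_j coefficients to +1 and -1 and add the rows;
                # x_j cancels and a valid inequality in the rest remains.
                sp, sn = 1.0 / rp[j], -1.0 / rn[j]
                out.append(([sp * u + sn * v for u, v in zip(rp, rn)],
                            sp * cp + sn * cn))
        return [r for r, _ in out], [c for _, c in out]

    # Example: x + y <= 4, -x <= 0, -y <= 0; eliminating x leaves 0 <= y <= 4.
    print(eliminate([[1, 1], [-1, 0], [0, -1]], [4, 0, 0], 0))

Iterating the step down to a single variable decides feasibility, and hence solves linear programs, without any separation argument.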
In the last posts I talked about a Bayesian agent in a stationary environment. The flagship example was tossing a coin with uncertainty about the parameter. As time goes by, he learns the parameter. I hinted at the distinction between `learning the parameter' and `learning to make predictions about the future as if you knew the parameter'. The former seems to imply the latter almost by definition, but this is not so.
Because of its simplicity, the i.i.d. example is in fact somewhat misleading for my purposes in this post. If you toss a coin then your belief about the parameter of the coin determines your belief about the outcome tomorrow: if at some point your belief about the parameter is given by some distribution $\mu$ over $[0,1]$, then your prediction about the outcome tomorrow will be the expectation of $\mu$. But in a more general stationary environment, your prediction about the outcome tomorrow depends on your current belief about the parameter and also on what you have seen in the past. For example, if the process is Markov with an unknown transition matrix, then to make a probabilistic prediction about the outcome tomorrow you first form a belief about the transition matrix and then use it to predict the outcome tomorrow given the outcome today. The hidden Markov case is even more complicated, and it gives rise to the distinction between the two notions of learning.
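The contrast is easy to see in a simulation. Here is a sketch (my own; the numbers are arbitrary) of the two prediction rules: a single Beta posterior for the coin versus one Beta posterior per row of an unknown transition matrix:

    import numpy as np

    rng = np.random.default_rng(0)

    # i.i.d. coin: with a Beta(a, b) posterior over the parameter, the
    # prediction for tomorrow is the posterior mean, whatever happened today.
    def coin_prediction(a, b):
        return a / (a + b)

    # Markov chain on {0, 1}: the prediction for tomorrow uses the posterior
    # of the row indexed by TODAY'S outcome, so the past enters the prediction.
    counts = np.ones((2, 2))       # counts[s, s'] - 1 = observed s -> s' moves
    state = 0
    true_p = {0: 0.2, 1: 0.9}      # unknown true P(next = 1 | state)
    for t in range(2000):
        pred = counts[state, 1] / counts[state].sum()  # P(rain tomorrow | past)
        nxt = rng.binomial(1, true_p[state])
        counts[state, nxt] += 1
        state = nxt
    print(counts[0, 1] / counts[0].sum(),   # -> about 0.2
          counts[1, 1] / counts[1].sum())   # -> about 0.9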
The formulation of the idea of `learning to make predictions' goes through merging. The definition traces back at least to Blackwell and Dubins. It was popularized in game theory by the Ehuds, who used Blackwell and Dubins' theorem to prove that rational players will end up playing approximate Nash equilibrium. In this post I will not explicitly define merging. My goal is to give an example of the `weird' things that can happen when one moves from the i.i.d. case to an arbitrary stationary environment. Even if you didn't follow my previous posts, I hope the following example will be intriguing in its own right.
Every day there is a probability $\varepsilon$ (fixed and known to everyone) of eruption of war (W). If no war erupts then the outcome is either bad economy (B) or good economy (G), and it is a function of the number of peaceful days since the last war. The function from the number of peaceful days to the outcome is an unknown parameter of the process. Thus, a parameter is a function $\theta : \{1, 2, 3, \dots\} \to \{B, G\}$. I am going to compare the predictions about the future made by two agents: Roxana, who knows $\theta$, and Ursula, who faces some uncertainty about $\theta$, represented by a uniform belief over the set of all parameters. Neither Roxana nor Ursula knows the future outcomes, and since both of them are rational decision makers, they both use Bayes' rule to form beliefs about the unknown future given what they have seen in the past.
Consider first Roxana. In the terminology I introduced in previous posts, she faces no structural uncertainty. After a period of $n$ consecutive peaceful days, Roxana believes that with probability $\varepsilon$ the outcome tomorrow will be W and with probability $1 - \varepsilon$ the outcome tomorrow will be $\theta(n+1)$.
Now consider Ursula. While she does not initially know $\theta$, as time goes by she learns it. What do I mean here by learning $\theta$? Well, suppose Ursula starts observing the outcomes and she sees G, B, W, B, G, .... From this information Ursula deduces that $\theta(1) = B$, so that if a peaceful day follows a war then it has a bad economy. Next time a war pops up, Ursula will know to make a prediction about the outcome tomorrow which is as accurate as Roxana's prediction. Similarly, Ursula can deduce that $\theta(2) = G$. This way Ursula gradually deduces the values of $\theta$ while she observes the process. However, and this is the punch line, for every $n$ there will be a time when Ursula observes $n$ consecutive peaceful days for the first time, and on that day her prediction about the next outcome will be $\varepsilon$ for war, $(1 - \varepsilon)/2$ for good economy and $(1 - \varepsilon)/2$ for bad economy. Thus there will always be infinitely many occasions on which Ursula's predictions differ from Roxana's.
So, Ursula does learn the parameter in the sense that she gradually deduces more and more values of $\theta$. However, because at every point in time she may require a different value of $\theta$ (this is the difference between the stationary environment and the i.i.d. environment!), there may be infinitely many times at which she has not yet been able to deduce the value of the parameter that she needs in order to make a prediction about the outcome tomorrow.
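Here is a simulation sketch of the example (mine, with an arbitrary war probability $\varepsilon = 0.1$); it shows disagreements between the two agents recurring forever, though more and more rarely:

    import numpy as np

    rng = np.random.default_rng(1)
    eps = 0.1                                     # known probability of war
    theta = rng.choice(["B", "G"], size=1000)     # theta[n-1] = theta(n)

    known = {}        # values of theta that Ursula has deduced so far
    peace = 0         # current run of consecutive peaceful days
    disagreements = []
    for day in range(200_000):
        n = peace + 1                             # tomorrow is peaceful day n
        roxana_G = (1 - eps) * (theta[n - 1] == "G")
        if n in known:                            # Ursula already knows theta(n)
            ursula_G = (1 - eps) * (known[n] == "G")
        else:                                     # first visit: uniform belief
            ursula_G = (1 - eps) / 2
        if ursula_G != roxana_G:
            disagreements.append(day)
        if rng.random() < eps:                    # war erupts, run resets
            peace = 0
        else:
            known[n] = theta[n - 1]               # one more value deduced
            peace = n
    print(len(disagreements), disagreements[-3:])  # still occurring late on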
You may notice that Ursula does succeed in making predictions most of the time. In fact, the situations in which she fails become more and more rare, occurring only after longer and longer blocks of peaceful days. Indeed, Nabil and I formalize this idea and show that this is the case in every stationary environment with structural uncertainty: the observer makes predictions approximately as if he knew the parameter on almost every day. For that, we use a weak notion of merging which was suggested by Lehrer and Smorodinsky. If you are interested, then this is a good time to look at our paper.
Finally, the example given above is our adaptation of an example that first appeared in a paper by Boris Yakovlevich Ryabko. Ryabko's paper is part of a relatively large literature about non-Bayesian prediction in stationary environments. I will explain the relationship between that literature and our paper in another post.
Department of self-promotion: sequential tests, Blackwell games and the axiom of determinacy.
Recap: At every day $n = 0, 1, 2, \dots$ an outcome $s_n \in \{0,1\}$ is realized (`$1$' means a rainy day, `$0$' a non-rainy day). An expert claims to know the distribution of the stochastic process that generates the sequence $s_0, s_1, \dots$ of outcomes. A sequential test is given by a Borel set $R \subseteq ([0,1] \times \{0,1\})^{\mathbb{N}}$ of infinite sequences of predictions and outcomes. An expert who delivers forecast $f$ fails on a realization $s_0, s_1, \dots$ if $(p_0, s_0, p_1, s_1, \dots) \in R$, where $p_n = f(s_0, \dots, s_{n-1})$ is the prediction made by $f$ about $s_n$ given the previous outcomes $s_0, \dots, s_{n-1}$.
Let me remind you what I mean by `a test that does not reject the truth with probability $1 - \epsilon$'. I am going to write an equivalent definition to the one I used up to now, this time using the language of stochastic processes.

Definition 1 The test does not reject the truth with probability $1 - \epsilon$ if for every $\{0,1\}$-valued stochastic process $X_0, X_1, \dots$ one has

(1) $\Pr(f \text{ fails on } X_0, X_1, \dots) \le \epsilon$

whenever $f$ is the forecast induced by the process, i.e.

(2) $f(x_0, \dots, x_{n-1}) = \Pr(X_n = 1 \mid X_0 = x_0, \dots, X_{n-1} = x_{n-1}).$
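In code, equation (2) is just conditioning: given the distribution of the process, the induced forecast divides the probability of the history followed by rain by the probability of the history. A small sketch (my illustration, with a sticky Markov chain standing in for the truth):

    from itertools import product

    # Distribution of (X_0, X_1, X_2) as a dict from outcome tuples to probs.
    N = 3
    P = {}
    for seq in product((0, 1), repeat=N):
        p = 0.5                              # X_0 uniform on {0, 1}
        for a, b in zip(seq, seq[1:]):
            p *= 0.8 if a == b else 0.2      # sticky transitions
        P[seq] = p

    def true_forecast(history):
        # Equation (2): P(X_n = 1 | X_0 ... X_{n-1} = history), n = len(history)
        n = len(history)
        num = sum(p for s, p in P.items() if s[:n] == history and s[n] == 1)
        den = sum(p for s, p in P.items() if s[:n] == history)
        return num / den

    print(true_forecast(()), true_forecast((1,)), true_forecast((1, 0)))
    # 0.5 0.8 0.2 -- the true expert's predictions along the realized path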
We are going to prove the following theorem:
Theorem 2 Consider a sequential test that does not reject the true expert with probability $1 - \epsilon$. Then for every $\delta > 0$ there exists a distribution $\zeta$ over forecasts such that

(3) $\zeta\bigl(\{f : f \text{ fails on } s_0, s_1, \dots\}\bigr) \le \epsilon + \delta$ for every realization $s_0, s_1, \dots$.

So a charlatan, who doesn't know anything about the true distribution of the process, can randomize a forecast according to $\zeta$ and pass the test with high probability, regardless of the actual distribution.
— Discussion —
Before we delve into the proof of the theorem, a couple of words about where we are. Recall that a forecast $f$ specifies the Expert's prediction about rain tomorrow after every possible history of outcomes. We denote by $\mathcal{F}$ the set of all such forecasts. The most general tests are given by a function that assigns to every such forecast $f$ the set of realizations over which the forecast $f$ fails. As we have seen, there exist tests of this general kind that pass a true expert and are not manipulable by a strategic charlatan.
Sequential tests have the additional property that the test's verdict depends only on the predictions made by $f$ along the realized path: when deciding whether a forecast $f$ passes or fails when the realization is $s_0, s_1, \dots$, the test only considers the predictions $p_n = f(s_0, \dots, s_{n-1})$ made by $f$ along that realization. We also say that the test does not depend on counter-factual predictions, i.e. predictions about the probability of a rainy day after histories that never happen. It seems that counter-factual predictions should be irrelevant to testing anyway but, as the theorem shows, if the test does not use counter-factual predictions then it is manipulable.
One situation in which sequential tests are the only available tests is when, instead of providing his entire forecast before any outcome is realized, on every day $n$ the expert only provides his prediction $p_n$ about the outcome $s_n$ of day $n$ just before it is realized. At infinity, all the information available to the tester is the sequence $(p_0, s_0, p_1, s_1, \dots)$ of predictions and realized outcomes.
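The day-by-day protocol and a toy sequential test can be sketched as follows (my illustration; `calibration_test` is a hypothetical example of a set $R$, not one of the tests discussed in these posts). Note that the tester's input is only the transcript of realized (prediction, outcome) pairs:

    def run_protocol(expert, nature, T):
        history, transcript = [], []
        for n in range(T):
            p = expert(tuple(history))   # prediction for day n, given the past
            x = nature(tuple(history))   # realized outcome of day n
            transcript.append((p, x))    # all the tester will ever see
            history.append(x)
        return transcript

    # A toy test in the spirit of R: fail the expert if his predictions are
    # badly miscalibrated on the realized path (checked on a long prefix).
    def calibration_test(transcript, tol=0.05):
        avg_p = sum(p for p, _ in transcript) / len(transcript)
        avg_x = sum(x for _, x in transcript) / len(transcript)
        return abs(avg_p - avg_x) > tol   # True means the expert fails

    transcript = run_protocol(lambda h: 0.5, lambda h: len(h) % 2, 100)
    print(calibration_test(transcript))   # False: constant 0.5 passes here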
— Sketch of Proof —
We can transform the expert's story into a two-player normal-form zero-sum game as we did before: Nature chooses a realization $s_0, s_1, \dots$ and Expert chooses a forecast $f$. Then Expert pays Nature $1$ if $f$ fails on the realization and $0$ otherwise. The fact that the test does not reject the true expert translates to the fact that the maximin of the game is small. If we knew that the minimax is also small, then an optimal mixed strategy $\zeta$ for the Expert would satisfy (3). We only need to prove the existence of the value or, as game theorists say, that the game is determined.
Unfortunately, this time we cannot use Fan's Theorem, since we made no topological assumption about the set $R$, so there is no hope of getting semi-continuity of the payoff function. Indeed, as we shall see in a moment, the normal-form representation misses an important part of the expert's story. Instead of using a normal-form game, we are going to write the game in extensive form. I will call this game $G$.
- The game is played in stages $n = 0, 1, 2, \dots$.
- At stage $n$, Nature chooses an outcome $s_n \in \{0,1\}$ and Expert chooses a prediction $p_n \in [0,1]$, simultaneously and independently.
- Nature does not monitor past actions of Expert.
- Expert monitors past actions $s_0, \dots, s_{n-1}$ of Nature.
- At infinity, Expert pays Nature $1$ if $(p_0, s_0, p_1, s_1, \dots) \in R$ and $0$ otherwise.
Now I am going to assume that you are familiar with the concept of strategy in an extensive-form game, and are aware of Kuhn's Theorem about the equivalence between behavioral strategies and mixtures of pure strategies (I will make implicit use of both directions of Kuhn's Theorem in what follows). We can then look at the normal-form representation of this game, in which the players choose pure strategies. A moment's thought will convince you that this is exactly the game from the previous paragraph: Nature's set of pure strategies is the set $\{0,1\}^{\mathbb{N}}$ of realizations, Expert's set of pure strategies is the set $\mathcal{F}$ of forecasts, and the payoff for a strategy profile $(s, f)$ is $1$ if $f$ fails on $s$ and $0$ otherwise. So far no real gain. Extensive-form games in which one of the players doesn't monitor the opponent's actions need not be determined. In order to get a game with a value we are going to twist the game $G$ and allow Nature to observe past actions of the Expert player. This makes life more difficult for the Expert. Up to a minor inaccuracy which I will comment on later, the resulting game is what's called a Blackwell game, and it admits a value by a seminal theorem of Donald Martin.
Here is the game after the twist. I call this game $G'$.
- The game is played in stages $n = 0, 1, 2, \dots$.
- At stage $n$, Nature chooses an outcome $s_n \in \{0,1\}$ and Expert chooses a prediction $p_n \in [0,1]$, simultaneously and independently.
- Each player monitors past actions of the opponent.
- At infinity, Expert pays Nature $1$ if $(p_0, s_0, p_1, s_1, \dots) \in R$ and $0$ otherwise.
Now, if you internalized the method of proving manipulability that I was advocating in the previous two episodes, you know what's left to prove: that the maximin of $G'$ is small, i.e. that, for every strategy of Nature, Expert has a response that makes the payoff at most $\epsilon$. We know this is true for the game $G$, but in $G'$ Nature is more powerful.
Here is the most important insight of the proof: the fact that an expert who knows the distribution of the process can somehow pass the test implies that the maximin in $G$ is small, but this fact alone doesn't say anything about the maximin of $G'$. To show that the maximin of $G'$ is also small, we will use the fact that the way such an expert passes the test is by providing the correct forecast. Until now the distinction was not really important to us. Now it comes into play.
Let $\sigma$ be a behavioral strategy of Nature in $G'$, i.e. a contingent plan that specifies the probability that Nature plays $1$ after every finite history of predictions and outcomes. Let $f^\sigma$ be the pure strategy of Expert in $G'$ that is given by

$f^\sigma(\text{history}) = \text{the probability that } \sigma \text{ assigns to playing } 1 \text{ after that history}.$

So the pure action taken by the Expert player at day $n$ is the mixed action that Nature is going to take at day $n$ according to her strategy $\sigma$. Now assume that Nature follows $\sigma$ and Expert follows $f^\sigma$. Let $P_n$ and $X_n$ be the random variables representing the actions taken by Expert and Nature at day $n$. Then the stochastic process $X_0, X_1, \dots$ satisfies (2). Therefore from (1) we get that the expected payoff when Nature plays $\sigma$ and Expert plays $f^\sigma$ is indeed smaller than $\epsilon$.
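A sketch of the mirroring construction (my illustration; `sigma` stands in for an arbitrary behavioral strategy of Nature): the Expert announces, as his prediction, exactly the mixed action Nature is about to play, so the realized outcomes automatically satisfy (2):

    import numpy as np

    rng = np.random.default_rng(2)

    def sigma(history):
        # Nature's behavioral strategy: probability of playing 1 (rain)
        # after a history of (prediction, outcome) pairs; any rule will do.
        rains = sum(x for _, x in history)
        return (1 + rains) / (2 + len(history))

    # Expert's pure strategy f^sigma mirrors Nature's mixture at every day.
    history = []
    for n in range(10):
        p = sigma(history)             # prediction = Nature's mixed action
        x = int(rng.random() < p)      # Nature draws from that very mixture
        history.append((p, x))
    print(history)
    # By construction P(X_n = 1 | past) equals the announced prediction,
    # so by (1) the Expert fails with probability at most epsilon.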
Now for the minor inaccuracy that I mentioned: for Martin's Theorem we need the set of actions at every stage to be finite. We can handle this obstacle by restricting Expert's actions at every stage to a finite grid and applying the coupling argument.
— Non-Borel Tests —
What about pathological sequential tests that are given by a non-Borel set $R$? Well, if, like me, you find it preposterously impossible to choose one sock from each of infinitely many pairs of socks, then perhaps you live in the AD paradise. Here every set is measurable, Blackwell games are determined even when the payoff function is not Borel, and Theorem 2 is true without the assumption that $R$ is Borel. See, the AD universe is a paradise for charlatans, since they can do as well as the true experts.
If, on the other hand, you subscribe to the axiom of choice, then you have a non-manipulable test:
Theorem 3 There exists a sequential test with a non-Borel set $R$ that does not reject the truth with probability $1 - \epsilon$ and such that for every distribution $\zeta$ over forecasts there exists some realization $s_0, s_1, \dots$ such that

$\zeta\bigl(\{f : f \text{ fails on } s_0, s_1, \dots\}\bigr) \ge 1 - \epsilon.$
— Summary —
If you plan to remember one conclusion from my last three posts, I suggest you pick this: there exist non-manipulable tests, but they must rely on counter-factual predictions or be extremely pathological.
Many thanks to everyone who read to this point. Did I mention that I have a paper about this stuff?
Following Jeff, a couple of links to posts about Blackwell's life and research: Jake Abernethy, Rajiv Sethi, Anand Sarwate, Kevin Bryan, Jesús Fernández-Villaverde (in Spanish), Andrew Gelman.
Rajiv:
It takes a particular kind of strength to manage such a productive research career while tolerating the stresses and strains of personal insult, and carrying the aspirations of so many on one’s shoulders. Blackwell was more than a brilliant mathematician, he was also a human being of extraordinary personal fortitude.
Anand:
I’ll always remember what he told me when I handed him a draft of my thesis. “The best thing about Bayesians is that they’re always right.”
I will have more to say about the Stony Brook conference, but first a word about David Blackwell, who passed away last week. We game theorists know Blackwell for several seminal contributions. Blackwell's approachability theorem is at the heart of Aumann and Maschler's result about repeated games with incomplete information, which Eilon mentions below, and also of the calibration results which I mentioned in my presentation in Stony Brook (alas, I was too nervous and forgot to mention Blackwell as I intended to). Blackwell's theory of comparison of experiments has been influential in the game-theoretic study of the value of information, and Olivier presented a two-person game analogue of Blackwell's theorem in his talk. Another seminal contribution of Blackwell, together with Lester Dubins, is the theorem about merging of opinions, which is the major tool in the Ehuds' theory of Bayesian learning in repeated games. And then there are his contributions to the theory of infinite games with Borel payoffs (now known as Blackwell games) and Blackwell and Ferguson's solution to the Big Match game.
Blackwell and Girshick about the concept of strategy:
Imagine that you are to play the White pieces in a single game of chess, and that you discover you are unable to be present for the occasion. There is available a deputy, who will represent you on the occasion, and who will carry out your instructions exactly, but who is absolutely unable to make any decisions of his own volition. Thus, in order to guarantee that your deputy will be able to conduct the White pieces throughout the game, your instructions to him must envisage every possible circumstance in which he may be required to move, and must specify, for each such circumstance, what his choice is to be. Any such complete set of instructions constitutes what we shall call a strategy.
Now think about an infinite game, like the repeated prisoner's dilemma. If we take the idea of a strategy as a set of instructions seriously, then not every function from past histories to an action is something we would like to call a strategy, because not every function can be described by a set of instructions! This should be clear even before we formalize what instructions mean, simply because the set of possible `instructions' is countable, as every such instruction is just a sentence in English.
So what should be the formal definition of a strategy in these games, a definition that will capture the intuition of a complete set of instructions specifying what your choice is to be in each possible circumstance? You know what I think.