You are currently browsing the category archive for the ‘Uncategorized’ category.

Apparently, it is quite the rage to festoon one’s slides with company logos, particularly of the  frightful five. At present this is done for free. It suggests a new business. A platform that matches advertisers to faculty. Faculty can offer up their slides and advertisers can bid for the right to place their logos on the slides.

After presenting Richard Weber’s remarkable proof of Gittins’ index theorem in my dynamic optimization class, I claimed that the best way to make sure that you understand a proof is to identify where the assumptions of the theorem are used. Here is the proof again, slightly modified from Weber’s paper, followed by the question I gave in class.

First, an arm or a bandit process is given by a countable state space $S$, a transition function $q(s'|s)$ and a payoff function $r:S\rightarrow [0,1]$. The interpretation is that at every period, when the arm is at state $s$, playing it gives a reward $r(s)$ and the arm’s state changes according to $q(\cdot|s)$.

In the multi-armed bandit problem, at every period you choose an arm to play. The states of the arms you didn’t choose remain fixed. Your goal is to maximize expected total discounted rewards. Gittins’ theorem says that for each arm there exists a function $\gamma:S\rightarrow [0,1]$ called the Gittins Index (GI from now on) such that, in a multi armed problem, the optimal strategy is to play at each period the arm whose current state has the largest GI. In fancy words, the theorem establishes that the choice which arm to play at each period satisfies Independent of Irrelevance Alternatives: Suppose there are three arms $A,B,C$ whose current states are $a,b,c$. If you were going to start by playing $A$ if only $A$ and $B$ were available, then you should not start with $B$ when $A,B,C$ are available.

The proof proceeds in several steps:

1. Define the Gittins Index at state $s$ to be the amount $\gamma$ such that, if the casino charges $\gamma$ every time you play the arm, then both playing and not playing are optimal actions at the state $s$. We need to prove that there exists a unique such $\gamma$. This is not completely obvious, but can be shown by appealing to standard dynamic programming arguments.
2. Assume that you enter a casino with a single arm at some state $s$ with GI $\gamma$. Assume also that the casino charges $\gamma$ every time you play the arm. At every period, you can play, or quit playing, or take a break. From step 1, it follows that regardless of your strategy, the casino will always get a nonnegative net expected net payoff, and if you play optimally then the net expected payoff to the casino (and therefore also to you) is zero. For this reason this $\gamma$ (the GI of the initial state) is called the fair charge. Here, playing optimally means that you either not play at all or start playing and continue to play every period until the arm reaches a state with GI strictly smaller then $\gamma$, in which case you must quit. It is important that as long as the arm is at a state with GI strictly greater than $\gamma$ you continue playing. If you need to take a restroom break you must wait until the arm reaches a state with GI $\le \gamma$.
3. Continuing with a single arm, assume now that the casino announces a new policy that at every period, if the arm reaches a state with GI that is strictly smaller than the GI of all previous states, then the charge for playing the arm drops to the new GI. We call these new (random) charges the prevailing charges. Again, the casino will always get a nonnegative net expected payoff, and if you play optimally then the net expected payoff is zero. Here, playing optimally means that you either not play at all or start playing and continue to play foreover. You can quit or take a bathroom break only at periods in which the prevailing charge equals the GI of the current state.
4. Consider now the multi-arms problem, and assume again that in order to play an arm you have to pay its current prevailing charge as defined in step 3. Then again, regardless of how you play, the Casino will get a nonnegative net payoff (since by step 3 this is the case for every arm separately), and you can still get an expected net payoff $0$ if you play optimally. Playing optimally means that you either not play or start playing. If you start playing you can quit, take a break, or switch to another arm only in periods in which the prevailing charge of the arm you are currently playing equals the GI of its current state.
5. Forget for a moment about your profits and assume that what you care about is maximizing payments to the casino (I don’t mean net payoff, I mean just the charges that the casino receives from your playing). Since the sequence of prevailing charges of every arm is decreasing, and since the discount factor makes the casino like higher payments early, the Gittins strategy — the one in which you play at each period the arm with highest current GI, which by definition of the prevailing charge is also the arm with highest current prevailing charge — is the one that maximizes the Casino’s payments. In fact, this would be the case even if you knew the realization of the charges sequence in advance.
6. The Gittins strategy is one of the optimal strategies from step 4. Therefore, its net expected payoff is $0$.
7. Therefore, for every strategy $\sigma$,
Rewards from $\sigma<=$ Charges from $\sigma<=$ Charges from Gittins strategy
(First inequality is step 4 and second is step 5)
And Charges from Gittins strategy = Rewards from Gittins Strategy
(step 6)
8. Therefore, Gittins strategy gives the optimal possible total rewards.

That’s it.

Now, here is the question. Suppose that instead of arms we would have dynamic optimization problems, each given by a state space, an action space, a transition function, and a payoff function. Let’s call them projects. The difference between a project and an arm is that when you decide to work on a project you also decide which action to take, and the current reward and next state depend on the current state and on your action. Now read again the proof with projects in mind. Every time I said “play arm $i$”, what I meant is work on project $i$ and choose the optimal action. We can still define an “index”, as in the first step: the unique charge $\gamma$ such that, if you need to pay $\gamma$ every period you work on the project (using one of the actions) then both not working and working with some action are optimal. The conclusion is not true for the projects problem though. At which step does the argument break down ?

Volume 42 of the AER, published in 1952, contains an article by Paul Samuelson entitled Spatial Price Equilibrium and Linear Programming’. In it, Samuelson uses a model of Enke (1951) as a vehicle to introduce the usefulness of linear programming techniques to Economists. The second paragraph of the paper is as follows:

In recent years economists have begun to hear about a new type of theory called linear programming. Developed by such mathematicians as G. B. Dantzig, J. v. Neumann, A. W. Tucker, and G. W. Brown, and by such economists as R. Dorfman, T. C. Koopmans, W. Leontief, and others, this field admirably illustrates the failure of marginal equalization as a rule for defining equilibrium. A number of books and articles on this subject are beginning to appear. It is the modest purpose of the following discussion to present a classical economics problem which illustrates many of the characteristics of linear programming. However, the problem is of economic interest for its own sake and because of its ancient heritage.

Of interest are the 5 reasons that Samuelson gives for why readers of the AER should care.

1. This viewpoint might aid in the choice of convergent numerical iterations to a solution.

2. From the extensive theory of maxima, it enables us immediately to evaluate the sign of various comparative-statics changes. (E.g., an increase in net supply at any point can never in a stable system decrease the region’s exports.)

3. By establishing an equivalence between the Enke problem and a maximum problem, we may be able to use the known electric devices for solving the former to solve still other maximum problems, and perhaps some of the linear programming type.

4. The maximum problem under consideration is of interest because of its unusual type: it involves in an essential way such non-analytic functions as absolute value of X, which has a discontinuous derivative and a corner; this makes it different from the conventionally studied types and somewhat similar to the inequality problems met with in linear programming.

5. Finally, there is general methodological and mathematical interest in the question of the conditions under which a given equilibrium problem can be significantly related to a maximum or minimum problem.

My students often use me as a sounding board for their new ventures. A sign that the modern University could pass for a hedge fund with classrooms. The request brings a chuckle as it always reminds me of my first exposure to entrepreneurial activity.

It happened in the most unlikeliest of places as well as times. A public (i.e. private) school in pre-Thatcherite England.  England was then the sick man of Europe and its decline was blamed upon the public schools.  Martin Wiener’s English Culture and the Decline of the Industrial Spirit, for example, argued that the schools had turned a nation of shopkeepers into one of lotus eaters.

Among the boys was a fellow, I’ll call Hodge. He was a well established source of contraband like cigarettes and pornographic magazines. He operated out of a warren of toilets in the middle of the school grounds called the White City. Why the school needed a small building devoted entirely to toilets was a product of the English distrust of indoor plumbing and central heating.

One lesson I learnt from Hodge was never buy a pornographic magazine sight unseen. The Romans call it caveat emptor, but, I think this, more vivid.

Hodge was always on the look out for new goods and services that he could offer for a profit to the other boys. One day, he hit upon the idea of buying a rubber woman (it was plastic and inflatable) and renting it out.  The customer base consisted of 400 teenage boys confined to a penal colony upon a wind blasted heath.

Consider the challenges. How was he to procure one (no internet)? Where would he hide the plastic inamorata to prevent theft or confiscation by the authorities? How would he find customers (no smart phones)? What should he charge? What was to prevent competition? And, of course, what happened? All, I think, best left to the imagination.

There is a test for smarts’ that Sir Peter Medawar was fond of. I often think of it when teaching equilibrium.

If you have ever seen an El Greco, you will notice that the figures and faces are excessively elongated. Here is an example.The eye surgeon Patrick Trevor-Roper, brother to the historian Hugh offered an explanation. Readers of  certain vintage will recall the long running feud between Hugh Trevor-Roper and Evelyn Waugh. Waugh said that the best thing Hugh Trevor-Roper could do would be to change his name and leave Oxford for Cambridge. Hugh Trevor-Roper eventually  became Lord Dacre and left Oxford for Cambridge. But, I digress.

Returning to Patrick, he suggested that El Greco had a form of astigmatism, which distorted his vision and led to elongated images forming on his retina. Medawar’s question was simple: was Patrick Trevor-Roper correct?

In a CS paper, it is common to refer to prior work like [1] and [42] rather than Brown & Bunter (1923) or Nonesuch (2001). It is a convention I have followed in my papers with CS colleagues. Upon reflection, I find it irritating and mean spirited.

1. No useful information is conveyed by the string of numbers masquerading as references beyond the statement: authors think there are X relevant references.’
2. A referee wishing to check if the authors are aware of relevant work must scroll or leaf to the end of the paper to verify this.
3. The casual reader cannot be surprised by some new and relevant reference unless they scroll or leaf to the end of the paper to verify this.
4. Citations are part of the currency (or drug) we live by. Why be parsimonious in acknowledging the contributions of A. N. Other? It shows a want of fellow feeling.

I suspect that the convention is an artifact of the page limits on conference proceedings. A constraint that seems quaint. Some journals, the JCSS for example, follows the odd convention of referring to earlier work as Bede [22]! But which paper by the venerable and prolific Bede does the author have in mind?

Many people say (actually, just one) that the Republican’s have a plan to remove Trump from the Presidency, should he win in November using the 25th amendment. Section 4 of the amendment reads:

Whenever the Vice President and a majority of either the principal officers of the executive departments or of such other body as Congress may by law provide, transmit to the President pro tempore of the Senate and the Speaker of the House of Representatives their written declaration that the President is unable to discharge the powers and duties of his office, the Vice President shall immediately assume the powers and duties of the office as Acting President.’

The VP is Pence. The President pro tempore of the Senate, is the senior senator of the majority party and Paul Ryan is the Speaker of the House.
The President can object. At which point, Congress resolves the matter, specifically,

….two-thirds vote of both Houses that the President is unable to discharge the powers and duties of his office, the Vice President shall continue to discharge the same as Acting President; otherwise, the President shall resume the powers and duties of his office.’

It is not often that Terry Tao gets into politics in his blog, but, as political observers like to say, normal rules don’t apply this year. Tao writes that many of Trump’s supporters secretly believe that he is not even remotedly qualified for the presidency, but they continue to entertain this possibility because their fellow citizens and the media and politicians seem to be doing so. He suggests that more people should come out and reveal their secret beliefs.

I generally agree with Tao’ sentiment and argument, but I have a quibble. Tao describe the current situation as mutual knowledge without common knowledge. This, I think, is wrong. To get politics out of the way, let me explain my position using a similar situation which Tao also mentions: The Emperor’s new clothes. I have already come across people casting the Emperor’s story in terms of mutual knowledge without common knowledge, and I think it is also wrong. The way I understand the story, before the kid shouts, each of the Emperor’s subjects sees that the Emperor is naked, but after observing everybody else’s reaction, each subject updates her own initial belief and deduces that she was probably wrong. The subjects now don’t think that the Emperor is naked. Rather, each subjects thinks that her own eyes deceived her.

But when game theorists and logicians say that an assertion is mutual knowledge (or mutual belief) we mean that each of us, after taking into account our own information including what we deduce about other people’s information, think the assertion is true. In my reading of the Emperor’s new cloths story this is not the case.

For an assertion to be common knowledge, we need in addition that everybody knows that everybody knows that the assertion is true, and that everybody knows that everybody knows that everybody knows that the assertion is true, and onwards to infinity. A good example of a situation with mutual knowledge and no common knowledge is the blue-eyed islanders puzzle (using the story as it appears Terrence’ blog and a big spoiler ahead if you are not familiar with the puzzle): Before the foreigner makes an announcement, it is mutual knowledge that there are at least 99 blue-eyed islanders, but this fact is not common knowledge: If Alice and Bob are both blue-eyed then Alice, not knowing the color of her own eyes, thinks that Bob might observe only 98 blue-eyed islanders. In fact it is not even common knowledge that there are at least 98 blue-eyed Islanders, because Alice thinks that Bob might think that Craig might only observe 97 blue-eyed Islanders. By similar reasoning, before the foreigner’s announcement, it is not even common knowledge that there is at least one blue-eyed islander. Once the foreigner announces it, this fact becomes common knowledge.

No mutual knowledge and no common knowledge are two situations that can have different behavioral implications. Suppose that we offer each of the subjects the following private voting game: Is the emperor wearing clothes ? You have to answer yes or no. If you answer correctly you get a free ice cream sandwich, otherwise you get nothing. According to my reading of the story they will all give the wrong answer, and get nothing. On the other hand, suppose you offer a similar game to the islanders — even before the foreigner arrives — Do you think that there is at least one blue-eyed islander ?  they will answer correctly.

There is an alternative reading of the Emperor’s story, according to which it is indeed a story about mutual knowledge without common knowledge: Even after observing the crowd’s reaction, each subject still knows that the Emperor is naked, but she keeps her mouth shut because she suspects that her fellow subjects don’t realize it and she doesn’t want to make a fool of herself. This reading strikes me as less psychologically interesting, but, more importantly, if that’s how you understand the story then there is nothing to worry about. All the subjects will vote correctly anyway and get the ice cream even without the little kid making it a common knowledge. And Trump will not be elected president even if people continue to keep their mouth shut.

I am not the right person to write about Lloyd Shapley. I think I only saw him once, in the first stony brook conference I attended. He reminded me of Doc Brown from Back to The Future, but I am not really sure why. Here are links to posts in The Economist and NYT following his death.

Shapley got the Nobel in 2012 and according to Robert Aumann deserved to get it right with Nash. Shapley himself however was not completely on board: “I consider myself a mathematician and the award is for economics. I never, never in my life took a course in economics.” If you are wondering what he means by “a mathematician” read the following quote, from the last paragraph of his stable matching paper with David Gale

The argument is carried out not in mathematical symbols but in ordinary English; there are no obscure or technical terms. Knowledge of calculus is not presupposed. In fact, one hardly needs to know how to count. Yet any mathematician will immediately recognize the argument as mathematical…

What, then, to raise the old question once more, is mathematics? The answer, it appears, is that any argument which is carried out with sufficient precision is mathematical

In the paper Gale and Shapley considered a problem of matching (or assignment as they called it) of applicants to colleges, where each applicant has his own preference over colleges and each college has its preference over applicants. Moreover, each college has a quota. Here is the definition of stability, taken from the original paper

Definition: An assignment of applicants to colleges will be called unstable if there are two applicants ${\alpha}$ and ${\beta}$ who are assigned to colleges ${A}$ and ${B}$, respectively, although ${\beta}$ prefers ${A}$ to ${B}$ and ${A}$ prefers ${\beta}$ to ${\alpha}$.
According to the Gale-Shapley algorithm, applicants apply to colleges sequentially following their preferences. A college with quota ${q}$ maintains a waiting list’ of size ${q}$ with the top ${q}$ applicants that has applied to it so far, and rejects all other applicants. When an applicant is rejected from a college he applies to his next favorite college. Gale and Shapley proved that the algorithm terminates with a stable assignment.

One reason that the paper was so successful is that the Gale Shapley method is actually used in practice. (A famous example is the national resident program that assigns budding physicians to hospitals). From theoretical perspective my favorite follow-up  is a paper of Dubins and Freedman “Machiavelli and the Gale-Shapley Algorithm” (1981): Suppose that some applicant, Machiavelli, decides to `cheat’ and apply to colleges in different order than his true ranking. Can Machiavelli improves his position in the assignment produced by the algorithm ? Dubins and Freedman prove that the answer to this question is no.

Shapley’s contribution to game theory is too vast to mention in a single post. Since I mainly want to say something about his mathematics let me mention Shapley-Folkman-Starr Lemma, a kind of discrete analogue of Lyapunov’s theorem on the range of non-atomic vector measures, and KKMS Lemma which I still don’t understand its meaning but it has something to do with fixed points and Yaron and I have used it in our paper about rental harmony.

I am going to talk in more details about stochasic games, introduced by Shapley in 1953, since this area has been flourishing recently with some really big developments. A (two-player, zero-sum) stochastic game is given by a finite set ${Z}$ of states, finite set of actions ${A,B}$ for the players, a period payoff function ${r:Z\times A\times B\rightarrow [0,1]}$, a distribution ${q(\cdot|z,a,b)}$ over ${Z}$ for every state ${z}$ and actions ${a,b}$, and a discount factor ${0<\beta<1}$. At every period the system is at some state ${z\in Z}$, players choose  actions ${a,b}$ simultaneously and independently. Then the column player pays ${r(z,a,b)}$ to the row player. The game then moves to a new state in the next period, randomized according to ${q(\cdot|z,a,b)}$. Players evaluate their infinite stream of payoofs via the discount factor ${\beta}$. The model is a generalization of the single player dynamic programming model which was studied by Blackwell and Bellman. Shapley proved that every zero-sum stochastic game admits a value, by imitating the familiar single player argument, which have been the joy and pride of macroeconomists ever since Lucas asset pricing model (think Bellman Equation and the contraction operators). Fink later proved using similar ideas that non-zero sum discounted stochastic games admit perfect markov equilibria.

A major question, following a similar question in the single player setup, is the limit behavior of the value and the optimal strategies when players become more patient (i.e., ${\beta}$ goes to ${1}$). Mertens and Neyman have proved that the limit exists, and moreover that for every ${\epsilon>0}$ there strategies which are ${\epsilon}$-optimal for sufficiently large discount factor. Whether a similar result holds for Nash equilibrium in ${N}$-player stochastic games is probably the most important open question in game theory. Another important question is whether the limit of the value exists for zero-sum games in which the state is not observed by both players. Bruno Zilloto has recently answered this question by providing a counter-example. I should probably warn that you need to know how to count and also some calculus to follow up this literature. Bruno Zilloto will give the Shapley Lecture in Games2016 in Maastricht. Congrats, Bruno ! and thanks to Shapley for leaving us with some much stuff to play with !