This post is only gonna make sense if you read my last-week post on Gittins Index

So, why does Gittins’ theorem hold for arms and not for projects ?

First, an example. Consider the following two projects A and B. The first period you work on A you have to choose an action Up or Down. Future rewards from working on project A depend on your first action. The (deterministic) rewards sequence is

A:  8, 0, 0, 0, … if you played Up; and     5, 5, 5, 5, … if you played Down.

Project B gives the deterministic rewards sequence

B:  7, 0, 0,…

If your discount factor is high (meaning you are sufficiently patient) then your best strategy is to start with project B, get 7 and then switch to project A, play Down and get 5 every period.

Suppose now that there is an additional project, project C which gives the deterministic payoff sequence

C:  6,6,6,….

This is a violation of Independence of Irrelevant Alternatives: when only A and B are available you first choose B. When C becomes available you suddenly choose  A. The example shows that Gittins’ index theorem does not hold for projects.

What goes wrong in the proof ? We can still define the “fair charges” and the “prevailing charges” as in the multi-armed proof. For example, the fair charge for project A is 8: If the casino charges a smaller amount you will pay once, play Up, and go home, and the casino will lose money.

The point where the proof breaks down is step 5. It is not true that Gittins’ strategy maximizes the payments to the casino. The key difference is that in the multi-project case the sequence of prevailing charges of each bandit depends on your strategy. In contrast, in the multi-arm case, your strategy only determines which charge you will play when, but the charges themselves are independent of your strategy. Since the discounts sequence $1,\beta,\beta^2,\dots$ is decreasing, the total discounted payment in the multi-armed problem is maximized if you pay the charges in decreasing order, as you do under the Gittins Strategy.

After presenting Richard Weber’s remarkable proof of Gittins’ index theorem in my dynamic optimization class, I claimed that the best way to make sure that you understand a proof is to identify where the assumptions of the theorem are used. Here is the proof again, slightly modified from Weber’s paper, followed by the question I gave in class.

First, an arm or a bandit process is given by a countable state space $S$, a transition function $q(s'|s)$ and a payoff function $r:S\rightarrow [0,1]$. The interpretation is that at every period, when the arm is at state $s$, playing it gives a reward $r(s)$ and the arm’s state changes according to $q(\cdot|s)$.

In the multi-armed bandit problem, at every period you choose an arm to play. The states of the arms you didn’t choose remain fixed. Your goal is to maximize expected total discounted rewards. Gittins’ theorem says that for each arm there exists a function $\gamma:S\rightarrow [0,1]$ called the Gittins Index (GI from now on) such that, in a multi armed problem, the optimal strategy is to play at each period the arm whose current state has the largest GI. In fancy words, the theorem establishes that the choice which arm to play at each period satisfies Independent of Irrelevance Alternatives: Suppose there are three arms $A,B,C$ whose current states are $a,b,c$. If you were going to start by playing $A$ if only $A$ and $B$ were available, then you should not start with $B$ when $A,B,C$ are available.

The proof proceeds in several steps:

1. Define the Gittins Index at state $s$ to be the amount $\gamma$ such that, if the casino charges $\gamma$ every time you play the arm, then both playing and not playing are optimal actions at the state $s$. We need to prove that there exists a unique such $\gamma$. This is not completely obvious, but can be shown by appealing to standard dynamic programming arguments.
2. Assume that you enter a casino with a single arm at some state $s$ with GI $\gamma$. Assume also that the casino charges $\gamma$ every time you play the arm. At every period, you can play, or quit playing, or take a break. From step 1, it follows that regardless of your strategy, the casino will always get a nonnegative net expected net payoff, and if you play optimally then the net expected payoff to the casino (and therefore also to you) is zero. For this reason this $\gamma$ (the GI of the initial state) is called the fair charge. Here, playing optimally means that you either not play at all or start playing and continue to play every period until the arm reaches a state with GI strictly smaller then $\gamma$, in which case you must quit. It is important that as long as the arm is at a state with GI strictly greater than $\gamma$ you continue playing. If you need to take a restroom break you must wait until the arm reaches a state with GI $\le \gamma$.
3. Continuing with a single arm, assume now that the casino announces a new policy that at every period, if the arm reaches a state with GI that is strictly smaller than the GI of all previous states, then the charge for playing the arm drops to the new GI. We call these new (random) charges the prevailing charges. Again, the casino will always get a nonnegative net expected payoff, and if you play optimally then the net expected payoff is zero. Here, playing optimally means that you either not play at all or start playing and continue to play foreover. You can quit or take a bathroom break only at periods in which the prevailing charge equals the GI of the current state.
4. Consider now the multi-arms problem, and assume again that in order to play an arm you have to pay its current prevailing charge as defined in step 3. Then again, regardless of how you play, the Casino will get a nonnegative net payoff (since by step 3 this is the case for every arm separately), and you can still get an expected net payoff $0$ if you play optimally. Playing optimally means that you either not play or start playing. If you start playing you can quit, take a break, or switch to another arm only in periods in which the prevailing charge of the arm you are currently playing equals the GI of its current state.
5. Forget for a moment about your profits and assume that what you care about is maximizing payments to the casino (I don’t mean net payoff, I mean just the charges that the casino receives from your playing). Since the sequence of prevailing charges of every arm is decreasing, and since the discount factor makes the casino like higher payments early, the Gittins strategy — the one in which you play at each period the arm with highest current GI, which by definition of the prevailing charge is also the arm with highest current prevailing charge — is the one that maximizes the Casino’s payments. In fact, this would be the case even if you knew the realization of the charges sequence in advance.
6. The Gittins strategy is one of the optimal strategies from step 4. Therefore, its net expected payoff is $0$.
7. Therefore, for every strategy $\sigma$,
Rewards from $\sigma<=$ Charges from $\sigma<=$ Charges from Gittins strategy
(First inequality is step 4 and second is step 5)
And Charges from Gittins strategy = Rewards from Gittins Strategy
(step 6)
8. Therefore, Gittins strategy gives the optimal possible total rewards.

That’s it.

Now, here is the question. Suppose that instead of arms we would have dynamic optimization problems, each given by a state space, an action space, a transition function, and a payoff function. Let’s call them projects. The difference between a project and an arm is that when you decide to work on a project you also decide which action to take, and the current reward and next state depend on the current state and on your action. Now read again the proof with projects in mind. Every time I said “play arm $i$”, what I meant is work on project $i$ and choose the optimal action. We can still define an “index”, as in the first step: the unique charge $\gamma$ such that, if you need to pay $\gamma$ every period you work on the project (using one of the actions) then both not working and working with some action are optimal. The conclusion is not true for the projects problem though. At which step does the argument break down ?

Volume 42 of the AER, published in 1952, contains an article by Paul Samuelson entitled Spatial Price Equilibrium and Linear Programming’. In it, Samuelson uses a model of Enke (1951) as a vehicle to introduce the usefulness of linear programming techniques to Economists. The second paragraph of the paper is as follows:

In recent years economists have begun to hear about a new type of theory called linear programming. Developed by such mathematicians as G. B. Dantzig, J. v. Neumann, A. W. Tucker, and G. W. Brown, and by such economists as R. Dorfman, T. C. Koopmans, W. Leontief, and others, this field admirably illustrates the failure of marginal equalization as a rule for defining equilibrium. A number of books and articles on this subject are beginning to appear. It is the modest purpose of the following discussion to present a classical economics problem which illustrates many of the characteristics of linear programming. However, the problem is of economic interest for its own sake and because of its ancient heritage.

Of interest are the 5 reasons that Samuelson gives for why readers of the AER should care.

1. This viewpoint might aid in the choice of convergent numerical iterations to a solution.

2. From the extensive theory of maxima, it enables us immediately to evaluate the sign of various comparative-statics changes. (E.g., an increase in net supply at any point can never in a stable system decrease the region’s exports.)

3. By establishing an equivalence between the Enke problem and a maximum problem, we may be able to use the known electric devices for solving the former to solve still other maximum problems, and perhaps some of the linear programming type.

4. The maximum problem under consideration is of interest because of its unusual type: it involves in an essential way such non-analytic functions as absolute value of X, which has a discontinuous derivative and a corner; this makes it different from the conventionally studied types and somewhat similar to the inequality problems met with in linear programming.

5. Finally, there is general methodological and mathematical interest in the question of the conditions under which a given equilibrium problem can be significantly related to a maximum or minimum problem.

My students often use me as a sounding board for their new ventures. A sign that the modern University could pass for a hedge fund with classrooms. The request brings a chuckle as it always reminds me of my first exposure to entrepreneurial activity.

It happened in the most unlikeliest of places as well as times. A public (i.e. private) school in pre-Thatcherite England.  England was then the sick man of Europe and its decline was blamed upon the public schools.  Martin Wiener’s English Culture and the Decline of the Industrial Spirit, for example, argued that the schools had turned a nation of shopkeepers into one of lotus eaters.

Among the boys was a fellow, I’ll call Hodge. He was a well established source of contraband like cigarettes and pornographic magazines. He operated out of a warren of toilets in the middle of the school grounds called the White City. Why the school needed a small building devoted entirely to toilets was a product of the English distrust of indoor plumbing and central heating.

One lesson I learnt from Hodge was never buy a pornographic magazine sight unseen. The Romans call it caveat emptor, but, I think this, more vivid.

Hodge was always on the look out for new goods and services that he could offer for a profit to the other boys. One day, he hit upon the idea of buying a rubber woman (it was plastic and inflatable) and renting it out.  The customer base consisted of 400 teenage boys confined to a penal colony upon a wind blasted heath.

Consider the challenges. How was he to procure one (no internet)? Where would he hide the plastic inamorata to prevent theft or confiscation by the authorities? How would he find customers (no smart phones)? What should he charge? What was to prevent competition? And, of course, what happened? All, I think, best left to the imagination.

17 years ago the Clay Institute announced 7 Millennium problems and offered \$1 million for the solution to each.  To date, only one has been solved. It suggests that the supply (of mathematical attention) has not increased to meet demand. Therefore, the value of the prizes are to low. Why might the current value of the  prize prolong the time it takes to obtain a solution? Suppose, two agents, each in possession of an idea that in combination would produce the sought after solution. Neither agent has an incentive to reveal what they know, for fear that the other will build upon this to earn the prize. Pooling their efforts means splitting the prize. Thus, one has a familiar trade-off. Laboring alone delays the reward but it is unshared. Teaming up, accelerates the reward but it must be shared.

There is a test for smarts’ that Sir Peter Medawar was fond of. I often think of it when teaching equilibrium.

If you have ever seen an El Greco, you will notice that the figures and faces are excessively elongated. Here is an example.The eye surgeon Patrick Trevor-Roper, brother to the historian Hugh offered an explanation. Readers of  certain vintage will recall the long running feud between Hugh Trevor-Roper and Evelyn Waugh. Waugh said that the best thing Hugh Trevor-Roper could do would be to change his name and leave Oxford for Cambridge. Hugh Trevor-Roper eventually  became Lord Dacre and left Oxford for Cambridge. But, I digress.

Returning to Patrick, he suggested that El Greco had a form of astigmatism, which distorted his vision and led to elongated images forming on his retina. Medawar’s question was simple: was Patrick Trevor-Roper correct?

The various US intelligence agencies have identified three ways in which the Russian state meddled with the recent US elections:

1. Intrusions into voter registration systems.
2. Cyberattack on then DNC and subsequent release of hacked material.
3. Deployment of fake’ news and internet trolls.

The first two items on this list are illegal. If a PAC or US (or green card holder) Plutocrat had deployed their respective resources on the third item on this list, it would be perfectly legal. While one should expect the Russian’s to continue with item 3 for the next election, so will each of the main political parties.

Why is fake’ news influential? Shouldn’t information from a source with unknown and uncertain quality be treated like a lemon? For example, it is impossible for a user to distinguish between a twitter account associated with a real human from a bot. Nor can a user tell whether individual twitter yawps are independent or correlated.

Perhaps it depends on the distinction between information used to make a decision like which restaurant to go to and that which is for consumpiton value only (gossip). There appears to be no fake news crisis in restaurant reviews. There could be a number of reasons for this. The presence of non-crowd sourced reviews, the relatively low cost of experimentation coupled with frequent repetition and the fact that my decision to go to a restaurant does not compel you to do so comes to mind.

Political communication seems to be different, closer to entertainment than informing decision making.  If I consume political news that coincide with my partisan leanings because these enteratin me the most, it means that the news did not persuade me to lean that way (it follows that surpressing fake news should not change the distribution of political preferences). So, such news must serve another purpose, perhaps it increases turnout. If so, we should expect the DNC to be much more active in the deployment of fake’ news and an increase in turnout.

In a CS paper, it is common to refer to prior work like [1] and [42] rather than Brown & Bunter (1923) or Nonesuch (2001). It is a convention I have followed in my papers with CS colleagues. Upon reflection, I find it irritating and mean spirited.

1. No useful information is conveyed by the string of numbers masquerading as references beyond the statement: authors think there are X relevant references.’
2. A referee wishing to check if the authors are aware of relevant work must scroll or leaf to the end of the paper to verify this.
3. The casual reader cannot be surprised by some new and relevant reference unless they scroll or leaf to the end of the paper to verify this.
4. Citations are part of the currency (or drug) we live by. Why be parsimonious in acknowledging the contributions of A. N. Other? It shows a want of fellow feeling.

I suspect that the convention is an artifact of the page limits on conference proceedings. A constraint that seems quaint. Some journals, the JCSS for example, follows the odd convention of referring to earlier work as Bede [22]! But which paper by the venerable and prolific Bede does the author have in mind?

When discussing the allocation of indivisible objects, I point to randomization as a method. To emphasize it is not merely a theoretical nicety, but is used to allocate objects that are of high value I give the example of suicide lotteries. I first came across them in Marcus Clarke’s For The Term of His Natural Life’. It counts as the first Australian novel. Its hero, an Englishman, Rufus Dawes is transported to Australia for a crime he did not commit. In the course of his tribulations, Dawes is shipped to a penal settlement on Norfolk Island, 800 miles east of Australia; even a prison needs a further prison. Robert Hughes, in his heart rending and eloquent account of the founding of Australia, called the Fatal Shore’, describes Norfolk island in these words:

….a place of breathtaking barbarity……. On Norfolk Island an Irishman named William Riley received 100 lashes for ”Singing a Song” (no doubt a rebel one) and 50 for asking a warder for a chew of tobacco. Deranged by cruelty and misery, some men would opt for a lifetime at the bottom of the carceral heap by blinding themselves; thus, they reasoned, they would be left alone.’

It is in this portion of his book, that Hughes recalls an eyewitness account of a suicide lottery of the type mentioned in Clarke’s novel. Here is Clarke’s succinct description of it:

The scheme of escape hit upon by the convict intellect was simply this. Three men being together, lots were drawn to determine whom should be murdered. The drawer of the longest straw was the “lucky” man. He was killed. The drawer of the next longest straw was the murderer. He was hanged. The unlucky one was the witness. He had, of course, an excellent chance of being hung also, but his doom was not so certain, and he therefore looked upon himself as unfortunate.

Clarke and Hughes deviate slightly upon the precise incentives that would drive participation in the scheme. As many of the convicts on Norfolk island were Irish, the scheme was concocted as a way to to circumvent the Catholic prohibition on suicide. Hughes suggests that, after the murder, witness and culprit would be shipped back to the mainland for trial. Conditions there were better, so for both there was brief respite and a greater opportunity for escape.

Its an arresting story, that one is loath to give up. But, one is compelled to ask, is it true? If yes, was it common? Tim Causer of King’s College London went back to look at the records and says the answers are maybe’ and no’. Here is his summing up:

Capital offences committed with apparent suicidal intent are an important part of Norfolk Island’s history, but they need to be understood more fully. It should be recognised just how rare they were, that ‘suicide lotteries’ are embellishments upon actual cases of state-assisted suicide and repeating the myth only reinforces the sensationalised interpretation of Norfolk Island’s history.’

You can find the full account here.