So, why does Gittins’ theorem hold for arms and not for projects?
First, an example. Consider the following two projects A and B. The first period you work on A you have to choose an action, Up or Down. Future rewards from working on project A depend on that first action. The (deterministic) reward sequence is
A: 8, 0, 0, 0, … if you played Up; and 5, 5, 5, 5, … if you played Down.
Project B gives the deterministic rewards sequence
B: 7, 0, 0,…
If your discount factor is high (meaning you are sufficiently patient) then your best strategy is to start with project B, get 7 and then switch to project A, play Down and get 5 every period.
Suppose now that there is an additional project, project C, which gives the deterministic payoff sequence
C: 6, 6, 6, …
Your optimal strategy now is to start with A, play Up, then collect the 7 from B, and then switch to C for good.
This is a violation of Independence of Irrelevant Alternatives: when only A and B are available you first choose B. When C becomes available you suddenly choose A. The example shows that Gittins’ index theorem does not hold for projects.
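The example can be double-checked numerically. Here is a minimal sketch in Python that brute-forces an optimal plan over a long finite horizon, used as a stand-in for the infinite one. The function name `first_move`, the horizon of 200 periods and the discount factor 0.9 are illustrative assumptions of mine, not part of the argument.

```python
from functools import lru_cache


def first_move(beta, with_c, horizon=200):
    """Return (value, first move) of an optimal plan over a finite horizon.

    Project A: Up gives 8, 0, 0, ...; Down gives 5, 5, 5, ...
    Project B: 7, 0, 0, ...   Project C (optional): 6, 6, 6, ...
    """

    @lru_cache(maxsize=None)
    def best(t, a, b_fresh):
        # a is the state of project A: "fresh", "up" or "down".
        # b_fresh is True while project B has not been worked on yet.
        if t == horizon:
            return 0.0, None
        options = []  # (reward now, move label, next A state, next B state)
        if a == "fresh":
            options.append((8.0, "A:Up", "up", b_fresh))
            options.append((5.0, "A:Down", "down", b_fresh))
        elif a == "down":
            options.append((5.0, "A", "down", b_fresh))
        else:
            options.append((0.0, "A", "up", b_fresh))
        options.append((7.0 if b_fresh else 0.0, "B", a, False))
        if with_c:
            options.append((6.0, "C", a, b_fresh))
        # Pick the move with the largest reward plus discounted continuation.
        return max(
            (reward + beta * best(t + 1, next_a, next_b)[0], move)
            for reward, move, next_a, next_b in options
        )

    return best(0, "fresh", True)


value_ab, move_ab = first_move(0.9, with_c=False)
value_abc, move_abc = first_move(0.9, with_c=True)
```

Under these assumptions the plan with only A and B starts with B, while the plan with A, B and C starts with A and plays Up.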
What goes wrong in the proof? We can still define the “fair charges” and the “prevailing charges” as in the multi-armed proof. For example, the fair charge for project A is 8: if the casino charges a smaller amount you will pay once, play Up, and go home, and the casino will lose money.
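For deterministic reward sequences the fair charge can be computed as the best ratio of discounted rewards to discounted time over all stopping points, and for a project one also maximizes over the actions. The following is a small sketch under that assumption, with the infinite sequences truncated to short prefixes; `fair_charge` and the variable names are illustrative.

```python
def fair_charge(rewards, beta):
    """Largest ratio of discounted rewards to discounted time over all prefixes."""
    best = float("-inf")
    numerator = 0.0    # discounted rewards collected so far
    denominator = 0.0  # discounted number of periods played so far
    for t, reward in enumerate(rewards):
        numerator += reward * beta**t
        denominator += beta**t
        best = max(best, numerator / denominator)
    return best


beta = 0.9  # illustrative discount factor
# Project A: take the better of the two first-period actions.
charge_a = max(fair_charge([8, 0, 0, 0], beta), fair_charge([5, 5, 5, 5], beta))
charge_b = fair_charge([7, 0, 0, 0], beta)
charge_c = fair_charge([6, 6, 6, 6], beta)
```

Here the charges come out as 8 for A, 7 for B and 6 for C. So A carries the largest charge, and yet with only A and B available a patient player starts with B.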
The point where the proof breaks down is step 5. It is not true that Gittins’ strategy maximizes the payments to the casino. The key difference is that in the multi-project case the sequence of prevailing charges of each bandit depends on your strategy. In contrast, in the multi-arm case, your strategy only determines which charge you will pay when, but the charges themselves are independent of your strategy. Since the discount sequence is decreasing, the total discounted payment in the multi-armed problem is maximized if you pay the charges in decreasing order, as you do under the Gittins strategy.
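The last claim is a rearrangement argument: with decreasing discounts, a fixed set of charges is worth most when the largest ones are paid first. A quick brute-force illustration in Python, where the four charges and the discount factor 0.9 are made-up numbers:

```python
from itertools import permutations


def discounted_total(charges, beta):
    """Total discounted payment when charges are paid in the given order."""
    return sum(charge * beta**t for t, charge in enumerate(charges))


charges = [5.0, 8.0, 7.0, 6.0]  # illustrative charges, fixed in advance
beta = 0.9                      # illustrative discount factor

# Try every order of payment and keep the one with the largest total.
best_order = max(
    permutations(charges),
    key=lambda order: discounted_total(order, beta),
)
```

With these numbers the best order is the decreasing one. The point of the projects example is that there the list of charges is not fixed in advance: for project A it is 8, 0, 0, … after Up and 5, 5, 5, … after Down.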
First, an arm or a bandit process is given by a countable state space S, a transition function q and a payoff function r. The interpretation is that at every period, when the arm is at state s, playing it gives a reward r(s) and the arm’s state changes according to q(·|s).
In the multi-armed bandit problem, at every period you choose an arm to play. The states of the arms you didn’t choose remain fixed. Your goal is to maximize expected total discounted rewards. Gittins’ theorem says that for each arm there exists a function called the Gittins Index (GI from now on) such that, in a multi-armed problem, the optimal strategy is to play at each period the arm whose current state has the largest GI. In fancy words, the theorem establishes that the choice of which arm to play at each period satisfies Independence of Irrelevant Alternatives: Suppose there are three arms whose current states are x, y, z. If you were going to start by playing x if only x and y were available, then you should not start with y when x, y, z are available.
The proof proceeds in several steps:
That’s it.
Now, here is the question. Suppose that instead of arms we had dynamic optimization problems, each given by a state space, an action space, a transition function, and a payoff function. Let’s call them projects. The difference between a project and an arm is that when you decide to work on a project you also decide which action to take, and the current reward and next state depend on the current state and on your action. Now read the proof again with projects in mind. Every time I said “play arm i”, what I meant is work on project i and choose the optimal action. We can still define an “index”, as in the first step: the unique charge λ such that, if you need to pay λ every period you work on the project (using one of the actions), then both not working and working with some action are optimal. The conclusion is not true for the projects problem though. At which step does the argument break down?
In recent years economists have begun to hear about a new type of theory called linear programming. Developed by such mathematicians as G. B. Dantzig, J. v. Neumann, A. W. Tucker, and G. W. Brown, and by such economists as R. Dorfman, T. C. Koopmans, W. Leontief, and others, this field admirably illustrates the failure of marginal equalization as a rule for defining equilibrium. A number of books and articles on this subject are beginning to appear. It is the modest purpose of the following discussion to present a classical economics problem which illustrates many of the characteristics of linear programming. However, the problem is of economic interest for its own sake and because of its ancient heritage.
Of interest are the 5 reasons that Samuelson gives for why readers of the AER should care.
This viewpoint might aid in the choice of convergent numerical iterations to a solution.
From the extensive theory of maxima, it enables us immediately to evaluate the sign of various comparative-statics changes. (E.g., an increase in net supply at any point can never in a stable system decrease the region’s exports.)
By establishing an equivalence between the Enke problem and a maximum problem, we may be able to use the known electric devices for solving the former to solve still other maximum problems, and perhaps some of the linear programming type.
The maximum problem under consideration is of interest because of its unusual type: it involves in an essential way such non-analytic functions as absolute value of X, which has a discontinuous derivative and a corner; this makes it different from the conventionally studied types and somewhat similar to the inequality problems met with in linear programming.
Finally, there is general methodological and mathematical interest in the question of the conditions under which a given equilibrium problem can be significantly related to a maximum or minimum problem.
It happened in the unlikeliest of places as well as times: a public (i.e. private) school in pre-Thatcherite England. England was then the sick man of Europe and its decline was blamed upon the public schools. Martin Wiener’s English Culture and the Decline of the Industrial Spirit, for example, argued that the schools had turned a nation of shopkeepers into one of lotus eaters.
Among the boys was a fellow I’ll call Hodge. He was a well-established source of contraband like cigarettes and pornographic magazines. He operated out of a warren of toilets in the middle of the school grounds called the White City. That the school needed a small building devoted entirely to toilets was a product of the English distrust of indoor plumbing and central heating.
One lesson I learnt from Hodge was never to buy a pornographic magazine sight unseen. The Romans called it caveat emptor, but I think this more vivid.
Hodge was always on the lookout for new goods and services that he could offer for a profit to the other boys. One day, he hit upon the idea of buying a rubber woman (it was plastic and inflatable) and renting it out. The customer base consisted of 400 teenage boys confined to a penal colony upon a wind-blasted heath.
Consider the challenges. How was he to procure one (no internet)? Where would he hide the plastic inamorata to prevent theft or confiscation by the authorities? How would he find customers (no smart phones)? What should he charge? What was to prevent competition? And, of course, what happened? All, I think, best left to the imagination.
If you have ever seen an El Greco, you will notice that the figures and faces are excessively elongated. Here is an example. The eye surgeon Patrick Trevor-Roper, brother to the historian Hugh, offered an explanation. Readers of a certain vintage will recall the long-running feud between Hugh Trevor-Roper and Evelyn Waugh. Waugh said that the best thing Hugh Trevor-Roper could do would be to change his name and leave Oxford for Cambridge. Hugh Trevor-Roper eventually became Lord Dacre and left Oxford for Cambridge. But, I digress.
Returning to Patrick, he suggested that El Greco had a form of astigmatism, which distorted his vision and led to elongated images forming on his retina. Medawar’s question was simple: was Patrick Trevor-Roper correct?
The first two items on this list are illegal. If a PAC or US (or green card holder) Plutocrat had deployed their respective resources on the third item on this list, it would be perfectly legal. While one should expect the Russians to continue with item 3 for the next election, so will each of the main political parties.
Why is `fake’ news influential? Shouldn’t information from a source with unknown and uncertain quality be treated like a lemon? For example, it is impossible for a user to distinguish a Twitter account associated with a real human from a bot. Nor can a user tell whether individual Twitter yawps are independent or correlated.
Perhaps it depends on the distinction between information used to make a decision, like which restaurant to go to, and that which is for consumption value only (gossip). There appears to be no fake news crisis in restaurant reviews. There could be a number of reasons for this. The presence of non-crowd-sourced reviews, the relatively low cost of experimentation coupled with frequent repetition, and the fact that my decision to go to a restaurant does not compel you to do so come to mind.
Political communication seems to be different, closer to entertainment than to informing decision making. If I consume political news that coincides with my partisan leanings because it entertains me the most, it means that the news did not persuade me to lean that way (it follows that suppressing fake news should not change the distribution of political preferences). So, such news must serve another purpose; perhaps it increases turnout. If so, we should expect the DNC to be much more active in the deployment of `fake’ news and an increase in turnout.
I suspect that the convention is an artifact of the page limits on conference proceedings, a constraint that seems quaint. Some journals, the JCSS for example, follow the odd convention of referring to earlier work as Bede [22]! But which paper by the venerable and prolific Bede does the author have in mind?