Game theorists analyze negotiations as if they were split-a-pie games involving selfish players. Because I spent many years during my previous life as an academic researching game theory, some commentators rushed to presume that as Greece’s new finance minister I was busily devising bluffs, stratagems and outside options, struggling to improve upon a weak hand.

Is this a case of a theorist mugged by reality or of someone who misunderstands theory? The second. The quoted first sentence proves it, because it is false. Patently so. Yes, there are split-a-pie models of negotiation, but they are not the only models. What about models where the pie changes in size with the investments made by the players (i.e., double marginalization)? Wait, this is precisely the situation that Varoufakis sees himself in:

…table our proposals for regrowing Greece, explain why these are in Europe’s interest…

He continues:

`If anything, my game-theory background convinced me that it would be pure folly to think of the current deliberations between Greece and our partners as a bargaining game to be won or lost via bluffs and tactical subterfuge.’

Bluff and subterfuge are not the only arrows in the Game Theorist’s quiver. Commitment is another. Wait! Here is Varoufakis trying to signal commitment:

Faithful to the principle that I have no right to bluff, my answer is: The lines that we have presented as red will not be crossed. Otherwise, they would not be truly red, but merely a bluff.

Talk is cheap but credible commitments are not. A `weak’ type sometimes has a strong incentive to claim it is committed to this much and no more. Thus, Varoufakis’ claim that he does not bluff rings hollow, because a liar would say as much. Perhaps Varoufakis should dust off his Schelling and bone up on signaling as well as war-of-attrition games. Varoufakis may not bluff, but his negotiating partners think he does. Protestations to the contrary, appeals to justice, Kant and imperatives are simply insufficient.

He closes with this:

One may think that this retreat from game theory is motivated by some radical-left agenda. Not so. The major influence here is Immanuel Kant, the German philosopher who taught us that the rational and the free escape the empire of expediency by doing what is right.

Noble sentiments, but Kant also reminded us that

“Out of the crooked timber of humanity, no straight thing was ever made.”

My advice to Varoufakis: more Game Theory, less metaphysics.

“Employees must wash hands before returning to work.”

was an example of government over-regulation.

Quoting himself:

“I said that I don’t have any problem with Starbucks if they choose to opt out of this policy as long as they post a sign that says, ‘We don’t require our employees to wash their hands after leaving the restroom.’ The market will take care of that.”

Many found the sentiment ridiculous, but for the wrong reason. Tillis was not advocating the abolition of the hand-washing injunction but replacing it with another that would, in his view, have the same effect. More generally, he seems to suggest the following rule: one can opt out of a regulation as long as one discloses this. If the two forms of regulation (all must follow vs. opt out but disclose) are outcome-equivalent, why should we prefer one to the other?

Monitoring costs are not lower; one still has to monitor those who opt out to verify that they have disclosed. And what constitutes disclosure? For example:

`We do not require our employees to wash their hands because they do so anyway.’

Would the following be acceptable?

“We operate a hostile work environment, but pay above-average wages to compensate for that.”

**The rules of the auction are as follows:**

- The auction is conducted on the internet, and each participant sits in his office.
- The minimum bid is 2,000,000 NIS per megahertz (meaning 10M NIS per band).
- Bids must be multiples of 100,000.
- No two bids can be the same: if a participant makes a bid that was already made by another participant, the e-system notifies the participant that his bid is illegal.

The auction is conducted in rounds.

In the first round:

- Each bidder makes a bid, which consists of the number of bands it asks for and the price it is willing to pay per megahertz. As mentioned above, no two participants can offer the same price.
- The e-system allocates the 8 bands among the participants according to the bids: the bidder with the highest bid gets the number of bands it asked for, then the bidder with the second-highest bid, and so on, until the 8 bands are exhausted.
- Each participant is told the number of bands it was allocated.
- The price at which the 8th band was allocated is called the threshold and is posted publicly.
- The minimum price for the next round is the threshold + 200,000 NIS.
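The first-round allocation and threshold rules can be sketched in a few lines. A minimal sketch (the bidders and prices are hypothetical, and the uniqueness and increment rules are omitted; only the allocation-by-descending-price and threshold rules come from the description above):

```python
# Sketch of the first-round allocation and threshold rule described above.
# Hypothetical bids: (bidder, #bands requested, price per megahertz in NIS).
bids = [
    ("A", 3, 2_500_000),
    ("B", 2, 2_400_000),
    ("C", 2, 2_300_000),
    ("D", 1, 2_200_000),
    ("E", 1, 2_100_000),
]

def allocate(bids, total_bands=8):
    """Allocate bands from the highest price down; return the allocation
    and the threshold (the price at which the 8th band was allocated)."""
    allocation, remaining, threshold = {}, total_bands, None
    for bidder, bands, price in sorted(bids, key=lambda b: -b[2]):
        if remaining == 0:
            break
        granted = min(bands, remaining)
        allocation[bidder] = granted
        remaining -= granted
        if remaining == 0:
            threshold = price
    return allocation, threshold

alloc, threshold = allocate(bids)
print(alloc)                # {'A': 3, 'B': 2, 'C': 2, 'D': 1} -- E gets nothing
print(threshold)            # 2200000
print(threshold + 200_000)  # minimum price for the next round: 2400000
```

Note how the hypothetical bidder E, holding the lowest price, is shut out once the 8 bands are exhausted, and E's displacement is what drives the subsequent rounds.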

In each subsequent round:

- Each participant that got 0 bands in the previous round, as well as any participant that got strictly fewer bands than the number it bid for (at most one bidder can satisfy this latter condition), can submit a new bid. A participant cannot increase the number of bands it asks for: if a bidder asked for 2 bands in a certain round, it can bid for 0, 1, or 2 bands in the following round. A participant may choose not to submit any bid, but it cannot do so twice in a row: a participant who does not submit a bid for two consecutive rounds cannot submit any bid in any subsequent round.
- The new threshold and minimum price are calculated as above.

Throughout the auction, the bidders know:

a) Their own bids.

b) The list of thresholds (or, equivalently, minimum prices).

In particular, the number of participants and their identities are not known, so nobody knows if and when a participant left the auction.

One feature (or bug) of the system was that the process of submitting a bid had two steps: the participant first submits the bid and is told whether it is legal (that is, whether it was already made by another participant), and only if it is legal is the participant asked to confirm it. This allows the participants to learn the number of participants still in the auction, simply by checking which prices are already taken (except for the price “threshold + 100,000”, which cannot be verified).

Important information the participants do not have is the number of bands that each other participant asks for. As we will see, this was the key issue in the actual auction.

**The participants and the relations between them**

There were six participants: three large service providers, Pelephone, Cellcom, and Partner, and three newcomers, Exphone, Golan, and Hot. To construct a viable network one needs at least 4 bands; 4 bands are enough to meet current demand, yet in a few years, when demand for fourth-generation communication increases, a fifth band will be needed. Consequently, no participant was allowed to bid for more than 4 bands. In addition, Cellcom and Partner each already own two bands, which can be used for fourth-generation communication. Consequently, these two participants were allowed to bid for at most two bands. Simple math shows that the 8 new bands, together with the 4 existing bands, are sufficient to create three viable networks, one for each large service provider.

The license of the newcomers requires them to get at least one band each. To be able to provide service, each newcomer planned to join one large service provider: Cellcom had an agreement with Golan for a joint network. Similarly, Partner had an agreement with Hot. Pelephone and Exphone tried to reach an agreement, but failed to do so before the auction began.

To encourage newcomers, the Ministry of Communication gave a 50% discount to each of the three newcomers.

The situation then is as follows:

| Large firm | #bands in possession | Maximal #bands it can bid | %discount | Small firm | #bands in possession | Maximal #bands it can bid | %discount |
|---|---|---|---|---|---|---|---|
| Pelephone | 0 | 4 | 0% | Exphone | 0 | 4 | 50% |
| Cellcom | 2 | 2 | 0% | Golan | 0 | 4 | 50% |
| Partner | 2 | 2 | 0% | Hot | 0 | 4 | 50% |

Presumably, the auction is very simple and should end after the first round at a price close to the minimum price: Cellcom, Partner, Golan, and Hot each bid for one band; Pelephone and Exphone together bid for 4 bands, most likely Exphone for 1 band and Pelephone for 3. In fact, each large firm wanted its smaller partner to get 1 band, because that way the joint network gets a band without the large firm paying its cost.

Unfortunately for the participants, it is illegal to coordinate bids, and knowledge is not common knowledge. Some concerns that participants may have are:

- A partnership (of a large service provider and a newcomer) that has only 3 bands will not be able to provide a viable network. This situation should be avoided at all costs.
- Pelephone was not able to sign an agreement with Exphone before the auction started. Suppose that Exphone wins a band; will it then join Pelephone, or maybe it will prefer joining one of the other large service providers?
- The price of 2,000,000 NIS is low, and at that price, firms may be willing to purchase more bands than they need, just to be on the safe side, or to have 5 bands to meet future demand for fourth-generation communication.
- The newcomers have smaller pockets than the large service providers. If some participant decides to bid for more bands than the numbers I mentioned above (e.g., Hot bids for 2 bands, or Pelephone for 4), then someone will have to drop out. This is likely to be one of the newcomers. But then its larger partner will be left with 3 bands, a situation which, as mentioned above, should be avoided.

The uncertainty described above increases the probability that there will be over-demand for bands, and that prices will soar.

**Optimal Behavior**

How should one bid in this auction?

A newcomer, who needs one band, should go all the way up to its private value.

A large firm that believes its smaller partner will not leave the auction should similarly ask for 1 band (Partner and Cellcom) or 3 bands (Pelephone).

A large firm that believes its smaller partner may leave the auction should ask for 2 bands (Partner and Cellcom) or 4 bands (Pelephone).

If there is over-demand of 1 band, then exactly one firm overbid. In this case the firm that overbid should ask for one fewer band and end the auction.

But what does one do if there is over-demand of 2 (or more) bands? Maybe some small firm asked for 2 bands? Maybe Pelephone asked for 4? A large firm that bid for one band more than it was supposed to cannot give up this additional band (it bid for the extra band precisely because it fears its partner may drop out), and therefore has to stick to overbidding.

**The actual auction**

After several rounds, the information available to the participants (that is, the list of thresholds together with their own bids) allowed them to infer that there was an over-demand of 2 bands. Pelephone, for example, could deduce this as follows: when the threshold was equal to Pelephone’s offer, it got 2 of the 3 or 4 bands that it bid for, which implies an over-demand of at least two bands. The threshold in the previous round allowed it to deduce that the over-demand was exactly 2.

An over-demand of two bands can occur in two scenarios:

Scenario 1: Pelephone bid 4, one firm bid 2, four firms bid 1.

Scenario 2: Pelephone bid 3, two firms bid 2, three firms bid 1.

Why is it important to distinguish between the two scenarios?

Suppose that Hot bid for 2 bands. If the true scenario is the first one, Hot knows that once it gives up a band, Pelephone may follow suit. If, on the other hand, the true scenario is the second one, then Hot does not know which other firm bid for 2 bands. If it is Cellcom or Partner, then once Hot gives up a band, Cellcom or Partner may do the same. If, on the other hand, it is Golan or Exphone, then both Hot and Golan/Exphone are trying to get an extra band at a low price, and even if Hot gives up a band, the other may stay in, in which case prices will go up.

After several more rounds all participants could infer the actual bids, though they could not know which firm made which bid. The true scenario happened to be the first one. It was quite clear that Pelephone was asking for 4 bands. But who bid for 2 bands?

Pelephone feared that Golan or Hot was attempting to buy 2 bands, which might cause Exphone to withdraw, so that if Pelephone went down to 3 bands, it would end up with only 3 bands – a disaster. The participant who bid for 2 bands, call it firm X, could be either a large service provider or a newcomer. If firm X was a large provider, it feared that Pelephone would go all the way with 4 bands, which means that firm X’s small partner might withdraw, so that if firm X gave up a band, it might end up with 3 bands (the 1 new band + the two it already owns) – a disaster. If firm X was a newcomer, then it may have wished to get a second band at a cheap price. Since a newcomer has a discount of 50%, if the price of a band turns out to be P, then the newcomer pays P for its 2 bands, while Pelephone pays 4P for its 4 bands. There seemed to be no way to avoid a price war. Would Pelephone be able to survive it?

Unfortunately for the government, which wants a high final price, and fortunately for the participants, rational thinking can be useful. The only participants whose strategies are not clear are firm X itself and Pelephone. Consider the point of view of firm X. Suppose that it decides to stick to 2 bands.

- If Pelephone sticks to 4 bands whatever happens, the price will increase, and the participants will reach the same situation they are in now, but with higher prices. If firm X is a large firm, this increases the probability that its small partner will drop out, and firm X will have to purchase 2 bands at a high price.
- If Pelephone asks for 4 bands out of fear that Exphone will drop out, then the moment firm X asks for only 1 band, Pelephone will follow suit. Will Pelephone know that firm X asks for only 1 band? Yes, because once firm X does so, the threshold will increase by 100,000 NIS per round, and there will be no round in which the threshold increases by 200,000 NIS. Moreover, whenever the threshold equals Pelephone’s bid, it will get 3 of the 4 bands it asked for, rather than the 2 of 4 it got so far.
- Because Pelephone does not know who asks for 2 bands (a large firm afraid that its small partner will drop out, or a small firm trying to get 2 bands at a low price), it cannot at present allow itself to give up a band. Therefore, if firm X sticks to asking for 2 bands, it will never know whether Pelephone plans to stick to 4 bands or will give up a band once firm X does.

The conclusion firm X could reach is that it should give up a band, and that this is the optimal course of action whatever Pelephone’s reaction. And indeed, firm X gave up a band; in a subsequent round (the first in which it could do so) Pelephone gave up a band, and the auction was over.

What I find interesting in this auction:

- Even though the rules dictate that the bids are not known, and all a participant knows is the sequence of thresholds, the number of bands that the participants asked for could be deduced during the first rounds of the auction.
- Since the identity of the participant that made each bid could not be deduced, a price war could have ensued.
- Firm X was the only one who could stop a price war. Luckily for the other participants, it did so, and made all participants profit from a low price.
- Even though monitoring was incomplete, the auction was efficient: the outcome was the one that would have been realized had the participants been able to coordinate their bids. Moreover, the final price of a band was not high, so the firms did not pay much to learn the missing data.
- There are many instances of auctions in which the deep theory that we develop is not helpful at all. However, strategic thinking can be useful and save hundreds of millions of dollars.

There were, and are, two good reasons why this question should be left to rot in peace. The first is that the comparisons made to arrive at a demarcation are problematic. If Science were a country, Physics might be its capital. If one were to ask whether History is a Science, the customary thing to do would be to measure the proximity of History to Science’s capital city. Why proximity to the capital and not to one of its outlying settlements, like Geology or Archaeology? The second, better reason is that the question `is X a science?’ is of interest only if we believe that scientific knowledge should be privileged in some way. Perhaps it alone is valid and useful while nonscientific knowledge is not. If that is the case, the correct question is not whether X is a science, but whether X produces knowledge that is valid and useful. Now we have something interesting to discuss: what constitutes useful or valid knowledge?

One might point to accurate prediction, but this alone cannot be the touchstone. How would we feel about the laws of Newtonian motion if we came upon them via regression? I suspect many of us would find such a theory incomplete, not least because of the concern with out-of-sample prediction. By the way, if you think this outlandish: I first learnt Newton’s laws by sending little carts down inclines with bits of ticker tape attached to them, so that we might, by induction, learn a linear relationship between velocity and acceleration. Truth be told, the Physics was sometimes lost in the enormous fun of racing the carts when the master’s back was turned. What if prediction is probabilistic rather than deterministic? In earlier posts on this blog you will find lengthy discussions of the problems associated with evaluating the accuracy of such predictions. I mention all this to hint at how difficult it is to pin down precisely what constitutes useful, reliable or valid knowledge.

What happens when the externality is not `priced in’? The hoary example of two firms, one upstream from the other, with the upstream firm releasing a pollutant into the river (that lowers its costs but raises the costs of the downstream firm) was introduced, and we went through the possibilities: regulation, taxation, merger/nationalization and tradeable property rights.

Discussed the pros and cons of each. Property rights (i.e., Coase) consumed a larger portion of the time: how would you define them? How would one ensure a perfectly competitive market in the trade of such rights? Nudged them towards the question of whether one can construct a perfectly competitive market for any property right.

To fix ideas, asked them to consider how a competitive market for the right to emit carbon might work. Factories can, at some expense, lower carbon emissions. Each of us values a reduction in carbon (but not necessarily identically). Suppose we hand out permits to factories (recall, Coase says the initial allocation of property rights is irrelevant) and have people buy up the permits to reduce carbon. Assuming carbon reduction is a public good (non-excludable and non-rivalrous), we have a classic public goods problem. Strategic behavior kills the market.
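The free-riding logic can be made concrete with the smallest possible example (the numbers are mine, not from the lecture): each of $n$ people values a unit of carbon reduction at $v$, a permit costs $c$, and $v < c < nv$, so retiring a permit is socially efficient but no individual will pay for it.

```python
# Free riding in a permit buy-back (illustrative numbers, not from the lecture).
n, v, c = 10, 2.0, 5.0   # n people, each values the reduction at v; permit costs c

social_benefit = n * v   # the reduction is non-rivalrous: everyone enjoys it
private_benefit = v      # but a lone buyer internalizes only their own value

print(social_benefit > c)   # True: retiring the permit is efficient
print(private_benefit > c)  # False: no individual buys it; the market unravels
```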

Some discussion of whether reducing carbon is a public good. The air we breathe (there are oxygen tanks)? Fireworks? Education? National defense? Wanted to highlight that nailing down an example that fits the definition perfectly is hard. There are `degrees’. Had thought that Education would generate more of a discussion given the media attention it receives; it did not.

Concluded with an in-depth discussion of electricity markets, as they provide a wonderful vehicle to discuss efficiency, externalities, and entry and exit in one package. They also provide a backdoor way into a discussion of net neutrality that seemed to generate some interest. As an aside I asked them whether perfectly competitive markets pay agents what they are worth. How should one measure an agent’s economic worth? Nudged them towards marginal product. Gave an example where Walrasian prices did not give each agent his marginal product (where the core does not contain the Vickrey outcome). So, was Michael Jordan overpaid or underpaid?

With respect to entry and exit I showed that the zero profit condition many had seen in earlier econ classes did not produce efficient outcomes. The textbook treatment assumes all potential entrants have the same technologies. What if the entrants have different technologies? For example, solar vs coal. Do we get the efficient mix of technologies? Assuming a competitive market that sets the Walrasian price for power, I showed them examples where we do not get the efficient mix of technologies.

The abstract lists three points the authors wish to make.

**1) We begin by documenting the relative insularity of economics, using bibliometric data.**

A former colleague of mine once classified disciplines as sources (of ideas) and sinks (absorbers of them). One could just as well describe the bibliometric data as showing that Economics is a source of ideas while the other social sciences are sinks. If one really wanted to put the boot in, perhaps the sinks should be called black holes, ones from which no good idea ever escapes.

**2) Next we analyze the tight management of the field from the top down, which gives economics its characteristic hierarchical structure.**

Economists can be likened to the Borg, which are described by Wikipedia as follows:

“….. the Borg force other species into their collective and connect them to “the hive mind”; the act is called assimilation and entails violence, abductions, and injections of microscopic machines called nanoprobes.”

**3) Economists also distinguish themselves from other social scientists through their much better material situation (many teach in business schools, have external consulting activities), their more individualist worldviews, and in the confidence they have in their discipline’s ability to fix the world’s problems.**

If the authors had known of this recent paper in Science they could have explained all this by pointing out that Economists are wheat people and other social scientists are rice people.

“Past experience shows that most of the time, during six months after elections the stock market was at a higher level than before the elections,” emphasized Zbezinsky (the chief economist, ES). The Meitav-Dash investment house checked the performance of the TA-25 Index (the index of the largest 25 companies in the Israeli stock exchange, ES) in the last six elections. They compared the index starting from 6 months before elections up to six months after elections, and the result was that the average return is positive and equals 6%.

To support this claim, a nice graph is added:

Even without understanding Hebrew, you can see the number 25 in the title, which refers to the TA-25 index; the six colored lines in the graph, where the x-axis measures the time difference from the election (in months); and the year in which each election took place. Does this graph support the claim of the chief economist? Is his claim relevant or interesting? Some points that occurred to a non-economist like me are:

- Six data points; that is all the guy has. And from this he concludes that “most of the time” the market increased. Well, he is right: the index increased four times and decreased only twice.
- The election is due on 17 March 2015, three and a half months from now. In particular, taking as a baseline 6 months before the election is useless; this baseline is well into the past.
- Some of the colored lines seem to fluctuate, suggesting that some external events, unrelated to elections, may have had an impact on the stock market, like the Intifada in 2001 or the consequences of the Lebanon war before the 2009 elections. It might be a good idea to check whether similar events are expected to occur in the coming nine and a half months.
- It would also be nice to compare the performance around elections to the performance between elections. Maybe 6% is the usual return of the TA-25 over such a window; maybe it is usually higher; maybe it is usually lower.
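On the first point, a quick sign test makes the weakness of six observations concrete. Under the null hypothesis that the market is equally likely to be up or down six months after an election, seeing at least four rises in six elections is entirely unremarkable (the 4-up/2-down split is from the article; the test is my addition):

```python
from math import comb

# Sign test: probability of at least 4 "ups" in 6 elections if ups and
# downs were equally likely (i.e., if elections had no effect at all).
n, k = 6, 4
p_value = sum(comb(n, j) for j in range(k, n + 1)) / 2**n
print(round(p_value, 3))  # 0.344 -- nowhere near any conventional significance level
```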

I am sure that readers will be able to find additional points that make the chief economist’s statement irrelevant, while others may find points that support it. I shudder at the thought that this guy is in charge of some of my retirement funds.

“There is by now a long and fairly imposing line of economists from Adam Smith to the present who have sought to show that a decentralized economy motivated by self-interest and guided by price signals would be compatible with a coherent disposition of economic resources that could be regarded, in a well defined sense, as superior to a large class of possible alternative dispositions. Moreover the price signals would operate in a way to establish this degree of coherence. It is important to understand how surprising this claim must be to anyone not exposed to the tradition. The immediate `common sense’ answer to the question `What will an economy motivated by individual greed and controlled by a very large number of different agents look like?’ is probably: There will be chaos. That quite a different answer has long been claimed true and has permeated the economic thinking of a large number of people who are in no way economists is itself sufficient ground for investigating it seriously. The proposition having been put forward and very seriously entertained, it is important to know not only whether it is true, but whether it could be true.”

But how to make it come alive for my students? When first I came to this subject it was in furious debates over central planning vs. the market. Gosplan, the commanding heights, indicative planning were as familiar in our mouths as Harry the King, Bedford and Exeter, Warwick and Talbot, Salisbury and Gloucester….England, on the eve of a general election was poised to leave all this behind. The question, as posed by Arrow and Hahn, captured the essence of the matter.

Those times have passed, and I chose instead to motivate the simple exchange economy by posing the question of how a sharing economy might work. Starting with two agents endowed with a positive quantity of each of two goods, and given their utility functions, I asked for trades that would leave each of them better off. Not only did such trades exist, there was more than one. Which one to pick? What if there were many agents and many goods? Would bilateral trades suffice to find mutually beneficial trading opportunities? Trilateral? The point of this thought experiment was to show how, in the absence of prices, mutually improving trades might be very hard to find.
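The thought experiment is easy to make concrete. A minimal sketch (the Cobb-Douglas utilities and the endowments are my assumptions, not from the lecture): enumerate bilateral trades on a grid and keep those that leave both agents strictly better off.

```python
# Two agents, two goods; Cobb-Douglas utilities (assumed for illustration).
def u1(x, y):
    return x ** 0.5 * y ** 0.5

def u2(x, y):
    return x ** 0.3 * y ** 0.7

e1, e2 = (9.0, 1.0), (1.0, 9.0)   # endowments of (good 1, good 2)

# A trade (dx, dy): agent 1 gives dx units of good 1 for dy units of good 2.
improving = []
steps = [i / 10 for i in range(51)]
for dx in steps:
    for dy in steps:
        new1 = (e1[0] - dx, e1[1] + dy)
        new2 = (e2[0] + dx, e2[1] - dy)
        if u1(*new1) > u1(*e1) and u2(*new2) > u2(*e2):
            improving.append((dx, dy))

print(len(improving))            # many mutually improving trades, not just one
print((2.0, 2.0) in improving)   # True: swapping 2 of good 1 for 2 of good 2 works
```

That many trades qualify is the point: without prices, the agents must somehow grope their way to one of them.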

Next, introduced prices and computed demands. Observed that demands in this world could increase with prices and offered an explanation. Suggested that this puts the existence of market-clearing prices in doubt. Somehow, in the context of the example, it all works out. Hand-waved about the intermediate value theorem before asserting existence in general.

On to the so what. Why should one prefer the outcomes obtained under a Walrasian equilibrium to other outcomes? Notion of Pareto optimality and first welfare theorem. Highlighted weakness of Pareto notion, but emphasized how little information each agent needed other than price, own preferences and endowment to determine what they would sell and consume. Amazingly, prices coordinate everyone’s actions. Yes, but how do we arrive at them? Noted and swept under the rug, why spoil a good story just yet?

Gasp! Did not cover Edgeworth boxes.

Went on to introduce production. Spent some time explaining why the factories had to be owned by the consumers: owners must eat as well. However, this also sets up an interesting circularity, in that in small models the employee of the factory is also the major consumer of its output! It’s not often that a firm’s employees are also a major fraction of its consumers.

Closed with how, in Walrasian equilibrium, output is produced at minimum total cost. Snuck in the central planner, who solves the problem of finding the minimum-cost production levels to meet a specified demand. Pointed out that we can implement the same solution using prices that come from the Lagrange multiplier on the central planner’s demand constraint. Ended by coming back full circle: why bother with prices, why not just let the central planner have his way?
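The last point can be seen in a toy example (entirely my own; the quadratic costs are assumed): two plants must jointly meet demand $D$, the planner equalizes marginal costs, and the common marginal cost is the Lagrange multiplier on the demand constraint, which is exactly the price that decentralizes the planner's solution.

```python
# Two plants with quadratic costs c_i(q) = a_i * q**2 / 2 (my assumption)
# must jointly produce D units at minimum total cost.
a1, a2, D = 1.0, 2.0, 9.0

# The planner equalizes marginal costs: a1*q1 = a2*q2 = lam, with q1 + q2 = D.
# lam is the Lagrange multiplier on the demand constraint.
lam = D / (1 / a1 + 1 / a2)
q1, q2 = lam / a1, lam / a2
print(q1, q2, lam)  # 6.0 3.0 6.0

# Decentralization: facing price lam, each plant produces where marginal
# cost equals price, i.e. q_i = lam / a_i -- the planner's quantities again.

# Brute-force check that no other split of D is cheaper:
cost = lambda x: a1 * x**2 / 2 + a2 * (D - x)**2 / 2
best = min(cost(i / 100) for i in range(0, 901))
print(abs(best - cost(q1)) < 1e-6)  # True: the planner's split minimizes cost
```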

A poetic interlude. Arrow and Hahn’s book has a chapter that describes Starr’s work and closes with a couple of lines of Milton:

A gulf profound as that Serbonian Bog

Betwixt Damiata and Mount Casius old,

Where Armies whole have sunk.

Milton uses the word concave a couple of times in Paradise Lost to refer to the vault of heaven. Indeed the OED lists this as one of the poetic uses of concavity.

Now, back to brass tacks. Suppose $u_i$ is agent $i$'s utility function. Replace the upper contour sets associated with $u_i$, for each $i$, by their convex hulls. Let $\bar u_i$ be the concave utility function associated with the convex hulls. Let $p^*$ be the vector of Walrasian equilibrium prices with respect to the $\bar u_i$'s. Let $\bar x_i$ be the allocation to agent $i$ in the associated Walrasian equilibrium.

For each agent $i$ let

$$D_i(p^*) = \arg\max \{\bar u_i(x) : p^* \cdot x \le p^* \cdot e_i\},$$

where $e_i$ is agent $i$'s endowment. Denote by $e = \sum_i e_i$ the vector of total endowments and let $D(p^*) = \sum_i D_i(p^*)$.

Let $z(p^*)$ be the excess demand with respect to $p^*$ and the $\bar u_i$'s. Notice that $z(p^*)$ is in the convex hull of the Minkowski sum of the sets $D_i(p^*) - \{e_i\}$. By the Shapley-Folkman-Starr lemma we can find $x_i \in \mathrm{conv}(D_i(p^*))$ for $i = 1, \ldots, n$, such that $\sum_i (x_i - e_i) = z(p^*)$ and $x_i \in D_i(p^*)$ for all but at most $m$ agents, where $m$ is the number of goods.

When one recalls that Walrasian equilibria can also be determined by maximizing a suitable weighted (by the Negishi weights) sum of utilities over the set of feasible allocations, Starr's result can be interpreted as a statement about approximating an optimization problem. I believe this was first articulated by Aubin and Ekeland (see their '76 paper in Math of OR). As an illustration, consider the following problem:

$$\max \sum_{j=1}^n f_j(x_j)$$

subject to

$$Ax \le b, \quad x \ge 0.$$

Call this problem $P$. Here $A$ is an $m \times n$ matrix.

For each $j$ let $\bar f_j$ be the smallest concave function such that $\bar f_j(x) \ge f_j(x)$ for all $x$ (probably quasi-concave will do). Instead of solving problem $P$, solve problem $\bar P$ instead:

$$\max \sum_{j=1}^n \bar f_j(x_j)$$

subject to

$$Ax \le b, \quad x \ge 0.$$

The obvious question to be answered is how good an approximation the solution to $\bar P$ is to problem $P$. To answer it, let $e_j = \sup_x [\bar f_j(x) - f_j(x)]$ (where I leave you, the reader, to fill in the blanks about the appropriate domain). Each $e_j$ measures how close $\bar f_j$ is to $f_j$. Sort the $e_j$'s in decreasing order, $e_{[1]} \ge e_{[2]} \ge \cdots \ge e_{[n]}$. If $x^*$ is an optimal solution to $\bar P$, then following the idea in Starr's '69 paper we get:

$$\sum_{j=1}^n f_j(x^*_j) \ge \mathrm{opt}(P) - \sum_{j=1}^{m+1} e_{[j]}.$$
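As a numerical aside (my own illustration, not from the post): the graph of the smallest concave function above $f$ is the upper convex hull of the graph of $f$, so both the envelope $\bar f_j$ and the gap $e_j$ are easy to compute on a grid.

```python
import numpy as np

# Concave envelope of a non-concave f on a grid (the f is my own example).
x = np.linspace(0.0, 2.0, 401)
f = np.minimum(x ** 2, (x - 2) ** 2 + 1)   # two convex pieces, peak at x = 1.25

def concave_envelope(x, f):
    """Smallest concave function above f: the upper convex hull of its graph."""
    hull = [0]                              # indices of upper-hull vertices
    for i in range(1, len(x)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            cross = (x[i1] - x[i0]) * (f[i] - f[i0]) \
                  - (x[i] - x[i0]) * (f[i1] - f[i0])
            if cross >= 0:                  # hull[-1] lies on or below the chord
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], f[hull])

fbar = concave_envelope(x, f)
e = float(np.max(fbar - f))                 # the gap e_j for this f
print(bool(np.all(fbar >= f - 1e-12)))      # True: the envelope dominates f
print(round(e, 6))                          # 0.390625: f is genuinely non-concave
```

Sorting these gaps across the $j$'s is all a Starr-style bound needs.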

Question 1. We have two coins, a red one and a green one. When flipped, one lands heads with probability $p_1$ and the other with probability $p_2$. Assume that $p_1 > p_2$. We do not know which coin is the $p_1$ coin. We initially attach probability $\pi$ to the red coin being the $p_1$ coin. We receive one dollar for each heads, and our objective is to maximize the total expected discounted return with discount factor $\beta$. Find the optimal policy.

This is a dynamic programming problem where the state is the belief $\pi$ that the red coin is the $p_1$ coin. Every period we choose a coin to toss, get a reward, and update our state given the outcome. Before I give my solution let me explain why we can't immediately invoke uncle Gittins.

In the classical bandit problem there are $n$ arms, and each arm $i$ provides a reward from an unknown distribution $\theta_i$. Bandit problems are used to model tradeoffs between exploitation and exploration: every period we either exploit an arm about whose distribution we already have a good idea or explore another arm. The $\theta_i$ are randomized independently according to distributions $\mu_i$, and what we are interested in is the expected discounted reward. The optimization problem has a remarkable solution: choose in every period the arm with the largest Gittins index, then update your belief about that arm using Bayes' rule. The Gittins index is a function which attaches a number (the index) to every belief about an arm. What is important is that the index of an arm depends only on $\mu_i$ (our current belief about the distribution of that arm), not on our beliefs about the distributions of the other arms.

The independence assumption means that we only learn about the distribution of the arm we are using. This assumption is not satisfied in the red coin-green coin problem: if we toss the red coin and get heads, then the probability that the green coin is the $p_1$ coin decreases. Googling `multi-armed bandit' with `dependent arms' I got some papers which I haven't looked at carefully, but my superficial impression is that they would not help here.

Here is my solution. Call the problem I started with `the difficult problem' and consider a variant which I call `the easy problem'. Let $q$ be such that

$$\frac{q^2}{q^2 + (1-q)^2} = \pi,$$

so that, conditional on the two coins being different, the probability that the red coin is the $p_1$ coin is $\pi$. In the easy problem there are again two coins, but this time the red coin is $p_1$ with probability $q$ and $p_2$ with probability $1-q$ and, *independently*, the green coin is $p_2$ with probability $q$ and $p_1$ with probability $1-q$. The easy problem is easy because it is a bandit problem. We have to keep track of beliefs $\pi_R$ and $\pi_G$ about the red coin and the green coin ($\pi_R$ is the probability that the red coin is $p_1$), starting with $\pi_R = q$ and $\pi_G = 1-q$, and when we toss the red coin we update $\pi_R$ but keep $\pi_G$ fixed. It is easy to see that the Gittins index of an arm is a monotone function of the belief that the arm is the $p_1$ coin, so the optimal strategy is to play red when $\pi_R > \pi_G$ and green when $\pi_R < \pi_G$. In particular, the optimal action in the first period is red when $q > 1/2$ and green when $q < 1/2$.

Now here comes the trick. Consider a general strategy $\sigma$ that assigns to every finite sequence of past actions and outcomes an action (red or green). Denote by $V_D(\sigma)$ and $V_E(\sigma)$ the rewards that $\sigma$ gives in the difficult and easy problems respectively. I claim that

$$V_E(\sigma) = q(1-q)\,\frac{p_1 + p_2}{1-\beta} + \bigl(q^2 + (1-q)^2\bigr)\, V_D(\sigma).$$

Why is that? In the easy problem there is a probability $q(1-q)$ that both coins are $p_1$. If this happens then every $\sigma$ gives payoff $p_1/(1-\beta)$. There is a probability $q(1-q)$ that both coins are $p_2$. If this happens then every $\sigma$ gives payoff $p_2/(1-\beta)$. And there is a probability $q^2 + (1-q)^2$ that the coins are different and, because of the choice of $q$, conditional on this event the probability of the red coin being $p_1$ is $\pi$. Therefore, in this case $\sigma$ gives whatever $\sigma$ gives in the difficult problem.

So, the payoff in the easy problem is an increasing linear function of the payoff in the difficult problem. Therefore the optimal strategy in the difficult problem is the same as the optimal strategy in the easy problem. In particular, we just proved that, for every $\pi$, the optimal action in the first period is red when $\pi > 1/2$ and green when $\pi < 1/2$ (note that $q > 1/2$ exactly when $\pi > 1/2$). Going back to the dynamic programming formulation, it follows from standard arguments that the optimal strategy is to keep doing this forever, i.e., in every period to toss the coin that is more likely to be the $p_1$ coin given the current information.
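The conclusion can also be checked numerically. A value-iteration sketch of the difficult problem (the parameters $p_1 = 0.7$, $p_2 = 0.3$, $\beta = 0.9$ are assumptions of mine): discretize the belief, iterate the Bellman operator, and confirm that the optimal first action is red exactly when the belief exceeds $1/2$.

```python
import numpy as np

# Value-iteration check of the conclusion above (parameter values assumed).
p1, p2, beta = 0.7, 0.3, 0.9
grid = np.linspace(0.0, 1.0, 2001)  # belief pi = P(red coin is the p1 coin)
V = np.zeros_like(grid)

def q_values(V):
    """Q-values of tossing red/green at every belief on the grid."""
    # Toss red: P(heads) = pi*p1 + (1-pi)*p2, then Bayes-update pi.
    ph_r = grid * p1 + (1 - grid) * p2
    post_rh = grid * p1 / ph_r
    post_rt = grid * (1 - p1) / (1 - ph_r)
    q_red = ph_r * (1 + beta * np.interp(post_rh, grid, V)) + \
            (1 - ph_r) * beta * np.interp(post_rt, grid, V)
    # Toss green: if red is the p1 coin, green lands heads w.p. p2.
    ph_g = grid * p2 + (1 - grid) * p1
    post_gh = grid * p2 / ph_g
    post_gt = grid * (1 - p2) / (1 - ph_g)
    q_green = ph_g * (1 + beta * np.interp(post_gh, grid, V)) + \
              (1 - ph_g) * beta * np.interp(post_gt, grid, V)
    return q_red, q_green

for _ in range(300):
    V = np.maximum(*q_values(V))

q_red, q_green = q_values(V)
print(bool((q_red[grid > 0.51] > q_green[grid > 0.51]).all()))   # True
print(bool((q_green[grid < 0.49] > q_red[grid < 0.49]).all()))   # True
```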

See why I said my solution is tricky and specific? It relies on the fact that there are only two arms (the fact that the arms are coins is not important). Here is a problem whose solution I don't know:


Question 2. Let $0 < p_1 < p_2 < \cdots < p_n < 1$. We are given $n$ coins, one of each parameter, all assignments of parameters to coins equally likely. Each period we have to toss a coin, and we get payoff $1$ for Heads. What is the optimal strategy?