It is, of course, a great mistake to imagine that cheap books are good for the book trade. Actually it is just the other way around. If you have, for instance, five shillings to spend and the normal price of a book is half-a-crown, you are quite likely to spend your whole five shillings on two books. But if books are sixpence each you are not going to buy ten of them, because you don’t want as many as ten; your saturation-point will have been reached long before that. Probably you will buy three sixpenny books and spend the rest of your five shillings on seats at the ‘movies’. Hence the cheaper the books become, the less money is spent on books.

Milton Friedman, in his textbook Price Theory, asks readers to analyze this passage as an exercise. He does not explicitly say what he is looking for, but my guess is this: what can you say about preferences for such a statement to be true? It's a delightful question. A budget line is given, and a point that maximizes utility on the budget line is identified. Now the price of one of the goods falls, and another utility-maximizing point is identified. What kind of utility function would exhibit such behavior?

By the way, there are 12 pence to a shilling, and half a crown is two shillings and sixpence.
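Here is a minimal numerical sketch of one possible answer to Friedman's question, assuming a quasilinear utility with a satiation point in books; the marginal values below are my own illustrative numbers, not anything from Orwell or Friedman. The buyer keeps purchasing books as long as the marginal value of the next book covers its price and the budget allows, and spends whatever is left at the movies.

```python
# Toy demand for Orwell's book buyer (illustrative numbers only):
# quasilinear utility with sharply diminishing marginal value of books.
BUDGET = 5.0                                  # shillings
MARGINAL_VALUES = [3.0, 2.6, 1.0, 0.4, 0.2]   # assumed value (in shillings) of the k-th book

def books_bought(price):
    """Buy another book while its marginal value covers the price and budget remains."""
    spent, count = 0.0, 0
    for mv in MARGINAL_VALUES:
        if mv >= price and spent + price <= BUDGET:
            spent += price
            count += 1
    return count, spent

for price, label in [(2.5, "half-a-crown"), (0.5, "sixpence")]:
    n, spent = books_bought(price)
    print(f"{label:12s}: {n} books, {spent:.1f}s on books, {BUDGET - spent:.1f}s left for the movies")
# At half-a-crown the whole five shillings goes on two books; at sixpence only
# 1.5 shillings buys three books -- cheaper books, less money spent on books.
```

The feature doing the work is that the marginal value of a fourth book falls below sixpence: that is Orwell's saturation point.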

But in the importance and noise of to-morrow

When the brokers are roaring like beasts on the floor of the Bourse

Perhaps a minute to recall what Stan left behind.

Stan is well known for his important contributions to mechanism design, made in collaboration with Hurwicz and Mount. The best-known example is the notion of the size of the message space of a mechanism. Nisan and Segal pointed out the connection between this and the notion of communication complexity. Stan would have been delighted to learn about the connection between this and extension complexity.

Stan was in fact half a century ahead of the curve in his interest in the intersection of algorithms and economics. He was one of the first scholars to tackle the job shop problem. He proposed a simple index policy that was subsequently implemented and reported on in Business Week: “Computer Planning Unsnarls the Job Shop,” April 2, 1966, pp. 60-61.

In 1965, with G. Sherman, he proposed a local-search algorithm for the TSP (“Discrete optimizing”, SIAM Journal on Applied Mathematics 13, 864-889, 1965). Their algorithm produced tours at least as good as those reported in earlier papers. The ideas were extended, with Don Rice, to a local-search heuristic for non-concave mixed integer programs, along with a computational study of its performance.

Stan was also remarkable as a builder. At Purdue, he developed a lively school of economic theory, attracting the likes of Afriat, Kamien, Sonnenschein, Ledyard and Vernon Smith. He convinced them all to come by telling them Purdue was just like New York! Then he moved to Northwestern to build two groups, one in the Economics department and another (in collaboration with Mort Kamien) in the business school.

We compare the productivity of Fields medalists (winners of the top mathematics prize) to that of similarly brilliant contenders. The two groups have similar publication rates until the award year, after which the winners’ productivity declines. The medalists begin to `play the field,’ studying unfamiliar topics at the expense of writing papers.

The prize, Borjas and Doran suggest, like added wealth, allows the winners to consume more leisure in the sense of riskier projects. However, the behavior of the near winners is a puzzle. After 40, the greatest prize is beyond their grasp. One’s reputation has already been established. Why don’t they `play the field’ as well?

Consider a Bayesian agent whose belief $\mu$ about an infinite sequence of successes and failures assigns to every particular string $(x_1,\dots,x_n)$ of outcomes the probability

$\mu(x_1,\dots,x_n) = \dfrac{k!\,(n-k)!}{(n+1)!}$

for every $n$, where $k = x_1 + \dots + x_n$ is the number of successes in the string.

You have seen this belief already though maybe not in this form. It is the belief of an agent who tosses an i.i.d. coin and has some uncertainty over the parameter of the coin, given by a uniform distribution over $[0,1]$.
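A quick numerical check of this belief (a sketch; the helper functions and names are mine): under a uniform prior on the coin's bias, the probability of any particular string of $n$ outcomes with $k$ successes is $\int_0^1 \theta^k(1-\theta)^{n-k}\,\mathrm{d}\theta = k!(n-k)!/(n+1)!$, which depends only on the number of successes.

```python
from math import comb, factorial

def seq_prob(k, n):
    """Probability of one particular length-n string with k successes when the
    coin's bias is uniform on [0, 1]: the Beta integral k! (n-k)! / (n+1)!."""
    return factorial(k) * factorial(n - k) / factorial(n + 1)

def seq_prob_numeric(k, n, steps=100_000):
    """The same probability via a midpoint Riemann sum of p^k (1-p)^(n-k)."""
    h = 1.0 / steps
    return sum(((i + 0.5) * h) ** k * ((1 - (i + 0.5) * h) ** (n - k)) * h for i in range(steps))

n = 5
for k in range(n + 1):
    print(k, round(seq_prob(k, n), 6), round(seq_prob_numeric(k, n), 6))

# The probabilities of all 2^n strings sum to 1, as they should.
print(sum(comb(n, k) * seq_prob(k, n) for k in range(n + 1)))
```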

In this post I am gonna make a fuss about the fact that as time goes by the agent learns the parameter of the coin. The word `learning’ has several legitimate formalizations and today I am talking about the oldest and probably the most important one: consistency of posterior beliefs. My focus is somewhat different from that of textbooks because 1) As in the first paragraph, my starting point is the belief about outcome sequences, before there are any parameters, and 2) I emphasize some aspects of consistency which are unsatisfactory in the sense that they don’t really capture our intuition about learning. Of course this is all part of the grand marketing campaign for my paper with Nabil, which uses a different notion of learning, so this discussion of consistency is a bit of a sidetrack. But I have already come across some VIP who I suspect was unaware of the distinction between different formulations of learning, and it wasn’t easy to treat his cocky blabbering in a respectful way. So it’s better to start with the basics.

Let $A$ be a finite set of *outcomes*. Let $\mu$ be a belief over the set $A^\infty$ of infinite sequences of outcomes, also called *realizations*. A *decomposition* of $\mu$ is given by a set $\Theta$ of *parameters*, a belief $\lambda$ over $\Theta$, and, for every $\theta \in \Theta$, a belief $\mu_\theta$ over $A^\infty$, such that $\mu = \int \mu_\theta\,\lambda(\mathrm{d}\theta)$. The integral in the definition means that the agent can think about the process as a two-stage randomization: first a parameter $\theta$ is drawn according to $\lambda$, and then a realization is drawn according to $\mu_\theta$. Thus, a decomposition captures a certain way in which a Bayesian agent arranges his belief. Of course every belief admits many decompositions. The extreme decompositions are:

- *The Trivial Decomposition.* Take $\Theta = \{\bar\theta\}$ to be a single point and $\mu_{\bar\theta} = \mu$.
- *Dirac’s Decomposition.* Take $\Theta = A^\infty$ and $\lambda = \mu$, with $\mu_\theta = \delta_\theta$. A “parameter” in this case is a measure that assigns probability 1 to the realization $\theta$.
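To make the integral concrete for the coin example from the opening paragraph, here is a small Monte Carlo sketch of the two-stage reading of a decomposition (the target string and trial count are arbitrary choices of mine): draw the parameter from the uniform prior, then draw the outcomes i.i.d. given the parameter, and compare the frequency of one particular string with the closed-form probability.

```python
import random
random.seed(3)

TARGET = (1, 1, 0, 1, 0)   # an arbitrary length-5 string with k = 3 successes
TRIALS = 1_000_000

hits = 0
for _ in range(TRIALS):
    theta = random.random()                                      # stage 1: draw the parameter
    seq = tuple(int(random.random() < theta) for _ in TARGET)    # stage 2: i.i.d. tosses given theta
    hits += seq == TARGET

print("Monte Carlo estimate:", hits / TRIALS)
print("Closed form 3!2!/6! :", 1 / 60)   # about 0.0167, matching the opening formula
```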

Not all decompositions are equally exciting. We are looking for decompositions in which the parameter captures some `fundamental property’ of the process. The two extreme cases mentioned above are usually unsatisfactory in this sense. In Dirac’s decomposition, there are as many parameters as there are realizations; parameters simply copy realizations. In the trivial decomposition, there is a single parameter, which therefore cannot discriminate between different interesting properties. For stationary processes, there is a natural decomposition in which the parameters distinguish between fundamental properties of the process. This is the ergodic decomposition, according to which the parameters are the ergodic beliefs. Recall that in this decomposition, a parameter captures the empirical distribution of blocks of outcomes in the infinite realization.

So what about learning? While observing the process, a Bayesian agent updates his belief about the parameter. We denote by $\lambda_n = \lambda(\cdot \mid a_1,\dots,a_n)$ the posterior belief about the parameter at the beginning of period $n+1$, after observing the outcome sequence $a_1,\dots,a_n$. The notion of learning I want to talk about in this post is that this belief converges to a belief that is concentrated on the true parameter. The example you should have in mind is the coin-toss example I started with: while observing the outcomes of the coin, the agent becomes more and more certain about the true parameter of the coin, which means his posterior belief becomes concentrated around a belief that gives probability 1 to the true parameter.

Definition 1 A decomposition of $\mu$ is *consistent* if, for $\lambda$-almost every $\theta$, it holds that $\lambda_n \rightarrow \delta_\theta$ for $\mu_\theta$-almost every realization $a_1,a_2,\dots$.

In this definition, $\delta_\theta$ is the Dirac atomic measure on $\theta$, and the convergence is weak convergence of probability measures. No big deal if you don’t know what it means since it is exactly what you expect.

So, we have a notion of learning, and a seminal theorem of J. L. Doob (more on that below) implies that the ergodic decomposition of every stationary process is consistent. While this is not something that you will read in most textbooks (more on that below too), it is still well known. Why do Nabil and I dig further into the issue of learnability of the ergodic decomposition? Two reasons. First, one has to write papers. Second, there is something unsatisfactory about the concept of consistency as a formalization of learning. To see why, consider the belief that outcomes are i.i.d. with some fixed, known probability of success. This belief is ergodic, so from the perspective of the ergodic decomposition the agent `knows the process’ and there is nothing else to learn. But let’s look at Dirac’s decomposition instead of the ergodic decomposition. Then the parameter space equals the space of all realizations. Suppose the true parameter (= realization) is $\theta$; then after observing the first $n$ outcomes of the process, the agent’s posterior belief about the parameter is concentrated on all realizations that agree with $\theta$ on the first $n$ coordinates. These posterior beliefs converge to $\delta_\theta$, so Dirac’s decomposition is also consistent! We may say that we learn the parameter, but “learning the parameter” in this environment is just recording the past. The agent does not gain any new insight about the future of the process from learning the parameter.
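Here is a simulation sketch of consistency in the coin example (my own illustration, not code from the paper with Nabil; the true bias 0.7 and the 0.05 window are arbitrary choices): with a uniform prior, the posterior after $k$ successes in $n$ tosses is Beta$(k+1, n-k+1)$, and its mass near the true parameter climbs toward 1.

```python
import random
from math import lgamma, exp, log

random.seed(0)
TRUE_THETA = 0.7   # assumed true bias of the coin (illustrative)
WINDOW = 0.05      # we track the posterior mass of [TRUE_THETA - WINDOW, TRUE_THETA + WINDOW]

def beta_pdf(x, a, b):
    """Density of Beta(a, b), via log-gamma for numerical stability."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return exp(lgamma(a + b) - lgamma(a) - lgamma(b) + (a - 1) * log(x) + (b - 1) * log(1 - x))

def posterior_mass_near_truth(k, n, steps=2000):
    """Posterior after k successes in n tosses is Beta(k+1, n-k+1); integrate it over the window."""
    a, b = k + 1, n - k + 1
    lo, hi = TRUE_THETA - WINDOW, TRUE_THETA + WINDOW
    h = (hi - lo) / steps
    return sum(beta_pdf(lo + (i + 0.5) * h, a, b) * h for i in range(steps))

successes = 0
for n in range(1, 2001):
    successes += random.random() < TRUE_THETA
    if n in (10, 100, 1000, 2000):
        print(f"n = {n:4d}: posterior mass within {WINDOW} of the true parameter = "
              f"{posterior_mass_near_truth(successes, n):.3f}")
# The mass climbs toward 1: the posterior concentrates around the true parameter,
# which is the convergence required by Definition 1.
```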

In my next post I will talk about other notions of learning, originating in a seminal paper of Blackwell and Dubins, which capture the idea that an agent who learns a parameter can make predictions as if he knew the parameter. Let me also say here that this post and the following ones are much influenced by a paper of Jackson, Kalai, and Smorodinsky. I will say more about that paper in another post.

For the rest of this post I am going to make some comments about Bayesian consistency which, though again standard, I don’t usually see in textbooks. In particular, I don’t know of a reference for the version of Doob’s Theorem which I give below, so if any reader can point me to such a reference it will be helpful.

First, you may wonder whether every decomposition is consistent. The answer is no. For a trivial example, take a situation where the beliefs $\mu_\theta$ are the same for every $\theta$. More generally, trouble arises when the realization does not pin down the parameter. Formally, let us say that a function $f: A^\infty \rightarrow \Theta$ *pins down* or *identifies* the parameter if

$f(a_1,a_2,\dots) = \theta$

for $\mu_\theta$-almost every realization $a_1,a_2,\dots$. If such a function exists then the decomposition is *identifiable*.

We have the following

Theorem 2 (Doob’s Theorem) A decomposition is identifiable if and only if it is consistent.

The `if’ part follows immediately from the definitions. The `only if’ part is deep, but not difficult: it follows immediately from the martingale convergence theorem. Indeed, Doob’s Theorem is usually cited as the first application of martingale theory.

Statisticians rarely work with the abstract formulation of a decomposition that I use here. For this reason, the theorem is usually formulated only for the case that $\Theta = [0,1]$ and $\mu_\theta$ is i.i.d. with parameter $\theta$. In this case the fact that the decomposition is identifiable follows from the strong law of large numbers. Doob’s Theorem then implies the standard consistency of the Bayesian estimator of the parameter $\theta$.

The story is about a company called MegaRed that peddles fish oil. It wants to target consumers who are receptive to the idea of fish oil because they believe that it confers health benefits. The goal is to get them to try it out and perhaps switch to MegaRed.

Facebook proposes a campaign which raises the eyebrows of the marketing director, J. Rodrigo:

“I can go to television at a quarter the price.”

Brett Prescott of Facebook agrees that, yes, Facebook is more expensive than TV, but offers an analogy between advertising on Facebook and firing a shotgun.

“And you are firing that buckshot knowing where every splinter of that bullet is landing.”

If biology is the study of bios, life, and geology is the study of geos, the earth, what does that make analogy?

Some arithmetic to clarify matters. Suppose 1 in 100 of all people would be receptive to MegaRed’s message, and suppose each receptive person is worth $1 on average to MegaRed. If you reach all 100 people via TV, you reach the one receptive person among them, so MegaRed should pay no more than 1 cent per person reached, and so $1 in total.

Enter, stage left, Facebook. It claims that it can target its ads so that they go just to the right person. How much is that worth? $1. In this example, Facebook is no better or worse than TV.

If Facebook has any added value compared to TV, it does *not* come from better targeting, because one can always compensate for that by paying TV less and reaching more eyeballs. It must come from access to eyeballs unreachable via TV, from identifying receptive eyeballs that MegaRed would not have identified on its own, or from the medium itself being more persuasive than TV. Is any of this true for Facebook? If not, MegaRed is better off with TV.
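The arithmetic above in a few lines (the population size and per-person value are the illustrative numbers from the example): under the stated assumptions, Facebook's one perfectly targeted impression and TV's hundred blanket impressions are worth exactly the same dollar to MegaRed.

```python
population = 100           # people reached by a TV blanket buy (illustrative)
receptive = 1              # of whom one is receptive to the message
value_per_receptive = 1.0  # dollars of value per receptive person reached

# TV: blanket the whole population to reach the one receptive viewer.
tv_total = receptive * value_per_receptive
print(f"TV:       worth ${tv_total:.2f} total, at most ${tv_total / population:.2f} per person reached")

# Facebook: target exactly the receptive person.
fb_total = receptive * value_per_receptive
print(f"Facebook: worth ${fb_total:.2f} total, at most ${fb_total / receptive:.2f} per person reached")

# Both channels deliver $1 of value; better targeting changes the defensible price
# per impression, not the total MegaRed should be willing to pay.
```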

Educational activities enhance the quality of care in an institution, and it is intended, until the community undertakes to bear such education costs in some other way, that a part of the net cost of such activities (including stipends of trainees, as well as compensation of teachers and other costs) should be borne to an appropriate extent by the hospital insurance program.

House Report No. 213, 89th Congress, 1st Session, 32 (1965), and Senate Report No. 404, Pt. 1, 89th Congress, 1st Session, 36 (1965).

Each year about $9.5 billion in Medicare funds and another $2 billion in Medicaid dollars go towards residency programs. There is also state government support (multiplied by Federal matching funds). At 100K residents a year, this translates into a bit over $100K per resident. The actual amount each program receives per resident varies (we’ve seen figures in the range of $50K to $150K) because of the formula used to compute the subsidy. In 1997, Congress capped the amount that Medicare would provide, which results in about 30K medical school graduates competing for about 22.5K slots.
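A back-of-the-envelope check of the per-resident figure, using only the totals just cited (state support and program-level variation are left out):

```python
medicare = 9.5e9      # annual Medicare GME funding, from the text
medicaid = 2.0e9      # annual Medicaid GME funding, from the text
residents = 100_000   # residents per year, from the text

print(f"${(medicare + medicaid) / residents:,.0f} per resident per year")  # about $115,000
```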

Why should the costs of apprenticeship be borne by the government? Lawyers also undertake seven years of study before they apprentice. The cost of their apprenticeship is borne by the organization that hires them out of law school. What makes physicians different?

There are two arguments we are aware of. First, were one to rely on the market to supply physicians, it is possible that we might get too few (think of booms and busts) in some periods. Assuming sufficient risk aversion on the part of society, there will be an interest in ensuring a sufficient supply of physicians. Note that similar arguments are also used to justify farm subsidies. In other words, insurance against shortfalls. Interestingly, we know of no Lawyer with the `dershowitz’ to make such a claim. Perhaps Dick the Butcher (Henry VI, Part 2, Act 4) has cowed them.

The second is summarized in the following from Gbadebo and Reinhardt:

“Thus, it might be argued … that the complete self-financing of medical education with interest-bearing debt … would so commercialize the medical profession as to rob it of its traditional ethos to always put the interest of patients above its own. Indeed, it can be argued that even the current extent of partial financing of their education by medical students has so indebted them as to place the profession’s traditional ethos in peril.”

Note, the Scottish master said as much:

“We trust our health to the physician: our fortune and sometimes our life and reputation to the lawyer and attorney. Such confidence could not safely be reposed in people of a very mean or low condition. Their reward must be such, therefore, as may give them that rank in the society which so important a trust requires. The long time and the great expense which must be laid out in their education, when combined with this circumstance, necessarily enhance still further the price of their labour.”

Interestingly, he includes Lawyers.

If we turn the clock back to before WWII, hospitals paid for trainees (since internships were based in hospitals, not medical schools) and recovered the costs from patient charges. Interns were inexpensive and provided cheap labor. After WWII, the GI Bill provided subsidies for graduate medical education, residency slots increased, and institutions were able to pass along the costs to insurers. Medicare opened up the spigot and residencies became firmly ensconced in the system. Not only do they provide training, but they allow hospitals to perform a variety of other functions, such as care for the indigent, at lower cost than otherwise.

Ignoring the complications associated with the complementary activities that surround residency programs, who should pay for the residency? Three obvious candidates: insurers, hospitals and the doctors themselves. From Coase we know that in a world without frictions, it does not matter. With frictions, who knows?

Having Medicare pay makes residency slots an endowment to the institution. The slots assigned to a hospital will not reflect what’s best for the intern or the healthcare system. Indeed, a recent report from the Institute of Medicine summarizes some of these distortions. However, its response is to urge better rules governing the distribution of monies.

If hospitals themselves pay, it’s unclear what the effect might be. For example, as residents cost less than doctors, large hospitals may bulk up on residents and reduce their reliance on doctors. However, assuming no increase in the supply of residents, wages for residents will rise, and so on. If insurers pay, there might be overprovision of residents.

What about doctors? To practice, a doctor must have a license. The renewal fee on a medical license is, at the top end (California), around $450 a year. In Florida it is about half that amount. There are currently about 800K active physicians in the US. To recover $10 billion (the current cost of residency programs) one would have to raise the fee by about $12,500 a year. The average annual salary for the least remunerative specialties is around $150K; at the high end, about $400K. From these summary statistics, an extra fee of this size, roughly 3 to 8 percent of annual income, does not seem likely to break the bank or corrupt physicians, particularly if it is pegged as a percentage rather than a flat amount. The monies collected can be funneled to the program in which the physician completed his or her residency.
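The license-fee arithmetic, spelled out (the salary figures are the rough ones quoted above):

```python
residency_cost = 10e9        # annual cost of residency programs, from the text
active_physicians = 800_000  # active US physicians, from the text

fee = residency_cost / active_physicians
print(f"required extra fee: ${fee:,.0f} per physician per year")   # about $12,500

for label, salary in [("least remunerative specialties", 150_000), ("high end", 400_000)]:
    print(f"share of salary, {label}: {fee / salary:.1%}")
# Roughly 3 to 8 percent of annual income, which is why pegging the fee as a
# percentage of income rather than a flat amount matters.
```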

- Agent 1 believes that outcomes are i.i.d. with probability 1/2 of success.
- Agent 2 believes that outcomes are i.i.d. with probability $\theta$ of success. She does not know $\theta$; she believes that $\theta$ is one of two known values, and attaches probability 1/2 to each possibility.
- Agent 3 believes that outcomes follow a Markov process: every day’s outcome equals yesterday’s outcome with some fixed, known probability.
- Agent 4 believes that outcomes follow a Markov process: every day’s outcome equals yesterday’s outcome with probability $\theta$. She does not know $\theta$; her belief about $\theta$ is the uniform distribution over $[0,1]$.
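A small simulation sketch of the four beliefs. Where the text leaves a number unspecified, the values below (Agent 2's two candidate biases, Agent 3's persistence) are my own illustrative choices.

```python
import random

def agent1(n, p=0.5):
    """Agent 1: i.i.d. outcomes with a known success probability p."""
    return [int(random.random() < p) for _ in range(n)]

def agent2(n, candidates=(1/3, 2/3)):
    """Agent 2: nature first picks one of two candidate biases (equal odds,
    illustrative values), then the outcomes are i.i.d. with that bias."""
    p = random.choice(candidates)
    return [int(random.random() < p) for _ in range(n)]

def markov(n, persistence):
    """Symmetric two-state Markov chain: repeat yesterday's outcome with prob. `persistence`."""
    x = [random.randint(0, 1)]
    for _ in range(n - 1):
        x.append(x[-1] if random.random() < persistence else 1 - x[-1])
    return x

def agent3(n, persistence=0.8):
    """Agent 3: Markov chain with a known persistence (illustrative value)."""
    return markov(n, persistence)

def agent4(n):
    """Agent 4: Markov chain whose persistence is first drawn uniformly on [0, 1]."""
    return markov(n, random.random())

random.seed(1)
print(sum(agent2(100_000)) / 100_000)  # long-run frequency lands near 1/3 or 2/3, not near 1/2
```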

I denote by $\mu_1, \mu_2, \mu_3, \mu_4$ the agents’ beliefs about future outcomes.

We have an intuition that Agents 2 and 4 are in a different situation from Agents 1 and 3, in the sense that they are uncertain about some fundamental properties of the stochastic process they are facing. I will say that they have `structural uncertainty’. The purpose of this post is to formalize this intuition. More explicitly, I am looking for a property of a belief over infinite sequences of outcomes that will distinguish between beliefs that reflect some structural uncertainty and beliefs that don’t. This property is ergodicity.

Definition 1 Let $X_1, X_2, \dots$ be a stationary process with values in some finite set $A$ of *outcomes*. The process is *ergodic* if for every block $(a_1,\dots,a_k)$ of outcomes it holds that

$\lim_{n\rightarrow\infty}\ \frac{1}{n}\,\#\{1\le t\le n : X_t=a_1,\dots,X_{t+k-1}=a_k\} \;=\; \mathbb{P}(X_1=a_1,\dots,X_k=a_k)\quad\text{almost surely.}$

A belief is *ergodic* if it is the distribution of an ergodic process.

Before I explain the definition, let me write the ergodicity condition for the special case of the block $(a)$ for some outcome $a \in A$ (this is a block of size 1):

$\lim_{n\rightarrow\infty}\ \frac{1}{n}\,\#\{1\le t\le n : X_t=a\} \;=\; \mathbb{P}(X_1=a)\quad\text{almost surely.}\qquad (1)$

In the right side of (1) we have the (subjective) probability that on day 1 we will see the outcome $a$. Because of stationarity, this is also the probability that we will see the outcome $a$ on any other day. In the left side of (1) we have no probabilities at all. What is written there is the long-run frequency of appearances of the outcome $a$ in the realized sequence. This frequency is objective and has nothing to do with our beliefs. Therefore, the probability that a Bayesian agent with an ergodic belief attaches to observing some outcome is a number that can be measured from the process: just observe it long enough and check the frequency with which this outcome appears. In a way, for ergodic processes the frequentist and subjective interpretations of probability coincide, but there are legitimate caveats to this statement, which I am not gonna delve into because my subject matter is not the meaning of probability. For my purpose it’s enough that ergodicity captures the intuition we have about the four agents I started with: Agents 1 and 3 both give probability 1/2 to success on each day. This means that if they are sold a lottery ticket that pays a prize if there is a success on day, say, 172, they both price this lottery ticket the same way. However, Agent 1 is certain that in the long run the frequency of successes will be 1/2, while Agent 2 is certain that it will be one of the two values she entertains, without knowing which. In fancy words, $\mu_1$ is ergodic and $\mu_2$ is not.
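A self-contained simulation sketch of this contrast (Agent 2's two candidate biases are again illustrative values of my own): the long-run frequency under Agent 1's belief is pinned down, while under Agent 2's belief it depends on which bias nature drew, even though both beliefs assign probability 1/2 to success on any single day.

```python
import random
random.seed(2)

N = 200_000

def frequency(xs):
    return sum(xs) / len(xs)

# Agent 1's belief: i.i.d. fair coin. The long-run frequency of success is pinned down at 1/2.
agent1_run = [int(random.random() < 0.5) for _ in range(N)]
print("Agent 1 long-run frequency:", round(frequency(agent1_run), 3))

# Agent 2's belief (illustrative candidate biases 1/3 and 2/3): nature first draws the bias,
# then the outcomes are i.i.d. The single-day probability of success is still 1/2, but the
# long-run frequency depends on the drawn bias.
for run in range(4):
    p = random.choice([1/3, 2/3])
    draw = [int(random.random() < p) for _ in range(N)]
    print(f"Agent 2 run {run + 1} long-run frequency:", round(frequency(draw), 3))
# Agent 1's frequency matches the single-day probability (ergodic); Agent 2's is sometimes
# near 1/3 and sometimes near 2/3 (not ergodic).
```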

So, ergodic processes capture our intuition of `processes without structural uncertainty’. What about situations with uncertainty? What mathematical creature captures this uncertainty? Agent 2’s uncertainty seems to be captured by some probability distribution over two ergodic processes, namely the two i.i.d. processes corresponding to her two candidate values of $\theta$. Agent 2 is uncertain which of these processes she is facing. Agent 4’s uncertainty is captured by some probability distribution over a continuum of Markov (ergodic) processes. This is a general phenomenon:

Theorem 2 (The ergodic decomposition theorem) Let $\mathcal{E}$ be the set of ergodic distributions over $A^\infty$. Then for every stationary belief $\mu$ there exists a unique distribution $\lambda$ over $\mathcal{E}$ such that $\mu = \int_{\mathcal{E}} m\,\lambda(\mathrm{d}m)$.

The probability distribution $\lambda$ captures the uncertainty about the structure of the process. In the case that $\mu$ is an ergodic process, $\lambda$ is degenerate and there is no structural uncertainty.

Two words of caution: First, my definition of ergodic processes is not the one you will see in textbooks. The equivalence to the textbook definition is an immediate consequence of the so-called ergodic theorem, which is a generalization of the law of large numbers to ergodic processes. Second, my use of the word `uncertainty’ is not universally accepted. The term traces back at least to Frank Knight, who made the distinction between risk, or “measurable uncertainty”, and what is now called “Knightian uncertainty”, which cannot be measured. Since Knight wrote in English and not in Mathematish I don’t know what he meant, but modern decision theorists, mesmerized by the Ellsberg Paradox, usually interpret risk as a Bayesian situation and Knightian uncertainty, or “ambiguity”, as a situation which falls outside the Bayesian paradigm. So if I understand correctly, they will view the situations of the four agents mentioned above as situations of risk only, without uncertainty. The way in which I use “structural uncertainty” has been used in several theory papers. See this paper of Jonathan and Nabil, and this one, and the paper which I am advertising in these posts, about the disappearance of uncertainty over time. (I am sure there are more.)

To be continued…

Both Abraham and Sergiu will be 66 next year. To celebrate this rare occasion, the Center for the Study of Rationality at the Hebrew University of Jerusalem organizes two conferences, one in honor of each of them. The conference in honor of Abraham will be held on June 16–19, 2015, and the conference in honor of Sergiu will follow on June 21–24, 2015.

Mark the dates and reserve tickets.

First, some data: Roughly 50% of authors I know have some presence on RG, but most of them do not maintain their site. In fact, I suspect many of them don’t know they are on RG since a page for author X seems to be automatically created when his co-author Y uploads a paper. Nobody I know of is actively using RG as a way to collaborate with other users by posting questions and answers, which seems to be a big part of the purported RG experience. But there are quite a few who upload their working and published papers.

Some RG features that make it different from other social networks are designed especially for academic types. There is, for example, the RG score. Academics are obsessed with ranking each other. One of the more difficult requirements for graduating from a top econ program is to memorize the publication records of all economists in the world, who got offers where, how many JETs are worth one ECMA, and the historical record of these exchange rates. Well, students will have a much easier life if Bill Gates and his fellow RG investors have their way: you will only be tested on a single score for each researcher, his RG score, “a metric that measures scientific reputation based on how all of your research is received by your peers.” I should say though that, at least in games and decision theory, it will probably take some time until the age of the RG score arrives. The current score is, not to put too fine a point on it, totally useless. There is a more or less universally agreed-upon ranking of scholars which is based on CVs and the offers they get. There is also a correct ranking based on the originality and quality of research. These two rankings are typically very different. The RG score is similar to neither.

If the score is the most useless feature of RG, the most annoying feature is the aggressive way in which they try to force you to update your site. First, their minions search the web for every old version of your papers, and once they find one they will suggest that you add it to your profile. I say `suggest’ but it’s not like you can refuse. You can choose between `yes’ and `maybe later’. And by `later’ they mean next time you log in. In the end you either surrender or accidentally click yes. Even worse is when they nag you to mind other people’s profiles. Here, for example, is what I get when I go to Janos’ page.

And here is what I get when I go to Rakesh’s page.

Hey Ricky, just pick one, they are all nice :)

After I corresponded with the editors of *Games and Economic Behavior* and *Journal of Mathematical Economics* and with the Economics Editor of Elsevier, the reason for the privacy breach became clear: the e-system allows each editor to choose whether the blinded comments of one referee to the author, and the blinded comments of one referee to the editor, will be seen by other reviewers. For each type of blinded comments the editor can decide whether to show it to all reviewers or not. Each editor makes his or her own choice. I guess that often editors are not aware of this option, and do not know what choice the previous editor, or the one before him, made.

Apparently, the configuration of *Games and Economic Behavior* was to allow reviewers to see only the blinded comments to the author, while the configuration of *Journal of Mathematical Economics* was to allow reviewers to see both types of blinded comments. Once the source of the problem became clear, Atsushi Kajii, the editor of *Journal of Mathematical Economics*, decided to change the configuration, so that the blinded comments of reviewers to the editor will not be seen by other reviewers. I guess that in a few days this change will become effective. Elsevier also promised to notify all of its journals in which the configuration was like that of JME about this privacy issue, and let the editors decide whether they want to keep this configuration or change it. If the configuration remains, a warning will be added telling referees that their blinded comments can be read by other reviewers.

I am happy that the privacy breach came to a good end, and that in the future the e-system will protect the privacy of referees.

Regarding the second issue, Elsevier is not willing to change its user agreement. Reading the user agreements of other publishers, like Springer and INFORMS, shows that user agreements can be reasonable, and that not all publishers reserve the right to change the user agreement without notifying users. The Economics Editor of Elsevier wrote: “This clause is not unreasonable as the user can choose to discontinue the services at any time.” As I already wrote in the previous post, I choose to discontinue the service.
