When explaining the meaning of a confidence interval, don’t say “the probability that the parameter is in the interval is 0.95”, because probability is a precious concept and this statement does not match the meaning of that term. Instead, say “We are 95% confident that the parameter is in the interval”. Admittedly, I don’t know what people will make of the word “confident”. But I also don’t know what they will make of the word “probability”.
If you are under the impression that in order to publish an empirical paper you must include the sentence “this holds with p-value x” for some number x<0.05, here is a surprising bit of news for you: the editors of Basic and Applied Social Psychology have banned p-values from their journal, along with confidence intervals. In fact, according to the editorial, the state of the art of statistics “remains uncertain”, so statistical inference is no longer welcome in their journal.
When I came across this editorial I was dumbfounded by the arrogance of the editors, who seem to know as much about statistics as I know about social psychology. But I hadn’t heard of this journal until yesterday, and if I had, I am pretty sure I wouldn’t have believed anything they publish, p-value or no p-value. So I don’t have the right to complain here.
Here are somebodies who have the right to complain: the American Statistical Association. Concerned with the misuse, mistrust and misunderstanding of the p-value, the ASA has recently issued a policy statement on p-values and statistical significance, intended for researchers who are not statisticians.
How do you explain p-values to practitioners who don’t care about things like the Neyman-Pearson Lemma, independence and UMP tests? First, you use language that obscures conceptual difficulties: “the probability that a statistical summary of the data would be equal to or more extreme than its observed value”, without saying what “more extreme” means. Second, you use warnings and slogans about what a p-value doesn’t mean or can’t do, like “p-value does not measure the size of an effect or the importance of a result.”
Among these slogans my favorite is
P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone
What’s cute about this statement is that it assumes that everybody understands what “there is a 5% chance that the studied hypothesis is true” means, and that the notion of a p-value is the one that is difficult to understand. In fact, the opposite is true.
Probability is conceptually tricky. Its meaning is somewhat clear in a situation of a repeated experiment: I more or less understand what it means that a coin has a 50% chance of landing on Heads. (Yes. Only more or less.) But without going full subjective I have no idea what is meant by the probability that a given hypothesis (boys who eat pickles in kindergarten have higher SAT scores than girls who play firefighters) is true. On the other hand, the meaning of the corresponding p-value relies only on the conceptually simpler notion of probabilities in a repeated experiment.
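To make the repeated-experiment reading concrete, here is a minimal sketch. It assumes a fair-coin null hypothesis and takes “more extreme” to mean “at least as many Heads as observed”; both choices, and the function names, are illustrative assumptions rather than anything in the ASA statement. The p-value is computed once exactly and once by literally repeating the experiment many times.

```python
import random
from math import comb


def exact_p_value(n, k, p0=0.5):
    """P(at least k heads in n flips) when each flip lands heads with probability p0."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def simulated_p_value(n, k, p0=0.5, repetitions=100_000, seed=0):
    """Repeat the n-flip experiment under the null hypothesis and report how often
    the number of heads is at least as large as the observed count k."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        heads = sum(rng.random() < p0 for _ in range(n))
        if heads >= k:
            hits += 1
    return hits / repetitions


if __name__ == "__main__":
    print(exact_p_value(10, 8))      # 56/1024
    print(simulated_p_value(10, 8))  # close to the exact value
```

Nothing in this computation refers to the probability that the hypothesis itself is true; only frequencies over repetitions of the experiment are involved.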
Why, then, do the committee members (rightly!) assume that people are comfortable with the difficult concept of the probability that a hypothesis is true and are uncomfortable with the easy concept of a p-value? I think the reason is that unlike the word “p-value”, the word “probability” is a word that we use in everyday life, so most people feel they know what it means. Since they have never thought about it formally, they are not aware that they actually don’t.
So here is a modest proposal for preventing the misuse and misunderstanding of statistical inference: Instead of saying “this hypothesis holds with p-value 0.03”, say “We are 97% confident that this hypothesis holds”. We all know what “confident” means, right?
Platooning, driverless cars and ride hailing services have all been suggested as ways to reduce congestion. In this post I want to examine the use of coordination via ride hailing services as a way to reduce congestion. Assume that large numbers of riders decide to rely on ride hailing services. Because the services use Google Maps or Waze for route selection, it would be possible to coordinate their choices to reduce congestion.
To think through the implications of this, it’s useful to revisit an example of Arthur Pigou. There is a measure 1 of travelers, all of whom wish to leave the same origin ($s$) for the same destination ($t$). There are two possible paths from $s$ to $t$. The ‘top’ one has a travel time of 1 unit independent of the measure of travelers who use it. The ‘bottom’ one has a travel time that grows linearly with the measure of travelers who employ it. Thus, if fraction $x$ of travelers take the bottom path, each incurs a travel time of $x$ units.
A central planner, say, Uber, interested in minimizing total travel time will route half of all travelers through the top and the remainder through the bottom. Total travel time will be $\frac{1}{2} \cdot 1 + \frac{1}{2} \cdot \frac{1}{2} = \frac{3}{4}$. The only Nash equilibrium of the path selection game is for all travelers to choose the bottom path, yielding a total travel time of $1$. Thus, if the only choice is to delegate my route selection to Uber or make it myself, there is no equilibrium where all travelers delegate to Uber: a traveler assigned to the top path would rather opt out and take the bottom path, where travel time is only $\frac{1}{2}$.
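The planner’s solution and the equilibrium of Pigou’s example can be checked with a few lines of code. This is a minimal sketch: the function names and the grid search are illustrative assumptions, and the equilibrium test treats each traveler as negligible, so a single deviation does not change the traffic on either path.

```python
def total_travel_time(x):
    """Total travel time when fraction x takes the bottom path (time x each)
    and fraction 1 - x takes the top path (time 1 each)."""
    return (1 - x) * 1 + x * x


def planner_optimum(grid=1000):
    """Bottom-path fraction that minimises total travel time on a uniform grid."""
    return min((i / grid for i in range(grid + 1)), key=total_travel_time)


def is_nash(x, tol=1e-12):
    """x is an equilibrium if no traveler on a path in use gains by switching."""
    top_time, bottom_time = 1.0, x
    top_users_ok = (1 - x) <= tol or top_time <= bottom_time + tol
    bottom_users_ok = x <= tol or bottom_time <= top_time + tol
    return top_users_ok and bottom_users_ok


if __name__ == "__main__":
    x_star = planner_optimum()
    print(x_star, total_travel_time(x_star))  # planner's split and its total time
    print(is_nash(x_star), is_nash(1.0))      # is each split an equilibrium?
```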
Now suppose there are two competing ride hailing services. Assume fraction $\alpha$ of travelers are signed up with Uber and fraction $1-\alpha$ are signed up with Lyft. To avoid annoying corner cases, $\alpha \in [1/3, 2/3]$. Each firm routes its users so as to minimize the total travel time that its users incur. Uber will choose fraction $u$ of its subscribers to use the top path and the remaining fraction $1-u$ will use the bottom path. Lyft will choose a fraction $\ell$ of its subscribers to use the top path and the remaining fraction $1-\ell$ will use the bottom path.
A straightforward calculation reveals that the only Nash equilibrium of the Uber vs. Lyft game is $u = 1 - \frac{1}{3\alpha}$ and $\ell = 1 - \frac{1}{3(1-\alpha)}$. An interesting case is when $\alpha = 2/3$, i.e., Uber has a dominant market share. In this case $\ell = 0$, i.e., Lyft sends none of its users through the top path. Uber, on the other hand, will send half its users via the top and the remainder by the bottom path. Assuming Uber randomly assigns its users to top and bottom with equal probability, the average travel time for an Uber user will be
$$\frac{1}{2} \cdot 1 + \frac{1}{2} \cdot \frac{2}{3} = \frac{5}{6}.$$
The travel time for a Lyft user will be
$$\alpha(1-u) + (1-\alpha)(1-\ell) = \frac{1}{3} + \frac{1}{3} = \frac{2}{3}.$$
Total travel time will be $\frac{2}{3} \cdot \frac{5}{6} + \frac{1}{3} \cdot \frac{2}{3} = \frac{7}{9}$, less than in the Nash equilibrium outcome. However, Lyft would offer travelers a lower travel time than Uber. This is because Uber, which has the bulk of travelers, must use the top path to reduce total travel times. If this were the case, travelers would switch from Uber to Lyft. This conclusion ignores prices, which at present are not part of the model.
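The equilibrium of the two-service game can be found numerically by iterating best responses. The sketch below makes its own assumptions: `alpha` is Uber’s market share, `y_u` and `y_l` are the masses that Uber and Lyft send along the bottom path, and all names are hypothetical. A service with bottom mass $y$ facing rival bottom mass $z$ incurs total time $(\text{share} - y) + y(y+z)$, which is convex in $y$ and minimized at $y = (1-z)/2$, clamped to what the service actually controls.

```python
def clamp(value, low, high):
    return max(low, min(high, value))


def duopoly_equilibrium(alpha, rounds=200):
    """Best-response iteration; returns the bottom-path masses (y_u, y_l)."""
    y_u, y_l = alpha, 1 - alpha  # start with everyone on the bottom path
    for _ in range(rounds):
        y_u = clamp((1 - y_l) / 2, 0.0, alpha)
        y_l = clamp((1 - y_u) / 2, 0.0, 1 - alpha)
    return y_u, y_l


def summarize(alpha):
    """Equilibrium routing fractions and travel times for market share alpha."""
    y_u, y_l = duopoly_equilibrium(alpha)
    bottom_time = y_u + y_l                  # bottom travel time equals its traffic
    uber_top = 1 - y_u / alpha               # fraction of Uber users sent to the top
    lyft_top = 1 - y_l / (1 - alpha)         # fraction of Lyft users sent to the top
    uber_avg = uber_top * 1 + (1 - uber_top) * bottom_time
    lyft_avg = lyft_top * 1 + (1 - lyft_top) * bottom_time
    total = alpha * uber_avg + (1 - alpha) * lyft_avg
    return {"uber_top": uber_top, "lyft_top": lyft_top,
            "uber_avg": uber_avg, "lyft_avg": lyft_avg, "total": total}


if __name__ == "__main__":
    print(summarize(2 / 3))  # Uber dominant
    print(summarize(1 / 2))  # even split
```

Iterating best responses converges here because each service reacts to a change in the other’s bottom traffic by only half as much.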
Suppose we include prices and assume that travelers now evaluate a ride hailing service based on delivered price, that is, price plus travel time. Thus, we are assuming that all travelers value time at one dollar per unit of time. The volume of customers served by Uber and Lyft is no longer fixed and they will focus on minimizing average travel time per customer. A plausible guess is that there will be an equal price equilibrium where travelers divide evenly between the two services, i.e., each serves half of the market. Each service will route $\frac{1}{3}$ of its customers through the top and the remainder through the bottom. Average travel time per customer will be $\frac{1}{3} \cdot 1 + \frac{2}{3} \cdot \frac{2}{3} = \frac{7}{9}$. However, travel time on the bottom will be $\frac{2}{3}$, giving every customer an incentive to opt out and drive their own car on the bottom path.
What this simple-minded analysis highlights is that the benefits of coordination may be hard to achieve if travelers can opt out and drive themselves. To minimize congestion, the ride hailing services must limit traffic on the bottom path. This is the one that is congestible. However, doing so makes it attractive in terms of travel time, encouraging travelers to opt out.
I am not the right person to write about Lloyd Shapley. I think I only saw him once, at the first Stony Brook conference I attended. He reminded me of Doc Brown from Back to the Future, but I am not really sure why. Here are links to posts in The Economist and NYT following his death.
Shapley got the Nobel in 2012 and, according to Robert Aumann, deserved to get it along with Nash. Shapley himself, however, was not completely on board: “I consider myself a mathematician and the award is for economics. I never, never in my life took a course in economics.” If you are wondering what he means by “a mathematician”, read the following quote, from the last paragraph of his stable matching paper with David Gale:
The argument is carried out not in mathematical symbols but in ordinary English; there are no obscure or technical terms. Knowledge of calculus is not presupposed. In fact, one hardly needs to know how to count. Yet any mathematician will immediately recognize the argument as mathematical…
What, then, to raise the old question once more, is mathematics? The answer, it appears, is that any argument which is carried out with sufficient precision is mathematical
In the paper Gale and Shapley considered a problem of matching (or assignment as they called it) of applicants to colleges, where each applicant has his own preference over colleges and each college has its preference over applicants. Moreover, each college has a quota. Here is the definition of stability, taken from the original paper
Definition: An assignment of applicants to colleges will be called unstable if there are two applicants $\alpha$ and $\beta$ who are assigned to colleges $A$ and $B$, respectively, although $\beta$ prefers $A$ to $B$ and $A$ prefers $\beta$ to $\alpha$.
According to the Gale-Shapley algorithm, applicants apply to colleges sequentially following their preferences. A college with quota $q$ maintains a ‘waiting list’ of size $q$ with the top $q$ applicants that have applied to it so far, and rejects all other applicants. When an applicant is rejected from a college he applies to his next favorite college. Gale and Shapley proved that the algorithm terminates with a stable assignment.
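The procedure just described can be sketched in a few lines. This is an illustrative implementation rather than the paper’s own presentation: preferences are assumed to be given as ordered lists (most preferred first), an applicant missing from a college’s list is treated as unacceptable to it, and all function and variable names are hypothetical.

```python
def gale_shapley(applicant_prefs, college_prefs, quotas):
    """Applicant-proposing deferred acceptance with college quotas."""
    # rank[c][a] = position of applicant a on college c's list (lower is better)
    rank = {c: {a: i for i, a in enumerate(prefs)} for c, prefs in college_prefs.items()}
    next_choice = {a: 0 for a in applicant_prefs}  # next college each applicant will try
    waiting = {c: [] for c in college_prefs}       # current waiting lists
    free = list(applicant_prefs)                   # applicants not on any waiting list
    while free:
        a = free.pop()
        if next_choice[a] >= len(applicant_prefs[a]):
            continue                               # list exhausted: a stays unassigned
        c = applicant_prefs[a][next_choice[a]]
        next_choice[a] += 1
        if a not in rank[c]:
            free.append(a)                         # c rejects a outright
            continue
        waiting[c].append(a)
        waiting[c].sort(key=rank[c].__getitem__)
        if len(waiting[c]) > quotas[c]:
            free.append(waiting[c].pop())          # reject the lowest-ranked applicant
    return waiting


def is_unstable(assignment, applicant_prefs, college_prefs):
    """True if some assigned applicant b prefers a college A to his own,
    and A prefers b to one of the applicants assigned to it."""
    college_of = {a: c for c, admitted in assignment.items() for a in admitted}
    for b, own in college_of.items():
        for A in applicant_prefs[b]:
            if A == own:
                break                              # colleges further down are not preferred
            ranking = college_prefs[A]
            if b in ranking and any(ranking.index(b) < ranking.index(a)
                                    for a in assignment[A]):
                return True
    return False


if __name__ == "__main__":
    applicants = {"a1": ["X", "Y"], "a2": ["X", "Y"], "a3": ["X", "Y"]}
    colleges = {"X": ["a2", "a1", "a3"], "Y": ["a1", "a3", "a2"]}
    result = gale_shapley(applicants, colleges, {"X": 1, "Y": 2})
    print(result, is_unstable(result, applicants, colleges))
```

The checker mirrors the definition above and only looks at applicants who end up assigned somewhere; handling unassigned applicants or unfilled seats would need an extra case.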
One reason that the paper was so successful is that the Gale-Shapley method is actually used in practice. (A famous example is the National Resident Matching Program that assigns budding physicians to hospitals.) From a theoretical perspective my favorite follow-up is a paper of Dubins and Freedman, “Machiavelli and the Gale-Shapley Algorithm” (1981): Suppose that some applicant, Machiavelli, decides to ‘cheat’ and apply to colleges in a different order than his true ranking. Can Machiavelli improve his position in the assignment produced by the algorithm? Dubins and Freedman prove that the answer to this question is no.
Shapley’s contribution to game theory is too vast to mention in a single post. Since I mainly want to say something about his mathematics, let me mention the Shapley-Folkman-Starr Lemma, a kind of discrete analogue of Lyapunov’s theorem on the range of non-atomic vector measures, and the KKMS Lemma, whose meaning I still don’t understand, but which has something to do with fixed points and which Yaron and I used in our paper about rental harmony.
I am going to talk in more detail about stochastic games, introduced by Shapley in 1953, since this area has been flourishing recently with some really big developments. A (two-player, zero-sum) stochastic game is given by a finite set $S$ of states, finite sets $A$ and $B$ of actions for the players, a period payoff function $r : S \times A \times B \to \mathbb{R}$, a distribution $q(\cdot \mid s,a,b)$ over $S$ for every state $s$ and actions $a,b$, and a discount factor $0 < \beta < 1$. At every period the system is at some state $s$, and players choose actions $a,b$ simultaneously and independently. Then the column player pays $r(s,a,b)$ to the row player. The game then moves to a new state in the next period, randomized according to $q(\cdot \mid s,a,b)$. Players evaluate their infinite stream of payoffs via the discount factor $\beta$. The model is a generalization of the single player dynamic programming model which was studied by Blackwell and Bellman. Shapley proved that every zero-sum stochastic game admits a value, by imitating the familiar single player argument, which has been the joy and pride of macroeconomists ever since the Lucas asset pricing model (think Bellman equation and contraction operators). Fink later proved, using similar ideas, that non-zero-sum discounted stochastic games admit Markov perfect equilibria.
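The contraction argument suggests a simple way to compute the discounted value: start from any guess and repeatedly replace the value at each state by the value of a one-shot matrix game whose entries are today’s payoff plus the discounted expected continuation value. The sketch below is only an illustration under loud assumptions: each player has exactly two actions (so that the matrix game has a closed-form value), payoffs are not normalized by one minus the discount factor, and the names `r`, `q`, `beta` and the toy game are mine.

```python
def value_2x2(m):
    """Value of the zero-sum matrix game m (row player maximises), 2x2 only."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))  # what the row player can guarantee in pure strategies
    upper = min(max(a, c), max(b, d))  # what the column player can hold him to
    if lower >= upper:                 # saddle point in pure strategies
        return lower
    return (a * d - b * c) / (a + d - b - c)  # value under optimal mixing


def shapley_iteration(r, q, beta, rounds=500):
    """Iterate v(s) <- val[ r(s,a,b) + beta * sum_t q(t|s,a,b) v(t) ] from v = 0.

    r[s][a][b] is the period payoff, q[s][a][b][t] the probability of moving to t.
    """
    n = len(r)
    v = [0.0] * n
    for _ in range(rounds):
        v = [
            value_2x2([
                [r[s][a][b] + beta * sum(q[s][a][b][t] * v[t] for t in range(n))
                 for b in range(2)]
                for a in range(2)
            ])
            for s in range(n)
        ]
    return v


if __name__ == "__main__":
    # State 0: matching pennies, after which play moves to state 1 for sure.
    # State 1: absorbing, the row player receives 1 in every period.
    r = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
    q = [[[[0, 1], [0, 1]], [[0, 1], [0, 1]]],
         [[[0, 1], [0, 1]], [[0, 1], [0, 1]]]]
    print(shapley_iteration(r, q, beta=0.9))  # approximately [9.5, 10.0]
```

In the toy game the absorbing state is worth $1/(1-0.9) = 10$, and state 0 is worth the matching-pennies value $1/2$ plus $0.9 \times 10$.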
A major question, following a similar question in the single player setup, is the limit behavior of the value and the optimal strategies when players become more patient (i.e., the discount factor goes to 1). Mertens and Neyman have proved that the limit exists, and moreover that for every $\epsilon > 0$ there are strategies which are $\epsilon$-optimal for every sufficiently large discount factor. Whether a similar result holds for Nash equilibrium in $n$-player stochastic games is probably the most important open question in game theory. Another important question is whether the limit of the value exists for zero-sum games in which the state is not observed by both players. Bruno Ziliotto has recently answered this question in the negative by providing a counterexample. I should probably warn you that you need to know how to count, and also some calculus, to follow this literature. Bruno Ziliotto will give the Shapley Lecture at Games 2016 in Maastricht. Congrats, Bruno! And thanks to Shapley for leaving us with so much stuff to play with!
One of the highlights of last year’s Stony Brook conference was John Milnor’s talk about John Nash. The video is now available online. John Milnor, also an Abel Prize laureate, is familiar to many economists from his book “Topology from the Differentiable Viewpoint”. He has known John Nash from the beginning of their careers in the common room in the math school at Princeton University. Watch him talk with clarity and humor about young Nash’s ambitions, creativity, courage, and contributions.
Here is the handwritten letter which Nash wrote to the NSA in 1955 (pdf), fifteen years before Cook formalized the P/NP problem. In the letter Nash conjectures that for most encryption mechanisms, recovering the key from the cipher requires an exponential amount of time. And here is what Nash had to say about proving this conjecture:
Credit for the game that bears his name is due to Borel. It appears in a 1921 paper in French. An English translation (by Leonard Savage) may be found in a 1953 Econometrica.
The first appearance in print of a version of the game with Colonel Blotto’s name attached is, I believe, in The Weekend Puzzle Book by Caliban (June 1924). Caliban was the pen name of Hubert Phillips, one-time head of Economics at the University of Bristol and a puzzle contributor to The New Statesman.
Blotto itself is a slang word for inebriation. It does not, apparently, derive from the word ‘blot’, meaning to absorb liquid. One account credits a French manufacturer of delivery tricycles (Blotto Freres, see the picture) that were infamous for their instability. This inspired Laurel and Hardy to title one of their movies Blotto. In it they get blotto on cold tea, thinking it whiskey.
Colleagues outside of Economics often marvel at the coordinated nature of the Economics job market. The job market is so efficient, that the profession no longer wastes resources by having everyone read each candidate’s job market paper. That task is assigned to one person (Tyler Cowen) who reports back to the rest of us. In case you missed the report, here it is
Economics is not alone in having a coordinated job market. Philosophy has one, but it has begun to show signs of unraveling. The ability to interview via Skype, for example, has reduced the value, in the eyes of many, of a preliminary interview at their annual meeting. In response, the American Philosophical Association posted the following statement regarding the job market calendar:
For tenure-track/continuing positions advertised in the second half of the calendar year, we recommend an application deadline of November 1 or later. It is further recommended that positions be advertised at least 30 days prior to the application deadline to ensure that candidates have ample time to apply.
In normal circumstances a prospective employee should have at least two weeks for consideration of a written offer from the hiring institution, and responses to offers of a position whose duties begin in the succeeding fall should not be required before February 1.
When advertising in PhilJobs: Jobs for Philosophers, advertisers will be asked to confirm that the hiring institution will follow the above guidelines. If an advertiser does not do so, the advertisement will include a notice to that effect.
It’s natural to wonder if the Economics market is not far behind. Skype interviews are already taking place. The current setup requires a department to evaluate and select candidates for preliminary interviews within a month (roughly the middle of November to mid-December), which is hardly conducive to mature reflection (and argument).
I don’t often go to empirical talks, but when I do, I fall asleep. Recently, while so engaged, I dreamt of the ‘replicability crisis’ in Economics (see Chang and Li (2015)). The penultimate line of their abstract is the following bleak assessment:
“Because we are able to replicate less than half of the papers in our sample even with help from the authors, we assert that economics research is usually not replicable.”
Eager to help my empirical colleagues snatch victory from the jaws of defeat, I did what all theorists do. Build a model. Here it is.
The journal editor is the principal and the agent is an author. The agent has a paper characterized by two numbers $(v, p)$. The first is the value of the findings in the paper assuming they are replicable. The second is the probability that the findings are indeed replicable. The expected benefit of the paper is $pv$. Assume that $v$ is common knowledge but $p$ is the private information of the agent. The probability that the agent is of type $(v, p)$ is $\pi(v, p)$.
Given a paper, the principal can at a cost $K$ inspect the paper. With probability $p$ the inspection process will replicate the findings of the paper. The principal proposes an incentive compatible direct mechanism. The agent reports their type, $(v, p)$. Let $a(v, p)$ denote the interim probability that the agent’s paper is provisionally accepted. Let $c(v, p)$ be the interim probability of the agent’s paper not being inspected given it has been provisionally accepted. If a provisionally accepted paper is not inspected, it is published. If a paper subject to inspection is successfully replicated, the paper is published. Otherwise it is rejected and, per custom, the outcome is kept private. The agent cares only about the paper being accepted. Hence, the agent cares only about
$$a(v,p)\,c(v,p) + a(v,p)\,(1 - c(v,p))\,p.$$
The principal cares about replicability of papers and suffers a penalty of $R$ for publishing a paper that is not replicable. The principal also cares about the cost of inspection. Therefore she maximizes
$$\sum_{v,p} \pi(v,p)\, a(v,p) \left[ pv - c(v,p)(1-p)R - (1 - c(v,p))K \right].$$
The incentive compatibility constraint is
$$a(v,p)\left[c(v,p) + (1 - c(v,p))\,p\right] \geq a(v,p')\left[c(v,p') + (1 - c(v,p'))\,p\right] \quad \text{for all } p, p'.$$
Recall, an agent cannot lie about the value component of the type.
We cannot screen on $p$, so all that matters is the distribution of $p$ conditional on $v$. Let $p_v = E[p \mid v]$. For a given $v$ there are only 3 possibilities: accept always, reject always, inspect and accept. The first possibility has an expected payoff of
$$p_v v - (1 - p_v)R$$
for the principal. The second possibility has value zero. The third has value $p_v v - K$.
The principal prefers to accept immediately over inspection if
$$p_v v - (1 - p_v)R \geq p_v v - K \iff p_v \geq \frac{R - K}{R}.$$
The principal will prefer inspection to rejection if $p_v v \geq K$. The principal prefers to accept rather than reject if $p_v \geq \frac{R}{v + R}$.
Under a suitable condition on $p_v$ as a function of $v$, the optimal mechanism can be characterized by two cutoffs $\tau_1 < \tau_2$. Choose $\tau_2$ to be the smallest $v$ such that
$$p_v \geq \max\left\{ \frac{R}{v + R}, \frac{R - K}{R} \right\}.$$
Choose $\tau_1$ to be the largest $v$ such that $p_v v \leq K$.
A paper with $v \geq \tau_2$ will be accepted without inspection. A paper with $v \leq \tau_1$ will be rejected. A paper with $v \in (\tau_1, \tau_2)$ will be provisionally accepted and then inspected.
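The three-way choice facing the editor can be illustrated with a small sketch. The payoffs used here are my reading of the verbal description, not formulas taken from the post: with `v` the value of a replicable finding, `p_v` the expected probability of replication given `v`, `K` the inspection cost and `R` the penalty for publishing a non-replicable paper (all four names are assumptions), accepting outright is worth `p_v * v - (1 - p_v) * R`, inspecting first is worth `p_v * v - K`, and rejecting is worth zero.

```python
def editor_decision(v, p_v, K, R):
    """Return the editor's best option for a paper of value v, with the payoff of each option."""
    payoffs = {
        "accept without inspection": p_v * v - (1 - p_v) * R,
        "inspect, publish if replicated": p_v * v - K,
        "reject": 0.0,
    }
    best = max(payoffs, key=payoffs.get)
    return best, payoffs


if __name__ == "__main__":
    K, R = 1.0, 4.0  # hypothetical inspection cost and penalty
    for v, p_v in [(10, 0.9), (10, 0.5), (1, 0.5)]:
        print(v, p_v, editor_decision(v, p_v, K, R)[0])
```

With these made-up numbers the high-value, fairly reliable paper is published unchecked, the high-value coin-flip paper is inspected first, and the low-value paper is rejected.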
For empiricists the advice would be to shoot for high value and damn the replicability!
More seriously, the model points out that even a journal that cares about replicability and bears the cost of verifying this will publish papers that have a low probability of being replicable. Hence, the presence of published papers that are not replicable is not, by itself, a sign of something rotten in Denmark.
One could improve outcomes by making authors bear the costs of a paper not being replicated. This points to a larger question. Replication is costly. How should the cost of replication be apportioned? In my model, the journal bore the entire cost. One could pass it on to the authors but this may have the effect of discouraging empirical research. One could rely on third parties (voluntary, like civic associations, or professionals supported by subscription). Or, one could rely on competing partisan groups pursuing their agendas to keep the claims of each side in check. The last seems at odds with the romantic ideal of disinterested scientists but could be efficient. The risk is partisan capture of journals which would shut down cross-checking.
When analyzing a mechanism it is convenient to assume that it is direct. The revelation principle allows one to argue that this restriction is without loss of generality. Yet, there are cases where one prefers to implement the indirect version of a mechanism rather than its direct counterpart. The clock version of the English ascending auction and the sealed bid second price auction are the best-known example (one hopes not the only one). There are few (i.e., I could not immediately recall any) theorems that uniquely characterize a particular indirect mechanism. It would be nice to have more. What might such a characterization depend upon?
1) Direct mechanisms require that agents report their types. A concern for privacy could be used to ‘kill’ off a direct mechanism. However, one would first have to rule out the use of trusted third parties (either human or computers implementing cryptographic protocols).
2) Indirect mechanisms can sometimes be thought of as extensive form games, and one might look for refinements of solution concepts for extensive form games that have no counterpart in the direct version of the mechanism. The notion of obvious strategy-proofness that appears here is an example. However, indirect mechanisms may introduce equilibria, absent in the direct counterpart, that are compelling for the agents but unattractive for the designer’s purposes.
3) One feature of observed indirect mechanisms is that they use simple message spaces, but compensate by using multiple rounds of communication. Thus a constraint on message spaces would be needed in a characterization but coupled with a constraint on the rounds of communication.
From Kris Shaw, a TA for my ECON 101 class, I learnt that the band Van Halen once required that brown M&M’s not darken their dressing room door. Why? Maybe it was a lark. Perhaps a member of the band (or two) could not resist chuckling over the idea of a minor factotum appointed to the task of sorting the M&M’s. When the minor factotum is asked what they did that day, the response is bound to elicit guffaws. However, the minor factotum might have made it a point not to wash their hands before sorting the M&M’s. Then, who would be laughing harder?
A copy of the M&M rider can be found here, along with Van Halen’s explanation of why the rider was included:
……the group has said the M&M provision was included to make sure that promoters had actually read its lengthy rider. If brown M&M’s were in the backstage candy bowl, Van Halen surmised that more important aspects of a performance–lighting, staging, security, ticketing–may have been botched by an inattentive promoter.
So the rider helps screen, apparently, whether the promoter pays attention to detail. I find the explanation problematic. It suggests that it is hard to monitor the effort expended by the promoter on important things like staging, so one monitors something completely irrelevant instead. The strategic promoter should shirk on the staging and expend effort on the M&M’s.