You are currently browsing Eran’s articles.
This post describes the main theorem in my new paper with Nabil. Scroll down for open questions following this theorem. The theorem asserts that a Bayesian agent in a stationary environment will learn to make predictions as if he knew the data generating process, so that as time goes by structural uncertainty dissipates. The standard example is when the sequence of outcomes is i.i.d. with an unknown parameter. As time goes by the agent learns the parameter.
One more word about organ selling before I return to my comfort zone and talk about Brownian motion in Lie groups. Selling living human organs is repugnant, in part because the sellers cause damage to their bodies out of desperation. But what about allowing your relatives to sell what’s left of you when you’re gone? I think this should be uncontroversial. And there are side advantages too, in addition to increasing the number of transplantations. For example, it will encourage you to quit smoking.
Over to you, Walter.
Something funny happened when I started watching Al Roth’s lecture and looked at the paper: I realized that what I always assumed is the meaning of `repugnant transactions’ is not exactly the phenomenon that Roth talks about. What I thought `repugnant transaction’ means is a situation of `two rights make a wrong': it’s totally awesome that Xanders is willing to donate his extra kidney to Zordiac, and it’s really nice of Zordiac to donate money to Xanders, but these two noble acts, done together in exchange for each other, are immoral and should be outlawed. Roth however defines `repugnant transaction’ more broadly as any transaction that some people want to engage in and others don’t think they should. Consider the opening example of his paper: laws against selling horse meat in restaurants. Here what is repugnant is not the exchange but the good itself. It’s not two rights make a wrong. It’s just wrong. We outlaw the exchange simply because of constitutional reasons or because it’s impossible to enforce a ban on eating: people will simply order takeaway and perform the crime of eating at their homes.
Here is Al Roth’s talk at the Lindau Meeting on Economic Sciences about repugnant transactions, which I guess is the technical term for the discomfort I feel at the idea of people donating their extra kidney to those who need it in return for, you know, money.
Before he was a Nobel Laureate Roth was a Nancy L. Schwartz Memorial Lecturer. His talk was about kidney exchanges — these are exchanges between several pairs of donor+recipient involving no money but only kidneys — and he started with a survey of the audience: who is in favor of allowing selling and buying of kidneys in the free market ? (I am glad I didn’t raise my hand. The next question was about selling and buying of living hearts.) I remember noticing that there was a correlation between raised hands and seniority: For whatever reason, seniors were more likely to be in favor of the free market than juniors.
At the dinner after the talk I ended up at a table of juniors & spouses and we got to discussing our objections to the idea of letting Bob sell his kidney to Alice, so that Bob can afford to send his daughter to college, and in doing so save Alice’s small child from orphanhood. It turned out we agreed on the policy but for different reasons. I don’t remember which was my reason. I still find both of them convincing, though less so simultaneously.
Reason I: The market price would be too low. Hungry people will compete selling their organs for a bowl of red pottage out of desperation. The slippery slope leads to poor people being harvested for their body parts.
Reason II: The market price would be too high. Only the 0.01 % will be able to afford it. The slippery slope leads to a small aristocracy who live forever by regenerating their bodies.
As I said, both (somewhat) convincing. And please don’t ask me what would be the fair price, that is neither too low nor too high.
In the last posts I talked about a Bayesian agent in a stationary environment. The flagship example was tossing a coin with uncertainty about the parameter. As time goes by, he learns the parameter. I hinted at the distinction between `learning the parameter’, and `learning to make predictions about the future as if you knew the parameter’. The former seems to imply the latter almost by definition, but this is not so.
Because of its simplicity, the i.i.d. example is in fact somewhat misleading for my purposes in this post. If you toss a coin then your belief about the parameter of the coin determines your belief about the outcome tomorrow: if at some point your belief about the parameter is given by some distribution $\mu$ over $[0,1]$ then your prediction about the outcome tomorrow will be the expectation of $\mu$. But in a more general stationary environment, your prediction about the outcome tomorrow depends on your current belief about the parameter and also on what you have seen in the past. For example, if the process is Markov with an unknown transition matrix then to make a probabilistic prediction about the outcome tomorrow you first form a belief about the transition matrix and then use it to predict the outcome tomorrow given the outcome today. The hidden Markov case is even more complicated, and it gives rise to the distinction between the two notions of learning.
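To make the Markov remark concrete, here is a minimal sketch under assumptions that are mine and not from the paper: outcomes are coded 0/1, the chain is symmetric (tomorrow equals today with an unknown probability), the prior on that probability is uniform, and the first outcome carries no information about it. The function name `predict_next_is_one` is made up for the illustration.

```python
from fractions import Fraction

def predict_next_is_one(history):
    """P(tomorrow = 1 | history) for a symmetric two-state chain.

    Assumption: tomorrow equals today with unknown probability p,
    and the prior on p is uniform on [0, 1]. The posterior on p is then
    Beta(stays + 1, switches + 1), whose mean is (stays + 1) / (n + 2).
    """
    stays = sum(a == b for a, b in zip(history, history[1:]))
    transitions = len(history) - 1
    p_stay = Fraction(stays + 1, transitions + 2)
    # The prediction uses the belief about p AND today's outcome.
    return p_stay if history[-1] == 1 else 1 - p_stay

# Same number of stays and switches, hence the same belief about p,
# but different predictions because today's outcome differs.
print(predict_next_is_one((1, 1, 1, 0)))
print(predict_next_is_one((0, 0, 0, 1)))
```

The two histories induce the same posterior over the parameter, yet the predictions differ, which is exactly why the belief about the parameter alone no longer pins down the forecast.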
The formulation of the idea of `learning to make predictions’ goes through merging. The definition traces back at least to Blackwell and Dubins. It was popularized in game theory by the Ehuds, who used Blackwell and Dubins’ theorem to prove that rational players will end up playing an approximate Nash equilibrium. In this post I will not explicitly define merging. My goal is to give an example of the `weird’ things that can happen when one moves from the i.i.d. case to an arbitrary stationary environment. Even if you didn’t follow my previous posts, I hope the following example will be intriguing for its own sake.
A Bayesian agent is observing a sequence of outcomes in $\{S,F\}$. The agent does not know the outcomes in advance, so he forms some belief $\mu$ over sequences of outcomes. Suppose that the agent believes that the number of successes in $k$ consecutive outcomes is distributed uniformly in $\{0,1,\dots,k\}$ and that all configurations with $r$ successes are equally likely:
$$\mu(a_0,\dots,a_{k-1}) = \frac{1}{(k+1)\binom{k}{r}}$$
for every $a_0,\dots,a_{k-1}\in\{S,F\}$ where $r = |\{0\le i<k : a_i = S\}|$.
You have seen this belief already though maybe not in this form. It is a belief of an agent who tosses an i.i.d. coin and has some uncertainty over the parameter of the coin, given by a uniform distribution over $[0,1]$.
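If you want to convince yourself of this equivalence, here is a small sketch. It assumes the belief gives probability $1/((k+1)\binom{k}{r})$ to every sequence of length $k$ with $r$ successes, codes a success as 1, and uses function names (`belief`, `mixture`, `predict_success`) that are invented for the illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

def belief(seq):
    """Probability of a finite 0/1 sequence: uniform number of successes,
    all configurations with the same number of successes equally likely."""
    k, r = len(seq), sum(seq)
    return Fraction(1, (k + 1) * comb(k, r))

def mixture(seq):
    """Integral of p^r (1-p)^(k-r) dp over [0, 1], i.e. an i.i.d. coin with a
    uniform prior on p. The Beta integral equals r! (k-r)! / (k+1)!."""
    k, r = len(seq), sum(seq)
    return Fraction(factorial(r) * factorial(k - r), factorial(k + 1))

def predict_success(seq):
    """Conditional probability of a success tomorrow given the history."""
    return belief(seq + (1,)) / belief(seq)

# The two descriptions agree on every sequence of length 5.
print(all(belief(s) == mixture(s) for s in product((0, 1), repeat=5)))
# Two successes in three tosses: the prediction is (r + 1) / (k + 2).
print(predict_success((1, 1, 0)))
```

The prediction that comes out is the familiar rule of succession, $(r+1)/(k+2)$, which is also the posterior mean of the parameter.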
In this post I am gonna make a fuss about the fact that as time goes by the agent learns the parameter of the coin. The word `learning’ has several legitimate formalizations and today I am talking about the oldest and probably the most important one: consistency of posterior beliefs. My focus is somewhat different from that of textbooks because 1) As in the first paragraph, my starting point is the belief about outcome sequences, before there are any parameters and 2) I emphasize some aspects of consistency which are unsatisfactory in the sense that they don’t really capture our intuition about learning. Of course this is all part of the grand marketing campaign for my paper with Nabil, which uses a different notion of learning, so this discussion of consistency is a bit of a sidetrack. But I have already come across some VIP who I suspect was unaware of the distinction between different formulations of learning, and it wasn’t easy to treat his cocky blabbering in a respectful way. So it’s better to start with the basics.
Four agents are observing infinite streams of outcomes in . None of them knows the future outcomes and as good Bayesianists they represent their beliefs about unknowns as probability distributions:
- Agent 1 believes that outcomes are i.i.d. with probability of success.
- Agent 2 believes that outcomes are i.i.d. with probability of success. She does not know ; she believes that is either or , and attaches probability to each possibility.
- Agent 3 believes that outcomes follow a Markov process: every day’s outcome equals yesterday’s outcome with probability .
- Agent 4 believes that outcomes follow a Markov process: every day’s outcome equals yesterday’s outcome with probability . She does not know ; her belief about is the uniform distribution over .
I denote by the agents’ beliefs about future outcomes.
We have an intuition that Agents 2 and 4 are in a different situation from Agents 1 and 3, in the sense that they are uncertain about some fundamental properties of the stochastic process they are facing. I will say that they have `structural uncertainty’. The purpose of this post is to formalize this intuition. More explicitly, I am looking for a property of a belief over sequences of outcomes that will distinguish between beliefs that reflect some structural uncertainty and beliefs that don’t. This property is ergodicity.
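One symptom of the difference can be computed by hand, or with the sketch below. The numbers are hypothetical placeholders chosen for the illustration (success probability 2/3 for Agent 1; 2/3 or 1/3 with equal weights for Agent 2), and the helper names are made up. The point is only that under a mixture of i.i.d. coins the outcomes of two different days stay correlated no matter how far apart the days are, while under a single i.i.d. coin they are not correlated at all.

```python
from fractions import Fraction as F

def joint_success(prior, days):
    """P(success on each of `days` distinct days) under a mixture of
    i.i.d. coins; `prior` maps a success probability to its weight."""
    return sum(weight * theta ** days for theta, weight in prior.items())

def covariance_between_days(prior):
    """Covariance of the success indicators of any two distinct days.
    It does not depend on how far apart the days are."""
    p_one = joint_success(prior, 1)
    return joint_success(prior, 2) - p_one * p_one

# Hypothetical parameter values, for illustration only.
agent_1 = {F(2, 3): F(1)}
agent_2 = {F(2, 3): F(1, 2), F(1, 3): F(1, 2)}

print(covariance_between_days(agent_1))  # no structural uncertainty
print(covariance_between_days(agent_2))  # equals the variance of the parameter
```

For a stationary belief, a covariance that stays bounded away from zero at all distances is incompatible with ergodicity, so this quantity separates the two agents in the example.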
You may have heard about ResearchGate, the so-called facebook of scientists. Yes, another social network. Its structure is actually more similar to twitter: each user is a node and you can create directed edges from yourself to other users. Since I finally got rid of my facebook account (I am a bellwether. In five years all the cool guys will not be on facebook), I decided to try ResearchGate. I wanted a stable platform to upload my preferred versions of my papers so that they will be the first to pop up on google. Also, I figured if I am returning to blogging then I need stuff to bitch about. ResearchGate only partially fulfills the first goal, but it does pretty well with the second.
When I give a presentation about expert testing there is often a moment in which it dawns for the first time on somebody in the audience that I am not assuming that the processes are stationary or i.i.d. This is understandable. In most modeling sciences and in statistics stationarity is a natural assumption about a stochastic process and is often made without being stated. In fact most processes one comes across are stationary or some derivative of a stationary process (think of white noise, or i.i.d. sampling, or Markov chains in their steady state). On the other hand, most game theorists and micro-economists who work with uncertainty don’t know what a stationary process is even if they have heard the word (This is a time for you to pause and ask yourself if you know what a stationary process is). So a couple of introductory words about stationary processes is a good starting point to promote my paper with Nabil.
First, a definition: A stationary process is a sequence $X_0, X_1, \dots$ of random variables such that the joint distribution of $(X_n, X_{n+1}, \dots)$ is the same for all $n$-s. More explicitly, suppose that the variables assume values in some finite set $A$ of outcomes. Stationarity means that for every $a_0,\dots,a_k\in A$, the probability $P(X_n=a_0,\dots,X_{n+k}=a_k)$ is independent of $n$. As usual, one can talk in the language of random variables or in the language of distributions, which we Bayesianists also call beliefs. A belief about the infinite future is stationary if it is the distribution of a stationary process.
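Here is a minimal sketch of what the definition asks for, using a two-state Markov chain whose transition matrix `T` I made up for the illustration. It computes the probability that a given block of outcomes appears starting at day $n$, and checks whether this probability changes with $n$. The check covers only one block and finitely many days, so it is a sanity test and not a proof of stationarity.

```python
from fractions import Fraction as F

# Hypothetical transition matrix: T[i][j] = P(tomorrow = j | today = i).
T = [[F(3, 4), F(1, 4)],
     [F(1, 2), F(1, 2)]]

def marginal(initial, n):
    """Distribution of the outcome at day n, given the day-0 distribution."""
    dist = list(initial)
    for _ in range(n):
        dist = [sum(dist[i] * T[i][j] for i in range(2)) for j in range(2)]
    return dist

def block_probability(initial, n, block):
    """P(X_n = block[0], ..., X_{n+k} = block[k])."""
    prob = marginal(initial, n)[block[0]]
    for today, tomorrow in zip(block, block[1:]):
        prob *= T[today][tomorrow]
    return prob

def looks_stationary(initial, block, horizon=10):
    """True if the block probability is the same for n = 0, ..., horizon - 1."""
    return len({block_probability(initial, n, block) for n in range(horizon)}) == 1

steady_state = [F(2, 3), F(1, 3)]  # solves pi = pi T for the matrix above
fixed_start = [F(1), F(0)]         # the chain starts at outcome 0 for sure

print(looks_stationary(steady_state, (0, 1, 1)))
print(looks_stationary(fixed_start, (0, 1, 1)))
```

Started from its steady state the chain passes the check; started from a fixed outcome it fails, because day 0 is then special.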
Stationarity means that Bob, who starts observing the process at day $n=0$, does not view this specific day as having any cosmic significance. When Alice arrives two weeks later at day $n=14$ and starts observing the process she has the same belief about her future as Bob had when he first arrived (Note that Bob’s view at day $n=14$ about what comes ahead might be different from Alice’s since he has learned something meanwhile, more on that later). In other words, each agent can denote by $0$ the first day in which they start observing the process, but there is nothing in the process itself that day $0$ corresponds to. In fact, when talking about stationary processes it will clear our thinking if we think of them as having infinite past and infinite future $\dots, X_{-1}, X_0, X_1, \dots$. We just happen to pop up at day $0$.
One reason I like thinking about probability and statistics is that my raw intuition does not fit well with the theory, so again and again I find myself falling into the same pits and enjoying the same revelations. As an impetus to start blogging again, I thought I should share some of these pits and revelations. So, for your amusement and instruction, here are three statistics questions with wrong answers which I came across during the last quarter. The mistakes are well known, and in fact I am sure I was at some level aware of them, but I still managed to believe the wrong answers for time spans ranging from a couple of seconds to a couple of weeks, and I still get confused when I try to explain what’s wrong. I will give it a shot in some other post.
— Evaluating independent evidence —
The graduate students in a fictional department of economics are thrown to the sharks if it can be proved at significance level that they are guilty of spending less than eighty minutes a day on reading von Mises’ `Behavioral econometrics’. Before the penalty is delivered, every student is evaluated by three judges, each of whom monitors the student on a random sample of days and then conducts a statistical hypothesis test about the true mean of daily minutes the student spends on von Mises:
The three samples are independent. In the case of Adam Smith, a promising grad student, the three judges came up with p-values . Does the department chair have sufficient evidence against Smith ?
Wrong answer: Yup. The p-value in every test is the probability of failing the test under the null. These are independent samples so the probability to end up the three tests with such p-values is . Therefore, the chair can dispose of the student. Of course it is possible that the student is actually not guilty and was just extremely unlucky to get monitored exactly on the days in which he slacked, but hey, that’s life or more accurately that’s statistics, and the chair can rest assured that by following this procedure he only loses a fraction of of the innocent students.
— The X vs. the Y —
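Suppose that in a linear regression of $Y$ over $X$ we get that
$$Y = \alpha + \beta X + \epsilon$$
where $\epsilon$ is the idiosyncratic error. What would be the slope in a regression of $X$ over $Y$?
Wrong answer: If $Y = \alpha + \beta X + \epsilon$ then $X = -\alpha/\beta + (1/\beta) Y + \epsilon'$, where $\epsilon' = -\epsilon/\beta$. Therefore the slope will be $1/\beta$ with $\epsilon'$ being the new idiosyncratic error.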
— Omitted variable bias in probit regression —
Consider a probit regression of a binary response variable $Y$ over two explanatory variables $X_1, X_2$:
$$P(Y=1 \mid X_1, X_2) = \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2)$$
where $\Phi$ is the cumulative distribution of a standard normal variable. Suppose that $\beta_2 > 0$ and that $X_1$ and $X_2$ are positively correlated, i.e. $\rho(X_1, X_2) > 0$. What can one say about the coefficient $\beta_1'$ of $X_1$ in a probit regression
$$P(Y=1 \mid X_1) = \Phi(\beta_0' + \beta_1' X_1)$$
of $Y$ over $X_1$?
Wrong answer: This is a well known issue of omitted variable bias. $\beta_1'$ will be larger than $\beta_1$. One way to understand this is to consider the different meaning of the coefficients: $\beta_1$ reflects the impact on $Y$ when $X_1$ increases and $X_2$ stays fixed, while $\beta_1'$ reflects the impact on $Y$ when $X_1$ increases without controlling on $X_2$. Since $X_1$ and $X_2$ are positively correlated, and since $X_2$ has positive impact on $Y$ (as $\beta_2 > 0$), it follows that $\beta_1' > \beta_1$.