
In the last posts I talked about a Bayesian agent in a stationary environment. The flagship example was tossing a coin with uncertainty about the parameter. As time goes by, he learns the parameter. I hinted at the distinction between `learning the parameter', and `learning to make predictions about the future as if you knew the parameter'. The former seems to imply the latter almost by definition, but this is not so.

Because of its simplicity, the i.i.d. example is in fact somewhat misleading for my purposes in this post. If you toss a coin then your belief about the parameter of the coin determines your belief about the outcome tomorrow: if at some point your belief about the parameter is given by some {\mu\in \Delta([0,1])} then your prediction about the outcome tomorrow will be the expectation of {\mu}. But in a more general stationary environment, your prediction about the outcome tomorrow depends on your current belief about the parameter and also on what you have seen in the past. For example, if the process is Markov with an unknown transition matrix then to make a probabilistic prediction about the outcome tomorrow you first form a belief about the transition matrix and then use it to predict the outcome tomorrow given the outcome today. The hidden Markov case is even more complicated, and it gives rise to the distinction between the two notions of learning.
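To see the difference concretely, here is a minimal sketch in Python (mine, not from the post; it assumes a uniform prior in both cases and, in the Markov case, conditions on the first outcome). In the i.i.d. case the prediction is a function of the posterior alone; in the Markov case two histories with identical posteriors can still yield different predictions:

```python
def iid_prediction(history):
    """P(success tomorrow | history) for an i.i.d. coin with a uniform prior:
    Laplace's rule of succession, the mean of the Beta(d+1, k-d+1) posterior."""
    d, k = sum(history), len(history)
    return (d + 1) / (k + 2)

def markov_prediction(history):
    """P(success tomorrow | history) for a Markov process whose 'stay'
    probability theta has a uniform prior. The posterior over theta is
    Beta(stays+1, switches+1), but the prediction also needs today's outcome."""
    stays = sum(a == b for a, b in zip(history, history[1:]))
    theta_hat = (stays + 1) / (len(history) + 1)   # posterior mean of theta
    return theta_hat if history[-1] == 1 else 1 - theta_hat

h1 = [1, 1, 1, 0, 0]   # 3 stays, 1 switch, today is a failure
h2 = [0, 0, 0, 1, 1]   # 3 stays, 1 switch, today is a success
print(markov_prediction(h1), markov_prediction(h2))
# 1/3 vs 2/3: identical posteriors about theta, different predictions
```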

The formulation of the idea of `learning to make predictions' goes through merging. The definition traces back at least to Blackwell and Dubins. It was popularized in game theory by the Ehuds, who used Blackwell and Dubins' theorem to prove that rational players will end up playing an approximate Nash Equilibrium. In this post I will not explicitly define merging. My goal is to give an example of the `weird' things that can happen when one moves from the i.i.d. case to an arbitrary stationary environment. Even if you didn't follow my previous posts, I hope the following example will be intriguing in its own right.


A Bayesian agent is observing a sequence of outcomes in {\{S,F\}}. The agent does not know the outcomes in advance, so he forms some belief {\mu} over sequences of outcomes. Suppose that the agent believes that the number {d} of successes in {k} consecutive outcomes is distributed uniformly in {\{0,1,\dots,k\}} and that all configurations with {d} successes are equally likely:

\displaystyle \mu\left(a_0,a_1,\dots,a_{k-1} \right)=\frac{1}{(k+1)\cdot {\binom{k}{d}}}

for every {a_0,a_1,\dots,a_{k-1}\in \{S,F\}} where {d=\#\{0\le i<k|a_i=S\}}.

You have seen this belief {\mu} already, though maybe not in this form. It is the belief of an agent who tosses an i.i.d. coin and has some uncertainty over the parameter of the coin, given by a uniform distribution over {[0,1]}.
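A quick numerical check (my sketch, not part of the post): the formula above is exactly what you get by averaging the i.i.d. probability {p^d(1-p)^{k-d}} over a uniform prior on the parameter {p}:

```python
from math import comb
from scipy.integrate import quad

def mu_formula(seq):
    """The belief from the displayed formula; seq is a tuple of 0s and 1s (1 = S)."""
    k, d = len(seq), sum(seq)
    return 1 / ((k + 1) * comb(k, d))

def mu_mixture(seq):
    """Integrate the i.i.d. probability of seq over a uniform prior on p."""
    k, d = len(seq), sum(seq)
    val, _ = quad(lambda p: p ** d * (1 - p) ** (k - d), 0, 1)
    return val

seq = (1, 0, 1, 1, 0)
print(mu_formula(seq), mu_mixture(seq))   # both equal 1/60 = 0.01666...
```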

In this post I am gonna make a fuss about the fact that, as time goes by, the agent learns the parameter of the coin. The word `learning' has several legitimate formalizations, and today I am talking about the oldest and probably the most important one: consistency of posterior beliefs. My focus is somewhat different from that of textbooks because 1) as in the first paragraph, my starting point is the belief {\mu} about outcome sequences, before there are any parameters, and 2) I emphasize some aspects of consistency which are unsatisfactory in the sense that they don't really capture our intuition about learning. Of course this is all part of the grand marketing campaign for my paper with Nabil, which uses a different notion of learning, so this discussion of consistency is a bit of a sidetrack. But I have already come across some VIP who I suspect was unaware of the distinction between different formulations of learning, and it wasn't easy to treat his cocky blabbering in a respectful way. So it's better to start with the basics.

 


Four agents are observing infinite streams of outcomes in {\{S,F\}}. None of them knows the future outcomes and as good Bayesianists they represent their beliefs about unknowns as probability distributions:

  • Agent 1 believes that outcomes are i.i.d. with probability {1/2} of success.
  • Agent 2 believes that outcomes are i.i.d. with probability {\theta} of success. She does not know {\theta}; she believes that {\theta} is either {2/3} or {1/3}, and attaches probability {1/2} to each possibility.
  • Agent 3 believes that outcomes follow a Markov process: every day's outcome equals yesterday's outcome with probability {3/4}.
  • Agent 4 believes that outcomes follow a Markov process: every day's outcome equals yesterday's outcome with probability {\theta}. She does not know {\theta}; her belief about {\theta} is the uniform distribution over {[0,1]}.

I denote by {\mu_1,\dots,\mu_4\in\Delta\left(\{S,F\}^\mathbb{N}\right)} the agents’ beliefs about future outcomes.

We have an intuition that Agents 2 and 4 are in a different situation from Agents 1 and 3, in the sense that they are uncertain about some fundamental properties of the stochastic process they are facing. I will say that they have `structural uncertainty'. The purpose of this post is to formalize this intuition. More explicitly, I am looking for a property of a belief {\mu} over {\Omega=\{S,F\}^{\mathbb{N}}} that will distinguish between beliefs that reflect some structural uncertainty and beliefs that don't. This property is ergodicity.
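Before formalizing, here is a minimal simulation sketch (mine, not from the post; it starts the Markov chains from a uniform initial outcome, which the description above leaves unspecified). It samples a long stream from each belief and computes two time averages:

```python
import random

def agent1(n):                            # i.i.d., known parameter 1/2
    return [random.random() < 0.5 for _ in range(n)]

def agent2(n):                            # i.i.d., theta drawn once from {1/3, 2/3}
    theta = random.choice([1/3, 2/3])
    return [random.random() < theta for _ in range(n)]

def agent3(n, stay=0.75):                 # Markov, known 'stay' probability
    out = [random.random() < 0.5]
    for _ in range(n - 1):
        out.append(out[-1] if random.random() < stay else not out[-1])
    return out

def agent4(n):                            # Markov, 'stay' probability theta ~ U[0,1]
    return agent3(n, stay=random.random())

n = 100_000
for agent in (agent1, agent2, agent3, agent4):
    xs = agent(n)
    freq_success = sum(xs) / n
    freq_stay = sum(a == b for a, b in zip(xs, xs[1:])) / (n - 1)
    print(agent.__name__, round(freq_success, 3), round(freq_stay, 3))
```

Run it a few times: for Agents 1 and 3 both averages come out the same on every run, while for Agent 2 the success frequency varies from run to run and for Agent 4 the stay frequency does. Time averages that differ across sample paths are the footprint of structural uncertainty, and ergodicity is the property that rules them out.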

 


You may have heard about ResearchGate, the so-called Facebook of scientists. Yes, another social network. Its structure is actually more similar to Twitter's: each user is a node and you can create directed edges from yourself to other users. Since I finally got rid of my Facebook account (I am a bellwether. In five years all the cool guys will not be on Facebook), I decided to try ResearchGate. I wanted a stable platform to upload my preferred versions of my papers so that they will be the first to pop up on Google. Also, I figured if I am returning to blogging then I need stuff to bitch about. ResearchGate only partially fulfills the first goal, but it does pretty well with the second.


When I give a presentation about expert testing there is often a moment in which it dawns for the first time on somebody in the audience that I am not assuming that the processes are stationary or i.i.d. This is understandable. In most modeling sciences and in statistics, stationarity is a natural assumption about a stochastic process and is often made without being stated. In fact, most processes one comes across are stationary or some derivative of a stationary process (think white noise, i.i.d. sampling, or Markov chains in their steady state). On the other hand, most game theorists and micro-economists who work with uncertainty don't know what a stationary process is even if they have heard the term (this is a good time to pause and ask yourself if you know what a stationary process is). So a couple of introductory words about stationary processes is a good starting point to promote my paper with Nabil.

First, a definition: a stationary process is a sequence {\zeta_0,\zeta_1,\dots} of random variables such that the joint distribution of {(\zeta_n,\zeta_{n+1},\dots)} is the same for every {n}. More explicitly, suppose that the variables assume values in some finite set {A} of outcomes. Stationarity means that for every {a_0,\dots,a_k\in A}, the probability {\mathop{\mathbb P}(\zeta_n=a_0,\dots,\zeta_{n+k}=a_k)} does not depend on {n}. As usual, one can talk in the language of random variables or in the language of distributions, which we Bayesianists also call beliefs. A belief {\mu\in\Delta(A^\mathbb{N})} about the infinite future is stationary if it is the distribution of a stationary process.

Stationarity means that Bob, who starts observing the process at day {n=0}, does not view this specific day as having any cosmic significance. When Alice arrives two weeks later at day {n=14} and starts observing the process, she has the same belief about her future as Bob had when he first arrived (note that Bob's view at day {n=14} about what comes ahead might be different from Alice's, since he has learned something in the meantime; more on that later). In other words, each agent can denote by {0} the first day on which they start observing the process, but nothing in the process itself corresponds to day {0}. In fact, when talking about stationary processes it will clear our thinking if we think of them as having an infinite past and an infinite future {\dots,\zeta_{-2},\zeta_{-1},\zeta_0,\zeta_1,\zeta_2,\dots}. We just happen to pop up at day {0}.
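Here is a quick empirical check of the definition (my sketch, not from the post), using a two-state Markov chain started from its stationary distribution: the probability of seeing a fixed word {(a_0,\dots,a_k)} from day {n} onward should not depend on {n}.

```python
import random

def sample_path(n_days, stay=0.75):
    """Markov chain on {S, F}: tomorrow equals today with probability `stay`.
    The symmetric chain's stationary distribution is (1/2, 1/2)."""
    path = ['S' if random.random() < 0.5 else 'F']
    for _ in range(n_days - 1):
        if random.random() < stay:
            path.append(path[-1])
        else:
            path.append('F' if path[-1] == 'S' else 'S')
    return path

word, runs = ('S', 'S', 'F'), 100_000
counts = {0: 0, 7: 0, 50: 0}              # starting days n to compare
for _ in range(runs):
    path = sample_path(60)
    for n in counts:
        if tuple(path[n:n + len(word)]) == word:
            counts[n] += 1

# All three estimates should agree up to Monte Carlo noise;
# the exact value is 1/2 * 3/4 * 1/4 = 3/32 = 0.09375.
print({n: c / runs for n, c in counts.items()})
```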

 


One reason I like thinking about probability and statistics is that my raw intuition does not fit well with the theory, so again and again I find myself falling into the same pits and enjoying the same revelations. As an impetus to start blogging again, I thought I should share some of these pits and revelations. So, for your amusement and instruction, here are three statistics questions with wrong answers which I came across during the last quarter. The mistakes are well known, and in fact I am sure I was at some level aware of them, but I still managed to believe the wrong answers for time spans ranging from a couple of seconds to a couple of weeks, and I still get confused when I try to explain what's wrong. I will give it a shot in some other post.

— Evaluating independent evidence —

The graduate students in a fictional department of economics are thrown to the sharks if it can be proved at significance level {\alpha=0.001} that they are guilty of spending less than eighty minutes a day on reading von Mises' `Behavioral econometrics'. Before the penalty is delivered, every student is evaluated by three judges, each of whom monitors the student on a random sample of days and then conducts a statistical hypothesis test about the true mean of daily minutes the student spends on von Mises:

\displaystyle  \begin{array}{rcl} &H_0: \mu \ge 80\\&H_A: \mu<80\end{array}

The three samples are independent. In the case of Adam Smith, a promising grad student, the three judges came up with p-values {0.09}, {0.1}, and {0.08}. Does the department chair have sufficient evidence against Smith?

Wrong answer: Yup. The p-value in every test is the probability of failing the test under the null. These are independent samples, so the probability of ending up with such p-values in all three tests is {0.09\cdot 0.1\cdot 0.08<0.001}. Therefore, the chair can dispose of the student. Of course it is possible that the student is actually not guilty and was just extremely unlucky to get monitored exactly on the days in which he slacked, but hey, that's life, or more accurately that's statistics, and the chair can rest assured that by following this procedure he only loses a fraction {0.001} of the innocent students.
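To see why this answer earns the `wrong' label, here is a small simulation (mine, not from the post). At the boundary of the null, {\mu=80}, each judge's p-value is (approximately) uniform on {[0,1]}; the question is how often the product of three independent uniform variables falls below {0.09\cdot 0.1\cdot 0.08=0.00072}:

```python
import random

threshold = 0.09 * 0.1 * 0.08              # = 0.00072
runs = 1_000_000
hits = sum(random.random() * random.random() * random.random() < threshold
           for _ in range(runs))
print(hits / runs)   # ~0.025: the product falls below 0.00072 about 2.5% of
                     # the time, so the chair's true Type I error rate is far
                     # above the advertised 0.001
```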

— The X vs. the Y —

Suppose that in a linear regression of {Y} over {X} we get that

\displaystyle Y=4 + X + \epsilon

where {\epsilon} is the idiosyncratic error. What would be the slope in a regression of {X} over {Y}?

Wrong answer: If {Y= 4 + X + \epsilon} then {X = -4 + Y + \epsilon'}, where {\epsilon'=-\epsilon}. Therefore the slope will be {1} with {\epsilon'} being the new idiosyncratic error.
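Again, a small simulation (mine, not from the post; it takes {X} and {\epsilon} to be independent standard normals) lets you check the claimed slope against an actual least-squares fit:

```python
import random

n = 200_000
x   = [random.gauss(0, 1) for _ in range(n)]
eps = [random.gauss(0, 1) for _ in range(n)]
y   = [4 + xi + ei for xi, ei in zip(x, eps)]

def slope(u, v):
    """OLS slope in a regression of v on u: Cov(u, v) / Var(u)."""
    m = len(u)
    mu, mv = sum(u) / m, sum(v) / m
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / m
    var = sum((a - mu) ** 2 for a in u) / m
    return cov / var

print(slope(x, y))   # ~1.0: slope of Y on X
print(slope(y, x))   # ~0.5 with these variances: the slope of X on Y is not 1
```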

— Omitted variable bias in probit regression —

Consider a probit regression of a binary response variable {Y} over two explanatory variables {X_1,X_2}:

\displaystyle \text{Pr}(Y=1)=\Phi\left(\beta_0 + \beta_1X_1 + \beta_2X_2\right)

where {\Phi} is the cumulative distribution function of a standard normal variable. Suppose that {\beta_2>0} and that {X_1} and {X_2} are positively correlated, i.e. {\rho(X_1,X_2)>0}. What can one say about the coefficient {\beta_1'} of {X_1} in a probit regression

\displaystyle \text{Pr}(Y=1)=\Phi\left(\beta_0'+ \beta_1'X_1\right)

of {Y} over {X_1} ?

Wrong answer: This is a well-known issue of omitted variable bias. {\beta_1'} will be larger than {\beta_1}. One way to understand this is to consider the different meanings of the coefficients: {\beta_1} reflects the impact on {Y} when {X_1} increases and {X_2} stays fixed, while {\beta_1'} reflects the impact on {Y} when {X_1} increases without controlling for {X_2}. Since {X_1} and {X_2} are positively correlated, and since {X_2} has a positive impact on {Y} (as {\beta_2>0}), it follows that {\beta_1'>\beta_1}.
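And once more a simulation sketch for probing the answer (mine, not from the post; it assumes the numpy, scipy, and statsmodels packages, and one particular data-generating process in which {X_2=0.2X_1+} standard normal noise):

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200_000
x1 = rng.standard_normal(n)
x2 = 0.2 * x1 + rng.standard_normal(n)       # rho(X1, X2) > 0
beta0, beta1, beta2 = 0.0, 1.0, 1.0          # beta2 > 0
y = (rng.random(n) < norm.cdf(beta0 + beta1 * x1 + beta2 * x2)).astype(int)

full  = sm.Probit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)
short = sm.Probit(y, sm.add_constant(x1)).fit(disp=0)
print(full.params[1], short.params[1])   # ~1.0 vs ~0.85: here beta1' comes out
                                         # *below* beta1, despite the positive
                                         # correlation and beta2 > 0
```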

Here is Joe Biden recounting the debate before the Bin Laden operation.

[The President] said, I have to make this decision what is your opinion. He started with the National Security adviser, the Secretary of State and he ended with me. Every single person in that room hedged their bet, except Leon Panetta. Leon said GO. Everyone else said 49, 51, this…

It got to me. Joe what do you think? I said "You know I didn't know we have so many economists around the table. We owe the man a direct answer."

Me: First, is `economist' a synonym for somebody who can't give a direct answer? I thought we had a better reputation. Second, the way I see it, 49,51 is actually a direct answer and a bold prediction. It means that if they start making independent attempts to kill Bin Laden, and substantially fewer or substantially more than half of these attempts succeed, then the president has every reason to give these guys the boot.

So, Wikipedia is dark today in protest of an initiative in Congress to block sites that link to sites that infringe on copyrighted intellectual property. Ever noticed before how many times a day you use Wikipedia?

Here is what I don’t get about this whole idea of “copyrighted intellectual property”. Is it something advocated on moral grounds or on economic grounds? I mean, when Bob sneaks into Alice’s vineyard and eats the grapes without permission, we view it as a moral atrocity; it’s just a wicked thing to do; it invokes the wrath of the gods; Moses explicitly forbade it. To be sure, it’s hard to pin down what exactly makes the vineyard belong to Alice without getting into a recursive definition of ownership, and if we try tracing the vineyard back from one legitimate owner to another we arrive at the first man who just fenced a piece of land and said “This is mine”. But here the economic argument kicks in: most of us don’t begrudge this initial act of illegitimate fencing because the bastard who committed it was the founder of civil society. We like the idea of civil society. We like prosperity and growth. Without protection of private property we would have none of these.

But what about protection of “intellectual property”? Clearly this is not a necessary condition for a civil society. It’s also not a necessary condition for the production of knowledge and culture. We had Plato and Archimedes and Cicero and Shakespeare and Newton before it occurred to anybody that Bob has to get Alice’s permission to reproduce a code that Alice wrote. Come to think of it, when did the concept of intellectual property pop up anyway? Waitaminute, let me just look it up on Wikipedia. Oops.. What did we ever do before Wikipedia?

The White House thinks that intellectual property is justified on economic grounds:

Online piracy is a real problem that harms the American economy, and threatens jobs for significant numbers of middle class workers and hurts some of our nation’s most creative and innovative companies and entrepreneurs. It harms everyone from struggling artists to production crews, and from startup social media companies to large movie studios.

I wonder if this assertion is backed by some empirical research? I realize some people lose their jobs because of online piracy. Also, some people lost their jobs following the introduction of ATMs. But we view ATMs as a positive development since they made a certain service way cheaper. My guess is that the same is true about intellectual piracy: it makes distribution of culture and knowledge cheaper and therefore also makes the production of culture and knowledge cheaper. True, some companies, particularly the established ones, are damaged by intellectual theft. Other companies, particularly startups, benefit. One may argue that intellectual piracy destroys the incentive to produce, and that therefore no new culture or knowledge will be produced absent some protection for intellectual property. But this is a claim that can be empirically checked, no? We live in a world of file sharing and user-generated (often stolen) content sites. Are there fewer books written?

Embassy Suite hotel Saturday morning. Photo by Jacob Leshno.

They say that when Alfred Tarski came up with his theorem that the axiom of choice is equivalent to the statement that, for every infinite set {A}, {A} and {A\times A} have the same cardinality, he first tried to publish it in the French PNAS. Both venerable referees rejected the paper: Fréchet argued there is no novelty in an equivalence between two well-known theorems; Lebesgue argued that there is no interest in an equivalence between two false statements. I don’t know if this ever happened but it’s a cool story. I like to think about it every time a paper of mine is rejected and the referees contradict each other.

Back to game theory. One often hears that the existence of Nash Equilibrium is equivalent to Brouwer’s fixed point theorem. Of course we all know that Brouwer implies Nash, but the other direction is trickier and less known. I heard a satisfying argument for the first time a couple of months ago from Rida. I don’t know whether this is a folk theorem or somebody’s theorem, but it is pretty awesome and should appear in every game theory textbook.

So, assume Nash’s Theorem and let {X} be a compact convex set in {\mathbf{R}^n} and {f:X\rightarrow X} be a continuous function. We claim that {f} has a fixed point. Indeed, consider the two-player normal-form game in which the set of pure strategies of every player is {X}, and the payoffs under strategy profile {(x,y)\in X^2} are {-\|x-y\|^2} for player I and {-\|f(x)-y\|^2} for player II. Since the strategy sets are compact and the payoff functions are continuous, the game has an equilibrium in mixed strategies. In fact, the equilibrium strategies must be pure. (Indeed, for every mixed strategy {\mu} of player II, player I has a unique best response, the one concentrated on the barycenter of {\mu}.) But if {(x,y)} is a pure equilibrium then it is immediate that {x=y=f(x)}.
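A numerical sanity check of the key step (my sketch, not from the post): against any mixed strategy of player II, player I's expected payoff {-\mathbb{E}\|x-y\|^2} is uniquely maximized at the barycenter of that mixed strategy.

```python
import random

# A mixed strategy of player II: five random points in the unit square
# with random weights.
points  = [(random.random(), random.random()) for _ in range(5)]
weights = [random.random() for _ in range(5)]
total   = sum(weights)
weights = [w / total for w in weights]

def expected_payoff(x):
    """Player I's expected payoff -E||x - y||^2 when II mixes over `points`."""
    return -sum(w * ((x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2)
                for w, p in zip(weights, points))

barycenter = (sum(w * p[0] for w, p in zip(weights, points)),
              sum(w * p[1] for w, p in zip(weights, points)))

# Grid search: the best grid point should be the one nearest the barycenter.
grid = [(i / 100, j / 100) for i in range(101) for j in range(101)]
best = max(grid, key=expected_payoff)
print(barycenter, best)   # best ~ barycenter, up to the grid resolution
```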

Update: I should add that I believe that the transition from the existence of mixed Nash Equilibrium in games with finite strategy sets to the existence of mixed Nash Equilibrium in games with compact strategy sets and continuous payoffs is not hard. In the case of the game that I defined above, if {\{x_1,x_2,\dots\}} is a dense subset of {X} and {(\mu_n,\nu_n)\in \Delta(X)\times\Delta(X)} is a mixed equilibrium profile in the finite game with the same payoff functions in which both players are restricted to the pure strategy set {\{x_1,\dots,x_n\}}, then an accumulation point of the sequence {\{(\mu_n,\nu_n)\}_{n\geq 1}} in the weak{^\ast} topology is a mixed strategy equilibrium in the original game.
