You are currently browsing the monthly archive for November 2011.

It is well-known that a non-linear transformation of a random variable does not transform the mean in an entirely straightforward way. That is, for a random variable X and function f, we can easily have $E(f(X)) \neq f(E(X))$. In our intro decision science courses, we call this the “flaw of averages,” a term coined by Sam Savage. See his book of that title for many examples of how one can, often inadvertently, fall into the false assumption that it suffices to replace a random variable by its average.
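Here is a quick numerical illustration of $E(f(X)) \neq f(E(X))$, using an assumed setup that reappears below: X standard normal and f = exp. This is a Monte Carlo sketch, and the helper name `mc_mean` is hypothetical. The average of exp(X) should come out near $e^{1/2} \approx 1.6487$, while exp(E(X)) is exactly 1.

```python
import math
import random


def mc_mean(f, n=200_000, seed=0):
    """Monte Carlo estimate of E[f(X)] for X ~ Normal(0, 1)."""
    rng = random.Random(seed)
    return sum(f(rng.gauss(0.0, 1.0)) for _ in range(n)) / n


mean_of_f = mc_mean(math.exp)  # estimates E[exp(X)], whose exact value is e^{1/2}
f_of_mean = math.exp(0.0)      # exp(E[X]) = exp(0) = 1
print(mean_of_f, f_of_mean)
```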

What if instead of the average, we talk about the mode, or most likely outcome? Denote this M(X). Surely, if f is a one-to-one function, the most likely value of f(X) must be f(M(X))? Amazingly, this can be false as well! It is much “closer to true,” in that it is true for all discrete distributions, but we run into trouble with continuous distributions. Here is an example:

Let X follow a standard normal distribution, and let $Y=e^X$. The distribution of Y is called “lognormal.” Here is a graph of its density:

Notice that while the standard normal distribution is peaked at 0, this distribution is not peaked at $e^0=1$! What is going on?

We can work this out with algebra and calculus, but here is a conceptual way of looking at it. The key is that probability densities differ fundamentally from probabilities, and the precise definition of the mode differs between continuous and discrete distributions. Saying that 0 is the “most likely” value for a standard normal variable X isn’t quite right. Any particular exact value, of course, has probability zero. What we really mean is that X is more likely to be near 0 than near any other value. Fine, so then why isn’t $Y=e^X$ more likely to be near $1=e^0$ than near any other value? Because the exponential transformation does funny things to “near.” Nearby values get stretched out more for larger X. So, while more realizations of X are near 0 than near -1, they are spread out more thinly when we exponentiate. As a result, the maximum density of Y occurs not at exp(0) but at exp(-1). It takes some algebra to confirm that exp(-1) is the exact value, but this argument makes it fairly clear that the mode should be less than exp(0).
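For readers who want the algebra, here is the standard change-of-variables calculation for the example above, with $\phi$ the standard normal density:

$$f_Y(y) = \frac{1}{y}\,\phi(\ln y) = \frac{1}{y\sqrt{2\pi}}\exp\left(-\frac{(\ln y)^2}{2}\right), \qquad y > 0.$$

Substitute $u = \ln y$, which is increasing in $y$. We then maximize

$$\ln f_Y = -u - \frac{u^2}{2} - \frac{1}{2}\ln(2\pi), \qquad \frac{d}{du}\left(-u - \frac{u^2}{2}\right) = -1 - u = 0 \iff u = -1.$$

The second derivative is $-1 < 0$, so the density of Y peaks at $y = e^{-1}$. The extra $-u$ term comes from the factor $1/y$, which is exactly the "stretching" described above.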

Why is this important? One reason is that in regression analysis we often use a model which predicts ln y rather than y. Then we need to convert the predicted value for ln y to a predicted value for y. It is well-known (and has been taught for years in our intro stats course) that if we are predicting the average y, it does not suffice to exponentiate; a correction must be made. But for predicting an individual observation of y, we all teach that no correction is necessary. It only occurred to me last week, after teaching the course for 6 years, that this is problematic if we seek the “most likely” value of y.
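Here is a minimal sketch of the three possible point predictions for y. It assumes, as is common but not guaranteed, that the errors on the log scale are normal with standard deviation s. In practice s would itself be estimated from the regression. The function name `back_transform` and the values k = 2.0, s = 0.5 are hypothetical. The formulas are the standard lognormal ones, and the mode formula reduces to exp(-1) in the example above, where k = 0 and s = 1.

```python
import math


def back_transform(k, s):
    """Point predictions for y when ln y ~ Normal(k, s**2).

    k is the regression's point prediction for ln y; s is the (assumed normal)
    residual standard deviation on the log scale.
    """
    return {
        "mode": math.exp(k - s**2),      # most likely value: pulled below exp(k)
        "median": math.exp(k),           # plain exponentiation, no correction
        "mean": math.exp(k + s**2 / 2),  # the usual correction for the average
    }


preds = back_transform(k=2.0, s=0.5)  # hypothetical inputs
for name, value in preds.items():
    print(f"{name:>6}: {value:.3f}")
```

The three numbers differ whenever s > 0, and the ordering mode < median < mean holds for every lognormal.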

What is the resolution? First of all, there is no distortion for the median. If k is the point prediction for ln y, then we can conclude that y has a 50% chance of being above exp(k). So the method we have been teaching is a fine way to estimate the median value of y. Our lectures and textbook haven’t really said in precise language whether we are predicting a median or modal value, so I am glad to report we haven’t been teaching anything unambiguously wrong. Secondly, the problem goes away if we work with prediction intervals rather than single-value predictions, as we often encourage students to do. If we are 95% confident that ln y is in [a,b], we can certainly conclude that we are 95% confident y is in $[e^a,e^b]$.
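Both points in this paragraph can be checked numerically. This is a simulation sketch under the hypothetical assumption that ln y is normal with center k and spread s. Because exp preserves order, the median of the simulated y values is exactly exp of the median of ln y. A 95% interval for ln y also captures the same draws as its exponentiated version.

```python
import math
import random
import statistics

# Hypothetical setup: ln y ~ Normal(k, s**2), with k the point prediction for ln y.
k, s = 2.0, 0.5
rng = random.Random(1)
log_y = [rng.gauss(k, s) for _ in range(10_001)]  # odd count, so the median is one draw
y = [math.exp(v) for v in log_y]

# exp preserves order, so the median of y equals exp(median of ln y).
print(statistics.median(y), math.exp(statistics.median(log_y)))

# Share of draws of y above exp(k): close to 50%.
share_above = sum(v > math.exp(k) for v in y) / len(y)

# A 95% interval for ln y maps endpoint-by-endpoint to a 95% interval for y.
a, b = k - 1.96 * s, k + 1.96 * s
share_log = sum(a <= v <= b for v in log_y) / len(log_y)
share_y = sum(math.exp(a) <= v <= math.exp(b) for v in y) / len(y)
print(share_above, share_log, share_y)
```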

Most importantly, we should reinforce the lesson from the “flaw of averages” that any single number – mean, median or mode – is a poor summary of our knowledge of a random variable. This is especially true for a variable that is lognormal (or any asymmetric distribution) rather than normal, in which case all three values are usually different.

1. Show “Reverse Jensen’s inequality for modes”: If f is an increasing convex function, X is a continuous random variable, and both X and f(X) have a unique mode (point of maximum density), then mode(f(X)) <= f(mode(X)). If f is strictly convex and X has a continuously differentiable density, the inequality is strict.

2. (Based on a suggestion from Peter Klibanoff.) Let X have a continuously differentiable density and a unique mode, and Y=exp(X). Define the density of Y “on a multiplicative scale” by

$g(y) = \lim_{\epsilon \rightarrow 0} P(Y \in [y,y(1+\epsilon)])/\epsilon$

Show that g is maximized at exp(mode(X)). Note that the above formula is similar to the standard density, but with $y(1+\epsilon)$ having replaced $y + \epsilon$. That is, if we consider Y to be measured on a multiplicative scale, with multiplicative errors, there is no distortion in the mode.
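The following is a numerical illustration of exercise 2 for one case, not a proof. It takes X normal with hypothetical parameters MU = 0.5 and SIGMA = 0.8. It approximates both the standard density of Y and the multiplicative-scale density g by using a small finite EPS in place of the limit, then locates each maximum on a grid. If the exercise's claim holds, the standard density should peak near exp(MU - SIGMA²) and g near exp(MU) = exp(mode(X)). All names are illustrative.

```python
import math

MU, SIGMA = 0.5, 0.8  # hypothetical parameters: X ~ Normal(MU, SIGMA**2)
EPS = 1e-6            # small finite stand-in for the limit epsilon -> 0


def cdf_x(x):
    """CDF of X ~ Normal(MU, SIGMA**2)."""
    return 0.5 * (1.0 + math.erf((x - MU) / (SIGMA * math.sqrt(2.0))))


def prob_y_between(lo, hi):
    """P(lo <= Y <= hi) for Y = exp(X), computed as P(ln lo <= X <= ln hi)."""
    return cdf_x(math.log(hi)) - cdf_x(math.log(lo))


def standard_density(y):
    """Finite-EPS version of lim P(Y in [y, y + eps]) / eps."""
    return prob_y_between(y, y + EPS) / EPS


def multiplicative_density(y):
    """Finite-EPS version of g(y) = lim P(Y in [y, y(1 + eps)]) / eps."""
    return prob_y_between(y, y * (1.0 + EPS)) / EPS


# Fine grid of y values, evenly spaced on the log scale over [e^-3, e^3].
grid = [math.exp(-3.0 + 0.001 * i) for i in range(6001)]
mode_standard = max(grid, key=standard_density)
mode_multiplicative = max(grid, key=multiplicative_density)

print(math.log(mode_standard))        # expected near MU - SIGMA**2
print(math.log(mode_multiplicative))  # expected near MU, i.e. mode(X)
```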

A peculiarity of my profession is that we first meet our colleagues on the page before we meet them in person. It was in this way that I first met Mort Kamien. A class on optimal control in which lectures were sporadic, and when offered, were delivered in a whisper, was the spur. To compensate, I trolled the library and chanced upon a copy of Kamien and Schwarz’s Dynamic Optimization. Not once did it cross my mind that I would one day become Mort’s colleague.

The path from the Warsaw ghetto to Evanston cannot have been a straight one and I count it a particular honor that it crossed mine. Escaping with the remnants of his family from the grasp of Mr. Hitler, he ended up in New York. This was followed by stints in an orphanage, experience in the rag trade, City College, Purdue (which Mort insisted was just like New York), Carnegie Tech and then Northwestern. An eventful life that ended November 18th, 2011.

Mort will be remembered by others for his contributions to the study of innovation and the value of patents. Also worth recalling is his role (with Stan Reiter) in founding a community devoted to economic theory here at Northwestern. In the decade or so I served as a colleague with Mort, I learnt many useful things from him about people, institutions, academic politics and the recent history of economic ideas.

It is a custom in certain British institutions of learning to hold a founder’s day. On that day, the memory of the founder is recalled and celebrated by a reading from Ecclesiasticus, chapter 44. I can think of no more fitting tribute:

Let us now praise famous men, and our fathers that begat us.

The Lord hath wrought great glory by them through his great power from the beginning.

Such as did bear rule in their kingdoms, men renowned for their power, giving counsel by their understanding, and declaring prophecies:

Leaders of the people by their counsels, and by their knowledge of learning meet for the people, wise and eloquent are their instructions:

Such as found out musical tunes, and recited verses in writing:

Rich men furnished with ability, living peaceably in their habitations:

All these were honored in their generations, and were the glory of their times.

I was asked recently what I thought about research in Social Networks (SN). The correct response to a question of this variety, when asked in public before an audience that contains scholars thinking about this subject, is to say something noncommittal or perhaps imitate Herman Cain on Libya. I did neither. I said what was on my mind and, upon reflection, I realize it was not the best answer. With the benefit of time, I think I have a better answer (and no, I’m not about to mimic Romney either!).

There are, I think, three strands in the literature dubbed SN. The first is the use of information contained in the web of relationships an individual maintains to make inferences about that individual. In one sense this is a standard statistical (or machine learning) exercise, but with the wrinkle that the potentially relevant independent variables are associated with properties of the network within which the individual is embedded. The second strand is about how infection or ideas spread through a network. This is a subject that is at least 20 years old (see, for example, the 1988 survey paper by Hedetniemi, Hedetniemi and Liestman). The new wrinkle here has been the desire to study diffusion in networks that resemble what is observed rather than networks in general. Thus, I would also lump into this strand of the literature the work done to document and describe observed networks (e.g., the power law stuff).

The third strand is driven by the idea that one’s place in the network determines one’s influence, power or wealth. It is this strand of the literature that I am skeptical about. Note: skeptical, not opposed. First, I accept that ‘who you know’ matters. But if this is the extent of the insight such models yield, it is thin gruel. Now, there are models that seek to show how the structure of the network influences the distribution of wealth, say. These usually involve an exogenously given static network. However, one would suspect that such networks are really dynamic and the products of strategic choices by the agents. Thus, these models are still too distant from reality to be compelling. Suggestive, perhaps, but not compelling.

The International Atomic Energy Agency (IAEA) report leaked on Nov 8, 2011 confirmed the obvious: Iran is striving to obtain military nuclear capability, and at this point it is a few months away from achieving it. The spiraling debate about the steps that the US, NATO or Israel should now take masks an important point, namely that Iran itself has a strong incentive to stop just short of having a nuclear arsenal: to convince the world that it can arm a bomb on a few months’ notice, while at the same time deliberately refraining from this last step.

To understand why, it is worthwhile considering the motive behind the Iranian military nuclear plan. One potential motive is that Iran simply wants to destroy Israel physically. The Iranian president Ahmadinejad has made it clear, time and again, that he would have liked to see Israel wiped off the face of the earth. It is therefore conceivable that once Iran builds a nuclear arsenal capable of destroying Israel, Ahmadinejad and the Iranian leadership will immediately order it launched at Israel.

This scenario is conceivable but not very likely. Ahmadinejad cares about his country: he has, for instance, expressed concerns that in a confrontation with the West over its nuclear plan Iran might receive a blow that would not allow it to rise again for 500 years; he also believes that after a nuclear strike against the Israeli mainland, Israel would still be able to respond with a nuclear strike against Iran’s main cities from its submarines, a strike that might set Iran back much more than 500 years.

For this reason, a more likely motive for the Iranian military nuclear plan is to gain prestige and influence as a regional power. Being by now just a step away from possessing a nuclear arsenal, Iran has already gained this payoff from its nuclear plan. It is feared by its neighbors and it is considered a bigger threat than before by Israel, Europe and the US. But somewhat paradoxically, by making the last move to own an armed nuclear arsenal it might jeopardize its own achievement.

The reason for this is the policy that Israel (and likewise the US and Europe) is most likely to adopt upon a nuclear armament by Iran. Israel will have to declare that it will hold Iran, its only known nuclear rival, responsible for any nuclear attack whatsoever that it might suffer. It will do so even though, *conditional* on a nuclear attack against Israel, the most likely source would not be Iran. It would more likely be some elusive terrorist organization that obtained its bomb on the international black market, and against which it is extremely difficult to retaliate. Iran does not control all the organizations that might want to launch a nuclear attack against Israel. Nevertheless, Iran would make itself the only Muslim nuclear power in rivalry with Israel, technically able to delegate nuclear capability to organizations like Hamas or Hezbollah (even if Iran never willfully does so). By doing so, it makes itself vulnerable, from its own perspective, as the sole target of unavoidable Israeli retaliation.

There are additional low-probability but pivotal scenarios in which nuclear armament by Iran might literally backfire. Israeli radars detecting an approaching missile launched from the Shat-el-Arab region might not be able to discern whether it was fired from Iraq or from Iran. Since Israel is such a small country, it might have no choice but to adopt a zero-tolerance policy, treating any such missile as a nuclear one and launching an automatic extreme response. Yet Iran cannot control every missile launch that Israel might suspect of coming from Iran.

Currently, Iran seems to be humiliated by the IAEA report and by the response of the international community. Humiliating Germany and impoverishing it through the Treaty of Versailles at the end of World War I led to the rise of Hitler and to the worst atrocities in the history of mankind during World War II. Iran is a large nation with an ancient and rich culture, and humiliating it can only be counter-productive.

A potentially better strategy would be to encourage Iran to follow its own interest by transparently staying just on the brink of military nuclear capability, and at the same time to admit Iran as a de facto member of the “nuclear club”. If Iran nevertheless prefers to curtail transparency and renounce international recognition of its power, it will not only suffer the consequences of undermining its own interests, but might also set off an escalation in which it is likely to suffer much more.

There are many ways to hold an election with more than two candidates. For most major elections in this country, we are stuck with perhaps the worst of them: familiar plurality voting, in which each voter picks one candidate and the candidate with the highest vote total wins. If you are unfamiliar with the deficiencies of plurality voting, one place to look is this excellent blog post, written by mathematician Tim Gowers in the context of Britain’s recent (sadly unsuccessful) referendum to change its system. If this seems like a mere technical issue, one should look here at a notorious historical election in which the winning party carried 33% of the vote.

Plurality voting is especially bad in a crowded field where no single candidate has strong support, as in the current Republican presidential primary race. We open our daily paper and see that Herman Cain is a front-runner with a whopping 25% of the vote, but cannot tell if he is the second choice or last choice of the other 75%. This surely has an impact on his chances of success, both as the primary field narrows and in the general election.

Now, changing an electoral system is very difficult, not least because those in power got there via the current system. But what of polls? These do not suffer from any similar institutional inertia, and there is no monopoly on polls; Gallup, Rasmussen, etc. are each free to ask whatever questions they see fit. Surely we would get a better picture of the mood of the electorate by, at minimum, asking each pollee to approve or disapprove of each candidate. Better still, pollees could rate each candidate on a numerical scale of 1 to 5 or 1 to 10, or simply rank all candidates in order of preference. No single method is perfect, so some variety might be ideal. Since polls have an impact on real elections, improving polls might reduce some of the idiosyncrasies that arise from plurality voting. Of course, it would also be the responsibility of attention-span-deficient news outlets to report not just the familiar plurality-based polls, but also those that give a more detailed picture of the electorate’s preferences.
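To make the point about plurality polls concrete, here is a small sketch built on an invented profile of 100 voters and three hypothetical candidates A, B and C. This is not polling data. The same ranked ballots are tallied three ways:

- plurality, counting first choices only, as current polls do;
- a Borda count awarding 2, 1 and 0 points for first, second and third place (one of many possible scoring rules, chosen here purely for illustration);
- head-to-head pairwise majorities.

```python
from collections import Counter
from itertools import combinations

# Hypothetical ballot profile (invented for illustration, not real polling data):
# each entry is (number of voters, ranking from most to least preferred).
PROFILE = [
    (40, ("A", "B", "C")),
    (35, ("C", "B", "A")),
    (25, ("B", "C", "A")),
]


def plurality(profile):
    """Count first choices only."""
    tally = Counter()
    for count, ranking in profile:
        tally[ranking[0]] += count
    return tally


def borda(profile):
    """Borda count: with n candidates, place i (0-based) earns n - 1 - i points."""
    tally = Counter()
    for count, ranking in profile:
        n = len(ranking)
        for place, candidate in enumerate(ranking):
            tally[candidate] += count * (n - 1 - place)
    return tally


def pairwise_wins(profile):
    """Number of head-to-head majority contests each candidate wins."""
    candidates = profile[0][1]
    total = sum(count for count, _ in profile)
    wins = Counter({c: 0 for c in candidates})
    for x, y in combinations(candidates, 2):
        x_over_y = sum(count for count, r in profile if r.index(x) < r.index(y))
        if 2 * x_over_y > total:
            wins[x] += 1
        elif 2 * x_over_y < total:
            wins[y] += 1
    return wins


print("plurality:", plurality(PROFILE).most_common())     # [('A', 40), ('C', 35), ('B', 25)]
print("borda:    ", borda(PROFILE).most_common())         # [('B', 125), ('C', 95), ('A', 80)]
print("pairwise: ", pairwise_wins(PROFILE).most_common()) # [('B', 2), ('C', 1), ('A', 0)]
```

In this invented profile, A leads the plurality count with 40% of first choices, even though the other 60% rank A last. B is the first choice of only a quarter of voters, yet B wins the Borda count and beats each rival head to head. A plurality-only poll would hide all of this.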