Nate Silver needs no introduction. While I should have read his book by now, I have not. From my student Kane Sweeney, I learn that I should have. Kane, if I may ape Alvin Roth, is a student on the job market this year with a nice paper on the design of healthcare exchanges. Given the imminent rollout of these things, I would have expected a deluge of market design papers on the subject. Kane’s is the only one I’m aware of. But I digress (in a good cause).
Returning to Silver, he writes in his book:
One of the most important tests of a forecast — I would argue that it is the single most important one — is called calibration. Out of all the times you said there was a 40 percent chance of rain, how often did rain actually occur? If, over the long run, it really did rain about 40 percent of the time, that means your forecasts were well calibrated.
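Silver’s rain example can be checked mechanically: group the days by the probability you announced, and compare each announced probability to the observed frequency of rain on those days. A minimal sketch in Python (the forecasts and outcomes here are made up purely for illustration):

```python
# Sketch of a calibration check: for each distinct forecast probability,
# compare it to the empirical frequency of the event on those occasions.
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Map each announced probability p to the observed frequency of the event."""
    ones = defaultdict(int)    # times the event occurred, per forecast value
    total = defaultdict(int)   # times each forecast value was announced
    for p, y in zip(forecasts, outcomes):
        total[p] += 1
        ones[p] += y
    return {p: ones[p] / total[p] for p in total}

# Every time we said "40 percent chance of rain":
forecasts = [0.4, 0.4, 0.4, 0.4, 0.4]
outcomes  = [1, 0, 0, 1, 0]   # it rained 2 out of 5 times = 40 percent
print(calibration_table(forecasts, outcomes))  # {0.4: 0.4}: well calibrated
```

A forecaster is calibrated when, for every probability p, the value in this table is (asymptotically) close to p.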
Many years ago, Dean Foster and I wrote a paper called Asymptotic Calibration. In another plug for a student, see this post. An aside to Kevin: the `algebraically tedious’ bit will come back to haunt you! I digress again. Returning to the point I want to make: one interpretation of our paper is that calibration is perhaps not such a good test. This is because, as we show, given sufficient time, anyone can generate probability forecasts that are close to calibrated. We do mean anyone, including those who know nothing about the weather. See Eran Shmaya’s earlier posts on the literature around this.
5 comments
December 12, 2012 at 12:33 pm
afinetheorem
Hah! I’m only the messenger: It is Fudenberg and Levine who open with “their construction, which relies on a series of approximations, is somewhat complicated; this note provides a shorter and simpler one”!
-Kevin
December 13, 2012 at 10:51 am
rvohra
`somewhat complicated’ is NOT the same as `algebraically tedious’… the first is accurate, the second is `fair and balanced’.
December 19, 2012 at 10:25 am
Curious
Do you think that calibration is of little value then? If so, is there a test that we can use in practice that beats it? (Feel free to define “in practice” as you see fit)
December 20, 2012 at 11:30 pm
rvohra
Dear Curious
The short answer to your question is that I think of calibration as being a necessary condition for a `good’ forecast but not a sufficient one. Now for a longer answer.
Imagine the following sequence of outcomes: 0, 1, 0, 1, 0, 1, 0, 1, … Consider now three distinct sequences of probability forecasts. The goal is to predict the probability of seeing a `1′.
0.5, 0.5, 0.5, 0.5, …
0, 1, 0, 1, 0, …
0.1, 0.9, 0.1, 0.9, …
The first and second forecasts are both calibrated with respect to the sequence of outcomes. Thus, very different forecasts can be calibrated with respect to the same data. The third forecast is NOT calibrated, but I think you would agree it is more `informative’ than the first. This suggests that calibration does not capture everything we would want in a measure of forecast accuracy. In particular, the third forecast seems to have captured the pattern in the data. There is a measure of this `pattern matching’ called coherence or resolution.
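The three forecast sequences above can be checked directly: for each probability a forecaster announces, tally how often a `1′ actually followed. A short sketch (the helper function is my own illustration, not from the paper):

```python
# Check calibration of the three forecast sequences against the
# alternating outcome sequence 0, 1, 0, 1, ...
from collections import defaultdict

def empirical_frequencies(forecasts, outcomes):
    """For each announced probability p, the observed frequency of a 1."""
    ones = defaultdict(int)
    total = defaultdict(int)
    for p, y in zip(forecasts, outcomes):
        total[p] += 1
        ones[p] += y
    return {p: ones[p] / total[p] for p in total}

n = 1000
outcomes = [i % 2 for i in range(n)]                 # 0, 1, 0, 1, ...
f1 = [0.5] * n                                       # 0.5, 0.5, 0.5, ...
f2 = [float(i % 2) for i in range(n)]                # 0, 1, 0, 1, ...
f3 = [0.1 if i % 2 == 0 else 0.9 for i in range(n)]  # 0.1, 0.9, 0.1, ...

print(empirical_frequencies(f1, outcomes))  # {0.5: 0.5} -- calibrated
print(empirical_frequencies(f2, outcomes))  # {0.0: 0.0, 1.0: 1.0} -- calibrated
print(empirical_frequencies(f3, outcomes))  # {0.1: 0.0, 0.9: 1.0} -- NOT calibrated
```

When the third forecaster says 0.1, a `1′ never occurs; when she says 0.9, it always occurs. Her stated probabilities are off, yet she has clearly found the pattern, which is exactly the gap between calibration and resolution.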
However, one of the results that Dean Foster and I left on the cutting room floor was this: any probability forecast can be adjusted to be calibrated without reducing its `resolution’. If memory serves, Teddy Seidenfeld may have been the first to make this observation.
By the way, as was pointed out to me by a colleague, Nate Silver’s recent probability forecasts on which states would go for Obama were not calibrated!
December 21, 2012 at 5:41 am
Curious
Thanks