Nate Silver needs no introduction. While I should have read his book by now, I have not. From my student Kane Sweeney, I learn that I should have. Kane, if I may ape Alvin Roth, is a student on the job market this year with a nice paper on the design of healthcare exchanges. Given the imminent rollout of these exchanges, I would have expected a deluge of market design papers on the subject. Kane’s is the only one I’m aware of. But I digress (in a good cause).

Returning to Silver, he writes in his book:

One of the most important tests of a forecast — I would argue that it is the single most important one — is called calibration. Out of all the times you said there was a 40 percent chance of rain, how often did rain actually occur? If, over the long run, it really did rain about 40 percent of the time, that means your forecasts were well calibrated.

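To see what passing this test involves, here is a small sketch in Python. The data are made up (the toy forecasts are honest by construction, so they should pass): group the days by the stated probability and compare that probability with the frequency of rain within each group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: forecasts[t] is the stated chance of rain on day t,
# rained[t] records whether it actually rained. The rain is drawn with
# exactly the stated probability, so these forecasts are honest by construction.
forecasts = rng.choice([0.1, 0.4, 0.7], size=5000)
rained = (rng.random(5000) < forecasts).astype(int)

# Silver's test: among the days on which you said p, how often did it rain?
for p in np.unique(forecasts):
    days = forecasts == p
    print(f"said {p:.0%}: rained {rained[days].mean():.0%} of the time over {days.sum()} days")
```
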
Many years ago, Dean Foster and I wrote a paper called Asymptotic Calibration. In another plug for a student, see this post. An aside to Kevin: the `algebraically tedious’ bit will come back to haunt you! I digress again.

Returning to the point I want to make: one interpretation of our paper is that calibration is perhaps not such a good test. This is because, as we show, given sufficient time, anyone can generate probability forecasts that are close to calibrated. We do mean anyone, including those who know nothing about the weather. See Eran Shmaya’s earlier posts on the literature around this.
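To give a flavor of how, here is a sketch of such a know-nothing forecaster, again in Python. It is an approachability-style scheme in the spirit of the result rather than the exact construction in the paper, and the grid size and the weather sequence are made up for illustration. The forecaster restricts itself to a grid of probabilities, tracks its cumulative calibration error at each grid point, and randomizes between two adjacent grid points chosen so as to push that error back toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)

K = 20                        # forecasts live on the grid 0, 1/K, ..., 1
grid = np.arange(K + 1) / K
err = np.zeros(K + 1)         # err[i] = sum of (rain - grid[i]) over days grid[i] was forecast

def forecast():
    # err[0] >= 0 and err[K] <= 0 always, so some adjacent pair straddles zero.
    for i in range(K):
        if err[i] >= 0 >= err[i + 1]:
            w = np.array([-err[i + 1], err[i]])
            p = w / w.sum() if w.sum() > 0 else np.array([0.5, 0.5])
            return i + rng.choice(2, p=p)   # randomize between the two grid points

# An arbitrary rain sequence; the forecaster never looks at how it is produced.
T = 20000
weather = (rng.random(T) < 0.5 + 0.4 * np.sin(np.arange(T) / 50.0)).astype(int)

for t in range(T):
    i = forecast()
    err[i] += weather[t] - grid[i]

# Average gap between stated probability and observed frequency, weighted by usage.
print("calibration error:", np.abs(err).sum() / T)
```

The randomization is doing real work here: a forecaster who commits to a deterministic rule can always be defeated by a suitably chosen weather sequence, yet a randomizing one who knows nothing about rain will, over time, look calibrated.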