Surprise and Digression

The abstract of a 2005 paper by Itti and Baldi begins with these words:

The concept of surprise is central to sensory processing, adaptation, learning, and attention. Yet, no widely-accepted mathematical theory currently exists to quantitatively characterize surprise elicited by a stimulus or event, for observers that range from single neurons to complex natural or engineered systems. We describe a formal Bayesian definition of surprise that is the only consistent formulation under minimal axiomatic assumptions.

They propose that surprise be measured by the Kullback-Liebler divergence between the prior and the posterior. As with many good ideas, Itti and Baldi are not the first to propose this. C. L. Martin and G. Meeden did so in 1984 in an unpublished paper entitled: `The distance between the prior and the posterior distributions as a measure of surprise.’ Itti and Baldi go further and provide experimental support that this notion of surprise comports with human notions of surprise. Recently, Ely, Frankel and Kamenica in Economics, have also considered the issue of surprise, focusing instead on how best to release information so as to maximize interest.

Surprise now being defined, one might go on to define novelty, interestingness, beauty and humor. Indeed, Jurgen Schmidhuber has done just that (and more). A paper on the optimal design of jokes cannot be far behind. Odd as this may seem, it is a part of a venerable tradition. Kant defined humor as the sudden transformation of a strained expectation into nothing. Birkhoff himself wrote an entire treatise on Aesthetic Measure (see the review by Garabedian). But, I digress.

Returning to the subject of surprise, the Kulback-Liebler divergence is not the first measure of surprise or even the most wide spread. I think that prize goes to the venerable $p$ -value. Orthodox Bayesians, those who tremble in the sight of measure zero events, look in horror upon the $p$ -value because it does not require one to articulate a model of the alternative. Even they would own, I think, to the convenience of having to avoid listing all alternative models and carefully evaluating them. Indeed I. J. Good writing in 1981 notes the following:

The evolutionary value of surprise is that it causes us to check our assumptions. Hence if an experiment gives rise to a surprising result given some null hypothesis $H$ it might cause us to wonder whether $H$ is true even in the absence of a vague alternative to $H$ .

Good, by the way, described himself as a cross between Bayesian and Frequentist, called a Doogian. One can tell from this label that he had an irrepressible sense of humor. Born Isadore Guldak Joseph of a Polish family in London, he changed his name to Ian Jack Good, close enough one supposes. At Bletchley park he and Turing came up with the scheme that eventually broke the German Navy’s enigma code. This led to the Good-Turing estimator. Imagine a sequence so symbols chosen from a finite alphabet. How would one estimate the probability of observing a letter from the alphabet that has not yet appeared in the sequence thus far? But, I digress.

Warren Weaver was, I think, the first to propose a measure of surpirse. Weaver is most well known as a popularizer of Science. Some may recall him as the Weaver on the slim volume by Shannon and Weaver on the Mathematical Theory of Communication. Well before that, Weaver played an important role at the Rockefeller foundation, where he used their resources to provide fellowships to many promising scholars and jump start molecular biology. The following is from page 238 of my edition Jonas’ book `The Circuit Riders’:

Given the unreliability of such sources, the conscientious philanthropoid has no choice but to become a circuit rider. To do it right, a circuit rider must be more than a scientifically literate ‘tape recorder on legs.’ In order to win the confidence of their informants, circuit riders for Weaver’s Division of Natural Sciences were called upon the offer a high level of ‘intellectual companionship – without becoming ‘too chummy’ with people whose work they had, ultimately, to judge.

But, I digress.

To define Weaver’s notion, suppose a discrete random variable $X$ that takes values in the set $\{1, 2, \ldots, m\}$ . Let $p_i$ be the probability that $X = i$ . The surprise index of outcome $k$ is $\frac{\sum_i i p^2_i}{p_k}$ . Good himself jumped into the fray with some generalizations of Weaver’s index. Here is one $\frac{[\sum_iip_i^t]^{1/t}}{p_k}$ . Others involve the use of logs, leading to measures that are related to notions of entropy as well probability scoring rules. Good also proposed axioms that a good measure to satisfy, but I cannot recall if anyone followed up to derive axiomatic characterizations.

G. L. S. Shackle, who would count as one of the earliest decision theorists, also got into the act. Shackle departed from subjective probability and proposed to order degrees of beliefs by their potential degrees of surprise. Shackle also proposed, I think, that an action be judged interesting by its best possible payoff and its potential for surprise. Shackle, has already passed beyond the ken of men. One can get a sense of his style and vigor from the following response to an invitation to write a piece on Rational Expectations:

Rational expectations’ remains for me a sort of monster living in a cave. I have never ventured into the cave to see what he is like, but I am always uneasily aware that he may come out and eat me. If you will allow me to stir the cauldron of mixed metaphors with a real flourish, I shall suggest that ‘rational expectations’ is neo-classical theory clutching at the last straw. Observable circumstances offer us suggestions as to what may be the sequel of this act or that one. How can we know what invisible circumstances may take effect in time-to come, of which no hint can now be gained? I take it that ‘rational expectations’ assumes that we can work out what will happen as a consequence of this or that course of action. I should rather say that at most we can hope to set bounds to what can happen, at best and at worst, within a stated length of time from ‘the present’, and can invent an endless diversity of possibilities lying between them. I fear that for your purpose I am a broken reed.

3 comments

March 20, 2014 at 6:23 am

Anonymous

Great post, Ricky. As a measure of surprise in covariance stationary environments I might suggest the innovation in a Wold representation, which is that part of the current value of a stochastic process that can’t be predicted by linear projection on the infinite past. Indeed that’s why innovations are *called* innovations! Of course the innovations are merely uncorrelated — not necessarily independent since they’re not necessarily Gaussian — so they measure “linear surprise.” They may still be non-linearly predictable, and hence incompletely surprising! For completely surprising innovations, one would need to move to Volterra representations, in which the innovations are indeed iid.

March 20, 2014 at 6:26 am

Frank Diebold

P.S. I think that may have been my all-time first comment on a blog!

March 21, 2014 at 8:59 pm

rvohra

now that is a surprise!

Surprise and Digression

Email Subscription

Entries by:

Recent Posts

Recent Comments

Blogroll

Archives

Meta

3 comments

Surprise and Digression

Related

Email Subscription

Entries by:

Recent Posts

Recent Comments

Blogroll

Archives

Meta

3 comments