In most theoretical models in game theory, the strategies that players may follow can be extremely complex. Because humans, computer programs that implement strategies, and animals, all have bounded computational capacity, many strategies that we, theoreticians, use in our constructions are too complex for the real-life agents that are supposed to play in our games. A strand of the literature that tries to take care of this issue studies repeated games where the players have bounded rationality: they have bounded memory, bounded recall (that is, they remember only the actions taken by the players in the last k stages of the game), or bounded computational capacity, so that they can use, for example, encryption to send private communicate among themselves.
One of the messages of this theory is that if one player is much stronger than the other, that is, his memory space or computational capacity is much larger than that of the other player, then, even if the weaker player chooses his strategy randomly, along the play the stronger player can learn the pure strategy that the weaker player follows, and best respond to it, thereby reducing the payoff of the weaker player to his min-max level in pure strategies. For example, in the game Matching Pennies, if one player’s memory is much larger than the other’s (in fact, exponentially larger), than the stronger player can increase his average payoff to 1, where 0 is the value of the one-shot game.
The theory also tells us that if one player is stronger than the other, but not much stronger, then the larger memory/computational capacity that he has does not translate to higher payoffs: to increase the long-run (or discounted) average payoff the stronger player needs a significantly larger memory than that of the weaker player.
Other papers study one-player decision problems where the decision maker has bounded memory, and they characterize the optimal simple strategy of the decision maker; that is, the optimal strategy among the set of strategies that use a specific memory size.
The proofs of the results on repeated games with boundedly rational players are far from trivial, and usually involve deep mathematical arguments. However, after reading (and writing) papers in this area, I wondered what practical insights this literature tells us.
Consider for example repeated games played by players with bounded memory, as studied, among others, by Elchanan Ben Porath, Ehud Kalai, Abraham Neyman, Daijiro Okada, and Ariel Rubinstein. Let G be a one-shot game. Let G(n1,n2,T) be the T-stage repetition of the game G, where player 1 is restricted to strategies with memory n1 (e.g., he can only follow strategies that can be implemented by a computer program with n1 command lines), and player 2 is restricted to strategies with memory n2. Then the game G(n1,n2,T) has a value v(n1,n2,T) in mixed strategies, this value v(n1,n2,T) is approximately the value of the one shot game if n1 and n2 are close, and the value v(n1,n2,T) is close to the min-max value of the one-shot game G if n1 is much larger than n2, yet both are much smaller than T.
The insights gained from these papers were mentioned above: the relation between the relative computational capacity of the players and the value of the game. The catch, in my view, is the usual one concerning optimal strategies and equilibria in game theory: how do players get to know their optimal strategy? That is, how do boundedly rational players find their optimal mixed strategy in the game G(n1,n2,T)? The computation of such a strategy is a very difficult task, which requires memory much larger than n1 (or n2). The usual answer is evolution or learning. A third answer, which I find more convincing, is that the agents who play the game are indeed boundedly rational, but they are agents of much more sophisticated players, who design the optimal strategies of the agents. This story is convincing when, e.g., the agents undergo a training period before playing the game. For example, salespersons undergo a training period, where they learn how to interact with potential buyers. This training of salespersons in effect dictates the strategy that they will use in the repeated game.
When two players face a new situation, each must use his memory/computational capacity to find his optimal strategy. The meta-program he uses for that purpose, which he inherited by evolution and improved through learning, serves for the same purpose.
Therefore I believe that results from the game theoretic literature on boundedly rational players give us some insights on real life situations.
What are the insights that we gain from results on one-player decision problems with a boundedly rational decision maker? As can be expected, in some models an agent with bounded memory can approximate the behavior of a rational agent, whereas in other, more complex models this cannot be achieved. An interesting paper in this venue is “Bounded Memory and Biases in Information Processing” by Andrea Wilson, which I will now discuss. Wilson studies a problem where there are two states of nature, H and L, and the decision maker should make a decision at an unknown period which is geometrically distributed. There are two possible actions, H and L; each action yields a payoff 1 if it matches the unknown state and 0 otherwise. The decision maker receives a signal at every period, either h or l; the probability to observe the signal h is high when the true state is H, and a similar property holds for the signal l and the state L. A rational decision maker can calculate after the arrival of each new piece of information the posterior belief about the state of nature, and then, when called upon to play, choose the action that corresponds to the more likely state. An agent with bounded memory must forget some of the information that he has, and therefore it is not clear what his optimal strategy looks like. Wilson provides a very neat and sophisticated proof that the optimal strategy has a special structure: the decision maker remembers one integer, k, which reflects the “likelihood” that the state is H. The value of k can be between 1 and the maximal number allowed by the memory size of the decision maker. When called upon to play, the decision maker chooses H if k is above some cutoff, and L if k is below the cutoff. Whenever the signal h is received, k increases by 1; whenever the signal l is observed, k decreases by 1. When k is one of the extreme points, say k=1, and the decision maker observes the signal h, with positive probability the value of k increases by 1, so that the decision maker will not be stuck with the incorrect decision when a lot of evidence in favor of the opposite decision arrives.
Wilson relates the structure of the optimal strategies to two psychological phenomena: confirmatory bias, that is, the tendency to interpret information as supporting the current belief, and overconfidence/underconfidence bias. The reason for the confirmatory bias is that if Alice is in state k=1 (so she is confident that the true state is L), and she observes the sequence of signals hhl, she will most probably remain at k=1, whereas if Bob is the maximal state k (so he is confident that the true state is H) and he observed the same sequence of signals hhl, he will most probably remain at the maximal state. Thus, the same sequence of signals is interpreted differently by the players, according to their initial belief.
Do I find this relation convincing? I must say that I do not. I agree that the optimal strategy resembles confirmatory bias, but do humans actually play using such a strategy? Don’t we sometimes ignore/forget signals? Don’t we sometimes exaggerate? Moreover, this strategy has properties that I believe that humans fail to have. For example, if one follows the optimal strategy, and if the true state is, say, H, then nevertheless there will be periods (infinitely many, in fact) in which the decision maker will be confident that the true state is L. This will happen by shear luck (or bad-luck) – there will be long periods in which the signal will happen to be l, and then the decision maker will shift from one extreme state to the other. True, these periods will be rare, but they will happen. I cannot imagine a hardcore Republican voting for Obama after reading tons of articles in favor of Obama’s candidacy.
The bottom line: I believe that theoretical results on games played by boundedly rational players are mathematically interesting. Sometimes they provide interesting and surprising insights (in addition to the discussion above, see, for example, the recent paper by Bavly and Neyman). Sometimes, as often happens in economic theory, we stretch the relevance of our theoretical results too far. Yet I believe that many insights still wait for us in this area, and I hope that we will not see the end of this exciting topic.