You are currently browsing the monthly archive for November 2015.
Duppe and Weintraub date the birth of Economic Theory, at June 1949. It was the year in which Koopmans organized the Cowles Commission Activity Analysis Conference. It is also counted as conference Zero of the Mathematical Programming Symposium. I mention this because the connections between Economic Theory and Mathematical Programming and Operations Research had, at one time been very strong. The conference, for example, was conceived of by Tjalling Koopmans, Harold Kuhn, George Dantzig, Albert Tucker, Oskar Morgenstern, and Wassily Leontief with the support of the Rand corporation.
One of the last remaining links to this period who straddled, like a Colossus, both Economic Theory and Operations Research, Herbert Eli Scarf, passed away on November 15th, 2015.
Scarf came to Economics and Operations Research by way of Princeton’s mathematics department. Among his classmates was Gomory of the cutting plane method Milnor of topology fame and Shapley. Subsequently, he went on to Rand ( Dantzig, Bellman, Ford & Fulkerson). While there he met Samuel Karlin and Kenneth Arrow who introduced him to inventory theory. It was in this subject that Scarf made the first of many important contributions: the optimality of (S, s) polices. He would go on to establish equivalence of the core and competitive equilibrium (jointly with Debreu), identify a sufficient condition for non-emptiness of the core of a NTU game (now known as Scarf’s Lemma), anticipated the application of Groebner basis in integer programming (neighborhood systems) and of course his magnificent `Computation of Economic Equilibria’.
Exegi monumentum aere perennnius regalique situ pyramidum altius, quod non imber edax, non Aquilo impotens possit diruere aut innumerabilis annorum series et fuga temporum. Non omnis moriar…….
I have finished a monument more lasting than bronze and higher than the royal structure of the pyramids, which neither the destructive rain, nor wild North wind is able to destroy, nor the countless series of years and flight of ages. I will not wholly die………….
A Markov Decision Problem (MDP) is a model for sequential decision making, in which the underlying state of nature evolves in a stationary way. An MDP is given by a set of states S, an initial state s(0) in S, a set of available actions A(s) for each state s in S, and, for each state s in S and available actions a in A(s), a payoff r(s,a) and a probability distribution q(. | s,a) on S.
The process starts at the initial state s(0). At every stage n, the current state s(n) is known, the decision maker chooses an action a(n) in A(s(n)), receives the stage payoff r(s(n),a(n)), and a new state s(n+1) is chosen according to q(. | s(n),a(n)) and is told to the decision maker. The decision maker’s goal is to maximize the discounted sum of his stage payoffs:
sum-over-n-from-0-to-infty of λ-to-the-power-n times r(s(n),a(n)).
The value of the MDP, that is, the maximum that the decision maker can obtain, depends on the discount factor λ. Denote by v(λ) the value function of the MDP. Which functions can be obtained as the value function of some MDP with finite sets of states and actions?
From now on I restrict the discussion to MDP’s with finite sets of states and actions. Blackwell (1965) proved that for every discount factor λ the decision maker has a pure stationary optimal strategy. It is easy to see that the payoff that corresponds to a pure stationary optimal strategy is the solution of a set of equations, which are linear in λ, and whose coefficients are determined by the payoff function r and the transition function q. It follows that for every pure stationary strategy σ, the corresponding payoff function g(λ ; σ,s) is a rational function of λ. Since there are finitely many pure stationary strategies, we deduce that the value function is the maximum of finitely many rational functions.
Can we obtain any maximum of rational functions as the value function of some MDP? The answer is negative. For example, since the set of states and actions are finite, the payoff function r is bounded, say, by M. In particular, the payoff function of any strategy is bounded by M/(1-λ). In particular, any rational function whose denominator has a root inside the unit ball of the complex plane, or that has a root on the unit ball of the complex plane that has multiplicity larger than 1, cannot be the value function of an MDP.
Is that the only restriction that we have? The answer is still negative. It is not difficult to see that the roots of the denominator of the payoff function of a pure stationary strategy are the inverse of the eigenvalues of the transition matrix, which by a known result in matrix theory must be unit roots, that is, for any root ω of the denominator (which is a complex number) there is an integer k such that ω-to-the-power-k is equal to 1. Thus, a rational function whose denominator has a root that lies on the unit ball of the complex plane and is not a unit root cannot be the value function of an MDP.
Is that all? Yes. Let F be the set of all rational functions f : [0,1) → R that satisfy the following property: any root of the denominator either (a) lies outside the unit ball of the complex plane, or (b) lies on the unit ball of the complex plane, has multiplicity 1, and is a unit root. Let V be the set of all functions that are the maximum of finitely many functions in F. A function v is the value function of some MDP if and only if it is in V.
In this post I outlined one direction of the proof. Anyone who is interested in reading the construction that proves the other direction is referred to this paper.