
Düppe and Weintraub date the birth of Economic Theory to June 1949, when Koopmans organized the Cowles Commission Activity Analysis Conference. It is also counted as conference zero of the Mathematical Programming Symposium. I mention this because the connections between Economic Theory, Mathematical Programming and Operations Research had, at one time, been very strong. The conference, for example, was conceived of by Tjalling Koopmans, Harold Kuhn, George Dantzig, Albert Tucker, Oskar Morgenstern, and Wassily Leontief with the support of the RAND Corporation.

One of the last remaining links to this period who straddled, like a Colossus, both Economic Theory and Operations Research, Herbert Eli Scarf, passed away on November 15th, 2015.

Scarf came to Economics and Operations Research by way of Princeton's mathematics department. Among his classmates were Gomory of cutting plane fame, Milnor of topology fame, and Shapley. Subsequently, he went on to RAND (Dantzig, Bellman, Ford & Fulkerson). While there he met Samuel Karlin and Kenneth Arrow, who introduced him to inventory theory. It was in this subject that Scarf made the first of many important contributions: the optimality of (S, s) policies. He would go on to establish the equivalence of the core and competitive equilibrium (jointly with Debreu), identify a sufficient condition for non-emptiness of the core of an NTU game (now known as Scarf's Lemma), anticipate the application of Gröbner bases in integer programming (neighborhood systems) and, of course, produce his magnificent `Computation of Economic Equilibria'.

Exegi monumentum aere perennius regalique situ pyramidum altius, quod non imber edax, non Aquilo impotens possit diruere aut innumerabilis annorum series et fuga temporum. Non omnis moriar…

I have finished a monument more lasting than bronze and higher than the royal structure of the pyramids, which neither the destructive rain, nor wild North wind is able to destroy, nor the countless series of years and flight of ages. I will not wholly die………….

Starr's '69 paper considered Walrasian equilibria in exchange economies with non-convex preferences, i.e., the upper contour sets of the utility functions need not be convex. Suppose {n} agents and {m} goods with {n \geq m}. Starr identified a price vector {p^*} and a feasible allocation with the property that at most {m} agents do not receive a utility maximizing bundle at the price vector {p^*}.

A poetic interlude. Arrow and Hahn’s book has a chapter that describes Starr’s work and closes with a couple of lines of Milton:

A gulf profound as that Serbonian Bog
Betwixt Damiata and Mount Casius old,
Where Armies whole have sunk.

Milton uses the word concave a couple of times in Paradise Lost to refer to the vault of heaven. Indeed the OED lists this as one of the poetic uses of concavity.

Now, back to brass tacks. Suppose {u_i} is agent {i}'s utility function. Replace the upper contour sets associated with {u_i} for each {i} by their convex hulls. Let {u^*_i} be the concave utility function associated with these convex hulls. Let {p^*} be the Walrasian equilibrium prices with respect to {\{u^*_i\}_{i=1}^n}. Let {x^*_i} be the allocation to agent {i} in the associated Walrasian equilibrium.

For each agent {i} let

\displaystyle S^i = \arg \max \{u_i(x): p^* \cdot x \leq p^*\cdot e^i\}

where {e^i} is agent {i}‘s endowment. Denote by {w} the vector of total endowments and let {S^{n+1} = \{-w\}}.

Let {z^* = \sum_{i=1}^nx^*_i - w = 0} be the excess demand with respect to {p^*} and {\{u^*_i\}_{i=1}^n}. Notice that {z^*} is in the convex hull of the Minkowski sum of {\{S^1, \ldots, S^n, S^{n+1}\}}. By the Shapley-Folkman-Starr lemma we can find {x_i \in conv(S^i)} for {i = 1, \ldots, n}, such that {|\{i: x_i \in S^i\}| \geq n - m} and {0 = z^* = \sum_{i=1}^nx_i - w}.

When one recalls that Walrasian equilibria can also be determined by maximizing a suitable weighted (the Negishi weights) sum of utilities over the set of feasible allocations, Starr's result can be interpreted as a statement about approximating an optimization problem. I believe this was first articulated by Aubin and Ekeland (see their '76 paper in Math of OR). As an illustration, consider the following problem:

\displaystyle \max \sum_{j=1}^nf_j(y_j)

subject to

\displaystyle Ay = b

\displaystyle y \geq 0

Call this problem {P}. Here {A} is an {m \times n} matrix with {n > m}.

For each {j} let {f^*_j(\cdot)} be the smallest concave function such that {f^*_j(t) \geq f_j(t)} for all {t \geq 0} (probably quasi-concave will do). Instead of solving problem {P}, solve problem {P^*} instead:

\displaystyle \max \sum_{j=1}^nf^*_j(y_j)

subject to

\displaystyle Ay = b

\displaystyle y \geq 0

The obvious question to be answered is how good an approximation is the solution to {P^*} to problem {P}. To answer it, let {e_j = \sup_t [f_j^*(t) - f_j(t)]} (where I leave you, the reader, to fill in the blanks about the appropriate domain). Each {e_j} measures how close {f_j^*} is to {f_j}. Sort the {e_j}'s in decreasing order. If {y^*} is an optimal solution to {P^*}, then following the idea in Starr's '69 paper we get:

\displaystyle \sum_{j=1}^nf_j(y^*_j) \geq \sum_{j=1}^nf^*_j(y^*_j)- \sum_{j=1}^me_j
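
To see what the concave envelope and the gap {e_j} look like in practice, here is a small numerical sketch in Python (NumPy assumed); the particular {f_j} and the grid are made up for illustration. It builds the piecewise-linear concave envelope {f^*_j} as the upper hull of the graph of {f_j} and reports {e_j = \sup_t [f^*_j(t) - f_j(t)]}.

```python
import numpy as np

def concave_envelope(t, f):
    """Upper concave envelope of the points (t[i], f[i]); t must be sorted increasing.

    Returns the envelope evaluated on the same grid (a piecewise-linear upper hull)."""
    hull = []  # indices of the upper-hull vertices, built by a monotone-chain sweep
    for i in range(len(t)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            # Drop i2 if it lies on or below the chord joining i1 and i.
            if (t[i2] - t[i1]) * (f[i] - f[i1]) - (t[i] - t[i1]) * (f[i2] - f[i1]) >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(t, t[hull], f[hull])

# A made-up non-concave f_j: concave at first, then a convex kink at t = 2.5.
t = np.linspace(0.0, 4.0, 401)
f = np.minimum(t, 1.0) + np.maximum(t - 2.5, 0.0)
f_star = concave_envelope(t, f)
e_j = np.max(f_star - f)
print("e_j =", e_j)   # roughly 0.75 for this made-up f_j
```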

Here is the question from Ross’ book that I posted last week

Question 1 We have two coins, a red one and a green one. When flipped, one lands heads with probability {P_1} and the other with probability {P_2}. Assume that {P_1>P_2}. We do not know which coin is the {P_1} coin. We initially attach probability {p} to the red coin being the {P_1} coin. We receive one dollar for each heads and our objective is to maximize the total expected discounted return with discount factor {\beta}. Find the optimal policy.

This is a dynamic programming problem where the state is the belief that the red coin is {P_1}. Every period we choose a coin to toss, get a reward and update our state given the outcome. Before I give my solution let me explain why we can't immediately invoke uncle Gittins.

In the classical bandit problem there are {n} arms and each arm {i} provides a reward from an unknown distribution {\theta_i\in\Delta([0,1])}. Bandit problems are used to model tradeoffs between exploitation and exploration: Every period we either exploit an arm about whose distribution we already have a good idea or explore another arm. The {\theta_i} are randomized independently according to distributions {\mu_i\in \Delta(\Delta([0,1]))}, and what we are interested in is the expected discounted reward. The optimization problem has a remarkable solution: choose in every period the arm with the largest Gittins index. Then update your belief about that arm using Bayes’ rule. The Gittins index is a function which attaches a number {G(\mu)} (the index) to every belief {\mu} about an arm. What is important is that the index of an arm {i} depends only on {\mu_i} — our current belief about the distribution of the arm — not on our beliefs about the distribution of the other arms.

The independence assumption means that we only learn about the distribution of the arm we are using. This assumption is not satisfied in the red coin green coin problem: If we toss the red coin and get heads then the probability that the green coin is {P_1} decreases. Googling `multi-armed bandit’ with `dependent arms’ I got some papers which I haven’t looked at carefully but my superficial impression is that they would not help here.

Here is my solution. Call the problem I started with `the difficult problem' and consider a variant which I call `the easy problem'. Let {r=p/(p+\sqrt{p(1-p)})} so that {r^2/(1-r)^2=p/(1-p)}. In the easy problem there are again two coins but this time the red coin is {P_1} with probability {r} and {P_2} with probability {1-r} and, independently, the green coin is {P_1} with probability {1-r} and {P_2} with probability {r}. The easy problem is easy because it is a bandit problem. We have to keep track of beliefs {p_r} and {p_g} about the red coin and the green coin ({p_r} is the probability that the red coin is {P_1}), starting with {p_r=r} and {p_g=1-r}, and when we toss the red coin we update {p_r} but keep {p_g} fixed. It is easy to see that the Gittins index of an arm is a monotone function of the belief that the arm is {P_1} so the optimal strategy is to play red when {p_r\ge p_g} and green when {p_g\ge p_r}. In particular, the optimal action in the first period is red when {p\ge 1/2} and green when {p\le 1/2}.

Now here comes the trick. Consider a general strategy {g} that assigns to every finite sequence of past actions and outcomes an action (red or green). Denote by {V_d(g)} and {V_e(g)} the rewards that {g} gives in the difficult and easy problems respectively. I claim that

\displaystyle \begin{array}{rcl} &V_e(g)=r(1-r) \cdot P_1/(1-\beta)+ \\ &r(1-r) \cdot P_2/(1-\beta) + (r^2+(1-r)^2) V_d(g).\end{array}

Why is that? In the easy problem there is a probability {r(1-r)} that both coins are {P_1}. If this happens then every {g} gives payoff {P_1/(1-\beta)}. There is a probability {r(1-r)} that both coins are {P_2}. If this happens then every {g} gives payoff {P_2/(1-\beta)}. And there is a probability {r^2+(1-r)^2} that the coins are different, and, because of the choice of {r}, conditional on this event the probability of the red coin being {P_1} is {p}. Therefore, in this case {g} gives whatever {g} gives in the difficult problem.

So, the payoff in the easy problem is a linear function of the payoff in the difficult problem. Therefore the optimal strategy in the difficult problem is the same as the optimal strategy in the easy problem. In particular, we just proved that, for every {p}, the optimal action in the first period is red when {p\ge 1/2} and green when {p\le 1/2}. Now back to the dynamic programming formulation: from standard arguments it follows that the optimal strategy is to keep doing this forever, i.e., at every period to toss the coin that is more likely to be the {P_1} coin given the current information.
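
For readers who like to check such arguments numerically, here is a small value-iteration sketch in Python (NumPy assumed; the values of {P_1}, {P_2} and {\beta} are made up, and the belief is discretized on a grid with linear interpolation). It solves the difficult problem directly and recovers the policy derived above: toss whichever coin is currently more likely to be the {P_1} coin.

```python
import numpy as np

P1, P2, beta = 0.7, 0.3, 0.9            # made-up parameters with P1 > P2
p = np.linspace(0.0, 1.0, 1001)         # belief that the red coin is the P1 coin

def action_values(V):
    """Q-values of tossing the red and the green coin at every belief on the grid."""
    Q = []
    for coin in ("red", "green"):
        q1 = p if coin == "red" else 1 - p        # P(the chosen coin is the P1 coin)
        ph = q1 * P1 + (1 - q1) * P2              # P(heads on this toss)
        post_h = q1 * P1 / ph                     # P(chosen coin is P1 | heads)
        post_t = q1 * (1 - P1) / (1 - ph)         # P(chosen coin is P1 | tails)
        if coin == "green":                       # translate back into a belief about red
            post_h, post_t = 1 - post_h, 1 - post_t
        cont = ph * np.interp(post_h, p, V) + (1 - ph) * np.interp(post_t, p, V)
        Q.append(ph + beta * cont)                # immediate expected reward + continuation
    return np.array(Q)                            # row 0 = red, row 1 = green

V = np.zeros_like(p)
for _ in range(300):                              # value iteration
    V = action_values(V).max(axis=0)

Q = action_values(V)
switch = p[np.argmax(Q[0] >= Q[1])]               # smallest belief at which red is weakly optimal
print("red becomes optimal at p of about", switch)   # expect roughly 1/2
```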

See why I said my solution is tricky and specific? It relies on the fact that there are only two arms (the fact that the arms are coins is not important). Here is a problem whose solution I don't know:

Question 2 Let {0 \le P_1 < P_2 < ... < P_n \le 1}. We are given {n} coins, one of each parameter, all {n!} possibilities equally likely. Each period we have to toss a coin and we get payoff {1} for Heads. What is the optimal strategy?

The Shapley-Folkman lemma states that the Minkowski sum of a large number of sets is approximately convex. The clearest statement as well as the nicest proof I am familiar with is due to J. W. S. Cassels. Cassels is a distinguished number theorist who for many years taught the mathematical economics course in the Tripos. The lecture notes are available in a slender book now published by Cambridge University Press.

This central limit like quality of the lemma is well beyond the capacity of a hewer of wood like myself. I prefer the more prosaic version.

Let {\{S^j\}_{j=1}^n} be a collection of sets in {\Re^m} with {n > m}. Denote by {S} the Minkowski sum of the collection {\{S^j\}_{j=1}^n}. Then, every {x \in conv(S)} can be expressed as {\sum_{j=1}^nx^j} where {x^j \in conv(S^j)} for all {j = 1,\ldots, n} and {|\{j: x^j \not \in S^j\}| \leq m}.

How might this be useful? Let {A} be an {m \times n} 0-1 matrix and {b \in \Re^m} with {n > m}. Consider the problem

\displaystyle \max \{cx: Ax = b, x_j \in \{0,1\}\ \forall \,\, j = 1, \ldots, n\}.

Let {x^*} be a solution to the linear relaxation of this problem. Then, the lemma yields the existence of a 0-1 vector {x} such that {cx \geq cx^* = z} and {||Ax - b||_{\infty} \leq m}. One can get a bound in terms of Euclidean distance as well.

How does one do this? Denote each column {j} of the {A} matrix by {a^j} and let {d^j = (c_j, a^j)}. Let {S^j = \{d^j, 0\}}. Because {z = cx^*} and {b = Ax^*} it follows that {(z,b) \in conv(S)}. Thus, by the Lemma,

\displaystyle (z, b) = \sum_{j=1}^n(c_j, a^j)y_j

where each {y_j \in [0,1]} and {|\{j: y_j \in (0,1) \}| \leq m }. In words, {y} has at most {m} fractional components. Now construct a 0-1 vector {y^*} from {y} as follows. If {y_j \in \{0,1\}}, set {y^*_j = y_j}. If {y_j} is fractional, round {y^*_j} up to 1 with probability {y_j} and down to zero otherwise. Observe that {||Ay^* - b||_{\infty} \leq m} and {E(cy^*) = cx^*}. Hence, there must exist a 0-1 vector {x} with the claimed properties.
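
Here is a minimal sketch of this rounding step in Python (NumPy assumed; the instance is made up). Each fractional component is rounded up with probability equal to its value, so the objective is preserved in expectation; and because {A} is a 0-1 matrix and at most {m} components are touched, each row of {Ay^*} moves away from {b} by at most {m}.

```python
import numpy as np

def round_up_with_prob(y, rng=None):
    """Round each fractional entry of y to 1 with probability y_j, and to 0 otherwise."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    out = y.copy()
    frac = (y > 0.0) & (y < 1.0)
    out[frac] = (rng.random(frac.sum()) < y[frac]).astype(float)
    return out

# Made-up 0-1 matrix A (m = 2 rows) and a fractional y with at most m fractional entries.
A = np.array([[1, 1, 0, 1, 0],
              [0, 1, 1, 0, 1]], dtype=float)
c = np.array([3.0, 1.0, 2.0, 4.0, 1.5])
y = np.array([1.0, 0.4, 0.0, 0.7, 1.0])
b = A @ y

y_star = round_up_with_prob(y)
print("||A y* - b||_inf =", np.max(np.abs(A @ y_star - b)))   # at most m = 2
print("c y* =", c @ y_star, "vs c y =", c @ y)                # equal in expectation
```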

The error bound of {m} is too large for many applications. This is a consequence of the generality of the lemma. It makes no use of any structure encoded in the {A} matrix. For example, suppose {x^*} were an extreme point and {A} a totally unimodular matrix. Then, the number of fractional components of {x^*} is zero. The rounding methods of Király, Lau and Singh as well as of Kumar, Marathe, Parthasarathy and Srinivasan exploit the structure of the matrix. In fact both use an idea that one can find in Cassels's paper. I'll follow the treatment in Kumar et al.

As before we start with {x^*}. For convenience suppose {0 < x^*_j < 1} for all {j = 1, \ldots, n}. As {A} has more columns than rows, there must be a non-zero vector {r} in the kernel of {A}, i.e., {Ar = 0}. Consider {x^* + \alpha r} and {x^* - \beta r}. For {\alpha > 0} and {\beta > 0} sufficiently small, {x^*_j + \alpha r_j, x^*_j - \beta r_j \in [0,1]} for all {j}. Increase {\alpha} and {\beta} until the first time at least one component of {x^* + \alpha r} and {x^* - \beta r} is in {\{0,1\}}. Next select the vector {x^* + \alpha r} with probability {\frac{\beta}{\alpha + \beta}} or the vector {x^* - \beta r} with probability {\frac{\alpha}{\alpha + \beta}}. Call the selected vector {x^1}.

Now {Ax^1 = b}. Furthermore, {x^1} has at least one more integer component than {x^*}. Let {J = \{j: x^1_j \in (0,1)\}}. Let {A^J} be the matrix consisting only of the columns in {J} and let {x^1(J)} consist only of the components of {x^1} in {J}. Consider the system {A^Jx^1(J) = b - \sum_{j \not \in J}a^jx^1_j}. As long as {A^J} has more columns than rows we can repeat the same argument as above. This iterative procedure gives us the same rounding result as the Lemma. However, one can do better, because even when the number of columns of the matrix is no more than the number of rows, the columns may be linearly dependent and therefore the kernel non-trivial.
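
Here is one possible sketch of the iterative step in Python (NumPy and SciPy assumed; a toy instance rather than the Kumar et al. algorithm in full). Each round finds a direction in the kernel of the columns indexed by the fractional coordinates, takes the randomized step described above, and stops once no non-trivial direction remains.

```python
import numpy as np
from scipy.linalg import null_space

def kernel_round(A, x0, rng=None, tol=1e-9):
    """Randomized kernel rounding: each step keeps A @ x and E[x] fixed and makes
    at least one more coordinate integral; it stops when the fractional columns
    of A are linearly independent."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    while True:
        J = np.where((x > tol) & (x < 1.0 - tol))[0]      # fractional coordinates
        if J.size == 0:
            return x
        N = null_space(A[:, J])
        if N.shape[1] == 0:                               # A_J has full column rank
            return x
        r = np.zeros_like(x)
        r[J] = N[:, 0]
        pos, neg = r > tol, r < -tol
        # Largest alpha, beta keeping x + alpha*r and x - beta*r inside [0, 1]^n.
        alpha = min(((1 - x[pos]) / r[pos]).min() if pos.any() else np.inf,
                    (x[neg] / -r[neg]).min() if neg.any() else np.inf)
        beta = min((x[pos] / r[pos]).min() if pos.any() else np.inf,
                   ((1 - x[neg]) / -r[neg]).min() if neg.any() else np.inf)
        # Step up with probability beta/(alpha+beta) so that E[x] is unchanged.
        x = x + alpha * r if rng.random() < beta / (alpha + beta) else x - beta * r

# Toy instance: A is 2 x 4, so at most 2 coordinates can remain fractional.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0, 1.0]])
x_frac = np.array([0.5, 0.5, 0.5, 0.5])
print(kernel_round(A, x_frac))   # A @ x is preserved exactly throughout
```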

In a sequel, I’ll describe an optimization version of the Lemma that was implicit in Starr’s 1969 Econometrica paper on equilibria in economies with non-convexities.

Here is the bonus question from the final exam in my dynamic optimization class of last semester. It is based on problem 8 of Chapter II in Ross' book Introduction to Stochastic Dynamic Programming. It appears there as `guess the optimal policy' without asking for a proof. The question seems very natural, but I couldn't find any information about it (nor, apparently, could the students). I have a solution but it is tricky and too specific to this problem. I will describe my solution next week, but perhaps somebody can show me a better solution or a reference?

 

We have two coins, a red one and a green one. When flipped, one lands heads with probability P1 and the other with probability P2. We do not know which coin is the P1 coin. We initially attach probability p to the red coin being the P1 coin. We receive one dollar for each heads and our objective is to maximize the total expected discounted return with discount factor β. Describe the optimal policy, including a proof of optimality.

The last of the trio, Harold Kuhn, passed away on July 2nd, 2014. Upon hearing the news, I was moved to dig up some old lecture notes of Kuhn's in which KTK is stated and proved. I've been carrying them around with me since 1981. From the condition they are in, this must have been the last time I looked at them. With good reason, for as I re-read them, it dawned upon me how much of them I had absorbed and taken to be my own thoughts. Kuhn motivates the KTK theorem by replacing the non-linear functions by their first order Taylor approximations. This turns the exercise into a linear program. The LP duality theorem suggests the theorem to be proved, and the separating hyperplane theorem does the rest. For details see the relevant chapter of my book. The notes go on to describe Kuhn and Tucker's excitement and subsequent despair as they uncover a counterexample and the need for a constraint qualification.

William Karush, who passed away in 1997, had arrived at the same theorem many years earlier in his 1939 University of Chicago Master's thesis (Kuhn-Tucker is 1951). Kuhn learned of Karush's contribution through a reading of Takayama's book on Mathematical Economics. Upon doing so he wrote Karush:

In March I am talking at an AMS Symposium on “Nonlinear Programming – A Historical View.” Last summer I learned through reading Takayama’s Mathematical Economics of your 1939 Master’s Thesis and have obtained a copy. First, let me say that you have clear priority on the results known as the Kuhn–Tucker conditions (including the constraint qualification). I intend to set the record as straight as I can in my talk.

The missive closes with this paragraph:

Dick Cottle, who organized the session, has been told of my plans to rewrite history and says `you must be a saint’ not to complain about the absence of recognition. Al Tucker remembers you from RAND, wonders why you never called this to his attention and sends his best regards.

Karush’s reply, 6 days later, equally gracious:

Thank you for your most gracious letter. I appreciate your thoughtfulness in wanting to draw attention to my early work. If you ask why I did not bring up the matter of priority before, perhaps the answer lies in what is now happening – I am not only going to get credit for my work, but I am going to be crowned a "saint".

 

In 1937, representatives of the Plywood Trust called upon Comrade Professor Leonid Vitalievich Kantorovich with a problem. The trust produced 5 varieties of plywood using 8 different machines. How, they asked, should they allocate their limited supply of raw materials to the various machines so as to produce the maximum output of plywood in the required proportions? As problems go, it was, from this remove, unremarkable. Remarkable is that the Comrade Professor agreed to take it on. The so-called representatives might have been NKVD. Why? Uncle Joe's first act upon taking power in 1929 was to purge the economists, or more precisely the Jewish ones. This was well before the purge of the communist party in 1936. Why the economists? They complained about waste in a planned economy `dizzy with success.' Yet, here were the apparatchiks of the Trust asking the Comrade Professor to reduce waste.

Kantorovich writes that at the time he was burnt out by pure mathematics. Combined with a concern at the rise of Hitler, he felt compelled to do something practical. And, so he turned his mind to the problem of the Plywood Trust. Francis Spufford, in his delightful work of `faction' called Red Plenty, imagines what Kantorovich might have been thinking.

He had thought about ways to distinguish between better answers and worse answers to questions which had no right answer. He had seen a method which could do what the detective work of conventional algebra could not, in situations like the one the Plywood Trust described, and would trick impossibility into disclosing useful knowledge. The method depended on measuring each machine’s output of one plywood in terms of all the other plywoods it could have made.

If he was right — and he was sure he was, in essentials — then anyone applying the new method to any production situation in the huge family of situations resembling the one at the Plywood Trust, should be able to count on a measureable percentage improvement in the quantity of product they got from a given amount of raw materials. Or you could put that the other way around: they would make a measureable percentage saving on the raw materials they needed to make a given amount of product.

He didn’t know yet what sort of percentage he was talking about, but just suppose it was 3%. It might not sound like much, only a marginal gain, an abstemious eking out of a little bit more from the production process, at a time when all the newspapers showed miners ripping into fat mountains of solid metal, and the output of plants booming 50%, 75%, 150%. But it was predictable. You could count on the extra 3% year after year. Above all it was free. It would come merely by organising a little differently the tasks people were doing already. It was 3% of extra order snatched out of the grasp of entropy. In the face of the patched and mended cosmos, always crumbling of its own accord, always trying to fall down, it built; it gained 3% more of what humanity wanted, free and clear, just as a reward for thought. Moreover, he thought, its applications did not stop with individual factories, with getting 3% more plywood, or 3% more gun barrels, or 3% more wardrobes. If you could maximise, minimise, optimise the collection of machines at the Plywood Trust, why couldn’t you optimise a collection of factories, treating each of them, one level further up, as an equation? You could tune a factory, then tune a group of factories, till they hummed, till they purred. And that meant —

An English description of Kantorovich's method appeared in the July 1960 issue of Management Science. The opening line of the paper is:

The immense tasks laid down in the plan for the third Five Year Plan period require that we achieve the highest possible production on the basis of the optimum utilization of the existing reserves of industry: materials, labor and equipment.

The paper contains a formulation of the Plywood Trust's problem as a linear program, a recognition of the existence of an optimal solution at an extreme point, as well as the hopelessness of enumerating extreme points as a solution method. Kantorovich then goes on to propose his method, which he calls the method of resolving multipliers. Essentially, Kantorovich proposes that one solve the dual and then use complementary slackness to recover the primal. One might wonder how Kantorovich's contribution differs from the contributions of Koopmans and Dantzig. That is another story, and as fair a description of the issues as I know can be found in Roy Gardner's 1990 piece in the Journal of Economic Literature. I reproduce one choice remark:

Thus, the situation of Kantorovich is rather like that of the discoverer Columbus. He really never touched the American mainland, and he didn’t give its name, but he was the first one in the area.

As an aside, David Gale is the one often forgotten in this discussion. If the Nobel committee had awarded the prize for Linear Programming, Dantzig and Gale would have been included. Had Gale lived long enough, he might have won it again for matching, making him the third to have won the prize twice in the same subject. The others are John Bardeen and Frederick Sanger.

Continuing with Spufford’s imaginings:

— and that meant that you could surely apply the method to the entire Soviet economy, he thought. He could see that this would not be possible under capitalism, where all the factories had separate owners, locked in wasteful competition with one another. There, nobody was in a position to think systematically. The capitalists would not be willing to share information about their operations; what would be in it for them? That was why capitalism was blind, why it groped and blundered. It was like an organism without a brain. But here it was possible to plan for the whole system at once. The economy was a clean sheet of paper on which reason was writing. So why not optimise it? All he would have to do was to persuade the appropriate authorities to listen.

Implementation of Kantorovich's solution at the Plywood Trust led to success. Inspired, Kantorovich sent a letter to Gosplan urging adoption of his methods. Here the fact that Kantorovich solved the dual first rather than the primal is important. Kantorovich interpreted his resolving multipliers (shadow prices today) as objectively determined prices. Kantorovich's letter to Gosplan urged a replacement of the price system in place by his resolving multipliers. Kantorovich intended to implement optimal production plans through appropriate prices. Gosplan responded that reform was unnecessary. Kantorovich narrowly missed a trip to the Gulag and stopped practicing Economics, for a while. Readers wanting a fuller sense of what mathematical life was like in this period should consult this piece by G. G. Lorentz.

After the war, Kantorovich took up linear programming again. In Leningrad, he headed a team to reduce the scrap metal produced at the Egorov railroad-car plant. The resulting reduction in waste reduced the supply of scrap iron for steel mills, disrupting their production! Kantorovich escaped punishment by the Leningrad regional party because of his work on atomic reactors.

Kantorovich's interpretation of resolving multipliers, which he renamed objectively determined valuations, put him at odds with the prevailing labor theory of value. In the post-Stalin era, he was criticized for being under the sway of Böhm-Bawerk, author of the notion of subjective utility. Aron Katsenelinboigen relates a joke played by one of these critics on Kantorovich. A production problem was presented to Kantorovich where the labor supply constraint would be slack at optimality. Its `objectively determined valuation' was therefore zero, contradicting the labor theory of value.

Nevertheless, Kantorovich survived. This last verse from the Ballad of L. V. Kantorovich, authored by Joseph Lakhman, explains why:

Then came a big scholar with a solution.
Alas, too clever a solution.
`Objectively determined valuations’-
That’s the panacea for each and every doubt!
Truth be told, the scholar got his knuckles rapped
Slightly rapped
For such an unusual advice
That threatened to overturn the existing order.
After some thought, however, the conclusion was reached
That the valuations had been undervalued

 

This is the first of a series of posts about stability and equilibrium in trading networks. I will review and recall established results from network flows and point out how they immediately yield results about equilibria, stability and the core of matching markets with quasi-linear utility. It presumes familiarity with optimization and the recent spate of papers on matchings with contracts.

The simplest trading network one might imagine would involve buyers ({B}) and sellers ({S}) of a homogeneous good and a set of edges {E} between them. No edges between sellers and no edges between buyers. The absence of an edge in {E} linking {i \in B} and {j \in S} means that {i} and {j} cannot trade directly. Suppose buyer {i \in B} has a constant marginal value of {v_i} up to some amount {d_i} and zero thereafter. Seller {j \in S} has a constant marginal cost of {c_j} up to some capacity {s_j} and infinity thereafter.

Under the quasi-linear assumption, the problem of finding the efficient set of trades to execute can be formulated as a linear program. Let {x_{ij}} for {(i,j) \in E} denote the amount of the good purchased by buyer {i \in B} from seller {j \in S}. Then, the following program identifies the efficient allocation:

\displaystyle \max \sum_{(i,j) \in E} (v_i - c_j)x_{ij}

subject to

\displaystyle \sum_{j \in S: (i,j) \in E}x_{ij} \leq d_i\,\, \forall i \in B

\displaystyle \sum_{i \in B:(i,j) \in E}x_{ij} \leq s_j\,\, \forall j \in S

\displaystyle x_{ij} \geq 0\,\, (i,j) \in E

This is, of course, an instance of the (discrete) transportation problem. The general version of the transportation problem can be obtained by replacing each coefficient of the objective function by arbitrary numbers {w_{ij}}. This version of the transportation problem is credited to the mathematician F. L. Hitchcock and published in 1941. Hitchcock's most famous student is Claude Shannon.

The `continuous' version of the transportation problem was formulated by Gaspard Monge and described in his 1781 paper on the subject. His problem was to split two equally large volumes (representing the initial location and the final location of the earth to be shipped) into infinitely many small particles and then `match them with each other so that the sum of the products of the lengths of the paths used by the particles and the volume of the particles is minimized'. The {w_{ij}}'s in Monge's problem have a property, since called the Monge property, that is the same as submodularity/supermodularity. This paper describes the property and some of its algorithmic implications. Monge's formulation was subsequently picked up by Kantorovich and the study of it blossomed into the specialty now called optimal transport, with applications to PDEs and concentration of measure. That is not the thread I will follow here.

Returning to the Hitchcock, or rather discrete, formulation of the transportation problem, let {p_j} be the dual variable associated with seller {j}'s capacity constraint (the second set of constraints above) and {\lambda_i} the dual variable associated with buyer {i}'s demand constraint (the first set). The dual is

\displaystyle \min \sum_{j \in S} s_jp_j + \sum_{i \in B}d_i\lambda_i

subject to

\displaystyle p_j + \lambda_i \geq [v_i-c_j]\,\, \forall (i,j) \in E

\displaystyle p_j, \lambda_i \geq 0\,\, \forall j \in S, i \in B

We can interpret {p_j} as the unit price of the good sourced from seller {j} and {\lambda_i} as the per-unit surplus that buyer {i} will enjoy at prices {\{p_j\}_{j \in S}}. Several things are immediate from the duality theorem, complementary slackness and dual feasibility.

  1. If {x^*} is an optimal solution to the primal and {(p^*, \lambda^*)} an optimal solution to the dual, then the pair {(x^*, p^*)} forms a Walrasian equilibrium.
  2. The set of optimal dual prices, i.e., Walrasian prices, forms a lattice.
  3. The dual is a (compact) representation of the TU (transferable utility) core of the co-operative game associated with this economy.
  4. Suppose the only bilateral contracts we allow between buyer {i} and seller {j} are when {(i,j) \in E}. Furthermore, a contract can specify only a quantity to be shipped and price to be paid. Then, we can interpret the set of optimal primal and dual solutions to be the set of contracts that cannot be blocked (suitably defined) by any buyer seller pair {(i,j) \in E}.
  5. Because the constraint matrix of the transportation problem is totally unimodular, the previous statements hold even if the goods are indivisible.

As these are standard, I will not reprove them here. Note also, that none of these conclusions depend upon the particular form of the coefficients in the objective function of the primal. We could replace {[v_i - c_j]} by {w_{ij}} where we interpret {w_{ij}} to be the joint gains from trade (per unit) to be shared by buyer {i} and seller {j}.
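
To make the first of these concrete, here is a small sketch using SciPy's LP solver (the instance is invented; reading the dual values off `res.ineqlin.marginals` assumes a reasonably recent SciPy with the HiGHS backend). The duals of the seller capacity constraints are the {p_j} and those of the buyer demand constraints are the {\lambda_i}.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up instance: 2 buyers, 2 sellers, every buyer-seller edge present.
v = np.array([10.0, 8.0])     # buyers' marginal values
d = np.array([3.0, 2.0])      # buyers' demand caps
c = np.array([4.0, 6.0])      # sellers' marginal costs
s = np.array([2.0, 4.0])      # sellers' capacities

edges = [(i, j) for i in range(2) for j in range(2)]
w = np.array([v[i] - c[j] for (i, j) in edges])     # per-unit gains from trade

# A_ub x <= b_ub: two buyer (demand) rows followed by two seller (capacity) rows.
A_ub = np.zeros((4, len(edges)))
for k, (i, j) in enumerate(edges):
    A_ub[i, k] = 1.0
    A_ub[2 + j, k] = 1.0
b_ub = np.concatenate([d, s])

res = linprog(-w, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
duals = -res.ineqlin.marginals       # flip signs: linprog minimized the negated objective
lam, p = duals[:2], duals[2:]        # buyer surpluses lambda_i and seller prices p_j
print("efficient trades:", dict(zip(edges, res.x)))
print("lambda:", lam, " p:", p)
```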

Now, suppose we replace the constant marginal values by increasing concave utility functions, {\{U_i(\cdot)\}_{i \in B}}, and the constant marginal costs by increasing convex cost functions, {\{C_j (\cdot)\}_{j \in S}}. The problem of finding the efficient allocation becomes:

\displaystyle \max \sum_{i \in B}U_i(\sum_{j: (i,j) \in E}x_{ij}) - \sum_{j \in S}C_j(\sum_{i: (i,j) \in E}x_{ij})

subject to

\displaystyle \sum_{j \in S: (i,j) \in E}x_{ij} \leq d_i\,\, \forall i \in B

\displaystyle \sum_{i \in B:(i,j) \in E}x_{ij} \leq s_j\,\, \forall j \in S

\displaystyle x_{ij} \geq 0\,\, (i,j) \in E

This is an instance of a concave flow problem. The Kuhn-Tucker-Karush conditions yield the following:

  1. If {x^*} is a solution to the primal and {(p^*, \lambda^*)} the corresponding optimal Lagrange multipliers, then the pair {(x^*, p^*)} forms a Walrasian equilibrium.
  2. The set of optimal Lagrange prices, i.e., Walrasian prices, forms a lattice.
  3. Suppose the only bilateral contracts we allow between buyer {i} and seller {j} are when {(i,j) \in E}. Furthermore, a contract can specify only a quantity to be shipped and price to be paid. Then, we can interpret the set of optimal primal and dual solutions to be the set of contracts that cannot be blocked (suitably defined) by any buyer seller pair {(i,j) \in E}.

Notice, we lose the extension to indivisibility. As the objective function in the primal is now concave, an optimal solution to the primal may occur in the interior of the feasible region rather than at an extreme point. To recover `integrality' we need to impose a stronger condition on {\{U_i\}_{i \in B}} and {\{C_j\}_{j \in S}}, specifically, that they be {M}-concave and {M}-convex respectively. This is a condition tied closely to the gross substitutes condition. More on this in a subsequent post.
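
As a sanity check of the first item above, here is a sketch that solves a small concave instance and reads off the Lagrange multipliers on the demand and capacity constraints. It assumes the cvxpy package, and the utility and cost functions ({U_i(q) = a_i\sqrt{q}} and {C_j(q) = b_jq^2}) as well as the numbers are made up for illustration.

```python
import numpy as np
import cvxpy as cp

# Made-up instance: 2 buyers, 2 sellers, all edges present.
d = np.array([1.0, 1.0])            # buyers' demand caps
s = np.array([0.5, 2.0])            # sellers' capacity caps
a = np.array([10.0, 8.0])           # U_i(q) = a_i * sqrt(q)   (increasing, concave)
b = np.array([0.2, 0.5])            # C_j(q) = b_j * q**2      (increasing, convex)

x = cp.Variable((2, 2), nonneg=True)          # x[i, j]: amount buyer i buys from seller j
buyer_q = cp.sum(x, axis=1)
seller_q = cp.sum(x, axis=0)
welfare = cp.sum(cp.multiply(a, cp.sqrt(buyer_q))) - cp.sum(cp.multiply(b, cp.square(seller_q)))

demand_caps = [buyer_q[i] <= d[i] for i in range(2)]
supply_caps = [seller_q[j] <= s[j] for j in range(2)]
prob = cp.Problem(cp.Maximize(welfare), demand_caps + supply_caps)
prob.solve()

print("efficient trades:\n", x.value)
print("lambda (buyers):", [float(con.dual_value) for con in demand_caps])
print("p (sellers):    ", [float(con.dual_value) for con in supply_caps])
```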

In an earlier pair of posts I discussed a class of combinatorial auctions when agents have binary quadratic valuations. To formulate the problem of finding a welfare maximizing allocation let {x^k_j = 1} if object {j \in M} is given to agent {k \in N} and zero otherwise. Denote the utility of agent {k \in N} from consuming bundle {S \subseteq M} by

\displaystyle u_k(S) = \sum_{i \in S}u^k_i + \sum_{i, j \in S}w^k_{ij}.

The problem of maximizing total welfare is

\displaystyle \max \sum_{k \in N}\sum_{i \in M}u^k_ix^k_i + \sum_{k \in N}\sum_{i \neq j}w^k_{ij}x^k_ix^k_j

subject to

\displaystyle \sum_{k \in N}x^k_i \leq 1\,\, \forall i \in M

\displaystyle x^k_i \in \{0,1\}\,\, \forall i \in M, k \in N

I remarked that Candogan, Ozdaglar and Parrilo (2013) identified a solvable instance of the welfare maximization problem. They impose two conditions. The first is called sign consistency: for each pair {i,j \in M}, the sign of {w^k_{ij}} and {w^r_{ij}} is the same for all {k, r \in N}.

Let {G^w} be a graph with vertex set {M} and for any {i,j \in M} such that {w^k_{ij} \neq 0} introduce an edge {(i,j)}. Because of the sign consistency condition we can label the edges of {G^w} as being positive or negative depending on the sign of {w^k_{ij}}. Let {E^+ = \{(i,j): w^k_{ij} \geq 0\}} and {E^- = \{(i,j): w^k_{ij} < 0\}}. The second condition is that {G^w} be a tree.

The following is the relaxation that they consider:

\displaystyle \max \sum_{k \in N}\sum_{i \in M}u^k_ix^k_i + \sum_{k \in N}\sum_{(i,j) \in E^+ \cup E^-}w^k_{ij}z^k_{ij}

subject to

\displaystyle \sum_{k \in N}x^k_i \leq 1\,\, \forall i \in M

\displaystyle z^k_{ij} \leq x^k_i, x^k_j\,\, \forall k \in N, (i,j) \in E^+

\displaystyle z^k_{ij} \geq x^k_i + x^k_j - 1\,\, \forall k \in N, (i,j) \in E^-

\displaystyle x^k_i, z^k_{ij} \in [0,1]\,\, \forall i \in M, k \in N

Denote by {P} the polyhedron of feasible solutions to the last program. I give a new proof of the fact that the extreme points of {P} are integral. My thanks to Ozan Candogan for (1) patiently going through a number of failed proofs and (2) being kind enough not to say: "why the bleep don't you just read the proof we have."
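
Before the proof, a numerical sanity check. The sketch below (Python with SciPy assumed) builds the relaxation above for a made-up, sign-consistent instance on a three-item path (a tree), solves it with the dual simplex method so that a vertex is returned, and confirms that the vertex is integral.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up sign-consistent instance: items 0-1-2, edge (0,1) positive, edge (1,2) negative.
agents, items = range(2), range(3)
u = np.array([[3.0, 1.0, 2.0],               # u[k][i]
              [2.0, 4.0, 1.0]])
w = {(0, 1): [2.0, 0.5],                     # w[e][k] >= 0 on the positive edge, for every agent
     (1, 2): [-3.0, -1.5]}                   # w[e][k] <  0 on the negative edge, for every agent
pos, neg = [(0, 1)], [(1, 2)]

def xv(k, i):                                # column index of x^k_i
    return k * 3 + i
def zv(k, e):                                # column index of z^k_e
    return 6 + k * 2 + (pos + neg).index(e)

c = np.zeros(10)
for k in agents:
    for i in items:
        c[xv(k, i)] = -u[k][i]               # linprog minimizes, so negate the welfare
    for e in pos + neg:
        c[zv(k, e)] = -w[e][k]

A_ub, b_ub = [], []
for i in items:                              # each item goes to at most one agent
    row = np.zeros(10); row[[xv(k, i) for k in agents]] = 1
    A_ub.append(row); b_ub.append(1.0)
for k in agents:
    for (i, j) in pos:                       # z <= x_i and z <= x_j on positive edges
        for t in (i, j):
            row = np.zeros(10); row[zv(k, (i, j))] = 1; row[xv(k, t)] = -1
            A_ub.append(row); b_ub.append(0.0)
    for (i, j) in neg:                       # z >= x_i + x_j - 1 on negative edges
        row = np.zeros(10); row[xv(k, i)] = 1; row[xv(k, j)] = 1; row[zv(k, (i, j))] = -1
        A_ub.append(row); b_ub.append(1.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=(0, 1), method="highs-ds")
print("vertex is integral:", np.allclose(res.x, np.round(res.x)), " welfare:", -res.fun)
```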

Let {{\cal C}} be the maximal connected components of {G^w} after deletion of the edges in {E^-} (call this {G^+}). The proof will be by induction on {|{\cal C}|}. The case {|{\cal C}| = 1} follows from total unimodularity. I prove this later.

Suppose {|{\cal C}| > 1}. Let {({\bar z}, {\bar x})} be an optimal solution to our linear program. We can choose {({\bar z}, {\bar x})} to be an extreme point of {P}. As {G^w} is a tree, there must exist a {C \in {\cal C}} incident to exactly one negative edge, say {(p,q)}. Denote by {P'} the polyhedron {P} restricted to just the vertices of {C} and by {Q} the polyhedron {P} restricted to just the vertices in the complement of {C}. By the induction hypothesis, both {P'} and {Q} are integral polyhedra. Each extreme point of {P'} ({Q}) assigns a vertex of {C} (the complement of {C}) to a particular agent. Let {X_1, X_2, \ldots, X_a} be the set of extreme points of {P'}. If in extreme point {X_r}, vertex {p} is assigned to agent {k} we write {X_{rk} = 1} and zero otherwise. Similarly with the extreme points {Y_1, Y_2, \ldots, Y_b} of {Q}. Thus, {Y_{rk} = 1} if {Y_r} assigns vertex {q} to agent {k}. Let {v(X_{r})} be the objective function value of the assignment {X_r}, and similarly {v(Y_{r})}.

Now {({\bar z}, {\bar x})} restricted to {P'} can be expressed as {\sum_r\lambda_{r}X_{r}}. Similarly, {({\bar z}, {\bar x})} restricted to {Q} can be expressed as {\sum_r\mu_{r}Y_{r}}. We can now reformulate our linear program as follows:

\displaystyle \max \sum_r\lambda_{r}v(X_{r}) + \sum_r\mu_{r}v(Y_{r}) -\sum_{k \in N} |w^k_{pq}|y^k_{pq}

subject to

\displaystyle -\sum_r\lambda_{r} = -1

\displaystyle -\sum_r\mu_{r} = -1

\displaystyle \sum_{r: X_{rk} = 1}\lambda_{r} + \sum_{r: Y_{rk} = 1}\mu_{r} \leq y^k_{pq} + 1\,\, \forall k \in N

\displaystyle \lambda_{r}, \mu_{r}, y^k_{pq} \geq 0\,\, \forall r, k

The constraint matrix of this last program is totally unimodular. This follows from the fact that each variable appears in at most two constraints with coefficients of opposite sign and absolute value 1 (this is because {X_{rk}} and {X_{rk'}} cannot both be 1, similarly with the {Y}'s). Total unimodularity implies that the last program has an integral optimal solution and we are done. In fact, I believe the argument can be easily modified to cover the case where every cycle in {G^w} contains a positive even number of negative edges.

Return to the case {|{\cal C}| = 1}. Consider the polyhedron {P} restricted to just one {C \in {\cal C}}. It will have the form:

\displaystyle \sum_{k \in N}x^k_i \leq 1\,\, \forall i \in C

\displaystyle z^k_{ij} \leq x^k_i, x^k_j\,\, \forall k \in N, (i,j) \in E^+ \cap C

\displaystyle x^k_i, z^k_{ij} \in [0,1]\,\, \forall i \in C, k \in N

Notice the absence of negative edges. To establish total unimodularity we use the Ghouila-Houri (GH) theorem. Fix any subset, {S}, of rows/constraints. The goal is to partition them into two sets {L} and {R} so that, column by column, the sum of the non-zero entries of the rows in {L} and the sum of the non-zero entries of the rows in {R} differ by at most one.

Observe that the rows associated with constraints {\sum_{k \in N}x^k_i \leq 1} are disjoint, so we are free to partition them in any way we like. Fix a partition of these rows. We must show how to partition the remaining rows to satisfy the GH theorem. If {z^k_{ij} - x^k_i \leq 0} is present in {S} but {z^k_{ij} -x^k_j \leq 0} is absent (or vice-versa), we are free to assign the row associated with {z^k_{ij} - x^k_i \leq 0} in any way to satisfy the GH theorem. The difficulty arises when the rows associated with {z^k_{ij} - x^k_i \leq 0} and {z^k_{ij} - x^k_j \leq 0} as well as the item constraints for {i} and {j} are all present in {S}. To ensure that the GH theorem is satisfied we may have to ensure that the rows associated with {z^k_{ij} - x^k_i \leq 0} and {z^k_{ij} -x^k_j \leq 0} are separated.

When {S} is the set of all constraints we show how to find a partition that satisfies the GH theorem. We build this partition by sequentially assigning rows to {L} and {R}, making sure that after each assignment the conditions of the GH theorem are satisfied for the rows assigned so far. It will be clear that this procedure can also be applied when only a subset of the constraints is present (indeed, satisfying the GH theorem will be easier in that case).

Fix an agent {k \in N}. The following procedure will be repeated for each agent in turn. Pick an arbitrary vertex in {C} (which is a tree) to be a root and direct all edges `away' from the root (when {S} is a subset of the constraints we delete from {C} any edge {(i,j)} for which at most one of the pair {z^k_{ij} - x^k_i \leq 0} and {z^k_{ij} -x^k_j \leq 0} appears in {S}). Label the root {L}. Label all its neighbors {R}, label the neighbors of the neighbors {L} and so on. If vertex {i \in C} was labeled {L} assign the row {\sum_{k \in N}x^k_i \leq 1} to the set {L}, otherwise to the set {R}. This produces a partition of the constraints of the form {\sum_{k \in N}x^k_i \leq 1} satisfying GH.

Initially, all leaves and edges of {C} are unmarked. Trace out a path from the root to one of the leaves of {C} and mark that leaf. Each unmarked directed edge {(i,j)} on this path corresponds to the pair {z^k_{ij} - x^k_i \leq 0} and {z^k_{ij} -x^k_j \leq 0}. Assign {z^k_{ij} - x^k_i \leq 0} to the same set that is the label of vertex {i}. Assign {z^k_{ij} - x^k_j \leq 0} to the same set that is the label of vertex {j}. Notice that in making this assignment the conditions of the GH theorem continue to be satisfied. Mark the edge {(i,j)}. If we simply repeat this procedure with another path from the root to an unmarked leaf, we may violate the GH theorem. To see why, suppose the tree contains edge {(i,j)} as well as {(i,t)}. Suppose {i} was labeled {L} on the first iteration and {(i,j)} was marked. This means {z^k_{ij} - x^k_{i} \leq 0} was assigned to {L}. Subsequently {z^k_{it} - x^k_i \leq 0} will also be assigned to {L}, which will produce a partition that violates the GH theorem. We can avoid this problem by flipping the labels on all the vertices before repeating the path tracing procedure.

What is the institutional detail that makes electricity special? It's in the physics, which I will summarize with a model of DC current in a resistive network. Note that other sources, like Wikipedia, give other reasons for why electricity is special:

Electricity is by its nature difficult to store and has to be available on demand. Consequently, unlike other products, it is not possible, under normal operating conditions, to keep it in stock, ration it or have customers queue for it. Furthermore, demand and supply vary continuously. There is therefore a physical requirement for a controlling agency, the transmission system operator, to coordinate the dispatch of generating units to meet the expected demand of the system across the transmission grid.

I’m skeptical. To see why, replace electricity by air travel.

Let {V} be the set of vertices and {E^*} the set of edges of the network. It will be convenient in what follows to assign (arbitrarily) an orientation to each edge in {E^*}. Let {E} be the set of directed arcs that result. Hence, {(i,j) \in E} means that the edge {(i,j)} is directed from {i} to {j}. Notice, if {(i,j) \in E}, then {(j,i) \not \in E}.

Associated with each {(i,j) \in E} is a number {x_{ij}} that we interpret as a flow of electricity. If {x_{ij} > 0} we interpret this to be a flow from {i} to {j}. If {x_{ij} < 0} we interpret this as a flow from {j} to {i}.

  1. {\rho_{ij}} resistance on link {(i,j)}.
  2. {c_i} unit cost of injecting current into node {i}.
  3. {v_i} marginal value of current consumed at node {i}.
  4. {d_i} amount of current consumed at node {i}.
  5. {s_i} amount of current injected at node {i}.
  6. {K_{ij}} capacity of link {(i,j)}.

Current must satisfy two conditions. The first is conservation of flow at each node:

\displaystyle s_i + \sum_{j: (j,i) \in E}x_{ji} = \sum_{j: (i,j) \in E}x_{ij} + d_i\,\, \forall i \in V

The second is Ohm’s law. There exist node potentials {\{\phi_i\}_{i \in V}} such that

\displaystyle \rho_{ij}x_{ij} = \phi_i - \phi_j\,\, \forall (i,j) \in E.

Using this system of equations one can derive the schoolboy rules for computing the resistance of a network (add resistances in series, add the reciprocals in parallel). At the end of this post is a digression that shows how to formulate the problem of finding a flow that satisfies Ohm's law as an optimization problem. It's not relevant for the economics, but charming nonetheless.
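
As an aside, here is a small Python sketch (NumPy assumed, made-up numbers) of the physical calculation: given the net injections, solve for the node potentials with the weighted Laplacian and recover the branch flows from Ohm's law. The effective resistance between two nodes, and hence the schoolboy rules, fall out of the same computation.

```python
import numpy as np

# Made-up resistive network: a triangle on nodes 0, 1, 2 with unit resistance on every edge.
edges = [(0, 1), (1, 2), (0, 2)]
rho = {e: 1.0 for e in edges}
n = 3

L = np.zeros((n, n))                 # weighted Laplacian, built from conductances 1/rho
for (i, j), r in rho.items():
    g = 1.0 / r
    L[i, i] += g; L[j, j] += g
    L[i, j] -= g; L[j, i] -= g

injection = np.array([1.0, 0.0, -1.0])   # inject 1 unit at node 0, withdraw it at node 2

# Potentials are defined up to a constant; ground node 2 and solve the reduced system.
phi = np.zeros(n)
phi[:2] = np.linalg.solve(L[:2, :2], injection[:2])

flows = {e: (phi[e[0]] - phi[e[1]]) / rho[e] for e in edges}
print("branch flows:", flows)                          # 2/3 on the direct edge, 1/3 on the path
print("effective resistance 0-2:", phi[0] - phi[2])    # parallel rule: (1 * 2) / (1 + 2) = 2/3
```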

At each node {i \in V} there is a power supplier with constant marginal cost of production of {c_i} up to {S_i} units. At each {i \in V} there is a consumer with constant marginal value of {v_i} up to {D_i} units. A natural optimization problem to consider is

\displaystyle \max \sum_{i \in V}[v_id_i - c_is_i]

subject to

\displaystyle \sum_{j: (i,j) \in E}x_{ij} -\sum_{j: (j,i) \in E}x_{ji} - s_i + d_i= 0\,\, \forall i \in V

\displaystyle \rho_{ij}x_{ij} = \phi_i - \phi_j\,\, \forall (i,j) \in E

\displaystyle -K_{ij} \leq x_{ij} \leq K_{ij}\,\, \forall (i,j) \in E

\displaystyle 0 \leq s_i \leq S_i\,\, \forall i \in V

\displaystyle 0 \leq d_i \leq D_i\,\, \forall i \in V

This is the problem of finding a flow that maximizes surplus.

Let {{\cal C}} be the set of cycles in {(V, E^*)}. Observe that each {C \in {\cal C}} corresponds to a cycle in {(V, E)} if we ignore the orientation of the edges. For each cycle {C \in {\cal C}}, let {C^+} denote the edges of {C} that are traversed in accordance with their orientation. Let {C^-} be the set of edges in {C} that are traversed in the opposing orientation.

We can project out the {\phi} variables and reformulate as

\displaystyle \max \sum_{i \in V}[v_id_i - c_is_i]

subject to

\displaystyle \sum_{j: (i,j) \in E}x_{ij} -\sum_{j: (j,i) \in E}x_{ji} - s_i + d_i= 0\,\, \forall i \in V

\displaystyle \sum_{(i,j) \in C^+}\rho_{ij}x_{ij} - \sum_{(i,j) \in C^-}\rho_{ij}x_{ij} = 0\,\, \forall \,\, C \in {\cal C}

\displaystyle -K_{ij} \leq x_{ij} \leq K_{ij}\,\, \forall (i,j) \in E

\displaystyle 0 \leq s_i \leq S_i\,\, \forall i \in V

\displaystyle 0 \leq d_i \leq D_i\,\, \forall i \in V

Recall the scenario we ended with in part 1. Let {V = \{1, 2, 3\}}, {E = \{(1,3), (1,2), (2,3)\}} and in addition suppose {\rho_{ij} =1} for all {(i,j)}. Only {(1,3)} has a capacity constraint of 600. Let {D_1 = D_2 = 0} and {S_3 = 0}. Also {c_1 = 20} and {c_2 = 40} and each have unlimited capacity. At node 3, the marginal value is {V > 40} up to 1500 units and zero thereafter. The optimization problem is

\displaystyle \max Vd_3 - 20s_1 - 40 s_2

subject to

\displaystyle x_{12} + x_{13} - s_1 = 0

\displaystyle x_{23} - s_2 - x_{12} = 0

\displaystyle d_3 - x_{13} - x_{23} = 0

\displaystyle x_{13} - x_{23} - x_{12} = 0

\displaystyle -600 \leq x_{13} \leq 600

\displaystyle 0 \leq d_3 \leq 1500

Notice, for every unit of flow sent along {(1,3)}, half a unit of flow must be sent along {(1,2)} and {(2,3)} as well to satisfy the cycle flow constraint.

The solution to this problem (assuming {V} is large enough that it is optimal to meet all 1500 units of demand; {V \geq 60} suffices here) is {x_{13} = 600}, {x_{12} = -300}, {x_{23} = 900}, {s_1 = 300}, {s_2 = 1200} and {d_3 = 1500}. What is remarkable is that not all of customer 3's demand is met by the lowest cost producer even though that producer has unlimited capacity. Why is this? The intuitive solution would have been to send 600 units along {(1,3)} and 900 units along {(1,2) \rightarrow (2,3)}. This flow violates the cycle constraint.
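
Here is a sketch that reproduces this dispatch with SciPy's LP solver (a recent SciPy with the HiGHS backend is assumed; {V} is set to 100, which is large enough that serving all 1500 units is optimal).

```python
import numpy as np
from scipy.optimize import linprog

V = 100.0                        # marginal value at node 3, chosen large enough to serve all demand
# Variable order: x13, x12, x23, s1, s2, d3.  linprog minimizes, so the objective is negated.
c = np.array([0.0, 0.0, 0.0, 20.0, 40.0, -V])

A_eq = np.array([
    [ 1,  1,  0, -1,  0,  0],    # node 1:  x12 + x13 - s1 = 0
    [ 0, -1,  1,  0, -1,  0],    # node 2:  x23 - x12 - s2 = 0
    [-1,  0, -1,  0,  0,  1],    # node 3:  d3 - x13 - x23 = 0
    [ 1, -1, -1,  0,  0,  0],    # cycle:   x13 - x12 - x23 = 0  (all resistances equal 1)
], dtype=float)
b_eq = np.zeros(4)

bounds = [(-600, 600),           # only (1,3) is capacitated
          (None, None), (None, None),
          (0, None), (0, None),  # s1, s2 >= 0 with unlimited capacity
          (0, 1500)]             # 0 <= d3 <= 1500

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
x13, x12, x23, s1, s2, d3 = res.x
print(dict(x13=x13, x12=x12, x23=x23, s1=s1, s2=s2, d3=d3))
# Expected: x13 = 600, x12 = -300, x23 = 900, s1 = 300, s2 = 1200, d3 = 1500.
```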

In this example, when generator 1 injects electricity into the network to serve customer 3's demand, a positive amount of that electricity must flow along every path from 1 to 3 in specific proportions. The same is true for generator 2. Thus, generator 1 is unable to supply all of customer 3's demand. Moreover, to accommodate generator 2, it must actually reduce its flow! Hence, customer 3 cannot contract with generators 1 and 2 independently to supply power. The shared infrastructure requires that they co-ordinate what they inject into the system. This need for coordination is the argument for a clearing house, not just to manage the network but to match supply with demand. This is the argument for why electricity markets must be designed.

The externalities caused by electricity flows are not proof that a clearing house is needed. After all, we know that if we price the externalities properly we should be able to implement the efficient outcome. Let us examine what prices might be needed by looking at the dual to the surplus maximization problem.

Let {y_i} be the dual variable associated with the flow balance constraint at node {i}. Let {\lambda_C} be associated with the cycle constraint for cycle {C}. Let {\nu_{ij}} and {\theta_{ij}} be associated with the upper and lower capacity constraints on link {(i,j)}. Let {\mu_i} and {\sigma_i} be associated with the remaining two constraints. These can be interpreted as the per-unit profit of supplier {i} and the per-unit surplus of customer {i} respectively. For completeness the dual would be:

\displaystyle \min \sum_{(i,j) \in E}[\nu_{ij} + \theta_{ij}]K_{ij} + \sum_{i \in V}[S_i \mu_i + D_i \sigma_i]

subject to

\displaystyle -\theta_{ij} + \nu_{ij} + \rho_{ij}\sum_{C^+ \ni (i,j)}\lambda_C - \rho_{ij}\sum_{C^- \ni (i,j)}\lambda_C + y_i - y_j = 0\,\, \forall (i,j) \in E

\displaystyle \mu_i - y_i \geq -c_i\,\, \forall i \in V

\displaystyle \sigma_i + y_i \geq v_i\,\, \forall i \in V

\displaystyle \nu_{ij}, \theta_{ij}, \mu_i, \sigma_i \geq 0\,\, \forall i \in V,\,\,\forall (i,j) \in E

Now {y_i} has a natural interpretation as the price paid for consumption at node {i} and received for supply injected at node {i}. {\mu_i} can be interpreted as the price of generation capacity at node {i}, and {\nu_{ij}} and {\theta_{ij}} as prices of transmission capacity on link {(i,j)}. However, {\lambda_C} is trickier: a price for flow around a cycle? It would seem that one would have to assign ownership of each link as well as ownership of cycles in order to have a market to generate these prices.
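
In the SciPy sketch above (again assuming the HiGHS backend), these dual variables can be read off the same solve: the marginals on the equality constraints correspond to the nodal prices {y_i} and the cycle price {\lambda_C}, and the marginals on the variable bounds to {\nu}, {\theta}, {\mu} and {\sigma}, up to the sign flip that comes from minimizing the negated objective.

```python
# Continuing the earlier sketch: duals of the surplus-maximization problem.
y1, y2, y3, lam_C = -res.eqlin.marginals     # flip signs: linprog minimized the negated objective
print("nodal prices y:", (y1, y2, y3))       # works out to about (20, 40, 60) for this instance
print("cycle price lambda_C:", lam_C)
print("bound duals:", res.upper.marginals)   # the x13 <= 600 entry is the congestion rent on (1,3)
```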
