Around the mid 2010’s Google introduced automated bidding. Other platforms have followed suit.

Rather than bidding directly for an `eyeball’, an advertiser delegates the bidding to the platform. In order to inform the bids that the platform will submit on their behalf, the advertiser submits two numbers to the platform. One is their budget and the second is their ROI target which can be thought of as  \frac{\#\,\, of\,\, clicks}{cost}. Hence, the ROI is the inverse of the cost per click.

Some observers have remarked that auto-bidding is strange because one asks the auctioneer themselves to bid on one’s behalf. Others have been inspired to focus on the design of auctions when bidders have an ROI constraint. This, I think, is misguided. 

First, just because the auctioneer’s chosen bidding language uses an ROI target does not mean that a ROI constraint enters bidder’s preferences. One should never confuse the message space of a mechanism with the preferences of the agents. 

Second, once a bidder has submitted a budget and an ROI target, the subsequent auction is an irrelevance. Why? Suppose I submit a budget of B. Then, my ROI target, says that the platform must deliver  \frac{B}{cost\,\, per \,\, click} clicks. For example, at a budget of $100 and an ROI target of 2, I am telling the platform that I will give them $100 in return for 200 clicks. Now, the platform, has access not to a finite number of clicks but a flow. They can, given time, satisfy every bid. In short, the platform will get your $100. The only issue is when. The automated auction is merely an elaborate device for determining the rate at which different bidders receive a click. One can think of far simpler procedures to do this. For example, round robin or deplete budgets at a uniform rate.

While my colleagues have been using Chat GPT to determine what it knows about important things, such as sunk costs and elasticity of demand, I was curious to learn what it knew about me. Here is a snippet:

“He is currently a Professor of Economics at the University of Pennsylvania, where he has been a faculty member since 2018. Prior to that, he was a Professor of Economics at the University of Chicago Booth School of Business, where he also served as the Dean from 2012 to 2019.”

I have, alas, never been a Professor of Economics at Booth, nor have I served as dean either at Booth or anywhere else. One might be tempted to dismiss this as another example of how GPT gets it wrong. However, if one realizes that GPT works by associating some block of text with another, it tells me, that given the keywords associated with my name, GPT predicts I should have been a Dean, at Booth, anyway. And not for one term, which is typically 5 years but 7 years. Did some great scandal bring my office to an end? Or, was I tempted away by an even higher office? We will never know. However, headhunters everywhere, take note, that powerful AI thinks I am dean material!

At dinner, some weeks ago, among my companions, was a cardinal of the profession. The cardinal lamented the fixation on the top 5. This cardinal, rebelled against it by declining to use the term in letters of reference, promotion and tenure. The cardinal urged a focus on the content of the work rather than the ranking of the outlet. While sympathetic to the sentiment, I think the proposal mistaken in some contexts. Specifically, for promotion and tenure letters, one is writing for `outsiders’. The intended audience is not part of the specialty. The letter serves as a marker of credibility for the outsider. Declining to rate the outlets diminishes the credibility of a supportive letter. I advocate, replacing the offending term by `top 7′. For example, `The candidate has three publications in the top 7 outlets of the profession’. Just as one never needs to say which are the top 5 journals, there is no reason to say what the top 7 are. Those who know, know. Those who don’t, are clearly uninformed

Lower the height of the basket and impose a restriction on the height of players, say 5 foot 8. This will increase the pool of available players and eventually improve the quality of play compared to the status quo where we are restricted to choosing players who are at least 6 feet.

In a 2020 paper in the Journal of Economic Literature, Heckman and Moktan argue that the obsessive focus on top 5 publications has a deleterious effect on the profession. In addition to documenting the impact a top 5 publication has on career outcomes, they argue that their hold distorts the incentives of junior faculty. For example, junior faculty may scrap a possibly good idea if it can’t get published in a top-five or focus on ideas with a ready made audience on the relevant editorial boards.

In addition to insufficient experimentation/exploration, there are two other consequences we should expect. As departments outsource their promotion and tenure decisions to the editorial boards of the top 5, we should anticipate an increase in the balkanization of individual departments. One’s colleagues have less of an incentive to engage with one’s own work and conversely. There should also be a decline in the willingness of faculty to contribute to the needs of the department. After all, a department is now merely an ensemble of special interest groups.

A second consequence should be an increase in attempts to subvert the primary goal of the peer review process: to provide disinterested, but, informed assessments of work. One need only look at Computer Science (CS) to see that such a possibility is not far fetched. Unless Economists are immune to the temptations that plague other mortals, we should anticipate the same. Nihar Shah at CMU surveys the problems that beset peer review in CS. Like Economics, there is a small subset of prestige venues for one’s work. Acceptance into these venues affects grants and tenure. The increased stakes have spawned collusive refereeing rings (mutual appreciation societies would be a less perjorative term) that subvert the goal of disinterested, but, informed review. There is also a strong suspicion that there are coalitions of scientists who agree to include each other as co-authors on multiple papers (risk sharing) so as to maximize the chances that they will make it in.

Nihar’s paper discusses various strategies to inoculate peer review against strategic behavior, but none are perfect. The fundamental problem is that the rewards to acceptance in these select venues exceed the expected penalties one might face.

These issues are part of a larger question: what is the optimal organization of scientific activity? The literature on contests, moral hazard and mechanism design focus on the individual component, ignoring other aspects such as rewarding discovery vs verification, incentivizing sharing, exploration and the decision to enter scientific work. For example, entry may involve high up front investments in specialized skills. Who will make these investments if the ex-post rewards from doing so are concentrated on a tiny number?

Peter Bickel taught me a nifty, little, literally, proof of the minimax theorem. Peter says it was inspired by his reading of David Blackwell‘s vector generalization of the same. Here goes.

Let A be the m \times n payoff matrix of zero-sum game. Here a_{ij} is the payoff to the row player from playing i when the column player plays j. Let \Delta^k be $k$-simplex.

For each p \in \Delta^m, let R(p) = \{ p^TAq: q \in \Delta^n \}. This is the set of payoffs that the row player can achieve when playing mixed strategy p. By convexity, it must be an interval whose endpoints depend on p. We will only be interested in the left most endpoint, call it b(p). Thus, we can assume that (strictly speaking should be subset) R(p) = [b(p), + \infty). Similarly, for each q \in \Delta^n, let C(q) = \{ p^TAq: p \in \Delta^m \}. By the same observations we may suppose that C(q)= (-\infty, d(q)].

For any p \in \Delta^m and q \in \Delta^n we have R(p) \cap C(q) \neq \emptyset. Now, choose p^* = \arg \max_{p \in \Delta^m}b(p) which exists by continuity and compactness. Similarly, choose q^* = \arg \min_{q \in \Delta^n}d(q). Then, v = (p^{*})^{T}Aq^{*} \in R(p^{*}) \cap C(q^{*}) is the value of the game.

No Farkas or separating hyperplane. Now, minimax is equivalent to duality. So, one might wonder whether such a proof could be used to obtain the duality theorem of linear programming. Assuming that the primal and the dual had optimal solutions, yes. However, the case where one is infeasible or unbounded would cause this kind of proof to break.

PS: Then, Olivier Gossner pointed out that perhaps because I wanted to believe so much that it was true, that I had overlooked something. See the comments. He was right. I tried to make the proof work but could not. A proof of minimax that avoids Farkas and/or the separating hyperlane is not outlandish. There is Guillermo Owen’s proof, via induction, for example. By the way this was not the first such elementary proof. There was a flurry elementary proofs of the minimax in the 40s and 50s in response to a challenge of von Neumann. Indeed, Peck (1958) writes in the introduction to his paper:

“There are so many proofs of this theorem in the literature, that an excuse is necessary before exhibiting another. Such may be found by examining the proof given below for the following: it uses no matrices, almost no topology and makes little use of the geometry of convex sets; it applies equally well to the case where only one of the pure strategy spaces is finite; also there is no assumption that the payoff function is bounded.”

That an induction proof should exist is unsurprising given David Gale’s inductive proof of the Farkas lemma. In fact, Gale’s proof can be seen as an instantiation of Fourier’s method for solving linear programs, again, no Farkas or separating hyperplane.

The majorization relationship arises frequently in Economic applications, most recently in the context of information design (see Kleiner, Moldovanu and Strack (2021) for instance). This post highlights the connection between optimization problems involving the majorization relationship and polymatroid optimization problems.

The majorization relationship is a partial order on vectors. If x is an n-vector, denote the j^{th} largest component by x_{[j]}.  Let x and b be two vectors in n dimensions. The vector x is majorized by the vector b if 

\sum_{j \leq i}x_{[j]} \leq \sum_{j \leq i}b_{[j]},\, \forall i \leq n-1

\sum_{j=1}^nx_{[j]} = \sum_{j=1}^nb_{[j]}

In the typical application one is given a non-negative vector b and the goal is to identify a vector x that is majorized by b to optimize a linear function of x. The problem is easily formulated as a linear program. Suppose  b_1 \geq b_2 \geq \ldots \geq b_n \geq 0. Then, our goal is to maximize cx subject to

\sum_{j \leq i}x_j \leq \sum_{j \leq i}b_j\,\, \forall i \leq n-1

\sum_{j=1}^nx_j = \sum_{j=1}^nb_j

x_1 \geq x_2 \geq \ldots \geq x_n \geq 0

A classic result of Hardy, Littlewood and Polya states that x is majorized by b iff there is a doubly-stochastic matrix \Pi such that x=\Pi b.  By the Birkhoff-von Neumann theorem it means that the set of vectors majorized by b is the convex hull of all permutations of b, i.e., a permutahedron. Permutahedra are polymatroids (see, for example, Kuipers, Vermeulen and Voorneveld (2010)). Therefore, we can reformulate the linear program above as a polymatroid optimization problem which can be solved via a greedy algorithm. Dahl (2010) gives the underlying submodular function that describes the polymatroid. For each S \subseteq \{1, \ldots, n\}=N let f(S) = \sum_{j=1}^{|S|}b_j. Note that f(S) depends only on the cardinality of S and not S itself. Given this, the optimization problem above can be expressed as

 \max cx

subject to

\sum_{j=1}^nx_j = f(N)

\sum_{j \in S}x_j \leq f(S)\,\, \forall S \subset N

x_j \geq 0\,\, \forall j

There is also a related `minorization’ problem that is of interest (see Akhil and Sundaresan (2016) for a survey):

\min cx

subject to

\sum_{j \leq i}x_j \geq \sum_{j \leq i}b_j\,\, \forall i \leq n-1

\sum_{j=1}^nx_j = \sum_{j=1}^nb_j

x_1 \geq \ldots  \geq  x_n \geq 0

Dahl (2001) characterizes the extreme points of the feasible region of this program. Morton, von Randow, Ringwald (1985) show that its feasible region can be described as a polymatroid with monotonicity constraints. Set g(S) = \max \{\sum_{i=1}^jb_i: \{1, 2, \ldots, j\} \subseteq S\}\,\, \forall S \subseteq N. In words g(S) = \sum_{i=1}^jb_i if j is the largest index such that the set \{1, 2, \ldots, j\} is contained in S. Otherwise g(S)=0. Set f(S) =g(N) - g(N \setminus S) The function f(S) is non-decreasing and submodular on N. The feasible region of the minorization problem can be expressed as

\sum_{j \in S}x_j \leq f(S)\,\, \forall S \subset N

\sum_{j \in N}x_j = f(N)

x_1 \geq \ldots  \geq  x_n \geq 0

Charactertizing the structure of the optimal solution is straightforward. It involves the `pooling’ of some variables. Variables are partitioned into sets consisting of a consecutive sequence of indices. Within each of these sets the variables are equal. Variables from sets with lower’ indices will of course have higher values than variables from sets with larger indices.

For the more general problem of optimizing a linear function over the intersection of a polymatroid and an affine space, see Hartvigsen (1998).

The goal of this post is to highlight a feature of the Bayesian Persuasion problem that seems useful but as far as I can tell not explicitly stated anywhere.

Let \mathcal{S} \subset \mathbb{R} be the finite set of states and \mathcal{A} \subset \mathbb{R} the finite set of actions the receiver can take. Elements of each are denoted by \omega_{j} and a_{i}, respectively.

The value to the receiver (he/him) of choosing action a_i in state \omega_j is V_R(a_i, \omega_j). The payoff to the  sender (she/her) if the action that the receiver takes in state \omega_j is a_i is denoted V_S(a_i, \omega_j). Sender and receiver share  a common prior p over \mathcal{S}.

The persuasion problem can be formulated in terms of choosing a distribution over state and action pairs such that an obedience constraint holds for the receiver. Call this the obedience formulation. The more popular formulation, in terms of finding a concave envelope is, as will be seen below, is equivalent.

The sender commits to a mapping from the state to an action recommendation. Given a recommendation, the receiver can update their prior over states. Once the algebraic dust settles, it turns out all that matters is the joint probability of action and state. Thus, the sender’s problem reduces to choosing x(\omega_j,a_i), the joint probability of action a_i and state  \omega_j.  The sender’s optimization problem is:

\max_{x(\omega,a)} \sum_{i=1}^{|\mathcal{A}|}\sum_{j=1}^{|\mathcal{S}|}V_{S}(a_{i}, \omega_j)x(\omega_{j},a_{i})

subject to

\sum_{j=1}^{|\mathcal{S}|}V_{R}(a_{i}, \omega_j)x(\omega_{j},a_{i}) \geq \sum_{j=1}^{|\mathcal{S}|}V_{R}(a_{k}, \omega_j)x(\omega_{j},a_{i})\,\, \forall a_{i}, a_{k}

\sum_{i=1}^{|\mathcal{A}|}x(\omega_{j},a_{i}) = p(\omega_{j})\,\, \forall \omega_j \in \mathcal{S}

x(\omega_{j},a_{i}) \geq 0 \,\, \forall  \omega_j \in \mathcal{S} \& a_i \in \mathcal{A}

The first constraint is the obedience constraints (OC) which ensure that it is in the receiver’s interest to follow the sender’s recommendation.

The second ensures that the total probability weight assigned to actions recommended in state \omega_j matches the prior probability of state \omega_j being realized.

The difficulty is dealing with the OC constraints. Many relax them using the method of Lagrange multipliers and try to pin down the values of the multipliers. We take a different approach.

For each action a_i, let K_i be the polyhedral cone defined by:

\sum_{j=1}^{|\mathcal{S}|}V_{R}(a_{i}, \omega_j)x(\omega_{j},a_{i}) \geq \sum_{j=1}^{|\mathcal{S}|}V_{R}(a_{k}, \omega_j)x(\omega_{j},a_{i})\,\, \forall a_{k}

By the Weyl-Minkowski theorem,  K_i is characterized by its generators (extreme rays), which we will denote by G_i = \{g^1_i, \ldots, g^{t_i}_i\}. The generators are non-negative and non-zero and can be normalized so that they sum to 1 which allows us to interpret them as probability distributions over the state space. For any generator (probability distribution) in G_i, action a_i is the best response of the receiver. Many of the qualitative features of a solution to the persuasion problem that have been identified in the literature are statements about the generators of K_i.

As each vector in K_i can be expressed as a non-negative linear combination of its generators, we can rewrite the constraints as follows: \sum_{i=1}^{|\mathcal{A}|}\sum_{r=1}^{t_i}\mu^r_ig^r_i(\omega_j) =  p(\omega_j) \,\, \forall j \in \mathcal{S}

\mu^r_i \geq 0\,\, \forall i \in \mathcal{A},r = 1, \ldots, t_i

If one adds up the constraints, we see that

\sum_{j=1}^{|\mathcal{S}|}\sum_{i=1}^{|\mathcal{A}|}\sum_{r=1}^{t_i}\mu^r_ig_i^r(\omega_j) =  1. In other words, the prior distribution p(\cdot) can be expressed as a convex combination of other distributions. In this way arrive at another popular formulation of the persuasion problem which is in terms of finding a decomposition of the prior into a convex combination of possible posteriors. Notice, the set of possible posterior distributions that need to be considered  are limited to the generators of the corresponding cones K_i. We illustrate why this is useful in a moment. For now, let me point out a connection to the other magical phrase that appears in the context of persuasion: concavification.

First, express the persuasion problem in terms of the weight assigned to the generators: \max \sum_{a_i \in \mathcal{A}}\sum_{r=1}^{t_i}[\sum_{\omega_j \in \mathcal{S}}V_S(a_i, \omega_j)g_i^r(\omega_j)]\mu^r_i

subject to \sum_{i=1}^{|\mathcal{A}|}\sum_{r=1}^{t_i}\mu^r_ig^r_i(\omega_j) =  p(\omega_{j}) \,\, \forall j \in \mathcal{S}

\mu^r_i \geq 0\,\, \forall i \in \mathcal{A},r = 1, \ldots, t_i

Letting y be the vector of dual variables, the linear programming dual of the persuasion problem is:

\min \sum_{\omega_j \in \mathcal{S}}p(\omega_j)y_j

subject to \sum_{\omega_j \in \mathcal{S}}g^r_i (\omega_j)y_j  \geq \sum_{\omega_j \in \mathcal{S}}V_S(a_i, \omega_j)g_i^r(\omega_j)\,\, \forall i \in \mathcal{A}\,\, r = 1, \ldots t_i

This dual problem characterizes the sender’s optimal value in terms of a concave envelope (since Kamenica & Gentzkow (2011) this is the most popular way to state a solution to the persuasion problem). Notice, the approach taken here shows clearly that the receiver’s preferences alone determine the set of posterior beliefs that will play a role. This idea is implicit in Lipinowski and Mathevet (2017). They introduce the notion of a posterior cover: a collection of sets of posterior beliefs, over each of which a given function is convex. When the given function is the best response correspondence of the receiver, this reduces precisely to the cone of the obedience constraints.

To demonstrate the usefulness of the conic representation of the obedience constraints, lets examine a persuasion problem with one dimensional state and action space described in Kolotillin and Wolitzky (2020). To simplify notation assume that \mathcal{S} = \{1, 2, \ldots, |\mathcal{S}|\} and \mathcal{A} = \{1, \ldots, |\mathcal{A}|\}. Write \omega_j as j and a_i as i

The goal of KW2020 is to provide a general approach to understanding qualitative properties of the optimal signal structure under two substantive assumptions. The first is that the sender’s utility is increasing in the receiver’s action. The second is called aggregate downcrossing, which implies that the receiver’s optimal action is increasing in his belief about the state.

1) V_S(i, j) > V_S(i-1, j)\,\, \forall i \in \mathcal{A} \setminus \{1\}\,\, \forall j \in \mathcal{S}

2) For all probability distributions q over \mathcal{S},

\sum_{j \in \mathcal{S}} [V_R(i,j)- V_R(i-1, j)]q_{j} \geq 0

\Rightarrow \sum_{j\in \mathcal{S}} [V_R(i', j)- V_R(i'-1, j)]q_{j} \geq 0\,\, \forall i' < i,\,\, i, i' \in \mathcal{A}

A result in KW2020 is that it is without loss to assume each induced posterior distribution has at most binary support. We show this to be an immediate consequence of the properties of the generators of each K_i.

Given condition 1, the sender only cares about the downward obedience constraints. We show that the adjacent downward constraints suffice.

Note that for any i, i-1, i-2 \in \mathcal{A},

\sum_{j \in \mathcal{S}} [V_R(i, j)- V_R(i-2, j)]x(i,j)=

\sum_{j \in \mathcal{S} } [V_R(i, j)- V_R(i-1, j)]x(i,j)

+ \sum_{j \in \mathcal{S} } [V_R(i-1, j)- V_R(i-2, j)]x(i,j)

The first term on the right hand side of the equality is non-negative by virtue of the adjacent obedience constraint holding. The second term is non-negative by aggregate downcrossing. Hence, each K_i is described by a single inequality:

\sum_{j\in\mathcal{S}}[V_{R}(i, j)-V_{R}(i-1, j)]x(i,j) \geq 0.

It is straightforward to see that each generator takes one of the following forms:

1) A single non-zero component with weight 1 assigned to a state j \in \mathcal{S} where V_{R}(i, j)-V_{R}(i-1, j)\geq0 .

2) Two non-zero components one assigned to a state j \in \mathcal{S} where V_{R}(i, j)-V_{R}(i-1, j) \geq 0 and one to a state j'\in \mathcal{S} where V_{R}(i, j')-V_{R}(i-1, j') < 0. The value assigned to component j is \frac{|V_{R}(i, j)-V_{R}(i-1, j)|^{_1}}{|V_{R}(i, j)-V_{R}(i-1, j)|^{_1}  +|V_{R}(i, j')-V_{R}(i-1, j')|^{-1} } while the value assigned to component j' is \frac{|V_{R}(i, j')-V_{R}(i-1, j')|^{_1}}{|V_{R}(i, j)-V_{R}(i-1, j)|^{_1}  +|V_{R}(i, j')-V_{R}(i-1, j')|^{-1} }.

Note that items 1 and 2 correspond to  Lemma 1 & Theorem 1 of KW2020 and follow immediately from the properties of the cone of the OC constraints.

Serious infectious diseases are a prime example of a public bad (non-exclusive and non-congestible). We limit them by restricting behavior and or getting individuals to internalize the externalities they generate. For example, one could mandate masks in public places. To be effective this requires monitoring and punishment. Unpleasant, but we know how to do this.  Or, one could hold those who don’t wear masks responsible for the costs they impose on those whom they infect. Unclear exactly how we would implement this, so impractical. However, it is still interesting to speculate about how one might do this. Coase pointed out that if one could tie the offending behavior to something that was excludable, we would be in business.

To my mind an obvious candidate is medical care. A feature of infectious diseases, is that behavior which increases the risk of infection to others also increases it for oneself. Thus, those who wish to engage in behavior that increases the risk of infection should be allowed to do so provided they waive the right to medical treatment for a defined period should they contract the infection. If this is unenforceable, perhaps something `weaker’ such as treatment will not be covered by insurance or the subject will be accorded lowest priority when treatment capacity is scarce.

How exactly could such a scheme be implemented? To begin with one needs to define which behaviors count, get the agent to explicitly waive the right when engaging in it and then making sure medical facilities are made aware of it.  We have some ready made behaviors that make it easy. Going to a  bar, gym and indoor dining. The rough principle is any activity with a $$ R_0 > 1 $$ whose access is controlled by a profit seeking entity. The profit seeking entity obtains the waiver from the interested agent as a condition of entry (this would have to be monitored by the state). The waiver releases the profit entity from liability. Waiver enters a database that is linked to health records (probably the biggest obstacle).

The efficacy of lockdowns was debated at the start of the pandemic and continues to this day. Sweden, famously, chose not to implement a lockdown. As Anders Tegnell, remarked:
`Closedown, lockdown, closing borders — nothing has a historical scientific basis, in my view.’

Lyman Stone of the American Enterprise Institute expresses it more forcefully:

`Here’s the thing: there’s no evidence of lockdowns working. If strict lockdowns actually saved lives, I would be all for them, even if they had large economic costs. But the scientific and medical case for strict lockdowns is paper-thin.’

The quotes above reveal an imprecision at the heart of the debate. What exactly is a lockdown? Tegnell uses a variety of words that suggest a variety of policies that can be lumped together. Stone is careful to say there is no evidence in support of strict lockdowns. Suggesting that `less’ strict lockdowns might be beneficial. So, lets speculate about the other extreme: no lockdown relaxed or otherwise.

The presence of infected individuals increases the cost of certain interactions. If infected individuals don’t internalize this cost, then, in the absence of any intervention we will observe a reduction in the efficient level of economic activity.

How will this manifest itself? Agents may adopt relatively low cost precautionary actions like wearing masks. On the consumer side they will substitute away from transactions that expose them to the risk of infection. For example, take out rather than sit down and delivery versus shopping in person. In short, we will see a drop in the demand for certain kinds of transactions. Absent subsidies for hospital care, we should expect an increase in price (or wait times) for medical care further incentivizing precautionary actions on the part of individuals.

The various models of network formation in the face of contagion we have (eg Erol and Vohra (2020)) all suggest we will see changes in how individuals socialize. They will reduce the variety of their interactions and concentrate them in cliques of `trusted’ agents.

On the supply side, firms will have to make costly investments to reduce the risk of infection to customers and possibly workers. The ability of firms to pass these costs onto their customers or their suppliers will depend on the relative bargaining power of each. Restaurant workers, for example, may demand increased compensation for the risks they bear, but this will occur at the same time as a drop in demand for their services.

To summarize, a `no lockdown’ policy will, over time, resemble a lockdown policy. Thus, the question is whether there is a  coordinated lockdown policy that is superior to an uncoordinated one that emerges endogenously.

Kellogg faculty blogroll