It states that the Minkowski sum of a large number of sets is approximately convex. The clearest statement  as well as the nicest proof  I am familiar with is due to J. W. S. Cassels. Cassels is a distinguished number theorist who for many years taught the mathematical economics course in the Tripos. The lecture notes  are available in a slender book now published by Cambridge University Press.

This central limit like quality of the lemma is well beyond the capacity of a hewer of wood like myself. I prefer the more prosaic version.

Let ${\{S^j\}_{j=1}^n}$ be a collection of sets in ${\Re ^m}$ with ${n > m}$. Denote by ${S}$ the Minkowski sum of the collection ${\{S^i\}_{i=1}^n}$. Then, every ${x \in conv(S)}$ can be expressed as ${\sum_{j=1}^nx^j}$ where ${x^j \in conv(S^j)}$ for all ${j = 1,\ldots, n}$ and ${|\{j: x^j \not \in S^j| \leq m}$.

How might this be useful? Let ${A}$ be an ${m \times n}$ 0-1 matrix and ${b \in \Re^m}$ with ${n > m}$. Consider the problem $\displaystyle \max \{cx: Ax = b, x_j \in \{0,1\}\ \forall \,\, j = 1, \ldots, n\}.$

Let ${x^*}$ be a solution to the linear relaxation of this problem. Then, the lemma yields the existence of a 0-1 vector ${x}$ such that ${cx \geq cx^* = z}$ and ${||Ax - b||_{\infty} \leq m}$. One can get a bound in terms of Euclidean distance as well.

How does one do this? Denote each column ${j}$ of the ${A}$ matrix by ${a^j}$ and let ${d^j = (c_j, a^j)}$. Let ${S^j = \{d^j, 0\}}$. Because ${z = cx^*}$ and ${b = Ax^*}$ it follows that ${(z,b) \in conv(S)}$. Thus, by the Lemma, $\displaystyle (z, b) = \sum_{j=1}^n(c_j, a^j)y_j$

where each ${y_j \in [0,1]}$ and ${|\{j: y_j \in (0,1) \}| \leq m }$. In words, ${y}$ has at most ${m}$ fractional components. Now construct a 0-1 vector ${y^*}$ from ${y}$ as follows. If ${y_j \in \{0,1\}}$, set ${y^*_j = y_j}$. If ${y_j }$ is fractional, round ${y^*_j}$ upto 1 with probability ${y_j}$ and down to zero otherwise. Observe that ${||Ay - b||_{\infty} \leq m}$ and the ${E(cy) = cx^*}$. Hence, there must exist a 0-1 vector ${x}$ with the claimed properties.

The error bound of ${m}$ is to large for many applications. This is a consequence of the generality of the lemma. It makes no use of any structure encoded in the ${A}$ matrix. For example, suppose $x^*$ were an extreme point and $A$ a totally unimodular matrix. Then, the number of fractional components of $x^*$ are zero. The rounding methods of Kiralyi, Lau and Singh as well as of Kumar, Marathe, Parthasarthy and Srinivasan exploit the structure of the matrix. In fact both use an idea that one can find in Cassel’s paper. I’ll follow the treatment in Kumar et. al.

As before we start with ${x^*}$. For convenience suppose ${0 < x^*_j < 1}$ for all ${j = 1, \ldots, n}$. As ${A}$ as has more columns then rows, there must be a non-zero vector ${r}$ in the kernel of ${A}$, i.e., ${Ar = 0}$. Consider ${x + \alpha r}$ and ${x -\beta r}$. For ${\alpha > 0}$ and ${\beta > 0}$ sufficiently small, ${x_j + \alpha r_j, x_j - \beta r_j \in [0,1]}$ for all ${j}$. Increase ${\alpha}$ and ${\beta}$ until the first time at least one component of ${x +\alpha r}$ and ${x- \beta r}$ is in ${\{0,1\}}$. Next select the vector ${x + \alpha r}$ with probability ${\frac{\beta}{\alpha + \beta}}$ or the vector ${x- \beta r}$ with probability ${\frac{\alpha}{\alpha + \beta}}$. Call the vector selected ${x^1}$.

Now ${Ax^1 = b}$. Furthermore, ${x^1}$ has at least one more integer component than ${x^*}$. Let ${J = \{j: x^1_j \in (0,1)\}}$. Let ${A^J}$ be the matrix consisting only of the columns in ${J}$ and ${x^1(J)}$ consist only of the components of ${x^1}$ in ${J}$. Consider the system ${A^Jx^1(J) = b - \sum_{j \not \in J}x^1_j}$. As long as ${A^J}$ has more columns then rows we can repeat the same argument as above. This iterative procedure gives us the same rounding result as the Lemma. However, one can do better, because it may be that even when the number of columns of the matrix is less than the number of rows, the system may be under-determined and therefore the null space is non-empty.

In a sequel, I’ll describe an optimization version of the Lemma that was implicit in Starr’s 1969 Econometrica paper on equilibria in economies with non-convexities.