Duality and the Primal-Dual Algorithm

As with many mathematical problems, it is possible to define problems “dual” to an LP problem in its various forms. The study of this aspect of LP problems, besides revealing interesting structural properties, has many implications and applications. This is a continuation of the simplex algorithm discussion, and a PDF version is available.


1.1. Primal-dual formulation and its basic properties

The primal-dual relationship for LP in general form is defined as

\displaystyle  \begin{array}{lcccl} {\rm primal} && \textrm{(index sets)} && {\rm dual} \\ \min c^Tx && && \max y^Tb \\ a_i^Tx = b_i && i \in M && y_i \gtrless 0 \\ a_i^Tx \ge b_i && i \in \overline{M} && y_i \ge 0 \\ x_j \ge 0 && j \in N && y^T A_j \le c_j \\ x_j \gtrless 0 && j \in \overline{N} && y^T A_j = c_j \end{array} \ \ \ \ \ (1)

If we view the standard LP as a special case of general LP ({\overline{M} = \overline {N} = \varnothing}), then we have the following

\displaystyle  \begin{array}{lcccl} {\rm primal} && \textrm{(index sets)} && {\rm dual} \\ \min c^Tx && && \max y^Tb \\ a_i^Tx = b_i && i \in M && y_i \gtrless 0 \\ x_j \ge 0 && j \in N && y^T A_j \le c_j \\ \end{array} \ \ \ \ \ (2)

Equivalently, we can express it in a more familiar way

\displaystyle  \begin{array}{rl} {\rm primal:}& \min c^Tx \quad \textrm{subject to }\quad Ax = b, x \ge 0 \\ {\rm dual:}& \max y^Tb \quad \textrm{subject to }\quad y^TA \le c^T\\ \end{array} \ \ \ \ \ (3)

For canonical form ({M = \overline {N} = \varnothing}), we have

\displaystyle  \begin{array}{rl} {\rm primal:}& \min c^Tx \quad \textrm{subject to }\quad Ax \ge b, x \ge 0 \\ {\rm dual:}& \max y^Tb \quad \textrm{subject to }\quad y^TA \le c^T, y \ge 0\\ \end{array} \ \ \ \ \ (4)
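
To make the correspondence concrete, here is a tiny canonical-form example (our own, not from the text) with {A = [1 \ \ 1]}, {b = 1} and {c = (1, 2)^T}:

\displaystyle  \begin{array}{rl} {\rm primal:}& \min x_1 + 2x_2 \quad \textrm{subject to }\quad x_1 + x_2 \ge 1, \ x_1, x_2 \ge 0 \\ {\rm dual:}& \max y \quad \textrm{subject to }\quad y \le 1, \ y \le 2, \ y \ge 0\\ \end{array}

The primal optimum {x = (1, 0)^T} has cost {1}, and the dual optimum {y = 1} has value {1}; the two coincide, as the strong duality property below guarantees.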

Property: The dual of the dual is the primal. This is clear by observing that we may express the dual problem in (1) as

\displaystyle  \begin{array}{lll} \min y^T(-b) & \\ y_i \gtrless 0 & i \in M & \\ y_i \ge 0 & i \in \overline{M} & \\ y^T(-A_j) \ge (-c_j) & j \in N & \\ y^T(-A_j) = (-c_j) & j \in \overline{N} & \end{array} \ \ \ \ \ (5)

Taking the dual of (5) according to the definition from (1) yields the primal. Since the standard/canonical form primal-dual formulations are special cases of the general form, the same property holds for (3) and (4). For the rest of the section we work mainly with the standard form.

Property: Weak duality. Weak duality states that for any feasible solutions {x, y} to the primal and dual LP problems in (3), the following holds

\displaystyle  c^Tx \ge y^TAx = y^Tb. \ \ \ \ \ (6)

The inequality is obtained by multiplying {y^TA \le c^T} on the right by {x \ge 0}, and the equality by multiplying {Ax = b} on the left by {y^T}.
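
As a quick numerical sanity check of (6), here is a sketch in Python on a toy standard-form instance of our own (any feasible pair {x, y} would do):

import numpy as np

# A small standard-form instance: min c^T x s.t. Ax = b, x >= 0,
# with dual max y^T b s.t. y^T A <= c^T.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])

x = np.array([0.5, 0.5])   # primal feasible: Ax = b, x >= 0
y = np.array([0.5])        # dual feasible: y^T A = [0.5, 0.5] <= c^T

assert np.allclose(A @ x, b) and np.all(x >= 0)
assert np.all(y @ A <= c)
print(c @ x, y @ b)        # 1.5 >= 0.5, as weak duality predicts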

Property: Strong duality. It turns out that equality holds throughout (6) when both {x, y} are optimal solutions for the respective primal and dual LP problems. This is called strong duality. To show this we will need Farkas’ lemma, which rests on the separating hyperplane theorem: given two convex sets {S_1, S_2} that do not intersect non-trivially (the intersection does not contain any volume), there exists a hyperplane such that no two points {s_1 \in S_1, s_2 \in S_2} lie in the same open halfspace created by the hyperplane. The separating hyperplane theorem, although intuitively true, is not trivial to prove; for more details see Convex Analysis, R. Tyrrell Rockafellar, pages 95-97. The variant of Farkas’ lemma that we use is the following.

Theorem 1 (Farkas’ lemma) Let {A \in \mathbb{R}^{m \times n}} and {b \in \mathbb{R}^m}. Exactly one of the following holds:

  • (a) {\exists x \ge 0} such that {Ax = b}
  • (b) {\exists p \in \mathbb{R}^m} such that {p^TA \ge 0^T} and {p^Tb < 0}.


Proof. We need to show that (a) and (b) cannot be true at the same time, and that {a \vee b}; note that {a \vee b \Leftrightarrow (\neg a \Rightarrow b)}. Also, the equations {Ax = b} can be interpreted as “{b} is a linear combination of the columns {A_j} of {A} with coefficients {x_j}: {\sum_{j = 1}^n A_jx_j = b}”.

((a) and (b) are mutually exclusive). The two conditions cannot hold at the same time: otherwise (a) gives {p^TAx = p^Tb}, but {p^TA \ge 0^T} and {x \ge 0} force {p^TAx \ge 0}, contradicting {p^Tb < 0}.

({\neg a \Rightarrow b}). We are now left to show that if the first condition fails, the second must be true. Note that we may view the {n} columns {A_j} of {A} as vectors in {\mathbb{R}^m}. Then {\{Ax : x \ge 0\}} is the cone {C} generated by these {n} vectors, which is convex and contains the zero vector. By assumption {\neg a}, the vector {b}, which is simply a point in {\mathbb{R}^m} and thus convex, is not contained in {C}. Moreover, the ray from the origin in the direction of {b} is again convex and, since {C} is closed under positive scaling, intersects {C} only trivially at the origin. Thus, there exists a hyperplane that separates {C} and the ray. Clearly the hyperplane must pass through the origin, and we can pick its normal {p} such that {p^Tb < 0}. For this {p}, we have {p^TA_j \ge 0} for each column {j} of {A}.

\Box
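
Farkas’ lemma also lends itself to a numerical illustration: when {Ax = b, x \ge 0} is infeasible, a certificate {p} can be found by solving another LP. The sketch below uses toy data of our own and scipy.optimize.linprog; the box {-1 \le p_i \le 1} merely keeps that search LP bounded, since certificates scale freely.

import numpy as np
from scipy.optimize import linprog

# The system Ax = b, x >= 0 below is infeasible (it forces x_1 = -1), so
# Farkas' lemma promises some p with p^T A >= 0 and p^T b < 0.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, 1.0])

# Find p by minimizing p^T b subject to A^T p >= 0 (written as -A^T p <= 0);
# the box -1 <= p_i <= 1 only keeps this LP bounded.
res = linprog(c=b, A_ub=-A.T, b_ub=np.zeros(A.shape[1]),
              bounds=[(-1, 1)] * A.shape[0], method="highs")
print(res.x, res.x @ b)  # p = [1, 0] with p^T b = -1 < 0: alternative (b) holds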

We are now ready to prove the strong duality theorem. Assume that {y} is an optimal solution to the dual. Note that we may express the dual as {\min b^T(-y)} subject to {A^T(-y) \ge -c}. Let {J} be the set of (column) indices {j} such that {A_j^T(-y) = -c_j \Leftrightarrow y^TA_j = c_j}. That is, {J} is the set of active indices. The set {J} is not empty: otherwise {y} would be an interior point of the dual feasible set, and moving slightly from {y} in the direction of {b} would improve the objective (assuming {b \ne 0}; if {b = 0}, the theorem holds trivially with {x = 0}). There cannot be a vector {d \in \mathbb{R}^m} such that {b^Td < 0} and {A_j^Td \ge 0} for all {j \in J}, for otherwise {(-y + \theta d)} would be a better feasible solution for small {\theta > 0}: since {A_j^T(-y) > -c_j} for {j \notin J}, a small perturbation of {y} will not violate the constraints {A_j^T(-y) \ge -c_j}. By Farkas’ lemma, applied to the matrix with columns {A_j}, {j \in J}, the other alternative must hold: there exist {x_j \ge 0}, {j \in J}, such that {\sum_{j \in J} A_jx_j = b}. If we let {x_j = 0} for {j \notin J}, we have {\sum_{j = 1}^n A_jx_j = \sum_{j \in J} A_jx_j + \sum_{j \notin J} A_jx_j = \sum_{j \in J} A_jx_j + \sum_{j \notin J} A_j\cdot 0 = b}, making {x \in \mathbb{R}^n} a feasible solution to the primal problem. We then have

\displaystyle  c^Tx \overset{x_j = 0 \textrm{ for } j \notin J}{=} \sum_{j \in J} c_jx_j = \sum_{j \in J} y^TA_jx_j \overset{x_j = 0 \textrm{ for } j \notin J}{=} \sum_{j = 1}^n y^TA_jx_j = y^TAx = y^Tb \ \ \ \ \ (7)

The strong duality theorem can be proved similarly for the other primal-dual formulations. An immediate consequence of the strong duality property is that when the primal problem has an optimal solution, the dual must also have an optimal solution and can be neither unbounded nor infeasible; the same holds with the roles of primal and dual exchanged. From weak duality, we know that it is not possible for both the primal and the dual to be unbounded (to have unbounded optimal cost). For the same reason, if the primal (dual) is unbounded, then the corresponding dual (primal) must be infeasible. It is also possible for both the primal and the dual to be infeasible; examples can be found in the book.
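
Strong duality is easy to observe numerically. The sketch below (a toy instance of our own, with scipy.optimize.linprog as an off-the-shelf solver) solves a standard-form primal and its dual separately and compares the optimal values:

import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([1.0, 3.0, 1.0])

# Primal: min c^T x s.t. Ax = b, x >= 0 (linprog's default bounds are x >= 0).
primal = linprog(c, A_eq=A, b_eq=b, method="highs")

# Dual: max y^T b s.t. A^T y <= c, y free; negated to fit linprog's min form.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * len(b),
               method="highs")

print(primal.fun, -dual.fun)  # both print 2.0: the optimal values coincide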

Property: Complementary slackness. Another direct consequence of the strong duality property is complementary slackness. From (6) we observe that the following holds

\displaystyle \nonumber u := (c^T - y^TA)x \ge 0, \quad v:= y^T(Ax - b) \ge 0,

with equality iff {x, y} are optimal. Indeed, {u + v = c^Tx - y^Tb}, which is nonnegative by weak duality and zero precisely when {x, y} are both optimal by strong duality; since {u} and {v} are individually nonnegative, both must then vanish. Complementary slackness holds for the other primal-dual formulations as well, again as a direct consequence of strong duality.
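
Continuing the toy instance from the strong-duality check above, we can verify {u = v = 0} at the optimal pair:

import numpy as np

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([1.0, 3.0, 1.0])
x = np.array([1.0, 0.0, 1.0])  # primal optimal from the check above
y = np.array([1.0, 1.0])       # dual optimal from the check above

u = (c - y @ A) @ x  # (c^T - y^T A) x
v = y @ (A @ x - b)  # y^T (Ax - b)
print(u, v)          # both 0.0, as complementary slackness requires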


1.2. The primal-dual algorithm

What can we do with duality? It turns out, quite a bit. We know that the simplex algorithm may perform an exponential number of pivots. With the primal-dual algorithm, however, some problems can be solved in time polynomial in the size of the input. We first go through the basic procedure of the primal-dual algorithm, starting with the standard-form LP and its dual. Recall the primal-dual formulation from the last subsection:

\displaystyle \nonumber \begin{array}{rl} \textrm{primal (P):}& \min c^Tx \quad \textrm{subject to }\quad Ax = b \ge 0, x \ge 0 \\ \textrm{dual (D):}& \max y^Tb \quad \textrm{subject to }\quad y^TA \le c^T, y \gtrless 0\\ \end{array}

Note that we added {b \ge 0}; this is always achievable simply by flipping the signs of some equations in {Ax = b}. Now recall the complementary slackness condition, which says that for optimal {x, y} we always have

\displaystyle \nonumber (c^T - y^TA)x = y^T(Ax - b) = 0.

Since {Ax = b} holds, the second equality always holds for any feasible {x}. Now assume that we have a feasible solution {y} for the dual (D) (see the book for a simple way to get one), and let {J} denote the set of indices {j} such that {c_j - y^TA_j = 0}; for {j \in J}, the slackness condition places no restriction on {x_j}. If we can get an {x} that satisfies

\displaystyle \nonumber \sum_{j\in J}a_{ij}x_j = b_i, \ i = 1, \ldots, m, \quad x_j \ge 0, \ j \in J,\quad x_j = 0, \ j \notin J,

then the pair {x, y} must be optimal by complementary slackness. Note that the first equality sums only over {j \in J} since the remaining {x_j} are zero. Of course, the set defined by the above equations may be empty. This is where the primal-dual algorithm comes in: we create an augmented primal, called the restricted primal (RP), as follows:

\displaystyle  \begin{array}{rll} \min \sum_{i = 1}^mx_i^a && \\ \\ \sum_{j\in J}a_{ij}x_j + x_i^a = b_i && i = 1, \ldots, m\\ x_j \ge 0 && j \in J \\ x_j = 0 && j \notin J \\ x_i^a \ge 0 && \end{array} \ \ \ \ \ (8)

The above restricted primal is feasible simply because we may let all {x_j = 0} and {x_i^a = b_i \ge 0} to get a feasible solution, and its cost is bounded below by zero since it is nonnegative. If (RP) has an optimal solution with cost {0}, then we have found an optimal solution for (P). But as we said, this cannot be guaranteed. Let us see what happens when the optimal cost is not {0}. For this we move to the dual of the restricted primal (DRP), defined as the dual to (8), following the formula in (1):

\displaystyle \nonumber \begin{array}{rll} \max y^Tb && \\ \\ y^TA_j \le 0 && j \in J\\ y_i \le 1 && i = 1, \ldots, m\\ y_i \gtrless 0 && \end{array}

Since it is the dual of (RP), which has an optimal solution, (DRP) has an optimal solution {\bar y}. This solution satisfies {\bar y^TA_j \le 0, j \in J} and, by strong duality, {\bar y^Tb = \sum_{i=1}^mx_i^a > 0}. This suggests that for a feasible {y} for (D), {y + \theta\bar y} with {\theta > 0} may be a better solution than {y}. We cannot have {\bar y^TA_j \le 0} for all {j \notin J}; otherwise we could make {(y + \theta \bar y)^Tb} arbitrarily large while staying feasible (if this happens, then (P) is infeasible since (D) is unbounded). We are left with the case that {\bar y^TA_j > 0} for some {j \notin J}. On the other hand, recall that for {j \notin J} we have {y^TA_j < c_j}. As we make {\theta} larger, eventually for at least one such {j}, {(y + \theta\bar y)^TA_j} will be exactly {c_j}; concretely, we may take {\theta = \min_{j \notin J,\ \bar y^TA_j > 0} (c_j - y^TA_j)/(\bar y^TA_j)}. Thus, we obtain a new feasible solution {y' = y + \theta\bar y} of (D) for which one or more indices {j \notin J} move into the set {J}. At this point, we either get an optimal {y'} or we loop back to form another (RP) and then (DRP). Since at least one new index is admitted into {J} each time, progress is made and the process must end. This is the gist of the primal-dual algorithm.
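
To make the loop concrete, here is a minimal sketch in Python (an illustration of our own, not the book's implementation; it uses scipy.optimize.linprog to solve the (RP) and (DRP) subproblems, and assumes {c \ge 0} so that {y = 0} is an easy initial dual-feasible point):

import numpy as np
from scipy.optimize import linprog

def primal_dual(A, b, c, tol=1e-9):
    """Sketch of the primal-dual algorithm for min c^T x s.t. Ax = b >= 0, x >= 0.
    Assumes c >= 0, so y = 0 is an easy initial feasible point for the dual."""
    m, n = A.shape
    y = np.zeros(m)                           # dual feasible: y^T A = 0 <= c^T
    while True:
        J = np.where(c - y @ A <= tol)[0]     # admissible set: y^T A_j = c_j
        # (RP): min sum(x^a) s.t. A_J x_J + I x^a = b, all variables >= 0.
        rp = linprog(np.concatenate([np.zeros(len(J)), np.ones(m)]),
                     A_eq=np.hstack([A[:, J], np.eye(m)]), b_eq=b,
                     method="highs")
        if rp.fun <= tol:                     # RP cost 0: x is optimal for (P)
            x = np.zeros(n)
            x[J] = rp.x[:len(J)]
            return x, y
        # (DRP): max y^T b s.t. y^T A_j <= 0 for j in J, y_i <= 1 (min form).
        drp = linprog(-b,
                      A_ub=A[:, J].T if len(J) else None,
                      b_ub=np.zeros(len(J)) if len(J) else None,
                      bounds=[(None, 1)] * m, method="highs")
        ybar = drp.x
        mask = ybar @ A > tol                 # columns with ybar^T A_j > 0
        mask[J] = False                       # only j outside J limit theta
        if not mask.any():                    # (D) unbounded, so (P) infeasible
            raise ValueError("primal infeasible")
        theta = np.min((c[mask] - y @ A[:, mask]) / (ybar @ A[:, mask]))
        y = y + theta * ybar                  # at least one new j enters J

# Example run on the instance from the strong-duality check above:
A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([1.0, 3.0, 1.0])
print(primal_dual(A, b, c))  # x = [1, 0, 1], y = [1, 1], matching cost 2

Solving (DRP) as a separate LP keeps the sketch independent of any particular solver's convention for reporting dual values; a more careful implementation would instead recover {\bar y} from the optimal basis of the (RP) solve.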

