This writeup, as a documentation of my learning of the simplex algorithm, focuses on the discussion of the basic theory behind the algorithm rather than the algorithm itself. The writeup is based on the first chapter of the book by Papadimitriou and Steiglitz, which I feel leaves many subtle questions unanswered or delayed. I hope this writeup complements the book by answering some of these questions more explicitly upfront. No actual example is provided here, but the book has many good ones. A PDF file of this document is available here. This web version is made with the python scripts from latex to wordpress by in theory.
— 1.1. Equivalence Between Forms of Linear Programming Problems —
There are three common forms of Linear Programming (LP) problems: general, standard, and canonical. This subsection introduce the formulations and show that they are equivalent in the sense that one form can be converted to other without making the problem much larger.
General LP. Let . Let be index sets of elements of such that for , (that is, can be and real number) for . Let be a integer matrix with row vectors denoted and column vectors denoted . Let . Let be the set of indices of the rows corresponding to equality constraints (that is, there is a constraint of the form for ), and let be the set of indices of the rows corresponding to inequality constraints ( for ). Let be an integer -vector, an instance of general LP is defined as
Here denotes the inner product between , which can also be written as .
General LP to canonical LP. The three common formulations of LP are equivalent in the sense that one problem can be turned into another with minimal modification (polynomial time reduction). It is clear that a canonical LP or a standard LP is a special case of a general LP. To go from general form to canonical form, we note that a constraint of the form is equivalent to and . This transformation gives us updated constraints with intact. We then take care of by letting and add the constraint . After making this substitution ( will get updated in this process), we get the canonical form.
Canonical LP to standard LP. All we need to do here is to take to a set of equality constraint. For each , we introduce a surplus variable such that . If we happen to work with , we may introduce slack variable such that .
Standard LP to canonical LP. We may simply take a standard LP and turn it into a general LP and then to a canonical one.
— 1.2. Basic Feasible Solutions —
With the equivalence between the different formulations of LP problems, any algorithm solving any formulation solves all three; we work with the standard form from now on. For this formulation, we again assume and is a matrix, . Without loss of generality, we may assume that the matrix has rank ; if not, some rows are redundant (as linear combinations of others) and can be removed. We have since otherwise either has a unique solution or is over determined, which are trivial cases.
LP problems can be approached from two equivalent angles: Algebraic and geometric. The constraints from the standard form, defines a feasible set of possible ‘s; the algebraic angle works with basic feasible solutions (BFS) which are special elements of this set . As we will see, a BFS is always bounded, always exists when , and has a cost vector for which is optimal among all feasible solutions in .
Basic feasible solution. To obtain a basic feasible solution, we start with a basis of linearly independent columns of , . Since is a basis, the equation
The variables are called basic variables. For the rest of the variables, we simply set them to zero. It is easy to see that such an is a solution of . If also , the solution is called a basic feasible solution.
Property: A BFS is bounded. From (5) we see that each is bounded since the magnitude of elements of is bounded (since is an invertible integer matrix). Let , then we have .
Property: A BFS always exists when the feasible set is not empty. Note that the feasible set given by may be empty in practice; here we assume that . With this assumption, at least one BFS exists. We now formally prove this. First we observe that if , then is a BFS; therefore, we assume that . Since , we know that must have a solution . We may take the feasible solution with the most number of zero components and assume the nonzero components are the first components:
We now show that if the first columns of is linearly dependent then we can always get an with more zero components. Since are linearly dependent, we have that for some nonzero -vector ,
On the other hand, we have
We may scale such that for some , and for all . This gives us
which means that the vector is a feasible solution with one more zero component. This contradiction means that the columns must be linearly independent, which means since rank of is . We may then take these columns and add columns from to get a basis for which is a solution, which is always doable since has rank . Thus, a basic solution always exists.
Property: There exists a cost vector for every BFS such that is the unique optimal cost. Given a basic feasible solution , a cost vector can always be chosen such that yields the lowest cost of all points in . For this task, we simply pick such that for every that corresponds to a basis column, , and otherwise; therefore, . For any other feasible solution that differ from , one of the non basis columns must be positive, making the cost positive as well.
Discussion on boundedness of feasible set. Although any BFS is bounded, it may be the case that some feasible solutions are unbounded. For example, has unbounded feasible region. Such problem has the property that there exists such that . Nevertheless, in such cases, optimal solution may still exist for specific . For example, if we have cost vector for the just mentioned unbounded example, then is the optimal solution. We shall later see how to detect that a feasible set is unbounded.
— 1.3. The Geometry of Linear Programs —
Another angle of attack for LP problems is the geometric point of view. More specifically, we may view a bounded feasible set as a polytope and vise versa, as explained below. Moreover, each vertex of the polytope corresponds to a BFS.
Bounded polytope bounded feasible set. We now explore the relationship between a bounded feasible set and a polytope. We use polytope to mean the bounded intersection of a set of halfspaces, which is always convex. We focus specifically on polytopes with . We may add slack variables to get a standard form LP: . The feasible set for this LP must be bounded since the old variables from are bounded and each newly added variables depends exclusively on the old variables and therefore is also bounded.
On the other hand, if we work with a bounded feasible set for the LP, we may take the last columns of and assume that they form a basis. We may further assume that this basis is the identity matrix. For the first columns, we then have for , . Since we also have , we get a bunch of halfspaces of the form
Since the original LP has a bounded feasible set, the corresponding polytope must also be bounded.
1-1 Correspondence between polytope vertex and BFS of . ( is a vertex is a BFS). Given a polytope and a vertex , we have from above discussion that the corresponding is a feasible solution. We may let be arranged such that the first components are nonzero (positive). If is a BFS then the corresponding columns of must be linearly independent; suppose not. Then there exists with not all are zero such that
Since is a feasible solution we also have . Multiplying (11) by and subtract it from gives us
This suggests that for small enough , are also feasible solutions in . However, the transformation between and are affine, and thus must be a strict convex combination of and . This contradicts that is vertex of (to show this, note that there exists a hyperplane such that for all and . That is, the hyperplane contains and the rest of is on one side of ; iff . since , ; there liner combination cannot satisfy ).
( is a BFS is a vertex). Since is a BFS, there exists a basis for such that is the unique solution to . Suppose is not a vertex, then it can be expressed as strict convex combination of two vertices such that are on the same line. This suggests that must have be zero if since otherwise one of must be less than zero, which is not allowed for a feasible solution. Since the nonzero components of are these corresponding to , and must equal as they are all solutions to .
Degenerate BFS. A BFS is called degenerate if it has more than zeros. Obviously, when there are multiple bases for a single BFS, the BFS is degenerate.
Existence of optimal BFS for LP with bounded . This is easy to see if we look at the polytope. For a given cost vector , we have (where is a vector in the corresponding polytope ) for some unique . For each fixed constant , defines a hyperplane in . Since is bounded and convex, is minimized when is on the border of . Therefore there is at least one such that is a vertex of and the corresponding is an optimal BFS.
— 1.4. The simplex algorithm —
So far we see that if there is an optimal solution to an LP problem, then there is an optimal BFS as well as an optimal vertex in the corresponding polytope (if the intersection is bounded). The basic idea behind the simplex algorithm is to move from one BFS to another and eventually get to the optimal one (we will get to how to obtain a first BFS in the end since that will use the simplex algorithm). The rest of this section explains briefly the important ingredients of the simplex algorithm.
Moving from BFS to BFS. The operation of moving from BFS to BFS is called pivoting. Suppose that we want to move from a BFS to a new BFS . Let be the basis for , we may assume that is an identity matrix (via elimination which does not change the LP problem) and its columns are the first columns of . Then, for any column of not in , we can express as linear combination of , . Since , we have for any ,
There are couple possibilities here. If some , we can increase so that there will be a first such that . Then is the column to be replaced. We have . The new basis is linearly independent since . The other case is that all . This means that we may increase to be arbitrarily large and still have a feasible solution, which means that is unbounded.
Choosing a profitable column . For all choices of , we may calculate the corresponding new BFS (if some ) and then evaluate the cost . If we get a decrease in cost compared to , then is profitable. As we march from BFS to all adjacent BFS’, there are three possibilities:
- The cost goes unbounded (small) when some column is chosen, suggesting that the LP problem has as lowest cost. This can only happen if is unbounded. Specifically, this can be used to detect whether a given feasible set is unbounded: We simply let .
- No new BFS is profitable so is the optimal one.
- Some new BFS gives a better cost.
The issue of cycling. As we move from BFS to BFS, there is the potential problem of returning to a BFS after getting to it at some point. Note that this can only happen if the BFS is degenerate. That is, cycling can only happen if we stay at the same vertex of the corresponding polytope , since from one vertex that is not optimal, there is always a better adjacent vertex in terms of cost or the cost goes unbounded low along an edge from the current vertex (in which case the algorithm terminates). Once we move to a new BFS with better cost, we will never return to the originating BFS.
There are various methods to prevent cycling, which are of practical importance. However, since all pivoting algorithms have worst case behavior being exponential, the naive method of remembering all basis used for one BFS is no worse than these algorithms in theory.
Get a BFS to start with. We have now established that given a BFS to start with, we will find the optimal BFS if it exists. To begin the algorithm, however, we must first obtain a BFS. To do this we first flip the sign of any equation in so that we have . We then add artificial variables and use the temporary cost vector (since we only want an arbitrary BFS in the original , which does not depend on the actual cost vector c). For the new system, is clearly a BFS. As we drive the cost to zero, which will happen since as long as the original is not empty, the first components of is a BFS for the original system since we must have . We can then return to the original system.
Geometry aspects of pivoting. In above discussion, we have already hinted what happens to the polytope as we move from BFS to BFS. One of two things happen during a move from a BFS to a BFS: 1. We move from a vertex of to a new vertex of better cost; 2. The BFS only changes basis (the solution vector remains the same), in which case we stay at the same vertex of .
Final word on simplex algorithm. The importance of this theory is that, we can basically use any of the numerous pseudo algorithms or full blown packages on the web on simplex algorithm to solve any variation of the LP problems discussed in this chapter without worrying that they will produce different results, since they are essentially the same with only difference being the relative speed. We know that all deterministic pivoting simplex algorithms have worst case time complexity exponential with respect to the number of variables.
— 1.5. Primal-dual formulation and its basic properties —
As with many mathematical problems, it is possible to define problems “dual” to an LP problem in various forms. Study of this aspect of LP problems, besides revealing interesting structural properties, has many implications and applications. The primal-dual relationship for LP in general form is defined as
Property: Dual of dual is the primal. This is clear by observing that we may express the dual problem in (14) as
Taking the dual of (18) according to the definition from (14) yields the primal. Since the standard/canonical form primal-dual formulations are special cases of the general form, same property holds for (16) and (17). For the rest of the section we work mainly with the standard form.
The first inequality is obtained by multiplying on both sides of and the second by multiplying on both sides of .
Property: Strong duality. It turns out that the equalities hold for (19) when both are optimal solutions for the respective primal and dual LP problems. This is called the strong duality. To show this we will need Farka’s lemma based on the separating hyperplane theorem, which says that given two convex sets that do not intersect trivially (intersection is not a single point), there exists a hyperplane such that no two points lie in the same open halfspace created by the hyperplane. The variant of Farka’s lemma we use is the following.
Theorem 1 (Farka’s lemma) Let and . Exactly one of the following holds:
- (a) such that
- (b) such that and .
Proof. We need to show that a, b cannot be true at the same time and a b. Note that . Note that can be interpreted as “ is a linear combination of the columns of with coefficients : ”.
(a, b are exclusive). The two conditions cannot hold at the same time, otherwise we have , which means and must hold same sign.
(). We are now left to show that if the first condition fails, the second must be true. Note that we may view the columns of as -vectors in . Then for all is simply a cone of these -vectors which is convex and contains the zero vector. By assumption, the vector , which is simply a point in and thus convex, does not intersect . Moreover, if we take the ray from origin in the direction of , this is again convex and does not intersect non trivially. Thus, there exists a hyperplane that separates and . Clearly the hyperplane must pass through origin and we can pick its normal such that . For this , we have for each column of .
We are now ready to prove the strong duality theorem. Assume that is an optimal solution to the dual. Note that we may express the dual as . Let be the set of (column) indices such that for . That is, is the set of indices that are active. This set is not empty, otherwise is an interior point in a corresponding polytope representation of the feasible set. We see that there cannot be a vector , such that and for , otherwise can be a better feasible solution for small since are not binding and a small perturbation of will not violate these constraints. By Farka’s lemma, the other alternative must hold: For the set of ‘s with , there exists a set of ‘s such that and . If we let for , we have , making a feasible solution to the primal problem. We then have
An immediate consequence of the strong duality property is that when the primal problem has an optimal, then the dual must also have an optimal and cannot be unbounded or infeasible. Same holds for dual-primal relationship. From the weak duality, we know that it is not possible for both primal and dual to be unbounded. For the same reason, if primal/dual is unbounded, then the corresponding dual/primal must be infeasible. It is also possible that both primal and dual are infeasible. Examples can be found in the book.
Property: Complementary slackness. Another direct consequence of the strong duality property is complementary slackness. From (19) we observe that the following holds
with equality iff are optimal by the strong duality property. Complementary slackness holds for other primal-dual formulations as well, as a direct consequence of strong duality.