This writeup, as documentation of my learning of the simplex algorithm, focuses on the basic theory behind the algorithm rather than the algorithm itself. It is based on the first chapter of the book by Papadimitriou and Steiglitz, which I feel leaves many subtle questions unanswered or deferred. I hope this writeup complements the book by answering some of these questions more explicitly upfront. No worked examples are provided here, but the book has many good ones. A PDF file of this document is available here. This web version is made with the Python scripts from LaTeX to WordPress by in theory.
— 1.1. Equivalence Between Forms of Linear Programming Problems —
There are three common forms of Linear Programming (LP) problems: general, standard, and canonical. This subsection introduces the formulations and shows that they are equivalent in the sense that each form can be converted to another without making the problem much larger.
General LP. Let $x = (x_1, \dots, x_n)^T \in \mathbb{R}^n$. Let $N, N' \subseteq \{1, \dots, n\}$ be index sets of elements of $x$ such that $x_j \ge 0$ for $j \in N$, and $x_j$ is unconstrained (that is, $x_j$ can be any real number) for $j \in N'$. Let $A$ be an $m \times n$ integer matrix with row vectors denoted $a_1^T, \dots, a_m^T$ and column vectors denoted $A_1, \dots, A_n$. Let $b \in \mathbb{Z}^m$. Let $M$ be the set of indices of the rows corresponding to equality constraints (that is, there is a constraint of the form $a_i^T x = b_i$ for $i \in M$), and let $M'$ be the set of indices of the rows corresponding to inequality constraints ($a_i^T x \ge b_i$ for $i \in M'$). Let $c$ be an integer $n$-vector; an instance of general LP is defined as

$$\min_x \ c^T x \quad \text{s.t.} \quad a_i^T x = b_i, \ i \in M; \quad a_i^T x \ge b_i, \ i \in M'; \quad x_j \ge 0, \ j \in N.$$

Here $c^T x$ denotes the inner product between $c$ and $x$, which can also be written as $\sum_{j=1}^{n} c_j x_j$.
Canonical LP. An instance has the form

$$\min_x \ c^T x \quad \text{s.t.} \quad Ax \ge b, \ x \ge 0.$$
Standard LP. An instance has the form

$$\min_x \ c^T x \quad \text{s.t.} \quad Ax = b, \ x \ge 0.$$
General LP to canonical LP. The three common formulations of LP are equivalent in the sense that one problem can be turned into another with minimal modification (a polynomial-time reduction). It is clear that a canonical LP or a standard LP is a special case of a general LP. To go from general form to canonical form, we note that a constraint of the form $a_i^T x = b_i$ is equivalent to the pair $a_i^T x \ge b_i$ and $-a_i^T x \ge -b_i$. This transformation gives us updated constraints $Ax \ge b$ with the nonnegativity constraints $x_j \ge 0$, $j \in N$, intact. We then take care of each unconstrained $x_j$, $j \in N'$, by letting $x_j = x_j^+ - x_j^-$ and adding the constraints $x_j^+ \ge 0$ and $x_j^- \ge 0$. After making this substitution ($A$, $b$, and $c$ will get updated in this process), we get the canonical form.
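To make the bookkeeping concrete, here is a minimal sketch of this reduction in NumPy (the function name and argument conventions are mine, not the book's); it splits each equality row into two inequalities and each unconstrained variable into a difference of two nonnegative ones:

```python
import numpy as np

def general_to_canonical(A, b, c, eq_rows, free_cols):
    """Rewrite min c^T x s.t. a_i^T x = b_i (i in eq_rows),
    a_i^T x >= b_i (other rows), x_j >= 0 (j not in free_cols),
    into canonical form: min c2^T z s.t. A2 z >= b2, z >= 0."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    # each equality a_i^T x = b_i becomes a_i^T x >= b_i and -a_i^T x >= -b_i
    A2 = np.vstack([A, -A[eq_rows]])
    b2 = np.concatenate([b, -b[eq_rows]])
    # each free x_j becomes x_j^+ - x_j^-; append negated columns for x_j^-
    A2 = np.hstack([A2, -A2[:, free_cols]])
    c2 = np.concatenate([c, -c[free_cols]])
    return A2, b2, c2
```

A solution $z$ of the canonical instance maps back via $x_j = z_j$ for constrained $j$, and $x_j = z_j - z_{j'}$ for each free $j$ and its appended partner column $j'$.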
Canonical LP to standard LP. All we need to do here is to turn the inequality constraints into a set of equality constraints. For each constraint $a_i^T x \ge b_i$, we introduce a surplus variable $s_i \ge 0$ such that $a_i^T x - s_i = b_i$. If we happen to work with $a_i^T x \le b_i$, we may introduce a slack variable $s_i \ge 0$ such that $a_i^T x + s_i = b_i$.
Standard LP to canonical LP. We may simply take a standard LP, turn it into a general LP, and then into a canonical one.
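The surplus-variable step is equally mechanical; here is a hedged NumPy sketch (again, the names are mine):

```python
import numpy as np

def canonical_to_standard(A, b, c):
    """Convert min c^T x s.t. Ax >= b, x >= 0 into standard form
    min c2^T z s.t. A2 z = b, z >= 0, one surplus variable per row:
    a_i^T x - s_i = b_i  <=>  [A | -I] (x; s) = b."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    A2 = np.hstack([A, -np.eye(m)])
    c2 = np.concatenate([np.asarray(c, dtype=float), np.zeros(m)])  # surplus costs 0
    return A2, np.asarray(b, dtype=float), c2
```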
— 1.2. Basic Feasible Solutions —
With the equivalence between the different formulations of LP problems, any algorithm solving one formulation solves all three; we work with the standard form from now on. For this formulation, we again assume $x \in \mathbb{R}^n$, and $A$ is an $m \times n$ integer matrix, $b \in \mathbb{Z}^m$. Without loss of generality, we may assume that the matrix $A$ has rank $m$; if not, some rows are redundant (as linear combinations of others) and can be removed. We assume $m < n$, since otherwise $Ax = b$ either has a unique solution or is overdetermined, which are trivial cases.
LP problems can be approached from two equivalent angles: algebraic and geometric. The constraints of the standard form define a feasible set $F = \{x \in \mathbb{R}^n : Ax = b, \ x \ge 0\}$ of possible $x$'s; the algebraic angle works with basic feasible solutions (BFS), which are special elements of this set $F$. As we will see, a BFS $x_0$ is always bounded in magnitude, always exists when $F \neq \emptyset$, and has a cost vector $c$ for which $x_0$ is optimal among all feasible solutions in $F$.
Basic feasible solution. To obtain a basic feasible solution, we start with a basis of $m$ linearly independent columns of $A$, $\mathcal{B} = \{A_{B(1)}, \dots, A_{B(m)}\}$. Since $\mathcal{B}$ is a basis, the equation

$$B x_B = b, \quad B = [A_{B(1)} \cdots A_{B(m)}], \ \ \ \ \ (5)$$

has the unique solution $x_B = B^{-1} b$. The variables $x_{B(1)}, \dots, x_{B(m)}$ are called basic variables. For the rest of the variables, we simply set them to zero. It is easy to see that such an $x$ is a solution of $Ax = b$. If also $x \ge 0$, the solution is called a basic feasible solution.
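As a quick sketch of this definition in code (assuming NumPy; the helper is hypothetical, not from the book):

```python
import numpy as np

def basic_solution(A, b, basis):
    """Solve B x_B = b for a chosen index set `basis` of m independent
    columns (eq. (5)); all non-basic variables are set to zero."""
    m, n = A.shape
    B = A[:, basis]                   # must be nonsingular for a true basis
    x = np.zeros(n)
    x[basis] = np.linalg.solve(B, b)  # x_B = B^{-1} b
    feasible = bool(np.all(x >= 0))   # x is a BFS iff it is also nonnegative
    return x, feasible
```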
Property: A BFS is bounded. From (5) we see that each $|x_{B(i)}|$ is bounded, since the magnitudes of the elements of $B^{-1}$ are bounded ($B$ is an invertible integer matrix, so $|\det B| \ge 1$, and by Cramer's rule the entries of $B^{-1}$ are ratios of minors to $\det B$). Let $\alpha = \max_{i,j} |a_{ij}|$ and $\beta = \max_i |b_i|$; then we have $|x_{B(i)}| \le m! \, \alpha^{m-1} \beta$.
Property: A BFS always exists when the feasible set is not empty. Note that the feasible set given by $Ax = b$, $x \ge 0$ may be empty in practice; here we assume that $F \neq \emptyset$. With this assumption, at least one BFS exists. We now formally prove this. First we observe that if $b = 0$, then $x = 0$ is a BFS; therefore, we assume that $b \neq 0$. Since $F \neq \emptyset$, we know that $Ax = b$ must have a solution $x \ge 0$. We may take a feasible solution $x$ with the largest number of zero components and assume the $p$ nonzero components are the first $p$ components:

$$x = (x_1, \dots, x_p, 0, \dots, 0)^T, \quad x_i > 0 \ \text{for} \ i = 1, \dots, p.$$
We now show that if the first $p$ columns of $A$ are linearly dependent, then we can always get a feasible solution with more zero components. Since $A_1, \dots, A_p$ are linearly dependent, we have that for some nonzero $p$-vector $d = (d_1, \dots, d_p)^T$,

$$\sum_{i=1}^{p} d_i A_i = 0.$$

On the other hand, we have

$$\sum_{i=1}^{p} x_i A_i = b.$$

We may scale $d$ such that for some $k \le p$, $x_k - d_k = 0$ and $x_i - d_i \ge 0$ for all $i \le p$. This gives us

$$\sum_{i=1}^{p} (x_i - d_i) A_i = b,$$

which means that the vector $x' = (x_1 - d_1, \dots, x_p - d_p, 0, \dots, 0)^T$ is a feasible solution with one more zero component. This contradicts our choice of $x$, so the $p$ columns must be linearly independent, which means $p \le m$ since the rank of $A$ is $m$. We may then take these $p$ columns and add $m - p$ further columns of $A$ to get a basis $\mathcal{B}$ for which $x$ is the basic solution, which is always doable since $A$ has rank $m$. Thus, a basic feasible solution always exists.
Property: For every BFS $x_0$ there exists a cost vector $c$ such that $x_0$ is the unique optimal solution. Given a basic feasible solution $x_0$, a cost vector $c$ can always be chosen such that $x_0$ yields the lowest cost of all points in $F$. For this task, we simply pick $c$ such that $c_j = 0$ for every $j$ that corresponds to a basis column, and $c_j = 1$ otherwise; therefore, $c^T x_0 = 0$. For any other feasible solution that differs from $x_0$, one of the non-basis components must be positive, making the cost positive as well.
Discussion on boundedness of the feasible set. Although any BFS is bounded, it may be the case that the feasible set itself is unbounded. For example, $F = \{x \in \mathbb{R}^2 : x_1 - x_2 = 1, \ x \ge 0\}$ is an unbounded feasible set. Such a problem has the property that there exists $d \ge 0$, $d \neq 0$, with $Ad = 0$, so that $x + \theta d \in F$ for every $x \in F$ and $\theta \ge 0$. Nevertheless, in such cases an optimal solution may still exist for a specific $c$. For example, if we have cost vector $c = (1, 1)^T$ for the just-mentioned unbounded example, then $x = (1, 0)^T$ is the optimal solution. We shall later see how to detect that a feasible set is unbounded.
— 1.3. The Geometry of Linear Programs —
Another angle of attack for LP problems is the geometric point of view. More specifically, we may view a bounded feasible set as a polytope and vice versa, as explained below. Moreover, each vertex of the polytope corresponds to a BFS.
Bounded polytope $\Leftrightarrow$ bounded feasible set. We now explore the relationship between a bounded feasible set $F$ and a polytope. We use polytope to mean the bounded intersection of a set of halfspaces, which is always convex. We focus specifically on polytopes of the form $P = \{\hat{x} \in \mathbb{R}^{n-m} : \hat{A}\hat{x} \le \hat{b}, \ \hat{x} \ge 0\}$ with $m$ halfspace constraints. We may add $m$ slack variables to get a standard form LP: $\hat{A}\hat{x} + s = \hat{b}$, $\hat{x}, s \ge 0$. The feasible set for this LP must be bounded, since the old variables from $P$ are bounded and each newly added variable depends exclusively on the old variables and is therefore also bounded.

On the other hand, if we work with a bounded feasible set for the LP, we may take the last $m$ columns of $A$ and assume that they form a basis. We may further assume that this basis is the identity matrix (via elimination). In terms of the first $n - m$ variables, we then have, for $i = 1, \dots, m$, $x_{n-m+i} = b_i - \sum_{j=1}^{n-m} a_{ij} x_j$. Since we also have $x_{n-m+i} \ge 0$, we get a collection of halfspaces of the form

$$\sum_{j=1}^{n-m} a_{ij} x_j \le b_i, \quad i = 1, \dots, m,$$

together with $x_j \ge 0$ for $j = 1, \dots, n-m$. Since the original LP has a bounded feasible set, the corresponding polytope must also be bounded.
1-1 Correspondence between polytope vertices and BFS's of $Ax = b$, $x \ge 0$. ($\hat{x}$ is a vertex $\Rightarrow$ $x$ is a BFS). Given a polytope $P$ and a vertex $\hat{x} \in P$, we have from the above discussion that the corresponding $x$ (obtained by appending the slack values) is a feasible solution. We may let $x$ be arranged such that the first $p$ components are nonzero (positive). For $x$ to be a BFS, the corresponding columns of $A$ must be linearly independent; suppose not. Then there exists $d = (d_1, \dots, d_p)^T$ with not all $d_i$ zero such that

$$\sum_{i=1}^{p} d_i A_i = 0. \ \ \ \ \ (11)$$

Since $x$ is a feasible solution we also have $\sum_{i=1}^{p} x_i A_i = b$. Multiplying (11) by $\theta$ and subtracting it from this equation gives us

$$\sum_{i=1}^{p} (x_i - \theta d_i) A_i = b.$$

This suggests that for small enough $\theta > 0$, the vectors $x^{\pm} = x \pm \theta d$ (with $d$ padded by zeros) are also feasible solutions in $F$. However, the transformation between $x$ and $\hat{x}$ is affine, and thus $\hat{x}$ must be a strict convex combination of $\hat{x}^+$ and $\hat{x}^-$. This contradicts the fact that $\hat{x}$ is a vertex of $P$ (to show this, note that there exists a hyperplane $h$ with normal $c'$ such that $c'^T y > c'^T \hat{x}$ for all $y \in P$, $y \neq \hat{x}$. That is, the hyperplane contains $\hat{x}$ and the rest of $P$ is strictly on one side of $h$; $c'^T y = c'^T \hat{x}$ iff $y = \hat{x}$. Since $\hat{x}^{\pm} \in P$ and $\hat{x}^{\pm} \neq \hat{x}$, we have $c'^T \hat{x}^{\pm} > c'^T \hat{x}$, so any strict convex combination of them has $c'$-value strictly greater than $c'^T \hat{x}$ and cannot equal $\hat{x}$).
($x$ is a BFS $\Rightarrow$ $\hat{x}$ is a vertex). Since $x$ is a BFS, there exists a basis $\mathcal{B}$ such that $x_B = B^{-1} b$ is the unique solution to $B x_B = b$. Suppose $\hat{x}$ is not a vertex; then it can be expressed as a strict convex combination of two distinct points $\hat{y}, \hat{z} \in P$, so that $\hat{y}, \hat{x}, \hat{z}$ are on the same line. The corresponding feasible solutions $y$ and $z$ must have $y_j = z_j = 0$ whenever $x_j = 0$, since otherwise one of $y_j, z_j$ would have to be less than zero, which is not allowed for a feasible solution. Since the nonzero components of $x$ are those corresponding to the basis $\mathcal{B}$, both $y$ and $z$ must equal $x$, as they are all solutions to $B x_B = b$. This contradicts $\hat{y} \neq \hat{z}$.
Degenerate BFS. A BFS is called degenerate if it has more than $n - m$ zeros, i.e., if some basic variable is zero. Obviously, when there are multiple bases for a single BFS, the BFS is degenerate.
Existence of an optimal BFS for an LP with bounded $F$. This is easy to see if we look at the polytope. For a given cost vector $c$, we have $c^T x = \hat{c}^T \hat{x} + z_0$ (where $\hat{x}$ is a vector in the corresponding polytope $P$) for some unique $\hat{c}$ and constant $z_0$, since the basic variables are affine functions of $\hat{x}$. For each fixed constant $K$, $\hat{c}^T \hat{x} = K$ defines a hyperplane in $\mathbb{R}^{n-m}$. Since $P$ is bounded and convex, $\hat{c}^T \hat{x}$ is minimized when $\hat{x}$ is on the boundary of $P$. Therefore there is at least one such $\hat{x}$ that is a vertex of $P$, and the corresponding $x$ is an optimal BFS.
— 1.4. The simplex algorithm —
So far we have seen that if there is an optimal solution to an LP problem, then there is an optimal BFS, as well as an optimal vertex in the corresponding polytope (if the feasible set is bounded). The basic idea behind the simplex algorithm is to move from one BFS to another and eventually get to the optimal one (we will get to how to obtain a first BFS at the end, since that itself uses the simplex algorithm). The rest of this section briefly explains the important ingredients of the simplex algorithm.
Moving from BFS to BFS. The operation of moving from one BFS to another is called pivoting. Suppose that we want to move from a BFS $x$ to a new BFS $x'$. Let $\mathcal{B}$ be the basis for $x$; we may assume that $B$ is an identity matrix (via elimination, which does not change the LP problem) and that its columns are the first $m$ columns of $A$. Then any column $A_j$ of $A$ not in $\mathcal{B}$ can be expressed as a linear combination of the basis columns, $A_j = \sum_{i=1}^{m} x_{ij} A_i$. Since $\sum_{i=1}^{m} x_i A_i = b$, we have for any $\theta$,

$$\sum_{i=1}^{m} (x_i - \theta x_{ij}) A_i + \theta A_j = b.$$

There are a couple of possibilities here. If some $x_{ij} > 0$, we can increase $\theta$ from zero until there is a first $k$ such that $x_k - \theta x_{kj} = 0$; that is, $\theta = \min_{i : x_{ij} > 0} x_i / x_{ij} = x_k / x_{kj}$. Then $A_k$ is the column to be replaced by $A_j$. The new basis is linearly independent since $x_{kj} \neq 0$. The other case is that all $x_{ij} \le 0$. This means that we may increase $\theta$ to be arbitrarily large and still have a feasible solution, which means that $F$ is unbounded.
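The "first $k$" computation is the ratio test. A minimal sketch, assuming the basis columns have already been eliminated to the identity (names are mine):

```python
import numpy as np

def ratio_test(x_B, col_j, tol=1e-12):
    """Given current basic values x_B and the entering column's coefficients
    x_{ij}, return (k, theta) for the leaving row, or None if all
    coefficients are <= 0 (theta can grow forever, so F is unbounded)."""
    rows = np.where(col_j > tol)[0]
    if rows.size == 0:
        return None                  # unboundedness detected
    ratios = x_B[rows] / col_j[rows]
    k = rows[np.argmin(ratios)]      # first blocking basic variable
    return k, x_B[k] / col_j[k]
```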
Choosing a profitable column $A_j$. For each candidate column $A_j$ not in the basis, we may calculate the corresponding new BFS $x'$ (if some $x_{ij} > 0$) and then evaluate the cost $c^T x'$. If we get a decrease in cost compared to $c^T x$, then $A_j$ is profitable. As we examine all the adjacent BFS's, there are three possibilities (a sketch of the standard reduced-cost computation follows the list):
- The cost can be made arbitrarily low when some column $A_j$ is chosen, suggesting that the LP problem has $-\infty$ as its lowest cost. This can only happen if $F$ is unbounded. In fact, this can be used to detect whether a given feasible set $F$ is unbounded: we simply let $c = (-1, \dots, -1)^T$, so that the cost $-\sum_j x_j$ is unbounded below exactly when $F$ is unbounded.
- No new BFS is profitable, so the current BFS is the optimal one.
- Some new BFS $x'$ gives a better cost.
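In practice one does not form every adjacent BFS; instead one checks the reduced costs. A hedged sketch (the helper is hypothetical, assuming NumPy):

```python
import numpy as np

def reduced_costs(A, c, basis):
    """Compute c_j - c_B^T B^{-1} A_j for all columns; a negative entry
    marks a profitable entering column for a minimization problem."""
    B = A[:, basis]
    y = np.linalg.solve(B.T, c[basis])  # simplex multipliers y = B^{-T} c_B
    return c - A.T @ y                  # zero on the basic columns
```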
The issue of cycling. As we move from BFS to BFS, there is the potential problem of returning to a BFS we have already visited. Note that this can only happen if the BFS is degenerate. That is, cycling can only happen while we stay at the same vertex of the corresponding polytope $P$: from a vertex that is not optimal, there is always an adjacent vertex of better cost, or the cost is unbounded below along an edge leaving the current vertex (in which case the algorithm terminates). Once we move to a new BFS with better cost, we will never return to the originating BFS.

There are various methods to prevent cycling, and they are of practical importance. However, since the known deterministic pivoting rules all have exponential worst-case behavior, the naive method of remembering all bases used at the current BFS is, in theory, no worse than these methods.
Getting a BFS to start with. We have now established that, given a BFS to start with, we will find the optimal BFS if it exists. To begin the algorithm, however, we must first obtain a BFS. To do this, we first flip the sign of any equation in $Ax = b$ as necessary so that we have $b \ge 0$. We then add $m$ artificial variables $y = (y_1, \dots, y_m)^T$, giving the system $Ax + y = b$, and use the temporary cost vector $c' = (0, \dots, 0, 1, \dots, 1)^T$ that charges only the artificial variables (we only want an arbitrary BFS of the original $F$, which does not depend on the actual cost vector $c$). For the new system, $(x, y) = (0, b)$ is clearly a BFS. We then drive the cost $\sum_i y_i$ to zero, which will happen as long as the original $F$ is not empty; at that point we must have $y = 0$, so the first $n$ components of the solution form a BFS for the original system. We can then return to the original system.
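A small sketch of this phase-I setup (assuming NumPy; the conventions are mine):

```python
import numpy as np

def phase_one_setup(A, b):
    """Build min 1^T y s.t. Ax + y = b, x >= 0, y >= 0. Rows are flipped
    so that b >= 0, making (x, y) = (0, b) a valid starting BFS."""
    A = np.asarray(A, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    flip = b < 0
    A[flip] *= -1
    b[flip] *= -1
    m, n = A.shape
    A1 = np.hstack([A, np.eye(m)])         # artificial columns form a basis
    c1 = np.concatenate([np.zeros(n), np.ones(m)])
    x0 = np.concatenate([np.zeros(n), b])  # starting BFS: y = b
    return A1, b, c1, x0
```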
Geometric aspects of pivoting. In the above discussion, we have already hinted at what happens in the polytope as we move from BFS to BFS. One of two things happens during such a move: 1. we move from a vertex of $P$ to a new vertex of better cost; 2. the BFS only changes basis (the solution vector $x$ remains the same), in which case we stay at the same vertex of $P$.
Final word on the simplex algorithm. The importance of this theory is that we can use essentially any of the numerous pseudocode listings or full-blown packages for the simplex algorithm to solve any variation of the LP problems discussed in this chapter, without worrying that they will produce different results; they are essentially the same, the only difference being relative speed. We note that the known deterministic pivoting rules for the simplex algorithm all have worst-case time complexity exponential in the number of variables.
— 1.5. Primal-dual formulation and its basic properties —
As with many mathematical problems, it is possible to define problems "dual" to an LP problem in its various forms. The study of this aspect of LP problems, besides revealing interesting structural properties, has many implications and applications. The primal-dual relationship for LP in general form is defined as

$$\begin{array}{ll@{\qquad}ll} \textbf{Primal:} & \min \ c^T x & \textbf{Dual:} & \max \ \pi^T b \\ & a_i^T x = b_i, \ i \in M & & \pi_i \ \text{free}, \ i \in M \\ & a_i^T x \ge b_i, \ i \in M' & & \pi_i \ge 0, \ i \in M' \\ & x_j \ge 0, \ j \in N & & \pi^T A_j \le c_j, \ j \in N \\ & x_j \ \text{free}, \ j \in N' & & \pi^T A_j = c_j, \ j \in N' \end{array} \ \ \ \ \ (14)$$

If we view the standard LP as a special case of general LP ($M' = \emptyset$, $N' = \emptyset$), then we have the following:

$$\text{Primal:} \ \min \ c^T x \ \text{s.t.} \ Ax = b, \ x \ge 0; \qquad \text{Dual:} \ \max \ \pi^T b \ \text{s.t.} \ \pi^T A \le c^T. \ \ \ \ \ (15)$$

Equivalently, we can express it in a more familiar way:

$$\min \{ c^T x : Ax = b, \ x \ge 0 \} \quad \longleftrightarrow \quad \max \{ b^T \pi : A^T \pi \le c \}. \ \ \ \ \ (16)$$

For the canonical form ($M = \emptyset$, $N' = \emptyset$), we have

$$\min \{ c^T x : Ax \ge b, \ x \ge 0 \} \quad \longleftrightarrow \quad \max \{ b^T \pi : A^T \pi \le c, \ \pi \ge 0 \}. \ \ \ \ \ (17)$$
Property: The dual of the dual is the primal. This is clear by observing that we may express the dual problem in (14) as a general-form minimization

$$\min_{\pi} \ (-b)^T \pi \quad \text{s.t.} \quad (-A_j)^T \pi \ge -c_j, \ j \in N; \quad (-A_j)^T \pi = -c_j, \ j \in N'; \quad \pi_i \ge 0, \ i \in M'. \ \ \ \ \ (18)$$

Taking the dual of (18) according to the definition in (14) yields the primal. Since the standard/canonical form primal-dual formulations are special cases of the general form, the same property holds for (16) and (17). For the rest of the section we work mainly with the standard form.
Property: Weak duality. Weak duality says that for any feasible solutions $x$ and $\pi$ to the primal and dual LP problems respectively, the following holds:

$$\pi^T b \le \pi^T A x \le c^T x. \ \ \ \ \ (19)$$

The first inequality is obtained by multiplying $\pi^T$ on both sides of $Ax \ge b$ (in the standard form, $Ax = b$ and it holds with equality), and the second by multiplying $x$ on both sides of $\pi^T A \le c^T$, using $x \ge 0$.
Property: Strong duality. It turns out that equality holds throughout (19) when $x$ and $\pi$ are optimal solutions of the respective primal and dual LP problems. This is called strong duality. To show this we will need Farkas' lemma, which is based on the separating hyperplane theorem: given two convex sets that do not intersect non-trivially (their intersection is at most a single point), there exists a hyperplane such that no two points $y_1, y_2$ of the two sets lie in the same open halfspace created by the hyperplane. The variant of Farkas' lemma we use is the following.
Theorem 1 (Farkas' lemma) Let $A$ be an $m \times n$ matrix and $b \in \mathbb{R}^m$. Exactly one of the following holds:

- (a) There exists $x \ge 0$ such that $Ax = b$;
- (b) There exists $y \in \mathbb{R}^m$ such that $y^T A \ge 0^T$ and $y^T b < 0$.
Proof. We need to show that (a) and (b) cannot be true at the same time, and that $\neg$(a) $\Rightarrow$ (b). Note that "$Ax = b$ for some $x \ge 0$" can be interpreted as "$b$ is a linear combination of the columns $A_1, \dots, A_n$ of $A$ with nonnegative coefficients $x_j$: $\sum_{j=1}^{n} x_j A_j = b$".
((a), (b) are exclusive). The two conditions cannot hold at the same time; otherwise we would have $y^T b = y^T A x \ge 0$ (since $y^T A \ge 0^T$ and $x \ge 0$), while (b) requires $y^T b < 0$; the same quantity cannot carry both signs.
($\neg$(a) $\Rightarrow$ (b)). We are now left to show that if the first condition fails, the second must be true. Note that we may view the columns of $A$ as $n$ vectors in $\mathbb{R}^m$. Then the set of $Ax$ over all $x \ge 0$ is simply the cone $C$ generated by these vectors, which is convex and contains the zero vector. By assumption, the vector $b$, which is simply a point in $\mathbb{R}^m$ and thus a convex set by itself, does not lie in $C$. Moreover, if we take the ray $R$ from the origin in the direction of $b$, this is again convex and does not intersect $C$ non-trivially (they share only the origin, since a cone containing a positive multiple of $b$ would contain $b$ itself). Thus, there exists a hyperplane that separates $C$ and $R$. Clearly the hyperplane must pass through the origin, and we can pick its normal $y$, pointing to the side of $C$, such that $y^T b < 0$. For this $y$, we have $y^T A_j \ge 0$ for each column $A_j$ of $A$.
We are now ready to prove the strong duality theorem. Assume that $\pi$ is an optimal solution to the dual. Note that we may express the dual constraints as $\pi^T A_j \le c_j$, $j = 1, \dots, n$. Let $J$ be the set of (column) indices such that $\pi^T A_j = c_j$ for $j \in J$. That is, $J$ is the set of indices of the constraints that are active. This set is not empty, as otherwise $\pi$ would be an interior point of the corresponding polyhedral representation of the feasible set and could not be optimal. We see that there cannot be a vector $d$ such that $d^T b > 0$ and $d^T A_j \le 0$ for $j \in J$; otherwise $\pi + \theta d$ would be a better feasible solution for small $\theta > 0$, since the constraints outside $J$ are not binding and a small perturbation of $\pi$ will not violate them. By Farkas' lemma, the other alternative must hold: for the set of $A_j$'s with $j \in J$, there exists a set of $x_j \ge 0$'s such that $\sum_{j \in J} x_j A_j = b$. If we let $x_j = 0$ for $j \notin J$, we have $Ax = b$ and $x \ge 0$, making $x$ a feasible solution to the primal problem. We then have

$$c^T x = \sum_{j \in J} c_j x_j = \sum_{j \in J} (\pi^T A_j) x_j = \pi^T A x = \pi^T b,$$

so by weak duality $x$ is optimal for the primal and the two optimal costs coincide.
An immediate consequence of the strong duality property is that when the primal problem has an optimal solution, the dual must also have an optimal solution and cannot be unbounded or infeasible; the same holds with primal and dual interchanged. From weak duality, we know that it is not possible for both the primal and the dual to be unbounded. For the same reason, if the primal/dual is unbounded, then the corresponding dual/primal must be infeasible. It is also possible for both the primal and the dual to be infeasible; examples can be found in the book.
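These relationships are easy to check numerically. A small sketch using SciPy's `linprog` (assuming SciPy is available; the instance is my own):

```python
import numpy as np
from scipy.optimize import linprog

# primal (canonical form): min c^T x  s.t.  A x >= b, x >= 0
A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([2.0, 3.0])

# linprog expects A_ub x <= b_ub, so negate the >= rows
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)

# dual: max pi^T b s.t. A^T pi <= c, pi >= 0, i.e. min -b^T pi
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2)

print(primal.fun, -dual.fun)  # the two optimal values agree (strong duality)
```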
Property: Complementary slackness. Another direct consequence of strong duality is complementary slackness. From (19) we observe that the following holds for feasible $x$ and $\pi$:

$$\pi^T (Ax - b) \ge 0 \quad \text{and} \quad (c^T - \pi^T A)\, x \ge 0,$$

with both equal to zero iff $x, \pi$ are optimal, by the strong duality property (their sum is $c^T x - \pi^T b$, which vanishes exactly at optimality). Complementary slackness holds for the other primal-dual formulations as well, as a direct consequence of strong duality.