Linear and Integer Optimization - Theory and Practice, 3rd Ed, 2015
Presenting a strong and clear relationship between theory and practice, Linear and
Integer Optimization: Theory and Practice is divided into two main parts. The first
part covers the theory of linear and integer optimization, including basic topics such as
Dantzig's simplex algorithm, duality, sensitivity analysis, integer optimization models,
and network models.
More advanced topics are also presented, including interior point algorithms, the branch-
and-bound algorithm, cutting planes, complexity, standard combinatorial optimization
models, the assignment problem, minimum cost flow, and the maximum flow/minimum
cut theorem.
The second part applies theory through real-world case studies. The authors discuss
advanced techniques such as column generation, multiobjective optimization, dynamic
optimization, machine learning (support vector machines), combinatorial optimization,
approximation algorithms, and game theory.
Besides the fresh new layout and completely redesigned figures, this new edition
incorporates modern examples and applications of linear optimization. The book now
includes computer code in the form of models in the GNU Mathematical Programming
Language (GMPL). The models and corresponding data files are available for download
and can be readily solved using the provided online solver.
This new edition also contains appendices covering mathematical proofs, linear algebra,
graph theory, convexity, and nonlinear optimization. All chapters contain extensive
examples and exercises.
This textbook is ideal for courses for advanced undergraduate and graduate students
in various fields including mathematics, computer science, industrial engineering,
operations research, and management science.
LINEAR AND INTEGER
OPTIMIZATION
Theory and Practice
Third Edition
Advances in Applied Mathematics
Published Titles
Green’s Functions with Applications, Second Edition Dean G. Duffy
Linear and Integer Optimization: Theory and Practice, Third Edition
Gerard Sierksma and Yori Zwols
Markov Processes James R. Kirkwood
Pocket Book of Integrals and Mathematical Formulas, 5th Edition Ronald J. Tallarida
Stochastic Partial Differential Equations, Second Edition Pao-Liu Chow
Gerard Sierksma
University of Groningen, The Netherlands
Yori Zwols
Google, London, United Kingdom
All code in this book is subject to the MIT open source license. See http://opensource.org/licenses/MIT.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the valid-
ity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or uti-
lized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopy-
ing, microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://
www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For
organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To Rita and Rouba
Contents
Preface xxv
Appendices
A Mathematical proofs 563
A.1 Direct proof · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 564
A.2 Proof by contradiction · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 565
A.3 Mathematical induction · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 565
D Convexity 597
D.1 Sets, continuous functions, Weierstrass’ theorem · · · · · · · · · · · · · · · · · · · 597
D.2 Convexity, polyhedra, polytopes, and cones · · · · · · · · · · · · · · · · · · · · · 600
Bibliography 639
List of Figures
2.2 Normal vectors pointing out of the feasible region of Model Dovetail∗ . · · · · · · 53
2.3 Vertices of the feasible region of Model Dovetail. · · · · · · · · · · · · · · · · · · 56
2.4 Extreme points and directions of F . · · · · · · · · · · · · · · · · · · · · · · · · · 56
2.5 The feasible region of a nonstandard LO-model without vertices. · · · · · · · · · · 58
2.6 Three-dimensional feasible region. · · · · · · · · · · · · · · · · · · · · · · · · · · 59
2.7 Zero-variable representation. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 67
2.8 Feasible and infeasible basic solutions. · · · · · · · · · · · · · · · · · · · · · · · · 67
2.9 Geometry-algebra relationships. · · · · · · · · · · · · · · · · · · · · · · · · · · · 68
2.10 The set BI and its complementary dual set BI c . · · · · · · · · · · · · · · · · · · 70
2.11 Degenerate vertex and redundant hyperplane. · · · · · · · · · · · · · · · · · · · · 76
2.12 Pyramid with a degenerate top. · · · · · · · · · · · · · · · · · · · · · · · · · · · 76
15.2 Owen points for different distributions of farmers of Type 1, 2, and 3. · · · · · · · 504
E.1 The functions f (x) = x² and h(x) = x̂² + 2x̂(x − x̂). · · · · · · · · · · · · · · · 612
E.2 The functions f (x) = |x| and h(x) = |x̂| + α(x − x̂). · · · · · · · · · · · · · · · 612
E.3 Global and local minimizers and maximizers of the function f (x) = cos(2πx)/x. · · · 615
E.4 Consumer choice model. · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 616
E.5 The feasible region of the model in Example E.4.1. · · · · · · · · · · · · · · · · · 621
E.6 Nonlinear optimization model for which the KKT conditions fail. · · · · · · · · · 623
List of Tables
11.1 Relative letter frequencies (in percentages) of several newspaper articles. · · · · · · 426
11.2 Validation results for the classifier. · · · · · · · · · · · · · · · · · · · · · · · · · · 432
The past twenty years have shown explosive growth in the applications of mathematical
algorithms. New and improved algorithms and theories have appeared, mainly in practice-oriented
scientific journals. Linear and integer optimization algorithms have established themselves
among the most-used techniques for quantitative decision support systems.
Many universities all over the world have used the first two editions of this book for their
linear optimization courses. The main reason for choosing this textbook is that it shows the
strong and clear relationships between theory and practice.
Linear optimization, also commonly called linear programming, can be described as the process
of transforming a real-life decision problem into a mathematical model, together with the
process of designing algorithms with which the mathematical model can be analyzed and
solved, resulting in a proposal that may support the procedure of solving the real-life problem.
The mathematical model consists of an objective function that has to be maximized or
minimized, and a finite number of constraints, where both the objective function and the
constraints are linear in a finite number of decision variables. If the decision variables are
restricted to be integer-valued, the linear optimization model is called an integer (linear)
optimization model.
Linear and integer optimization are among the most widely and successfully used decision
tools in the quantitative analysis of practical problems where rational decisions have to be
made. They form a main branch of management science and operations research, and they
are indispensable for the understanding of nonlinear optimization. Besides the fundamental
role that linear and integer optimization play in economic optimization problems, they
are also of great importance in strategic planning problems, in the analysis of algorithms,
in combinatorial problems, and in many other subjects. In addition to its many practical
applications, linear optimization has beautiful mathematical aspects. It blends algorithmic
and algebraic concepts with geometric ones.
Organization
This book starts at an elementary level. All concepts are introduced by means of simple
prototype examples which are extended and generalized to more realistic settings. The
reader is challenged to understand the concepts and phenomena by means of mathematical
arguments. So besides the insight into the practical use of linear and integer optimization,
the reader obtains a thorough knowledge of its theoretical background. With the growing
need for very specific techniques, the theoretical knowledge has become more and more
practically useful. It is very often not possible to apply standard techniques in practical
situations. Practical problems demand specific adaptations of standard models, which are
efficiently solvable only with a thorough mathematical understanding of the techniques.
The book consists of two parts. Part I covers the theory of linear and integer optimization.
It deals with basic topics such as Dantzig’s simplex algorithm, duality, sensitivity analysis,
integer optimization models, and network models, as well as more advanced topics such as
interior point algorithms, the branch-and-bound algorithm, cutting planes, and complex-
ity. Part II of the book covers case studies and more advanced techniques such as column
generation, multiobjective optimization, and game theory.
All chapters contain an extensive number of examples and exercises. The book contains five
appendices, a list of symbols, an author index, and a subject index. The literature list at the
end of the book contains the relevant literature, mostly from after 1990.
Examples, computer exercises, and advanced material are marked with icons in the margin.
Overview of Part I
In Chapter 1, the reader is introduced to linear optimization. The basic concepts of linear
optimization are explained, along with examples of linear optimization models. Chapter 2
introduces the mathematical theory needed to study linear optimization models. The ge-
ometry and the algebra, and the relationship between them, are explored in this chapter. In
Chapter 3, Dantzig’s simplex algorithm for solving linear optimization problems is devel-
oped. Since many current practical problems, such as large-scale crew scheduling problems,
may be highly degenerate, we pay attention to this important phenomenon. For instance,
its relationship with multiple optimal solutions and with shadow prices is discussed in de-
tail. This discussion is also indispensable for understanding the output of linear optimization
computer software. Chapter 4 deals with the crucial concepts of duality and optimality, and
Chapter 5 offers an extensive account of the theory and practical use of sensitivity analysis.
In Chapter 6, we discuss the interior path version of Karmarkar’s interior point algorithm
for solving linear optimization problems. Among all versions of Karmarkar’s algorithm, the
interior path algorithm is one of the most accessible and elegant. The algorithm determines
optimal solutions by following the so-called interior path through the (relative) interior
of the feasible region towards an optimal solution. Chapter 7 deals with integer linear opti-
mization, and discusses several solution techniques such as the branch-and-bound algorithm,
and Gomory’s cutting-plane algorithm. We also discuss algorithms for mixed-integer lin-
ear optimization models. Chapter 8 can be seen as an extension of Chapter 7; it discusses
the network simplex algorithm, with an application to the transshipment problem. It also
presents the maximum flow/minimum cut theorem as a special case of linear optimization
duality. Chapter 9 deals with computational complexity issues such as polynomial solvability
and NP-completeness. With the use of complexity theory, mathematical decision problems
can be partitioned into ‘easy’ and ‘hard’ problems.
Overview of Part II
The chapters in Part II of this book discuss a number of (more or less) real-life case studies.
These case studies reflect both the problem-analyzing and the problem-solving ability of
linear and integer optimization. We have written them in order to illustrate several advanced
modeling techniques, such as network modeling, game theory, and machine learning, as well
as specific solution techniques such as column generation and multiobjective optimization.
The specific techniques discussed in each chapter are listed in Table 0.1.
Acknowledgments
We are grateful to a few people who helped and supported us with this book — to Vašek
Chvátal, Peter van Dam, Shane Legg, Cees Roos, Gert Tijssen, and Theophane Weber, to
the LaTeX community at tex.stackexchange.com, and to our families without whose
support this book would not have been written.
Groningen and London, January 2015 Gerard Sierksma and Yori Zwols
Table 0.1: Dependency chart relating the case studies of Part II to the theory of Part I.
The case-study chapters are: Chapter 10 (Designing a reservoir), Chapter 11 (Classification),
Chapters 12 and 13 (Production planning), Chapter 17 (Helicopter scheduling), and
Chapter 18 (Catering problem). The theory chapters and sections are: Chapter 1
(Introduction), Chapter 2 (Geometry and algebra), Sections 3.1–3.8 (Simplex algorithm),
Section 3.9 (Revised simplex algorithm), Sections 4.1–4.4, except 4.3.4 (Duality),
Section 4.3.4 (Strong complementary slackness), Sections 4.6–4.7 (Dual simplex algorithm),
Sections 5.1–5.5 (Sensitivity analysis), Chapter 8 (Network optimization), and
Chapter 9 (Complexity theory). [Diagram not reproduced.]
Overview
In 1827, the French mathematician Jean-Baptiste Joseph Fourier (1768–1830) published
a method for solving systems of linear inequalities. This publication is usually seen as the
first account on linear optimization. In 1939, the Russian mathematician Leonid V. Kan-
torovich (1912–1986) gave linear optimization formulations of resource allocation problems.
Around the same time, the Dutch economist Tjalling C. Koopmans (1910–1985) formu-
lated linear optimization models for problems arising in classical, Walrasian (Léon Walras,
1834–1910), economics. In 1975, both Kantorovich and Koopmans received the Nobel
Prize in economic sciences for their work. During World War II, linear optimization mod-
els were designed and solved for military planning problems. In 1947, George B. Dantzig
(1914–2005) invented what he called the simplex algorithm. The discovery of the simplex
algorithm coincided with the rise of the computer, making it possible to computerize the
calculations and to use the method for solving large-scale real-life problems. Since then, linear
optimization has developed rapidly, both in theory and in application. At the end of the
1960s, the first software packages appeared on the market. Nowadays, linear optimization
problems with millions of variables and constraints can readily be solved.
Linear optimization is presently used in almost all industrial and academic areas of quantita-
tive decision making. For an extensive – but not exhaustive – list of fields of applications of
linear optimization, we refer to Section 1.6 and the case studies in Chapters 10–11. More-
over, the theory behind linear optimization forms the basis for more advanced nonlinear
optimization.
In this chapter, the basic concepts of linear optimization are discussed. We start with a
simple example of a so-called linear optimization model (abbreviated to LO-model) containing
two decision variables. An optimal solution of the model is determined by means of the
‘graphical method’. This simple example is used as a warming-up exercise for more realistic
cases and for the general form of an LO-model. We present a few LO-models that illustrate
Chapter 1. Basic concepts of linear optimization
the use of linear optimization, and that introduce some standard modeling techniques. We
also describe how to use an online linear optimization package to solve an LO-model.
Consider the company Dovetail, which produces boxes of long and short matches. Define
the decision variables:

x1 = the number of boxes (×100,000) of long matches to be made the next year,
x2 = the number of boxes (×100,000) of short matches to be made the next year.
The company makes a profit of 3 (×$1,000) for every 100,000 boxes of long matches, which
means that for x1 (×100,000) boxes of long matches, the profit is 3x1 (×$1,000). Similarly,
for x2 (×100,000) boxes of short matches the profit is 2x2 (×$1,000). Since Dovetail aims
at maximizing its profit, and it is assumed that Dovetail can sell its full production, the
objective of Dovetail is:

max 3x1 + 2x2 .
The function 3x1 + 2x2 is called the objective function of the problem. It is a function of the
decision variables x1 and x2 . If we only consider the objective function, it is obvious that
the production of matches should be as high as possible. However, the company also
has to take into account a number of constraints. First, the machine capacity is 9 (×100,000)
boxes per year. This yields the constraint:
x1 + x2 ≤ 9. (1.1)
Third, the numbers of available boxes for long and short matches are restricted, which means
that x1 and x2 have to satisfy:
x1 ≤ 7, (1.3)
and x2 ≤ 6. (1.4)
The inequalities (1.1) – (1.4) are called technology constraints. Finally, we assume that only
nonnegative amounts can be produced, i.e.,
x1 , x2 ≥ 0.
The inequalities x1 ≥ 0 and x2 ≥ 0 are called nonnegativity constraints. Taking together the
six expressions formulated above, we obtain Model Dovetail:
Model Dovetail.

max  3x1 + 2x2
s.t.  x1 +  x2 ≤  9     (1.1)
     3x1 +  x2 ≤ 18     (1.2)
      x1       ≤  7     (1.3)
            x2 ≤  6     (1.4)
      x1 , x2 ≥ 0.
In this model ‘s.t.’ means ‘subject to’. Model Dovetail is an example of a linear optimization
model. We will abbreviate ‘linear optimization model’ as ‘LO-model’. The term ‘linear’
refers to the fact that the objective function and the constraints are linear functions of the
decision variables x1 and x2 . In the next section we will determine an optimal solution (also
called optimal point) of Model Dovetail, which means that we will determine values of x1 and
x2 satisfying the constraints of the model, and such that the value of the objective function
is maximum for these values.
LO-models are often called ‘LP-models’, where ‘LP’ stands for linear programming. The word
‘programming’ in this context is an old-fashioned word for optimization, and has nothing
to do with the modern meaning of programming (as in ‘computer programming’). We
therefore prefer to use the word ‘optimization’ to avoid confusion.
Figure 1.1: Nonnegativity constraints. Figure 1.2: The feasible region of Model Dovetail.
Figure 1.1 shows on which side of the line x1 + x2 = 9 the points satisfying the constraint
x1 + x2 ≤ 9 are located. Figure 1.2 is obtained by doing this for all constraints. We end up
with the region 0v1v2v3v4, which is called the feasible region of the model; it contains the
points (x1, x2)ᵀ that satisfy the constraints of the model. The points 0, v1, v2, v3, and v4 are
called the vertices of the feasible region. It can easily be calculated that:

v1 = (6, 0)ᵀ,  v2 = (4½, 4½)ᵀ,  v3 = (3, 6)ᵀ,  and  v4 = (0, 6)ᵀ.
In Figure 1.2, we also see that constraint (1.3) can be deleted without changing the feasible
region. Such a constraint is called redundant with respect to the feasible region. On the other
hand, there are reasons for keeping this constraint in the model. For example, when the
right hand side of constraint (1.2) is sufficiently increased (thereby moving the line in Figure
1.2 corresponding to (1.2) to the right), constraint (1.3) becomes nonredundant again. See
Chapter 5.
Next, we determine the points in the feasible region that attain the maximum value of the
objective function. To that end, we draw in Figure 1.3 a number of so-called level lines.
A level line is a line for which all points (x1, x2)ᵀ on it have the same value of the objective
function. In Figure 1.3, five level lines are drawn, namely 3x1 + 2x2 = 0, 6, 12, 18, and
24. The arrows in Figure 1.3 point in the direction of increasing values of the objective
function 3x1 + 2x2 . These arrows are in fact perpendicular to the level lines.
In order to find an optimal solution using Figure 1.3, we start with a level line corresponding
to a small objective value (e.g., 6) and then (virtually) ‘move’ it in the direction of the arrows,
so that the values of the objective function increase. We stop moving the level line when it
reaches the boundary of the feasible region, so that moving the level line any further would
mean that no point of it would lie in the region 0v1 v2 v3 v4 . This happens for the level line
3x1 + 2x2 = 22½. This level line intersects the feasible region at exactly one point, namely
(4½, 4½)ᵀ. Hence, the optimal solution is x1∗ = 4½, x2∗ = 4½, and the optimal objective value
1.2. Definition of an LO-model
Figure 1.3: Level lines and the feasible region. Figure 1.4: Three-dimensional picture.
is 22½. Note that the optimal point is a vertex of the feasible region. This fact plays a crucial
role in linear optimization; see Section 2.1.2. Also note that this is the only optimal point.
In Figure 1.4, the same model is depicted in three-dimensional space. The values of z =
3x1 + 2x2 on the region 0v1v2v3v4 form the region 0v1′v2′v3′v4′. From Figure 1.4 it is
obvious that the point v2 with coordinate values x1 = 4½ and x2 = 4½ is the optimal
solution. At v2, the value of the objective function is z∗ = 22½, which means that the
maximum profit is $22,500. This profit is achieved by producing 450,000 boxes of long
matches and 450,000 boxes of short matches.
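The graphical computation above can be verified numerically. The following sketch (plain Python, no LP library; the function names are our own) enumerates the intersection points of all pairs of constraint boundaries of Model Dovetail, keeps the feasible ones, and selects the one with the largest objective value. This relies on the fact, noted above, that an optimal solution is attained at a vertex, and on the feasible region being bounded.

```python
from itertools import combinations

# Constraints of Model Dovetail, each written as a*x1 + b*x2 <= c.
# The last two rows encode the nonnegativity constraints x1, x2 >= 0.
CONSTRAINTS = [
    (1, 1, 9),    # (1.1) machine capacity
    (3, 1, 18),   # (1.2)
    (1, 0, 7),    # (1.3)
    (0, 1, 6),    # (1.4)
    (-1, 0, 0),   # x1 >= 0
    (0, -1, 0),   # x2 >= 0
]

def is_feasible(x1, x2, eps=1e-9):
    """Check that (x1, x2) satisfies every constraint (up to a tolerance)."""
    return all(a * x1 + b * x2 <= c + eps for (a, b, c) in CONSTRAINTS)

def vertices():
    """Intersect every pair of constraint boundary lines; keep feasible points."""
    pts = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:          # parallel boundary lines: no intersection
            continue
        x1 = (c1 * b2 - c2 * b1) / det   # Cramer's rule for the 2x2 system
        x2 = (a1 * c2 - a2 * c1) / det
        if is_feasible(x1, x2):
            pts.append((x1, x2))
    return pts

best = max(vertices(), key=lambda p: 3 * p[0] + 2 * p[1])
print(best, 3 * best[0] + 2 * best[1])   # (4.5, 4.5) and objective value 22.5
```

This brute-force enumeration is only a check for tiny models; the simplex algorithm of Chapter 3 visits vertices far more selectively.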
max c1 x1 + . . . + cn xn , or min c1 x1 + . . . + cn xn ,
respectively. In the case of Model Dovetail the objective is max 3x1 + 2x2 and the
objective function is 3x1 + 2x2 . The value of the objective function at a point x is
called the objective value at x.
▶ Technology constraints. A technology constraint of an LO-model is either a ‘≤’, a
‘≥’, or an ‘=’ expression of the form:

ai1 x1 + . . . + ain xn (≤, ≥, =) bi ,

where (≤, ≥, =) means that either the sign ‘≤’, or ‘≥’, or ‘=’ holds. The entry aij is the
coefficient of the j ’th decision variable xj in the i’th technology constraint. Let m be
the number of technology constraints. All (left hand sides of the) technology constraints
are linear functions of the decision variables x1 , . . . , xn .
▶ Nonnegativity and nonpositivity constraints. A nonnegativity constraint of an LO-
model is an inequality of the form xi ≥ 0; similarly, a nonpositivity constraint is of the
form xi ≤ 0. It may also happen that a variable xi is not restricted by a nonnegativity
constraint or a nonpositivity constraint. In that case, we say that xi is a free or unrestricted
variable. Although nonnegativity and nonpositivity constraints can be written in the
form of a technology constraint, we will usually write them down separately.
For i ∈ {1, . . . , m} and j ∈ {1, . . . , n}, the real-valued entries aij , bi , and cj are called
the parameters of the model. The technology constraints, nonnegativity and nonpositivity
constraints together are referred to as the constraints (or restrictions) of the model.
A vector x ∈ Rn that satisfies all constraints is called a feasible point or feasible solution of
the model. The set of all feasible points is called the feasible region of the model. An LO-
model is called feasible if its feasible region is nonempty; otherwise, it is called infeasible. An
optimal solution of a maximizing (minimizing) LO-model is a point in the feasible region with
maximum (minimum) objective value, i.e., a point such that there is no other point with a
larger (smaller) objective value. Note that there may be more than one optimal solution, or
none at all. The objective value at an optimal solution is called the optimal objective value.
Let x ∈ Rn . A constraint is called binding at the point x if it holds with equality at x. For
example, in Figure 1.2, the constraints (1.1) and (1.2) are binding at the point v2 , and the
other constraints are not binding. A constraint is called violated at the point x if it does not
hold at x. So, if one or more constraints are violated at x, then x does not lie in the feasible
region.
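These definitions translate directly into code. A minimal sketch (plain Python; the function name and tolerance are our own choices) classifies a ‘≤’ constraint aᵀx ≤ b at a point x as binding, satisfied with slack, or violated:

```python
def status(a, b, x, eps=1e-9):
    """Classify the constraint a[0]*x[0] + ... + a[n-1]*x[n-1] <= b at point x."""
    lhs = sum(ai * xi for ai, xi in zip(a, x))
    if lhs > b + eps:
        return "violated"
    if abs(lhs - b) <= eps:
        return "binding"        # holds with equality at x
    return "satisfied"          # holds with room to spare

# At the optimal vertex v2 = (4.5, 4.5) of Model Dovetail:
v2 = (4.5, 4.5)
print(status((1, 1), 9, v2))    # "binding": constraint (1.1)
print(status((3, 1), 18, v2))   # "binding": constraint (1.2)
print(status((1, 0), 7, v2))    # "satisfied": constraint (1.3)
```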
A maximizing LO-model with only ‘≤’ technology constraints and nonnegativity con-
straints can be written as follows:
max  c1 x1  + . . . +  cn xn
s.t. a11 x1 + . . . + a1n xn ≤ b1
       ⋮                ⋮         ⋮
     am1 x1 + . . . + amn xn ≤ bm
     x1 , . . . , xn ≥ 0.
Using the summation sign ‘Σ’, this can also be written as:

max  Σ_{j=1}^{n} cj xj
s.t. Σ_{j=1}^{n} aij xj ≤ bi   for i = 1, . . . , m
     x1 , . . . , xn ≥ 0.
In terms of matrices there is an even shorter notation. The superscript ‘T’ transposes a row
vector into a column vector, and an (m, n) matrix into an (n, m) matrix (m, n ≥ 1). Let
c = [c1 . . . cn]ᵀ ∈ Rn ,   b = [b1 . . . bm]ᵀ ∈ Rm ,   x = [x1 . . . xn]ᵀ ∈ Rn ,  and

    [ a11 . . . a1n ]
A = [   ⋮        ⋮  ] ∈ Rm×n .
    [ am1 . . . amn ]
The matrix A is called the technology matrix (or coefficients matrix), c is the objective vector, and
b is the right hand side vector of the model. The LO-model can now be written as:
max { cᵀx | Ax ≤ b, x ≥ 0 },
where 0 ∈ Rn is the n-dimensional all-zero vector. We call this form the standard form of an
LO-model (see also Section 1.3). It is a maximizing model with ‘≤’ technology constraints,
and nonnegativity constraints. The feasible region F of the standard LO-model satisfies:
F = {x ∈ Rn | Ax ≤ b, x ≥ 0}.
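As an illustration, membership of F can be tested mechanically. The sketch below (plain Python, no matrix library; the data are those of Model Dovetail and the function name is our own) checks whether a given point lies in F = {x | Ax ≤ b, x ≥ 0}:

```python
A = [[1, 1],
     [3, 1],
     [1, 0],
     [0, 1]]          # technology matrix of Model Dovetail
b = [9, 18, 7, 6]     # right hand side vector
c = [3, 2]            # objective vector

def in_feasible_region(x, eps=1e-9):
    """Check x ∈ F = {x | Ax <= b, x >= 0}, row by row."""
    if any(xi < -eps for xi in x):           # a nonnegativity constraint fails
        return False
    for row, bi in zip(A, b):                # each technology constraint
        if sum(aij * xj for aij, xj in zip(row, x)) > bi + eps:
            return False
    return True

print(in_feasible_region([4.5, 4.5]))   # True: the optimal vertex
print(in_feasible_region([7.0, 3.0]))   # False: violates (1.1) and (1.2)
```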
Figure 1.5: The feasible region of Model Dovetail∗.
Suppose that we want to add the following additional constraint to Model Dovetail. The
manager of Dovetail has an agreement with retailers to deliver a total of at least 500,000
boxes of matches next year. Using our decision variables, this yields the new constraint:
x1 + x2 ≥ 5. (1.5)
Instead of a ‘≤’ sign, this inequality contains a ‘≥’ sign. With this additional constraint,
Figure 1.3 changes into Figure 1.5. From Figure 1.5 one can graphically derive that the
optimal solution (x1∗ = x2∗ = 4½) is not affected by adding the new constraint (1.5).
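This graphical conclusion is easy to double-check numerically: the old optimal point still satisfies the new constraint, and since adding a constraint never enlarges the feasible region, that point must remain optimal. A minimal sketch in plain Python (not from the book):

```python
# Optimal vertex of Model Dovetail, found earlier by the graphical method.
v2 = (4.5, 4.5)

# New constraint (1.5): x1 + x2 >= 5.
still_feasible = v2[0] + v2[1] >= 5
print(still_feasible)   # True, so v2 remains optimal for Model Dovetail*
```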
Including constraint (1.5) in Model Dovetail, yields Model Dovetail∗ :
Model Dovetail∗.

max  3x1 + 2x2
s.t.  x1 +  x2 ≤  9     (1.1)
     3x1 +  x2 ≤ 18     (1.2)
      x1       ≤  7     (1.3)
            x2 ≤  6     (1.4)
      x1 +  x2 ≥  5     (1.5)
      x1 , x2 ≥ 0.
In order to write this model in the standard form, the ‘≥’ constraint has to be transformed
into a ‘≤’ constraint. This can be done by multiplying both sides of it by −1. Hence,
x1 + x2 ≥ 5 then becomes −x1 − x2 ≤ −5. Therefore, the standard form of Model
Dovetail∗ is:

                     [  1   1 ]        [  9 ]
                     [  3   1 ] [x1]   [ 18 ]        [x1]   [0]
max  3x1 + 2x2 s.t.  [  1   0 ] [x2] ≤ [  7 ],       [x2] ≥ [0].
                     [  0   1 ]        [  6 ]
                     [ −1  −1 ]        [ −5 ]
Similarly, a constraint with ‘=’ can be put into standard form by replacing it by two ‘≤’
constraints. For instance, 3x1 − 8x2 = 11 can be replaced by 3x1 − 8x2 ≤ 11 and
−3x1 + 8x2 ≤ −11. Also, the minimizing LO-model min{ cᵀx | Ax ≤ b, x ≥ 0 } can
be written in standard form, since

min{ cᵀx | Ax ≤ b, x ≥ 0 } = − max{ −cᵀx | Ax ≤ b, x ≥ 0 };
Consider again the machine capacity constraint of Model Dovetail:

x1 + x2 ≤ 9. (1.1)
This constraint expresses the fact that the machine can produce at most 9 (× 100,000) boxes
per year. We may wonder whether there is excess machine capacity (overcapacity) in the case
of the optimal solution. For that purpose, we introduce an additional nonnegative variable
x3 in the following way:
x1 + x2 + x3 = 9.
The variable x3 is called the slack variable of constraint (1.1). Its optimal value, called the
slack, measures the unused capacity of the machine. By requiring that x3 is nonnegative, we
can avoid the situation that x1 + x2 > 9, which would mean that the machine capacity is
exceeded and the constraint x1 + x2 ≤ 9 is violated. If, at the optimal solution, the value
of x3 is zero, then the machine capacity is completely used. In that case, the constraint is
binding at the optimal solution.
Introducing slack variables for all constraints of Model Dovetail, we obtain the following
model:
In this model, x3 , x4 , x5 , and x6 are the nonnegative slack variables of the constraints (1.1),
(1.2), (1.3), and (1.4), respectively. The number of slack variables is therefore equal to the
number of inequality constraints of the model. In matrix notation the model becomes:
                     [ 1 1 1 0 0 0 ] [x1]   [  9 ]
                     [ 3 1 0 1 0 0 ] [x2]   [ 18 ]
max  3x1 + 2x2 s.t.  [ 1 0 0 0 1 0 ] [x3] = [  7 ],   x1 , . . . , x6 ≥ 0.
                     [ 0 1 0 0 0 1 ] [x4]   [  6 ]
                                     [x5]
                                     [x6]
If Im denotes the identity matrix with m rows and m columns (m ≥ 1), then the general
form of an LO-model with slack variables can be written as:

max { cᵀx | [A  Im] [x; xs] = b, x ≥ 0, xs ≥ 0 },

where [x; xs] denotes the vector x extended with the vector xs of slack variables.
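The slack values at a given point are simply b − Ax. The following sketch (plain Python; the Model Dovetail data are restated and the function name is our own) computes them at the optimal vertex; the zero slacks identify the binding constraints (1.1) and (1.2):

```python
A = [[1, 1], [3, 1], [1, 0], [0, 1]]   # technology matrix of Model Dovetail
b = [9, 18, 7, 6]                      # right hand sides

def slacks(x):
    """Slack variable values b_i - (row_i . x), one per '<=' constraint."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(A, b)]

print(slacks([4.5, 4.5]))   # [0.0, 0.0, 2.5, 1.5]
```

So x3 = x4 = 0 (no unused machine capacity, constraints binding), while x5 = 2.5 and x6 = 1.5 measure the unused box capacity.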
If the objective function of Model Dovetail is replaced by max x1 + x2 ,
then all points on the line segment v2 v3 (see Figure 1.2) have the same optimal objective
value, namely 9, and therefore all points on the line segment v2 v3 are optimal. In this case,
we say that there are multiple optimal solutions; see also Section 3.7 and Section 5.6.1. The
feasible region has two optimal vertices, namely v2 and v3 .
Three types of feasible regions can be distinguished, namely:
Figure 1.6: (a) Bounded model, bounded feasible region. (b) Unbounded model, unbounded
feasible region. (c) Bounded model, unbounded feasible region.
▶ Feasible region bounded and nonempty. A feasible region is called bounded if all
decision variables are bounded on the feasible region (i.e., no decision variable can take
on arbitrarily large values on the feasible region). An example is drawn in Figure 1.6(a). If
the feasible region is bounded, then the objective values are also bounded on the feasible
region and hence an optimal solution exists. Note that the feasible region of Model
Dovetail is bounded; see Figure 1.2.
▶ Feasible region unbounded. A nonempty feasible region is called unbounded if it is not
bounded; i.e., at least one of the decision variables can take on arbitrarily large values on
the feasible region. Examples of an unbounded feasible region are shown in Figure 1.6(b)
and Figure 1.6(c). Whether an optimal solution exists depends on the objective function.
For example, in the case of Figure 1.6(b) an optimal solution does not exist. Indeed, the
objective function takes on arbitrarily large values on the feasible region. Therefore, the
model has no optimal solution. An LO-model with an objective function that takes on
arbitrarily large values is called unbounded; it is called bounded otherwise. On the other
hand, in Figure 1.6(c), an optimal solution does exist. Hence, this is an example of an
LO-model with an unbounded feasible region, but with a (unique) optimal solution.
▶ Feasible region empty. In this case we have that F = ∅ and the LO-model is called
infeasible. For example, if an LO-model contains the (contradictory) constraints x1 ≥ 6
and x1 ≤ 3, then its feasible region is empty. If an LO-model is infeasible, then it has
no feasible points and, in particular, no optimal solution. If F 6= ∅, then the LO-model
is called feasible.
So, an LO-model either has an optimal solution, or it is infeasible, or it is unbounded.
Note that an unbounded LO-model necessarily has an unbounded feasible region, but the
converse is not true. In fact, Figure 1.6(c) shows an LO-model that is bounded, although it
has an unbounded feasible region.
12  Chapter 1. Basic concepts of linear optimization
with A ∈ Rm×n . We call this form the standard form of an LO-model. The standard form
is characterized by a maximizing objective, ‘≤’ technology constraints, and nonnegativity
constraints.
In general, many different forms may be encountered, for instance with both '≥' and '≤'
technology constraints, and both nonnegativity (xi ≥ 0) and nonpositivity constraints
(xi ≤ 0). All these forms can be reduced to the standard form max{ cᵀx | Ax ≤ b, x ≥ 0 }.
The following rules can be applied to transform a nonstandard LO-model into a standard
model:
I A minimizing model is transformed into a maximizing model by using the fact that
minimizing a function is equivalent to maximizing minus that function. So, the objective
of the form 'min cᵀx' is equivalent to the objective '− max (−c)ᵀx'. For example,
'min x1 + x2' is equivalent to '− max(−x1 − x2)'.
I A ‘≥’ constraint is transformed into a ‘≤’ constraint by multiplying both sides of the
inequality by −1 and reversing the inequality sign. For example, x1 − 3x2 ≥ 5 is
equivalent to −x1 + 3x2 ≤ −5.
I A ‘=’ constraint of the form ‘aT x = b’ can be written as ‘aT x ≤ b and aT x ≥ b’. The
second inequality in this expression is then transformed into a ‘≤’ constraint (see the
previous item). For example, the constraint ‘2x1 +x2 = 3’ is equivalent to ‘2x1 +x2 ≤ 3
and −2x1 − x2 ≤ −3’.
I A nonpositivity constraint is transformed into a nonnegativity constraint by replacing
the corresponding variable by its negative. For example, the nonpositivity constraint
'x1 ≤ 0' is transformed into 'x1′ ≥ 0' by substituting x1 = −x1′.
I A free variable is replaced by the difference of two new nonnegative variables. For
example, the expression 'x1 free' is replaced by 'x1′ ≥ 0, x1″ ≥ 0', substituting
x1 = x1′ − x1″.
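These transformation rules are mechanical, and can be sketched in code. The helper below is our own illustration (the function name and data layout are hypothetical, not any solver's interface); it applies three of the rules: negating a minimizing objective, flipping '≥' rows, and splitting a free variable into a difference of two nonnegative ones.

```python
def to_standard_form(c, A, b, senses, free):
    """Convert min c^T x s.t. A x (senses) b, with the variables listed in `free`
    unrestricted in sign, into data (c', A', b') for max{c'^T x | A'x <= b', x >= 0}."""
    # Minimizing c^T x is the same as maximizing (-c)^T x.
    c = [-cj for cj in c]
    # A '>=' row becomes a '<=' row by multiplying both sides by -1.
    A = [([-aij for aij in row] if s == ">=" else list(row))
         for row, s in zip(A, senses)]
    b = [(-bi if s == ">=" else bi) for bi, s in zip(b, senses)]
    # Each free variable x_j is replaced by x_j' - x_j'': append a negated column.
    for j in sorted(free):
        c.append(-c[j])
        for row in A:
            row.append(-row[j])
    return c, A, b

# Example: min x1 + x2  s.t.  x1 - 3x2 >= 5,  x1 + x2 <= 9,  x2 free.
c2, A2, b2 = to_standard_form([1, 1], [[1, -3], [1, 1]], [5, 9], [">=", "<="], free=[1])
print(c2)   # [-1, -1, 1]
print(A2)   # [[-1, 3, -3], [1, 1, -1]]
print(b2)   # [-5, 9]
```

Note that the transformation acts on the model data only; translating an optimal solution of the standard model back to the original variables reverses the substitutions.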
The following two examples illustrate these rules.
Example 1.3.1. Consider the nonstandard LO-model:
In addition to being a minimizing model, the model has a ‘≥’ constraint and a ‘=’ constraint. By
applying the above rules, the following equivalent standard form LO-model is found:
the point

x∗ = (x1∗, x2∗)ᵀ = (x1∗, (x2′)∗ − (x2″)∗)ᵀ = (3, −2)ᵀ

is an optimal solution of model (1.8). Note that x̂′ = (3, 10, 12)ᵀ is another optimal solution of
(1.9) (why?), corresponding to the same optimal solution x∗ of model (1.8). In fact, the reader may
verify that every point in the set

{ (3, α, 2 + α)ᵀ ∈ R³ | α ≥ 0 }

is an optimal solution of (1.9) that corresponds to the optimal solution x∗ of model (1.8).
We have listed six possible general nonstandard models below. Any method for solving
one of the models (i)–(vi) can be used to solve the others, because they are all equivalent.
The matrix A in (iii) and (vi) is assumed to be of full row rank (i.e., rank(A) = m; see
Appendix B and Section 3.8). The alternative formulations are:
(i)   max{ cᵀx | Ax ≤ b, x ≥ 0 };      (iv)  min{ cᵀx | Ax ≥ b, x ≥ 0 };
(ii)  max{ cᵀx | Ax ≤ b };             (v)   min{ cᵀx | Ax ≥ b };
(iii) max{ cᵀx | Ax = b, x ≥ 0 };      (vi)  min{ cᵀx | Ax = b, x ≥ 0 }.

Formulations (i) and (ii) are equivalent: writing the free vector x in (ii) as x = x′ − x″
with x′ ≥ 0 and x″ ≥ 0 yields

max{ cᵀx′ − cᵀx″ | Ax′ − Ax″ ≤ b, x′ ≥ 0, x″ ≥ 0 },
and this has the form (i). The reduction of (i) to (iii) follows by introducing slack variables
in (i). Formulation (iii) can be reduced to (i) by noticing that the constraints Ax = b can
be written as the two constraints Ax ≤ b and Ax ≥ b. Multiplying the former by −1
on both sides yields −Ax ≤ −b. Therefore, (iii) is equivalent to:
max{ cᵀx | Ax ≤ b and −Ax ≤ −b }.
The disadvantage of this transformation is that the model becomes considerably larger. In
Section 3.8, we will see an alternative, more economical, reduction of (iii) to the standard
form. Similarly, (iv), (v), and (vi) are equivalent. Finally, (iii) and (vi) are equivalent because
min{ cᵀx | Ax = b, x ≥ 0 } = − max{ (−c)ᵀx | Ax = b, x ≥ 0 }.
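The rewriting of equality constraints as pairs of inequalities can be sketched in code (our own helper names, hypothetical data):

```python
def stack_equalities(A, b):
    """Rewrite Ax = b as the inequality system [A; -A] x <= [b; -b]."""
    A2 = [list(r) for r in A] + [[-x for x in r] for r in A]
    b2 = list(b) + [-x for x in b]
    return A2, b2

def satisfies(A, b, x):
    """Check Ax <= b componentwise (with a small tolerance for rounding)."""
    return all(sum(a*xj for a, xj in zip(row, x)) <= bi + 1e-9
               for row, bi in zip(A, b))

A, b = [[1, 2], [3, -1]], [4, 1]
A2, b2 = stack_equalities(A, b)
# x = (6/7, 11/7) solves Ax = b exactly, so it satisfies the stacked system...
print(satisfies(A2, b2, (6/7, 11/7)))   # True
# ...while a point satisfying Ax <= b only strictly does not.
print(satisfies(A2, b2, (0.0, 0.0)))    # False
```

As the text notes, this doubles the number of technology constraints, which is why the more economical reduction of Section 3.8 is preferred in practice.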
I Linear optimization solvers. The input of such a package is an LO-model that
needs to be solved. The purpose of the package is to find an optimal solution to that
LO-model. As described in Section 1.2.3, it is not always possible to find an optimal
solution because the LO-model may be infeasible or unbounded. Another reason why
a solver might fail is because there may be a limit on the amount of time that is used to
find a solution, or because of round-off errors due to the fact that computer algorithms
generally do calculations with limited precision using so-called floating-point numbers (see
also the discussion in Section 3.6.1). Examples of linear optimization solvers are CPLEX and
the linprog function in MATLAB.
I Linear optimization modeling languages. Although any LO-model can be cast in a
form that may serve as an input for a linear optimization solver (see Section 1.3), it is
oftentimes rather tedious to write out the full cost vector, the technology matrix, and the
right hand side vector. This is, for example, the case when we have variables x1, . . . , x100 and
we want the objective of the LO-model to maximize the sum x1 + . . . + x100. Instead of having to type a
hundred times the entry 1 in a vector, we would prefer to just tell the computer to take the
sum of these variables. For this purpose, there are a few programming languages available
that allow the user to write LO-models in a more compact way. The purpose of the linear
optimization programming package is to construct the cost vector, the technology matrix,
and the right hand side vector from an LO-model written in that language. Examples
of such linear optimization programming languages are GNU MathProg (also known as
GMPL), AMPL, GAMS, and AIMMS.
We will demonstrate the usage of a linear optimization package by using the online solver
provided on the website of this book. The online solver is able to solve models that are
written in the gnu MathProg language. To solve Model Dovetail using the online solver,
the following steps need to be taken:
I Start the online solver from the book website: http://www.lio.yoriz.co.uk/.
I In the editor, type the following code (without the line numbers).
1 var x1 >= 0;
2 var x2 >= 0;
3
4 maximize z: 3*x1 + 2*x2;
5
6 subject to c11: x1 + x2 <= 9;
7 subject to c12: 3*x1 + x2 <= 18;
8 subject to c13: x1 <= 7;
9 subject to c14: x2 <= 6;
10
11 end;
This is the representation of Model Dovetail in the MathProg language. Since Model
Dovetail is a relatively simple model, its representation is straightforward. For more details
on how to use the MathProg language, see Appendix F.
I Press the ‘Solve’ button to solve the model. In this step, a few things happen. The
program first transforms the model from the previous step into a cost vector, technology
matrix, and right hand side vector in some standard form of the LO-model (this standard
form depends on the solver and need not correspond to the standard form we use in
this book). Then, the program takes this standard form model and solves it. Finally,
the solution of the standard-form LO-model is translated back into the language of the
original model.
I Press the ‘Solution’ button to view the solution. Among many things that might sound
unfamiliar for now, the message states that the solver found an optimal solution with
objective value 22.5. Also, the optimal values of x1 and x2 are listed. As should be
expected, they coincide with the solution we found earlier in Section 1.1.2 by applying
the graphical solution method.
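The solver's answer can also be double-checked without any modeling language. Since an optimal solution of a bounded, feasible LO-model is attained at a vertex of the feasible region, enumerating the pairwise intersections of the constraint lines suffices for a model as small as Model Dovetail. A Python sketch (our own, not part of the book's toolchain):

```python
from itertools import combinations

# Model Dovetail: max 3*x1 + 2*x2. Each constraint (a1, a2, e) means a1*x1 + a2*x2 <= e;
# the last two rows encode the nonnegativity constraints x1 >= 0 and x2 >= 0.
cons = [(1, 1, 9), (3, 1, 18), (1, 0, 7), (0, 1, 6), (-1, 0, 0), (0, -1, 0)]

def intersect(c1, c2):
    """Intersection point of two constraint lines, or None if they are parallel."""
    (a, b, e), (c, d, f) = c1, c2
    det = a*d - b*c
    if det == 0:
        return None
    return ((e*d - b*f) / det, (a*f - e*c) / det)   # Cramer's rule

def feasible(p):
    return all(a*p[0] + b*p[1] <= e + 1e-9 for a, b, e in cons)

vertices = []
for c1, c2 in combinations(cons, 2):
    p = intersect(c1, c2)
    if p is not None and feasible(p):
        vertices.append(p)

best = max(vertices, key=lambda p: 3*p[0] + 2*p[1])
print(best, 3*best[0] + 2*best[1])   # (4.5, 4.5) 22.5
```

This reproduces the objective value 22.5 reported by the online solver, at the vertex where the constraints x1 + x2 ≤ 9 and 3x1 + x2 ≤ 18 intersect.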
which is linear. Note that the left hand side of (1.10) is not defined if x1 and x2 both have
value zero; the expression (3/4)x1 − (3/2)x2 ≥ 0 does not have this problem. Adding the constraint
to Model Dovetail yields the optimal solution x1∗ = 5 1/7, x2∗ = 2 4/7 with corresponding
objective value 20 4/7. The profit from long matches is 15 3/7 (×$1,000), which is exactly 75%
of the total profit, as required.
1.5. Linearizing nonlinear functions  17
min  u1 + u2 + x2
s.t. 3u1 − 3u2 + 2x2 ≥ 1
     u1 u2 = 0                        (1.13)
     u1, u2, x2 ≥ 0.
This model is still nonlinear. However, the constraint u1 u2 = 0 can be left out. That is, we
claim that it suffices to solve the following optimization model (which is an LO-model):
min u1 + u2 + x2
s.t. 3u1 − 3u2 + 2x2 ≥ 1 (1.14)
u1 , u2 , x2 ≥ 0.
To see that the constraint u1 u2 = 0 may be left out, we will show that it is automatically
satisfied at any optimal solution of (1.14). Let x∗ = (u1∗, u2∗, x2∗)ᵀ be an optimal solution of
(1.14). Suppose for a contradiction that u1∗ u2∗ ≠ 0. This implies that both u1∗ > 0 and u2∗ > 0.
Let ε = min{u1∗, u2∗} > 0, and consider x̂ = (û1, û2, x̂2)ᵀ = (u1∗ − ε, u2∗ − ε, x2∗)ᵀ.
It is easy to verify that x̂ is a feasible solution of (1.14), and that the corresponding objective
value ẑ satisfies ẑ = u1∗ − ε + u2∗ − ε + x2∗ < u1∗ + u2∗ + x2∗. Thus, we have constructed a
feasible solution x̂ of (1.14), the objective value of which is smaller than the objective value
of x∗, contradicting the fact that x∗ is an optimal solution of (1.14). Hence, the constraint
u1 u2 = 0 is automatically satisfied by any optimal solution of (1.14) and, hence, any optimal
solution of (1.14) is also an optimal solution of (1.13). We leave it to the reader to show that
Figure 1.7: The graph of the piecewise linear function f(x1), with breakpoints at x1 = 3 and x1 = 8.
if x∗ = (u1∗, u2∗, x2∗)ᵀ is an optimal solution of (1.14), then (x1∗, x2∗)ᵀ with x1∗ = u1∗ − u2∗ is an optimal solution of (1.11). Because an optimal solution of (1.14) satisfies u1 u2 = 0, it can be found by setting u1 = 0 and u2 = 0 in turn, which yields the following two LO-models:
min  u2 + x2                      min  u1 + x2
s.t. −3u2 + 2x2 ≥ 1      and      s.t. 3u1 + 2x2 ≥ 1          (1.15)
     u2, x2 ≥ 0,                       u1, x2 ≥ 0.
Since both LO-models have two decision variables, they can be solved using the graphical
method. The optimal solutions are (0, 1/2)ᵀ (with optimal objective value 1/2) and (1/3, 0)ᵀ
(with optimal objective value 1/3), respectively. The optimal solution of (1.14) is found by
choosing the solution among these two that has the smallest objective value. This gives
u1∗ = 1/3, u2∗ = 0, and x2∗ = 0, with optimal objective value 1/3. The corresponding optimal
solution of (1.11) satisfies x1∗ = u1∗ − u2∗ = 1/3 and x2∗ = 0.
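This computation can be checked numerically. The sketch below is our own: it exploits the fact that each of the two models has a nonnegative objective and a single '≥' constraint, so its optimum lies at a nonnegative axis intercept of the constraint line.

```python
def solve_two_var(p, q, r, s):
    """min p*u + q*x  s.t.  r*u + s*x >= 1, u >= 0, x >= 0.
    The optimum is attained at an axis intercept of the constraint line
    (an intercept exists on an axis only if its coefficient is positive)."""
    candidates = []
    if r > 0:
        candidates.append((1/r, 0.0))   # intercept on the u-axis
    if s > 0:
        candidates.append((0.0, 1/s))   # intercept on the x-axis
    return min((p*u + q*x, (u, x)) for u, x in candidates)

# First model of (1.15):  min u2 + x2  s.t. -3*u2 + 2*x2 >= 1.
z1, pt1 = solve_two_var(1, 1, -3, 2)
# Second model of (1.15): min u1 + x2  s.t.  3*u1 + 2*x2 >= 1.
z2, pt2 = solve_two_var(1, 1, 3, 2)
print(z1, pt1)       # 0.5 (0.0, 0.5)
print(z2, pt2)       # approximately 1/3 at (1/3, 0)
print(min(z1, z2))   # the optimal value of (1.14)
```

The smaller of the two values, 1/3, is the optimal objective value of (1.14), as found graphically above.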
It is important to realize that it is not true that every feasible solution of (1.14) corresponds to
a feasible solution of (1.11). For example, the vector (u1, u2, x2)ᵀ = (2, 1, 0)ᵀ is a feasible
solution of (1.14) with objective value 3. However, the corresponding vector (x1, x2)ᵀ =
(1, 0)ᵀ in (1.11) has objective value 1. The reason for this mismatch is the fact that u1 and
u2 are simultaneously nonzero. Recall that this never happens at an optimal solution
of (1.14).
This method also works for a maximizing objective in which the absolute value appears
with a negative coefficient. However, it does not work for a maximizing objective in which
the absolute value appears in the objective function with a positive coefficient. The reader
is asked to verify this in Exercise 1.8.10.
Figure 1.7 shows the graph of f (x1 ). The function f (x1 ) is called piecewise linear, because
it is linear on each of the intervals [0, 3], (3, 8], and (8, ∞) separately. Like in the previous
subsection, model (1.16) can be solved by an alternative formulation. We start by introducing
three new nonnegative decision variables u1 , u2 , and u3 . They will have the following
relationship with x1 :
u1 = x1 if 0 ≤ x1 ≤ 3, and u1 = 3 if x1 > 3;
u2 = 0 if x1 ≤ 3, u2 = x1 − 3 if 3 < x1 ≤ 8, and u2 = 5 if x1 > 8;      (1.17)
u3 = 0 if x1 ≤ 8, and u3 = x1 − 8 if x1 > 8.

Note that these definitions imply that:
u2 (3 − u1 ) = 0 and u3 (5 − u2 ) = 0.
The first equation states that either u2 = 0, or u1 = 3, or both. Informally, this says that u2
has a positive value only if u1 is at its highest possible value 3. The second equation states
that either u3 = 0, or u2 = 5, or both. This says that u3 has a positive value only if u3 is at
its highest possible value 5. The two equations together imply that u3 has a positive value
only if both u1 and u2 are at their respective highest possible values (namely, u1 = 3 and
u2 = 5).
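The relationship (1.17) between x1 and u1, u2, u3, together with the two complementarity equations, can be checked directly. A small sketch:

```python
# Decomposition of x1 into u1, u2, u3 as in (1.17): u1 is the part of x1 in [0, 3],
# u2 the part in (3, 8] (capped at 5), and u3 the part beyond 8.
def decompose(x1):
    u1 = min(x1, 3)
    u2 = min(max(x1 - 3, 0), 5)
    u3 = max(x1 - 8, 0)
    return u1, u2, u3

for x1 in [0, 2, 3, 5.5, 8, 11]:
    u1, u2, u3 = decompose(x1)
    # The pieces always add up to x1 ...
    assert u1 + u2 + u3 == x1
    # ... and satisfy the complementarity conditions from the text.
    assert u2 * (3 - u1) == 0 and u3 * (5 - u2) == 0
print("all decompositions consistent")
```

In the alternative formulation of (1.16), it is exactly these complementarity conditions that the optimization automatically enforces, as shown next.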
Adding these equations to (1.16) and substituting the expressions of (1.17), we obtain the
following (nonlinear) optimization model:
Let x∗ = (u1∗, u2∗, u3∗, x2∗)ᵀ be an optimal solution of (1.18), and let z∗ be the corresponding optimal objective value. Suppose for a contradiction
that u2∗(3 − u1∗) ≠ 0. Because u2∗ ≥ 0 and u1∗ ≤ 3, this implies that u1∗ < 3 and u2∗ > 0.
Let ε = min{3 − u1∗, u2∗} > 0, and define û1 = u1∗ + ε, û2 = u2∗ − ε, û3 = u3∗, and
x̂2 = x2∗. It is straightforward to check that the vector x̂ = (û1, û2, û3, x̂2)ᵀ is a feasible
solution of (1.18) with an objective value that is smaller than z∗,
contrary to the fact that x∗ is an optimal solution of (1.18). Therefore, (3 − u1∗)u2∗ = 0
is satisfied for any optimal solution (u1∗, u2∗, u3∗, x2∗)ᵀ of (1.18), and hence the constraint
may be left out.
In Exercise 1.8.12, the reader is given a model with a nonconvex piecewise linear objective
function and is asked to show that the technique does not work in that case.
The current section contains a number of linear optimization models that illustrate the wide
range of applications of linear optimization to real-world problems. They also illustrate the variety and the
complexity of the modeling process. See also Chapters 10–11 for more real-world applications.
Finding an optimal solution of this LO-model can be done by elimination as follows. First,
subtracting ten times the first constraint from the second one yields:

20x1 − 10x2 = 45,  i.e.,  x2 = 2x1 − 4.5.

Similarly, subtracting five times the first constraint from the second one yields:

25x1 + 10x3 = 60,  i.e.,  x3 = 6 − 2.5x1.

Substituting these expressions for x2 and x3 into the original model, we obtain:
1 var x1 >= 0;
2 var x2 >= 0;
3 var x3 >= 0;
4
5 minimize z:
6 40 * x1 + 100 * x2 + 150 * x3;
7
8 subject to vitaminA:
9 x1 + 2 * x2 + 2 * x3 = 3;
10
11 subject to vitaminC:
12 30 * x1 + 10 * x2 + 20 * x3 = 75;
13
14 end;
Listing 1.2: The diet problem.
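The elimination argument can be carried out in exact arithmetic. The sketch below derives x2 and x3 from the two equality constraints of Listing 1.2 and evaluates the cost; the resulting optimum is our own derivation from the listed data, not a value quoted from the text.

```python
from fractions import Fraction as F

# Eliminate x2 and x3 from the two equality constraints of Listing 1.2:
#   x1 + 2*x2 + 2*x3 = 3   and   30*x1 + 10*x2 + 20*x3 = 75.
# Solving this 2x2 system for x2 and x3 in terms of x1 gives:
def x2_x3(x1):
    x2 = 2 * x1 - F(9, 2)     # x2 = 2*x1 - 4.5
    x3 = 6 - F(5, 2) * x1     # x3 = 6 - 2.5*x1
    return x2, x3

def cost(x1):
    x2, x3 = x2_x3(x1)
    return 40 * x1 + 100 * x2 + 150 * x3   # simplifies to 450 - 135*x1

# Nonnegativity of x2 and x3 confines x1 to [9/4, 12/5] = [2.25, 2.4];
# since the cost decreases in x1, the minimum is attained at x1 = 12/5.
x1 = F(12, 5)
print(x2_x3(x1))   # (Fraction(3, 10), Fraction(0, 1))
print(cost(x1))    # 126
```

Using `Fraction` avoids the rounding noise that floating-point elimination would introduce here.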
amount of the federal income tax to be paid by person i. It is assumed that the values of
the eleven vectors a1 = (a11, a12)ᵀ, . . ., a11 = (a11,1, a11,2)ᵀ, together with the values of
b1, . . . , b11 are known. Table 1.1 lists an example data set for eleven persons of whom we
collected the values of two attributes, a1 and a2. In Figure 1.8, we have plotted these data
points.
The question now is: how can we use this data set in order to estimate the federal income tax
b of any given person (who is not in the original data set) based on the values of a given profile
vector a? To that end, we construct a graph 'through' the data points (a11, . . . , a1m, b1)ᵀ, . . .,
(an1, . . . , anm, bn)ᵀ in such a way that the total distance between these points and this graph
is as small as possible. Obviously, how small or large this total distance is depends on the
shape of the graph. In Figure 1.8, the shape of the graph is a plane in three-dimensional space.
In practice, we may take either a convex or a concave graph (see Appendix D). However,
when the data points do not form an apparent 'shape', we may choose a hyperplane. This
hyperplane is constructed in such a way that the sum of the deviations of the n data points
(ai1, . . . , aim, bi)ᵀ (i = 1, . . . , n) from this hyperplane is as small as possible. The general
equation of such a hyperplane is

H(u, v) = { (a1, . . . , am, b)ᵀ ∈ Rm+1 | b = uᵀa + v }

(see also Section 2.1.1) with variables a (∈ Rm) and b (∈ R). The values of the parameters
u (∈ Rm ) and v (∈ R) need to be determined such that the total deviation between the n
T
points ai1 . . . aim bi (for i = 1, . . . , n) and the hyperplane is as small as possible. As
the deviation of the data points from the hyperplane, we use the ‘vertical’ distance. That is,
for each i = 1, . . . , n, we take as the distance between the hyperplane H and the point
(ai1, . . . , aim, bi)ᵀ:

|uᵀai + v − bi|.
In order to minimize the total deviation, we may solve the following LO-model:
min y1 + . . . + yn
s.t. −yi ≤ aiᵀu + v − bi ≤ yi    for i = 1, . . . , n
yi ≥ 0 for i = 1, . . . , n.
In this LO-model, the variables are the entries of y, the entries of u, and v . The values of
ai and bi (i = 1, . . . , n) are given. The ‘average’ hyperplane ‘through’ the data set reads:
H(u∗, v∗) = { (a1, . . . , am, b)ᵀ ∈ Rm+1 | b = (u∗)ᵀa + v∗ },
where u∗ and v ∗ are optimal values for u and v , respectively. Given this hyperplane, we may
now estimate the income tax to be paid by a person that is not in our data set, based on the
person’s profile. In particular, for a person with given profile â, the estimated income tax to
be paid is b̂ = (u∗)ᵀâ + v∗. The optimal solution obviously satisfies yi∗ = |aiᵀu∗ + v∗ − bi|,
with u∗ and v∗ the optimal values, so that the optimal value of yi∗ measures the deviation
of data point (ai1, . . . , aim, bi)ᵀ from the hyperplane.
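The objective of the LO-model, evaluated at an optimal solution, is thus the total absolute ('vertical') deviation of the data points from the hyperplane. A sketch of that quantity, on hypothetical toy data rather than the data of Table 1.1:

```python
# Total absolute deviation of data points from the hyperplane b = u^T a + v.
# At an optimal solution of the LO-model, each y_i equals |u^T a_i + v - b_i|,
# so the objective y_1 + ... + y_n is exactly this quantity.
def total_abs_deviation(points, u, v):
    # points: list of (a, b) pairs, with a a profile vector and b the observed value
    return sum(abs(sum(ui*ai for ui, ai in zip(u, a)) + v - b)
               for a, b in points)

# Hypothetical toy data (m = 1): three collinear points in the plane.
pts = [((0,), 1.0), ((1,), 3.0), ((2,), 5.0)]
print(total_abs_deviation(pts, u=(2.0,), v=1.0))   # 0.0: the line b = 2a + 1 fits exactly
print(total_abs_deviation(pts, u=(2.0,), v=0.0))   # 3.0: each point deviates by 1
```

The LO-model searches over u and v for the hyperplane minimizing this sum.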
Table 1.1: Federal income tax data for eleven Figure 1.8: Plot of eleven persons’ federal income tax
persons. The second column b as a function of two attributes a1 and a2 .
contains the amount of income tax The dots are the data points. The lines
paid, and the third and fourth show the distance from each data point to
columns contain the profile of each the hyperplane H.
person.
For the example with the data given in Table 1.1, the optimal solution turns out to be:
u1∗ = 249.5, u2∗ = 368.5, v∗ = 895.5, y1∗ = 494.5, y2∗ = 1039, y3∗ = 0, y4∗ = 2348,
y5∗ = 1995.5, y6∗ = 1236.5, y7∗ = 0, y8∗ = 1178, y9∗ = 1226, y10∗ = 0, y11∗ = 214.5.
Moreover, for a person with profile â = (â1, â2)ᵀ, the estimated income tax to be paid is
b̂ = 249.5â1 + 368.5â2 + 895.5. This means that the second attribute has more influence
on the estimate than the first attribute. Also, from the values of y1∗, . . . , y11∗, we see that the
distances from the data points corresponding to persons 5 and 6 are largest, meaning that the
amount of income tax that they had to pay deviates quite a bit from the amount predicted
by our model. For example, for person 5, the estimated income tax is (249.5 × 1 + 368.5 ×
9 + 895.5 =) $4,461.50, whereas this person’s actual income tax was $2,466.00.
The model described above is a special kind of (linear) regression model. Regression models
are used widely in statistics. One of the most common such models is the so-called least
squares regression model, which differs from the model above in that the distance between
the data points and the hyperplane is not measured by the absolute value of the deviation, but
by the square of the deviation. Thus, in a least squares regression model, the objective is to
minimize the sum of the squared deviations. In addition to different choices of the distance
function, it is also possible to apply ‘better’ graphs, such as convex or concave functions.
Such models are usually nonlinear optimization models and hence these topics lie outside
of the scope of this book.
Since not every assignment of 0’s and 1’s to the xij ’s represents a legal line-up, we need to
impose some restrictions on the values of the xij ’s. For one, we need to make sure that
exactly one player is assigned to each position. This is captured by the following set of M
constraints:
x1j + x2j + . . . + xNj = 1    for j = 1, . . . , M.
These constraints state that, for each position j , the sum of x1j , . . . , xN j equals 1. Since
each of the xij ’s should equal either 0 or 1, this implies that exactly one of x1j , . . . , xN j
will have value 1. Moreover, although not every player has to be lined up, we require that
no player is lined up on two different positions. This is achieved by the following set of N
constraints:
xi1 + xi2 + . . . + xiM ≤ 1    for i = 1, . . . , N.
These constraints state that, for each player i, at most one of the xi1 , . . . , xiM has value 1.
In order to formulate an optimization model, we also need to add an objective function
which measures how good a given line-up is. To do so, we introduce the parameter cij
(i = 1, . . . , N and j = 1, . . . , M ), which measures how well player i fits on position j .
A question that arises is of course: how to determine the values of the cij ’s? We will come
back to this later. With the values of the cij ’s at hand, we can now write an objective. Let
us say that if player i is assigned to position j , this player contributes cij to the objective
function. Thus, the objective is to maximize the sum of the cij ’s, over all pairs i, j , such
that player i is assigned to position j . The corresponding objective function can be written
as a linear function of the xij ’s. The objective then reads:
max  Σ_{i=1}^{N} Σ_{j=1}^{M} cij xij .
Since xij = 1 if and only if player i is assigned to position j , the term cij xij equals cij if
player i is assigned to position j , and 0 otherwise, as required.
Combining the constraints and the objective function, we have the following optimization
model:
max  Σ_{i=1}^{N} Σ_{j=1}^{M} cij xij
s.t. Σ_{i=1}^{N} xij = 1       for j = 1, . . . , M
     Σ_{j=1}^{M} xij ≤ 1       for i = 1, . . . , N
     xij ∈ {0, 1}              for i = 1, . . . , N and j = 1, . . . , M.
This optimization model, in its current form, is not an LO-model, because its variables are
restricted to have integer values and this type of constraint is not allowed in an LO-model.
Actually, the model is a so-called integer linear optimization model (abbreviated as ILO-model).
We will see in Chapter 7 that, in general, integer linear optimization models are hard to
solve. If we want to write the model as an LO-model, we will have to drop the constraint
that the decision variables be integer-valued. So let us consider the following optimization
model instead:
max  Σ_{i=1}^{N} Σ_{j=1}^{M} cij xij
s.t. Σ_{i=1}^{N} xij = 1       for all j = 1, . . . , M         (1.20)
     Σ_{j=1}^{M} xij ≤ 1       for all i = 1, . . . , N
     0 ≤ xij ≤ 1               for all i = 1, . . . , N and j = 1, . . . , M.
This model is an LO-model. But recall from Model Dovetail that, in general, an LO-model
may have a solution whose coordinate values are fractional. Therefore, by finding an optimal
solution of (1.20), we run the risk of finding an optimal solution that has a fractional value
for xij for some i and j . Since it does not make sense to put only part of a player in the
field, this is clearly not desirable (although one could interpret half a player as a player who is
playing only half of the time). However, for this particular LO-model something surprising
happens: as it turns out, (1.20) always has an optimal solution for which all xij ’s are integer-
valued, i.e., they are either 0 or 1. The reason for this is quite subtle and will be described
in Chapter 8. But it does mean that (1.20) correctly models the team formation problem.
We promised to come back to the determination of the values of the cij ’s. One way is to
just make educated guesses. For example, we could let the values of cij run from 0 to 5,
where 0 means that the player is completely unfit for the position, and 5 means that the
player is perfect for the position.
A more systematic approach is the following. We can think of forming a team as a matter
of economic supply and demand of qualities. Let us define a list of ‘qualities’ that play a
role in soccer. The positions demand certain qualities, and the players supply these qualities.
The list of qualities could include, for example, endurance, speed, balance, agility, strength,
inventiveness, confidence, left leg skills, right leg skills. Let the qualities be labeled 1, . . . , Q.
For each position j , let djq be the ‘amount’ of quality q demanded on position j and, for
each player i, let siq be the ‘amount’ of quality q supplied by player i. We measure these
numbers all on the same scale from 0 to 5. Now, for all i and j , we can define how well
player i fits on position j by, for example, calculating the average squared deviation of player
i’s supplied qualities compared to the qualities demanded for position j :
cij = − Σ_{q=1}^{Q} (siq − djq)².
1.6. Examples of linear optimization models  29
The negative sign is present because we are maximizing the sum of the cij ’s, so a player i
that has exactly the same qualities as demanded by a position j has cij = 0, whereas any
deviation from the demanded qualities will give a negative number.
Observe that cij is a nonlinear function of the siq ’s and djq ’s. This, however, does not
contradict the definition of an LO-model, because the cij ’s are not decision variables of the
model; they show up as parameters of the model, in particular as the objective coefficients.
The current definition of cij assigns the same value to a positive deviation and a negative
deviation. Even worse: suppose that there are two players, i and i0 , say, who supply exactly
the same qualities, i.e., siq = si0 q , except for some quality q , for which we have that
siq = djq − 1 and si0 q = djq + 2 for position j . Then, although player i0 is clearly better
for position j than player i is, we have that cij > ci0 j , which does not make sense. So, a
better definition for cij should not have that property. For example, the function:
cij = − Σ_{q=1}^{Q} (min(0, siq − djq))²
does not have this property. It is left to the reader to check this assertion.
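The assertion is easy to check numerically for a single quality (Q = 1), using the example from the text where player i supplies one unit too little of a quality and player i′ two units too much:

```python
# Two candidate definitions of the fit score c_ij for a single quality (Q = 1),
# as functions of the supplied amount s and the demanded amount d.
def c_squared(s, d):      # penalizes any deviation, surplus or shortfall
    return -(s - d) ** 2

def c_shortfall(s, d):    # penalizes only a shortfall (s < d)
    return -min(0, s - d) ** 2

d = 3  # demanded amount of the quality on position j (hypothetical value)
# Player i supplies one unit too little, player i' two units too much:
print(c_squared(d - 1, d), c_squared(d + 2, d))      # -1 -4: i wrongly ranked above i'
print(c_shortfall(d - 1, d), c_shortfall(d + 2, d))  # -1 0: i' correctly ranked above i
```

Under the first definition the surplus of player i′ is penalized more heavily than the shortfall of player i; under the second, a surplus is never penalized, as required.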
produce the same outputs, namely economics, business, and mathematics graduates. The
universities differ, however, in the amounts of the inputs they use, and the amounts of the
outputs they produce. The first two columns of Table 1.2 contain the input amounts for
each university, and the last three columns contain the output amounts.
Given these data, comparing universities 1 and 2 is straightforward. University 1 has strictly
larger outputs than university 2, while using fewer inputs than university 2. Hence, university
1 is more efficient than university 2. But how should universities 3 and 4 be compared?
Among these two, university 3 has the larger number of graduated economics students, and
university 4 has the larger number of graduated business and mathematics students. So, the
outputs of universities 3 and 4 are hard to compare. Similarly, the inputs of universities 3
and 5 are hard to compare. In addition, it is hard to compare universities 9 and 10 to any of
the other eight universities, because they are roughly twice as large as the other universities.
In DEA, the different inputs and outputs are compared with each other by assigning weights
to each of them. However, choosing weights for the several inputs and outputs is generally
hard and rather arbitrary. For example, it may happen that different DMUs have organized
their operations differently, so that the output weights should be chosen differently. The key
idea of DEA is that each DMU is allowed to choose its own set of weights. Of course, each
DMU will then choose a set of weights that is most favorable for their efficiency assessment.
However, DMUs are not allowed to ‘cheat’: a DMU should choose the weights in such a
way that the efficiency of the other DMUs is restricted to at most 1. It is assumed that all
DMUs convert (more or less) the same set of inputs into the same set of outputs: only the
weights of the inputs and outputs may differ among the DMUs. DEA can be formulated as
a linear optimization model as follows.
            Inputs              Outputs
DMU   Real estate  Wages   Economics  Business   Math
                            graduates graduates  graduates
 1        72         81        77        73        78
 2        73         82        73        70        69
 3        70         59        72        67        80
 4        87         83        69        74        84
 5        53         64        57        65        65
 6        71         85        78        72        73
 7        65         68        81        71        69
 8        59         62        64        66        56
 9       134        186       150       168       172
10       134        140       134       130       130

Table 1.2: Input and output data of ten universities.
Let the DMUs be labeled (k =) 1, . . . , N . For each k = 1, . . . , N , the relative efficiency (or
efficiency rate) RE(k) of DMU k is defined as:
RE(k) = (weighted sum of output values of DMU k) / (weighted sum of input values of DMU k),
where 0 ≤ RE(k) ≤ 1. DMU k is called relatively efficient if RE(k) = 1, and relatively
inefficient otherwise. Note that if DMU k is relatively efficient, then its total input value
equals its total output value. In other words, a relative efficiency rate RE(k) of DMU k
means that DMU k is able to produce its outputs with a 100RE(k) percent use of its inputs.
In order to make this definition more precise, we introduce the following notation.
Let m (≥ 1) be the number of inputs, and n (≥ 1) the number of outputs. For each
i = 1, . . . , m and j = 1, . . . , n, define uik as the value of input i used by DMU k, and vjk as the value of output j produced by DMU k; moreover, let xi be the weight assigned to input i, and yj the weight assigned to output j.
These definitions suggest that the same set of weights, the xi ’s and the yi ’s, are used for all
DMUs. As described above, this is not the case. In DEA, each DMU is allowed to adopt
its own set of weights, namely in such a way that its own relative efficiency is maximized.
Hence, for each DMU, the objective should be to determine a set of input and output
weights that yields the highest efficiency for that DMU in comparison to the other DMUs.
The optimization model for DMU k (= 1, . . . , N ) can then be formulated as follows:
RE∗(k) = max  (v1k y1 + . . . + vnk yn) / (u1k x1 + . . . + umk xm)
         s.t. (v1r y1 + . . . + vnr yn) / (u1r x1 + . . . + umr xm) ≤ 1   for r = 1, . . . , N    (M1(k))
              x1, . . . , xm, y1, . . . , yn ≥ ε.
The decision variables in this model are constrained to be at least equal to some small positive
number ε, so as to avoid any input or output becoming completely ignored in determining
the efficiencies. We choose ε = 0.00001. Recall that in the above model, for each k , the
weight values are chosen so as to maximize the efficiency of DMU k . Also note that we
have to solve N such models, namely one for each DMU.
Before we further elaborate on model (M1 (k)), we first show how it can be converted into a
linear model. Model (M1 (k)) is a so-called fractional linear model, i.e., the numerator and the
denominator of the fractions are all linear in the decision variables. Since all denominators
are positive, this fractional model can easily be converted into a common LO-model. In
order to do so, first note that when maximizing a fraction, it is only the relative value of the
numerator and the denominator that are of interest and not the individual values. Therefore,
we can set the value of the denominator equal to a constant value (say, 1), and then maximize the numerator.
The peer group contains DMUs that can be set as a target for the improvement of the relative
efficiency of DMU k .
Consider again the data of Table 1.2. Since there are ten universities, there are ten models
to be solved. For example, for k = 1, the LO-model M2 (1) reads:
1 set INPUT;
2 set OUTPUT;
3
4 param N >= 1;
5 param u{1..N, INPUT}; # input values
6 param v{1..N, OUTPUT}; # output values
7
The optimal solution reads: y1∗ = 0.01092, y2∗ = 0.00264, x1∗ = 0.00505, x2∗ = 0.00001,
x3∗ = 0.00694, with optimal objective value 0.932. Hence, the relative efficiency of university 1 is:

RE∗(1) = (77x1∗ + 73x2∗ + 78x3∗) / (72y1∗ + 81y2∗) = z∗ = 0.932.

Table 1.3: DEA results for the data in Table 1.2. The weights have been multiplied by 1,000.
We have listed the optimal solutions, the relative efficiencies, and the peer sets in Table
1.3. It turns out that universities 3, 5, 7, 8, and 9 are relatively efficient, and that the other
universities are relatively inefficient. Note that universities 9 and 10 are about twice as large
as the other universities. Recall that the DEA approach ignores the scale of a DMU and
only considers the relative sizes of the inputs and the outputs.
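The reported efficiency of university 1 can be recomputed from the printed weights. We assume, consistent with the normalization used in the linear model, that the x-variables weight the three outputs and the y-variables the two inputs; since the printed weights are rounded, the ratio matches 0.932 only approximately.

```python
# University 1: inputs (real estate, wages) = (72, 81);
# outputs (economics, business, math graduates) = (77, 73, 78).
in_w  = (0.01092, 0.00264)            # printed input weights  y1*, y2*
out_w = (0.00505, 0.00001, 0.00694)   # printed output weights x1*, x2*, x3*
inputs, outputs = (72, 81), (77, 73, 78)

num = sum(w*o for w, o in zip(out_w, outputs))   # weighted output value
den = sum(w*i for w, i in zip(in_w, inputs))     # weighted input value
print(round(den, 3))        # close to 1: the normalization constraint of the linear model
print(round(num / den, 3))  # close to RE*(1) = 0.932, up to rounding of the weights
```

The weighted input value comes out at essentially 1, confirming the normalization, and the ratio reproduces the stated efficiency within rounding error.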
Ri = Vi1 / Vi0 ,
where Vi0 is the current value of stock i, and Vi1 is the value of stock i in one month. This
means that if the investor decides to invest $1 in stock i, then this investment will be worth
$Ri at the end of the month. The main difficulty with portfolio selection, however, is that
the rate of return is not known in advance, i.e., it is uncertain. This means that we cannot
know in advance how much any given portfolio will be worth at the end of the month.
                      Stock i
Scenario s     1      2      3      4      5
    1       −4.23  −1.58   0.20   5.50   2.14
    2        8.30   0.78  −0.34   5.10   2.48
    3        6.43   1.62   1.19  −2.90   4.62
    4        0.35   3.98   2.14  −0.19  −2.72
    5        1.85   0.61   1.60  −3.30  −0.58
    6       −6.10   1.79   0.61   2.39  −0.24
   μi        1.10   1.20   0.90   1.10   0.95
   ρi        4.43   1.27   0.74   3.23   2.13

Table 1.4: The values of Ris of each stock i in each scenario s, along with the expected rate of return μi
and the mean absolute deviation ρi. All numbers are in percentages.
One way to deal with this uncertainty is to assume that, although we do not know the exact
value of Ri in advance, we know a number of possible scenarios that may happen. For
example, we might define three scenarios, describing a bad outcome, an average outcome,
and a good outcome. Let S be the number of scenarios. We assume that all scenarios are
equally likely to happen. For each i, let Ris be the rate of return of stock i in scenario s.
Table 1.4 lists an example of values of Ris for (n =) 5 stocks and (S =) 6 scenarios. For
example, in scenario 1, the value of stock 1 decreases by 4.23% at the end of the month.
So, in this scenario, an investment of $1 in stock 1 will be worth (1 − 0.0423 =) $0.9567
at the end of the month. On the other hand, in scenario 2, this investment will be worth
(1 + 0.083 =) $1.083. Since we do not know in advance which scenario will actually
happen, we need to base our decision upon all possible scenarios. One way to do this is to
consider the so-called expected rate of return of stock i, which is denoted and defined by:
µi = (1/S) ∑_{s=1}^{S} Ris.
The expected rate of return is the average rate of return, where the average is taken over
all possible scenarios. It gives an indication of the rate of return at the end of the month.
Since the value of Ris is assumed to be known, the value of µi is known as well. Hence,
it seems reasonable to select a portfolio of stocks that maximizes the total expected rate of
return. However, there is usually a trade-off between the expected rate of return of a stock
and the associated risk. A low-risk stock is a stock that has a rate of return that is close to its
expected value in each scenario. Putting money on a bank account (or in a term-deposit) is
an example of such a low-risk investment: the interest rate r (expressed as a fraction, e.g., 2%
corresponds to 0.02) is set in advance, and the investor knows that if $x is invested (x ≥ 0),
then after one month, this amount will have grown to $(1 + r)x. In this case, the rate of
return is the same for each scenario. On the other hand, the rate of return of a high-risk
stock varies considerably among the different scenarios. Stock 1 in Table 1.4 is an example
of a high-risk stock.
Risk may be measured in many ways. One of them is the mean absolute deviation. The mean
absolute deviation of stock i is denoted and defined as:
ρi = (1/S) ∑_{s=1}^{S} |Ris − µi|.
So, ρi measures the average deviation of the rate of return of stock i compared to the
expected rate of return of stock i.
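As a quick check of these two definitions, the entries µi and ρi of Table 1.4 can be recomputed directly from the scenario data. The following Python sketch (ours, not part of the book) evaluates both formulas; because the table entries are rounded to two decimals, the recomputed ρ2 comes out as 1.26 rather than the printed 1.27.

```python
# Recompute the expected rate of return mu_i and the mean absolute
# deviation rho_i from the scenario data of Table 1.4 (in percentages).
R = [  # R[s][i] = rate of return of stock i in scenario s
    [-4.23, -1.58,  0.20,  5.50,  2.14],
    [ 8.30,  0.78, -0.34,  5.10,  2.48],
    [ 6.43,  1.62,  1.19, -2.90,  4.62],
    [ 0.35,  3.98,  2.14, -0.19, -2.72],
    [ 1.85,  0.61,  1.60, -3.30, -0.58],
    [-6.10,  1.79,  0.61,  2.39, -0.24],
]
S, n = len(R), len(R[0])

# mu_i = (1/S) * sum over s of R_i^s
mu = [sum(R[s][i] for s in range(S)) / S for i in range(n)]

# rho_i = (1/S) * sum over s of |R_i^s - mu_i|
rho = [sum(abs(R[s][i] - mu[i]) for s in range(S)) / S for i in range(n)]

print([round(m, 2) for m in mu])   # -> [1.1, 1.2, 0.9, 1.1, 0.95]
print([round(r, 2) for r in rho])  # matches the rho-row of Table 1.4 up to rounding
```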
For each i = 1, . . . , n, define the following decision variable:

xi = the fraction of the total budget that is invested in stock i.

Since the investor wants to invest the full $10,000, the xi's should satisfy the constraint:
x1 + . . . + xn = 1.
If the investor simply wants the highest expected rate of return, then one may choose the
stock i that has the largest value of µi and set xi = 1. However, this strategy is risky: it is
better to diversify. So, the problem facing the investor is a so-called multiobjective optimization
problem: on the one hand, the expected rate of return should be as large as possible; on
the other hand, the risk should be as small as possible. Since stocks with a high expected
rate of return usually also have a high risk, there is a trade-off between these two objectives;
see also Chapter 14. If $xi is invested in stock i, then it is straightforward to check that the
expected rate of return µ of the portfolio as a whole satisfies:
µ = (1/S) ∑_{s=1}^{S} ∑_{i=1}^{n} Ris xi = ∑_{i=1}^{n} µi xi.    (1.21)
The mean absolute deviation ρ of a portfolio in which $xi is invested in stock i satisfies:

ρ = (1/S) ∑_{s=1}^{S} | ∑_{i=1}^{n} Ris xi − µ | = (1/S) ∑_{s=1}^{S} | ∑_{i=1}^{n} (Ris − µi) xi |.
Thus, the objective is to maximize λµ − ρ, where λ ≥ 0 is a parameter that determines the trade-off between expected return and risk. The resulting optimization problem is:
max  λ ∑_{i=1}^{n} µi xi − (1/S) ∑_{s=1}^{S} | ∑_{i=1}^{n} (Ris − µi) xi |
s.t.  ∑_{i=1}^{n} xi = 1                                                    (1.22)
      x1, . . . , xn ≥ 0.
Although this is not an LO-model because of the absolute value operations, it can be turned
into one using the technique of Section 1.5.2. To do so, we introduce, for each s = 1, . . . , S, the decision variable us, and define us to be equal to the expression ∑_{i=1}^{n} (Ris − µi) xi inside the absolute value bars in (1.22). Note that us measures the deviation of the portfolio's rate of return in scenario s from its expected rate of return. Next, we write us = us+ − us− and |us| = us+ + us−. In Section 1.5.2, this procedure is explained in full detail. This results in the following LO-model:
max  λ ∑_{i=1}^{n} µi xi − (1/S) ∑_{s=1}^{S} (us+ + us−)
s.t.  ∑_{i=1}^{n} xi = 1
      us+ − us− = ∑_{i=1}^{n} (Ris − µi) xi    for s = 1, . . . , S          (1.23)
      x1, . . . , xn, us+, us− ≥ 0              for s = 1, . . . , S.
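The splitting us = us+ − us− can be checked numerically on a concrete portfolio. The Python sketch below (ours) evaluates the minimum-risk portfolio reported later in this section (fractions 0.036, 0, 0.761, 0.151, 0.052 of the budget; these values are taken from the text, not computed here), and recovers its expected rate of return of about 0.94% and its mean absolute deviation of about 0.18%.

```python
# Evaluate a fixed portfolio x under the variable splitting of model (1.23).
R = [  # Table 1.4: R[s][i] in percentages
    [-4.23, -1.58,  0.20,  5.50,  2.14],
    [ 8.30,  0.78, -0.34,  5.10,  2.48],
    [ 6.43,  1.62,  1.19, -2.90,  4.62],
    [ 0.35,  3.98,  2.14, -0.19, -2.72],
    [ 1.85,  0.61,  1.60, -3.30, -0.58],
    [-6.10,  1.79,  0.61,  2.39, -0.24],
]
S, n = len(R), len(R[0])
mu = [sum(R[s][i] for s in range(S)) / S for i in range(n)]

# Minimum-risk portfolio reported in the text (the lambda = 0 solution).
x = [0.036, 0.0, 0.761, 0.151, 0.052]

# u_s = sum_i (R_i^s - mu_i) x_i: deviation of the portfolio return in scenario s.
u = [sum((R[s][i] - mu[i]) * x[i] for i in range(n)) for s in range(S)]

# Split u_s = u_s^+ - u_s^-, so that |u_s| = u_s^+ + u_s^-.
u_plus  = [max(us, 0.0) for us in u]
u_minus = [max(-us, 0.0) for us in u]

expected_return = sum(mu[i] * x[i] for i in range(n))       # ~0.94 (%)
risk = sum(p + m for p, m in zip(u_plus, u_minus)) / S      # ~0.18 (%)
print(round(expected_return, 2), round(risk, 3))
```

In an optimal solution of (1.23), at most one of us+, us− is nonzero for each s; setting us+ = max(us, 0) and us− = max(−us, 0) realizes exactly that choice.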
Consider again the data in Table 1.4. As a validation step, we first choose the value of λ
very large, say λ = 1000. The purpose of this validation step is to check the correctness of
the model by comparing the result of the model to what we expect to see. When the value
of λ is very large, the model tries to maximize the expected rate of return, and does not
care much about minimizing the risk. As described above, in this case we should choose
xj∗ = 1 where j is the stock with the largest expected rate of return, and xi∗ = 0 for i ≠ j.
Thus, when solving the model with a large value of λ, we expect to see exactly this solution.
After solving the model with λ = 1000 using a computer package, it turns out that this is
indeed an optimal solution. On the other hand, when choosing λ = 0, the model tries to
minimize the total risk. The optimal solution then becomes:

x1∗ = 0.036, x2∗ = 0, x3∗ = 0.761, x4∗ = 0.151, x5∗ = 0.052.

The corresponding expected rate of return is 0.94%, and the corresponding average absolute
deviation of the portfolio is 0.177%. This means that if the investor invests, at the beginning
of the month, $360 in stock 1, $7,610 in stock 3, $1,510 in stock 4, and $520 in stock 5,
then the investor may expect a 0.94% profit at the end of the month, i.e., the investor may
expect that the total value of the stocks will be worth $10,094. Note, however, that the
[Figure 1.9: Expected return (%) of the optimal portfolio for varying values of λ. The optimal portfolio is constant on each of the intervals 0 ≤ λ ≤ 0.07, 0.07 ≤ λ ≤ 3.11, 3.11 ≤ λ ≤ 5.68, 5.68 ≤ λ ≤ 25.97, and λ ≥ 25.97; the corresponding expected returns increase from about 0.9% to 1.2%.]
actual value of this portfolio at the end of the month is not known in advance. The actual
value can only be observed at the end of the month and may deviate significantly from the
expected value. For example, if scenario 6 happens to occur, then the rate of return of the
portfolio will be (0.94 − 0.35 =) 0.59%; hence, the portfolio will be worth only $10,059.
On the other hand, if scenario 4 occurs, the portfolio will be worth $10,147, significantly
more than the expected value.
By setting the value of λ to a strictly positive number, the expected return and the risk are balanced. As an example, we choose λ = 5. The resulting optimal solution has an expected return of 1.12% (about 20% higher than that of the minimum-risk portfolio) and an average absolute deviation of 0.6% (more than three times as much as that of the minimum-risk portfolio).
In general, the choice of the value of λ depends on the investor. If the investor is very
risk averse, a small value of λ should be chosen; if the investor is risk-seeking, a large value
should be chosen. Since it is not clear beforehand what exact value should be chosen, we
can solve the model for various values of λ. Figure 1.9 shows the expected return and the
risk for different optimal solutions with varying values of λ. This figure illustrates the fact
that as λ increases (i.e., as the investor becomes increasingly risk-seeking), both the expected
return and the risk increase. The investor can now choose the portfolio that provides the
combination of expected return and risk that best meets the investor's preferences.
The process in which we analyze how optimal values depend on the choice of a parameter
(in this case, λ) is called sensitivity analysis, which is the main topic of Chapter 5.
1.7. Building and implementing mathematical models
On the other hand, if too many details are included, then attention will be distracted from the crucial
factors and the results may become more difficult to interpret. The trade-off between the
size of the model (or, more precisely, the time to solve the model) and its practical value
must also be taken into account. Conceptual models have to be validated, which means
investigating whether or not the model is an accurate representation of the practical
situation. The conceptual model (which usually does not contain mathematical notions)
can be discussed with experts inside and outside the company. In case the model does
not describe the original practical situation accurately enough, the conceptual model
may need to be changed, or one has to return to Step 3. At the end of Step 4, the project
team may even decide to change the problem definition, and to return to Step 2.
▶ Step 5. Designing a mathematical model; verification.
In this step the conceptual relationships between the various parameters are translated
into mathematical relationships. This is not always a straightforward procedure. It may be
difficult to find the right type of mathematical model, and to formulate the appropriate
specification without an overload of variables and constraints. If significant problems
occur, it may be useful to return to Step 3, and to change the conceptual model in such
a way that the mathematical translation is less difficult. If the project team decides to
use a linear optimization model, then the relationships between the relevant parameters
should be linear. This is in many situations a reasonable restriction. Instead of solving the
whole problem immediately, it is usually better to start with smaller subproblems; this may
lead to new insights and to a reformulation of the conceptual model. The mathematical
model has to be verified, which means determining whether or not the model performs as
intended, and whether or not the computer running time of the solution technique that
is used is acceptable (see also Chapter 9). This usually includes checking the (in)equalities
and the objective of the model. Another useful technique is to compute solutions of the
model for a number of extreme worst-case instances, or for data for which realizations are
already known. Sensitivity analysis on a number of relevant parameters may also provide
insight into the accuracy and robustness of the model. These activities may lead to a
return to Step 3 or Step 4.
▶ Step 6. Solving the mathematical model.
Sometimes a solution of the mathematical model can be found by hand, without using
a computer. In any case, computer solution techniques should only be used after the mathematical model has been carefully analyzed. This analysis should include looking for simplifications that may speed up the solution process, analyzing the model's outputs thoroughly, and carrying out some sensitivity analysis. Sensitivity analysis may, for
instance, reveal a strong dependence between the optimal solution and a certain param-
eter, in which case the conceptual model needs to be consulted in order to clarify this
situation. Moreover, sensitivity analysis may provide alternative (sub)optimal solutions
which can be used to satisfy constraints that could not be included in the mathematical
model.
▶ Step 7. Taking a decision.
As soon as Step 6 has been finished, a decision can be proposed. Usually, the project team
formulates a number of alternative proposals together with the corresponding costs and
other business implications. In cooperation with the management of the business, it is
decided which solution will be implemented or, if implementation is not possible, what
should be done otherwise. Usually, this last situation means returning to either Step 4,
or Step 3, or even Step 2.
▶ Step 8. Implementing the decision.
Implementing the decision requires careful attention. The expectations that are set during
the decision process have to be realized. Often a pilot-project can be started to detect any
potential implementation problems at an early stage. The employees who have to work with the new situation need to be convinced of the effectiveness and the practicability of the proposed changes. Sometimes, courses need to be organized to teach employees how to deal with the new situation. One way to prevent problems with the implementation is to communicate, during the decision process, with those who will be involved in carrying out the decision.
▶ Step 9. Evaluation.
During the evaluation period, the final checks are made. Questions to be answered are:
Does everything work as intended? Is the problem solved? Was the decision process
organized well? Finally, the objectives of the company are compared to the results of the
implemented project.
1.8 Exercises
Exercise 1.8.1. In Section 1.2.1, we defined the standard form of an LO-model. Write each of the following LO-models in the standard form max{cᵀx | Ax ≤ b, x ≥ 0}.
Exercise 1.8.2. Show that the LO-model

max{c1ᵀx1 + c2ᵀx2 | A1x1 = b1, A2x2 = b2, x1 ≥ 0, x2 ≥ 0},

with ci ∈ Rni, xi ∈ Rni, Ai ∈ Rmi×ni, bi ∈ Rmi (i = 1, 2), can be solved by solving the two models

max{c1ᵀx1 | A1x1 = b1, x1 ≥ 0}   and   max{c2ᵀx2 | A2x2 = b2, x2 ≥ 0}.
Given an optimal solution of these two, what is the optimal solution for the original model,
and what is the corresponding optimal objective value?
Exercise 1.8.3. Solve the following LO-models by using the graphical solution method.
Exercise 1.8.4. The LO-models mentioned in this chapter all have ‘≤’, ‘≥’, and ‘=’
constraints, but no ‘<’ or ‘>’ constraints. The reason for this is the fact that models with
such constraints may not have an optimal solution, even if the feasible region is bounded.
Show this by constructing a bounded model with ‘<’ and/or ‘>’ constraints, and argue that
the constructed model does not have an optimal solution.
Exercise 1.8.5. Consider the constraints of Model Dovetail in Section 1.1. Determine
the optimal vertices in the case of the following objectives:
(a) max 2x1 + x2
(b) max x1 + 2x2
(c) max (3/2)x1 + (1/2)x2
Exercise 1.8.6. In Section 1.1.2 the following optimal solution of Model Dovetail is
found:
x1∗ = 4½, x2∗ = 4½, x3∗ = 0, x4∗ = 0, x5∗ = 2½, x6∗ = 1½.
What is the relationship between the optimal values of the slack variables x3 , x4 , x5 , x6 and
the constraints (1.1), (1.2), (1.3), (1.4)?
Exercise 1.8.7. Consider the LO-model:

min x1 + x2
s.t. αx1 + βx2 ≥ 1
x1 ≥ 0, x2 free.
Determine necessary and sufficient conditions for α and β such that the model
(a) is infeasible,
(b) has an optimal solution,
(c) is feasible, but unbounded,
(d) has multiple optimal solutions.
Exercise 1.8.8. Solve the following LO-model with the graphical solution method.
Exercise 1.8.9. Show that if x∗ = [u1∗ u2∗ x2∗]ᵀ is an optimal solution of (1.14), then [x1∗ x2∗]ᵀ with x1∗ = u1∗ − u2∗ is an optimal solution of (1.11).
Exercise 1.8.10. The method of Section 1.5.2 to write a model with an absolute value
as an LO-model does not work when the objective is to maximize a function in which an
absolute value appears with a positive coefficient. Show this by considering the optimization
model max{|x| | −5 ≤ x ≤ 5}.
Exercise 1.8.11. Show that any convex piecewise linear function is continuous.
Exercise 1.8.12. The method of Section 1.5.3 to write a model with a piecewise linear
function as an LO-model fails if the piecewise linear function is not convex. Consider again
model (1.16), but with the following function f:

f(x1) = { x1               if 0 ≤ x1 ≤ 3
        { 3 + 3(x1 − 3)    if 3 < x1 ≤ 8
        { 18 + (x1 − 8)    if x1 > 8.
[Figure 1.10: The graph of f: slope 1 on [0, 3], slope 3 on [3, 8] (with f(3) = 3 and f(8) = 18), and slope 1 for x1 > 8.]
This function is depicted in Figure 1.10. What goes wrong when the model is solved using
the solution technique of Section 1.5.3?
Exercise 1.8.13. Consider the team formation model in Section 1.6.3. Take any p ∈
{1, . . . , N }. For each of the following requirements, modify the model to take into account
the additional requirement:
(a) Due to an injury, player p cannot be part of the line-up.
(b) Player p has to play on position 1.
(c) Due to contractual obligations, player p has to be part of the line-up, but the model
should determine at which position.
(d) Let A ⊆ {1, . . . , N } and B ⊆ {1, . . . , M }. The players in set A can only play on
positions in set B .
Exercise 1.8.14. Consider the data envelopment analysis example described in Section
1.6.4. Give an example that explains that if a DMU is relatively efficient according to the
DEA solution, then this does not necessarily mean that it is ‘inherently efficient’. (Formulate
your own definition of ‘inherently efficient’.) Also explain why DMUs that are relatively
inefficient in the DEA approach, are always ‘inherently inefficient’.
Exercise 1.8.15. One of the drawbacks of the data envelopment analysis approach is that
all DMUs may turn out to be relatively efficient. Construct an example with three different
DMUs, each with three inputs and three outputs, and such that the DEA approach leads to
three relatively efficient DMUs.
Exercise 1.8.16. Construct an example with at least three DMUs, three inputs and three
outputs, and such that all but one input and one output have weights equal to ε in the
optimal solution of model (M2 (k)).
Exercise 1.8.17. Consider model (M2(k)). Show that there is always at least one DMU that is relatively efficient. (Hint: show that if all '≤'-constraints of (M2(k)) are satisfied with strict inequality in the optimal solution of (M2(k)), then a 'more optimal' solution of (M2(k)) exists, and hence the optimal solution was not optimal after all.)
Exercise 1.8.18. Consider the following data for three baseball players.
We want to analyze the efficiency of these baseball players using data envelopment analysis.
(a) What are the decision making units, and what are the inputs and outputs?
(b) Use DEA to give an efficiency ranking of the three players.
Exercise 1.8.19. Consider again the portfolio selection problem of Section 1.6.5. Using
the current definition of risk, negative deviations from the expected rate of return have the
same weight as positive deviations. Usually, however, the owner of the portfolio is more
worried about negative deviations than positive ones.
(a) Suppose that we give the negative deviations a weight α and the positive deviations a weight β, i.e., the risk of the portfolio is:

ρ = (1/S) ∑_{s=1}^{S} f( ∑_{i=1}^{n} (Ris − µi) xi ),
Exercise 1.8.20. When plotting data from a repeated experiment in order to test the
validity of a supposed linear relationship between two variables x1 and x2 , the data points
are usually not located exactly on a straight line. There are several ways to find the ‘best fitting
line’ through the plotted points. One of these is the method of least absolute deviation regression.
In least absolute deviation regression it is assumed that the exact underlying relationship is a straight line, say x2 = αx1 + β. Let [a1 b1]ᵀ, . . . , [an bn]ᵀ be the data set. For each i = 1, . . . , n, the error ri is defined by ri = bi − αai − β. The problem is to determine values of α and β such that the sum of the absolute errors, ∑_{i=1}^{n} |ri|, is as small as possible.
Employee   Age   Salary (×$1,000)
    1       31        20
    2       57        50
    3       37        30
    4       46        38
    5       32        21
    6       49        34
    7       36        26
    8       51        40
    9       53        38
   10       55        42
   11       43        26
   12       28        21
   13       34        23
   14       62        40
   15       31        25
   16       58        43
   17       47        35
   18       65        42
   19       28        25
   20       43        33

Table 1.5: Ages and salaries of twenty employees of a company.

[Figure 1.11: Scatter plot of the data in Table 1.5; age x1 (horizontal) against salary x2 ×1,000 (vertical).]
(c) Table 1.5 contains the ages and salaries of twenty employees of a company. Figure
1.11 shows a scatter plot of the same data set. Determine the least absolute deviation
regression line x2 = αx1 + β for the data set, where x1 is the age, and x2 is the salary
of the employee.
(d) Modify the model so that instead of minimizing the sum of the absolute deviations, the
maximum of the absolute deviations is minimized. Solve the model for the data set in
Table 1.5.
Exercise 1.8.21. The new budget airline CheapNSafe in Europe wants to promote its
brand name. For this promotion, the company has a budget of e95,000. CheapNSafe hired
a marketing company to make a television advertisement. It is now considering buying 30-
second advertising slots on two different television channels: ABC and XYZ. The number
of people that are reached by the advertisements clearly depends on how many slots are
bought. Fortunately, the marketing departments of ABC and XYZ have estimates of how
many people are reached depending on the number of slots bought. ABC reports that the
first twenty slots on their television channel reach 20,000 people per slot; the next twenty
slots reach another 10,000 each; any additional slots do not reach any additional people. So,
for example, thirty slots reach (20 × 20,000 + 10 × 10,000 =) 500,000 people. One slot
on ABC costs e2,000. On the other hand, XYZ reports that the first fifteen slots on XYZ
reach 30,000 people each; the next thirty slots reach another 15,000 each; any additional
slots do not reach any additional people. One slot on XYZ costs e3,000.
Suppose that CheapNSafe’s current objective is to maximize the number of people reached
by television advertising.
(a) Draw the two piecewise linear functions that correspond to this model. Are the func-
tions convex or concave?
(b) Assume that people either watch ABC, or XYZ, but not both. Design an LO-model
to solve CheapNSafe’s advertising problem.
(c) Solve the LO-model using a computer package. What is the optimal advertising mix,
and how many people does this mix reach?
(d) XYZ has decided to give a quantity discount for its slots. If a customer buys more than
fifteen slots, the slots in excess of fifteen cost only e2,500 each. How can the model
be changed in order to incorporate this discount? Does the optimal solution change?
Part I
Overview
In Chapter 1, we introduced the basic concepts of linear optimization. In particular, we
showed how to solve a simple LO-model using the graphical method. We also observed the
importance of the vertices of the feasible region. In the two-dimensional case (in which case
the graphical method works), it was intuitively clear what was meant by a vertex, but since
the feasible region of an LO-model is in general a subset of n-dimensional Euclidean space,
Rn (n ≥ 1), we will need to properly define what is meant by a vertex. In the first part
of the current chapter we will formalize the concept of vertex, along with other concepts
underlying linear optimization, and develop a geometric perspective of the feasible region
of an LO-model.
In the second part of this chapter, we will take a different viewpoint, namely the algebraic
perspective of the feasible region. The algebraic description of the feasible region forms
the basic building block for one of the most widely used methods to solve LO-models,
namely the simplex algorithm, which will be described in Chapter 3. We will also describe
the connection between the geometric and the algebraic viewpoints.
Chapter 2. Geometry and algebra of feasible regions
A hyperplane in Rn is a set of points that satisfies a linear equation of the form a1x1 + . . . + anxn = b, where a1, . . . , an, b are real numbers and not all ai's are equal to zero. Any hyperplane H in Rn can be written as:

H = {[x1 . . . xn]ᵀ ∈ Rn | a1x1 + . . . + anxn = b} = {x ∈ Rn | aᵀx = b},

where a1, . . . , an, b are real numbers, x = [x1 . . . xn]ᵀ, and a = [a1 . . . an]ᵀ with a ≠ 0.
Example 2.1.1. The set {[x1 x2]ᵀ ∈ R2 | 3x1 + 2x2 = 8} is a hyperplane in R2, namely a line in the plane. The hyperplane {[x1 x2 x3]ᵀ ∈ R3 | −2x1 = 3} is parallel to the coordinate plane 0x2x3, because the coefficients of x2 and x3 are zero, meaning that x2 and x3 can take on any arbitrary value.
Every hyperplane H = {x ∈ Rn | aᵀx = b} with a ≠ 0 divides Rn into two halfspaces, namely:

▶ H+ = {x ∈ Rn | aᵀx ≤ b}, and
▶ H− = {x ∈ Rn | aᵀx ≥ b}.
Example 2.1.2. The hyperplanes {[x1 x2]ᵀ ∈ R2 | x1 = 0} and {[x1 x2]ᵀ ∈ R2 | x2 = 0} are linearly independent (to be precise: they form a linearly independent collection of hyperplanes), because the normal vectors [1 0]ᵀ and [0 1]ᵀ are linearly independent (to be precise: they form a linearly independent collection of vectors). On the other hand, the hyperplanes {[x1 x2]ᵀ ∈ R2 | x1 = 0} and {[x1 x2]ᵀ ∈ R2 | 2x1 = 5} are not linearly independent, because the normal vectors [1 0]ᵀ and [2 0]ᵀ are not linearly independent.

[Figure 2.1: The vector a is a normal of H = {x ∈ Rn | aᵀx = b}.]
[Figure 2.2: Normal vectors pointing out of the feasible region of Model Dovetail∗.]
a11x1 + . . . + a1nxn ≤ b1
        ⋮
am1x1 + . . . + amnxn ≤ bm
−x1 ≤ 0
        ⋮
−xn ≤ 0.

Defining

Hi+ = {[x1 . . . xn]ᵀ ∈ Rn | ai1x1 + . . . + ainxn ≤ bi}   for i = 1, 2, . . . , m,
and Hm+j+ = {[x1 . . . xn]ᵀ ∈ Rn | −xj ≤ 0}   for j = 1, 2, . . . , n,
the normal vectors of these halfspaces are:

a1 = [a11 . . . a1n]ᵀ, . . . , am = [am1 . . . amn]ᵀ,
am+1 = [−1 0 . . . 0]ᵀ, . . . , am+n = [0 . . . 0 −1]ᵀ,

and they are precisely the rows of the matrix obtained by stacking A on top of −In. Notice that any halfspace is convex. Hence, F is the intersection of the m + n convex sets H1+, . . . , Hm+n+, and therefore F itself is convex (see Appendix D). Observe also that, for i = 1, . . . , m, the hyperplane Hi corresponding to the halfspace Hi+ consists of all points in Rn for which the i'th technology constraint holds with equality. Similarly, for j = 1, . . . , n, the hyperplane Hm+j corresponding to the halfspace Hm+j+ consists of all points in Rn for which the j'th nonnegativity constraint −xj ≤ 0 holds with equality, i.e., Hm+j = {[x1 . . . xn]ᵀ | xj = 0}.
Example 2.1.3. In the case of Model Dovetail∗ (Section 1.2.1), the halfspaces are:

H1+ = {[x1 x2]ᵀ | x1 + x2 ≤ 9},       H2+ = {[x1 x2]ᵀ | 3x1 + x2 ≤ 18},
H3+ = {[x1 x2]ᵀ | x1 ≤ 7},            H4+ = {[x1 x2]ᵀ | x2 ≤ 6},
H5+ = {[x1 x2]ᵀ | −x1 − x2 ≤ −5},     H6+ = {[x1 x2]ᵀ | −x1 ≤ 0},
H7+ = {[x1 x2]ᵀ | −x2 ≤ 0},

and F = H1+ ∩ H2+ ∩ H3+ ∩ H4+ ∩ H5+ ∩ H6+ ∩ H7+. The normal vectors are:

[1 1]ᵀ, [3 1]ᵀ, [1 0]ᵀ, [0 1]ᵀ, [−1 −1]ᵀ, [−1 0]ᵀ, [0 −1]ᵀ.
If we draw each of the normal vectors in the previous example as an arrow from some point on the corresponding hyperplane, then each vector 'points out of the feasible region F'; see Figure 2.2. Formally, we say that, for l = 1, . . . , m + n, the vector al 'points out of the feasible region F' if, for every point x0 ∈ Hl, it holds that {x0 + λal | λ > 0} ⊂ Hl− \ Hl. To see that, for each l = 1, . . . , m + n, the vector al indeed points out of the feasible region F, let λ > 0. We have that alᵀ(x0 + λal) = alᵀx0 + λalᵀal = bl + λ‖al‖² > bl, where we have used the fact that ‖al‖ > 0, which in turn follows from the fact that al ≠ 0 (see Appendix B). Thus, x0 + λal ∈ Hl− and x0 + λal ∉ Hl+, and hence al points out of the feasible region. Notice that, since F ⊂ Hl+, it follows that F and x0 + λal are on different sides of Hl for all λ > 0.
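This algebraic argument is easy to check numerically for the seven halfspaces of Example 2.1.3: take a point x0 on the hyperplane Hl and verify that x0 + λal violates the l'th constraint for λ > 0. A small Python sketch (ours):

```python
# For each constraint a_l^T x <= b_l of Model Dovetail* (Example 2.1.3),
# verify that stepping from the hyperplane H_l in the direction a_l
# leaves the halfspace: a_l^T (x0 + lam * a_l) = b_l + lam * ||a_l||^2 > b_l.
halfspaces = [
    (( 1,  1),  9), (( 3,  1), 18), (( 1,  0),  7), (( 0,  1),  6),
    ((-1, -1), -5), ((-1,  0),  0), (( 0, -1),  0),
]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

lam = 0.5  # any positive step size works
for a, b in halfspaces:
    x0 = tuple(b * ai / dot(a, a) for ai in a)  # a point on H_l: a^T x0 = b
    assert abs(dot(a, x0) - b) < 1e-9
    stepped = tuple(x0i + lam * ai for x0i, ai in zip(x0, a))
    assert dot(a, stepped) > b  # the stepped point lies outside H_l^+
print("all normals point out of the feasible region")
```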
We will first define the concept ‘vertex’ more precisely. Let F be the feasible region of
the LO-model and let H1 , . . . , Hm+n be the hyperplanes corresponding to its constraints.
There are three equivalent ways of defining a vertex (or extreme point):
▶ The vector x0 ∈ F is called a vertex of F if and only if there are n independent hyperplanes in the collection {H1, . . . , Hm+n} that intersect at x0.
▶ The vector x0 ∈ F is called a vertex of F if and only if there is a hyperplane H (not necessarily one of H1, . . . , Hm+n) with corresponding halfspace H+ such that F ⊆ H+ and F ∩ H = {x0}.
▶ The vector x0 ∈ F is called a vertex of F if and only if there are no two distinct points x′, x″ ∈ F such that x0 = λx′ + (1 − λ)x″ for some λ ∈ (0, 1).
The first definition is in terms of hyperplanes corresponding to F. The second definition is in terms of a halfspace that contains F. Note that if there is a hyperplane H = {x ∈ Rn | aᵀx = b} with F ⊆ H+, then there is also 'another' hyperplane H̃ with
Note that H1+, H2+, H3+, H4+ correspond to the technology constraints of the model, and H5+ and H6+ to the nonnegativity constraints. The corresponding hyperplanes are:

H1 = {[x1 x2]ᵀ | x1 + x2 = 9},       H2 = {[x1 x2]ᵀ | 3x1 + x2 = 18},
H3 = {[x1 x2]ᵀ | x1 = 7},            H4 = {[x1 x2]ᵀ | x2 = 6},
H5 = {[x1 x2]ᵀ | −x1 = 0},           H6 = {[x1 x2]ᵀ | −x2 = 0}.
To illustrate the first definition of the concept of a vertex, v1 is the unique point in the intersection of
H2 and H6 , vertex v3 is the unique point in the intersection of H1 and H4 , and 0 is the unique
point in the intersection of H5 and H6 . Therefore, the first definition of a vertex implies that v1 , v3 ,
and 0 are vertices of F (and, similarly, v2 and v4 are vertices of F ).
[Figure 2.3: Vertices of the feasible region of Model Dovetail.]
[Figure 2.4: Extreme points and directions of F.]

To illustrate the second definition, consider v3. The dotted line in the figure shows the hyperplane H = {[x1 x2]ᵀ | x1 + 3x2 = 21}. Graphically, it is clear that v3 is the unique point in the intersection of H and F, and the corresponding halfspace H+ = {[x1 x2]ᵀ | x1 + 3x2 ≤ 21}
satisfies F ⊂ H + . Therefore, the second definition implies that v3 is a vertex. Notice that H is not
among the hyperplanes H1 , . . . , H6 . In fact, for i = 1, . . . , 6, the intersection F ∩ Hi does not
consist of a single point (e.g., H1 ∩ F contains the line segment v2 v3 , and H3 ∩ F contains no
point at all). Therefore, we could not even have chosen H to be one of the H1 , . . . , H6 . Constructing
similar hyperplanes for the other vertices is left to the reader.
Finally, to illustrate the third definition, we observe that none of the points 0, v1 , . . . , v4 can be
written as a convex combination of other points in F . For example, v1 cannot be written as a
convex combination of some other two points in F . This is because, if it could be written as a convex
combination of two other points, x, x0 ∈ F , say, then v1 would lie on the line segment connecting
x and x0 , which is clearly impossible. It is left to the reader to convince oneself that every point in F
except 0, v1 , . . . , v4 can be written as a convex combination of two other points in F .
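The first definition also suggests a brute-force way to find all vertices of a small model: intersect every pair of linearly independent hyperplanes and keep the intersection points that lie in F. The Python sketch below (ours) does this for the six hyperplanes H1, . . . , H6 of Model Dovetail and recovers the five vertices 0, v1, . . . , v4, including the optimal vertex (4½, 4½). Such enumeration is only workable for tiny models; the simplex algorithm of Chapter 3 visits vertices far more selectively.

```python
from itertools import combinations

# Hyperplanes a^T x = b of Model Dovetail; F is given by a^T x <= b for all six.
constraints = [
    (( 1,  1),  9), (( 3,  1), 18), (( 1,  0),  7),
    (( 0,  1),  6), ((-1,  0),  0), (( 0, -1),  0),
]

def intersect(h1, h2):
    """Intersection of two lines in R^2 by Cramer's rule,
    or None if the lines are not linearly independent."""
    (a, b), e = h1
    (c, d), f = h2
    det = a * d - b * c
    if abs(det) < 1e-9:
        return None
    return ((e * d - b * f) / det, (a * f - e * c) / det)

def feasible(p):
    return all(a[0] * p[0] + a[1] * p[1] <= b + 1e-9 for a, b in constraints)

vertices = set()
for h1, h2 in combinations(constraints, 2):
    p = intersect(h1, h2)
    if p is not None and feasible(p):
        vertices.add((round(p[0], 6), round(p[1], 6)))

print(sorted(vertices))  # five vertices, among them the optimal vertex (4.5, 4.5)
```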
In the case of Model Dovetail, the vertices describe the full feasible region in the sense that
the feasible region is the convex hull (see Appendix D) of these vertices; see Figure 2.3. This
is, however, not true in the case of an unbounded feasible region. Consider for example the
feasible region drawn in Figure 2.4. This feasible region is unbounded and can therefore not
be written as the convex hull of (finitely many) vertices (see Exercise 2.3.9). To allow for
the possibility of an unbounded feasible region in an LO-model, we need another concept
to describe the feasible region of an arbitrary LO-model. We say that a vector r ∈ Rn is
a direction of unboundedness (or, direction) of the feasible region F if r 6= 0 and there exists
a point x ∈ F such that x + αr ∈ F for all α ≥ 0. For example, in Figure 2.4, the
vectors r1 and r2 are directions of the feasible region. Notice that if F is bounded, then it
has no directions of unboundedness; see Exercise 2.3.4. Observe that if r is a direction of
the feasible region, then so is αr with α > 0. Because the vectors r and αr (with α > 0)
point in the same direction, they are essentially the same, and we think of them as the same
direction.
Informally, the definition of a direction of F states that r is a direction if there exists some
point x ∈ F so that, if we move away from x in the direction r, we stay inside F . The
following theorem states that for a direction r it holds that, if we move away from any point
in the direction r, we stay inside F .
Theorem 2.1.1.
Let F ⊂ Rn be the feasible region of a standard LO-model. Then, a vector r ∈ Rn is
a direction of F if and only if for every x ∈ F it holds that x + αr ∈ F for all α ≥ 0.
Proof. Write F = {x ∈ Rⁿ | aᵢᵀx ≤ bᵢ, i = 1, …, m + n}. Note that for i = m + 1, …, m + n, aᵢ is a unit vector and bᵢ = 0. The 'if' part of the statement is trivial. For the 'only if' part,
let r be a direction of F. We will first prove that aᵢᵀr ≤ 0 for i = 1, …, m + n. Suppose for a
contradiction that aᵢᵀr > 0 for some i ∈ {1, …, m + n}. By the definition of a direction, there
exists a point x̂ ∈ F such that x̂ + αr ∈ F for all α ≥ 0. Because aᵢᵀr > 0, we may choose α
large enough so that aᵢᵀ(x̂ + αr) = aᵢᵀx̂ + αaᵢᵀr > bᵢ. But this implies that x̂ + αr ∉ F, which
is a contradiction.
Hence, aᵢᵀr ≤ 0 for i = 1, …, m + n. Now let x be any point of F and let α ≥ 0. We have that:

aᵢᵀ(x + αr) = aᵢᵀx + αaᵢᵀr ≤ aᵢᵀx ≤ bᵢ,

for i = 1, …, m + n. Hence, x + αr ∈ F, as required.
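The proof above yields a finite test for directions: for a nonempty standard feasible region F = {x ∈ Rⁿ | Ax ≤ b, x ≥ 0}, a vector r is a direction of unboundedness precisely when r ≠ 0, Ar ≤ 0, and r ≥ 0 (the condition r ≥ 0 comes from the rows of −In). A minimal sketch in Python with exact arithmetic; the region used here is a hypothetical example, not one from the text:

```python
from fractions import Fraction as F

def is_direction(A, r):
    """Criterion from Theorem 2.1.1 for F = {x >= 0 | Ax <= b} (F nonempty):
    r is a direction of unboundedness iff r != 0, Ar <= 0, and r >= 0.
    Note that b plays no role in the criterion."""
    if all(ri == 0 for ri in r):
        return False
    Ar = [sum(aij * rj for aij, rj in zip(row, r)) for row in A]
    return all(v <= 0 for v in Ar) and all(ri >= 0 for ri in r)

# Hypothetical unbounded region: x1 - x2 <= 1, x >= 0.
A = [[F(1), F(-1)]]
print(is_direction(A, [F(1), F(1)]))   # moving along (1, 1) stays inside F
print(is_direction(A, [F(2), F(1)]))   # (2, 1) violates Ar <= 0, so it leaves F
```

Theorem 2.1.1 is what justifies checking this criterion without reference to any particular point of F.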
If r1 and r2 are directions of the feasible region, then so is any convex combination of them.
To see this, let λ ∈ [0, 1]. We will show that r = λr1 + (1 − λ)r2 is a direction of
the feasible region as well. By the definition of a direction, there exist points x1 , x2 ∈ F
such that x1 + αr1 ∈ F and x2 + αr2 ∈ F for every α ≥ 0. Now let α ≥ 0 and let
x = λx1 + (1 − λ)x2. It follows that:

x + αr = λ(x1 + αr1) + (1 − λ)(x2 + αr2) ∈ F,

because x1 + αr1 ∈ F, x2 + αr2 ∈ F, and F is convex. Hence, r is indeed a direction of the feasible region.
Theorem 2.1.2.
Let F ⊂ Rn be the feasible region of a standard LO-model. Let v1 , . . . , vk be the
vertices of F and let r1 , . . . , rl be its extreme directions. Let x ∈ Rn . Then x ∈ F
if and only if x can be written as:
x = λ1 v1 + . . . + λk vk + µ1 r1 + . . . + µl rl ,
with λ1 , . . . , λk , µ1 , . . . , µl ≥ 0 and λ1 + . . . + λk = 1.
The requirement that the LO-model is in standard form is necessary. Figure 2.5 shows
the unbounded feasible region of the LO-model max{x1 + x2 | x1 + x2 ≤ 5}. This LO-
model is not in standard form. The extreme directions of the feasible region are r1 = [1 −1]ᵀ and r2 = [−1 1]ᵀ, but the feasible region has no vertex. Clearly, no point of the feasible region can then be written in the form of Theorem 2.1.2, because there are no vertices at all.
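For a bounded region such as that of Model Dovetail there are no extreme directions, and Theorem 2.1.2 reduces to the statement that F is exactly the set of convex combinations of its vertices. The 'if' direction can be spot-checked numerically; the sketch below (Python, exact fractions) uses the Model Dovetail constraints from Chapter 1, with the vertex coordinates computed from the binding constraints:

```python
from fractions import Fraction as F

# Model Dovetail: x1 + x2 <= 9, 3x1 + x2 <= 18, x1 <= 7, x2 <= 6, x >= 0.
A = [[1, 1], [3, 1], [1, 0], [0, 1]]
b = [9, 18, 7, 6]
vertices = [(F(0), F(0)), (F(6), F(0)), (F(9, 2), F(9, 2)), (F(3), F(6)), (F(0), F(6))]

def feasible(x):
    return all(xi >= 0 for xi in x) and \
           all(sum(aij * xj for aij, xj in zip(row, x)) <= bi
               for row, bi in zip(A, b))

# A convex combination lambda_1 v_1 + ... + lambda_5 v_5 (weights sum to 1)
lam = [F(1, 5)] * 5    # equal weights as an example
x = tuple(sum(l * v[i] for l, v in zip(lam, vertices)) for i in range(2))
print(x, feasible(x))
```

Replacing `lam` by any other nonnegative weights summing to 1 again yields a feasible point; the converse direction, that every feasible point arises this way, is the substantial part of the theorem.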
For any set J ⊆ {1, . . . , m + n}, let FJ be the subset of the region F in which all ‘≤’
constraints with indices in J are replaced by '=' constraints. That is, define:

F_J = {x ∈ F | aⱼᵀx = bⱼ for j ∈ J}.

If F_J ≠ ∅, then F_J is called a face of F. Note that F_J = F ∩ (∩_{j∈J} Hⱼ), where H1, …, Hm+n denote the hyperplanes of which a1, …, am+n are the normal vectors; the reader is asked to prove this in Exercise 2.3.10. From this expression, it follows immediately that if J′ ⊆ J, then F_J ⊆ F_{J′}.
Example 2.1.5. Consider again Figure 2.3. Let J1 = {1}. FJ1 is the set of all points in
F for which the inequality (1.2) of Model Dovetail holds with equality. That is, FJ1 = F ∩ H1 .
Graphically, it should be clear that the face FJ1 is exactly the line segment v2 v3 . Similarly, let
J2 = {2}. The face FJ2 is exactly the line segment between v1 and v2 . Next, let J3 = {1, 2}.
Then FJ3 = F ∩ H1 ∩ H2 , which means that the face FJ3 consists of exactly one point, namely
the point v2 in H1 ∩ H2 . Note that v2 is a vertex. Finally, let J4 = {1, 2, 3}. Then, FJ4 = ∅,
and hence FJ4 is not a face of F . The reader may check that the faces of F are: F itself, the line
segments 0v1 , v1 v2 , v2 v3 , v3 v4 , v4 0, and the points 0, v1 , v2 , v3 , and v4 . Notice that
vertices of a face are also vertices of F ; see Theorem D.3.3.
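The singleton faces in Example 2.1.5 can be computed directly: choosing J3 = {1, 2} forces constraints 1 and 2 to hold with equality, and solving the resulting 2×2 system yields the single point v2. A sketch in Python (Cramer's rule with exact fractions; the constraint data is that of Model Dovetail):

```python
from fractions import Fraction as F

A = [[F(1), F(1)], [F(3), F(1)], [F(1), F(0)], [F(0), F(1)]]
b = [F(9), F(18), F(7), F(6)]

def solve2(J):
    """Solve the 2x2 system a_j^T x = b_j (j in J) by Cramer's rule."""
    (i, j) = J
    a11, a12 = A[i - 1]; a21, a22 = A[j - 1]
    det = a11 * a22 - a12 * a21
    if det == 0:
        return None
    x1 = (b[i - 1] * a22 - a12 * b[j - 1]) / det
    x2 = (a11 * b[j - 1] - b[i - 1] * a21) / det
    return (x1, x2)

def in_F(x):
    return all(xi >= 0 for xi in x) and \
           all(sum(a * xi for a, xi in zip(row, x)) <= bi for row, bi in zip(A, b))

v2 = solve2((1, 2))       # H1 intersected with H2
print(v2, in_F(v2))       # the single point of the face F_{J3}
```

Here `solve2((1, 2))` returns (9/2, 9/2), and `in_F` confirms that this point lies in F, so F_{J3} is indeed a (singleton) face.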
Example 2.1.6. Consider the feasible region F in R3 drawn in Figure 2.6. The faces of F are:
F itself, the twelve pentagon-shaped ‘sides’ (e.g., the pentagon v1 v2 v3 v4 v5 v1 ), the thirty ‘edges’
(e.g., the line segments v1 v2 and v2 v3 ), and the twenty vertices.
The above examples suggest that if F_J is a singleton, then its unique element is a vertex of F. The reader is asked to prove this fact in Exercise 2.3.11.
Chapter 2. Geometry and algebra of feasible regions
Note that faces may in turn have faces themselves. As it turns out, these ‘faces of faces’ of F
are faces of F too. For example, in Figure 2.6, the vertices v1 , . . . , v5 , and the line segments
v1 v2 , . . . , v4 v5 , v5 v1 are all faces of both F and the pentagonal face v1 v2 v3 v4 v5 . This
fact is proved in general in Theorem D.3.3 in Appendix D.
The following theorem states that the set of optimal solutions of an LO-model, provided that it is nonempty, is a face of the feasible region.
Theorem 2.1.3.
Consider any LO-model with feasible region
F = {x ∈ Rⁿ | aᵢᵀx ≤ bᵢ, i = 1, …, m + n},
and objective max cT x. Suppose that the LO-model has an optimal solution. Then,
the set of optimal solutions is a face of the feasible region of the model.
Proof. Let Z ⊆ F be the set of optimal solutions of the model, and note that by assumption
Z 6= ∅. Let FJ be a face that contains Z and let J be as large as possible with that property.
Note that Z ⊆ FJ . So, in order to prove that Z = FJ , it suffices to prove that FJ ⊆ Z .
Take any i ∈ J̄. We claim that there exists a point x̂ⁱ ∈ Z such that aᵢᵀx̂ⁱ < bᵢ. Assume to the
contrary that aᵢᵀx = bᵢ for every x ∈ Z. Then, Z ⊆ F_{J∪{i}}, and hence we could have chosen
J ∪ {i} instead of J. This contradicts the fact that J is largest.
For each i ∈ J̄, let x̂ⁱ ∈ Z be such that aᵢᵀx̂ⁱ < bᵢ, and define:

x* = (1/|J̄|) Σ_{i∈J̄} x̂ⁱ.

Because Z is convex, we have that x* ∈ Z (why?). Moreover, it is straightforward to check that aᵢᵀx* < bᵢ for each i ∈ J̄, i.e., x* lies in the relative interior (see Appendix D) of F_J.
Let x̂ be any point in F_J and let d = x̂ − x*. Since, for small enough ε > 0, we have that x* + εd ∈ F_J ⊆ F and x* − εd ∈ F_J ⊆ F, we also have that

cᵀ(x* + εd) ≤ cᵀx*  and  cᵀ(x* − εd) ≤ cᵀx*.

These two inequalities together imply that cᵀd = 0, and hence cᵀx̂ = cᵀ(x* + d) = cᵀx*. Therefore x̂ ∈ Z. This proves that F_J ⊆ Z, and hence Z = F_J is a face of F.
We have seen in Figure 2.5 that the set of optimal solutions may be a face that contains no
vertex. In that case, there is no optimal vertex. This situation, however, cannot happen
in the case of a standard LO-model. Informally, this can be seen as follows. The feasi-
ble region of a standard LO-model can be iteratively constructed by starting with the set
{x ∈ Rn | x ≥ 0} and intersecting it by one halfspace at a time. The crucial observations
are that the initial region {x ∈ Rn | x ≥ 0} has the property that every face of it contains
a vertex and that, no matter how we intersect it with halfspaces, it is not possible to destroy
this property.
So, in order to find a vertex of the feasible region of a standard LO-model, we may take any
face of the feasible region and then choose a smallest face contained in it; that smallest face
is a vertex. The following theorem proves that this is in fact the case.
in fact the case.
Theorem 2.1.4.
The feasible region of any standard LO-model has at least one vertex.
Proof. Let F_J be a face of F with J as large as possible. Let I (⊆ J̄) be as small as possible such that F_J = F_J(I), where

F_J(I) = {x ∈ Rⁿ | aⱼᵀx = bⱼ for j ∈ J, aᵢᵀx ≤ bᵢ for i ∈ I}.
Suppose for a contradiction that I ≠ ∅. Let i ∈ I. If F_J(I \ {i}) = F_J(I), then we should
have chosen I \ {i} instead of I. Hence, since F_J(I) ⊆ F_J(I \ {i}), it follows that there exists
x¹ ∈ F_J(I \ {i}) with x¹ ∉ F_J(I). So, x¹ satisfies aᵢᵀx¹ > bᵢ.
Let x² be any point in F_J(I). It satisfies aᵢᵀx² ≤ bᵢ. Since x¹ and x² are both in F_J(I \ {i}), it
follows from the convexity of this set that the line segment x¹x² lies in F_J(I \ {i}), and hence
there exists a point x̂ ∈ F_J(I) on the line segment x¹x² that satisfies aᵢᵀx̂ = bᵢ. But this means
that the set F_{J∪{i}} is nonempty, contrary to the choice of J.
So, we have proved that in fact F_J = {x ∈ Rⁿ | aⱼᵀx = bⱼ for j ∈ J}. Hence, F_J is the
intersection of finitely many hyperplanes. If F_J is not a singleton, then it contains a line, and
hence F contains a line. But this contradicts the fact that x ≥ 0 holds for every point x of F. This proves the theorem.
With these tools at hand, we can prove the optimal vertex theorem.
Theorem 2.1.5. (Optimal vertex theorem)
If a standard LO-model has an optimal solution, then at least one of its optimal solutions is a vertex of the feasible region.

Proof. Let Z be the set of optimal solutions of the model max{cᵀx | Ax ≤ b, x ≥ 0}, i.e.,

Z = {x ∈ Rⁿ | Ax ≤ b, x ≥ 0, cᵀx = z*},

where z* is the optimal objective value. The set Z is itself the feasible region of the standard LO-model

max{cᵀx | Ax ≤ b, cᵀx ≤ z*, −cᵀx ≤ −z*, x ≥ 0}.
Hence, by Theorem 2.1.4, Z contains a vertex x*, which is an optimal solution of the original LO-model max{cᵀx | Ax ≤ b, x ≥ 0}. By Theorem 2.1.3, Z is a face of the feasible region F. Since x* is a vertex of a face of F, x* is also a vertex of F. Hence, x* is an optimal vertex.
¹ Named after the German mathematician, theoretical physicist, and philosopher Hermann K. H. Weyl (1885–1955) and the German mathematician Hermann Minkowski (1864–1909).
2.2. Algebra of feasible regions; feasible basic solutions
While the statement of Theorem 2.1.6 may be intuitively and graphically clear, it requires
careful work to actually prove the theorem; see the proof of Theorem 2.1.2 in Appendix
D, which shows the ‘if ’ direction of Theorem 2.1.6. Note that not every polyhedron is
a polytope: since a polytope is necessarily bounded, any unbounded polyhedron is not a
polytope.
The important difference between polyhedra and polytopes lies in the information by which
they are described, namely hyperplanes in the case of polyhedra, and vertices in the case
of polytopes. The collection of hyperplanes that describe a bounded polyhedron is often
called its H-representation, and the collection of vertices that describe a polytope is called
its V-representation. The Weyl-Minkowski theorem shows that any bounded polyhedron has
both an H-representation and a V-representation. However, it is computationally very hard
to determine the V-representation from the H-representation and vice versa. The interested
reader is referred to Grünbaum (2003).
(A)I,J = the (|I|, |J|)-submatrix of A with row indices in I , and column indices
in J .
When taking a submatrix, the indices of the rows and columns are preserved, i.e., if a row
has index i ∈ R in A, then it also has index i in (A)I,J (provided that i ∈ I ), and similarly
for column indices. An equivalent way to define (A)I,J is: (A)I,J is the matrix obtained
from A by deleting all rows whose index is not in I and all columns whose index is not in
J. Define the following shorthand notation: (A)_{I,?} = (A)_{I,C} and (A)_{?,J} = (A)_{R,J}, where R is the set of all row indices and C the set of all column indices of A.

Example 2.2.1. Consider a (3, 4)-matrix A with row indices {1, 2, 3} and column indices {1, 2, 3, 4}; the row indices are written left of each row, and the column indices above each column. Let I = {1, 3} ⊂ {1, 2, 3} and J = {3} ⊂ {1, 2, 3, 4}. With rows 1 and 3 of A equal to [2 3 5 0] and [8 2 1 3], respectively, and column 3 equal to [5 6 1]ᵀ, we have:

            [ 2  3  5  0 ]              [ 5 ]                  [ 5 ]
(A)_{I,?} = [ 8  2  1  3 ],  (A)_{?,J} = [ 6 ],  and (A)_{I,J} = [ 1 ].
                                        [ 1 ]
Whenever this does not cause any confusion, we omit the parentheses in the notation (A)?,J ,
(A)I,? , and (A)I,J , and we write A?,J , AI,? , and AI,J , respectively, instead. Define
(A)i,? = (A){i},? ; (A)?,j and (A)i,j are defined analogously. The same notation also
applies to vectors, but we will have only one subscript. For example, for a vector x ∈ Rn
with entry indices R, (x)I is the subvector of x containing the entries with indices in I
(I ⊆ R). Note that, for i ∈ R, we have that (x){i} = xi .
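The index-preserving submatrix notation can be mirrored in code by keeping the original indices as dictionary keys. The matrix below is hypothetical: its first and third rows and its third column are chosen to match the submatrices of Example 2.2.1, and the remaining entries of row 2 are made up for illustration:

```python
def submatrix(A, I, J):
    """(A)_{I,J}: keep the rows with (1-based) index in I and the columns with
    index in J, preserving the original indices as dictionary keys."""
    return {i: {j: A[i - 1][j - 1] for j in sorted(J)} for i in sorted(I)}

A = [[2, 3, 5, 0],      # row 1
     [4, 7, 6, 9],      # row 2 (entries other than the 6 are invented)
     [8, 2, 1, 3]]      # row 3
rows, cols = {1, 2, 3}, {1, 2, 3, 4}

print(submatrix(A, {1, 3}, cols))   # (A)_{I,?}
print(submatrix(A, rows, {3}))      # (A)_{?,J}
print(submatrix(A, {1, 3}, {3}))    # (A)_{I,J} -> {1: {3: 5}, 3: {3: 1}}
```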
Consider the matrix [A Im] of a standard LO-model with slack variables; each column of this matrix corresponds to a decision variable or a slack variable. Let the index of each column be the subscript of the corresponding decision or slack variable. So the columns have indices 1, …, n + m. The rank (see Appendix B) of [A Im] is m, because this matrix contains the submatrix Im, which has rank m. Suppose that we choose m linearly independent columns (see Appendix B). Let B be the (m, m)-submatrix determined by these columns,
and let N be the (m, n)-submatrix consisting of the remaining columns. That is, define:

[A Im] ≡ [B N],

where the symbol ≡ means 'equality up to an appropriate permutation of the rows and columns'; see Appendix B.2. Let BI be the set of indices of the columns of B, and NI the set of indices of the columns of N. The vector [x; xs]_BI is the subvector of [x; xs] consisting of all variables with indices corresponding to the columns of B; the entries of [x; xs]_BI are called the basic variables. Similarly, the vector [x; xs]_NI is the subvector of [x; xs] consisting of all variables that have indices in NI; the entries of [x; xs]_NI are called the nonbasic variables. So, we have that:

x_BI = ([x; xs])_BI  and  x_NI = ([x; xs])_NI.
So it is possible to express the basic variables in terms of the corresponding nonbasic vari-
ables, meaning that any choice of values for the nonbasic variables fixes the values of the
corresponding basic variables. In fact, since the matrix B is invertible, there is a unique way
of expressing the basic variables in terms of the nonbasic variables.
Example 2.2.2.
Consider Model Dovetail. Let B consist of the columns 1, 3, 5, and 6
of the matrix A I4 . We have that BI = {1, 3, 5, 6}, N I = {2, 4}, so that xBI =
[x1 x3 x5 x6]ᵀ, x_NI = [x2 x4]ᵀ, and:

              x1 x2 x3 x4 x5 x6            x1 x3 x5 x6           x2 x4
            [  1  1  1  0  0  0 ]        [  1  1  0  0 ]       [  1  0 ]
   [A I4] = [  3  1  0  1  0  0 ],   B = [  3  0  0  0 ],  N = [  1  1 ].
            [  1  0  0  0  1  0 ]        [  1  0  1  0 ]       [  0  0 ]
            [  0  1  0  0  0  1 ]        [  0  0  0  1 ]       [  1  0 ]
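Computing the basic variables from a basis matrix is a plain linear solve, x_BI = B⁻¹b. A sketch in Python with exact fractions for the basis of Example 2.2.2 (columns 1, 3, 5, 6 of [A I4]); the elimination routine assumes that B is invertible:

```python
from fractions import Fraction as F

def solve(M, rhs):
    """Gaussian elimination with exact fractions; assumes M is invertible."""
    n = len(M)
    M = [[F(v) for v in row] + [F(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

# Columns 1, 3, 5, 6 of [A I4] for Model Dovetail (Example 2.2.2)
B = [[1, 1, 0, 0],
     [3, 0, 0, 0],
     [1, 0, 1, 0],
     [0, 0, 0, 1]]
b = [9, 18, 7, 6]
x1, x3, x5, x6 = solve(B, b)
print([x1, x3, x5, x6])   # the basic variables; x2 = x4 = 0
```

The result x1 = 6, x3 = 3, x5 = 1, x6 = 6 is nonnegative, so this basic solution is feasible; as discussed further below, it corresponds to the vertex v1 of the feasible region.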
Let B be a basis matrix. If [x; xs] ≡ [x_BI; x_NI] with x_BI = B⁻¹b and x_NI = 0, then the
vector [x; xs] is called the basic solution with respect to the basis matrix B. If B⁻¹b ≥ 0, then
[x; xs] is a solution of [A Im][x; xs] = b satisfying x ≥ 0 and xs ≥ 0, and hence [x; xs] is a
feasible basic solution. In principle, any choice of m columns of [A Im] may serve as a candidate
for the basis matrix B; an obvious candidate is Im. Note that not all such choices give rise to a
feasible basic solution: B⁻¹b may not satisfy B⁻¹b ≥ 0, or B may not be invertible. In the former
case, the choice gives rise to an infeasible basic solution, and in the latter case the choice does
not give rise to a basic solution at all.
For b ≥ 0, the system of equalities [A Im][x; xs] = b with x, xs ≥ 0 has at least one
feasible basic solution, namely the basic solution corresponding to the basis matrix Im, i.e.,
x = 0, xs = b. The choice B = Im implies that x_BI = xs. This solution corresponds
to the vertex 0 of the feasible region. If, however, b ≱ 0, then this solution is an infeasible
basic solution.
There is a close relationship between feasible basic solutions and vertices of the feasible
region. In fact, each feasible basic solution of an LO-model corresponds to a vertex of
the feasible region; see Theorem 2.2.2. Also, it may happen that a vertex corresponds to
multiple feasible basic solutions, in which case these feasible basic solutions are degenerate;
see Section 2.2.4.
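This correspondence can be explored by brute force: classify every choice of m = 4 columns of [A I4] as yielding a feasible basic solution, an infeasible basic solution, or no basic solution at all (a singular B). The sketch below (Python, exact fractions) does this for Model Dovetail; note that Figure 2.8, referred to further below, shows the basic solutions of the variant Model Dovetail∗, whereas this count is for Model Dovetail itself:

```python
from fractions import Fraction as F
from itertools import combinations

A = [[1, 1], [3, 1], [1, 0], [0, 1]]
b = [F(9), F(18), F(7), F(6)]
m, n = 4, 2
AI = [row + [1 if i == j else 0 for j in range(m)] for i, row in enumerate(A)]  # [A I4]

def try_basis(BI):   # BI: sorted tuple of m column indices (1-based)
    # Solve B x_BI = b by Gaussian elimination; return None if B is singular.
    M = [[F(AI[r][c - 1]) for c in BI] + [b[r]] for r in range(m)]
    for col in range(m):
        piv = next((r for r in range(col, m) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                M[r] = [a - M[r][col] * p for a, p in zip(M[r], M[col])]
    return [row[m] for row in M]

feasible = infeasible = singular = 0
for BI in combinations(range(1, n + m + 1), m):
    x = try_basis(BI)
    if x is None:
        singular += 1
    elif all(v >= 0 for v in x):
        feasible += 1
    else:
        infeasible += 1
print(feasible, infeasible, singular)
```

This prints 5 8 2: five feasible basic solutions (one per vertex of the feasible region), eight infeasible basic solutions, and two singular column choices (those pairing the parallel hyperplanes x1 = 0 with x1 = 7, or x2 = 0 with x2 = 6).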
Figure 2.7: Zero-variable representation.
Figure 2.8: Feasible and infeasible basic solutions.
Example 2.2.3. Each equation associated with a technology constraint of Model Dovetail corre-
sponds to a slack variable with value zero, namely (see Figure 2.7):
x1 + x2 = 9 corresponds to x3 = 0,
3x1 + x2 = 18 corresponds to x4 = 0,
x1 = 7 corresponds to x5 = 0,
and x2 = 6 corresponds to x6 = 0.
We have seen that each feasible basic solution corresponds to a (4, 4) basis matrix B with nonnegative B⁻¹b. There are (6 choose 4) = 15 possible ways to choose such a matrix B in [A I4]; only five of them correspond to a vertex of the feasible region in the following way. Each vertex of the feasible region (see Figure 2.7) is determined by setting the values of two nonbasic variables equal to 0, as follows:
0: x1 = 0, x2 = 0;  v1: x2 = 0, x4 = 0;  v2: x3 = 0, x4 = 0;  v3: x3 = 0, x6 = 0;  v4: x1 = 0, x6 = 0.

In the case of vertex 0, the basis matrix B consists of the last four columns of the matrix [A Im]. In the case of vertex v1, the matrix B consists of the columns 1, 3, 5, 6, because x2 = 0 and x4 = 0. Note that the pairs of zero-valued nonbasic variables of adjacent vertices differ in exactly one index. These observations hold in general and are crucial in the simplex algorithm that we will develop in Chapter 3. In Section 2.2.5 we will deal with adjacency in more detail.

[Diagram: the rows of [A; −In] correspond to the columns of [A Im] (Theorem 2.2.1), and the vertices of the feasible region correspond to the feasible basic solutions (Theorem 2.2.2).]
Not every pair xᵢ = 0, xⱼ = 0 with i ≠ j determines a vertex of the feasible region. For instance,
the pair x5 = 0, x6 = 0 corresponds to the basis matrix

        x1 x2 x3 x4
      [  1  1  1  0 ]
  B = [  3  1  0  1 ],
      [  1  0  0  0 ]
      [  0  1  0  0 ]

with BI = {1, 2, 3, 4} and NI = {5, 6}. However, B⁻¹b = [7 6 −4 −9]ᵀ is not nonnegative, and the point [7 6]ᵀ is not in the feasible region. So, [7 6 −4 −9 0 0]ᵀ is an
infeasible basic solution. Figure 2.8 shows all basic solutions, feasible as well as infeasible, of Model
Dovetail∗ . In this figure the feasible basic solutions are shown as circles, and the infeasible ones are
shown as squares. It is left to the reader to calculate all basic solutions.
The normal vectors of H1, …, Hm+n are precisely the rows of the matrix [A; −In] (the (m + n, n)-matrix obtained by stacking A on top of −In; we use the same semicolon notation for stacked vectors such as [x; xs]). This means that, effectively, a vertex is determined by selecting n linearly independent rows from the latter matrix.
Suppose now that we have a feasible basic solution [x̂; x̂s] with index sets BI and NI. Let
I = BI ∩ {1, …, n} and let J = (BI ∩ {n + 1, …, n + m}) − n. That is, I is the set
of indices in BI corresponding to decision variables, and n + J is the set of indices in BI
corresponding to slack variables. Note that we have that I ⊆ {1, …, n}, J ⊆ {1, …, m},
and

BI = I ∪ (n + J),  and  NI = Ī ∪ (n + J̄).
Recall that, at this feasible basic solution, we have that x̂N I = 0. This implies that:
▸ For i ∈ Ī, xᵢ is a decision variable with value zero. Thus, the constraint xᵢ ≥ 0 is binding (i.e., it holds with equality) at the feasible basic solution. This constraint corresponds to the hyperplane H_{m+i}.
▸ For j ∈ J̄, x_{n+j} is a slack variable with value zero. Thus, the j'th technology constraint is binding at the feasible basic solution. This constraint corresponds to the hyperplane Hj.
This means that the indices in N I provide us with (|N I| =) n hyperplanes that contain x̂.
The indices of these hyperplanes are: m + i for i ∈ Ī, and j for j ∈ J̄. Define:

BIᶜ = J̄ ∪ (m + Ī)  and  NIᶜ = J ∪ (m + I).
Figure 2.10: The set BI and its complementary dual set BIᶜ for Model Dovetail. The 'd' and 's' in parentheses mean 'decision variable' and 'slack variable', respectively.
Example 2.2.4. Consider again the feasible basic solution of Example 2.2.2, with BI = {1, 3, 5, 6} and NI = {2, 4}. The set of binding hyperplanes can be derived using the above notation as follows. We have that I =
{1} (corresponding to the basic decision variable x1 ). Moreover, n + J = {3, 5, 6} (corresponding
to the basic slack variables x3, x5, and x6), so that J = {1, 3, 4}. It follows that Ī = {2} and J̄ = {2}. Therefore, the complementary dual sets are BIᶜ = J̄ ∪ (m + Ī) = {2, 6} and NIᶜ = J ∪ (m + I) = {1, 3, 4, 5}. We have that:
▸ Since Ī = {2}, the decision variable x2 has value zero at the feasible basic solution [x̂_BI; x̂_NI]. The equation x2 = 0 corresponds to the hyperplane (H4+2 =) H6.
▸ Since J̄ = {2}, the slack variable (x2+2 =) x4 has value zero. The equation x4 = 0 corresponds to the hyperplane H2.
Thus, the hyperplanes with indices in BI c are binding. Figure 2.10 shows the correspondence between
the variables with indices in BI and the hyperplanes with indices in BI c . The set BI c is the
complement of the set of indices of hyperplanes that correspond to the variables in BI .
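The complementary dual index sets are pure bookkeeping, so the computation in the example above is easy to automate. A sketch (the function name is ours, not the book's):

```python
def complementary_dual_sets(BI, n, m):
    """Given the index set BI of the basic variables, compute BI^c and NI^c,
    which index rows of [A; -I_n], i.e., the hyperplanes H_1, ..., H_{m+n}."""
    I = {i for i in BI if i <= n}                # indices of basic decision variables
    J = {i - n for i in BI if i > n}             # constraint indices of basic slacks
    I_bar = set(range(1, n + 1)) - I
    J_bar = set(range(1, m + 1)) - J
    BIc = J_bar | {m + i for i in I_bar}         # indices of the binding hyperplanes
    NIc = J | {m + i for i in I}
    return BIc, NIc

# Example 2.2.4: BI = {1, 3, 5, 6} for Model Dovetail (n = 2, m = 4)
print(complementary_dual_sets({1, 3, 5, 6}, 2, 4))   # -> ({2, 6}, {1, 3, 4, 5})
```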
The question that remains is: how can we be sure that these n hyperplanes are linearly independent? In other words, do the rows of [A; −In] with indices in BIᶜ form an invertible matrix? The following theorem, Theorem 2.2.1, shows the relationship between the invertibility of the (m, m)-submatrix of [A Im] corresponding to the columns in BI on the one hand, and the invertibility of the (n, n)-submatrix of [A; −In] corresponding to the rows in BIᶜ on the other hand.
The matrices

B = ([A Im])_{?,BI}  and  B̄ᵀ = ([A; −In])_{BIᶜ,?}

are called complementary dual matrices. The term 'dual' will become clear in Chapter 4. Note that the columns of B are columns from the matrix [A Im], and the rows of the matrix B̄ᵀ are rows from the matrix [A; −In]. Note also that B̄ = ([Aᵀ −In])_{?,BIᶜ}.

Theorem 2.2.1. (Complementary dual matrices)
Let BI ⊆ {1, …, n + m} with |BI| = m, and let BIᶜ be the complementary dual set of BI. Then the following statements are equivalent:
(i) The matrix B = ([A Im])_{?,BI} is invertible.
(ii) The matrix B̄ᵀ = ([A; −In])_{BIᶜ,?} is invertible.
Before we give a proof of Theorem 2.2.1, we illustrate the theorem by means of an example.
Example 2.2.5. Consider again the technology matrix A of Model Dovetail; see also Figure 2.3.
We have that n = 2, m = 4, and:

           x1 x2 x3 x4 x5 x6                    H1 [  1  1 ]
         [  1  1  1  0  0  0 ]                  H2 [  3  1 ]
[A I4] = [  3  1  0  1  0  0 ],   and           H3 [  1  0 ]
         [  1  0  0  0  1  0 ]      [A; −I2] =  H4 [  0  1 ]
         [  0  1  0  0  0  1 ]                  H5 [ −1  0 ]
                                                H6 [  0 −1 ]
Note that the columns of [A I4] correspond to the decision variables and the slack variables, and the rows correspond to the technology constraints of the model with slack variables. For example, the second column corresponds to x2, and the first row corresponds to the first technology constraint (i.e., x1 + x2 + x3 = 9). Similarly, the columns of [A; −I2] correspond to the decision variables of the model, and the rows correspond to the hyperplanes that define the feasible region. For example, the third row corresponds to hyperplane H3.
Take BI = {1, 3, 5, 6} and N I = {2, 4}. We saw in Example 2.2.2 that this choice of BI
and N I corresponds to a feasible basic solution, and we saw in Example 2.2.4 that this feasible basic
solution in turn corresponds to a vertex of the feasible region, namely the unique point in the intersection
of H2 and H6 . It is easy to see that H2 and H6 are linearly independent.
Using the concept of complementary dual matrices, we obtain the following. To see how the invertibility
of the basis matrix B implies that the Hi with i ∈ BI c form a collection of linearly independent
hyperplanes, we introduce the following ‘matrix’:
                  x1  x2  x3  x4  x5  x6
             H1 [  1   1   1   0   0   0 ]
             H2 [  3   1   0   1   0   0 ]
             H3 [  1   0   0   0   1   0 ]
             H4 [  0   1   0   0   0   1 ]
             H5 [ −1   0                 ]
             H6 [  0  −1                 ]

The corresponding basis matrix B consists of the entries in the columns with indices in BI = {1, 3, 5, 6} of [A I4]. Let B̄ be the transpose of the matrix consisting of the entries in the rows H2 and H6 of [A; −I2] (the rows with indices in BIᶜ). So, we have that:
                        x1 x3 x5 x6
                      [  1  1  0  0 ]                                   x1 x2
B = ([A I4])_{?,BI} = [  3  0  0  0 ],  and  B̄ᵀ = ([A; −I2])_{BIᶜ,?} = H2 [ 3  1 ].
                      [  1  0  1  0 ]                                   H6 [ 0 −1 ]
                      [  0  0  0  1 ]
The columns of the matrix B̄ are the normal vectors corresponding to H2 and H6. Therefore, H2 and H6 are linearly independent if and only if the matrix B̄ is invertible. Applying Gaussian elimination (see Appendix B), we find that:
     [ 3  0  0  0 ]
     [ 0  1  0  0 ]               [ 3  0 ]
B ∼ [ 0  0  1  0 ],  and  B̄ᵀ ∼ [ 0 −1 ].
     [ 0  0  0  1 ]
So, both B and B̄ᵀ are equivalent to a block diagonal matrix (see Appendix B.3) which has the (1, 1)-matrix [3] as one block and a positive or negative identity matrix as the other. Since the matrices [3] and [−1] are trivially invertible, it follows that the matrices B and B̄ are invertible as well; see also Appendix B.7.
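The equivalence asserted by Theorem 2.2.1 can also be spot-checked with exact determinants: for BI = {1, 3, 5, 6}, both B and B̄ᵀ (and hence B̄, since a matrix and its transpose have the same determinant) turn out to have determinant −3, matching the blocks 3 and −1 found above. A sketch in Python:

```python
def det(M):
    """Determinant by cofactor expansion along the first row
    (fine for the tiny matrices used here)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

A = [[1, 1], [3, 1], [1, 0], [0, 1]]
AI = [row + [1 if i == j else 0 for j in range(4)] for i, row in enumerate(A)]  # [A I4]
AmI = A + [[-1, 0], [0, -1]]                                                    # [A; -I2]

BI, BIc = [1, 3, 5, 6], [2, 6]
B = [[AI[r][c - 1] for c in BI] for r in range(4)]   # columns of [A I4] with index in BI
Bbar_T = [AmI[r - 1] for r in BIc]                   # rows of [A; -I2] with index in BI^c
print(det(B), det(Bbar_T))   # both nonzero: B and its complementary dual are invertible
```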
Proof of Theorem 2.2.1. Partition the rows of A according to J̄ and J, the columns of A according to I and Ī, and partition Im and −In accordingly:

                 [ A_{J̄,I}      A_{J̄,Ī}      (Im)_{J̄,J̄}  (Im)_{J̄,J} ]  } rows 1, …, m
  [ A   Im ]     [ A_{J,I}      A_{J,Ī}      (Im)_{J,J̄}  (Im)_{J,J} ]
  [ −In    ] ≡  [ (−In)_{I,I}   (−In)_{I,Ī}                          ]  } rows 1, …, n
                 [ (−In)_{Ī,I}   (−In)_{Ī,Ī}                          ]

In this matrix, the row and column indices are given by the lists at the curly braces. Note that (Im)_{J̄,J}, (Im)_{J,J̄}, (−In)_{I,Ī}, and (−In)_{Ī,I} are all-zero matrices. Using this fact, and applying Gaussian elimination, where the 1's of (Im)_{J,J} and the −1's of (−In)_{Ī,Ī} are the pivot entries, we obtain that:

                 [ A_{J̄,I}      0            (Im)_{J̄,J̄}  0          ]
  [ A   Im ]     [ 0            0            0           (Im)_{J,J} ]
  [ −In    ] ∼  [ (−In)_{I,I}   0                                    ]
                 [ 0            (−In)_{Ī,Ī}                           ]
We will show that (i) =⇒ (ii). The proof of (ii) =⇒ (i) is left to the reader (Exercise 2.3.8). We have that:

([A Im])_{?,BI} = [ A_{?,I}  (Im)_{?,J} ] ≡ [ A_{J̄,I}   (Im)_{J̄,J} ]  ∼  [ A_{J̄,I}   0          ]
                                            [ A_{J,I}   (Im)_{J,J} ]     [ 0         (Im)_{J,J} ].

The latter matrix is a block diagonal matrix (see Appendix B.3), which is invertible if and only if its blocks, i.e., the square matrices A_{J̄,I} and (Im)_{J,J}, are invertible (see Appendix B.7). Hence, because (Im)_{J,J} is invertible and [A_{?,I} (Im)_{?,J}] is invertible by assumption, it follows that A_{J̄,I} is also invertible. This implies that

  [ A_{J̄,I}   0           ]
  [ 0         (−In)_{Ī,Ī} ]

is invertible as well, and since this matrix is equivalent to [A_{J̄,?}; (−In)_{Ī,?}], it follows that the latter is invertible. The latter matrix is exactly ([A; −In])_{BIᶜ,?}. This proves that (i) =⇒ (ii).
Theorem 2.2.2. (Vertices and feasible basic solutions)
Let F be the feasible region of a standard LO-model. Every feasible basic solution corresponds to a vertex of F, and every vertex of F corresponds to at least one feasible basic solution.

Proof of Theorem 2.2.2. To prove that every feasible basic solution corresponds to a vertex, suppose first that [x̂_BI; x̂_NI] is a feasible basic solution with respect to the invertible (m, m)-submatrix B of [A Im]; i.e., x̂_BI = B⁻¹b ≥ 0 and x̂_NI = 0. Let [x̂; x̂s] ≡ [x̂_BI; x̂_NI]; the indices of x̂ are 1, …, n and the indices of x̂s are n + 1, …, n + m. We will show that x̂ is a vertex of F, i.e., that there are n linearly independent hyperplanes among H1, …, Hm+n that all contain x̂. Consider the n hyperplanes with indices in the complementary dual set BIᶜ; by Theorem 2.2.1, these hyperplanes are linearly independent.
It remains to show that x̂ is in the intersection of these n hyperplanes. We will show that:

a_{i1}x̂1 + … + a_{in}x̂n = bᵢ for i ∈ J̄,  and  x̂ⱼ = 0 for j ∈ Ī.    (2.2)

The fact that x̂ⱼ = 0 for each j ∈ Ī follows from the fact that Ī ⊆ NI. The expression 'a_{i1}x̂1 + … + a_{in}x̂n = bᵢ for i ∈ J̄' can be written in matrix form as:
A_{J̄,?} x̂ = b_{J̄},    (2.3)

where A_{J̄,?} consists of the rows of A with indices in J̄, and b_{J̄} consists of the corresponding entries of b. In order to prove (2.3), first notice that [A Im][x̂; x̂s] = b. Partitioning Im according to J and J̄ yields:

[ A  (Im)_{?,J}  (Im)_{?,J̄} ] [ x̂; (x̂s)_J; (x̂s)_{J̄} ] = b.

This is equivalent to:

A x̂ + (Im)_{?,J} (x̂s)_J + (Im)_{?,J̄} (x̂s)_{J̄} = b.
Taking in this expression only the rows with indices in J̄, we find that:

A_{J̄,?} x̂ + (Im)_{J̄,J} (x̂s)_J + (Im)_{J̄,J̄} (x̂s)_{J̄} = b_{J̄}.

Since (Im)_{J̄,J} is an all-zeroes matrix and (x̂s)_{J̄} = 0 because n + J̄ ⊆ NI, it follows that A_{J̄,?} x̂ = b_{J̄}. This proves that (2.3) holds, which means that (2.2) holds. Hence, x̂ indeed corresponds to a vertex of F.
For the converse, let x̂ be a vertex of F. Hence, Ax̂ ≤ b, x̂ ≥ 0, and there are n independent hyperplanes H_{i1}, …, H_{in} that intersect at x̂. The normal vectors of H_{i1}, …, H_{in} are rows of the matrix [A; −In]. Hence, the hyperplanes H_{i1}, …, H_{in} correspond to an invertible (n, n)-submatrix B̄ᵀ of [A; −In]. Let I ⊆ {1, …, n}, J ⊆ {1, …, m}, with |Ī| + |J̄| = n, be such that B̄ᵀ = [A_{J̄,?}; (−In)_{Ī,?}]. Therefore, A_{J̄,?} x̂ = b_{J̄} and x̂_{Ī} = 0. According to Theorem 2.2.1, the matrix B = [A_{?,I} (Im)_{?,J}] is an invertible (m, m)-submatrix of [A Im]. Define x̂s = b − Ax̂, and let [x̂; x̂s] ≡ [x̂_BI; x̂_NI] with BI = I ∪ (n + J) and NI = Ī ∪ (n + J̄). Since A_{J̄,?} x̂ = b_{J̄}, it follows that the corresponding vector of slack variables satisfies (x̂s)_{J̄} = 0, so that

x̂_NI = ([x̂; x̂s])_NI = [x̂_{Ī}; (x̂s)_{J̄}] = 0.

Since x̂ ≥ 0, we also have that x̂_BI = B⁻¹b ≥ 0, and hence [x̂_BI; x̂_NI] is in fact a feasible basic solution with respect to B.
From Theorem 2.2.2, it follows that every vertex of the feasible region corresponds to at
least one feasible basic solution, while a feasible basic solution corresponds to precisely one
vertex.
Figure 2.11: Degenerate vertex and redundant hyperplane.
Figure 2.12: Pyramid with a degenerate top.
The following theorem makes precise the relationship between degenerate vertices and degenerate feasible basic solutions, and between uniqueness of feasible basic solutions and nondegeneracy.

Theorem 2.2.3. (Degeneracy)
Let v be a vertex of the feasible region of a standard LO-model. Then the following statements are equivalent:
(i) The vertex v is degenerate;
(ii) v corresponds to more than one feasible basic solution;
(iii) v corresponds to a degenerate feasible basic solution.
Proof of Theorem 2.2.3. (i) =⇒ (ii). Suppose that v is degenerate. Let H1, …, Hm+n be the hyperplanes corresponding to the rows of the matrix [A; −In]. Since v is a vertex, there exist n linearly independent hyperplanes among H1, …, Hm+n that contain v. Let NI ⊆ {1, …, n + m} be the set of indices of these n hyperplanes. For i = 1, …, m + n, let aᵢ (∈ Rⁿ) be the normal vector corresponding to hyperplane Hᵢ. Since v is degenerate, there exists another hyperplane, say Hp with normal vector a_p, that contains v. By Theorem B.1.1 applied to the vectors aᵢ (i ∈ NI) and a_p, there exists NI′ ⊆ NI ∪ {p} with p ∈ NI′ such that the n vectors aᵢ (i ∈ NI′)
are linearly independent. Let BI = {1, …, n + m} \ NI and BI′ = {1, …, n + m} \ NI′.
It follows from Theorem 2.2.1 that BI and BI′ are the index sets of the basic variables of two
distinct basic solutions, and it follows from the construction of BI and BI′ that these basic
solutions are in fact feasible basic solutions that correspond to v.
(ii) =⇒ (iii). Consider any feasible basic solution [x_BI; x_NI] ≡ [x; xs] corresponding to v. Let NI′ be the index set of the nonbasic variables of a different feasible basic solution that corresponds to v. Because ([x; xs])_NI = 0 and ([x; xs])_{NI′} = 0, we have that [x; xs] has at least n + 1 entries with value zero, and hence at most m − 1 entries with nonzero value. It follows that x_BI has at least one entry with value zero, which means that this feasible basic solution is degenerate.
(iii) =⇒ (i). Let [x_BI; 0] be a degenerate feasible basic solution with respect to the invertible (m, m)-submatrix B of [A Im]. Hence x_BI contains at least one zero entry, say x_k with k ∈ BI. Let v be the vertex of F corresponding to [x_BI; 0] (see Theorem 2.2.2), determined by the n equalities xᵢ = 0 with i ∈ NI. Besides the n hyperplanes corresponding to xᵢ = 0 with i ∈ NI, the equation x_k = 0 (k ∈ BI) corresponds to a hyperplane that also contains v. Hence, there are at least n + 1 hyperplanes that contain v, and this implies that v is degenerate.
By the definition of a degenerate feasible basic solution, it holds that the feasible basic solu-
tions corresponding to a particular vertex v are either all degenerate, or all nondegenerate.
In fact, Theorem 2.2.3 shows that in the latter case there is a unique nondegenerate feasible
basic solution. So, for any vertex v of the feasible region of a standard LO-model, there
are two possibilities:
1. The vertex v is degenerate. Then, v corresponds to multiple feasible basic solutions and
they are all degenerate.
2. The vertex v is nondegenerate. Then, v corresponds to a unique feasible basic solution,
and this feasible basic solution is nondegenerate.
Degeneracy plays an important role in the development of the simplex algorithm in Chapter
3; see also Section 3.5.1. In the following example, we illustrate the first possibility, i.e., a
degenerate vertex with degenerate feasible basic solutions.
Example 2.2.6. Consider again the degenerate vertex v1 = [6 0]ᵀ in Figure 2.11. Introducing
the slack variables x3 , x4 , x5 , and x6 for the constraints (1.1), (1.2), (1.3), and (1.4), respectively,
we find three different feasible basic solutions associated with v1 , namely the feasible basic solutions
with as nonbasic variables the pairs {x2 , x4 }, {x2 , x5 }, and {x4 , x5 }. Since these three feasible
basic solutions correspond to the same vertex, it follows from Theorem 2.2.3 that the vertex v1 is
degenerate and so are the three feasible basic solutions corresponding to it.
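The three feasible basic solutions of Example 2.2.6 can be computed explicitly. The sketch below assumes, following Figure 2.11, that constraint (1.3) reads x1 ≤ 6 here (a redundant hyperplane through v1); each of the three bases then yields the same point [6 0]ᵀ, with one basic variable equal to zero:

```python
from fractions import Fraction as F

# Model Dovetail with constraint (1.3) taken as x1 <= 6 (Figure 2.11);
# slack variables x3, ..., x6 for constraints (1.1)-(1.4).
A = [[1, 1], [3, 1], [1, 0], [0, 1]]
b = [F(9), F(18), F(6), F(6)]
AI = [row + [1 if i == j else 0 for j in range(4)] for i, row in enumerate(A)]

def basic_solution(NI):
    """Full solution (x1, ..., x6) with the variables in NI fixed to zero;
    assumes the resulting basis matrix is invertible."""
    BI = [c for c in range(1, 7) if c not in NI]
    M = [[F(AI[r][c - 1]) for c in BI] + [b[r]] for r in range(4)]
    for col in range(4):                       # Gaussian elimination
        piv = next(r for r in range(col, 4) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(4):
            if r != col and M[r][col] != 0:
                M[r] = [a - M[r][col] * p for a, p in zip(M[r], M[col])]
    values = dict(zip(BI, (row[4] for row in M)))
    return [values.get(i, F(0)) for i in range(1, 7)]

for NI in [{2, 4}, {2, 5}, {4, 5}]:            # the three bases of Example 2.2.6
    x = basic_solution(NI)
    degenerate = any(x[i - 1] == 0 for i in range(1, 7) if i not in NI)
    print(sorted(NI), x[:2], 'degenerate:', degenerate)
```

All three bases produce the full vector [6 0 3 0 0 6]ᵀ, illustrating that a degenerate vertex corresponds to multiple feasible basic solutions, all of them degenerate.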
2.2.5 Adjacency
In Chapter 3, we will see that, using the so-called simplex algorithm, an optimal solution
can be found by starting at vertex 0 of the feasible region (provided that the point 0 is in fact
a feasible solution) and then proceeding along the boundary of the feasible region towards
an optimal vertex. In fact, an optimal vertex is reached by ‘jumping’ from vertex to vertex
along the boundary of the feasible region, i.e., from feasible basic solution to feasible basic
solution, in such a way that in each step one nonbasic variable changes into a basic variable
and thus one basic variable becomes a nonbasic variable.
Let n, m ≥ 1, and let F = H₁⁺ ∩ … ∩ H⁺_{m+n} be nonempty, where H1, …, Hm+n are hyperplanes in Rⁿ. Two distinct vertices u and v of F are called adjacent if and only if there are
two index sets I1, I2 ⊆ {1, …, m + n} that differ in exactly one index, such that {u} = ∩_{i∈I1} Hᵢ and {v} = ∩_{i∈I2} Hᵢ. Two adjacent vertices u and v therefore 'share' n − 1
independent hyperplanes, say H1 , . . . , Hn−1 ; the remaining two hyperplanes, say Hn and
Hn+1 , determine the locations of the vertices u and v on the ‘line’ H1 ∩. . .∩Hn−1 . (Why
is this intersection one-dimensional?) Hence, {u} = H1 ∩ . . . ∩ Hn−1 ∩ Hn and {v} =
H1 ∩. . .∩Hn−1 ∩Hn+1 , whereas both {H1 , . . . , Hn−1 , Hn } and {H1 , . . . , Hn−1 , Hn+1 }
are independent sets of hyperplanes.
In Appendix D it is shown that the definition of adjacency of vertices given above is equiv-
alent to the following one. Two distinct vertices u and v are adjacent in F if and only if
there exists a hyperplane H , with a corresponding halfspace H + , such that F ⊂ H + and
F ∩ H = conv{u, v}. When comparing the definitions of ‘vertex’ and ‘adjacent vertices’
(see Section 2.1.2), one may observe that conv{u, v} is a face of F ; see Section 2.1.3.
Example 2.2.7. Consider again Model Dovetail (where n = 2), and Figure 1.2. It is clear from
the figure that the vertices v2 and v3 are adjacent in F . The hyperplanes H1 , H2 , H4 (see Section
2.1.1) determine v2 and v3 , namely {v2 } = H1 ∩ H2 and {v3 } = H1 ∩ H4 . See also Exercise
2.3.2.
We are now ready to formulate the ‘vertex adjacency’ theorem. We will use the expression
‘two feasible basic solutions differ in one index’. By this, we mean that the corresponding
index sets BI1 and BI2 differ in precisely one index, i.e., |BI1 \ BI2 | = |BI2 \ BI1 | = 1.
Proof of Theorem 2.2.4. Let u and v be two adjacent vertices of the nonempty feasible region F = {x ∈ Rn | Ax ≤ b, x ≥ 0}. According to the definition of adjacency, there are n + 1 hyperplanes H1 , . . . , Hn , Hn+1 , corresponding to rows of the stacked matrix [A; −In ] (the rows of A on top of the rows of −In ), such that {u} = H1 ∩ . . . ∩ Hn−1 ∩ Hn and {v} = H1 ∩ . . . ∩ Hn−1 ∩ Hn+1 , and both {H1 , . . . , Hn−1 , Hn } and {H1 , . . . , Hn−1 , Hn+1 } are independent sets of hyperplanes. Let B1 and B2 be the (n, n)-submatrices of [A; −In ], with B1 consisting of the rows of [A; −In ] corresponding to H1 , . . . , Hn−1 , Hn , and B2 consisting of the rows of [A; −In ] corresponding to H1 , . . . , Hn−1 , Hn+1 . Hence, B1 and B2 are invertible. The complementary dual matrices B̄1 and B̄2 of B1 and B2 , respectively, are invertible submatrices of (A Im ) according to Theorem 2.2.1.
One can easily check that B1 and B2 differ in precisely one row, and therefore that B̄1 and B̄2 differ in precisely one column. Let BI1 consist of the indices corresponding to the columns of B̄1 in (A Im ), and BI2 of the indices corresponding to the columns of B̄2 in (A Im ). Then [xBI1 ; 0] and [xBI2 ; 0] are feasible basic solutions with respect to B̄1 and B̄2 , respectively, that differ in precisely one index.
Conversely, let [xBI1 ; 0] and [xBI2 ; 0] be feasible basic solutions corresponding to the invertible (m, m)-submatrices B̄1 and B̄2 , respectively, of (A Im ) that differ in precisely one index. The sets of hyperplanes {H1 , . . . , Hn−1 , Hn } and {H1 , . . . , Hn−1 , Hn+1 }, corresponding to the complementary dual matrices B1 and B2 of B̄1 and B̄2 , respectively, are both independent (see Theorem 2.2.1), and therefore define a vertex u and a vertex v of F , respectively. If u ≠ v, then u and v are adjacent. If u = v, then u (= v) is a degenerate vertex of F , because all n + 1 hyperplanes H1 , . . . , Hn−1 , Hn , Hn+1 contain u (= v).
The phrase “there exists” in Theorem 2.2.4 is important. Not every pair of feasible basic
solutions corresponding to adjacent vertices is also a pair of adjacent feasible basic solutions.
Also, in the case of degeneracy, adjacent feasible basic solutions may correspond to the same
vertex, and hence they do not correspond to adjacent vertices.
2.3 Exercises
Exercise 2.3.1.
(a) Give an example in R2 of an LO-model in which the feasible region has a degenerate
vertex that has no redundant binding constraints.
(b) Give an example in R3 of an LO-model in which the optimal solution is degenerate
without redundant binding technology constraints and all nonnegativity constraints are
redundant.
Exercise 2.3.4. Show that any LO-model with a bounded feasible region has no directions
of unboundedness.
Exercise 2.3.5. Consider the following assertion. If every vertex of the feasible region of
a feasible LO-model is nondegenerate, then the optimal solution is unique. If this assertion
is true, give a proof; otherwise, give a counterexample.
Exercise 2.3.6. Show that the feasible region of the following LO-model is unbounded,
and that it has multiple optimal solutions.
max −2x1 + x2 + x3
s.t. x1 − x2 − x3 ≥ −2
x1 − x2 + x3 ≤ 2
x1 , x2 ≥ 0, x3 free.
Determine all vertices (extreme points) and extreme directions of the feasible region.
Exercise 2.3.7. Consider a feasible standard LO-model. Assume that this model has
an unbounded ‘optimal solution’ in such a way that the objective function can be made
arbitrarily large by increasing the value of some decision variable, xk , say, while keeping the
values of the other variables fixed. Let α > 0, and consider the original model extended
with the constraint xk ≤ α. Assuming that this new model has a (bounded) optimal solution,
show that it has an optimal solution x∗ for which x∗k = α.
Exercise 2.3.8. Prove that (ii) implies (i) in the proof of Theorem 2.2.1.
Exercise 2.3.9. Prove that an unbounded feasible region cannot be written as the convex
hull of finitely many points.
Exercise 2.3.10. Prove that the identity FJ = F ∩ (∩j∈J Hj ) from Section 2.1.3 holds.
Exercise 2.3.12. Consider Model Dovetail. Using the method in Example D.3.1, construct values of λ1 , . . . , λ5 ≥ 0 with λ1 + . . . + λ5 = 1 so that x0 = λ1 v1 + . . . + λ4 v4 + λ5 0. Is this choice of values of the λi 's unique? Explain your answer.
Chapter 3. Dantzig's simplex algorithm

Overview
The simplex algorithm that is described in this chapter was invented in 1947 by George
B. Dantzig (1914–2005). In 1963, his famous book Linear Programming and Extensions was
published by Princeton University Press, Princeton, New Jersey. Since that time, the im-
plementations of the algorithm have improved drastically. Linear optimization models with
millions of variables and constraints can nowadays readily be solved by the simplex algorithm
using modern computers and sophisticated implementations. The basic idea of Dantzig’s
simplex algorithm for solving LO-models is to manipulate the columns of the technology
matrix of the model in such a way that after a finite number of steps an optimal solution
is achieved. These steps correspond to jumps from vertex to vertex along the edges of the
feasible region of the LO-model, while these jumps in turn correspond to manipulations of
the rows of the technology matrix. This relationship between manipulating columns and
manipulating rows was explained in Chapter 2.
In Section 2.2.2, we saw that each vertex of the feasible region is determined by setting two
variables (the nonbasic variables) to zero in Model Dovetail with slack variables (Section
1.2.2), and that the basic variables can be expressed in terms of the nonbasic variables. We
also saw that there are fifteen possible ways to select a (4, 4)-submatrix in the technology
[Figure 3.1: The feasible region of Model Dovetail. The axes correspond to x1 (horizontal) and x2 (vertical); the lines x3 = 0, x4 = 0, x5 = 0, and x6 = 0 and the vertices 0, v1 , v2 , v3 , and v4 are shown.]
matrix A I4 of Model Dovetail with slack variables and that, therefore, there are fifteen
candidate choices of basic variables that determine a basic solution. Some of these candidates
lead to feasible basic solutions, and some of them may lead to infeasible basic solutions. The
feasible basic solutions correspond to vertices of the feasible region. Therefore, in the light
of the optimal vertex theorem (Theorem 2.1.5), we could in principle find an optimal
solution by determining, for each of these fifteen choices, whether they correspond to a
feasible basic solution, and then choosing among them the one that has the largest objective
value. This approach, however, has two major drawbacks. First, in general, the number of
feasible basic solutions can be extremely large: for a standard LO-model with n variables
and m constraints, the number of candidates is (m+n choose m). Therefore, checking all possibilities
is very inefficient; see also Chapter 9. As we will see, the simplex algorithm is a procedure
that considers vertices in a more systematic way. A second drawback is that considering all
feasible basic solutions does not provide a way to detect that the LO-model at hand has an
unbounded solution. The simplex algorithm does not have this restriction.
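The brute-force approach just described is easy to make concrete. The following is a small illustrative sketch (ours, not the book's; pure Python with exact arithmetic via the standard fractions module) that enumerates all 15 candidate bases of Model Dovetail with slack variables and selects the best feasible basic solution:

```python
# Enumerate all C(m+n, m) = C(6, 4) = 15 candidate bases of Model Dovetail
# with slack variables, and pick the feasible basic solution with the largest
# objective value.  Illustrative only; exact arithmetic via Fraction.
from fractions import Fraction
from itertools import combinations

# (A I4) and b for Model Dovetail: x1+x2 <= 9, 3x1+x2 <= 18, x1 <= 7, x2 <= 6.
A_I = [[1, 1, 1, 0, 0, 0],
       [3, 1, 0, 1, 0, 0],
       [1, 0, 0, 0, 1, 0],
       [0, 1, 0, 0, 0, 1]]
b = [9, 18, 7, 6]
c = [3, 2, 0, 0, 0, 0]          # objective: max 3x1 + 2x2

def solve(B, rhs):
    """Solve B x = rhs by Gauss-Jordan elimination; None if B is singular."""
    n = len(B)
    M = [[Fraction(B[i][j]) for j in range(n)] + [Fraction(rhs[i])] for i in range(n)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

best_value, best_x = None, None
for BI in combinations(range(6), 4):        # 15 candidate sets of basic variables
    B = [[A_I[i][j] for j in BI] for i in range(4)]
    xB = solve(B, b)
    if xB is None or any(v < 0 for v in xB):
        continue                            # singular basis or infeasible basic solution
    x = [Fraction(0)] * 6
    for j, v in zip(BI, xB):
        x[j] = v
    value = sum(ci * xi for ci, xi in zip(c, x))
    if best_value is None or value > best_value:
        best_value, best_x = value, x

print(best_value)        # optimal value 45/2, i.e., 22 1/2
print(best_x[:2])        # x1 = x2 = 9/2
```

For Model Dovetail this is harmless, but the count (m+n choose m) grows very quickly with m and n, which is exactly the drawback discussed above.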
The idea behind the simplex algorithm is surprisingly simple: it starts at a vertex of the fea-
sible region, if possible 0, and then continues by choosing a vertex adjacent to the starting
vertex. In each iteration the simplex algorithm finds, except possibly in the case of degener-
acy, a vertex adjacent to the previous one but with a larger objective value. The algorithm
stops when the objective value cannot be improved by choosing a new adjacent vertex.
Recall Model Dovetail with slack variables:

    max  3x1 + 2x2                                  (M1)
    s.t. x1 + x2 + x3 = 9
         3x1 + x2 + x4 = 18
         x1 + x5 = 7
         x2 + x6 = 6
         x1 , x2 , x3 , x4 , x5 , x6 ≥ 0.

The simplex algorithm starts at the vertex 0 of the feasible region; the objective function has value 0 at that vertex. The coefficients of the objective function corresponding to the
nonbasic variables x1 and x2 (3 and 2, respectively) are both positive. Therefore, we can
increase the objective value by increasing the value of one of the nonbasic variables. Suppose
that we choose to increase the value of x1 . (It is left to the reader to repeat the calculations
when choosing x2 instead of x1 .) The value of x2 remains zero. In terms of Figure 3.1, this
means that we start at the origin 0, and we move along the horizontal axis in the direction of
vertex v1 . An obvious question is: by how much can the value of x1 be increased without
‘leaving’ the feasible region, i.e., without violating the constraints? It is graphically clear
from Figure 3.1 that we can move as far as vertex v1 , but no further.
To calculate algebraically by how much the value of x1 may be increased, we move to the
realm of feasible basic solutions. We started at the feasible basic solution for which x1 and
x2 are the nonbasic variables, corresponding to the origin vertex 0. Since we are increasing
the value of x1 , the variable x1 has to become a basic variable. We say that x1 enters the
set of basic variables, or that x1 is the entering nonbasic variable. Recall that we want to jump
from one feasible basic solution to another. Recall also that any given feasible basic solution
corresponds to a set of exactly m (= 4) basic variables. Since we are jumping to a new
vertex and x1 becomes a basic variable, this means that we have to determine a variable that
leaves the set of basic variables. Moreover, because the value of x2 remains zero, the variable
x2 remains nonbasic. Taking x2 = 0 in the constraints of (M1), and using the fact that the
slack variables have to have nonnegative values, we obtain the inequalities x1 ≤ 9, 3x1 ≤ 18, and x1 ≤ 7. So, we may increase the value of x1 to at most min{9/1, 18/3, 7/1} = min{9, 6, 7} = 6.
Any larger value will yield a point outside the feasible region. Determining by how much
the value of the entering variable can be increased involves finding a minimum ratio (in this
case among 9/1, 18/3, and 7/1). This process is called the minimum-ratio test.
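The minimum-ratio test is simple enough to state as a helper function. The sketch below is our own illustration (pure Python, exact fractions); the argument names follow the notation used later in this chapter:

```python
# A minimal sketch of the minimum-ratio test.  Given the current right hand
# sides (B^-1 b) and the column of the entering variable ((B^-1 N)_{?,alpha}),
# it returns by how much the entering variable may be increased, and the row
# index beta attaining the minimum.
from fractions import Fraction

def minimum_ratio_test(rhs, column):
    """Return (delta, beta): the largest feasible increase of the entering
    variable and the 0-based row attaining it, or (None, None) if no entry
    of the column is positive (the model is then unbounded)."""
    delta, beta = None, None
    for j, (bj, aj) in enumerate(zip(rhs, column)):
        if aj > 0:                       # entries <= 0 impose no upper bound
            ratio = Fraction(bj, aj)
            if delta is None or ratio < delta:
                delta, beta = ratio, j
    return delta, beta

# Increasing x1 from the origin of Model Dovetail: ratios 9/1, 18/3, 7/1.
print(minimum_ratio_test([9, 18, 7, 6], [1, 3, 1, 0]))
# delta = 6, beta = 1 (the second constraint, 3x1 <= 18, is the tightest)
```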
Figure 3.1 illustrates the situation geometrically: starting at the origin vertex 0, we move
along the x1 -axis as the value of x1 increases. As we move ‘to the right’, we consecutively
hit the lines corresponding to x4 = 0 (at x1 = 6), x5 = 0 (at x1 = 7), and x3 = 0 (at
x1 = 9). Clearly, we should stop increasing the value of x1 when x1 has value 6, because if
the value of x1 is increased more than that, we will leave the feasible region. Observe that
the values 6, 7, 9 in parentheses correspond to the values in the minimum-ratio test.
Because we want to end up at a vertex, we set x1 to the largest possible value, i.e., we set
x1 = 6, and we leave x2 = 0, as it was before. This implies that x3 = 3, x4 = 0, x5 = 1,
x6 = 6. Since we stopped increasing the value of x1 when we hit the line x4 = 0, we
necessarily have that x4 = 0 at the new vertex. Thus, we can choose x4 as the new nonbasic
variable. We say that x4 leaves the set of basic variables, or that x4 is the leaving basic variable.
So we are now at the feasible basic solution with nonbasic variables x2 and x4 . This feasible
basic solution corresponds to vertex v1 , i.e., the vertex determined by the lines x2 = 0 and
x4 = 0.
To rewrite the model from the perspective of the new vertex v1 , we use the constraint:

    3x1 + x2 + x4 = 18.
The reason for choosing this particular constraint (instead of one of the other constraints) is
that this is the only constraint of (M1) that contains the leaving variable x4 . Solving for x1 ,
we find that:
x1 = 31 (18 − x2 − x4 ). (3.1)
Substituting (3.1) into the objective max 3x1 + 2x2 , the new objective becomes:
max 18 + x2 − x4 .
We can rewrite the remaining constraints in a similar way by substituting (3.1) into them.
After straightforward calculations and reordering the constraints, the model becomes:
    max  x2 − x4 + 18                               (M2)
    s.t.  ⅓x2 + ⅓x4 + x1 = 6
          ⅔x2 − ⅓x4 + x3 = 3
         −⅓x2 − ⅓x4 + x5 = 1
           x2 + x6 = 6
         x2 , x4 , x1 , x3 , x5 , x6 ≥ 0.
This model is equivalent to (M1) in the sense that both models have the same feasible region
and the same objective; this will be explained in more detail in Section 3.2. Notice also that
we have written the model in such a way that the current basic variables x1 , x3 , x5 , and x6
play the role of slack variables. In particular, they appear in exactly one equation of (M2),
and they do so with coefficient 1. Figure 3.2 shows the feasible region of (M2), when x1 ,
x3 , x5 , and x6 are interpreted as slack variables. Observe that, although the shape of the
feasible region has changed compared to the one shown in Figure 3.1, the vertices and the
adjacencies between the vertices are preserved. Notice also that the objective vector has
rotated accordingly and still points towards the optimal vertex v2 .
Informally speaking, it is as if we are looking at the feasible region with a camera, and we
move the camera position from 0 to v1 , so that we are now looking at the feasible region
from the point v1 . We therefore say that the feasible region depicted in Figure 3.2 is the
feasible region of Model Dovetail from the perspective of the vertex v1 . The vertex v1 is
now the origin of the feasible region. That is, it corresponds to setting x2 = 0 and x4 = 0.
Substituting these equations into the constraints and the objective function of (M2) implies
that x1 = 6, x3 = 3, x5 = 1, and x6 = 6, and the current corresponding objective value
is 18.
Comparing the objective function of (M1) (namely, 3x1 + 2x2 ) with the objective function
of (M2), (namely, 18 + x2 − x4 ), we see that in the first case the coefficients of the nonbasic
variables x1 and x2 are both positive, and in the second case the coefficient of x2 is positive
[Figure 3.2: The feasible region of Model Dovetail from the perspective of the vertex v1 . The axes correspond to the nonbasic variables x4 (horizontal) and x2 (vertical); the lines x1 = 0, x3 = 0, x5 = 0, and x6 = 0 and the vertices v1 , v2 , v3 , and v4 are shown.]
and that of x4 is negative. The constant in the objective function is 0 in the first case, and
18 in the second case.
Next, since the objective coefficient of x2 in (M2) is positive, we choose x2 to enter the
set of basic variables. This means that we will increase the value of x2 , while x4 remains a
nonbasic variable and, therefore, the value of x4 remains zero. In terms of Figure 3.2, this
means that we are moving upward along the x2 -axis, starting at v1 , in the direction of v2 .
Taking x4 = 0 in the constraints of (M2), we find the following inequalities:
    ⅓x2 ≤ 6,   ⅔x2 ≤ 3,   −⅓x2 ≤ 1,   and   x2 ≤ 6.
The third inequality is, because of the negative coefficient in front of x2 , not restrictive
when determining a maximum value for x2 . Hence, according to the minimum-ratio test,
the value of x2 can be increased to at most:

    min{ 6/(⅓), 3/(⅔), ?, 6/1 } = min{ 18, 4½, ?, 6 } = 4½,

where '?' signifies the nonrestrictive third inequality. Taking x2 = 4½ and x4 = 0, we find that x1 = 4½, x3 = 0, x5 = 2½, and x6 = 1½. Geometrically, we see from Figure 3.2 that,
as we move from v1 towards v2 , the first constraint that is hit is the one corresponding to
the line x3 = 0. Therefore, x3 is the new nonbasic variable, i.e., it leaves the set of basic
variables. So, currently, the basic variables are x1 , x2 , x5 , and x6 .
We again rewrite the model by using the unique constraint of (M2) that contains the leaving
variable x3 , namely:
    ⅔x2 − ⅓x4 + x3 = 3.

Solving for x2 , we find that:

    x2 = 4½ + ½x4 − 1½x3 .
[Figure 3.3: The feasible region of Model Dovetail from the perspective of the vertex v2 . The axes correspond to the nonbasic variables x4 (horizontal) and x3 (vertical); the lines x1 = 0, x2 = 0, x5 = 0, and x6 = 0 and the vertices v1 , v2 , v3 , and v4 are shown.]
Again, after substituting this expression into the objective function and the constraints of
(M2), we find the following equivalent LO-model:
    max  −1½x3 − ½x4 + 22½                          (M3)
    s.t. −½x3 + ½x4 + x1 = 4½
          1½x3 − ½x4 + x2 = 4½
          ½x3 − ½x4 + x5 = 2½
         −1½x3 + ½x4 + x6 = 1½
         x3 , x4 , x1 , x2 , x5 , x6 ≥ 0.
Now it is clear that the objective value cannot be improved anymore, because the coefficients of both x3 and x4 are negative: increasing the value of x3 or x4 will decrease the objective value. For x3 = 0, x4 = 0 (corresponding to vertex v2 ), the objective value is 22½, which is therefore the optimal value. The values of the basic variables x1 , x2 , x5 , and x6 are 4½, 4½, 2½, and 1½, respectively. Figure 3.3 shows the feasible region of (M3), when x1 , x2 , x5 , and x6 are interpreted as slack variables.
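As a quick sanity check (ours, not from the text), the optimum read off from (M3), x1 = x2 = 4½ with objective value 22½, can be verified directly against the original constraints of Model Dovetail:

```python
# Verify that x1 = x2 = 9/2 is feasible for Model Dovetail and gives the
# objective value 45/2 = 22 1/2.  Exact arithmetic via Fraction.
from fractions import Fraction

x1 = x2 = Fraction(9, 2)
assert x1 + x2 <= 9          # binding: this is the line x3 = 0
assert 3 * x1 + x2 <= 18     # binding: this is the line x4 = 0
assert x1 <= 7 and x2 <= 6   # both slack at the optimal vertex v2
print(3 * x1 + 2 * x2)       # 45/2, i.e., 22 1/2
```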
Summarizing the above procedure, we have taken the following steps (see Figure 3.4):
▸ 1. Start at 0 (x1 = 0, x2 = 0); the objective value is 0.
▸ 2. Go to vertex v1 (x2 = 0, x4 = 0); the objective value is 18.
▸ 3. Go to vertex v2 (x3 = 0, x4 = 0); the objective value is 22½.
▸ 4. Stop, because the coefficients of x3 and x4 in the objective function are negative.
In each step, we produced a new feasible basic solution, and we rewrote the original LO-
model in such a way that it was straightforward to decide whether the current feasible basic
solution was optimal and, if not, how to determine the next step. This idea of rewriting the
model in an equivalent form is one of the key features of the simplex algorithm.
Figure 3.4: Path taken by the simplex algorithm when applied to Model Dovetail.

3.2 LO-model reformulation

Consider a standard LO-model, and add slack variables xs so that the technology constraints become equalities:
    max  cᵀx
    s.t. Ax + xs = b                                (3.2)
         x ≥ 0, xs ≥ 0.
Now consider a feasible basic solution. Let BI ⊆ {1, . . . , n + m} be the set of indices of
the corresponding basic variables, and let N I = {1, . . . , n + m} \ BI be the set of indices
of the nonbasic variables. Note that BI ∩ N I = ∅ and BI ∪ N I = {1, . . . , n + m}.
Define:

    B = (A Im )?,BI ,  and  N = (A Im )?,N I .
Following the discussion in Section 2.2.2 and in particular equation (2.1), we have that,
given any feasible basic solution with a corresponding basis matrix, the basic variables may
be uniquely expressed in terms of the nonbasic variables. In particular, we have that:

    xBI = B⁻¹b − B⁻¹N xN I .                        (3.3)

Recall that, because B corresponds to a feasible basic solution, we have that B⁻¹b ≥ 0.
The fact that the basic variables can be expressed uniquely in terms of the nonbasic variables
means that we may rewrite model (3.2) into an equivalent LO-model. By ‘equivalent’, we
mean that the two LO-models have the same set of feasible solutions, and the same objective
value for each feasible solution. First, using (3.3), we may rewrite the technology constraints
of (3.2) as follows:

    Ax + Im xs = b  ⇐⇒  (A Im ) [x; xs ] = b
                    ⇐⇒  (B N) [xBI ; xN I ] = b
                    ⇐⇒  B xBI + N xN I = b                          (3.4)
                    ⇐⇒  B⁻¹B xBI + B⁻¹N xN I = B⁻¹b
                    ⇐⇒  Im xBI + B⁻¹N xN I = B⁻¹b
                    ⇐⇒  (Im B⁻¹N) [xBI ; xN I ] = B⁻¹b.
Here, we have used the fact that B is an invertible matrix. Next, again using (3.3), we may rewrite the objective function as:

    cᵀx = cᵀBI xBI + cᵀN I xN I = cᵀBI B⁻¹b + (cᵀN I − cᵀBI B⁻¹N) xN I ,    (3.5)

where cBI and cN I consist of the objective coefficients of the basic and the nonbasic variables, respectively (the coefficients corresponding to the slack variables being zero). Combining (3.4) and (3.5), the model may thus be written in the following equivalent form:

    max  cᵀBI B⁻¹b + (cᵀN I − cᵀBI B⁻¹N) xN I
    s.t. Im xBI + B⁻¹N xN I = B⁻¹b                                  (3.6)
         xBI , xN I ≥ 0.

In virtue of (3.3), each constraint of (3.6) corresponds to a unique basic variable. Therefore, for each constraint of (3.6), we sometimes add the corresponding basic variable as a row label.
Example 3.2.1. Consider again (M2). We have that the basic variables corresponding to the feasible basic solution are x1 , x3 , x5 , x6 . That is, BI = {1, 3, 5, 6} and N I = {2, 4}. Hence, we have that:

         x1 x3 x5 x6              x2 x4
    B = [ 1  1  0  0        N = [ 1  0        B⁻¹ = [ 0   ⅓  0  0
          3  0  0  0              1  1                1  −⅓  0  0
          1  0  1  0              0  0                0  −⅓  1  0
          0  0  0  1 ],           1  0 ],             0   0  0  1 ].

It is left to the reader to check that:

                                             x2  x4
    cᵀBI B⁻¹b = 18,    cᵀN I − cᵀBI B⁻¹N = [ 1  −1 ],

and that:

                 x2   x4
    B⁻¹N = x1 [  ⅓    ⅓          B⁻¹b = x1 [ 6
           x3    ⅔   −⅓                 x3   3
           x5   −⅓   −⅓                 x5   1
           x6    1    0 ],              x6   6 ].

From these expressions, it can be seen that the LO-model formulation (M2) is of the form (3.6). That is, (M2) is equivalent to Model Dovetail, but written from the perspective of the feasible basic solution with basic variables x1 , x3 , x5 , x6 , so that:

                                x1 x3 x5 x6    x2   x4
    B⁻¹ (B N) = (Im B⁻¹N) = x1 [ 1  0  0  0     ⅓    ⅓
                            x3   0  1  0  0     ⅔   −⅓
                            x5   0  0  1  0    −⅓   −⅓
                            x6   0  0  0  1     1    0 ].

Also, the current objective coefficients of the first and second nonbasic variables are 1 and −1, respectively. It is left to the reader to check that the LO-model formulations (M1) and (M3) are also of the form (3.6).
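The matrices of Example 3.2.1 can be reproduced numerically. The sketch below (our own, in pure Python with exact fractions; a small Gauss-Jordan solver stands in for computing B⁻¹) builds B and N from the columns of (A I4) and checks the quantities stated in the example:

```python
# Reproduce the matrices of Example 3.2.1: B, N from (A I4) with
# BI = {1, 3, 5, 6} and NI = {2, 4}, then B^-1 N, B^-1 b, and the current
# objective data.  Exact arithmetic via Fraction.
from fractions import Fraction

A_I = [[1, 1, 1, 0, 0, 0], [3, 1, 0, 1, 0, 0],
       [1, 0, 0, 0, 1, 0], [0, 1, 0, 0, 0, 1]]
b = [9, 18, 7, 6]
c = [3, 2, 0, 0, 0, 0]
BI, NI = [0, 2, 4, 5], [1, 3]      # 0-based indices of x1, x3, x5, x6 and x2, x4

def solve(B, rhs):
    """Solve B x = rhs by Gauss-Jordan elimination (B assumed invertible)."""
    n = len(B)
    M = [[Fraction(B[i][j]) for j in range(n)] + [Fraction(rhs[i])] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

B = [[A_I[i][j] for j in BI] for i in range(4)]
N_cols = [[A_I[i][j] for i in range(4)] for j in NI]

Binv_b = solve(B, b)
Binv_N = [solve(B, col) for col in N_cols]          # one column per nonbasic variable
z = sum(c[j] * v for j, v in zip(BI, Binv_b))       # c_BI^T B^-1 b
reduced = [c[j] - sum(c[k] * Binv_N[t][i] for i, k in enumerate(BI))
           for t, j in enumerate(NI)]               # c_NI^T - c_BI^T B^-1 N

print(Binv_b)      # the values 6, 3, 1, 6 of x1, x3, x5, x6
print(z, reduced)  # objective value 18 and current coefficients 1, -1
```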
3.3 The simplex algorithm

In each iteration, the simplex algorithm has to: (a) determine whether the current feasible basic solution is optimal; (b) if it is not, determine whether the model has an unbounded solution; and (c) find a variable that enters the set of basic variables and a variable that leaves the set of basic variables, and construct a new feasible basic solution.
Because of Theorem 2.2.2 and Theorem 2.2.4, the procedure of ‘jumping’ from vertex to
vertex along the boundary of the feasible region is a routine algebraic matter of manipulating
systems of linear equations Ax + xs = b with x ≥ 0 and xs ≥ 0. Jumping to an
adjacent vertex means changing the corresponding feasible basic solution by exchanging a
basic variable with a suitable nonbasic variable.
Suppose that the entering variable is the nonbasic variable xN Iα . Increasing the value of xN Iα while keeping the other nonbasic variables at value zero, (3.3) becomes:

    xBI = B⁻¹b − (B⁻¹N)?,α xN Iα .

Note that (B⁻¹N)?,α is the column of B⁻¹N that corresponds to the nonbasic variable xN Iα . The above expression represents a system of m linear equations. In fact, it is instructive to explicitly write out these equations:

    xBIj = (B⁻¹b)j − (B⁻¹N)j,α xN Iα   for j = 1, . . . , m.       (3.7)
Using the fact that xBIj ≥ 0 for j ∈ {1, . . . , m}, we obtain the following inequalities for xN Iα :

    (B⁻¹N)j,α xN Iα ≤ (B⁻¹b)j   for j = 1, . . . , m.              (3.8)

Among these inequalities, the ones with (B⁻¹N)j,α ≤ 0 do not provide an upper bound for the value of xN Iα , i.e., they form no restriction when increasing the value of xN Iα . So, since we are interested in determining by how much the value of xN Iα can be increased without violating any of these inequalities, we can compactly write the relevant inequalities as follows:

    xN Iα ≤ min{ (B⁻¹b)j / (B⁻¹N)j,α | (B⁻¹N)j,α > 0, j = 1, . . . , m }.    (3.9)
Let δ be the value of the right hand side of this inequality. So, xN Iα can be increased to at most δ . Let β be such that (B⁻¹b)β /(B⁻¹N)β,α = δ , assuming for now that β exists. This is the situation that we encountered in models (M1) and (M2) in Section 3.1. The case where β does not exist will be discussed later in this section. Set xN Iα := δ , while keeping the values of all other nonbasic variables at zero, i.e., xN Ii = 0 for i ≠ α. Now, (3.7) provides us with the new values for the variables with indices in BI , namely:

    xBI = B⁻¹b − δ (B⁻¹N)?,α .
By the choice of δ , we again have that xBI ≥ 0. Moreover, we have that the equation of (3.7) with j = β (corresponding to the basic variable xBIβ ) becomes:

    xBIβ = (B⁻¹b)β − δ (B⁻¹N)β,α = (B⁻¹b)β − ((B⁻¹b)β /(B⁻¹N)β,α ) (B⁻¹N)β,α = 0.
So, the new value of xBIβ is zero. We can therefore remove this variable from the set of
basic variables, and insert it into the set of nonbasic variables. In other words, xBIβ leaves
the set of basic variables. Thus, we now have arrived at a new feasible basic solution. The
corresponding new set of basic variable indices is BI := (BI \ {BIβ }) ∪ {N Iα }, and the new set of nonbasic variable indices is N I := (N I \ {N Iα }) ∪ {BIβ }.

We saw at the end of the previous section that the first and second nonbasic variables (x2 and x4 ,
respectively) have current objective coefficients 1 and −1, respectively. Since x2 has a positive current
objective coefficient, we let this variable enter the set of basic variables. Thus, α = 1 (because N I1 =
2). Therefore, we have that (see Example 3.2.1):
    (B⁻¹N)?,α = (B⁻¹N)?,1 = [ ⅓  ⅔  −⅓  1 ]ᵀ.
This means that (3.9) becomes:
    xN I1 = x2 ≤ min{ 6/(⅓), 3/(⅔), ?, 6/1 } = min{ 18, 4½, ?, 6 } = 4½ =: δ,
where ‘?’ indicates an entry that satisfies (B−1 N)j,α ≤ 0. The minimum is attained by the second
entry, i.e., β = 2. The corresponding basic variable, xBI2 = x3 , leaves the set of basic variables.
Now we set xN I1 = x2 := 4½, while leaving the other nonbasic variable (i.e., x4 ) at value zero.
The new values of the basic variables with indices in BI are given by:
    xBI = [ x1  x3  x5  x6 ]ᵀ = [ 6  3  1  6 ]ᵀ − 4½ × [ ⅓  ⅔  −⅓  1 ]ᵀ = [ 4½  0  2½  1½ ]ᵀ.
Observe that indeed x3 = 0. The new solution satisfies x1 = x2 = 4½, which means that we have arrived at vertex v2 . The corresponding new sets of basic and nonbasic variable indices are now given by (recall that α = 1 and β = 2): BI := {1, 2, 5, 6} and N I := {3, 4}.
Recall that we assumed that β exists. An important question is: what happens if β does not
exist? This happens if and only if (B−1 N)j,α ≤ 0 for each j ∈ {1, . . . , m} in (3.9). This
means that none of the inequalities of (3.8) are restrictive and, as a result, we may increase
the value of xN Iα by any arbitrary amount. Since increasing the value of xN Iα results in
an increase of the objective value (recall that the current value of the objective coefficient
of xN Iα is positive), this means that the LO-model has an unbounded solution; see also
Section 3.5.3.
Input: A standard LO-model, and an initial feasible basic solution for the LO-model with slack variables (see Section 3.6).
Output: Either an optimal solution of the model, or the message 'the model is unbounded'.
▸ Step 1: Constructing the basis matrix. Let BI and N I be the sets of the indices of the basic and nonbasic variables, respectively, corresponding to the current feasible basic solution. Let B consist of the columns of (A Im ) with indices in BI , and N of the columns of (A Im ) with indices in N I .
▸ Step 2: Optimality test; selecting an entering variable. If cᵀN I − cᵀBI B⁻¹N ≤ 0ᵀ, then stop: the current feasible basic solution is optimal. Otherwise, choose an index α such that the α'th entry of cᵀN I − cᵀBI B⁻¹N is positive; the nonbasic variable xN Iα enters the set of basic variables.
▸ Step 3: Minimum-ratio test; selecting a leaving variable. If (B⁻¹N)?,α ≤ 0, then stop: the model is unbounded. Otherwise, choose an index β that attains the minimum in (3.9); the basic variable xBIβ leaves the set of basic variables. Go to Step 4.
▸ Step 4: Exchanging. Set BI := (BI \ {BIβ }) ∪ {N Iα } and N I := (N I \ {N Iα }) ∪ {BIβ }. Return to Step 1.
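Steps 1-4 can be sketched compactly in code. The following illustrative Python implementation is ours, not the book's: it naively re-solves linear systems with B in every iteration rather than maintaining simplex tableaus, and it simply picks the first positive current objective coefficient. It runs the algorithm on Model Dovetail:

```python
# A sketch of Steps 1-4 of the simplex algorithm, with exact arithmetic.
from fractions import Fraction

def solve(B, rhs):
    """Gauss-Jordan solve of B x = rhs (B assumed invertible)."""
    n = len(B)
    M = [[Fraction(B[i][j]) for j in range(n)] + [Fraction(rhs[i])] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def simplex(A_I, b, c, BI):
    """Run Steps 1-4 starting from the feasible basis BI (0-based indices).
    Returns ('optimal', x, value) or ('unbounded', None, None)."""
    m, total = len(A_I), len(A_I[0])
    NI = [j for j in range(total) if j not in BI]
    while True:
        # Step 1: basis matrix B and current basic solution B^-1 b.
        B = [[A_I[i][j] for j in BI] for i in range(m)]
        xB = solve(B, b)
        # Step 2: current objective coefficients of the nonbasic variables.
        cols = {j: solve(B, [A_I[i][j] for i in range(m)]) for j in NI}
        red = {j: c[j] - sum(c[k] * cols[j][i] for i, k in enumerate(BI)) for j in NI}
        entering = [j for j in NI if red[j] > 0]
        if not entering:                       # optimality: no coefficient is positive
            x = [Fraction(0)] * total
            for k, v in zip(BI, xB):
                x[k] = v
            return 'optimal', x, sum(c[k] * v for k, v in zip(BI, xB))
        a = entering[0]
        # Step 3: minimum-ratio test on the entering column.
        ratios = [(xB[j] / cols[a][j], j) for j in range(m) if cols[a][j] > 0]
        if not ratios:                         # no positive entry: unbounded model
            return 'unbounded', None, None
        beta = min(ratios)[1]
        # Step 4: exchange the entering and leaving variables.
        NI[NI.index(a)], BI[beta] = BI[beta], a

# Model Dovetail: max 3x1+2x2 s.t. x1+x2<=9, 3x1+x2<=18, x1<=7, x2<=6.
A_I = [[1, 1, 1, 0, 0, 0], [3, 1, 0, 1, 0, 0],
       [1, 0, 0, 0, 1, 0], [0, 1, 0, 0, 0, 1]]
status, x, z = simplex(A_I, [9, 18, 7, 6], [3, 2, 0, 0, 0, 0], [2, 3, 4, 5])
print(status, x[0], x[1], z)    # optimal 9/2 9/2 45/2
```

Note the degeneracy caveat discussed earlier in this chapter: this naive sketch does not implement an anti-cycling rule, so it is a faithful illustration of the steps rather than a robust solver.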
The computational process of the Steps 2–4 is called pivoting. Determining values for α and
β together with carrying out one pivot operation is called an iteration. The choice of the
variable xN Iα that enters the current set of basic variables in Step 2 is motivated by the fact
that we want to increase the current objective value. If we deal with a minimizing LO-model
(see Section 1.3), then we should look for a current objective coefficient which is negative.
The choice of the leaving variable in Step 3 is based on the requirement that all variables
must remain nonnegative, keeping the current solution feasible. The leaving variable, de-
termined by means of the minimum-ratio test in Step 3, is a current basic variable whose
nonnegativity imposes the most restrictive upper bound on the increase of the value of the
entering variable.
Example 3.3.2. The simplex algorithm as described above will now be applied to Model Dovetail
(see Section 1.1).
▸ Initialization. The initial feasible basic solution is x1 = x2 = 0, x3 = 9, x4 = 18, x5 = 7, x6 = 6, and BI = {3, 4, 5, 6}, N I = {1, 2}.
▸ Iteration 1. The current value of the objective coefficient of x1 is 3, which is positive. Therefore, x1 will enter the set of basic variables. In order to find out which variable should leave this set, we perform the minimum-ratio test, namely: min{9/1, 18/3, 7/1} = 18/3. Hence, β = 2, and therefore (xBI2 =) x4 leaves the set of basic variables, so that BI := {1, 3, 5, 6} and N I := {2, 4}. Rewriting the model from the perspective of the new feasible basic solution, we obtain model (M2) of Section 3.1.
▸ Iteration 2. Now the only positive objective coefficient is the one corresponding to x2 , so that x2 will enter the set of basic variables. The minimum-ratio test shows that x3 leaves the set. Hence, BI := {1, 2, 5, 6}, and N I := {3, 4}. Rewriting the model, we obtain model (M3) of Section 3.1.
▸ Iteration 3. Since none of the current objective coefficients has a positive value at this point, we
have reached an optimal solution.
3.4 Simplex tableaus

In each iteration of the simplex algorithm, the basis matrix needs to be inverted. This takes a significant amount of time, because computing the inverse of a matrix is time-consuming¹.
There is a useful technique for carrying out the calculations required for the simplex algo-
rithm without inverting the matrix B in each iteration step. This technique makes use of
so-called simplex tableaus. A simplex tableau is a partitioned matrix that contains all relevant
information for the current feasible basic solution. A simplex tableau can schematically be
depicted in the following partitioned and permuted form (see also Example 3.4.1 below):
    [ 0ᵀ    cᵀN I − cᵀBI B⁻¹N  |  −cᵀBI B⁻¹b ]    one row
    [ Im    B⁻¹N               |  B⁻¹b       ]    m rows
A simplex tableau has m + 1 rows and m + n + 1 columns. The top row, which is referred
to as ‘row 0’, contains the current objective coefficients for each of the variables, and (the
negative of ) the current objective value. The remaining rows, which are referred to as ‘row
1’ through ‘row m’, contain the coefficients and right hand sides of the constraints in the
formulation of the LO-model from the perspective of the current feasible basic solution.
Each column, except for the rightmost one, corresponds to a decision variable or a slack
variable of the LO-model. The objective value, when included in a simplex tableau, is
always multiplied by −1. The reason for this is that it simplifies the calculations when using
simplex tableaus.
It needs to be stressed that the above tableau refers to a permutation of an actual simplex
tableau. The columns of actual tableaus are always ordered according to the indices of the
variables, i.e., according to the ordering 1, . . . , n, n+1, . . . , n+m. On the other hand, the
basic variables corresponding to the rows of a simplex tableau are not necessarily ordered
in increasing order of their subscripts; see Example 3.4.1. For each j = 1, . . . , m, let
BIj ∈ BI be the index of the basic variable corresponding to row j . Note that, unlike in
Section 3.3.3, it is not necessarily true that BI1 < BI2 < . . . < BIm . To summarize, we
have the following correspondence between the rows and columns of a simplex tableau and
the variables of the LO-model:
▸ Each column is associated with either a decision variable or a slack variable. To be precise,
column i (∈ {1, . . . , n + m}) is associated with variable xi . This association remains
the same throughout the execution of the simplex algorithm.
¹ Volker Strassen (born 1936) discovered an algorithm that computes the inverse of an (m, m)-matrix in O(m^2.808 ) running time; see Strassen (1969). This has recently been improved to O(m^2.373 ) by Virginia Vassilevska Williams; see Williams (2012). See also Chapter 9.
▸ Each row (except row 0) is associated with a current basic variable. To be precise, row j
(∈ {1, . . . , m}) is associated with xBIj . Due to the fact that the set BI changes during
the execution of the simplex algorithm, this association changes in each iteration step.
To avoid confusion about which column corresponds to which variable, and which row
corresponds to which basic variable, it is useful to write the name of the variable above
the corresponding column. We use arrows to mark which variables are the basic variables.
Moreover, since each row (except the top row) corresponds to a basic variable, we write
the name of each basic variable next to the corresponding row. The following example
illustrates this convention.
Example 3.4.1. In the case of Model Dovetail, when performing the simplex algorithm using
simplex tableaus, the following simplex tableau may occur (see Example 3.4.2). It corresponds to the
feasible basic solution represented by (M2).
      x1    x2    x3    x4    x5    x6     −z
    [  0     1     0    −1     0     0    −18 ]
    [  0     ⅔     1    −⅓     0     0      3 ]  x3
    [  1     ⅓     0     ⅓     0     0      6 ]  x1
    [  0    −⅓     0    −⅓     1     0      1 ]  x5
    [  0     1     0     0     0     1      6 ]  x6
In this tableau, the first, third, fifth, and sixth column correspond to the basic variables x1 , x3 , x5 ,
and x6 , respectively, and the second and fourth column correspond to the nonbasic variables x2 and
x4 , respectively. The seventh (rightmost) column corresponds to the (negative of the) current objective
value and the current right-hand side values. Note that the identity matrix in this tableau is in a
row-permuted form: the first and second rows are switched. The rows correspond to the basic variables
in the order x3 , x1 , x5 , and x6 . So, BI1 = 3, BI2 = 1, BI3 = 5, and BI4 = 6.
Several interesting facts about the current feasible basic solution can immediately be read off
a simplex tableau:
▶ The current objective value. This is the negative of the top-right entry of the tableau. In the
example above, it is −(−18) = 18.
▶ The current objective coefficients. These are the values in row 0. In the example above,
the current objective coefficients for the variables x1 , . . . , x6 are 0, 1, 0, −1, 0, 0, re-
spectively. The current objective coefficients of the basic variables are zero. Note also
that the vector cᵀNI − cᵀBI B⁻¹N consists of the entries of the top row of the simplex
tableau that correspond to the nonbasic variables. So, in the example above, we have that
cᵀNI − cᵀBI B⁻¹N = [1  −1].
▶ The current basic variables associated with the constraints. The basic variable corresponding to
the k'th row is xBIk (k ∈ {1, . . . , m}). This fact can also be determined from the (row-
permuted) identity matrix in the tableau as follows. Observe that, for each basic variable
xBIk , the column with index BIk is equal to the unit vector ek (∈ Rᵐ). This means
that, for each constraint, the basic variable associated with it can be found by looking at
the occurrence of a 1 in the column corresponding to this basic variable. For instance,
suppose that, in the example above, we want to determine BI1 . We then need to look
for the column (corresponding to a basic variable) that is equal to e1 . This is the column
with index 3, so that BI1 = 3. Similarly, we can determine that BI2 = 1, BI3 = 5,
and BI4 = 6.
▶ The values of the current basic variables. In the example above, the first constraint in the
above simplex tableau reads ⅔x2 + x3 − ⅓x4 = 3. It corresponds to the basic variable
x3 . Setting the values of the nonbasic variables x2 and x4 to zero, we obtain x3 = 3.
Thus, the current value of the basic variable x3 is 3. Similarly, the current values of the
other basic variables satisfy: x1 = 6, x5 = 1, and x6 = 6. Thus, this simplex tableau
corresponds to the point (6, 0)ᵀ of the feasible region (which is vertex v1 ).
Observe also that during the execution of the simplex algorithm, the current basic solution is
always a feasible basic solution. This means that the current basis matrix B satisfies B⁻¹b ≥
0, which implies that the entries in the rightmost column, except for the top-right entry,
are always nonnegative.
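These read-off rules are easy to mimic in code. The sketch below (Python with NumPy; the array layout and the variable names are our own choices, not the book's) stores the iteration-2 tableau of Example 3.4.1 and reads off the current objective value and the values of the basic variables:

```python
import numpy as np

# Iteration-2 tableau of Example 3.4.1: row 0 on top, then one row per
# basic variable; the last column holds -z and the right-hand side values.
T = np.array([
    [0,  1,    0, -1,    0, 0, -18],   # row 0
    [0,  2/3,  1, -1/3,  0, 0,   3],   # row 1 <-> x3
    [1,  1/3,  0,  1/3,  0, 0,   6],   # row 2 <-> x1
    [0, -1/3,  0, -1/3,  1, 0,   1],   # row 3 <-> x5
    [0,  1,    0,  0,    0, 1,   6],   # row 4 <-> x6
])
BI = [3, 1, 5, 6]                      # BI[k-1] = basic variable of row k

objective_value = float(-T[0, -1])     # negative of the top-right entry
reduced_costs = T[0, :-1]              # current objective coefficients
basic_values = {f"x{BI[k]}": float(T[k + 1, -1]) for k in range(4)}

print(objective_value)                 # 18.0
print(basic_values)                    # {'x3': 3.0, 'x1': 6.0, 'x5': 1.0, 'x6': 6.0}
```

Keeping BI as a separate list mirrors the convention of writing the name of the basic variable next to each row.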
If these three conditions are not met, then the coefficients need to be manipulated by means
of Gaussian elimination (see Section 3.4.2) to ensure that the first two conditions indeed hold.
This is particularly important when the big-M or two-phase procedure is used to find an
initial feasible basic solution; see Section 3.6. If, after ensuring that the first two conditions
are satisfied, the third condition does not hold, then the basic solution corresponding to
the chosen set of basic variables is infeasible, and therefore a different set of basic variables
should be chosen (if it exists).
▶ Determine a constraint that is the most restrictive for the value of the entering variable,
i.e., a constraint that attains the minimum in the minimum-ratio test. The basic variable
that is associated with this constraint is the variable that leaves the set of basic variables.
If no such constraint exists, then we may stop because the LO-model is unbounded.
In order to determine a nonbasic variable that enters the set of basic variables, the simplex
algorithm chooses a nonbasic variable with a positive current objective coefficient. In terms
of simplex tableaus, this is an almost trivial exercise: all that needs to be done is look for a
positive entry in row 0 of the simplex tableau. Recall that the current objective coefficient
of any basic variable is zero, so that it suffices to look for a positive entry among the objective
coefficients of the nonbasic variables. If such a positive entry is found, say in the column
with index NIα with α ∈ {1, . . . , n}, then xNIα is the variable that enters the set of basic
variables.
When the value of α has been determined, a basic variable that leaves the set of basic variables
needs to be determined. Recall that this is done by means of the minimum-ratio test, which
means determining a row index β such that:
(B⁻¹b)β / (B⁻¹N)β,α = min { (B⁻¹b)j / (B⁻¹N)j,α | (B⁻¹N)j,α > 0, j = 1, . . . , m }.
The beauty of the simplex-tableau approach is that all these quantities can immediately be
read off the tableau. Indeed, the entries of B⁻¹b are the entries in the rightmost column
of the simplex tableau, and the entries of (B⁻¹N)⋆,α are the entries in the column with
index NIα of the simplex tableau (both excluding row 0). Thus, the minimum-ratio test
can be performed directly from the simplex tableau, and the leaving basic variable xBIβ can
readily be determined.
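As a sketch of how directly this works (Python/NumPy; the helper name minimum_ratio_row is hypothetical), the test scans a single column and the right-hand-side column of the tableau:

```python
import numpy as np

def minimum_ratio_row(T, col):
    """Return the constraint row attaining the minimum ratio rhs_j / T[j, col]
    over rows with T[j, col] > 0, or None if no such row exists (in which
    case the model is unbounded in this direction)."""
    beta, best = None, None
    for j in range(1, T.shape[0]):            # row 0 is the objective row
        if T[j, col] > 0:
            ratio = T[j, -1] / T[j, col]
            if best is None or ratio < best:
                beta, best = j, ratio
    return beta

# Initial Dovetail tableau; entering variable x1 corresponds to column 0.
T = np.array([
    [3., 2, 0, 0, 0, 0,  0],
    [1,  1, 1, 0, 0, 0,  9],   # x3-row: ratio 9/1
    [3,  1, 0, 1, 0, 0, 18],   # x4-row: ratio 18/3  <- minimum
    [1,  0, 0, 0, 1, 0,  7],   # x5-row: ratio 7/1
    [0,  1, 0, 0, 0, 1,  6],   # x6-row: entry 0, so it is skipped
])
print(minimum_ratio_row(T, 0))   # 2: beta = 2, so x4 is the leaving variable
```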
What remains is to update the simplex tableau in such a way that we can perform the next
iteration step. This is done using Gaussian elimination (so, not by explicitly calculating the
matrix B⁻¹). Let xNIα be the variable that enters the current set of basic variables, and let
xBIβ be the variable that leaves this set. The (β, NIα)'th entry of the simplex tableau is
called the pivot entry. Recall that a crucial property of a simplex tableau is that, for every basic
variable, it holds that the column corresponding to it contains exactly one nonzero entry,
which has value 1. To restore this property, observe that all columns corresponding to basic
variables before the pivot are already in the correct form. Only the column corresponding
to the variable xNIα that enters the set of basic variables needs to be transformed using
Gaussian elimination. In particular, we consecutively perform the following elementary
row operations on the simplex tableau:
▶ Divide row β by the value of the (β, NIα)'th entry of the simplex tableau (the pivot
entry).
▶ For each k = 0, . . . , m, k ≠ β, subtract row β, multiplied by the (k, NIα)'th entry,
from row k. Notice that this includes row 0.
This procedure guarantees that, in the new simplex tableau, the column corresponding
to variable xNIα becomes the unit vector eβ . It is left to the reader to check that the
columns corresponding to the other basic variables have the same property, and that in fact
the columns of the simplex tableau corresponding to the basic variables form an identity
matrix (when ordered appropriately). Thus, the resulting tableau is a simplex tableau cor-
responding to the new feasible basic solution, and we can continue to the next iteration.
(Explain why the columns with indices in BI \ {BIβ }, which are unit vectors, remain
unchanged.)
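The two elementary row operations can be bundled into a small pivot routine. The following sketch (Python/NumPy; the function name is ours) applies the iteration-1 pivot of Model Dovetail:

```python
import numpy as np

def pivot(T, beta, col):
    """In-place Gaussian elimination step on tableau T: scale pivot row
    `beta` so the pivot entry becomes 1, then subtract multiples of it
    from every other row (including row 0) to clear column `col`."""
    T[beta] /= T[beta, col]
    for k in range(T.shape[0]):
        if k != beta:
            T[k] -= T[k, col] * T[beta]

# Initial Dovetail tableau; pivot entry: row 2 (x4-row), column 0 (x1).
T = np.array([
    [3., 2, 0, 0, 0, 0,  0],
    [1,  1, 1, 0, 0, 0,  9],
    [3,  1, 0, 1, 0, 0, 18],
    [1,  0, 0, 0, 1, 0,  7],
    [0,  1, 0, 0, 0, 1,  6],
])
pivot(T, 2, 0)
print(T[0].tolist())   # [0.0, 1.0, 0.0, -1.0, 0.0, 0.0, -18.0]
```

Row 0 is treated like any other row, exactly as prescribed above.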
We illustrate the tableau-version of the simplex algorithm in the following example.
Example 3.4.2. We calculate the optimal solution of Model Dovetail by means of simplex
tableaus.
▶ Initialization. At the initial feasible basic solution, we have that BI = {3, 4, 5, 6}, and
NI = {1, 2}. The corresponding initial simplex tableau reads as follows:
x1 x2 x3 x4 x5 x6 −z
3 2 0 0 0 0 0
1 1 1 0 0 0 9 x3
3 1 0 1 0 0 18 x4
1 0 0 0 1 0 7 x5
0 1 0 0 0 1 6 x6
The columns corresponding to the basic variables form the identity matrix, and the current objective
coefficients of the basic variables are 0. So this tableau is indeed a simplex tableau. The values of the
coefficients in the tableau correspond to the coefficients of (M1). The first row (row 0) of the tableau
states that the objective function is 3x1 + 2x2 , and the current objective value is (−0 =) 0. The
second row (row 1) states that x1 + x2 + x3 = 9, which is exactly the first constraint of (M1).
▶ Iteration 1. There are two nonbasic variables with a positive current objective coefficient, namely
x1 and x2 . We choose x1 as the variable to enter the set of basic variables, so α = 1 (because
NI1 = 1). We leave it to the reader to carry out the calculations when x2 is chosen. The
minimum-ratio test gives:

δ = min{9/1, 18/3, 7/1, ⋆}.

The numerators in the above expression are found in the rightmost column of the current simplex
tableau, and the denominators in the column corresponding to xNIα (= x1 ); the entry ⋆ indicates
that the corresponding row is skipped because its entry in that column is not positive. We obtain that
δ = 18/3 = 6 and, hence, β = 2. So (xBI2 =) x4 is the leaving variable, and the pivot entry is the
entry of the simplex tableau in row 2 and column 1.
     x1    x2    x3    x4    x5    x6    −z
      3     2     0     0     0     0     0
      1     1     1     0     0     0     9    x3
     [3]    1     0     1     0     0    18    x4
      1     0     0     0     1     0     7    x5
      0     1     0     0     0     1     6    x6
The marked entry is the pivot entry. We set BI := ({3, 4, 5, 6} \ {4}) ∪ {1} = {3, 1, 5, 6}
and NI := ({1, 2} \ {1}) ∪ {4} = {2, 4}. The resulting ‘tableau’ is not a simplex tableau
yet; we need to apply Gaussian elimination to obtain one. We perform the following
elementary row operations:
(a) Divide row 2 (corresponding to x4 ) by 3.
(b) Subtract 3 times the (new) row 2 from row 0.
(c) Subtract 1 times the (new) row 2 from row 1.
(d) Subtract 1 times the (new) row 2 from row 3.
(e) Leave row 4 as is.
▶ Iteration 2. The resulting new simplex tableau is:

     x1    x2    x3    x4    x5    x6    −z
      0     1     0    −1     0     0   −18
      0    [⅔]    1    −⅓     0     0     3    x3
      1     ⅓     0     ⅓     0     0     6    x1
      0    −⅓     0    −⅓     1     0     1    x5
      0     1     0     0     0     1     6    x6
The feasible basic solution that corresponds to this simplex tableau satisfies x1 = 6, x2 = 0,
x3 = 3, x4 = 0, x5 = 1, x6 = 6, and the corresponding current objective value is 18. The
entry in row 0 corresponding to x2 is the only positive entry, so NIα = 2 (hence, α = 1).
Therefore, (xNI1 =) x2 is the entering variable. The minimum-ratio test yields:

δ = min{3/⅔, 6/⅓, ⋆, 6/1} = min{4½, 18, ⋆, 6} = 4½.
Hence, β = 1, and therefore (xBIβ =) x3 is the leaving variable. The pivot entry is marked in
the simplex tableau above. After Gaussian elimination, we continue to the next iteration.
     x1    x2    x3    x4    x5    x6    −z
      0     0   −1½    −½     0     0  −22½
      0     1    1½    −½     0     0    4½    x2
      1     0    −½     ½     0     0    4½    x1
      0     0     ½    −½     1     0    2½    x5
      0     0   −1½     ½     0     1    1½    x6
This tableau corresponds to an optimal feasible basic solution, because all objective coefficients are
nonpositive. Hence, x1* = 4½, x2* = 4½, x3* = x4* = 0, x5* = 2½, x6* = 1½, and z* = 22½.
and an initial feasible basic solution for the LO-model with slack variables
(see Section 3.6).
Output: Either an optimal solution of the model, or the message ‘the model is
unbounded’.
▶ Step 1: Initialization. Let BI = {BI1 , . . . , BIm } and NI = {NI1 , . . . , NIn }
be the sets of the indices of the basic and nonbasic variables corresponding
to the initial feasible basic solution. Use Gaussian elimination to ensure
that the tableau is in the correct form, i.e., the objective coefficients cor-
responding to the basic variables are zero, and the columns corresponding
to the basic variables form a (row-permuted) identity matrix.
▶ Step 2: Optimality test; choosing an entering variable. The entries in row 0 with indices
in NI contain the current objective coefficients for the nonbasic variables
(i.e., they form the vector cᵀNI − cᵀBI B⁻¹N). If each entry is nonpositive,
then stop: an optimal solution has been reached. Otherwise, select a pos-
itive entry, say with index NIα (α ∈ {1, . . . , n}). Then, xNIα is the
entering variable. Go to Step 3.
▶ Step 3: Minimum-ratio test; choosing a leaving variable. If the entries of column NIα
of the tableau are all nonpositive, then stop with the message ‘the model is
unbounded’. Otherwise, determine an index β ∈ {1, . . . , m} such that:

(B⁻¹b)β / (B⁻¹N)β,α = min { (B⁻¹b)j / (B⁻¹N)j,α | (B⁻¹N)j,α > 0, j = 1, . . . , m }.
order of the subscripts of the corresponding basic variables. We therefore refer to such a
basis matrix as an ordered basis matrix. The actual simplex tableau can be found by arrang-
ing the columns of the basis matrix in the order BI1 , . . . , BIm . Recall that the indices
BI1 , . . . , BIm are not necessarily in increasing order in the corresponding simplex tableau.
Hence, the ordering of the columns is in general different from the ordering of the columns
in an ordered basis matrix, and we therefore refer to this matrix as an unordered basis matrix.
That is, an unordered basis matrix has the form:

B′ = [ (A Im)⋆,BI1  · · ·  (A Im)⋆,BIm ],

where BI1 , . . . , BIm are not necessarily in increasing order. Row 1 through row m of the
simplex tableau are then given by:

[ (B′)⁻¹(A Im)   (B′)⁻¹b ].

The reader may check that this matrix is equal to [ B⁻¹(A Im)   B⁻¹b ] up to a row permutation. Note that the ordering of the columns of the basis matrix only affects the ordering of
the rows of the simplex tableau; it does not affect the feasible basic solution corresponding
to it.
Example 3.4.3. Consider the simplex tableau in iteration 2 of Example 3.4.2. For this simplex
tableau, we have that BI1 = 3, BI2 = 1, BI3 = 5, and BI4 = 6. The unordered basis matrix
corresponding to this tableau is:

         x3   x1   x5   x6
          1    1    0    0
B′ =      0    3    0    0
          0    1    1    0
          0    0    0    1
Compare this to the (ordered) basis matrix B of Example 3.2.1. The reader may verify that:

                    x1    x2    x3    x4    x5    x6
                     0     ⅔     1    −⅓     0     0
(B′)⁻¹(A Im) =       1     ⅓     0     ⅓     0     0
                     0    −⅓     0    −⅓     1     0
                     0     1     0     0     0     1

and (B′)⁻¹b = (3, 6, 1, 6)ᵀ.
This is exactly the order in which the coefficients appear in the simplex tableau of iteration 2 of Example
3.4.2.
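The identity claimed in Example 3.4.3 can be verified numerically. The sketch below (Python/NumPy; matrices are those of Model Dovetail) builds B′ from the columns of (A Im) in the order BI = (3, 1, 5, 6) and recomputes rows 1 through 4 of the tableau:

```python
import numpy as np

A = np.array([[1., 1.], [3., 1.], [1., 0.], [0., 1.]])
b = np.array([9., 18., 7., 6.])
AI = np.hstack([A, np.eye(4)])             # columns x1, ..., x6

BI = [3, 1, 5, 6]                          # order of Example 3.4.3
Bprime = AI[:, [i - 1 for i in BI]]        # unordered basis matrix B'

# Rows 1-4 of the simplex tableau: (B')^{-1} [A I | b].
rows = np.linalg.solve(Bprime, np.column_stack([AI, b]))
print(np.round(rows, 3))
```

The result matches rows 1 through 4 of the iteration-2 tableau, in the row order x3, x1, x5, x6.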
Recall that, in each iteration of the simplex algorithm, exactly one of the values of BI1 , . . .,
BIm (namely, BIβ ) is changed. In a computer implementation, it is therefore more efficient
to store BI as an (ordered) vector rather than an (unordered) set. When storing BI as a
vector, in each iteration, only the β'th entry needs to be updated, i.e., we set BIβ := NIα.
Similarly, it is more efficient to work with unordered basis matrices rather than the usual
ordered basis matrices. When working with ordered basis matrices, in each iteration, the
new basis matrix needs to be constructed from scratch by taking the columns of A Im
with indices in BI . In contrast, when working with unordered basis matrices, obtaining
the new basis matrix after pivoting is a matter of replacing the β ’th column of the current
unordered basis matrix. This is one of the key ideas behind the revised simplex algorithm,
to be discussed in Section 3.9.
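In code, the unordered-basis update is a single column replacement. A minimal sketch (Python/NumPy; variable names are ours):

```python
import numpy as np

AI = np.hstack([np.array([[1., 1.], [3., 1.], [1., 0.], [0., 1.]]),
                np.eye(4)])                # Model Dovetail, columns x1..x6
BI = [3, 4, 5, 6]                          # initial (all-slack) basis
Bprime = AI[:, [i - 1 for i in BI]]        # fancy indexing returns a copy

# Iteration 1 of Example 3.4.2: x1 enters, the variable of row beta = 2
# (that is, x4) leaves.  Only one entry of BI and one column of B' change.
beta, entering = 2, 1
BI[beta - 1] = entering
Bprime[:, beta - 1] = AI[:, entering - 1]

print(BI)   # [3, 1, 5, 6]
```

Rebuilding the ordered basis matrix from scratch would require gathering all m columns each iteration; the one-column update above is the cheap alternative that the revised simplex algorithm exploits.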
Figure 3.5: Simplex adjacency graph corresponding to Model Dovetail. Vertex v2 is optimal.
Figure 3.6: The feasible region of model (3.10).
Figure 3.7: Simplex adjacency graph corresponding to model (3.10). Vertex v3 is optimal.
max x3 (3.10)
s.t. x1 + x2 + x3 ≤ 1
x1 ≤ 1
x1 , x2 , x3 ≥ 0.
The feasible region of the model is depicted in Figure 3.6. Notice that vertex v1 is degenerate. The
simplex adjacency graph for model (3.10) is depicted in Figure 3.7. The nodes with BI = {1, 2},
BI = {1, 3}, BI = {1, 4}, and BI = {1, 5} correspond to vertex v1 of the feasible region.
This vertex actually corresponds to four feasible basis index sets; any two of these index sets differ in
precisely one component. Let

H1 = { (x1, x2, x3)ᵀ | x1 + x2 + x3 ≤ 1 },
H2 = { (x1, x2, x3)ᵀ | x1 ≤ 1 },
H3 = { (x1, x2, x3)ᵀ | −x1 ≤ 0 },
H4 = { (x1, x2, x3)ᵀ | −x2 ≤ 0 },
H5 = { (x1, x2, x3)ᵀ | −x3 ≤ 0 }.
           x1   x2   x3   x4   x5   −z
{4, 5}:     1    1    1    1    0    1    x4
            1    0    0    0    1    1    x5

{2, 5}:     1    1    1    1    0    1    x2
            1    0    0    0    1    1    x5

{3, 5}:     1    1    1    1    0    1    x3
            1    0    0    0    1    1    x5

{1, 2}:     1    0    0    0    1    1    x1
            0    1    1    1   −1    0    x2

{1, 3}:     1    0    0    0    1    1    x1
            0    1    1    1   −1    0    x3

{1, 4}:     1    0    0    0    1    1    x1
            0    1    1    1   −1    0    x4

{1, 5}:     0    0    1    0    0    0
            1    1    1    1    0    1    x1
            0   −1   −1   −1    1    0    x5
The different pivot steps between the above tableaus correspond to the arcs in the graph depicted in
Figure 3.7.
Note also that the feasible basic solutions corresponding to the index sets {1, 2}, {1, 3}, {1, 4}, and
{1, 5} are degenerate. This can be seen from the fact that in each of the corresponding simplex tableaus,
one of the right-hand side values in the tableau is 0.
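The four degenerate bases of vertex v1 can be found by brute force. The sketch below (Python/NumPy) enumerates all index pairs for model (3.10) and flags the feasible basic solutions that have a zero basic variable:

```python
import numpy as np
from itertools import combinations

# Model (3.10): max x3 s.t. x1 + x2 + x3 <= 1, x1 <= 1, with slack
# variables x4 and x5.  Columns of [A I] in variable order x1..x5.
AI = np.array([[1., 1., 1., 1., 0.],
               [1., 0., 0., 0., 1.]])
b = np.array([1., 1.])

feasible, degenerate = [], []
for BI in combinations(range(1, 6), 2):
    B = AI[:, [i - 1 for i in BI]]
    if abs(np.linalg.det(B)) < 1e-9:
        continue                          # not a basis matrix
    xB = np.linalg.solve(B, b)
    if np.all(xB >= -1e-9):
        feasible.append(BI)
        if np.any(np.abs(xB) < 1e-9):     # a basic variable equals 0
            degenerate.append(BI)

print(degenerate)   # [(1, 2), (1, 3), (1, 4), (1, 5)]: the four bases of v1
```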
Theorem 3.5.1.
Consider the LO-model max{ cᵀx | Ax ≤ b, x ≥ 0 }. Let B be a basis matrix in
[A Im] corresponding to the feasible basic solution (x̂ᵀ x̂sᵀ)ᵀ with x̂BI =
B⁻¹b ≥ 0 and x̂NI = 0. If cᵀNI − cᵀBI B⁻¹N ≤ 0, then x̂ is an optimal vertex of
the LO-model.
Proof. To prove the theorem, it suffices to prove two facts: (1) x̂ is a feasible solution of (3.2),
and (2) no feasible solution x0 of (3.2) satisfies cT x0 > cT x̂.
Fact (1) is easily checked by substituting xBI = B⁻¹b and xNI = 0 into (3.6). Fact (2) can
be seen as follows. Let x′ be any feasible solution of (3.2). Then, in particular, x′BI ≥ 0 and
x′NI ≥ 0. Hence, by (3.5), we have that:

cᵀx′ = cᵀBI B⁻¹b + (cᵀNI − cᵀBI B⁻¹N) x′NI ≤ cᵀBI B⁻¹b = cᵀx̂,
which proves the theorem.
Theorem 3.5.1 states that a basic solution with respect to the basis matrix B is optimal if

B⁻¹b ≥ 0, and cᵀNI − cᵀBI B⁻¹N ≤ 0.
The first condition is called the feasibility condition, and the second condition is called the
optimality condition.
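Both conditions are mechanical to check for a given basis matrix. A sketch for Model Dovetail (Python/NumPy; the helper name is hypothetical):

```python
import numpy as np

# Model Dovetail: max 3x1 + 2x2 subject to the four constraints below.
AI = np.hstack([np.array([[1., 1.], [3., 1.], [1., 0.], [0., 1.]]),
                np.eye(4)])
b = np.array([9., 18., 7., 6.])
c = np.array([3., 2., 0., 0., 0., 0.])

def is_optimal_basis(BI):
    """Check the feasibility condition B^{-1}b >= 0 and the optimality
    condition c_NI - c_BI B^{-1}N <= 0 for the basis with indices BI."""
    NI = [i for i in range(1, 7) if i not in BI]
    B, N = AI[:, [i - 1 for i in BI]], AI[:, [i - 1 for i in NI]]
    xB = np.linalg.solve(B, b)
    red = c[[i - 1 for i in NI]] - c[[i - 1 for i in BI]] @ np.linalg.solve(B, N)
    return bool(np.all(xB >= -1e-9) and np.all(red <= 1e-9))

print(is_optimal_basis([3, 4, 5, 6]))   # False: the initial all-slack basis
print(is_optimal_basis([2, 1, 5, 6]))   # True: the final basis of Example 3.4.2
```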
Figure 3.8: The feasible region of model (3.11).
Figure 3.9: Simplex adjacency graph corresponding to model (3.11). Vertex v2 is optimal.
Recall that, in the case of degeneracy, there are multiple feasible basic solutions that corre-
spond to the same optimal vertex. The simplex algorithm looks for a feasible basic solution
for which the objective coefficients are nonpositive. A question is: is it possible that a feasible
basic solution corresponds to an optimal vertex, while one or more of the corresponding
objective coefficients are positive? If the simplex algorithm encounters such a feasible basic
solution, it will not conclude that it has found an optimal solution, and it will continue with
the next iteration instead. This is in fact a real possibility.
We say that the feasible basic solution corresponding to basis matrix B is an optimal feasible
basic solution if cᵀNI − cᵀBI B⁻¹N ≤ 0. The matrix B is then called an optimal basis matrix.
An optimal vertex may correspond to some optimal feasible basic solutions as well as some
nonoptimal feasible basic solutions. The following example shows that it may happen that
not all feasible basic solutions that correspond to an optimal vertex are also optimal feasible
basic solutions.
Example 3.5.2. Consider the following LO-model.

     x1   x2   x3   x4   x5   −z
      0    0    0    1   −3   −5
      1    0    0   −1    1    1    x1
      0    1    0    1    0    1    x2
      0    0    1    1   −1    0    x3

     x1   x2   x3   x4   x5   −z
      0    0   −1    0   −2   −5
      1    0    1    0    0    1    x1
      0    1   −1    0    1    1    x2
      0    0    1    1   −1    0    x4

     x1   x2   x3   x4   x5   −z
      0    0   −3   −2    0   −5
      1    0    1    0    0    1    x1
      0    1    0    1    0    1    x2
      0    0   −1   −1    1    0    x5
Informally, when looking at the feasible region from the perspective of the feasible basic solution with
BI = {1, 2, 3} (see Section 3.2), the lines x4 = 0 and x5 = 0, intersecting at v2 , become the
axes. During an iteration step, the simplex algorithm ‘tries’ to move away from v2 , along the line
x5 = 0, in the direction of the point (2, 0)ᵀ, because doing so seemingly increases the objective value
(because of the angle between the line x5 = 0 and the level lines of the objective function). However,
when doing so, the line x3 = 0 is immediately hit, and so in fact a new feasible basic solution is reached
without moving to a new vertex. Clearly, the feasible basic solution corresponding to BI = {1, 2, 3}
is not optimal.
It is left to the reader to draw the feasible region of (3.11) from the perspective of the feasible basic
solution with BI = {1, 2, 3}, i.e., the feasible region of the equivalent model
This discussion begs the following question. The simplex algorithm terminates when it
either finds an optimal feasible basic solution, or it determines that the LO-model has an
unbounded solution. If the LO-model has an optimal solution, how can we be sure that an
optimal feasible basic solution even exists? We know from Theorem 2.1.5 that if the given
LO-model has an optimal solution, then it has an optimal vertex. But could it happen that
none of the feasible basic solutions corresponding to some optimal vertex are optimal? If
that is the case, then it is conceivable that the simplex algorithm cycles between the various
feasible basic solutions corresponding to that optimal vertex, without ever terminating.
As we will see in Section 3.5.5, this ‘cycling’ is a real possibility. The good news, however,
is that there are several ways to circumvent this cycling behavior. Once this issue has been
settled, we will be able to deduce that, in fact, if a standard LO-model has an optimal vertex,
then there is at least one optimal feasible basic solution.
3.5.3 Unboundedness
In step 3, the simplex algorithm (see Section 3.3) tries to determine a leaving variable xBIβ.
The algorithm terminates with the message ‘the model is unbounded’ if no such xBIβ can
be determined. The justification for this conclusion is the following theorem.
Theorem 3.5.2.
Assume that there exists an index α ∈ {1, . . . , n} that satisfies (cᵀNI − cᵀBI B⁻¹N)α > 0
and (B⁻¹N)⋆,α ≤ 0. Then, the halfline L defined by:

L = { x ∈ Rⁿ⁺ᵐ | (x, xs) ≡ (xBI, xNI) = (B⁻¹b, 0) + λ(−(B⁻¹N)⋆,α, eα), λ ≥ 0 }

is contained in the feasible region and the model is unbounded. In the above expression,
0 ∈ Rⁿ and eα is the α'th unit vector in Rⁿ.
Proof of Theorem 3.5.2. Let α be as in the statement of the theorem. For every λ ≥ 0,
define x(λ) ≡ (xBI(λ), xNI(λ)) by xBI(λ) = B⁻¹b − λ(B⁻¹N)⋆,α and xNI(λ) = λeα. Note that
(xNI(λ))α = (λeα)α implies that xNIα(λ) = λ.
Let λ ≥ 0. Since xBI = B⁻¹b ≥ 0 and (B⁻¹N)⋆,α ≤ 0, it follows that xBI(λ) ≥ 0 and
xNI(λ) ≥ 0. In order to show that x(λ) lies in the feasible region, we additionally need to
prove that BxBI(λ) + NxNI(λ) = b. This equation holds because

xBI(λ) + B⁻¹N xNI(λ) = B⁻¹b − λ(B⁻¹N)⋆,α + λ(B⁻¹N)eα = B⁻¹b.
Hence, L lies in the feasible region. To see that the model is unbounded, note that

cᵀx(λ) = cᵀBI xBI(λ) + cᵀNI xNI(λ) = cᵀBI B⁻¹b − λcᵀBI(B⁻¹N)⋆,α + λcᵀNI eα
       = cᵀBI B⁻¹b + λ(cᵀNI − cᵀBI B⁻¹N)α.

Thus, since (cᵀNI − cᵀBI B⁻¹N)α > 0, the objective value grows unboundedly along the halfline L.
max   x2
s.t.   x1 −  x2 ≤ 1
     −3x1 + 2x2 ≤ 1
       x1 , x2 ≥ 0.
After one iteration of the simplex algorithm, the following simplex tableau occurs:
     x1    x2    x3    x4    −z
     1½     0     0    −½    −½
     −½     0     1     ½    1½    x3
    −1½     1     0     ½     ½    x2
The current objective coefficient of the nonbasic variable x1 is positive, so α = 1 (NI1 = 1). The
minimum-ratio test does not give a value for β, because the entries of the column corresponding to x1
are all negative. By Theorem 3.5.2, the halfline
L = { x ∈ R⁴ | x ≡ (xBI, xNI) = (x3, x2, x1, x4)ᵀ = (1½, ½, 0, 0)ᵀ + λ(½, 1½, 1, 0)ᵀ, λ ≥ 0 }

lies in the feasible region, and the objective function takes on arbitrarily large values on L. In terms of
the model without slack variables, this is the halfline

{ x ∈ R² | x = (x1, x2)ᵀ = (0, ½)ᵀ + λ(1, 1½)ᵀ, λ ≥ 0 }.
positive current objective coefficient may seem a good idea, because it allows the algorithm
to make the most ‘progress’ towards an optimal solution. However, when this rule is adopted,
the simplex algorithm does not necessarily take the shortest path (in terms of the number
of iterations) to the optimal vertex. In fact, we will see in Section 3.5.5 that, in the case of
degeneracy, the pivot rule of choosing the most positive current objective coefficient may
lead to cycling. Fortunately, there exists a pivot rule that avoids cycling; see Section 3.5.7.
(b) If there is a tie in the minimum-ratio test, then the ‘minimum-ratio row’ with the smallest index
is chosen.
Introducing the slack variables x5 and x6 , the model becomes:
     x1    x2    x3    x4    x5    x6    −z
     10   −57    −9   −24     0     0     0
      ½   −5½   −2½     9     1     0     0    x5
      ½   −1½    −½     1     0     1     0    x6
     x1    x2    x3    x4    x5    x6    −z
      0    53    41  −204   −20     0     0
      1   −11    −5    18     2     0     0    x1
      0     4     2    −8    −1     1     0    x6
     x1    x2    x3    x4    x5    x6    −z
      0     0   14½   −98   −6¾  −13¼     0
      1     0     ½    −4    −¾    2¾     0    x1
      0     1     ½    −2    −¼     ¼     0    x2
     x1    x2    x3    x4    x5    x6    −z
    −29     0     0    18    15   −93     0
      2     0     1    −8   −1½    5½     0    x3
     −1     1     0     2     ½   −2½     0    x2
     x1    x2    x3    x4    x5    x6    −z
    −20    −9     0     0   10½  −70½     0
     −2     4     1     0     ½   −4½     0    x3
     −½     ½     0     1     ¼   −1¼     0    x4
     x1    x2    x3    x4    x5    x6    −z
     22   −93   −21     0     0    24     0
     −4     8     2     0     1    −9     0    x5
      ½   −1½    −½     1     0     1     0    x4
     x1    x2    x3    x4    x5    x6    −z
     10   −57    −9   −24     0     0     0
      ½   −5½   −2½     9     1     0     0    x5
      ½   −1½    −½     1     0     1     0    x6
Thus, after six iterations, we see the initial tableau again, so that the simplex algorithm has made a
cycle.
Figure 3.10 shows the simplex adjacency graph of the above LO-model. The graph has fifteen nodes,
each labeled with a set BI of basic variable indices. These fifteen nodes represent all feasible basic
solutions. They were calculated by simply checking all (6 choose 2 =) 15 possible combinations of indices.
The cycle corresponding to the above simplex tableaus is depicted with thick arcs in Figure 3.10.
The LO-model of the example above has an unbounded solution. However, the fact that
the simplex algorithm makes a cycle has nothing to do with this fact. In Exercise 3.10.4,
the reader is asked to show that when adding the constraint x1 ≤ 0 to the model, the
simplex algorithm still makes a cycle (when using the same pivot rules), although the model
has a (finite) optimal solution. Also note that, in the LO-model of the example above, the
simplex algorithm cycles in a nonoptimal vertex of the feasible region. It may happen that
the algorithm cycles through feasible basic solutions that correspond to an optimal vertex, so
that in fact an optimal solution has been found, but the simplex algorithm fails to conclude
that the solution is in fact optimal; see Exercise 3.10.4.
We have seen that cycling is caused by the fact that feasible basic solutions may be degenerate.
However, not all degenerate vertices cause the simplex algorithm to cycle; see Exercise
3.10.2.
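The six-step cycle can be reproduced in a few lines. The sketch below (Python/NumPy) applies the pivot rules of the example: the variable with the most positive objective coefficient enters, and ratio ties are broken by the smallest row index:

```python
import numpy as np

# Initial tableau of Chvatal's cycling example (all right-hand sides 0).
T0 = np.array([
    [10., -57,  -9,  -24, 0, 0, 0],
    [0.5, -5.5, -2.5,  9, 1, 0, 0],
    [0.5, -1.5, -0.5,  1, 0, 1, 0],
])
T, BI, seen = T0.copy(), [5, 6], []
for step in range(6):
    seen.append(tuple(BI))
    col = int(np.argmax(T[0, :-1]))                    # entering variable
    rows = [j for j in (1, 2) if T[j, col] > 1e-9]
    beta = min(rows, key=lambda j: (T[j, -1] / T[j, col], j))  # tie: low row
    BI[beta - 1] = col + 1
    T[beta] /= T[beta, col]
    for k in range(3):
        if k != beta:
            T[k] -= T[k, col] * T[beta]

print(seen)                            # [(5, 6), (1, 6), (1, 2), (3, 2), (3, 4), (5, 4)]
print(tuple(BI), np.allclose(T, T0))   # (5, 6) True: back where we started
```

All six visited bases are distinct, yet after six pivots both the basis and the tableau are exactly the starting ones, which is the cycle marked with thick arcs in Figure 3.10.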
Figure 3.10: Simplex adjacency graph of Chvátal’s example. The cycle corresponding to the simplex
tableaus is indicated with thick arcs.
max   c1 x1  + . . . +  cn xn
s.t.  a11 x1 + . . . + a1n xn ≤ b1 + ε1
      a21 x1 + . . . + a2n xn ≤ b2 + ε2                (3.12)
        ..                ..       ..
      am1 x1 + . . . + amn xn ≤ bm + εm
      x1 , . . . , xn ≥ 0,
or, in matrix notation:

max { cᵀx | Ax ≤ b + ε },

where ε = (ε1, . . . , εm)ᵀ. Informally, this causes the hyperplanes defined by the constraints
to ‘move’ by a little bit, not enough to significantly change the feasible region of the model,
but just enough to make sure that no more than n hyperplanes intersect in the same point.
Theorem 3.5.3.
For a small enough value of ε > 0, the perturbed model (3.12) has no degenerate
vertices.
Proof of Theorem 3.5.3. Before we prove the theorem, we prove the technical statement (?)
below.

(?) Let f(ε) = a0 + a1ε + a2ε² + . . . + amεᵐ, where a0 , a1 , . . . , am are real numbers that are
not all equal to zero. There exists δ > 0 such that f(ε) ≠ 0 for all ε ∈ (0, δ).

To see why (?) holds, note that f is a nonzero polynomial of degree at most m, and hence
has at most m zeros; choosing δ smaller than the smallest positive zero of f (or δ = 1 if f
has no positive zeros) gives f(ε) ≠ 0 for all ε ∈ (0, δ).
We may now continue the proof of the theorem. Consider any basic solution (feasible or
infeasible), and let B be the corresponding basis matrix. Take any k ∈ {1, . . . , m}. Define:
f(ε) = (B⁻¹)k,⋆ (b + ε) = (B⁻¹)k,⋆ b + (B⁻¹)k,1 ε1 + . . . + (B⁻¹)k,m εm .        (3.13)

Since B is nonsingular, it follows that (B⁻¹)k,⋆ is not the all-zero vector. It follows from
(?) applied to (3.13) that there exists δ > 0 such that (B⁻¹)k,⋆ (b + ε) ≠ 0 for all ε ∈ (0, δ).
This argument gives a strictly positive value of δ for every choice of the basis matrix B and
every choice of k ∈ {1, . . . , m}. Since there are only finitely many choices of B and k, we may
choose the smallest value of δ, say δ*. Note that δ* > 0. It follows that (B⁻¹)k,⋆ (b + ε) ≠ 0
for every basis matrix B and for every ε ∈ (0, δ*).
     x1    x2    x3    x4    x5    x6    −z
     10   −57    −9   −24     0     0     0
      ½   −5½   −2½     9     1     0     ε     x5
      ½   −1½    −½     1     0     1     ε²    x6
▶ Iteration 1. The variable x1 is the only candidate for entering the set of basic variables. Since
2ε² < 2ε, it follows that x6 has to leave the set of basic variables. Hence, BI := {1, 5}.
     x1    x2    x3    x4    x5    x6    −z
      0   −27     1   −44     0   −20   −20ε²
      0    −4    −2     8     1    −1    ε−ε²   x5
      1    −3    −1     2     0     2    2ε²    x1
Note that applying the perturbation method may render feasible basic solutions of the orig-
inal LO-model infeasible. In fact, if improperly applied, the perturbation method may turn
a feasible LO-model into an infeasible one. In Exercise 3.10.5, the reader is asked to explore
such situations.
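The effect of the perturbation on the degenerate ratio test of Chvátal's example can be seen in one line: when x1 enters, both candidate rows have pivot entry ½, so the perturbed ratios are 2ε and 2ε², and the tie disappears for every 0 < ε < 1. A minimal sketch:

```python
# Perturbed right-hand sides b = (eps, eps^2) of Chvatal's example.
# Unperturbed, both ratios equal 0 (a tie); perturbed, row 2 attains the
# minimum uniquely, because eps^2 < eps for every 0 < eps < 1.
for eps in (0.1, 0.001, 1e-9):
    ratios = [eps / 0.5, eps**2 / 0.5]   # rows 1 and 2 of the ratio test
    assert ratios[1] < ratios[0]         # x6 (row 2) leaves, as in Iteration 1

print("tie broken for every tested eps")
```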
² Named after the American mathematician and operations researcher Robert G. Bland (born 1948).
     x1    x2    x3    x4    x5    x6    −z
     10   −57    −9   −24     0     0     0
      ½   −5½   −2½     9     1     0     0    x5
      ½   −1½    −½     1     0     1     0    x6

     x1    x2    x3    x4    x5    x6    −z
      0    53    41  −204   −20     0     0
      1   −11    −5    18     2     0     0    x1
      0     4     2    −8    −1     1     0    x6
     x1    x2    x3    x4    x5    x6    −z
      0     0   14½   −98   −6¾  −13¼     0
      1     0     ½    −4    −¾    2¾     0    x1
      0     1     ½    −2    −¼     ¼     0    x2

     x1    x2    x3    x4    x5    x6    −z
    −29     0     0    18    15   −93     0
      2     0     1    −8   −1½    5½     0    x3
     −1     1     0     2     ½   −2½     0    x2
     x1    x2    x3    x4    x5    x6    −z
    −20    −9     0     0   10½  −70½     0
     −2     4     1     0     ½   −4½     0    x3
     −½     ½     0     1     ¼   −1¼     0    x4

     x1    x2    x3    x4    x5    x6    −z
     22   −93   −21     0     0    24     0
     −4     8     2     0     1    −9     0    x5
      ½   −1½    −½     1     0     1     0    x4
     x1    x2    x3    x4    x5    x6    −z
      0   −27     1   −44     0   −20     0
      0    −4    −2     8     1    −1     0    x5
      1    −3    −1     2     0     2     0    x1
At this point, we conclude that no cycling has occurred, and that the model is unbounded.
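Bland's smallest-index rule is a small change to the toy tableau code: the entering variable is the lowest-index variable with a positive objective coefficient, and ratio ties are broken by the lowest-index basic variable. The sketch below (Python/NumPy) runs it on Chvátal's example; no basis is visited twice and unboundedness is detected:

```python
import numpy as np

T = np.array([
    [10., -57,  -9,  -24, 0, 0, 0],
    [0.5, -5.5, -2.5,  9, 1, 0, 0],
    [0.5, -1.5, -0.5,  1, 0, 1, 0],
])
BI, seen, status = [5, 6], set(), None
while status is None:
    assert tuple(BI) not in seen          # Bland's rule never revisits a basis
    seen.add(tuple(BI))
    pos = [j for j in range(6) if T[0, j] > 1e-9]
    if not pos:
        status = "optimal"
        continue
    col = min(pos)                        # smallest-index entering variable
    rows = [j for j in (1, 2) if T[j, col] > 1e-9]
    if not rows:
        status = "unbounded"
        continue
    best = min(T[j, -1] / T[j, col] for j in rows)
    beta = min((j for j in rows if T[j, -1] / T[j, col] == best),
               key=lambda j: BI[j - 1])   # smallest-index leaving variable
    BI[beta - 1] = col + 1
    T[beta] /= T[beta, col]
    for k in range(3):
        if k != beta:
            T[k] -= T[k, col] * T[beta]

print(status, len(seen))   # unbounded 7: seven distinct bases, then stop
```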
initial feasible basic solution. Then, the simplex algorithm (Algorithm 3.3.1), with
an anti-cycling procedure, terminates in finitely many steps and it correctly returns
an optimal feasible basic solution, or determines that the model has an unbounded
solution.
Proof. We showed in Section 3.3.2 that if we start with a feasible basic solution in step 1 of
Algorithm 3.3.1, then the iteration step either terminates, or it produces a new feasible basic
solution. Because an anti-cycling procedure is used, the simplex algorithm does not encounter
the same feasible basic solution twice. Since there are at most (m+n choose m) feasible basic solutions,
this means that the simplex algorithm takes a finite number of iteration steps. Therefore, the
algorithm terminates either in step 2 with an optimal solution, or in step 3 with the message
algorithm terminates either in step 2 with an optimal solution, or in step 3 with the message
‘the model is unbounded’. In the former case, the current feasible basic solution satisfies
cᵀNI − cᵀBI B⁻¹N ≤ 0, and hence it follows from Theorem 3.5.1 that the current feasible basic solution
is indeed optimal. In the latter case, an index α ∈ {1, . . . , n} is found that satisfies
(cᵀNI − cᵀBI B⁻¹N)α > 0 and (B⁻¹N)⋆,α ≤ 0; hence, it follows from Theorem 3.5.2 that the model
is indeed unbounded.
Although Theorem 3.5.4 states that the number of steps taken by the simplex algorithm
is finite, this does not rule out the possibility that the number of steps may be enormous
(exponential). In fact, there are (pathological) LO-models for which the simplex algorithm
goes through all (exponentially many) feasible basic solutions before reaching an optimal
solution; see Chapter 9. It is still an open question whether there exists a pivot rule such
that the simplex algorithm is guaranteed to terminate in a reasonable number of steps for all
possible inputs. On the other hand, despite this theoretical problem, the simplex algorithm
is extremely efficient in practice.
Now that Theorem 3.5.4 has been proved, we can address the question of whether there even
exists a feasible basic solution corresponding to an optimal vertex. The answer is affirmative.
Theorem 3.5.5.
Consider a standard LO-model and suppose that it is feasible and bounded. Then there
exists an optimal feasible basic solution.
Proof. Apply the simplex algorithm to the model, starting from any initial feasible basic
solution (one exists because the model is feasible; see Section 3.6). Because the LO-model does
not have an unbounded solution, it follows from Theorem 3.5.4 that the algorithm returns, in
finitely many iterations, an optimal feasible basic solution. Therefore, the model has an optimal
feasible basic solution. □
3.6 Initialization
In order to start the simplex algorithm, an initial feasible basic solution is needed. Sometimes,
the all-zero solution can be taken. If an initial feasible basic solution is not immediately
apparent, there are several possibilities to find one. In Section 1.2.1, we have formulated
Model Dovetail∗ for which 0 does not lie in the feasible region. The reason is that this
model contains the constraint x1 + x2 ≥ 5, which excludes 0 from the feasible region.
Indeed, for Model Dovetail∗, the choice BI = {3, 4, 5, 6, 7} (i.e., the set of indices of the
slack variables) corresponds to an infeasible basic solution, namely [0 0 9 18 7 6 −5]ᵀ.
This solution cannot serve as an initial feasible basic solution for the simplex algorithm.
We will discuss the so-called big-M procedure and the two-phase procedure for determining an
initial feasible basic solution. If all constraints of the primal LO-model are ‘≥’ constraints
and all right hand side values are positive, then the all-zero vector is certainly not feasible.
In this case it is sometimes profitable to apply the simplex algorithm to the so-called dual
model; see Section 4.6.
Constraints of the form

ai1 x1 + · · · + ain xn ≥ bi with bi > 0,

or of the form

ai1 x1 + · · · + ain xn ≤ bi with bi < 0,

exclude the all-zero vector as an initial feasible basic solution for the simplex algorithm. When
applying the big-M procedure, each such constraint i is augmented, in addition to its slack
variable, with a so-called artificial variable ui, and the objective function is augmented with
−M ui, where M is a big positive real number. For sufficiently big values of M ('big' in relation
to the original input data), the simplex algorithm will put the highest priority on making the
value of the term M ui as small as possible, thereby setting the value of ui equal to zero. We
illustrate the procedure by means of the following example.
Example 3.6.1. Consider the model:
infeasible basic solutions to feasible basic solutions, and arrives – if it exists – at an optimal feasible
basic solution. To reduce the calculations, as soon as an artificial variable leaves the set of basic variables
(and therefore becomes zero), the corresponding column may be deleted. The reader can check that the
optimal solution of (3.15) satisfies x∗1 = 7 1/3, x∗2 = 4 2/3, u∗1 = u∗2 = 0, with optimal objective value
z∗ = 19 1/3. The reader should also check that, because u∗1 = u∗2 = 0, the optimal solution of (3.14)
is obtained by discarding the values of u∗1 and u∗2. That is, it satisfies x∗1 = 7 1/3, x∗2 = 4 2/3, with
optimal objective value z∗ = 19 1/3.
If the simplex algorithm terminates with at least one artificial variable in the optimal 'solution',
then either the original LO-model is infeasible, or the value of M was not chosen
large enough. The following two examples show what happens in the case of infeasibility
of the original model, and in the case when the value of M is not chosen large enough.
Example 3.6.2. Consider again model (3.14), but with the additional constraint x1 ≤ 3. Let
M = 10. The optimal solution of this model is x∗1 = 3, x∗3 = 2 1/2, u∗1 = 1/2, and u∗2 = 0. The
optimal value of u∗1 is nonzero, which is due to the fact that the LO-model is infeasible.
The following example shows that, from the fact that some artificial variable has a nonzero
optimal value, one cannot immediately draw the conclusion that the original LO-model
was infeasible.
Example 3.6.3. Consider again model (3.15), but let M = 1/10. The optimal solution of
(3.15) then is x∗1 = 12, x∗3 = 20, u∗2 = 14, and x∗2 = x∗4 = x∗5 = u∗1 = 0. Clearly,
[x∗1 x∗2]ᵀ = [12 0]ᵀ is not a feasible solution of (3.14) because the constraint −x1 + 2x2 ≥ 2 is
violated.
This example illustrates the fact that, in order to use the big-M procedure, one should
choose a large enough value for M . However, choosing too large a value of M may lead
to computational errors. Computers work with so-called floating-point numbers, which store
only a finite number of digits (to be precise: bits) of a number. Because of this finite precision,
a computer rounds off the results of any intermediate calculations. It may happen that adding
a small number k to a large number M results in the same number M , which may cause
the final result to be inaccurate or plainly incorrect. For example, when using so-called
single-precision floating-point numbers, only between 6 and 9 significant digits (depending
on the exact value of the number) are stored. For instance, if M = 10⁸, then adding any
number k with |k| ≤ 4 to M results in M again. The following example shows how this
phenomenon may have a disastrous effect on the calculations of the simplex algorithm.
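This absorption effect is easy to reproduce. The following sketch uses NumPy's float32 type to mimic single-precision arithmetic and checks the claim for M = 10⁸ (the specific values tested are illustrative, not taken from the book):

```python
import numpy as np

# Single-precision floats near 1e8 are spaced 8 apart (one unit in the last
# place), so adding any k with |k| <= 4 rounds straight back to 1e8 itself.
M = np.float32(1e8)

assert M + np.float32(3) == M   # less than half the spacing: absorbed
assert M + np.float32(4) == M   # exactly half: rounds to the even neighbor, absorbed
assert M + np.float32(5) != M   # more than half the spacing: survives
```

The same experiment with double-precision numbers would require a much larger M, but the qualitative problem remains: a sufficiently large explicit M eventually swallows the original input data.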
Example 3.6.4. Consider again Model Dovetail. We choose M = 10⁸ and use single-precision
floating-point numbers. It turns out that, given these particular choices, M + 2 and M + 3 both
result in M. This means that the initial simplex tableau after Gaussian elimination becomes:
x1 x2 x3 x4 x5 x6 x7 u1 −z
M M 0 0 0 0 0 −M 5M
1 1 1 0 0 0 0 0 9
3 1 0 1 0 0 0 0 18
1 0 0 0 1 0 0 0 7
0 1 0 0 0 1 0 0 6
1 1 0 0 0 0 −1 1 5
Similarly, 5M − 15 results in 5M . Hence, performing one pivot step leads to the tableau:
x1 x2 x3 x4 x5 x6 x7 u1 −z
0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 1 −1 4
0 −2 0 1 0 0 3 −3 3
0 −1 0 0 1 0 1 −1 2
0 1 0 0 0 1 0 0 6
1 1 0 0 0 0 −1 1 5
This 'simplex tableau' states that we have reached an optimal solution. The corresponding solution
satisfies x1 = 5, x2 = 0. Since we know that the unique optimal solution of Model Dovetail satisfies
x∗1 = x∗2 = 4 1/2, the solution we find is clearly not optimal. Moreover, the current objective value, as
given in the top-right entry of the tableau, is 0. But the actual objective value for x1 = 5, x2 = 0 is
15. So, not only have we ended up with a wrong 'optimal vertex', the reported corresponding objective
value is wrong as well.
A better way to apply the big-M procedure is to treat M as a large but unspecified number.
This means that the symbol M remains in the simplex tableaus. Whenever two terms need to
be compared, M is considered as a number that is larger than any number that is encountered
while performing the calculations. So, for example, 2M + 3 > 1.999M + 10000 >
1.999M . We illustrate this procedure by applying it to Model Dovetail∗ ; see Section 1.2.1.
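One way to implement this 'symbolic M' is to represent every quantity aM + b as the pair (a, b) and compare pairs lexicographically; since M dominates any constant, this matches comparing the quantities for all sufficiently large M. A minimal sketch (not taken from the book):

```python
# Represent a*M + b as the pair (a, b).  Python compares tuples
# lexicographically, so the M-coefficient dominates, exactly as required
# when M is treated as an unspecified, arbitrarily large number.
def bigm_gt(p, q):
    return p > q

assert bigm_gt((2, 3), (1.999, 10000))      # 2M + 3     > 1.999M + 10000
assert bigm_gt((1.999, 10000), (1.999, 0))  # 1.999M + 10000 > 1.999M
```

The same pair representation extends to the arithmetic needed during pivoting, since sums and scalar multiples of terms aM + b again have that form.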
Example 3.6.5. Consider Model Dovetail∗ . Besides the slack variables, an artificial variable u1
is added to the model with a large negative objective coefficient −M . So, we obtain the model:
Applying the simplex algorithm to this model, the following iterations occur.
I Initialization. BI = {3, 4, 5, 6, 8}, N I = {1, 2, 7}. We start by putting the coefficients
into the tableau:
x1 x2 x3 x4 x5 x6 x7 u1 −z
3 2 0 0 0 0 0 −M 0
1 1 1 0 0 0 0 0 9
3 1 0 1 0 0 0 0 18
1 0 0 0 1 0 0 0 7
0 1 0 0 0 1 0 0 6
1 1 0 0 0 0 −1 1 5
Recall that in a simplex tableau the objective coefficients corresponding to the basic variables have to
be zero. So we need to apply Gaussian elimination to obtain a simplex tableau. To do so, we add
M times the fifth row to the top row:
x1 x2 x3 x4 x5 x6 x7 u1 −z
3+M 2+M 0 0 0 0 −M 0 5M
1 1 1 0 0 0 0 0 9
3 1 0 1 0 0 0 0 18
1 0 0 0 1 0 0 0 7
0 1 0 0 0 1 0 0 6
1 1 0 0 0 0 −1 1 5
x1 x2 x3 x4 x5 x6 x7 u1 −z
0 −1 0 0 0 0 3 −(3+M ) −15
0 0 1 0 0 0 1 −1 4
0 −2 0 1 0 0 3 −3 3
0 −1 0 0 1 0 1 −1 2
0 1 0 0 0 1 0 0 6
1 1 0 0 0 0 −1 1 5
Since u1 = 0, we have arrived at a feasible basic solution of Model Dovetail∗ . We now delete the
column corresponding to u1 .
x1 x2 x3 x4 x5 x6 x7 −z
0 1 0 −1 0 0 0 −18
0 2/3 1 −1/3 0 0 0 3
0 −2/3 0 1/3 0 0 1 1
0 −1/3 0 −1/3 1 0 0 1
0 1 0 0 0 1 0 6
1 1/3 0 1/3 0 0 0 6

x1 x2 x3 x4 x5 x6 x7 −z
0 0 −3/2 −1/2 0 0 0 −22 1/2
0 1 3/2 −1/2 0 0 0 9/2
0 0 1 0 0 0 1 4
0 0 1/2 −1/2 1 0 0 5/2
0 0 −3/2 1/2 0 1 0 3/2
1 0 −1/2 1/2 0 0 0 9/2
The simplex algorithm has followed the path 0 → v6 → v1 → v2 in Figure 1.5.
max −u1 − u2
s.t. 2x1 − x2 − x3 + u1 = 4
−x1 + 2x2 − x4 + u2 = 2
x1 + x2 + x5 = 12
x1 , x2 , x3 , x4 , x5 , u1 , u2 ≥ 0.
An initial feasible basic solution for the simplex algorithm has x5 , u1 , u2 as the basic variables. It
is left to the reader to carry out the different steps that lead to an optimal solution of this model. It
turns out that, in an optimal solution of this model, it holds that u1 = u2 = 0. Deleting u1 and
u2 from this solution results in a feasible basic solution of the original model.
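The claim that Phase 1 can end with u1 = u2 = 0 is easy to double-check numerically: it suffices to exhibit one point that satisfies the three constraints of the original model with nonnegative slack and surplus values. The sketch below does this with the optimal values quoted earlier, assuming (as the equality system above indicates) that the original constraints are 2x1 − x2 ≥ 4, −x1 + 2x2 ≥ 2, and x1 + x2 ≤ 12:

```python
from fractions import Fraction as F

x1, x2 = F(22, 3), F(14, 3)     # the optimal values 7 1/3 and 4 2/3
x3 = 2 * x1 - x2 - 4            # surplus of  2x1 -  x2 >= 4
x4 = -x1 + 2 * x2 - 2           # surplus of -x1 + 2x2 >= 2
x5 = 12 - x1 - x2               # slack of     x1 +  x2 <= 12

# All slack/surplus values are nonnegative, so (x1, x2) is feasible, and
# Phase 1 can indeed terminate with u1 = u2 = 0 (Phase-1 objective value 0).
assert min(x3, x4, x5) >= 0
```

Exact rational arithmetic (`fractions.Fraction`) is used so that no rounding noise clouds the feasibility check.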
I Phase 2. The original objective function z = 2x1 +x2 is now taken into account, and the simplex
algorithm is carried out with the optimal feasible basic solution from Phase 1 as the initial feasible
basic solution.
If the optimal solution in Phase 1 contains a nonzero artificial variable ui , then the original
model is infeasible. In case all ui ’s are zero in the optimal solution from Phase 1, but one
of the ui ’s is still in the set of basic variables, then a pivot step can be performed that drives
this artificial variable out of the set of basic variables. It is left to the reader to check that
this does not change the objective value.
Let us now look at how the two-phase procedure works when it is applied to Model
Dovetail∗ .
Example 3.6.7. Consider again Model Dovetail∗ .
I Phase 1. Only one artificial variable u1 has to be introduced. We then obtain the model:
max −u1
s.t. x1 + x2 + x3 = 9
3x1 + x2 + x4 = 18
x1 + x5 = 7
x2 + x6 = 6
x1 + x2 − x7 + u1 = 5
x1 , x2 , x3 , x4 , x5 , x6 , x7 , u1 ≥ 0.
Recall that the objective coefficients of the basic variables in simplex tableaus need to be zero.
I Initialization. BI = {3, 4, 5, 6, 8}, N I = {1, 2, 7}.
x1 x2 x3 x4 x5 x6 x7 u1 −z
1 1 0 0 0 0 −1 0 5
1 1 1 0 0 0 0 0 9
3 1 0 1 0 0 0 0 18
1 0 0 0 1 0 0 0 7
0 1 0 0 0 1 0 0 6
1 1 0 0 0 0 −1 1 5
We can now bring either x1 or x2 into the set of basic variables. Suppose that we select x2 . Then,
u1 leaves the set of basic variables.
I Iteration 1. BI = {2, 3, 4, 5, 6}, N I = {1, 7, 8}.
x1 x2 x3 x4 x5 x6 x7 u1 −z
0 0 0 0 0 0 0 −1 0
0 0 1 0 0 0 1 −1 4
2 0 0 1 0 0 1 −1 13
1 0 0 0 1 0 0 0 7
−1 0 0 0 0 1 1 −1 1
1 1 0 0 0 0 −1 1 5
x1 x2 x3 x4 x5 x6 x7 −z
1 0 0 0 0 0 2 −10
0 0 1 0 0 0 1 4
2 0 0 1 0 0 1 13
1 0 0 0 1 0 0 7
−1 0 0 0 0 1 1 1
1 1 0 0 0 0 −1 5
x1 x2 x3 x4 x5 x6 x7 −z
3 0 0 0 0 −2 0 −12
1 0 1 0 0 −1 0 3
3 0 0 1 0 −1 0 12
1 0 0 0 1 0 0 7
−1 0 0 0 0 1 1 1
0 1 0 0 0 1 0 6
x1 x2 x3 x4 x5 x6 x7 −z
0 0 −3 0 0 1 0 −21
1 0 1 0 0 −1 0 3
0 0 −3 1 0 2 0 3
0 0 −1 0 1 1 0 4
0 0 1 0 0 0 1 4
0 1 0 0 0 1 0 6
x1 x2 x3 x4 x5 x6 x7 −z
0 0 − 23 − 12 0 0 0 −22 12
1 0 − 21 1
2
0 0 0 9
2
0 0 − 23 1
2
0 1 0 3
2
1
0 0 2
− 12 1 0 0 5
2
0 0 1 0 0 0 1 4
3
0 1 2
− 12 0 0 0 9
2
Note that this final result is the same as the final tableau of the big-M procedure.
When comparing the big-M procedure with the two-phase procedure, it turns out that, in
practice, they both require roughly the same amount of computing time. However, the paths
from the infeasible 0 to the feasible region can be different for the two methods. For example,
in the case of Model Dovetail∗, the big-M procedure follows the path 0 → v6 → v1 → v2, while
the two-phase procedure follows the path 0 → v5 → v4 → v3 → v2. When we choose to bring
x1 into the set of basic variables in Phase 1 (Iteration 1), the two methods follow the same
path.
3.7 Uniqueness and multiple optimal solutions
Figure 3.11: Multiple optimal solutions. All points on the thick line are optimal.
Theorem 3.7.1.
Let x∗1, . . . , x∗k be optimal solutions of a standard LO-model. Every convex combination of x∗1, . . . , x∗k is also an optimal solution of the model.
Proof. Let x∗1, . . . , x∗k be optimal solutions and let z∗ be the corresponding optimal objective
value. Let x0 be a convex combination of x∗1, . . . , x∗k. This means that there exist λ1, . . . , λk,
with λi ≥ 0 (i = 1, . . . , k) and λ1 + . . . + λk = 1, such that x0 = λ1x∗1 + . . . + λkx∗k. Together
with the linearity of the objective function, this implies that cᵀx0 = λ1cᵀx∗1 + . . . + λkcᵀx∗k =
(λ1 + . . . + λk)z∗ = z∗. Since the feasible region is convex, x0 is a feasible point, and hence x0
is an optimal solution. □
A consequence of Theorem 3.7.1 is that every LO-model has either no solution (infeasible),
one solution (unique solution), or infinitely many solutions (multiple optimal solutions).
Multiple solutions can be detected using the simplex algorithm as follows. Recall that
when the simplex algorithm has found an optimal feasible basic solution of a maximizing
LO-model, then all current objective coefficients are nonpositive. If the current objective
coefficient corresponding to a nonbasic variable xi is strictly negative, then increasing the
value of xi leads to a decrease of the objective value, and hence this cannot lead to another
optimal solution. If, however, the current objective coefficient corresponding to a nonbasic
variable xi is zero, then increasing the value of xi (provided that this is possible) does not
change the objective value and hence leads to another optimal solution. The following
example illustrates this.
Example 3.7.1. Introduce the slack variables x3, x4, x5. After one simplex iteration, model
(3.16) changes into:

max −x4 + 18
s.t. (2/3)x2 + x3 − (1/3)x4 = 3
     x1 + (1/3)x2 + (1/3)x4 = 6
     x2 + x5 = 6
     x1, x2, x3, x4, x5 ≥ 0.
The current basic variables are x1 , x3 , x5 , and the nonbasic variables are x2 and x4 . However, only
the nonbasic variable x4 has a nonzero objective coefficient. The other nonbasic variable x2 has zero
objective coefficient. Taking x2 = x4 = 0, we find that x1 = 6, x3 = 3, and x5 = 6. This
corresponds to vertex v1 of the feasible region. The optimal objective value is z ∗ = 18. Since x2 is
not in the current objective function, we do not need to take x2 = 0. In fact, keeping x4 = 0,
the variable x2 can take on any value for which the basic variables remain nonnegative:

x3 = 3 − (2/3)x2 ≥ 0
x1 = 6 − (1/3)x2 ≥ 0
x5 = 6 − x2 ≥ 0.

These inequalities are equivalent to 0 ≤ x2 ≤ 4 1/2.
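The endpoints of this interval can be checked mechanically. The sketch below substitutes x2 = λ (with x4 = 0) into the equality constraints of the example and verifies feasibility and the unchanged objective value z = 18 at both ends of the segment:

```python
def point(lmbda):
    # keep x4 = 0 and express the basic variables in terms of x2 = lmbda
    x2 = lmbda
    return (6 - x2 / 3, x2, 3 - 2 * x2 / 3, 0.0, 6 - x2)

for lam in (0.0, 4.5):                      # the two endpoints of the segment
    x1, x2, x3, x4, x5 = point(lam)
    assert abs(2 * x2 / 3 + x3 - x4 / 3 - 3) < 1e-9
    assert abs(x1 + x2 / 3 + x4 / 3 - 6) < 1e-9
    assert abs(x2 + x5 - 6) < 1e-9
    assert min(x1, x2, x3, x4, x5) >= 0     # all variables remain nonnegative
    assert -x4 + 18 == 18                   # objective value stays 18
```

At λ = 0 the point is vertex v1; at λ = 4 1/2 the basic variable x3 hits zero, which is where the optimal edge ends.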
The set of optimal solutions may even be unbounded. Of course, this is
only possible if the feasible region is unbounded. See also Section 4.4.
Example 3.7.2. Consider the LO-model:

max x1 − 2x2
s.t. (1/2)x1 − x2 ≤ 2
     −3x1 + x2 ≤ 3
     x1, x2 ≥ 0.
Figure 3.12 shows the feasible region and a level line of the objective function of this model. Introducing
the slack variables x3 and x4 and applying the simplex algorithm, one can easily verify that the optimal
simplex tableau becomes:
x1 x2 x3 x4 −z
0 0 −2 0 −4
1 −2 2 0 4
0 −5 6 1 15
Figure 3.12: The feasible region and a level line of the objective function.
max −2x3 +4
s.t. x1 − 2x2 + 2x3 = 4
−5x2 + 6x3 + x4 = 15
x1 , x2 , x3 , x4 ≥ 0.
The optimal solution satisfies: x∗1 = 4, x∗4 = 15, x∗2 = x∗3 = 0, and z∗ = 4. Note that all
entries in the column corresponding to x2 of the optimal simplex tableau are negative, and that the
objective coefficient of x2 is zero. Therefore, instead of taking x2 = 0, we can take x2 = λ ≥ 0,
while staying feasible and without changing the optimal objective value z∗ = 4. For x2 = λ ≥ 0
and x3 = 0, we find that x1 = 4 + 2λ and x4 = 15 + 5λ. Hence, all vectors

[x1 x2]ᵀ = [4 0]ᵀ + λ[2 1]ᵀ with λ ≥ 0

are optimal. These vectors form a halfline that starts at [4 0]ᵀ and that has direction vector [2 1]ᵀ.
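A quick numerical check of this halfline against the two constraints of Example 3.7.2, namely (1/2)x1 − x2 ≤ 2 and −3x1 + x2 ≤ 3, confirms that every sampled point on it is feasible with objective value 4 (the sample values of λ are arbitrary):

```python
for lam in (0.0, 1.0, 10.0, 1000.0):
    x1, x2 = 4 + 2 * lam, 0 + 1 * lam       # point on the halfline
    assert 0.5 * x1 - x2 <= 2 + 1e-9        # first constraint (it stays binding)
    assert -3 * x1 + x2 <= 3 + 1e-9         # second constraint
    assert x1 >= 0 and x2 >= 0
    assert abs((x1 - 2 * x2) - 4) < 1e-9    # objective value stays 4
```

The first constraint is satisfied with equality for every λ, which reflects the fact that the halfline runs along the boundary of the feasible region.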
T
On the other hand, the simplex tableau corresponding to the vertex 0 3 , i.e., to BI = {2, 3},
reads:
x1 x2 x3 x4 −z
−5 0 0 2 6
−2 21 0 1 1 5
−3 1 0 1 3
where we use the convention that min ∅ = ∞. Then, every point in the set

L = { x ∈ Rⁿ | [x; xs] ≡ [xBI; xNI] = [B⁻¹b; 0] + λ[−(B⁻¹N)⋆,α; eα], 0 ≤ λ ≤ δ }

is an optimal solution of the model. In the above expression, 0 ∈ Rⁿ and eα is the
α'th unit vector in Rⁿ.
It is left to the reader to compare this theorem with the results of Example 3.7.2.
Note that, in the case of degeneracy, the value of δ in the statement of Theorem 3.7.2 may
be zero. In that case, the set L contains just a single point. This does not imply that the
LO-model has a unique solution; see Exercise 3.10.13. If, however, all current objective
coefficients of the nonbasic variables are strictly negative, then the LO-model has a unique
solution.
Note that an LO-model may have a unique optimal feasible basic solution, while having mul-
tiple optimal points. The reader is asked to construct such a model in Exercise 3.10.14.
Proof of Theorem 3.7.3. Let [x̂; x̂s] ≡ [x̂BI; x̂NI] with x̂BI = B⁻¹b ≥ 0 and x̂NI = 0 be
the feasible basic solution with respect to B, and let ẑ be the corresponding objective value.
Consider the model formulation (3.6). Since (cᵀNI − cᵀBI B⁻¹N)j < 0 for all j ∈ {1, . . . , n},
it follows that any solution with xNI ≠ 0 has a corresponding objective value that is strictly
smaller than ẑ. Hence, ẑ is the optimal objective value of the model, and any optimal point
[x∗BI; x∗NI] satisfies x∗NI = 0. Therefore, the constraints of (3.6) imply that x∗BI = B⁻¹b. Thus,
every optimal point of (3.6) satisfies x∗BI = x̂BI and x∗NI = x̂NI. In other words, [x̂BI; x̂NI] is
the unique optimal point of (3.6), and hence x̂ is the unique optimal solution of the standard
LO-model. □
See also (iii) in Section 1.3. Before we explain how this model may be solved using the
simplex algorithm, recall that, in Section 1.3, we described the following transformation
from a model of the form (3.17) to a standard LO-model:

max{ cᵀx | Ax = b, x ≥ 0 } = max{ cᵀx | [A; −A]x ≤ [b; −b], x ≥ 0 }.
The simplex algorithm can certainly be applied directly to this model. However, the disadvantage
of this transformation is that the resulting model has twice as many constraints as
the original model. Moreover, as Theorem 3.8.1 shows, all feasible basic solutions become
degenerate.
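The doubling of constraints is mechanical: each equality aᵢᵀx = bᵢ becomes the pair aᵢᵀx ≤ bᵢ and −aᵢᵀx ≤ −bᵢ. A small sketch of the transformation, using an illustrative 2×3 system:

```python
def to_standard_form(A, b):
    # Ax = b, x >= 0   ->   [A; -A] x <= [b; -b], x >= 0
    A2 = [row[:] for row in A] + [[-a for a in row] for row in A]
    b2 = b[:] + [-v for v in b]
    return A2, b2

A, b = [[1, 2, 2], [30, 10, 20]], [3, 75]
A2, b2 = to_standard_form(A, b)

# a point satisfying Ax = b satisfies every one of the doubled inequalities
x = [2.4, 0.3, 0.0]
assert all(sum(r[j] * x[j] for j in range(3)) <= rhs + 1e-9
           for r, rhs in zip(A2, b2))
```

Note that each such x satisfies all 2m inequalities with equality, which is exactly the source of the degeneracy that Theorem 3.8.1 makes precise.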
Theorem 3.8.1.
All feasible basic solutions of the model max{ cᵀx | [A; −A]x ≤ [b; −b], x ≥ 0 } are
degenerate.
Proof of Theorem 3.8.1. Let Ã = [A Im 0m; −A 0m Im], and take any invertible (2m, 2m)-
submatrix B̃ = [B B′ 0; −B 0 B″] of Ã, with B a submatrix of A, and B′ and B″ submatrices
of Im. Let x′ and x″ be the nonnegative variables corresponding to B′ and B″, respectively.
Moreover, let xBI be the vector of variables corresponding to the column indices of B. Then,
the system of equations

B̃ [xBI; x′; x″] = [b; −b]

is equivalent to BxBI + B′x′ = b, −BxBI + B″x″ = −b. Adding these two expressions
gives B′x′ + B″x″ = 0. Now recall that B′ is a submatrix of the identity matrix Im, and each
column of B′ is a unit vector. Therefore, B′x′ is an m-dimensional vector consisting of the
entries of x′ and a number of zeroes (to be precise, m minus the number of entries in x′). The
same is true for B″x″. Therefore, since x′, x″ ≥ 0, the fact that B′x′ + B″x″ = 0 implies
that x′ = x″ = 0. Because the variables in xBI, x′, and x″ are exactly the basic variables
corresponding to the basis matrix B̃, it follows that the feasible basic solution [xBI; x′; x″] is
indeed degenerate. □
In order to apply the simplex algorithm to models of the form (3.17), we need to understand
the concept of feasible basic solutions for such models. In Section 2.2, we defined feasible
basic solutions for LO-models of the standard form max{ cᵀx | Ax ≤ b, x ≥ 0 }, with
A an (m, n)-matrix. The key component of the definition of a feasible basic solution is the
concept of basis matrix. A basis matrix is any invertible (m, m)-submatrix of the matrix
[A Im]. Recall that the latter matrix originates from the fact that the constraints of the
model with slack variables can be written as [A Im][x; xs] = b.

Consider now an LO-model of the form (3.17). The concept of 'basis matrix' in such
a model is defined in an analogous way: a basis matrix for model (3.17) is any invertible
(m, m)-submatrix of the matrix A. The corresponding basic solution is then [xBI; xNI] with
xBI = B⁻¹b and xNI = 0.
In the case of a model of the form (3.17), it may happen that A contains no basis matrices
at all. This happens when A is not of full row rank, i.e., if rank(A) < m. In that case,
A contains no (m, m)-submatrix, and hence the model has no basic solutions at all. This
problem does not occur in the case of a model with inequality constraints, because the
matrix [A Im] contains the (m, m) identity matrix, which has rank m. (See also Section
B.4.)
In practice one may encounter matrices that do not have full row rank. For example, every
instance of the transportation problem (see Section 8.2.1) has a technology matrix that does
not have full row rank, so that its rows are not linearly independent. In such cases we can
apply Gaussian elimination and delete zero rows until either the remaining matrix has full
row rank, or this matrix augmented with its right hand side values gives rise to an inconsistent
set of equalities. In case of the transportation problem, the deletion of one arbitrary row
already leads to a full row rank matrix; see Section 8.2.1. In general, multiple rows may
need to be deleted.
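The rank computation itself is plain Gaussian elimination. The sketch below (an illustration, not code from the book) builds the constraint matrix of a small transportation problem with two supply and three demand nodes, and confirms that its five rows have rank four, so that deleting one row leaves a full row rank matrix:

```python
def row_rank(mat, tol=1e-9):
    # Gaussian elimination; the rank is the number of pivot rows found.
    m = [row[:] for row in mat]
    rank, rows, cols = 0, len(mat), len(mat[0])
    for c in range(cols):
        piv = next((r for r in range(rank, rows) if abs(m[r][c]) > tol), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        for r in range(rows):
            if r != rank and abs(m[r][c]) > tol:
                f = m[r][c] / m[rank][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Two supply rows and three demand rows of a transportation problem; the
# supply rows and the demand rows both sum to the all-ones vector, so the
# rank is 4, not 5.
A = [[1, 1, 1, 0, 0, 0],
     [0, 0, 0, 1, 1, 1],
     [1, 0, 0, 1, 0, 0],
     [0, 1, 0, 0, 1, 0],
     [0, 0, 1, 0, 0, 1]]

assert row_rank(A) == 4
assert row_rank(A[1:]) == 4   # after deleting one row: full row rank
```

This matches the general fact quoted in Section 8.2.1 that a transportation problem with m supply and n demand nodes has constraint rank m + n − 1.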
The mechanics of the simplex algorithm for models with equality constraints are the same as
in the case of models with inequality constraints. The only caveat is that Gaussian elimination
is required to turn the initial tableau into a simplex tableau. The following example illustrates
this.
Example 3.8.1. Consider the following LO-model:

max 40x1 + 100x2 + 150x3
s.t. x1 + 2x2 + 2x3 = 3
     30x1 + 10x2 + 20x3 = 75
     x1, x2, x3 ≥ 0.

We start by putting the coefficients into the tableau:

x1 x2 x3 −z
40 100 150 0
1 2 2 3
30 10 20 75

This is not a simplex tableau yet, because we have not specified the set of basic variables and the tableau
does not contain a (2, 2) identity matrix. Let us choose BI = {1, 2}. After applying Gaussian
elimination, we find the following tableau:

x1 x2 x3 −z
0 0 54 −126
1 0 2/5 2 2/5
0 1 4/5 3/10
This is indeed a simplex tableau: the columns of the technology matrix corresponding to the basic
variables form an identity matrix, the current objective coefficients of the basic variables are all zero, and
the right hand side values are nonnegative. Since this is a simplex tableau, the simplex algorithm may
be applied to find an optimal feasible basic solution. An optimal solution is in fact found after one
iteration. It is left to the reader to check that the optimal objective value of the model is 146 1/4, and
x∗1 = 2 1/4, x∗2 = 0, x∗3 = 3/8 is an optimal solution.
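The claimed solution is easy to verify by substitution; exact rational arithmetic avoids any rounding issues. This sketch checks feasibility and the objective value (optimality itself follows from the simplex argument above):

```python
from fractions import Fraction as F

x = (F(9, 4), F(0), F(3, 8))   # the claimed optimum (2 1/4, 0, 3/8)

# both equality constraints hold exactly ...
assert x[0] + 2 * x[1] + 2 * x[2] == 3
assert 30 * x[0] + 10 * x[1] + 20 * x[2] == 75
# ... and the objective value is 146 1/4
assert 40 * x[0] + 100 * x[1] + 150 * x[2] == F(585, 4)
```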
The initial choice of BI in the above example is in some sense lucky: had we made a different
choice of BI, then one or more right hand side values could have turned out to be
negative. This happens, for instance, for BI = {2, 3}. In that case, we would have found
an infeasible initial basic solution, and we could not have applied the simplex algorithm. In
general, the big-M procedure (see Section 3.6.1) can be applied to avoid having to guess an
initial basis. This is illustrated in the following example.
Example 3.8.2. The model from Example 3.8.1 may be solved by augmenting it with artificial
variables u1 and u2 as follows:
max cT x
s.t. Ax ≤ b
x ≥ 0,
In the revised simplex algorithm, the full simplex tableau, written in block form as

[ 0    cᵀNI − cᵀBI B⁻¹N    −cᵀBI B⁻¹b ]
[ Im   B⁻¹N                 B⁻¹b      ]

is replaced by the smaller tableau

[ cᵀBI B⁻¹    −cᵀBI B⁻¹b ]
[ B⁻¹          B⁻¹b      ]
where NIα is the index of the entering variable, and BIβ is the index of the leaving variable.
The index α satisfies:

(cᵀNI − cᵀBI B⁻¹N)α > 0;

see Section 3.3. If such an α does not exist, then the current basis matrix is optimal and the
simplex algorithm terminates. Otherwise, once such an α has been calculated, the leaving
basic variable xBIβ is determined by performing the minimum-ratio test. The minimum-ratio
test determines an index β ∈ {1, . . . , m} such that:

(B⁻¹b)β / (B⁻¹N)β,α = min{ (B⁻¹b)j / (B⁻¹N)j,α | j ∈ {1, . . . , m}, (B⁻¹N)j,α > 0 }.

Then xBIβ is the leaving basic variable.
It follows from this discussion that at each iteration, we really only need to calculate the
current objective coefficients, the values of the current set of basic variables, and the column
α of B⁻¹N. Hence, only the calculation of the inverse B⁻¹ is required. There is no
need to update and store the complete simplex tableau (consisting of m + 1 rows and
m + n + 1 columns) at each iteration; updating a tableau consisting of m + 1 rows
and m + 1 columns suffices. Figure 3.13 and Figure 3.14 schematically show the standard
simplex tableau and the revised simplex tableau.
Each column of B⁻¹N in the standard simplex tableau can be readily computed from the
original input data (N is a submatrix of [A Im]) as soon as B⁻¹ is available. Moreover, row
0 of the standard simplex tableau can easily be calculated from the original input data if the
value of cTBI B−1 is known. The revised simplex algorithm, in the case of a maximization
model, can now be formulated as follows.
and an initial feasible basic solution for the LO-model with slack variables
(see Section 3.6).
Output: Either an optimal solution of the model, or the message ‘the model is
unbounded’.
In Section 3.9.3, we will apply the revised simplex algorithm to Model Dovetail.
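As a concrete sketch (not the book's pseudocode, and without an anti-cycling safeguard), the loop below implements the revised simplex algorithm for max{ cᵀx | Ax ≤ b, x ≥ 0 } with b ≥ 0, maintaining B⁻¹ by row operations rather than storing the full tableau, and reproduces the Model Dovetail optimum:

```python
def revised_simplex(c, A, b, tol=1e-9):
    # Maximize c^T x subject to Ax <= b, x >= 0, assuming b >= 0 so that
    # the slack basis is an initial feasible basic solution.
    m, n = len(A), len(A[0])
    T = [A[i][:] + [1.0 if j == i else 0.0 for j in range(m)] for i in range(m)]
    cost = list(c) + [0.0] * m              # slacks have objective coefficient 0
    BI = list(range(n, n + m))              # indices of the basic variables
    Binv = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    xB = list(b)                            # current values of the basic variables
    while True:
        # pricing: y = c_BI^T B^-1, then reduced costs c_k - y^T a_k
        y = [sum(cost[BI[i]] * Binv[i][j] for i in range(m)) for j in range(m)]
        red = [cost[k] - sum(y[i] * T[i][k] for i in range(m))
               for k in range(n + m)]
        alpha = max(range(n + m), key=lambda k: red[k])
        if red[alpha] <= tol:               # no improving column: optimal
            x = [0.0] * (n + m)
            for i in range(m):
                x[BI[i]] = xB[i]
            return x[:n], sum(cost[k] * x[k] for k in range(n + m))
        d = [sum(Binv[i][j] * T[j][alpha] for j in range(m)) for i in range(m)]
        ratios = [(xB[i] / d[i], i) for i in range(m) if d[i] > tol]
        if not ratios:
            raise ValueError('the model is unbounded')
        _, beta = min(ratios)               # minimum-ratio test
        # update B^-1 and the basic solution by row operations (this is the
        # effect of multiplying by the elementary matrix E of the text)
        piv = d[beta]
        Binv[beta] = [v / piv for v in Binv[beta]]
        xB[beta] /= piv
        for i in range(m):
            if i != beta:
                Binv[i] = [u - d[i] * v for u, v in zip(Binv[i], Binv[beta])]
                xB[i] -= d[i] * xB[beta]
        BI[beta] = alpha

# Model Dovetail: max 3x1 + 2x2 s.t. x1+x2<=9, 3x1+x2<=18, x1<=7, x2<=6
x, z = revised_simplex([3.0, 2.0],
                       [[1.0, 1.0], [3.0, 1.0], [1.0, 0.0], [0.0, 1.0]],
                       [9.0, 18.0, 7.0, 6.0])
```

For Model Dovetail this returns x = (4 1/2, 4 1/2) with objective value 22 1/2, matching the tableau computations of this section.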
where, for i = 1, . . . , m, ei is the i'th unit vector in Rm, and T is the identity matrix
with the β'th column replaced by B⁻¹N⋆,α (= (B⁻¹N)⋆,α). Because det(Bnew) =
det(B) det(T) (see Appendix B.7), and B and T are nonsingular square matrices, it follows
that Bnew is nonsingular. Its inverse satisfies:

B⁻¹new = T⁻¹B⁻¹.
Define E = T⁻¹; E is called an elementary matrix. Write [M1 . . . Mm]ᵀ = B⁻¹N⋆,α.
Hence, at each iteration of the revised simplex algorithm, the inverse of the new basis matrix
can be written as a product of elementary matrices.
I Initialization. As the initial basis matrix, we take the columns of [A I4] corresponding to the
variables x3, x4, x5, and x6. So for the initial solution, it holds that:

B(0) = [a3 a4 a5 a6] = I4,
cBI = [c3 c4 c5 c6]ᵀ = [0 0 0 0]ᵀ,
xBI = [x3 x4 x5 x6]ᵀ = [9 18 7 6]ᵀ,
xNI = [x1 x2]ᵀ = [0 0]ᵀ,
z = 0,
B(0)⁻¹ = I4.
I Iteration 1. The first iteration starts with BI = {3, 4, 5, 6} and B(0)⁻¹ = I4.
I Step 1. Since cᵀNI − cᵀBI B(0)⁻¹[a1 a2] = [3 2], and both entries of this vector are positive,
we select x1 as the entering variable.
I Step 2. (B(0)⁻¹a1)ᵀ = [1 3 1 0], and (B(0)⁻¹b)ᵀ = [9 18 7 6]. The minimum-ratio test
I Iteration 3. Recall that cᵀBI = [c2 c1 c5 c6]ᵀ, and cᵀNI = [c4 c3]ᵀ.
I Step 1. cᵀNI − cᵀBI B(2)⁻¹[a4 a3] = [−1/2  −1 1/2]. Since none of the entries of this vector is
positive, the current basis matrix is optimal. In order to calculate the optimal objective value, we
determine xBI as follows:

xBI = B(2)⁻¹b = E2(E1b) =

[  3/2  0  0  0 ] [ 3 ]   [ 4 1/2 ]
[ −1/2  1  0  0 ] [ 6 ] = [ 4 1/2 ]
[  1/2  0  1  0 ] [ 1 ]   [ 2 1/2 ]
[ −3/2  0  0  1 ] [ 6 ]   [ 1 1/2 ]

Note that the values of the entries of E1b were already calculated in the previous iteration. The
optimal objective value is 22 1/2.
3.10 Exercises
Exercise 3.10.1. Consider the following LO-model:
max x1 + x2
s.t. x1 ≤2 (1)
2x1 + 3x2 ≤ 5 (2)
x1 , x2 ≥ 0.
Introduce slack variables x3 and x4 for the two constraints. Solve the model in the way
described in Section 3.1. Start the procedure at the vertex [0 0]ᵀ, then include x1 in the
set of basic variables while removing x3, and finally exchange x4 for x2.
(b) Use (a) to determine the variables (including slack variables) that are unbounded.
(c) Show that the simplex algorithm cycles when using the following pivot rule: in case of
ties, choose the basic variable with the smallest subscript.
(d) Apply the perturbation method.
(e) Extend the model with the constraint x3 ≤ 1, and solve it.
Exercise 3.10.4. Consider the LO-model of Example 3.5.4, with the additional constraint
x1 ≤ 0.
(a) Argue that, in contrast to the LO-model of Example 3.5.4, the new model has a
bounded optimal solution.
(b) Find, by inspection, the optimal feasible basic solution of the model.
(c) Show that the simplex algorithm cycles when the same pivot rules are used as in Exam-
ple 3.5.4.
Exercise 3.10.5.
(a) Give an example of an LO-model in which a feasible basic solution becomes infeasible
when the model is perturbed.
(b) Give an example in which the perturbation method turns a feasible LO-model into an
infeasible LO-model.
(c) Argue that, if the perturbation method is applied by adding εi to the right hand side
of each ‘≤’ constraint i and subtracting εi from the right hand side of each ‘≥’, the
situation of (b) does not occur.
Exercise 3.10.6. A company produces two commodities. For both commodities, three
kinds of raw materials are needed. The table below lists the quantity (in kg) of each kind
of raw material that is needed per unit of each commodity.
The price of raw material 1 is $2 per kg, the price of raw material 2 is $3 per kg, and the
price of raw material 3 is $4 per kg. At most 20,000 kg of raw material 1, at most 15,000 kg
of raw material 2, and at most 25,000 kg of raw material 3 can be purchased.
(a) The company wants to maximize the production. Write the problem as a standard
LO-model and solve it.
(b) The company also wants to keep the costs of raw material as low as possible, but
it wants to produce at least 1,000 kg of each commodity. Write this problem as a
standard LO-model and solve it using the big-M procedure.
Exercise 3.10.7. The company Eltro manufactures radios and televisions. The company
has recently developed a new type of television. In order to draw attention to this new
product, Eltro has embarked on an ambitious TV-advertising campaign and has decided to
purchase one-minute commercial spots on two types of TV programs: comedy shows and
basketball games. Each comedy commercial is seen by 4 million women and 2 million men.
Each basketball commercial is seen by 2 million women and 6 million men. A one-minute
comedy commercial costs $50,000 and a one-minute basketball commercial costs $100,000.
Eltro would like the commercials to be seen by at least 20 million women and 18 million
men. Determine how Eltro can meet its advertising requirements at minimum cost. (Hint:
use the two-phase procedure.)
Exercise 3.10.8. Solve the following LO-model by using the simplex algorithm:
Exercise 3.10.9. Use the simplex algorithm to solve the following LO-model: (1) by
converting it into a maximization model, (2) by solving it directly as a minimization model.
Compare the various iterations of both methods.
Exercise 3.10.10. Dafo Car manufactures three types of cars: small, medium, and large
ones. The finishing process during the manufacturing of each type of car requires finishing
materials and two types of skilled labor: assembling labor and painting labor. For each type
of car, the amount of each resource required to finish ten cars is given in the table below.
At present, 4,800 kgs metal, 20 assembling hours, and 8 painting hours are available. A
small car sells for $2,000, a medium car for $3,000, and a large car for $6,000. Dafo Car
expects that demand for large cars and small cars is unlimited, but that at most 5 (×100,000)
medium cars can be sold. Since the available resources have already been purchased, Dafo
Car wants to maximize its total revenue. Solve this problem.
Exercise 3.10.11. Prove Theorem 3.7.2. (Hint: first read the proof of Theorem 3.5.2.)
Exercise 3.10.12.
(a) Construct an example for which δ = ∞ in Theorem 3.7.2.
(b) Construct an example for which δ = 0 in Theorem 3.7.2.
Exercise 3.10.13. Construct an LO-model and an optimal feasible basic solution for
which the following holds: the value of δ in the statement of Theorem 3.7.2 is zero, but the
model has multiple optimal solutions.
Exercise 3.10.14. Construct an LO-model that has a unique optimal feasible basic solution,
but multiple optimal points.
Exercise 3.10.15. Determine all optimal solutions of the following LO-model; compare
the results with those of the model from the beginning of Section 3.7.
max x1 + x2
s.t. x1 + x2 ≤ 9
3x1 + x2 ≤ 18
x2 ≤ 6
x1 , x2 ≥ 0.
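As a numerical aside (a sketch of mine, not the book's method), enumerating the vertices of the feasible region of this model shows immediately that two distinct vertices attain the maximum, so every point on the segment between them is optimal:

```python
from itertools import combinations

# Exercise 3.10.15: max x1 + x2  s.t.  x1 + x2 <= 9,  3*x1 + x2 <= 18,
# x2 <= 6,  x1, x2 >= 0.  Boundary lines a*x1 + b*x2 = r of all five constraints:
lines = [(1, 1, 9), (3, 1, 18), (0, 1, 6), (1, 0, 0), (0, 1, 0)]

def feasible(x1, x2, eps=1e-9):
    return (x1 + x2 <= 9 + eps and 3 * x1 + x2 <= 18 + eps
            and x2 <= 6 + eps and x1 >= -eps and x2 >= -eps)

vertices = set()
for (a1, b1, r1), (a2, b2, r2) in combinations(lines, 2):
    det = a1 * b2 - a2 * b1
    if det == 0:
        continue
    v1 = (r1 * b2 - r2 * b1) / det
    v2 = (a1 * r2 - a2 * r1) / det
    if feasible(v1, v2):
        vertices.add((round(v1, 9), round(v2, 9)))

z_max = max(v1 + v2 for v1, v2 in vertices)
optimal_vertices = sorted(v for v in vertices if abs(v[0] + v[1] - z_max) < 1e-6)
# -> the maximum 9 is attained at both (3, 6) and (4.5, 4.5)
```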
with b = (1 0 0 0)ᵀ and

    A = [ 1 −1  0  0 ]
        [ 0  1 −1  0 ]
        [ 0  0  1 −1 ]

is infeasible. (Hint: apply the two-phase procedure.)
148   Chapter 3. Dantzig's simplex algorithm
max x1 + 2x2
s.t. −x1 + x2 ≤ 6
x1 − 2x2 ≤ 4
x1 , x2 ≥ 0.
(a) Determine all vertices and extreme directions of the feasible region of the following
LO-model by drawing its feasible region.
(b) Use the simplex algorithm to show that the model is unbounded.
(c) Determine a halfline in the feasible region on which the objective values increase unboundedly.
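For part (c), a candidate direction can be checked mechanically. The snippet below is an illustration of mine (the direction d = (2, 1) is an assumption, not quoted from the book): a halfline x(t) = (0, 0) + t·d stays feasible for all t ≥ 0 precisely when d ≥ 0 satisfies both technology constraints with right hand side 0, and the objective then grows at rate cᵀd along it.

```python
# Model of parts (a)-(c): max x1 + 2*x2
#   s.t. -x1 + x2 <= 6,  x1 - 2*x2 <= 4,  x1, x2 >= 0.

def feasible(x1, x2):
    return -x1 + x2 <= 6 and x1 - 2 * x2 <= 4 and x1 >= 0 and x2 >= 0

# Candidate direction d = (2, 1) (an assumption of mine, verified below):
d = (2, 1)
ray_stays_feasible = ((-d[0] + d[1] <= 0) and (d[0] - 2 * d[1] <= 0)
                      and d[0] >= 0 and d[1] >= 0)
objective_rate = d[0] + 2 * d[1]  # growth of x1 + 2*x2 per unit step along d

# Spot-check a few points on the halfline x(t) = (0, 0) + t*d:
points = [(2 * t, t) for t in (0, 10, 100, 1000)]
values = [x1 + 2 * x2 for x1, x2 in points]
# -> 0, 40, 400, 4000: the objective is unbounded along this halfline
```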
Exercise 3.10.19. Determine the simplex adjacency graphs of the following LO-models.
(a) max{x1 | x1 + x2 + x3 ≤ 1, x1 ≤ 1, x1 , x2 , x3 ≥ 0}.
(b) max{x2 | x1 + x2 + x3 ≤ 1, x1 ≤ 1, x1 , x2 , x3 ≥ 0}.
Exercise 3.10.20. Let BI(1) and BI(2) be two nodes of a simplex adjacency graph, each
corresponding to a degenerate vertex. Suppose that BI(1) and BI(2) differ in precisely
one entry. Show, by constructing an example, that BI(1) and BI(2) need not be connected
by an arc.
3.10. Exercises   149
Exercise 3.10.21.
(a) Show that, when applying the simplex algorithm, if a variable leaves the set of basic
variables in some iteration, then it cannot enter the set of basic variables in the following
iteration.
(b) Argue that it may happen that a variable that leaves the set of basic variables enters the set of basic variables in a later iteration.
max x1 + 2x2
s.t. 2x1 − x2 − x3 ≥ −2
x1 − x2 + x3 ≥ −1
x1 , x2 , x3 ≥ 0.
(a) Use the simplex algorithm to verify that the LO-model has no optimal solution.
(b) Use the final simplex tableau to determine a feasible solution with objective value at
least 300.
Exercise 3.10.24. Apply the big-M procedure to show that the following LO-model is
infeasible.
max 2x1 + 4x2
s.t. 2x1 − 3x2 ≥ 2
−x1 + x2 ≥ 1
x1 , x2 ≥ 0.
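Besides the big-M run, the infeasibility can be certified directly. The snippet below is my own check, with a hand-picked pair of nonnegative multipliers u = (1, 3): the weighted sum of the two constraints is −x1 ≥ 5, which no x1 ≥ 0 satisfies, so the model is infeasible.

```python
from fractions import Fraction as F

# Exercise 3.10.24 constraints: 2*x1 - 3*x2 >= 2  and  -x1 + x2 >= 1, x1, x2 >= 0.
# Certificate (my own choice, not from the book): multipliers u = (1, 3) >= 0.
# Any feasible x must satisfy the weighted sum of the two constraints:
#   1*(2*x1 - 3*x2) + 3*(-x1 + x2) >= 1*2 + 3*1,   i.e.   -x1 >= 5,
# which is impossible for x1 >= 0.
u = (F(1), F(3))
rows = [((F(2), F(-3)), F(2)), ((F(-1), F(1)), F(1))]

combined_coeffs = tuple(sum(u[i] * rows[i][0][j] for i in range(2)) for j in range(2))
combined_rhs = sum(u[i] * rows[i][1] for i in range(2))
# combined inequality: -1*x1 + 0*x2 >= 5
```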
Exercise 3.10.25. Convert the following LO-models into standard form, and determine
optimal solutions with the simplex algorithm.
(a) max −2x1 + x2
s.t. x1 + x2 ≤ 4
x1 − x2 ≤ 62
x1 ≥ 0, x2 free.
Exercise 3.10.27. Show that the cycle drawn with thick edges in Figure 3.10 is the only
simplex cycle in the graph of Figure 3.10.
Exercise 3.10.28. Apply the revised simplex algorithm, by using the product form of the
inverse, to the following LO-models:
(a) max  2x1 + 3x2 − x3 + 4x4 + x5 − 3x6
    s.t.  x1 − 2x2       +  x4 + 4x5 + ½x6 ≤ 10
          x1 +  x2 + 3x3 + 2x4 +  x5 −  x6 ≤ 16
         2x1 + ½x2 −  x3 −  x4 +  x5 + 5x6 ≤  8
          x1, x2, x3, x4, x5, x6 ≥ 0.
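For readers who want to automate this exercise, the key mechanism is the eta update. The sketch below uses tiny data of my own (not the exercise's): after a pivot that brings column a_q into basis position r, the new basis inverse is E·B⁻¹, where the eta matrix E differs from the identity only in column r, which is built from u = B⁻¹a_q.

```python
from fractions import Fraction as F

def eta_matrix(u, r):
    """Identity matrix whose r'th column is replaced by the eta column built
    from the pivot column u = Binv * a_q (u[r] is the pivot element)."""
    n = len(u)
    E = [[F(1) if i == j else F(0) for j in range(n)] for i in range(n)]
    for i in range(n):
        E[i][r] = F(1) / u[r] if i == r else -u[i] / u[r]
    return E

def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def mat_mat(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

# Toy data: start from the identity basis (Binv = I) and pivot the column
# a_q = (2, 4)^T into basis position r = 0.
Binv = [[F(1), F(0)], [F(0), F(1)]]
a_q = [F(2), F(4)]
u = mat_vec(Binv, a_q)
Binv_new = mat_mat(eta_matrix(u, 0), Binv)

# Sanity check: the updated inverse must map a_q to the unit vector e_0.
e0 = mat_vec(Binv_new, a_q)
```

In the revised simplex algorithm one never forms Binv_new explicitly; the eta matrices are kept as a product ("eta file") and applied one after the other.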
Overview
The concept of duality plays a key role in the theory of linear optimization. In 1947, John
von Neumann (1903–1957) formulated the so-called dual of a linear optimization model.
The variables of the dual model correspond to the technology constraints of the original
model. The original LO-model is then called the primal model. Actually, the terms primal
and dual are only relative: the dual of the dual model is again the primal model. In practical
terms, if the primal model deals with quantities, then the dual deals with prices. This
relationship will be discussed thoroughly in Chapter 5. In this chapter we will show the
relationship with the other important concept of linear optimization, namely optimality.
The theory is introduced by means of simple examples. Moreover, the relevant geometrical
interpretations are extensively discussed.
We saw in Chapter 3 that any simplex tableau represents a feasible basic solution of the
primal LO-model. In this chapter, we will see that any simplex tableau also represents a
basic solution of the dual model. In contrast to the primal solution, however, this dual
solution satisfies the dual optimality criterion, but it is not necessarily feasible. We will see
that the simplex algorithm can be viewed as an algorithm that:
▶ seeks primal optimality, while keeping primal feasibility, and
▶ seeks dual feasibility, while keeping dual optimality.
We will describe an alternative simplex algorithm, namely the so-called dual simplex algo-
rithm, which seeks primal feasibility, while keeping primal optimality. Although we will not
show this here, this dual simplex algorithm can be viewed as the primal simplex algorithm
applied to the dual problem.
152   Chapter 4. Duality, feasibility, and optimality
Consider now the production of long matches. Each unit of long matches requires one
unit of production capacity, three units of wood, and one unit of boxes for long matches.
Once processed, these raw materials yield a profit of 3 (×$1,000) for this unit of long
matches. This means that the value of this particular combination of raw materials is at least
3 (×$1,000). Thus, the price that Salmonnose has to pay cannot be less than the profit that
is made by selling this unit of long matches. Therefore,
y1 + 3y2 + y3 ≥ 3. (4.1)
Similarly, considering the production of short matches, Salmonnose will come up with the
constraint
y1 + y2 + y4 ≥ 2. (4.2)
Given these constraints, Salmonnose wants to minimize the amount it pays to the owners
of Dovetail in order to buy all its production capacity and in-stock raw materials, i.e., it
4.1. The companies Dovetail and Salmonnose   153
wants to minimize 9y1 + 18y2 + 7y3 + 6y4 . Of course, the prices are nonnegative, so that
y1 , y2 , y3 , y4 ≥ 0. Therefore, the model that Salmonnose solves is the following.
Model Salmonnose.
The optimal solution of this model turns out to be: y1∗ = 1½, y2∗ = ½, y3∗ = 0, y4∗ = 0, with the optimal objective value z∗ = 22½; see Section 4.2. This means that Salmonnose has to pay $22,500 to Dovetail to buy it out. The value z∗ = 22½ should not come as a surprise. Model Dovetail showed that the company was making $22,500 from its facilities, and so any offer of more than this amount would have been acceptable for their owners.
The beauty of Model Salmonnose lies in its interpretations, which are given in the next
section.
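The stated optimum of Model Salmonnose can be double-checked by brute force. The script below (my own sketch, not the book's method) enumerates all basic feasible solutions of the dual feasible region in exact arithmetic and takes the cheapest:

```python
from fractions import Fraction as F
from itertools import combinations

# Model Salmonnose: min 9*y1 + 18*y2 + 7*y3 + 6*y4
#   s.t. y1 + 3*y2 + y3 >= 3,   y1 + y2 + y4 >= 2,   y1, ..., y4 >= 0.
cost = [F(9), F(18), F(7), F(6)]
rows = [([F(1), F(3), F(1), F(0)], F(3)),
        ([F(1), F(1), F(0), F(1)], F(2))]

def is_feasible(y):
    return (all(v >= 0 for v in y)
            and all(sum(a * v for a, v in zip(coeffs, y)) >= rhs
                    for coeffs, rhs in rows))

# A vertex in R^4 needs four tight constraints; with only two inequality
# constraints, at least two of its entries are zero. So it suffices to try
# all candidates with at most two nonzero variables.
candidates = []
for i, j in combinations(range(4), 2):          # both constraints tight
    a, b = rows[0][0][i], rows[0][0][j]
    c, d = rows[1][0][i], rows[1][0][j]
    det = a * d - b * c
    if det == 0:
        continue
    y = [F(0)] * 4
    y[i] = (rows[0][1] * d - rows[1][1] * b) / det
    y[j] = (a * rows[1][1] - c * rows[0][1]) / det
    if is_feasible(y):
        candidates.append(y)
for k in range(4):                              # one constraint tight
    for coeffs, rhs in rows:
        if coeffs[k] != 0:
            y = [F(0)] * 4
            y[k] = rhs / coeffs[k]
            if is_feasible(y):
                candidates.append(y)

z_star, y_star = min((sum(c * v for c, v in zip(cost, y)), y) for y in candidates)
# -> y* = (1 1/2, 1/2, 0, 0) with z* = 22 1/2, matching Model Dovetail's optimum
```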
▶ The global interpretation. In Section 4.1.1 it is noted that Salmonnose has to pay 22½ (×$1,000) for the whole transaction, which includes the rent of the machine during one year and the purchase of the inventory of wood and boxes. The various means of production have their own prices as well. For instance, the price of the rent of the machine is equal to 9 × 1½ = 13½ (×$1,000). It should be mentioned that these prices only hold for the whole transaction.
The fact that the ‘long’ and ‘short’ boxes in stock go with it for free (y3∗ = 0 and y4∗ = 0)
does not mean, of course, that they do not represent any value; if the ‘long’ and ‘short’
boxes are excluded from the transaction, then the total price may become lower. Actually,
the prices of the ‘long’ and ‘short’ boxes are discounted in the prices of the other means
of production.
▶ The marginal interpretation. The fact that y3∗ = 0 means that a change in the inventory of boxes for long matches has no implications for the value of the total transaction. The same holds for the inventory of the 'short' boxes. However, a change in the inventory of wood, say by an (additive) quantity γ (called the perturbation factor), has consequences for the total value because the value of the corresponding variable y2 is larger than 0. For instance, the fact that y2∗ = ½ means that if the amount of wood inventory changes from 18 to 18 + γ, then the optimal profit changes from 22½ to 22½ + ½γ (×$1,000). It should be mentioned that this only holds for small values of γ; see Chapter 5.
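This marginal interpretation is easy to verify computationally. In the sketch below (mine, not the book's), Model Dovetail is re-solved with wood inventory 18 + γ; for small γ the optimal vertex remains the intersection of the machine-capacity and wood constraints, so the optimum can be tracked by solving that 2×2 system exactly, and the profit indeed moves by ½γ:

```python
from fractions import Fraction as F

# Model Dovetail: max 3*x1 + 2*x2
#   s.t. x1 + x2 <= 9,  3*x1 + x2 <= 18 + gamma,  x1 <= 7,  x2 <= 6,  x1, x2 >= 0,
# where gamma perturbs the wood inventory. For small gamma the optimal vertex
# stays at the intersection of the first two constraints (checked by the assert).

def dovetail_optimum(gamma):
    x1 = (F(9) + gamma) / 2        # from x1 + x2 = 9 and 3*x1 + x2 = 18 + gamma
    x2 = F(9) - x1
    assert 0 <= x1 <= 7 and 0 <= x2 <= 6   # the vertex is still feasible
    return 3 * x1 + 2 * x2

base = dovetail_optimum(F(0))                                   # 22 1/2
shifts = [(g, dovetail_optimum(F(g)) - base) for g in (-2, -1, 1, 2)]
# each shift equals gamma/2, in line with y2* = 1/2
```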
we find that z ∗ ≤ 24. Generalizing this idea, we can construct any nonnegative linear
combination of all four constraints (with nonnegative weight factors y1 , y2 , y3 , y4 ), and
obtain the inequality:
(x1 + x2 )y1 + (3x1 + x2 )y2 + (x1 )y3 + (x2 )y4 ≤ 9y1 + 18y2 + 7y3 + 6y4 ,
provided that y1, y2, y3, y4 ≥ 0. By reordering the terms of the left hand side, this inequality can be written as:

    (y1 + 3y2 + y3)x1 + (y1 + y2 + y4)x2 ≤ 9y1 + 18y2 + 7y3 + 6y4.
to the optimal objective value of Model Salmonnose. In the realm of duality, Model Dovetail
is called the primal model and Model Salmonnose the dual model. The general standard form
(see Section 1.2.1) of both models is as follows:
with x ∈ Rn , y ∈ Rm , c ∈ Rn , b ∈ Rm , A ∈ Rm×n .
The following remarkable connections between the primal and the dual model exist.
The i’th dual constraint corresponds to the primal decision variable xi (i = 1, . . . , n).
The dual decision variable yj corresponds to the j ’th primal constraint (j = 1, . . . , m).
The constraint coefficients of the i’th primal variable are the coefficients in the i’th dual constraint.
The process described above, in which the dual model is derived from a given primal model,
is called dualization. We will see in the following section that dualizing a standard dual model
results in the original standard primal model.
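The dualization of standard models, and the fact that dualizing twice returns the original model, can be made concrete in a few lines of code (a minimal representation of my own, not the book's notation):

```python
# A standard max model max{c^T x | Ax <= b, x >= 0} is stored as the tuple
# ('max', A, b, c); its dual min{b^T y | A^T y >= c, y >= 0} as ('min', A^T, c, b).

def transpose(A):
    return [list(col) for col in zip(*A)]

def dualize(model):
    sense, A, b, c = model
    return ('min' if sense == 'max' else 'max', transpose(A), c, b)

# Model Dovetail as the primal:
primal = ('max', [[1, 1], [3, 1], [1, 0], [0, 1]], [9, 18, 7, 6], [3, 2])
dual = dualize(primal)          # the data of Model Salmonnose
dual_of_dual = dualize(dual)    # dualizing twice returns the primal
```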
Theorem 4.2.1.
Consider the following general LO-model:
The dimensions of the symbols in model (GM) are omitted. The expression 'x free' means that the entries of the vector x are not restricted in sign, and so they can be either nonnegative or nonpositive. Often, expressions of the form 'x free' are omitted.
Proof. In order to dualize model (GM), we first transform it into the standard form, i.e., into the form max{cᵀx | Ax ≤ b, x ≥ 0}. Handling the constraints is straightforward: each '≥' constraint is turned into a '≤' constraint by multiplying both sides of the inequality by −1, and each '=' constraint is turned into two '≤' constraints (see also Section 1.3). This leaves the expressions 'x2 ≤ 0' and 'x3 free'. These can be handled as follows:
Let y1, y2, y3′, and y3″ be the vectors of the dual decision variables corresponding to the constraints of (SGM) with right hand side vectors b1, −b2, b3, and −b3, respectively. The dual (DSGM) of (SGM) then is:
    min   b1ᵀy1 + (−b2)ᵀy2 + b3ᵀy3′ + (−b3)ᵀy3″                      (DSGM)
    s.t.  A1ᵀy1 + (−A4)ᵀy2 + A7ᵀy3′ + (−A7)ᵀy3″ ≥  c1
          (−A2)ᵀy1 + A5ᵀy2 + (−A8)ᵀy3′ + A8ᵀy3″ ≥ −c2
          A3ᵀy1 + (−A6)ᵀy2 + A9ᵀy3′ + (−A9)ᵀy3″ ≥  c3
          (−A3)ᵀy1 + A6ᵀy2 + (−A9)ᵀy3′ + A9ᵀy3″ ≥ −c3
          y1, y2, y3′, y3″ ≥ 0.
Recall that, before the dualization process, we have applied the following transformations to
(GM):
It is left to the reader to show that the dual of model (DGM) is the original model (GM);
see Exercise 4.8.9. Hence, (GM) and (DGM) are mutually dual. In Table 4.1 we have
summarized the primal-dual relationships (A ∈ Rm×n , b ∈ Rm , c ∈ Rn , i = 1, . . . , n,
and j = 1, . . . , m).
What can be said about the primal and the dual slack variables? Recall that the standard primal model with slack variables is max{cᵀx | Ax + xs = b, x ≥ 0, xs ≥ 0}, and the standard dual model with slack variables is min{bᵀy | Aᵀy − ys = c, y ≥ 0, ys ≥ 0}.
What happens when the nonnegativity constraint x ≥ 0 is itself considered as a technology constraint? That is, consider the following nonstandard LO-
Table 4.1: Primal-dual relationships.

PRIMAL/DUAL                                          DUAL/PRIMAL
1.  Maximizing model                                 1.  Minimizing model
2.  Technology matrix A                              2.  Technology matrix Aᵀ
3.  Right hand side vector b                         3.  Objective coefficients vector b
4.  Objective coefficients vector c                  4.  Right hand side vector c
5.  j'th constraint '=' type                         5.  Decision variable yj free
6.  j'th constraint '≤' type (slack var. xn+j ≥ 0)   6.  Decision variable yj ≥ 0
7.  j'th constraint '≥' type (slack var. xn+j ≤ 0)   7.  Decision variable yj ≤ 0
8.  Decision variable xi free                        8.  i'th constraint '=' type
9.  Decision variable xi ≥ 0                         9.  i'th constraint '≥' type (slack var. ym+i ≥ 0)
10. Decision variable xi ≤ 0                         10. i'th constraint '≤' type (slack var. ym+i ≤ 0)
model:
max cT x
s.t. Ax ≤ b, x ≥ 0
x free.
The technology matrix of this model is the matrix A stacked on top of the identity matrix In. Let y1 (∈ Rᵐ) be the vector of dual variables corresponding to Ax ≤ b, and y2 (∈ Rⁿ) the vector of dual variables corresponding to x ≥ 0. Applying the above formulated dualization rules, we obtain the following model:
min bT y1 + 0T y2
s.t. AT y1 + y2 = c
y1 ≥ 0, y2 ≤ 0.
This model is a standard dual model with slack variables, except for the fact that the vector of slack variables ys of the original standard dual model satisfies ys = −y2. So, the dual slack variables of a standard LO-model are equal to the negatives of the dual variables corresponding to the 'constraint' x ≥ 0. If we want to avoid this difference in sign, we should have defined the standard primal model as max{cᵀx | Ax ≤ b, −x ≤ 0}; see also Section 5.3.3. On the other hand, some LO-packages only accept nonnegative decision variables, and so – if relevant – we should pay extra attention to the actual signs of the optimal values of the slack variables presented in the output of the package.
4.2. Duality and optimality   159
Example 4.2.1. We illustrate the dualization rules using the following LO-model.
min 2x1 + x2 − x3
s.t. x1 + x2 − x3 = 1
x1 − x2 + x3 ≥ 2
x2 + x3 ≤ 3
x1 ≥ 0, x2 ≤ 0, x3 free.
Replacing x2 by −x2, and x3 by x3′ − x3″, replacing the '=' constraint by two '≥' constraints with opposite sign, and multiplying the third constraint by −1, we find the following standard LO-model:
in the following theorems. These theorems are formulated for the standard LO-models max{cᵀx | Ax ≤ b, x ≥ 0} and min{bᵀy | Aᵀy ≥ c, y ≥ 0}. However, since nonstandard LO-models can easily be transformed into standard models, these theorems can be generalized for nonstandard LO-models as well. Examples of such generalizations can be found in the exercises at the end of this chapter. In Section 5.7, models with equality constraints receive special attention.
We start by proving the so-called weak duality theorem, which states that the optimal objective
value of a standard LO-model is at most the optimal objective value of its dual model.
The following important theorem shows that if a pair of vectors x̂ and ŷ can be found satis-
fying the constraints of the respective primal and dual models, and such that their objective
values are equal, then x̂ and ŷ are both optimal.
Proof of Theorem 4.2.3. Take any x̂ and ŷ satisfying the conditions of the theorem. It follows from Theorem 4.2.2 that:

    cᵀx̂ ≤ max{cᵀx | Ax ≤ b, x ≥ 0} ≤ min{bᵀy | Aᵀy ≥ c, y ≥ 0} ≤ bᵀŷ = cᵀx̂.

Because the leftmost term equals the rightmost term in this expression, we have that the inequalities are in fact equalities. Therefore, we have that:

    cᵀx̂ = max{cᵀx | Ax ≤ b, x ≥ 0},   and   bᵀŷ = min{bᵀy | Aᵀy ≥ c, y ≥ 0}.
This implies that x̂ is an optimal solution of the primal model, and ŷ is an optimal solution of
the dual model.
The following strong duality theorem implies that the converse of Theorem 4.2.3 is also true,
i.e., the optimal objective value of a standard LO-model is equal to the optimal objective
value of its dual model.
    y∗ = (B⁻¹)ᵀ cBI
Using Theorem 4.2.4, we can immediately calculate an optimal dual solution as soon as an
optimal primal basis matrix is at hand.
Example 4.2.2. Consider again Model Dovetail. The optimal primal basis matrix, its inverse, and the corresponding vector cBI are

        x1 x2 x5 x6
    B = [ 1  1  0  0 ]          [ −½    ½  0  0 ]
        [ 3  1  0  0 ] , B⁻¹ =  [ 1½   −½  0  0 ] ,  and  cBI = (3 2 0 0)ᵀ.
        [ 1  0  1  0 ]          [  ½   −½  1  0 ]
        [ 0  1  0  1 ]          [ −1½   ½  0  1 ]
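Since y∗ = (B⁻¹)ᵀcBI is equivalent to Bᵀy∗ = cBI, the optimal dual solution can be recovered with one exact linear solve. The snippet below (my own sketch, assuming the basis matrix and cBI of Example 4.2.2) reproduces y∗ = (1½, ½, 0, 0)ᵀ:

```python
from fractions import Fraction as F

def solve(M, rhs):
    """Gauss-Jordan elimination in exact Fraction arithmetic."""
    n = len(M)
    A = [[F(v) for v in row] + [F(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]

# Example 4.2.2: y* = (B^{-1})^T c_BI is the unique solution of B^T y = c_BI.
B = [[1, 1, 0, 0],
     [3, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]
c_BI = [3, 2, 0, 0]
B_T = [list(col) for col in zip(*B)]
y_star = solve(B_T, c_BI)   # the optimal dual solution of Model Dovetail
```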
The following general optimality condition holds for nonstandard LO-models; see Exercise
4.8.10. Note that this condition does not use feasible basic solutions.
The optimal solutions of Model Dovetail and Model Salmonnose with slack variables are:

            x1∗ = 4½              y1∗ = 1½
            x2∗ = 4½              y2∗ = ½
    (slack) x3∗ = 0               y3∗ = 0
    (slack) x4∗ = 0               y4∗ = 0
    (slack) x5∗ = 2½              y5∗ = 0  (slack)
    (slack) x6∗ = 1½              y6∗ = 0  (slack)

STANDARD PRIMAL MODEL with slack variables:

    max{cᵀx | Ax + Im xs = b, x ≥ 0, xs ≥ 0}

STANDARD DUAL MODEL with slack variables:

    min{bᵀy | Aᵀy − In ys = c, y ≥ 0, ys ≥ 0}

where xs and ys are the primal and the dual slack variables, respectively, and Im and In
are the identity matrices with m and n rows, respectively.
We have seen that, in general, each constraint of a standard LO-model has an associated
dual decision variable. Since each decision variable of the primal LO-model corresponds to
a dual constraint, this means that each primal decision variable also corresponds to a dual
slack variable. Similar reasoning implies that each primal slack variable corresponds to a dual
decision variable. In particular, we have the following correspondence between primal and
dual variables.
▶ The primal decision variable xi corresponds to the dual slack variable ym+i, for i = 1, . . . , n.
▶ The primal slack variable xn+j corresponds to the dual decision variable yj, for j = 1, . . . , m.
The variables xi and ym+i , as well as xn+j and yj , are called complementary dual variables
(i = 1, . . . , n and j = 1, . . . , m). The corresponding constraints are called complementary
dual constraints.
Recall that, in Section 2.2.3, we defined the complementary dual set BIᶜ of BI, and the complementary dual set NIᶜ of NI. The reader may verify that the vector yBIᶜ contains the complementary dual variables of the variables in the vector xNI. Similarly, yNIᶜ contains the complementary dual variables of the variables in xBI.
The correspondence between the complementary dual variables is illustrated in Figure 4.2.
The matrix in the figure is the technology matrix A of a standard primal LO-model. From
the primal perspective, each column i corresponds to the primal decision variable xi (i =
1, . . . , n); each row j corresponds to the j ’th primal constraint, associated with the primal
slack variable xn+j (j = 1, . . . , m). From the dual perspective, each row j corresponds to
the dual decision variable yj (j = 1, . . . , m); each column i corresponds to the i’th dual
constraint, associated with the dual slack variable ym+i (i = 1, . . . , n).
          x1    x2   ···   xn
         ym+1  ym+2  ···  ym+n
        [ a11   a12  ···  a1n ]   y1   xn+1
    A = [ a21   a22  ···  a2n ]   y2   xn+2
        [  ⋮     ⋮    ⋱    ⋮  ]    ⋮     ⋮
        [ am1   am2  ···  amn ]   ym   xn+m
Figure 4.2: Complementary dual variables. The rows of the technology matrix A correspond to the dual
decision variables and the primal slack variables. The columns correspond to the primal
decision variables and the dual slack variables.
The following relationships exist between the optimal values of the primal and dual variables
(the remarks between parentheses refer to Model Dovetail and Model Salmonnose).
▶ If a primal slack variable has a nonzero value, then the corresponding dual decision variable has value zero (x5∗ = 2½ ⇒ y3∗ = 0, and x6∗ = 1½ ⇒ y4∗ = 0).
▶ If a dual decision variable has a nonzero value, then the corresponding primal slack variable has value zero (y1∗ = 1½ ⇒ x3∗ = 0, and y2∗ = ½ ⇒ x4∗ = 0).
▶ If a dual slack variable has a nonzero value, then the corresponding primal decision variable has value zero. (This does not occur in Model Dovetail and Model Salmonnose, because y5∗ = 0 and y6∗ = 0.)
▶ If a primal decision variable has a nonzero value, then the corresponding dual slack variable has value zero (x1∗ = 4½ ⇒ y5∗ = 0, and x2∗ = 4½ ⇒ y6∗ = 0).
In Theorem 4.3.1 it will be shown that these observations are actually true in general. How-
ever, the converses of the implications formulated above do not always hold. If, for instance,
the value of a primal slack variable is zero, then the value of the corresponding dual variable
may be zero as well (see, e.g., Section 5.6).
Note that the observations can be compactly summarized as follows. For each pair of com-
plementary dual variables (i.e., xi and ym+i , or xn+j and yj ), it holds that at least one
of them has value zero. An even more economical way to express this is to say that their
product has value zero.
The following theorem gives an optimality criterion when solutions of both the primal and
the dual model are known.
(i) x and y are optimal (not necessarily feasible basic) solutions of their corresponding models;
(ii) xᵀys = 0 and xsᵀy = 0 (with xs and ys the corresponding slack variables).
Proof of Theorem 4.3.1. The proof of (i) ⇒ (ii) is as follows. Using Theorem 4.2.4, it follows that: xᵀys = xᵀ(Aᵀy − c) = xᵀ(Aᵀy) − xᵀc = (xᵀAᵀ)y − cᵀx = (Ax)ᵀy − cᵀx = (b − xs)ᵀy − cᵀx = bᵀy − xsᵀy − cᵀx = −xsᵀy. Therefore, xᵀys = −xsᵀy, and this implies that xᵀys + xsᵀy = 0. Since x, y, xs, ys ≥ 0, we have in fact that xᵀys = xsᵀy = 0. The proof of (ii) ⇒ (i) is left to the reader. (Hint: use Theorem 4.2.3.)
The expressions xᵀys = 0 and xsᵀy = 0 are called the complementary slackness relations.
Theorem 4.3.1 states a truly surprising fact: if we have a feasible primal solution x∗ and
a feasible dual solution y∗ , and they satisfy the complementary slackness relations, then it
immediately follows that x∗ and y∗ are optimal solutions (for the standard primal and the
standard dual models, respectively).
Example 4.3.1. In the case of Model Dovetail and Model Salmonnose, the complementary
slackness relations are (see Figure 4.1):
    xs1 = b1 − A1x1 − A2x2 − A3x3,
    xs2 = A4x1 + A5x2 + A6x3 − b2,
    ys1 = A1ᵀy1 + A4ᵀy2 + A7ᵀy3 − c1,   and
    ys2 = c2 − A2ᵀy1 − A5ᵀy2 − A8ᵀy3.
The following theorem summarizes the complementary slackness relations for the general
models (GM) and (DGM).
4.3. Complementary slackness relations   167
    x = (x1, x2, x3)ᵀ (∈ Rⁿ)   and   y = (y1, y2, y3)ᵀ (∈ Rᵐ)
satisfy the constraints of the primal model (GM) and the corresponding dual model
(DGM), respectively, then the following assertions are equivalent:
(i) x and y are optimal (not necessarily feasible basic) solutions of their correspond-
ing models;
(ii) x1ᵀys1 = 0, x2ᵀys2 = 0, xs1ᵀy1 = 0, and xs2ᵀy2 = 0 (with xs1, xs2, ys1, and ys2 defined as above).
    max  −2x1 − ¾x2
    s.t.   x1 + 2x2 + 2x3 ≥  5    (slack x4)
          2x1 − ½x2       ≤ −1    (slack x5)
           x1 +  x2 +  x3 ≤  3    (slack x6)
           x1, x2, x3 ≥ 0.
Suppose that we are given the vector (x1, x2, x3)ᵀ = (0, 2, ¾)ᵀ, and we want to know whether or not this vector is an optimal solution of the LO-model. One can easily check that x1 = 0, x2 = 2, x3 = ¾ is a feasible solution, but not a feasible basic solution (why not?). The dual of the above
model reads:
    min  5y1 −  y2 + 3y3
    s.t.   y1 + 2y2 +  y3 ≥ −2    (slack y4)
          2y1 − ½y2 +  y3 ≥ −¾    (slack y5)
          2y1       +  y3 ≥  0    (slack y6)
           y1, y2, y3 ≥ 0.
The pairs of complementary dual variables are (x1, y4), (x2, y5), (x3, y6), (x4, y1), (x5, y2), and (x6, y3). Substituting x1 = 0, x2 = 2, x3 = ¾ into the constraints of the primal model leads to the following values of the primal slack variables: x4 = ½, x5 = 0, and x6 = ¼. If the given vector is indeed a primal optimal solution, then we should be able to find a corresponding optimal dual solution. Since x2, x3, x4, and x6 have nonzero values, it follows that for any optimal dual solution, it must be the case that y5 = y6 = y1 = y3 = 0. Hence, the dual model can be reduced to:
    min   −y2
    s.t.  2y2 ≥ −2
         −½y2 = −¾
          y2 ≥ 0.
The optimal solution of this model satisfies y2 = 1½. The vectors (x1, x2, x3, x4, x5, x6)ᵀ = (0, 2, ¾, ½, 0, ¼)ᵀ and (y1, y2, y3, y4, y5, y6)ᵀ = (0, 1½, 0, 5, 0, 0)ᵀ now satisfy the complementary slackness relations, and so (x1, x2, x3)ᵀ = (0, 2, ¾)ᵀ is in fact a (nonbasic) optimal solution.
This example illustrates how it can be determined whether or not a given solution is optimal,
and that the procedure is very effective if the number of nonzero variables (including slack
variables) in the given solution is large.
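The whole verification of Example 4.3.1 can be condensed into a few lines of exact arithmetic (a re-check of mine, not additional book material): confirm feasibility of both given vectors, equality of the objective values (which gives optimality, by Theorem 4.2.3), and the vanishing of all six complementary-slackness products.

```python
from fractions import Fraction as F

# Given primal and dual points from Example 4.3.1:
x1, x2, x3 = F(0), F(2), F(3, 4)
y1, y2, y3 = F(0), F(3, 2), F(0)

# Primal slacks (constraints: >= 5, <= -1, <= 3):
x4 = x1 + 2 * x2 + 2 * x3 - 5
x5 = -1 - (2 * x1 - F(1, 2) * x2)
x6 = 3 - (x1 + x2 + x3)

# Dual slacks (constraints: >= -2, >= -3/4, >= 0):
y4 = y1 + 2 * y2 + y3 - (-2)
y5 = 2 * y1 - F(1, 2) * y2 + y3 - (-F(3, 4))
y6 = 2 * y1 + y3 - 0

primal_obj = -2 * x1 - F(3, 4) * x2    # equals the dual objective below
dual_obj = 5 * y1 - y2 + 3 * y3
products = [x1 * y4, x2 * y5, x3 * y6, x4 * y1, x5 * y2, x6 * y3]
```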
xi ym+i = 0 for i = 1, . . . , n,
and xn+j yj = 0 for j = 1, . . . , m.
So, in the case of optimal primal and dual solutions, the product of two complementary dual
variables is always zero. It may even happen that both variables have value zero. In Theorem
4.3.3 it will be shown that there exist optimal primal and dual solutions such that for each
pair of complementary dual variables either the value of the primal variable is nonzero or
the value of the dual variable is nonzero. In the literature this theorem is known as the strong
complementary slackness theorem.
that every pair of complementary dual variables (xi∗, yj∗) has the property that exactly one of xi∗ and yj∗ is nonzero.
Proof of Theorem 4.3.3. Let the standard primal model max{cᵀx | Ax ≤ b, x ≥ 0} and its dual model min{bᵀy | Aᵀy ≥ c, y ≥ 0} be denoted by (P) and (D), respectively. We first prove the following statement:

(⋆) For each i = 1, . . . , n, either there exists an optimal solution x∗ = (x1∗ . . . xn∗)ᵀ of (P) satisfying xi∗ > 0, or there exists an optimal solution y∗ of (D) satisfying (Aᵀy∗ − c)i > 0.
To prove (⋆), let z∗ be the optimal objective value of (P). Let i ∈ {1, . . . , n}, and consider the LO-model:

    max   eiᵀx
    s.t.   Ax ≤ b
          −cᵀx ≤ −z∗                        (P′)
           x ≥ 0.

(Here, ei is the i'th unit vector in Rⁿ.) Model (P′) is feasible, because any optimal solution of (P) is a feasible solution of (P′). Let z′ be the optimal objective value of (P′). Clearly, z′ ≥ 0.
We now distinguish two cases.
▶ Case 2b: λ′ > 0. Define y∗ = y′/λ′. Then, we have that y∗ ≥ 0, bᵀy∗ = (bᵀy′)/λ′ = (z∗λ′)/λ′ = z∗, and Aᵀy∗ − c = (Aᵀy′ − λ′c)/λ′ ≥ ei/λ′. Hence, y∗ is an optimal solution of (D), and it satisfies (Aᵀy∗ − c)i ≥ (ei/λ′)i = 1/λ′ > 0, and so (⋆) holds.
We can now prove the theorem. For i = 1, . . . , n, apply statement (⋆) to (P) and (D) to obtain x(i) and y(i) such that: (1) x(i) is an optimal solution of (P), (2) y(i) is an optimal solution of (D), and (3) either (x(i))i > 0 or (Aᵀy(i) − c)i > 0. Similarly, for j = 1, . . . , m, apply statement (⋆) to (D) and (P) (i.e., with the primal and dual models switched) to obtain x(n+j) and y(n+j) such that: (1) x(n+j) is an optimal solution of (P), (2) y(n+j) is an optimal solution of (D), and (3) either (y(n+j))j > 0 or (b − Ax(n+j))j > 0. Define:
    x∗ = (1/(n+m)) ∑_{k=1}^{n+m} x(k),   and   y∗ = (1/(n+m)) ∑_{k=1}^{n+m} y(k).
Due to Theorem 3.7.1 and the fact that x∗ is a convex combination of optimal solutions of (P)
(why?), we have that x∗ is also a (not necessarily basic) optimal solution of (P). Similarly, y∗
is an optimal solution of (D). Hence, by Theorem 4.3.1, it follows that x∗ and y∗ satisfy the
complementary slackness relations.
It remains to show that x∗ and y∗ satisfy the assertions of the theorem. To do so, consider any pair of complementary dual variables (xi∗, ym+i∗) with i = 1, . . . , n. We have that:

    xi∗ = ((1/(n+m)) ∑_{k=1}^{n+m} x(k))i = (1/(n+m)) ∑_{k=1}^{n+m} (x(k))i,

and

    ym+i∗ = (Aᵀy∗ − c)i = ((1/(n+m)) ∑_{k=1}^{n+m} (Aᵀy(k) − c))i = (1/(n+m)) ∑_{k=1}^{n+m} (Aᵀy(k) − c)i.
Note that (x(k))i ≥ 0 and (Aᵀy(k) − c)i ≥ 0 for k = 1, . . . , n+m. Moreover, by the construction of x(i) and y(i), we have that either (x(i))i > 0 or (Aᵀy(i) − c)i > 0. Therefore, either xi∗ > 0 or (Aᵀy∗ − c)i > 0 (and not both, because x∗ and y∗ satisfy the complementary slackness relations). It is left to the reader to show that the conclusion also holds for any pair of complementary dual variables (xn+j∗, yj∗) with j = 1, . . . , m.
In Theorem 4.3.3, it has been assumed that the LO-model is in standard form. For LO-
models containing ‘=’ constraints, we have to be careful when applying the theorem. It
may happen that there are multiple optimal primal solutions and that at least one of them
is nondegenerate, while the corresponding optimal dual solution is unique and degenerate;
see Theorem 5.6.1. If, moreover, the optimal dual variable y corresponding to the ‘=’
constraint (with slack variable s = 0) has value 0, then the corresponding complementary
slackness relation reads y × s = 0 × 0 = 0. Since s = 0 for all optimal primal solutions,
and the optimal dual solution is unique, the strong complementary slackness relation does
not hold. For an extended example, see also Exercise 12.7.8.
4.4. Infeasibility and unboundedness; Farkas' lemma   173
Figure 4.3: Unbounded primal feasible region.
Figure 4.4: Empty dual feasible region.
max x1 + x2
s.t. −2x1 + x2 ≤ 1
−x1 + 2x2 ≤ 3
x1 , x2 ≥ 0.
The feasible region of this model is depicted in Figure 4.3. Note that this region is unbounded. For
instance, if x2 = 1, then x1 can have any positive value without violating the constraints of the model.
In this figure a level line (with the arrow) of the objective function is drawn. Obviously, the ‘maximum’
of x1 + x2 is infinite. So, the model is unbounded, and, therefore, it has no optimal solution. The
dual of the above model is:
min y1 + 3y2
s.t. −2y1 − y2 ≥ 1
y1 + 2y2 ≥ 1
y1 , y2 ≥ 0.
                             DUAL MODEL
PRIMAL MODEL    Optimal       Infeasible    Unbounded
Optimal         Possible      Impossible    Impossible
Infeasible      Impossible    Possible      Possible
Unbounded       Impossible    Possible      Impossible
The arrows in Figure 4.4 point into the halfspaces determined by the constraints of this dual model.
Clearly, there is no point that simultaneously satisfies all constraints. Hence, the feasible region is empty,
and this dual model is infeasible.
In Example 4.4.1, the primal model has an unbounded optimal solution, while its dual
model is infeasible. In Theorem 4.4.1 the relationship between unbounded optimal primal
solutions and infeasibility of the dual model is formulated.
Proof of Theorem 4.4.1. Recall that for each x ≥ 0 and y ≥ 0 with Ax ≤ b and AT y ≥ c,
it holds that cT x ≤ bT y (see the proof of Theorem 4.2.3). Now suppose for a contradiction
that the dual model has a feasible solution ŷ. Hence, we have that cT x ≤ bT ŷ for every primal
feasible solution x. But since bT ŷ < ∞, this contradicts the fact that the primal model is
unbounded. Therefore, the dual model is infeasible.
Note that it may also happen that both the primal and the dual model are infeasible. Exercise
4.8.12 presents an LO-model for which this happens.
In Table 4.2, we have summarized the various possibilities when combining the concepts
‘there exists an optimal solution’ (notation: Optimal), ‘infeasibility of the model’ (notation:
Infeasible), and ‘unboundedness’ (notation: Unbounded) for the primal or the dual model.
The entries ‘Possible’ and ‘Impossible’ in Table 4.2 mean that the corresponding combina-
tion is possible and impossible, respectively. Note that the table is symmetric due to the fact
that the dual model of the dual model of an LO-model is the original LO-model.
In practical situations, feasible regions are usually not unbounded, because the decision
variables do not take on arbitrarily large values. On the other hand, empty feasible regions
do occur in practice; they can be avoided by sufficiently relaxing the right hand side values
(the capacities) of the constraints.
Since AT ŷ = c and ŷ ≥ 0, it follows that ŷ is a feasible point of model (D). Moreover, since all
feasible points of (D) have objective value 0, ŷ is an optimal solution of (D). Because the optimal
objective value of model (D) is 0, model (P) has an optimal solution, and the corresponding
optimal objective value is 0; see Exercise 4.8.10(b). Since the maximum objective value on the
feasible region of model (P) is 0, it follows that system (I) has no solution.
Now assume that system (II) has no solution. Then, model (D) is infeasible. On the other
hand, model (P) is feasible, since 0 is in its feasible region. So, model (P) either has an optimal
solution, or is unbounded. The former is not the case, because if model (P) had an optimal
solution, then model (D) would have been feasible. Hence, model (P) must be unbounded. So,
there exist an x̂ with cT x̂ > 0 and Ax̂ ≤ 0. Hence, system (I) has a solution.
1 Named after the Hungarian mathematician and physicist Gyula Farkas (1847-1930).
Chapter 4. Duality, feasibility, and optimality (p. 174)
[Figure: the convex cone K in the (x1, x2)-plane, with apex at 0.]
Consider the vector x̂ = [3 −9 2]T. We have that:

Ax̂ = [5 1 −5; 4 4 4; 5 3 0; 4 2 3] [3 −9 2]T = [−4 −16 −12 0]T ≤ 0, and cT x̂ = [5 2 2] [3 −9 2]T = 1 > 0.

So, the system Ax ≤ 0, cT x > 0 has a solution. Therefore, according to Theorem 4.4.2, the system (4.4) has in fact no solution. The vector [3 −9 2]T is the corresponding certificate of unsolvability for the system (4.4). Note that it can be checked that dropping any one of the equality constraints of (4.4) leads to a system that has a solution.
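The certificate computation can be verified numerically. The following Python sketch uses the matrix A, the vector c, and the vector x̂ = [3 −9 2]T from the displayed calculation (the code itself is illustrative and not part of the book's material):

```python
# Verify the Farkas certificate: A x_hat <= 0 componentwise and c^T x_hat > 0.
A = [[5, 1, -5],
     [4, 4, 4],
     [5, 3, 0],
     [4, 2, 3]]
c = [5, 2, 2]
x_hat = [3, -9, 2]

# matrix-vector product A x_hat, one inner product per row
Ax = [sum(aij * xj for aij, xj in zip(row, x_hat)) for row in A]
# inner product c^T x_hat
cTx = sum(cj * xj for cj, xj in zip(c, x_hat))

print(Ax, cTx)  # [-4, -16, -12, 0] 1
```

Since Ax̂ ≤ 0 and cT x̂ = 1 > 0, Farkas' lemma guarantees that AT y = c, y ≥ 0 has no solution.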
[Figure 4.6: the relationships between the index sets; the arrows link the mutually complementary dual variables xI ↔ ym+I, xn+J ↔ yJ, xĪ ↔ ym+Ī, and xn+J̄ ↔ yJ̄.]
It can easily be seen that K = {AT y | y ≥ 0} is the convex cone spanned by the row vectors of A. Hence, it follows that:

K0 = {x | (AT y)T x ≤ 0 for all y ≥ 0} = {x | yT (Ax) ≤ 0 for all y ≥ 0} = {x | Ax ≤ 0}.
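The equality K0 = {x | Ax ≤ 0} can be spot-checked numerically. In the Python sketch below, the small matrix A is an arbitrary illustrative choice (not data from the text); it samples vectors y ≥ 0 and x with Ax ≤ 0 and confirms that (AT y)T x ≤ 0 for every such pair:

```python
# Spot-check of K^0 = {x | Ax <= 0} for K = {A^T y | y >= 0}:
# whenever y >= 0 and Ax <= 0, the inner product (A^T y)^T x = y^T (Ax)
# must be nonpositive.  The matrix A below is an arbitrary small example.
from itertools import product

A = [[1, 2], [-1, 1]]

def mat_vec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

A_T = [list(col) for col in zip(*A)]      # transpose of A

ok = True
for y in product(range(4), repeat=2):                   # samples with y >= 0
    for x in product(range(-3, 4), repeat=2):
        if all(entry <= 0 for entry in mat_vec(A, x)):  # x with Ax <= 0
            k = mat_vec(A_T, y)                         # k = A^T y lies in K
            if sum(ki * xi for ki, xi in zip(k, x)) > 0:
                ok = False
print(ok)  # True
```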
versa). In fact, B̄ is a basis matrix corresponding to y∗ = (B−1)T cBI, which means that y∗ corresponds to a vertex of the dual feasible region.
The idea of the proof of the theorem is as follows. Let [xBI; xN I] be the optimal feasible basic solution of the standard primal model with respect to the basis matrix B, and let [yBI c; yN I c] be the corresponding optimal solution of the dual model with respect to the complementary dual matrix B̄ of B. Following the notation of Section 2.2.3, define I = BI ∩ {1, . . . , n}, J = (BI ∩ {n + 1, . . . , n + m}) − n. Since B and B̄ are complementary dual basis matrices, we have that BI = I ∪ (n + J), N I = Ī ∪ (n + J̄), BI c = J̄ ∪ (m + Ī), and N I c = J ∪ (m + I). The relationships between these sets are depicted in Figure 4.6. In this figure, the arrows refer to the mutually complementary dual variables. That is, xI corresponds to ym+I, xn+J to yJ, xĪ to ym+Ī, and xn+J̄ to yJ̄. Hence, xBI corresponds to yN I c, and xN I to yBI c. According to the complementary slackness relations, it follows that both B is an optimal primal basis matrix and B̄ is an optimal dual basis matrix, if and only if it holds that:

(xBI)T yN I c = 0 and (xN I)T yBI c = 0.
We will verify in the proof that the latter condition in fact holds. Note that primal basic
variables correspond to dual nonbasic variables, that primal decision variables correspond to
dual slack variables, and that primal slack variables correspond to dual decision variables.
If B and B̄ correspond to feasible basic solutions, then both B and B̄ are optimal basis matrices, and:

(i) x∗ = (B̄−1)T bBI c is the vertex of the primal feasible region that corresponds to the primal basis matrix B, and

(ii) y∗ = (B−1)T cBI is the vertex of the dual feasible region that corresponds to the dual basis matrix B̄.
Proof of Theorem 4.5.1. Let [x̂BI; x̂N I] and [ŷBI c; ŷN I c] be the feasible basic solutions with x̂BI = B−1 b ≥ 0, x̂N I = 0, and ŷBI c = B̄−1 c ≥ 0, ŷN I c = 0 corresponding to B and B̄. Using the notation of Section 2.2.3, define I = BI ∩ {1, . . . , n}, J = (BI ∩ {n + 1, . . . , n + m}) − n.
We have that (see Theorem 2.2.1):
B = [A Im]•,BI = [A•,I (Im)•,J] ≡ [AJ̄,I 0J̄,J; AJ,I (Im)J,J],

B̄ = [AT −In]•,BI c = [(AT)•,J̄ (−In)•,Ī] ≡ [(AT)I,J̄ 0I,Ī; (AT)Ī,J̄ (−In)Ī,Ī].
4.5. Primal and dual feasible basic solutions (p. 177)
The complementary dual matrix B actually defines a dual (not necessarily feasible) basic
solution for any (not necessarily optimal) primal basis matrix B. Moreover, the values of
the primal basic variables and the dual basic variables have explicit expressions, which are
given in Theorem 4.5.2.
The expressions derived in Theorem 4.5.2 have an interesting interpretation. Let B and B̄ be complementary dual basis matrices in [A Im] ≡ [B N] and [AT −In] ≡ [B̄ N̄], respectively, and let [x̂; x̂s] ≡ [x̂BI; x̂N I] and [ŷ; ŷs] ≡ [ŷBI c; ŷN I c] be the corresponding primal and dual feasible basic solutions. Recall that, by the definition of a basic solution, the primal basic solution is feasible if and only if:

B−1 b ≥ 0. (4.5)

Similarly, the dual basic solution is feasible if and only if:

B̄−1 c ≥ 0. (4.6)
Recall also that, by the definition of an optimal basic solution (see Section 3.5.2), the primal basic solution is optimal if and only if:

(B−1 N)T cBI − cN I ≥ 0. (4.7)

Applying this condition to the corresponding dual solution, the dual basic solution is optimal if and only if:

(B̄−1 N̄)T bBI c − bN I c ≤ 0. (4.8)
It follows from Theorem 4.5.2 that the left hand side vectors of (4.5) and (4.8) are in fact
equal. Similarly, the left hand side vectors of (4.6) and (4.7) are equal. These two facts have
the following important implications:
I The primal basic solution satisfies the primal feasibility condition (4.5) if and only if the
corresponding dual basic solution satisfies the dual optimality condition (4.8).
I The primal basic solution satisfies the primal optimality condition (4.7) if and only if the
corresponding dual basic solution satisfies the dual feasibility condition (4.6).
Proof. Note that the first equality in (4.9) follows from the definition of a basic solution, and
similarly for (4.10). So, it suffices to prove the second equality in (4.9) and the second equality
in (4.10). Using the notation from Theorem 4.5.1, we have that:
N = [A Im]•,N I ≡ [AJ̄,Ī (Im)J̄,J̄; AJ,Ī 0J,J̄],

N̄ = [AT −In]•,N I c ≡ [(AT)I,J (−In)I,I; (AT)Ī,J 0Ī,I].
4.6. Duality and the simplex algorithm (p. 179)
It follows that:
bN I c − N̄T (B̄−1)T bBI c = bN I c − N̄T x∗
  ≡ [bJ; bm+I] − [AJ,I AJ,Ī; (−In)I,I 0I,Ī] [x∗I; x∗Ī]
  = [bJ − AJ,I x∗I; x̂I] = [x̂n+J; x̂I] ≡ x̂BI,

where we have used the fact that x̂Ī = 0 and bm+I = 0. Similarly,
NT (B−1)T cBI − cN I = NT y∗ − cN I
  ≡ [(AT)Ī,J̄ (AT)Ī,J; (Im)J̄,J̄ 0J̄,J] [ŷJ̄; 0J] − [cĪ; cn+J̄]
  = [(AT)Ī,J̄ ŷJ̄ − cĪ; ŷJ̄] = [ŷm+Ī; ŷJ̄] ≡ ŷBI c,

where we have used the fact that ŷJ = 0 and cn+J̄ = 0.
Note that if B and B̄ in Theorem 4.5.2 do not correspond to optimal solutions, then cT x̂ ≠ bT ŷ.
It follows from Theorem 4.5.2 that, at an optimal solution, we have that:

cTN I − cTBI B−1 N = −(y∗BI c)T,

i.e., the optimal dual solution (including the values of both the decision and the slack variables) corresponding to an optimal primal basis matrix B is precisely the negative of the objective vector (namely, the top row) in the optimal simplex tableau corresponding to B. In the case of Model Dovetail, the objective vector of the optimal simplex tableau is [0 0 −1½ −½ 0 0]; see Section 3.3. According to Theorem 4.5.1, it follows that the corresponding optimal dual solution satisfies:

[y∗5 y∗6 y∗1 y∗2 y∗3 y∗4] = [0 0 1½ ½ 0 0].
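The relationship y∗ = (B−1)T cBI can also be computed directly. The Python sketch below uses the data of Model Dovetail (objective 3x1 + 2x2; constraints x1 + x2 ≤ 9, 3x1 + x2 ≤ 18, x1 ≤ 7, x2 ≤ 6, as recovered from the dual constraints in Example 4.6.1) with the optimal basis BI = {1, 2, 5, 6}, and solves BT y = cBI instead of forming B−1 explicitly:

```python
# Compute y* = (B^{-1})^T c_BI for Model Dovetail by solving B^T y = c_BI
# with exact rational Gaussian elimination.
from fractions import Fraction

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination (first nonzero pivot)."""
    n = len(M)
    T = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for i in range(n):
        piv = next(r for r in range(i, n) if T[r][i] != 0)
        T[i], T[piv] = T[piv], T[i]
        T[i] = [v / T[i][i] for v in T[i]]           # scale pivot row to 1
        for r in range(n):
            if r != i and T[r][i] != 0:
                factor = T[r][i]
                T[r] = [a - factor * b for a, b in zip(T[r], T[i])]
    return [row[-1] for row in T]

# Columns of [A I4] with indices in BI = {1, 2, 5, 6}; x5 and x6 are the
# slack variables of the third and fourth constraints.
B = [[1, 1, 0, 0],
     [3, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]
c_BI = [3, 2, 0, 0]

B_T = [list(col) for col in zip(*B)]
y_star = solve(B_T, c_BI)
print([str(v) for v in y_star])  # ['3/2', '1/2', '0', '0']
```

The result y∗ = (1½, ½, 0, 0) matches the negated objective row of the optimal simplex tableau.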
In the present section, we consider the geometry of the dual feasible region. In this setting,
it is convenient to think of the nonnegativity constraints as technology constraints. So, for
a standard primal LO-model with slack variables max{cT x | Ax + xs = b; x, xs ≥ 0},

[Schematic of the simplex tableau: the top row consists of the entries cTN I − cTBI B−1 N and 0T, together with the value −cTBI B−1 b (one row).]
Theorem 4.2.4 describes how to find a dual optimal solution once a primal optimal basis
matrix is known. We saw that, if B is an optimal primal basis matrix from a standard LO-
model, then y∗ = (B−1)T cBI is an optimal dual solution, i.e., it is an optimal solution of the dual model min{bT y | AT y ≥ c, y ≥ 0}. Moreover, this optimal dual solution corresponds to a feasible basic solution of the dual model; see Theorem 4.5.1. We also showed
that y∗ = (B−1 )T cBI is precisely the dual solution that corresponds to the complementary
dual matrix B̄ of B.
It actually makes sense to define a dual solution for any (not necessarily optimal) basis matrix.
Let B be a basis matrix, and let BI be the set of indices of the basic variables. We call
y = (B−1 )T cBI the dual solution corresponding to the basis matrix B.
The objective value z of the feasible basic solution [xBI; xN I] corresponding to B is equal to the objective value of the (possibly infeasible) dual solution, because we have that:

z = cTBI B−1 b = ((B−1)T cBI)T b = yT b.
Row 0 of the simplex tableau contains the negative of the values of the complementary dual
variables. This can be seen as follows. Recall that cTBI − cTBI B−1 B = 0T , i.e., the value
of the objective coefficient of any primal basic variable xi (i ∈ BI ) is zero. Therefore, the
row vector of objective coefficients of the simplex tableau satisfies:
[cT 0T] − cTBI B−1 [A Im] = [cT 0T] − yT [A Im] = −([A Im]T y − [c; 0])T.
This means that the objective coefficients are the negatives of the slack values corresponding
to the constraints of model (4.11). The constraints in model (4.11) include nonnegativity
constraints, and hence the slack values of these dual nonnegativity constraints are in fact
equal to the values of the dual decision variables.
Let i ∈ BI . Since the value of the objective coefficient of xi is zero, the complementary
dual variable corresponding to the i’th dual constraint has value zero. In other words, the i’th
dual constraint is binding. This means that y is the unique point in the intersection of the
hyperplanes associated with the dual constraints with indices in BI . (Why is there a unique
point in this intersection?) Due to this fact, we can read the corresponding dual solution
from the simplex tableau, as the following example illustrates. In fact, the example shows
how an optimal dual basic solution can be determined from an optimal simplex tableau.
Example 4.6.1. Consider Model Dovetail, and its dual model. The dual model, with the
nonnegativity constraints considered as technology constraints, reads:

min 9y1 + 18y2 + 7y3 + 6y4
s.t. y1 + 3y2 + y3 ≥ 3
     y1 + y2 + y4 ≥ 2
     y1 ≥ 0, y2 ≥ 0, y3 ≥ 0, y4 ≥ 0.

Recall the optimal simplex tableau of Model Dovetail (see Section 3.3):

x1   x2    x3    x4   x5   x6     −z
 0    0  −1½   −½     0    0   −22½
 0    1   1½   −½     0    0     4½   x2
 1    0   −½    ½     0    0     4½   x1
 0    0    ½   −½     1    0     2½   x5
 0    0  −1½    ½     0    1     1½   x6
The indices of the dual constraints correspond exactly to the indices of the primal variables. Hence, from
this tableau, we see that the first, second, fifth, and sixth dual constraints are binding. The third and
fourth dual constraints have slack values 1½ and ½, respectively.
Columns 1 and 2 correspond to the dual constraints y1 + 3y2 + y3 ≥ 3 and y1 + y2 + y4 ≥ 2.
Columns 3, . . . , 6 correspond to the dual constraints y1 ≥ 0, . . ., y4 ≥ 0. The latter provides us
with a convenient way to determine the values of the dual variables. For instance, the slack variable for
the constraint y1 ≥ 0 (represented by the third column) has value 1 21 . This means that y1 = 1 21 . In
contrast, the slack variable for the constraint y3 ≥ 0 (represented by the fifth column) has value zero.
Hence, we have that y3 = 0. Applying the same arguments to each of the dual constraints, we find
that the dual solution y satisfies:
y1 = 1½, y2 = ½, y3 = 0, y4 = 0.
In fact, because this simplex tableau is optimal, the corresponding (primal) basis matrix is optimal and,
hence, Theorem 4.2.4 implies that this dual solution is in fact an optimal solution of the dual model.
Because the objective coefficients in the simplex tableau are nonpositive, all slack values are nonnegative, and hence, y =
(B−1 )T cBI is a feasible dual solution.
To illustrate this, recall Model Dovetail and its dual model, Model Salmonnose. In order
to gain insight in the relationship between the iteration steps of the simplex algorithm and
the dual model, we will now think of Model Salmonnose as the primal model, and Model
Dovetail as the dual model. (Recall that the dual model of the dual model is the original
primal model.) We will solve Model Salmonnose using the simplex algorithm. After multi-
plying the objective function by −1, replacing ‘min’ by ’max’, and adding slack variables
y5 and y6 , Model Salmonnose is equivalent to the following LO-model:
y1 y2 y3 y4 y5 y6 −z
−9 −18 −7 −6 0 0 0
1 3 1 0 −1 0 3
1 1 0 1 0 −1 2
(Note that this is not yet a simplex tableau because we have not specified which variables
are basic.) The columns of this ‘simplex tableau’ correspond to the constraints of the pri-
mal model. The primal constraints corresponding to the dual variables y5 and y6 are the
nonnegativity constraints x1 ≥ 0, x2 ≥ 0, respectively, of Model Dovetail.
Finding an initial feasible basic solution is particularly easy in this case, because the columns
corresponding to y3 and y4 already form the (2, 2)-identity matrix. Moreover, it can be seen
from the model above that choosing y3 and y4 as the initial basic variables, thus choosing
y1 = y2 = y5 = y6 = 0, leads to the initial feasible solution y = [0 0 3 2 0 0]T. So, we
may choose BI = {3, 4} as the initial set of (dual) basic variable indices.
To turn the above ‘simplex tableau’ into an actual simplex tableau, we apply Gaussian elim-
ination and find the following initial simplex tableau and successive simplex iterations:
I Iteration 1. BI = {3, 4}.
y1 y2 y3 y4 y5 y6 −z
4 9 0 0 −7 −6 33
1 3 1 0 −1 0 3 y3
1 1 0 1 0 −1 2 y4
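The Gaussian-elimination step that produced row 0 of this tableau can be sketched as follows (Python; the rows are those of the 'simplex tableau' displayed above, with the −z value carried as the last entry of each row):

```python
# Turn the raw tableau for Model Salmonnose into the initial simplex tableau
# for BI = {3, 4}: the objective coefficients of the basic variables y3 and
# y4 are made zero by adding multiples of the constraint rows to row 0.
obj = [-9, -18, -7, -6, 0, 0, 0]          # last entry is the -z column
rows = [[1, 3, 1, 0, -1, 0, 3],           # row of basic variable y3
        [1, 1, 0, 1, 0, -1, 2]]           # row of basic variable y4
basic_cols = [2, 3]                        # 0-based columns of y3 and y4

for row, col in zip(rows, basic_cols):
    factor = obj[col]                      # current objective coefficient
    obj = [o - factor * r for o, r in zip(obj, row)]

print(obj)  # [4, 9, 0, 0, -7, -6, 33]
```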
Figure 4.7: Path in the dual problem taken by the simplex algorithm when applied to Model Salmonnose. (The figure shows the feasible region of Model Dovetail, with vertices v1, v2, v3, and v4, and the lines x3 = 0, x4 = 0, x5 = 0, and x6 = 0.)
The current objective value is 33. The corresponding dual solution (of Model Dovetail) is the unique point in the intersection of the third and fourth constraints of Model Dovetail, i.e., x1 = 7 and x2 = 6, which means that x = [7 6]T. There are two columns with a
positive current objective coefficient. We choose to pivot on y1 ; it is left to the reader to
carry out the calculations when choosing y2 instead.
I Iteration 2. BI = {1, 3}.
y1 y2 y3 y4 y5 y6 −z
0 5 0 −4 −7 −2 25
0 2 1 −1 −1 1 1 y3
1 1 0 1 0 −1 2 y1
The current objective value is 25. The corresponding dual solution is the unique point in the intersection of the first and third constraints of Model Dovetail, i.e., x1 + x2 = 9 and x1 = 7, which implies that x = [7 2]T.
I Iteration 3. BI = {1, 2}.
y1   y2    y3    y4    y5    y6    −z
 0    0  −2½  −1½  −4½  −4½   22½
 0    1    ½   −½   −½     ½     ½   y2
 1    0   −½   1½    ½  −1½    1½   y1
Since all current objective values are nonpositive, we have reached an optimal solution. The optimal objective value is 22½. The corresponding optimal dual solution is the unique point in the intersection of the first and second constraints of Model Dovetail, i.e., x1 + x2 = 9 and 3x1 + x2 = 18, which implies that x = [4½ 4½]T.
Figure 4.7 depicts the successive dual solutions found by the simplex algorithm. The algorithm started with the (dual infeasible) point [7 6]T, visits the (again infeasible) point [7 2]T, and finally finds the feasible dual point [4½ 4½]T. So, when viewed from the perspective of the dual model, the simplex algorithm starts with a dual infeasible basic solution (that satisfies the dual optimality criterion), and then performs iteration steps until the dual basic solution becomes feasible.

4.7. The dual simplex algorithm (p. 185)
Recall that the current objective values of y5 and y6 are the negatives of the slack variables
of the corresponding dual constraints x1 ≥ 0, and x2 ≥ 0, respectively (see Section 4.6.1).
Hence, the dual solutions could also immediately be read from the simplex tableaus by
looking at the objective coefficients corresponding to y5 and y6 . For example, in the initial
simplex tableau, we see that these coefficients are −7 and −6, respectively, meaning that
x1 = 7, x2 = 6.
x1 x2 x3 x4 x5 −z
−2 −4 −7 0 0 0
−2 −1 −6 1 0 −5 x4
−4 6 −5 0 1 −8 x5
This simplex tableau represents an infeasible basic solution, and this solution satisfies the (primal)
optimality criterion. The latter follows from the fact that the current objective coefficients are nonpositive.
The dual simplex algorithm now proceeds by finding a violated constraint with index β . At the current
solution, both constraints are violated. We choose the first constraint, so that β = 1 and therefore
(xBIβ =) x4 is the leaving variable. Next, we choose an entering variable xN Iα , and hence a pivot
entry. We choose a value of α such that:
(cTN I − cTBI B−1 N)α / (B−1 N)β,α = min{ (cTN I − cTBI B−1 N)j / (B−1 N)β,j | (B−1 N)β,j < 0, j = 1, . . . , n }. (4.12)
x1   x2   x3    x4   x5   −z
 0   −3   −1    −1    0    5
 1    ½    3   −½     0   2½   x1
 0    8    7   −2     1    2   x5

Note that in this simplex tableau, the right hand side values are nonnegative. The current objective coefficients are all nonpositive. This means that the current simplex tableau represents an optimal solution. That is, x∗1 = 2½, x∗2 = x∗3 = x∗4 = 0, x∗5 = 2 is an optimal solution.
The expression (4.12) is called the dual minimum-ratio test. This test guarantees that, after
pivoting on the (β, N Iα )-entry of the simplex tableau, all objective coefficients are non-
positive. To see this, let c̄k be the current objective coefficient of xN Ik (k ∈ {1, . . . , n}).
We need to show that, after the pivot step, the new objective coefficient satisfies:
c̄′k = c̄k − (c̄β / (B−1 N)β,α) (B−1 N)β,k ≤ 0. (4.13)

Note that c̄β ≤ 0, and that (B−1 N)β,α < 0 by the choice of α and β. Hence, c̄β / (B−1 N)β,α ≥ 0. So, if (B−1 N)β,k ≥ 0, then (4.13) holds. If (B−1 N)β,k < 0, then (4.13) is equivalent
to

c̄k / (B−1 N)β,k ≥ c̄β / (B−1 N)β,α,
which follows immediately from (4.12).
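A minimal sketch of the dual minimum-ratio test (4.12) in Python; the data are the objective coefficients and the violated row (β = 1) of the example above, and the function name dual_ratio_test is ours, not the book's:

```python
# Dual minimum-ratio test: among the nonbasic columns j with a negative
# entry in the pivot row, choose the one minimizing c_bar_j / pivot_row_j.
from fractions import Fraction

def dual_ratio_test(c_bar, pivot_row):
    candidates = [(Fraction(c_bar[j], pivot_row[j]), j)
                  for j in range(len(c_bar)) if pivot_row[j] < 0]
    if not candidates:
        return None          # no pivot entry: the model has no feasible solution
    return min(candidates)[1]

c_bar = [-2, -4, -7]         # current objective coefficients of x1, x2, x3
pivot_row = [-2, -1, -6]     # entries of row beta = 1 in B^{-1}N

alpha = dual_ratio_test(c_bar, pivot_row)
print(alpha)  # 0, i.e. x1 is the entering variable
```

The ratios are 1, 4, and 7/6, so the minimum 1 is attained by the first column, matching the choice of x1 as the entering variable in the example.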
and an initial basic solution for the model with slack variables (see Section 3.6) that satisfies the (primal) optimality condition (B−1 N)T cBI − cN I ≥ 0 (see (4.7)).

Output: Either an optimal solution of the model, or the message 'no feasible solution'.
I Step 1: Constructing the basis matrix. Let BI and N I be the sets of the indices of the basic and nonbasic variables corresponding to the current basic solution, respectively. Let B consist of the columns of [A Im] with indices in BI, and let N consist of the columns of [A Im] with indices in N I. The
Go to Step 4.
I Step 4: Exchanging. Set BI := (BI \ {BIβ }) ∪ {N Iα } and N I := (N I \
{N Iα }) ∪ {BIβ }. Return to Step 1.
Table 4.3 lists the correspondence between concepts used in the primal and the dual simplex
algorithms. Although we will not prove this here, it can be shown that the dual simplex
algorithm is just the primal simplex algorithm applied to the dual problem.
The dual simplex algorithm is particularly useful when we have already found an optimal
basic solution for an LO-model, and we want to add a new constraint to the model.
Table 4.3: Correspondence of concepts of the primal simplex algorithm and the dual simplex algorithm.
x1   x2    x3    x4   x5   x6     −z
 0    0  −1½   −½     0    0   −22½
 0    1   1½   −½     0    0     4½   x2
 1    0   −½    ½     0    0     4½   x1
 0    0    ½   −½     1    0     2½   x5
 0    0  −1½    ½     0    1     1½   x6
Matches are, of course, not made of only wood. The match head is composed of several chemicals, such
as potassium chlorate, sulfur, and calcium carbonate. Every 100,000 boxes of long matches requires 20
kilograms of this mix of chemicals. The short matches are slightly thinner than the long matches, so every
100,000 boxes of short matches requires only 10 kilograms of the chemicals. Company Dovetail has a
separate production line for mixing the chemicals. However, the company can mix at most 130 kilograms
of the required chemicals. Hence, we need to impose the additional constraint 20x1 + 10x2 ≤ 130, or simply 2x1 + x2 ≤ 13. Recall that the optimal solution of Model Dovetail is [4½ 4½]T. The
x1   x2    x3    x4   x5   x6   x7     −z
 0    0  −1½   −½     0    0    0   −22½
 0    1   1½   −½     0    0    0     4½   x2
 1    0   −½    ½     0    0    0     4½   x1
 0    0    ½   −½     1    0    0     2½   x5
 0    0  −1½    ½     0    1    0     1½   x6
 2    1    0    0     0    0    1     13    x7
This is not a simplex tableau yet, because the columns corresponding to the basic variables do not form
an identity matrix. Applying Gaussian elimination to make the first two entries of the last row equal
to zero, we find the following simplex tableau:
x1   x2    x3    x4   x5   x6   x7     −z
 0    0  −1½   −½     0    0    0   −22½
 0    1   1½   −½     0    0    0     4½   x2
 1    0   −½    ½     0    0    0     4½   x1
 0    0    ½   −½     1    0    0     2½   x5
 0    0  −1½    ½     0    1    0     1½   x6
 0    0   −½   −½     0    0    1    −½    x7
This simplex tableau has one violated constraint, and the current objective values are nonpositive. Hence,
we can perform an iteration of the dual simplex algorithm. The fifth row corresponds to the violated
constraint, so that we choose β = 5. Note that BIβ = 7 so that x7 is the leaving variable. The
dual minimum-ratio test gives:
min{ (−1½)/(−½), (−½)/(−½) } = min{3, 1} = 1.
The second entry attains the minimum, so α = 2. Therefore, (xN I2 =) x4 is the entering variable.
After applying Gaussian elimination, we find the following simplex tableau:
x1 x2 x3 x4 x5 x6 x7 −z
0 0 −1 0 0 0 −1 −22
0 1 2 0 0 0 −1 5 x2
1 0 −1 0 0 0 1 4 x1
0 0 1 0 1 0 −1 3 x5
0 0 −2 0 0 1 1 1 x6
0 0 1 1 0 0 −2 1 x4
This tableau represents an optimal feasible basic solution. The new optimal objective value is z∗ = 22, and x∗1 = 4, x∗2 = 5, x∗3 = 0, x∗4 = 1, x∗5 = 3, x∗6 = 1, x∗7 = 0 is a new optimal solution.
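Reading the basic values off the final tableau (right hand side 4 in the x1-row and 5 in the x2-row), the new solution can be checked against the constraints of Model Dovetail; the primal constraint data used below are those recovered from the dual constraints in Example 4.6.1 (a Python sketch):

```python
# Feasibility and objective check for the reoptimized solution.
# Basic values read from the final tableau: x1 = 4 (x1-row), x2 = 5 (x2-row).
x1, x2 = 4, 5

assert x1 + x2 <= 9          # constraint 1
assert 3 * x1 + x2 <= 18     # constraint 2
assert x1 <= 7               # constraint 3
assert x2 <= 6               # constraint 4
assert 2 * x1 + x2 == 13     # the added chemicals constraint is binding
print(3 * x1 + 2 * x2)       # 22, the new optimal objective value
```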
4.8 Exercises
Exercise 4.8.1. Determine the dual of the following LO-models. Also draw a diagram
like Figure 4.1 that illustrates the relationship between the primal variables and the comple-
mentary dual variables of the several models.
(a) max −9x1 − 2x2 + 12x3 (b) max 4x1 − 3x2 + 2x3 − 6x4
s.t. 2x1 − 2x2 + 2x3 ≤ 1 s.t. 2x1 + 4x2 − x3 − 2x4 ≤ 1
−3x1 + x2 + x3 ≤ 1 x1 − x2 + 2x3 − x4 ≤ −1
x1 , x2 , x3 ≥ 0. x1 , x2 , x3 , x4 ≥ 0.
Exercise 4.8.4. Determine, for each value of µ (µ ∈ R), the optimal solution of the
following model:
max µx1 − x2
s.t. x1 + 2x2 ≤ 4
6x1 + 2x2 ≤ 9
x1 , x2 ≥ 0.
Exercise 4.8.5. Solve Model Salmonnose as formulated in Section 4.1 (Hint: use either
the big-M or the two-phase procedure; see Section 3.6.)
Exercise 4.8.7. Determine the dual of model (1.20) in Section 1.6.3. (Hint: first write
out the model for some small values of N and M , dualize the resulting models, and then
generalize.)
Exercise 4.8.8. Solve the dual of the LO-model formulated in Example 3.6.1. Determine
the basis matrices corresponding to the optimal primal and dual solutions, and show that
they are complementary dual (in the sense of Section 2.2.3).
Exercise 4.8.9. Show that the dual of model (DGM) is the original model (GM).
Exercise 4.8.10.
(a) Prove the following generalization of Theorem 4.2.3. If x and y are feasible solutions
of the LO-models (GM) and (DGM), respectively, and cT x = bT y, then x and y are
optimal.
(b) Prove the following generalization of Theorem 4.2.4. If the LO-model (GM) has an
optimal solution, then the dual model (DGM) has an optimal solution as well, and
vice versa; moreover, the optimal objective values of the primal and the dual model are
equal.
(c) Prove the optimality criterion for nonstandard LO-models as formulated at the end of
Section 4.2.3.
Exercise 4.8.12.

max 2x1 − x2
s.t. x1 − x2 ≤ 1
x1 − x2 ≥ 2
x1 , x2 ≥ 0.
Show that both this model and the dual of this model are infeasible.
Exercise 4.8.13.

max x1 + x2 (P)
s.t. x1 + x2 ≤ 6
x1 ≤3
x2 ≤ 3
x1 , x2 ≥ 0.
(a) Use the graphical method to determine the unique optimal vertex of the model, and
all corresponding optimal feasible basic solutions.
(b) Determine the dual of model (P), and determine the optimal feasible basic solutions of
this dual model.
(c) Verify that the complementary slackness relations hold for each pair of primal and opti-
mal feasible basic solutions found in (a) and (b). Verify also that the strong complemen-
tary slackness relations do not hold.
(d) Construct a pair of primal and dual optimal solutions that satisfy the strong complemen-
tary slackness relations.
Exercise 4.8.14. Use Farkas’ lemma (see Section 4.4) to determine a certificate of un-
solvability for the following systems of inequalities:
Exercise 4.8.15. Prove the following variants of Farkas’ lemma (see Section 4.4):
(a) For any (m, n)-matrix A and any m-vector b, exactly one of the following systems has a solution: (I) Ax = b, x ≥ 0, and (II) AT y ≥ 0, bT y < 0.
(b) For any (m, n)-matrix A, exactly one of the following systems has a solution: (I)
Ax < 0, and (II) AT y = 0, y ≥ 0, y 6= 0. (Hint: consider a primal model with the
constraints Ax ≤ −ε1.)
Exercise 4.8.16.

max 2x1 − x2
s.t. x1 + x2 ≤ 4
x1 − x2 ≤ 2
−x1 + x2 ≤ 2
x1 , x2 ≥ 0.
(a) Use the (primal) simplex algorithm to determine an optimal simplex tableau for the
model.
(b) Use the reoptimization procedure described in Section 4.7.2 to determine an optimal
simplex tableau for the model with the additional constraint x1 ≤ 1.
(c) Check your answers in (a) and (b) by using the graphical solution method.
Chapter 5

Sensitivity analysis
Overview
In most practical situations, the exact values of the model parameters are not known with ab-
solute certainty. For example, model parameters are often obtained by statistical estimation
procedures. The analysis of the effects of parameter changes on a given optimal solution
of the model is called sensitivity analysis or post-optimality analysis. Changing the value of
a parameter is called a perturbation of that parameter. In this chapter we will look at perturbations of several parameters, including objective function coefficients, right hand side coefficients of constraints (including the zeroes of the nonnegativity constraints), and the
entries of the technology matrix. In economic terms, while the primal model gives solu-
tions in terms of, e.g., the amount of profit obtainable from the production activity, the
dual model gives information on the economic value of the limited resources and capacities.
So-called shadow prices are introduced in the case of nondegenerate optimal solutions; in
the case of degeneracy, right and left shadow prices are defined.
entries of the constraints (i.e., the entries of the vector b and the 0 in the nonnegativity
constraints xi ≥ 0), the objective coefficients (i.e., the entries of the vector c), and the
entries of the technology matrix (i.e., the entries of A). Let δ ∈ R. By a perturbation by a
The first condition ensures that the feasible basic solution corresponding to the basis matrix
B is feasible, and the second condition ensures that it is optimal (or, equivalently, that the
corresponding dual solution is dual feasible; see Section 4.5). Recall that these inequalities
are obtained by writing the model from the perspective of the optimal basic solution (see
Section 3.2), and asserting that the right hand side values are nonnegative (the inequalities
(5.1)), and the objective coefficients are nonpositive (the inequalities (5.2)).
When the value of some parameter is varied, certain terms in these two expressions change.
For example, when perturbing the objective coefficient of a variable that is basic according
to the optimal feasible basic solution, the vector cBI will change. If, however, the right hand
side of a technology constraint is varied, then b will change. For each such situation, we
will analyze how exactly the above optimality and feasibility expressions are affected. This
will allow us to determine for which values of the perturbation factor δ the feasible basic
solution remains optimal, that is, we will be able to determine the tolerance interval. For
values of δ for which the feasible basic solution is no longer optimal, a different feasible basic
solution may become optimal, or the problem might become infeasible or unbounded.
5.2. Perturbing objective coefficients (p. 197)
Figure 5.1: Level line 3x1 + 2x2 = 22½. Figure 5.2: Level line 4x1 + 2x2 = 27.
If δ = 0, then the objective function is the same as in the original model (see Figure 5.1). If δ = 1, then the objective function is 'rotated' relative to the original one (see Figure 5.2). In both cases, vertex v2 (= [4½ 4½]T) is the optimal solution. However, the optimal
Figure 5.3: Canting level lines (for δ = −2, −1, 0, 3, and 9). Figure 5.4: Perturbation function z∗(δ) for an objective coefficient, plotted for −2 ≤ δ ≤ 4.
that point, both v3 and v2 are optimal (and, in fact, all points on the line segment v2 v3 are
optimal). Then, as we increase the value of δ slightly beyond −1, v2 becomes the unique
optimal vertex. Vertex v2 remains optimal up to and including δ = 3, at which point v1
also becomes optimal (and so do all points on the line segment v1 v2 ). As we increase the
value of δ past 3, v1 becomes the unique optimal solution.
In Figure 5.4, the perturbation function z∗(δ) = (3 + δ)x∗1 + 2x∗2 is depicted. (Notice that, in the expression for z∗(δ), the values of x∗1 and x∗2 are the optimal values and, hence,
they depend on δ .) We can see that the slope of the graph changes at δ = −1 and at δ = 3.
These are the points at which the optimal vertex changes from v3 to v2 and from v2 to v1 ,
respectively.
Table 5.1 lists the optimal solutions for δ = −2, −1, 0, 1, 2, 3, 4 for Model Dovetail. The
second column lists the vertices in which the optimal objective value is attained. The third
through the sixth columns contain the objective coefficients and the corresponding optimal
values of the decision variables. The seventh and the eighth columns list the changes ∆x∗1 and ∆x∗2 in x∗1 and x∗2, respectively. The last column contains the optimal objective value cT x∗ = c1 x∗1 + c2 x∗2.
We will now study Table 5.1, Figure 5.3, and Figure 5.4 more carefully. To that end we start
at δ = 0, the original situation. At this value of δ , v2 is the optimal vertex. If δ is relatively
small, then v2 remains the optimal vertex, and the objective value changes by 4½δ. From Table 5.1 we can conclude that if δ is between −1 and 3, then v2 is an optimal vertex and the change of the objective value is 4½δ. If, for instance, the profit (c1) per box of long matches is not 3 but 2½ (i.e., δ = −½), then v2 is still an optimal vertex, and the objective value changes by (−½ × 4½ =) −2¼.
δ    optimal   c1   x∗1   c2   x∗2   ∆x∗1   ∆x∗2   cT x∗
     vertex
−2   v3         1    3     2    6    −1½     1½     15
−1   v3         2    3     2    6    −1½     1½     18
     v2         2    4½    2    4½    0       0     18
 0   v2         3    4½    2    4½    0       0     22½
 1   v2         4    4½    2    4½    0       0     27
 2   v2         5    4½    2    4½    0       0     31½
 3   v2         6    4½    2    4½    0       0     36
     v1         6    6     2    0     1½    −4½     36
 4   v1         7    6     2    0     1½    −4½     42

Table 5.1: Optimal vertices and objective values for δ = −2, −1, 0, 1, 2, 3, 4.
Now consider what happens when the value of δ is decreased below −1. If δ decreases
from −1 to −2, then the optimal vertex ‘jumps’ from v2 to v3 . Similar behavior can be
observed when increasing the value of δ to a value greater than 3: if δ increases from 3
to 4, the optimal vertex ‘jumps’ from v2 to v1 . It appears that for −1 ≤ δ ≤ 3, the
vertex v2 is optimal. Thus −1 ≤ δ ≤ 3 is the tolerance interval for (the feasible basic
solution corresponding to) the optimal vertex v2 . For −2 ≤ δ ≤ −1, v3 is optimal, and
for 3 ≤ δ ≤ 4, v1 is optimal. For δ = −1 and δ = 3 we have multiple optimal solutions.
It is left to the reader to complete the graph of Figure 5.4 for values of δ with δ < −2
and values with δ > 4. Note that the graph of this function consists of a number of line
segments connected by kink points. In general, a function is called piecewise linear if its graph
consists of a number of connected line segments; the points where the slope of the function
changes are called kink points. In Theorem 5.4.1, it is shown that perturbation functions of
objective coefficients are piecewise linear.
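The piecewise linearity can be made concrete: since an optimal solution is attained at a vertex, z∗(δ) is the maximum of finitely many linear functions of δ. A Python sketch (the vertex list combines v1, v2, v3 from Table 5.1 with the origin and v4 = (0, 6), the latter two being our reading of the figures):

```python
# z*(delta) as the maximum of (3 + delta)*x1 + 2*x2 over the vertices of the
# feasible region of Model Dovetail; exact arithmetic via Fraction.
from fractions import Fraction

vertices = [(0, 0),                              # origin
            (6, 0),                              # v1
            (Fraction(9, 2), Fraction(9, 2)),    # v2 = (4 1/2, 4 1/2)
            (3, 6),                              # v3
            (0, 6)]                              # v4 (our assumption)

def z_star(delta):
    # the upper envelope of finitely many linear functions of delta
    return max((3 + delta) * x1 + 2 * x2 for x1, x2 in vertices)

print(z_star(-2), z_star(0), z_star(4))  # 15 45/2 42
```

Evaluating z∗ on a grid of δ values reproduces the kinks at δ = −1 (where v3 and v2 tie) and δ = 3 (where v2 and v1 tie).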
We will now show how the tolerance intervals may be calculated without graphing the
perturbation function or making a table such as Table 5.1. Model Dovetail, with slack
variables and the perturbed objective coefficient c1, reads as follows (see Section 3.2):

max (3 + δ)x1 + 2x2
s.t.  x1 + x2 + x3 = 9
      3x1 + x2 + x4 = 18
      x1 + x5 = 7
      x2 + x6 = 6
      x1, . . . , x6 ≥ 0.
At vertex v2 , the variables x1 , x2 , x5 , and x6 are basic. The constraints can be rewritten as:
 1½x3 − ½x4 + x2 = 4½
 −½x3 + ½x4 + x1 = 4½
  ½x3 − ½x4 + x5 = 2½
−1½x3 + ½x4 + x6 = 1½.
The objective function with perturbation factor δ for the coefficient of x1 is (3 + δ)x1 + 2x2.
Using x1 = 4½ + ½x3 − ½x4 and x2 = 4½ − 1½x3 + ½x4, the objective function can be
expressed in terms of the nonbasic variables x3 and x4 as:

(3 + δ)x1 + 2x2 = (3 + δ)(4½ + ½x3 − ½x4) + 2(4½ − 1½x3 + ½x4)
                = 22½ + 4½δ + (½δ − 1½)x3 + (−½ − ½δ)x4.

Both coefficients of the nonbasic variables are nonpositive precisely when −1 ≤ δ ≤ 3; for
those values the maximum is attained at x3 = x4 = 0, so that:

max 22½ + 4½δ + (½δ − 1½)x3 + (−½ − ½δ)x4 = 22½ + 4½δ.
In other words, the optimal objective value is attained at v2 if the profit per box of long
matches is between 2 and 6 (with 2 and 6 included).
In general, we have the following situation. Let B be a basis matrix in [A I_m] corresponding
to an optimal feasible basic solution. We perturb the entry of c_BI corresponding to the
basic variable x_k (k ∈ BI) by δ. Let β be such that k = BI_β. This means that c_BI is
changed into c_BI + δe_β, where e_β ∈ Rᵐ is the β'th unit vector. The feasibility condition
(5.1) and the optimality condition (5.2) become:

B⁻¹b ≥ 0, (feasibility)
and c_NIᵀ − (c_BI + δe_β)ᵀB⁻¹N ≤ 0. (optimality)
Clearly, the first condition is not affected by the changed objective coefficient. This should
make sense, because changing the objective function does not change the feasible region.
Hence, since the vertex corresponding to B was feasible before the perturbation, it is still
feasible after the perturbation. So, the feasible basic solution corresponding to B remains
optimal as long as the second condition is satisfied. This second condition consists of the n
inequalities:
(c_NIᵀ − (c_BI + δe_β)ᵀB⁻¹N)_j ≤ 0 for j = 1, …, n. (5.3)
As long as these inequalities are satisfied, the current feasible basic solution remains
optimal and, hence, the resulting change ∆z in the value of z satisfies ∆z = δx_k∗ (recall that x_NI = 0).
Note that this expression is only valid for values of δ that lie within the tolerance interval.
It is left to the reader to derive the tolerance interval −1 ≤ δ ≤ 3 for the first objective
coefficient in Model Dovetail directly from the inequalities in (5.3).
We conclude this section with a remark about the fact that tolerance intervals depend on the
choice of the optimal feasible basic solution: different optimal feasible basic solutions may
lead to different tolerance intervals. This fact will be illustrated by means of the following
example.
Example 5.2.1. Consider the LO-model:
z = (8 + 4δ) + (4/3 + ⅔δ)x3 + (4/3 − ⅓δ)x5
x1 = 4 + ⅔x3 − ⅓x5
x2 = 2 + ⅓x3 − ⅔x5
x4 = 0 + ⅓x3 + ⅓x5.
▶ The basic variables are x1, x2, x5. The corresponding optimal simplex tableau yields:

z = (8 + 4δ) + δx3 + (4 − δ)x4
x1 = 4 + x3 − x4
x2 = 2 + x3 − 2x4
x5 = 0 − x3 + 3x4.

This solution is optimal as long as δ ≥ 0 and 4 − δ ≥ 0, i.e., 0 ≤ δ ≤ 4, while
z∗(δ) = 8 + 4δ for 0 ≤ δ ≤ 4.
The conclusion is that the three optimal feasible basic solutions give rise to different tolerance intervals
with respect to the objective coefficient of x1 . It should be clear, however, that the perturbation function
is the same in all three cases.
In Exercise 5.8.25, the reader is asked to draw the perturbation function for the above exam-
ple. Exercise 5.8.26 deals with a situation where a tolerance interval consists of one point.
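The interval computations in this section all come down to intersecting conditions of the form a + bδ ≥ 0. The small helper below (a sketch; the function name `tolerance_interval` is ours, not the book's) makes this mechanical, using exact rational arithmetic:

```python
from fractions import Fraction as F

def tolerance_interval(conds):
    """Intersect conditions a + b*delta >= 0, given as (a, b) pairs.
    Returns (lo, hi); None encodes an unbounded end."""
    lo, hi = None, None
    for a, b in conds:
        if b > 0:                       # condition reads delta >= -a/b
            bound = -F(a) / b
            lo = bound if lo is None else max(lo, bound)
        elif b < 0:                     # condition reads delta <= -a/b
            bound = -F(a) / b
            hi = bound if hi is None else min(hi, bound)
        # b == 0: the condition does not involve delta
    return lo, hi

# Example 5.2.1, basis with x1, x2, x5: optimal iff delta >= 0 and 4 - delta >= 0.
assert tolerance_interval([(0, 1), (4, -1)]) == (F(0), F(4))

# Model Dovetail (Section 5.2): the x3 and x4 coefficients stay nonpositive
# iff 1 1/2 - 1/2 delta >= 0 and 1/2 + 1/2 delta >= 0, giving [-1, 3].
assert tolerance_interval([(F(3, 2), F(-1, 2)), (F(1, 2), F(1, 2))]) == (F(-1), F(3))
```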
max 6x1 + x2
s.t. x1 + x2 ≤ 9
3x1 + x2 ≤ 18
x1 ≤ 7
x2 ≤ 6
x1 , x2 ≥ 0.
(The original objective function cannot be used, since both decision variables are basic at
the optimal solution.) The optimal vertex is v1 with x1∗ = 6 and x2∗ = 0. At vertex v1,
the basic variables are x1 , x3 , x5 , and x6 . The corresponding optimal objective value is 36.
We will consider a perturbation of the objective coefficient of x2 , namely:
Rewriting the model from the perspective of the optimal feasible basic solution (see Section
3.2), we find the following equivalent model:
B⁻¹b ≥ 0, (feasibility)
and (c_NIᵀ + δe_αᵀ) − c_BIᵀB⁻¹N ≤ 0. (optimality)
Again, the first condition is unaffected, as should be expected (see the discussion in Section
5.2.1). The second condition is equivalent to:
δe_αᵀ ≤ −(c_NIᵀ − c_BIᵀB⁻¹N).

This condition consists of a set of n inequalities. Only the inequality with index α actually
involves δ. Hence, this condition is equivalent to:

δ ≤ −(c_NIᵀ − c_BIᵀB⁻¹N)_α. (5.4)

So, the objective coefficient of x_NIα (= x_k) may be increased by at most the negative of the
current objective coefficient of that variable. Recall that, since x_k is a decision variable (i.e.,
k ∈ {1, …, n}), the current objective coefficient is the negative of the value of the comple-
mentary dual variable y_{m+k} (see also Figure 4.1 and Figure 4.2). Hence, the feasible basic
solution in fact remains optimal as long as:

δ ≤ y∗_{m+k}.
▶ Suppose that x_k is a basic variable, i.e., k = BI_β for some β ∈ {1, …, m}. Rewrite
(5.3) in the following way:

(c_NIᵀ − c_BIᵀB⁻¹N)_j ≤ (δe_βᵀB⁻¹N)_j for j = 1, …, n.

Note that (δe_βᵀB⁻¹N)_j = δ(B⁻¹N)_{β,j}. The tolerance interval of c_k, with lower
bound δmin and upper bound δmax, therefore satisfies:

δmin = max{ (c_NIᵀ − c_BIᵀB⁻¹N)_j / (B⁻¹N)_{β,j} | (B⁻¹N)_{β,j} > 0, j = 1, …, n }, and
δmax = min{ (c_NIᵀ − c_BIᵀB⁻¹N)_j / (B⁻¹N)_{β,j} | (B⁻¹N)_{β,j} < 0, j = 1, …, n }.
Recall that, by definition, max(∅) = −∞ and min(∅) = +∞. The numbers in these
expressions can be read directly from the optimal simplex tableau, because the term
c_NIᵀ − c_BIᵀB⁻¹N is the row vector of objective coefficients of the nonbasic variables in
the optimal simplex tableau corresponding to B, and the term (B⁻¹N)_{β,j} is the entry
in row β and the column corresponding to N I_j.
▶ Suppose that x_k is a nonbasic variable, i.e., k = N I_α for some α ∈ {1, …, n}. It
follows from (5.4) that the tolerance interval is (−∞, y∗_{m+k}], with y∗_{m+k} the optimal
value of the complementary dual variable of x_k. Following the discussion in Section 4.6,
the optimal value of the dual slack variable y_{m+k} can be read directly from the optimal
simplex tableau.
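The two bounds for a basic variable's objective coefficient can be computed directly from tableau data. The following sketch does so (the helper name is ours; the numbers in the assertion are the reduced costs and row β = 2 of the tableau in Example 5.2.2 below):

```python
from fractions import Fraction as F

def obj_coeff_interval(reduced_costs, row_beta):
    """Tolerance interval [dmin, dmax] for the objective coefficient of the
    basic variable x_k with k = BI_beta. reduced_costs holds the (nonpositive)
    objective-row entries of the nonbasic columns; row_beta holds the entries
    (B^-1 N)_{beta,j} of row beta in those columns. None = unbounded end."""
    lower = [r / t for r, t in zip(reduced_costs, row_beta) if t > 0]
    upper = [r / t for r, t in zip(reduced_costs, row_beta) if t < 0]
    return (max(lower) if lower else None, min(upper) if upper else None)

# Data of Example 5.2.2 (k = 1, beta = 2): the reduced costs of x2, x4 are
# -1 and -2, and row 2 of B^-1 N is (1/3, 1/3); the interval is [-3, infinity).
assert obj_coeff_interval([F(-1), F(-2)], [F(1, 3), F(1, 3)]) == (F(-3), None)
```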
Example 5.2.2. Consider the example of Section 5.2.2. The optimal simplex tableau, correspond-
ing to BI = {1, 3, 5, 6} and N I = {2, 4}, is:

    x1   x2   x3   x4   x5   x6    −z
     0   −1    0   −2    0    0   −36
     0    ⅔    1   −⅓    0    0     3   x3
     1    ⅓    0    ⅓    0    0     6   x1
     0   −⅓    0   −⅓    1    0     1   x5
     0    1    0    0    0    1     6   x6
We first determine the tolerance interval for the objective coefficient c1 of the basic variable x1. Note
that k = 1 and β = 2. The tolerance interval [δmin, δmax] satisfies:

δmin = max{−1/⅓, −2/⅓} = max{−3, −6} = −3,   δmax = min(∅) = ∞,

since row β = 2 of B⁻¹N contains no negative entries. Thus, if we perturb the objective coefficient
c1 by a factor δ, then the solution in the optimal simplex tableau above remains optimal as long as
δ ≥ −3. Next, we determine the tolerance interval for the objective coefficient c2 of the nonbasic
variable x2. The tolerance interval is (−∞, y6∗]. Following the discussion in Section 4.6, the optimal
dual solution can be read from the optimal simplex tableau, and it reads: y1∗ = 0, y2∗ = 2,
y3∗ = y4∗ = y5∗ = 0, and y6∗ = 1. So, the tolerance interval for the objective coefficient c2
is (−∞, 1].

5.3. Perturbing right hand side values (nondegenerate case)
▶ If increasing the right hand side value results in enlarging the feasible region, then the
optimal objective value increases or remains the same when the right hand side value is
increased. Hence, the shadow price should be nonnegative.
▶ If increasing the right hand side value results in shrinking the feasible region, then the
optimal objective value decreases or remains the same when the right hand side value is
increased. Hence, the shadow price should be nonpositive.
For example, consider a perturbation of a constraint of the form aᵀx ≤ b in a maximizing
LO-model. Increasing the right hand side value of this constraint enlarges the feasible
region, and hence increases (more precisely: does not decrease) the optimal objective value.
Therefore, the shadow price of a ‘≤’ constraint in a maximizing LO-model is nonnegative.
In contrast, a constraint of the form aᵀx ≥ b (such as a nonnegativity constraint) in a
maximizing LO-model has a nonpositive shadow price. Similar conclusions hold for
minimizing LO-models. For instance, increasing the right hand side value of the constraint
aᵀx ≤ b enlarges the feasible region, and hence decreases (more precisely: does not
increase) the optimal objective value in a minimizing LO-model. Therefore, the shadow
price of a ‘≤’ constraint in a minimizing LO-model is nonpositive.
The main results of this section are the following. We will consider a standard maximizing
primal LO-model max{ cᵀx | Ax ≤ b, x ≥ 0 }, and assume that the model has a
nondegenerate optimal solution. In Theorem 5.3.3, it is shown that the shadow price of a
technology constraint is the optimal value of the corresponding dual decision variable. In
Theorem 5.3.4, it is shown that the shadow price of a nonnegativity constraint is the negative
of the optimal value of the corresponding dual slack variable. So we have the following
‘complementary dual’ assertions (i = 1, …, n and j = 1, …, m):
▶ the shadow price of technology constraint j is equal to the optimal value of the dual
decision variable yj;
▶ the shadow price of the nonnegativity constraint xi ≥ 0 is equal to the negative of the
optimal value of the dual slack variable ym+i.
We will also see that the shadow price of any nonbinding constraint is zero.
Proof of Theorem 5.3.1. Suppose that the technology constraint a_jᵀx ≤ b_j is nonbinding at
some optimal solution x∗, meaning that a_jᵀx∗ < b_j. This means that there exists a small ε > 0
such that a_jᵀx∗ < b_j − ε. Consider the perturbed model:

z∗(δ) = max{ cᵀx | Ax ≤ b + δe_j, x ≥ 0 }, (5.6)

where e_j is the j'th unit vector in Rᵐ. We will show that, for every δ such that −ε ≤ δ ≤ ε,
model (5.6) has the same objective value as the original model, i.e., we will show that
z∗(δ) = z∗(0).
We first show that z∗(δ) ≥ z∗(0). To see this, note that we have that x∗ ≥ 0 and

Ax∗ ≤ b − εe_j ≤ b + δe_j.

Therefore, x∗ is a feasible solution of (5.6). Hence, the optimal objective value of (5.6) must be
at least as large as the objective value corresponding to x∗. That is, z∗(δ) ≥ z∗(0).
Next, we show that z∗(δ) ≤ z∗(0). Let x∗∗ be an optimal solution of (5.6) and suppose for a
contradiction that cᵀx∗∗ > cᵀx∗. Let λ = min{1, ε/(δ+ε)}. Note that λ ≤ ε/(δ+ε) and
0 < λ ≤ 1. Define x̂ = λx∗∗ + (1 − λ)x∗. Then, we have that:

cᵀx̂ = cᵀ(λx∗∗) + cᵀ((1 − λ)x∗) = λ(cᵀx∗∗) + (1 − λ)(cᵀx∗) > cᵀx∗,
x̂ = λx∗∗ + (1 − λ)x∗ ≥ λ0 + (1 − λ)0 = 0, and
Ax̂ = λAx∗∗ + (1 − λ)Ax∗ ≤ λ(b + δe_j) + (1 − λ)(b − εe_j)
    = b + λ(δ + ε)e_j − εe_j ≤ b + (ε/(δ+ε))(δ + ε)e_j − εe_j = b.
Therefore, x̂ is a feasible solution of the original LO-model. But x̂ has a higher objective value
than x∗ , contradicting the optimality of x∗ . So, z ∗ (δ) ≤ z ∗ (0). It follows that z ∗ (δ) = z ∗ (0),
and, therefore, the shadow price of the constraint is zero.
A fact similar to Theorem 5.3.1 holds for nonnegativity and nonpositivity constraints:
Figure 5.5: Right hand side perturbations for δ = −4, −3, −1, 0, 1, 2. The region marked with
solid lines is the perturbed feasible region; the region marked with dotted lines is the unperturbed
feasible region.
produce in one year. This number may vary in practice, for instance because the machine is
out of order during a certain period, or the capacity is increased because employees work in
overtime. Perturbation of the right hand side by a factor δ transforms this constraint into:
x1 + x2 ≤ 9 + δ.
Recall that the objective in Model Dovetail is max 3x1 + 2x2 . In Figure 5.5, the feasible
regions and the corresponding optimal vertices are drawn for δ = −4, −3, −1, 0, 1, 2.
The figure shows that, as δ varies, the feasible region changes. In fact, the feasible region
becomes larger (to be precise: does not become smaller) as the value of δ is increased. Since
the feasible region changes, the optimal vertex also varies as the value of δ is varied.
Table 5.2 lists several optimal solutions for different values of δ . The perturbation function
z ∗ (δ) of x1 + x2 ≤ 9 is depicted in Figure 5.6. As in Figure 5.4, we again have a piecewise
linear function. In Theorem 5.4.2 it is shown that perturbation functions for right hand side
parameters are in fact piecewise linear functions.
We are interested in the rate of increase of z = 3x1 + 2x2 in the neighborhood of the
initial situation δ = 0. For δ = 0, this rate of increase is 1½. So, when changing the right
hand side by a factor δ, the objective value changes by 1½δ, i.e., ∆z∗ = 1½δ. In Section
4.2, it was calculated that the dual variable y1 (which corresponds to x1 + x2 ≤ 9) has the
optimal value 1½, which is exactly equal to the slope of the function in Figure 5.6 at δ = 0.
This fact holds in general and is formulated in Theorem 5.3.3.
δ     9+δ   x1∗   x2∗   ∆x1∗   ∆x2∗   cᵀx∗ = z∗(δ)
−4    5     5     0     ½      −4½    15
−3    6     6     0     1½     −4½    18
−2    7     5½    1½    1      −3     19½
−1    8     5     3     ½      −1½    21
 0    9     4½    4½    0      0      22½
 1   10     4     6     −½     1½     24
 2   11     4     6     −½     1½     24

Table 5.2: Optimal solutions for several values of δ.

Figure 5.6: The perturbation function z∗(δ) of the constraint x1 + x2 ≤ 9 (the model is infeasible
for δ < −9).
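Table 5.2 can be reproduced with a small exact-arithmetic solver for two-variable LO-models: every vertex of a two-dimensional feasible region is the intersection of two constraint boundaries, so enumerating all boundary pairs suffices. (This brute-force sketch is ours, not the book's method.)

```python
from fractions import Fraction as F
from itertools import combinations

def z_star(delta):
    """max 3x1 + 2x2 over Model Dovetail with the perturbed constraint
    x1 + x2 <= 9 + delta, solved exactly by enumerating intersections of
    pairs of constraint boundaries and keeping the feasible ones."""
    cons = [((F(1), F(1)), F(9) + delta), ((F(3), F(1)), F(18)),
            ((F(1), F(0)), F(7)), ((F(0), F(1)), F(6)),
            ((F(-1), F(0)), F(0)), ((F(0), F(-1)), F(0))]
    best = None
    for ((a1, b1), c1), ((a2, b2), c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue                    # parallel boundaries
        x1 = (c1 * b2 - c2 * b1) / det  # Cramer's rule
        x2 = (a1 * c2 - a2 * c1) / det
        if all(a * x1 + b * x2 <= c for (a, b), c in cons):
            val = 3 * x1 + 2 * x2
            best = val if best is None else max(best, val)
    return best

# Slope 1 1/2 around delta = 0: the shadow price of x1 + x2 <= 9.
assert z_star(F(1)) - z_star(F(0)) == F(3, 2)
assert z_star(F(0)) - z_star(F(-1)) == F(3, 2)
# Values from Table 5.2:
assert z_star(F(-4)) == 15 and z_star(F(2)) == 24
```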
Theorem 5.3.3 expresses the fact that the shadow price of a technology constraint equals the
optimal value of the corresponding complementary dual decision variable.
Proof of Theorem 5.3.3. Consider the feasible standard LO-model with slack variables:

max{ [cᵀ 0ᵀ] [x; x_s] | [A I_m] [x; x_s] = b, x, x_s ≥ 0 },

with A ∈ R^{m×n}, I_m the (m, m) identity matrix, x ∈ Rⁿ, x_s ∈ Rᵐ, and b ∈ Rᵐ. Let B be
a basis matrix, corresponding to the optimal solution x∗_BI = B⁻¹b ≥ 0, x∗_NI = 0. It follows
from Theorem 4.2.4 that y∗ = (B⁻¹)ᵀc_BI, with c_BI the subvector of [c; 0] corresponding to
B, is an optimal solution of the dual model min{ bᵀy | Aᵀy ≥ c, y ≥ 0 }.
Now consider changing the right hand side value of the j'th technology constraint (j ∈
{1, …, m}). This means replacing b by b + δe_j. Following (5.1) and (5.2), the feasible basic
solution corresponding to the basis matrix B remains optimal as long as:

B⁻¹(b + δe_j) ≥ 0.

Note that the optimality condition (5.2) is unaffected by the change of b, and therefore does
not need to be taken into account. Because of the nondegeneracy, we have that all entries of
B⁻¹b are strictly positive, and hence B remains an optimal basis matrix as long as |δ| is small
enough. (In the case of degeneracy, small changes in the vector b may give rise to a different
optimal basis matrix; see Section 5.6.2.)
The perturbation δe_j of b will influence the optimal solution. Because B remains an optimal
basis matrix, and therefore the same set of technology constraints and nonnegativity constraints
(corresponding to N I) remain binding at the optimal vertex, we have that x∗_NI remains equal to
0, but x∗_BI changes from B⁻¹b to B⁻¹(b + δe_j). The corresponding increase of the objective
function satisfies:

z∗(δ) − z∗(0) = c_BIᵀB⁻¹(b + δe_j) − c_BIᵀB⁻¹b = c_BIᵀB⁻¹δe_j
             = ((B⁻¹)ᵀc_BI)ᵀδe_j = y∗ᵀδe_j = δyj∗,

where y∗ = [y1∗ … ym∗]ᵀ is an optimal dual solution. The equation z∗(δ) − z∗(0) = δyj∗
shows that yj∗ determines the sensitivity of the optimal objective value with respect to pertur-
bations of the vector b. Hence, we have that the shadow price of constraint j is given by:

(z∗(δ) − z∗(0))/δ = δyj∗/δ = yj∗.
So, yj∗ is the shadow price of constraint j . This proves the theorem.
Recall from Section 4.6.1 that the optimal values of the dual variables may be read directly
from an optimal simplex tableau of a standard LO-model. Since these values correspond to
the shadow prices of the technology constraints, this means that the shadow prices can also
be read directly from an optimal simplex tableau.
Example 5.3.1. Consider again Model Dovetail (see Section 3.3). The objective vector in the
optimal simplex tableau is [0 0 −1½ −½ 0 0]ᵀ, and so the shadow prices of the four technology
The shadow price of the constraint x1 + x2 ≤ 9 of Model Dovetail is 1½. This fact holds
only in a restricted interval of δ (see Figure 5.6). The tolerance interval of δ, i.e., the (interval
of the) values of δ for which y1∗ = 1½, is determined as follows. In the model with slack
variables, the constraint x1 + x2 + x3 = 9 turns into the constraint x1 + x2 + x3 = 9 + δ.
Rewriting Model Dovetail from the perspective of the optimal feasible basic solution (see
Section 3.2) yields:
z∗(δ) = max 22½ + 1½δ − 1½x3 − ½x4
s.t.   1½x3 − ½x4 + x2 = 4½ + 1½δ
       −½x3 + ½x4 + x1 = 4½ − ½δ
        ½x3 − ½x4 + x5 = 2½ + ½δ
      −1½x3 + ½x4 + x6 = 1½ − 1½δ
       x1, …, x6 ≥ 0.
So, whenever there is one additional unit of machine capacity available for the annual produc-
tion, the profit increases by 1½ (×$1,000). At the optimal solution it holds that x3 = x4 = 0
and x1, x2, x5, x6 ≥ 0. Hence,

4½ + 1½δ ≥ 0 (from x2 ≥ 0),
4½ − ½δ ≥ 0 (from x1 ≥ 0),
2½ + ½δ ≥ 0 (from x5 ≥ 0),
1½ − 1½δ ≥ 0 (from x6 ≥ 0).
As long as these inequalities hold, the optimal solution does not jump to another set of basic
variables, i.e., for these values of δ , the optimal feasible basic solutions all correspond to the
same basis matrix. From these four inequalities one can easily derive that:
−3 ≤ δ ≤ 1,
and so the tolerance interval is [−3, 1] = {δ | −3 ≤ δ ≤ 1}. Therefore, the shadow price
1½ holds as long as the value of δ satisfies −3 ≤ δ ≤ 1. In other words, the constraint
x1 + x2 ≤ 9 + δ has shadow price 1½ in Model Dovetail as long as −3 ≤ δ ≤ 1.
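The derivation of [−3, 1] can be checked mechanically. The sketch below (ours, not the book's) encodes each nonnegativity condition as a pair (constant, coefficient of δ):

```python
from fractions import Fraction as F

# x2, x1, x5, x6 >= 0, written as (constant, coefficient of delta):
conds = [(F(9, 2), F(3, 2)),   # 4 1/2 + 1 1/2 delta >= 0
         (F(9, 2), F(-1, 2)),  # 4 1/2 -   1/2 delta >= 0
         (F(5, 2), F(1, 2)),   # 2 1/2 +   1/2 delta >= 0
         (F(3, 2), F(-3, 2))]  # 1 1/2 - 1 1/2 delta >= 0

lower = max(-a / b for a, b in conds if b > 0)  # binding: x2 >= 0 gives -3
upper = min(-a / b for a, b in conds if b < 0)  # binding: x6 >= 0 gives 1
assert (lower, upper) == (F(-3), F(1))
```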
In the general case, the tolerance interval of the k'th technology constraint (k = 1, …, m)
with perturbation factor δ is determined by solving the feasibility condition (5.1) and the
optimality condition (5.2). Let B be an optimal basis matrix of the standard LO-model
max{ cᵀx | Ax ≤ b, x ≥ 0 }. The feasibility condition and the optimality condition of the
The optimality condition is not affected by the perturbation, and, hence, the current feasible
basic solution is optimal as long the following set of m inequalities is satisfied:
Hence, the β'th column of B is equal to e_k. This implies that Be_β = e_k, which in
turn means that B⁻¹e_k = e_β. Therefore, (5.7) reduces to x∗_BI + δe_β ≥ 0. This, in
turn, reduces to the following inequality:

δ ≥ −x∗_{n+k}, (5.8)

with x∗_{n+k} the optimal value of the slack variable x_{n+k}. So, the tolerance interval is
[−x∗_{n+k}, ∞).
▶ x_{n+k} is a nonbasic variable. Then, the lower bound δmin and the upper bound δmax of
the tolerance interval for δ can be determined by two ratio tests, namely:

δmin = max{ −(B⁻¹b)_j / (B⁻¹e_k)_j | (B⁻¹e_k)_j > 0, j = 1, …, m }, and
δmax = min{ −(B⁻¹b)_j / (B⁻¹e_k)_j | (B⁻¹e_k)_j < 0, j = 1, …, m }.   (5.9)

Recall that, by definition, max(∅) = −∞ and min(∅) = +∞. The tolerance interval
is [δmin, δmax].
The tolerance intervals derived above can be obtained directly from the optimal simplex
tableau. In the optimal simplex tableau, for δ = 0 and basis matrix B, the vector B−1 b is
the right hand side column, and B−1 ek is the column corresponding to the slack variable
xn+k of the k ’th technology constraint.
Example 5.3.2. Consider again the perturbed constraint x1 + x2 ≤ 9 + δ of Model Dovetail.
The optimal simplex tableau reads:

    x1   x2   x3    x4   x5   x6     −z
     0    0  −1½   −½    0    0   −22½
     0    1   1½   −½    0    0     4½
     1    0   −½    ½    0    0     4½
     0    0    ½   −½    1    0     2½
     0    0  −1½    ½    0    1     1½
Since the first constraint of the model is perturbed, we have that k = 1. Hence, the vector B⁻¹e_k
is the column of the simplex tableau corresponding to x3, which is a nonbasic variable at the optimal
feasible basic solution. Therefore, (5.9) yields that:

δmin = max{ −4½/1½, −2½/½ } = max{−3, −5} = −3,
δmax = min{ −4½/(−½), −1½/(−1½) } = min{9, 1} = 1.
Hence, the tolerance interval for the perturbation of the first constraint is [−3, 1]. Next, consider a
perturbation of the constraint x1 ≤ 7. Because this is the third constraint of the model, we have that
k = 3. The slack variable corresponding to this constraint is x5. Since x5∗ = 2½, it follows from
(5.8) that the tolerance interval for a perturbation of the constraint x1 ≤ 7 is [−2½, ∞).
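The ratio tests (5.8) and (5.9) only read two columns of the optimal tableau. A sketch (the helper name is ours; the data below are the right hand side column and the x3 and x5 columns of the tableau in Example 5.3.2):

```python
from fractions import Fraction as F

def rhs_interval(rhs, col):
    """Ratio tests (5.9): rhs is the right hand side column B^-1 b of the
    optimal simplex tableau, col is B^-1 e_k, i.e., the column of the slack
    variable of the perturbed constraint. None encodes an unbounded end."""
    lower = [-r / t for r, t in zip(rhs, col) if t > 0]
    upper = [-r / t for r, t in zip(rhs, col) if t < 0]
    return (max(lower) if lower else None, min(upper) if upper else None)

rhs = [F(9, 2), F(9, 2), F(5, 2), F(3, 2)]  # B^-1 b in Example 5.3.2

# Perturbing x1 + x2 <= 9 (k = 1): B^-1 e_1 is the x3 column.
assert rhs_interval(rhs, [F(3, 2), F(-1, 2), F(1, 2), F(-3, 2)]) == (F(-3), F(1))

# Perturbing x1 <= 7 (k = 3): the slack x5 is basic, so B^-1 e_3 is a unit
# column and the ratio test reproduces (5.8): the interval is [-2 1/2, inf).
assert rhs_interval(rhs, [F(0), F(0), F(1), F(0)]) == (F(-5, 2), None)
```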
Note that a constraint can be nonbinding at some optimal solutions, and binding at other
optimal solutions.
Example 5.3.3. We will illustrate case 2 above one more time by considering Model Dovetail. The
constraint x1 ≤ 7 is nonbinding at the optimal solution, and the corresponding dual variable y3 has
value zero (see Figure 4.1). Perturbing the right hand side by δ , this constraint becomes x1 ≤ 7 + δ ,
or after adding the slack variable x5 ,
x1 + (x5 − δ) = 7.
Note that this equation is the same as x1 + x5 = 7 except that x5 is replaced by x5 − δ . We will
determine (the interval of ) the values of δ for which the constraint x1 ≤ 7 + δ remains nonbinding,
i.e., we will determine the tolerance interval of x1 ≤ 7. The constraints of Model Dovetail, rewritten
from the perspective of the optimal feasible basic solution, then read:

 1½x3 − ½x4 + x2 = 4½
 −½x3 + ½x4 + x1 = 4½
  ½x3 − ½x4 + (x5 − δ) = 2½
−1½x3 + ½x4 + x6 = 1½.
The objective function in terms of x3 and x4 reads:
z = 22½ − 1½x3 − ½x4.

At the optimal solution, x3 = x4 = 0, and hence:

x2 = 4½, x1 = 4½, x5 = 2½ + δ, x6 = 1½.
Moreover, x5 ≥ 0 implies that δ ≥ −2½. So the tolerance interval is {δ | −2½ ≤ δ} = [−2½, ∞).
The shadow price of the constraint x1 ≤ 7 (see Section 1.1) is 0 and remains 0 as long as the
perturbation of the capacity is not less than −2½, i.e., as long as the right hand side of x1 ≤ 7 + δ
is not less than 4½. As soon as δ < −2½, this constraint becomes binding, and therefore the optimal
solution changes; compare also Section 1.1.2.
Proof of Theorem 5.3.4. Let k ∈ {1, …, n}. Suppose that we perturb the nonnegativity
constraint x_k ≥ 0, i.e., we replace it by x_k ≥ δ, for δ small enough. The set of nonnegativity
constraints then becomes x ≥ δe_k, with e_k the k'th unit vector in Rⁿ. Substituting w =
x − δe_k, the optimal objective value is transformed into:

z∗(δ) = δcᵀe_k + max{ cᵀw | Aw ≤ b − δAe_k, w ≥ 0 }. (5.10)
In this new model, the right hand side values of the constraints are perturbed by the vector
−δAe_k. Considering the dual of the LO-model in (5.10), we have that:

z∗(δ) = δcᵀe_k + min{ (b − δAe_k)ᵀy | Aᵀy ≥ c, y ≥ 0 }
      = δcᵀe_k + (b − δAe_k)ᵀy∗ = δcᵀe_k − δe_kᵀAᵀy∗ + bᵀy∗,

with y∗ the optimal solution of the dual model. Let y_s∗ be the corresponding vector of values
of the slack variables. From Aᵀy∗ − y_s∗ = c, it follows that (recall that z∗(0) = cᵀx∗ = bᵀy∗):

z∗(δ) − z∗(0) = δcᵀe_k − δe_kᵀ(c + y_s∗) = δcᵀe_k − δe_kᵀc − δe_kᵀy_s∗
             = −δe_kᵀy_s∗ = −δ(y_s∗)_k = −δy∗_{m+k}.

So, we have that (z∗(δ) − z∗(0))/δ = −y∗_{m+k}. Hence, the shadow price of the constraint
x_k ≥ 0 is equal to the negative of the optimal value of the dual variable y_{m+k}.
It is left to the reader to formulate a similar theorem for the nonstandard case (the signs of
the dual variables need special attention).
Similar to the case of technology constraints, the shadow price and the tolerance interval of
a nonnegativity constraint can be read directly from the optimal simplex tableau. It follows
from Theorem 4.5.1 that the optimal objective coefficients corresponding to the primal
decision variables are precisely the negatives of the shadow prices. Let B be an optimal
basis matrix of the LO-model max{ cᵀx | Ax ≤ b, x ≥ 0 }. It is shown in the proof of
Theorem 5.3.4 that the perturbed model is equivalent to model (5.10). This means that B
is an optimal basis matrix of the perturbed model if and only if it is an optimal basis matrix
of (5.10). The feasibility condition (5.1) and the optimality condition (5.2) for model (5.10)
are:
The feasibility is affected by the perturbation; the optimality condition remains unaffected.
We again distinguish two cases:
▶ x_k is a basic decision variable, i.e., k = BI_β for some β ∈ {1, …, m}, and k ∈
{1, …, n}. Since x_k is a basic variable, B⁻¹Ae_k (= (B⁻¹A)_{⋆,k}) is equal to the unit
vector e_β. Hence, B⁻¹(b − δAe_k) ≥ 0 is equivalent to δ ≤ (B⁻¹b)_β. Therefore,
the tolerance interval of the nonnegativity constraint x_k ≥ 0 is (−∞, (B⁻¹b)_β].
Note that the value of (B⁻¹b)_β can be found in the column of right hand side values
in the optimal simplex tableau corresponding to B.
▶ x_k is a nonbasic decision variable, i.e., k = N I_α for some α ∈ {1, …, n}, and
k ∈ {1, …, n}. Using the fact that B⁻¹Ae_k = (B⁻¹N)_{⋆,α}, the feasibility condition
is equivalent to B⁻¹b ≥ δ(B⁻¹N)_{⋆,α}. Note that the right hand side of this inequality,
δ(B⁻¹N)_{⋆,α}, is δ times the column corresponding to x_k in the optimal simplex tableau.
Figure 5.7: Perturbation function for the nonnegativity constraint x3 ≥ 0 of the model in Example 5.3.4.
Let δmin and δmax be the lower and upper bounds, respectively, of the tolerance interval
of the right hand side of x_k ≥ 0. Similar to the formulas in Section 5.3, we find that:

δmin = max{ (B⁻¹b)_j / (B⁻¹N)_{j,α} | (B⁻¹N)_{j,α} < 0, j = 1, …, m }, and
δmax = min{ (B⁻¹b)_j / (B⁻¹N)_{j,α} | (B⁻¹N)_{j,α} > 0, j = 1, …, m }.
    x1   x2   x3    x4   x5     −z
     0    0  −1½   −½   −2   −18½
     1    0    ½    ½    0     2½
     0    1   1½   −½    1     5½
The first three entries in the objective coefficient row correspond to the dual slack variables y3, y4,
and y5, and the last two to the dual decision variables y1, y2. Hence, following Theorem 4.2.4,
[y1∗ y2∗ y3∗ y4∗ y5∗] = [½ 2 0 0 1½] is an optimal dual solution. The shadow prices of the two
constraints are ½ and 2, respectively. The shadow prices of the three nonnegativity constraints are
0, 0, and −1½, respectively. The tolerance interval [δmin, δmax] for the right hand side value of the
Figure 5.8: A convex function f: for all δ1, δ2 and all λ with 0 ≤ λ ≤ 1, it holds that
f(λδ1 + (1−λ)δ2) ≤ λf(δ1) + (1−λ)f(δ2).
Figure 5.7 depicts the perturbation function for the nonnegativity constraint x3 ≥ 0. Note that
x1 ≥ 0, x2 ≥ 0, and x1 + x2 + 2x3 ≤ 8 imply that x3 ≤ 4, and hence the model is infeasible
for δ > 4.
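Applying the δmin/δmax formulas above to the tableau data of Example 5.3.4 (x3 nonbasic with tableau column (½, 1½) and right hand side column (2½, 5½)) yields the tolerance interval for the right hand side of x3 ≥ 0. The sketch below is ours (the helper name is an assumption); the resulting upper bound 3 2/3 lies below the infeasibility threshold δ = 4, consistent with Figure 5.7:

```python
from fractions import Fraction as F

def nonneg_rhs_interval(rhs, col):
    """Tolerance interval for the right hand side of x_k >= 0 when x_k is
    nonbasic: rhs is B^-1 b, col is the x_k column (B^-1 N)_{*,alpha} of the
    optimal simplex tableau. None encodes an unbounded end."""
    lower = [r / t for r, t in zip(rhs, col) if t < 0]
    upper = [r / t for r, t in zip(rhs, col) if t > 0]
    return (max(lower) if lower else None, min(upper) if upper else None)

# Example 5.3.4: x3 column (1/2, 1 1/2), right hand side column (2 1/2, 5 1/2).
dmin, dmax = nonneg_rhs_interval([F(5, 2), F(11, 2)], [F(1, 2), F(3, 2)])
assert (dmin, dmax) == (None, F(11, 3))  # interval (-infinity, 3 2/3]
```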
for each λ with 0 ≤ λ ≤ 1; see Figure 5.8. It can be easily shown that if the function
f (δ) is convex, then −f (δ) is concave. For a discussion of ‘convexity’ and ‘concavity’, see
Appendix D.
We prove these observations in Theorem 5.4.1 and Theorem 5.4.2.
5.4. Piecewise linearity of perturbation functions
Theorem 5.4.1. (Piecewise linearity and convexity of objective coefficient perturbation functions)
Consider perturbing the objective coefficient of the decision variable xi of an LO-
model. The following assertions hold:
(i) The perturbation function is a piecewise linear function; the slope of any of its
line segments is equal to the optimal value of xi in the model in which the
objective coefficient is perturbed with a factor δ .
(ii) The perturbation function is convex for maximizing LO-models and concave for
minimizing LO-models.
Proof of Theorem 5.4.1. We prove the theorem for a maximizing LO-model; the case of a
minimizing LO-model is left to the reader. First note that any LO-model can be transformed
into a model of the form:

max{ cᵀx | Ax = b, x ≥ 0 },

by adding appropriate slack variables; see Section 1.3. Consider the perturbation function z∗(δ)
of the objective coefficient of x_i, i.e.,

z∗(δ) = max{ cᵀx + δx_i | Ax = b, x ≥ 0 }.
We will first show that the function z ∗ (δ) is convex, and that the set of all perturbation factors
δ for which the LO-model has a finite-valued optimal solution is a connected interval of R.
Assume that the model has finite optimal solutions for δ = δ1 and δ = δ2 , with δ1 < δ2 .
Take any λ with 0 ≤ λ ≤ 1, and let δ = λδ1 + (1 − λ)δ2. Clearly, cᵀx + (λδ1 + (1 −
λ)δ2)x_i = λ(cᵀx + δ1x_i) + (1 − λ)(cᵀx + δ2x_i). Hence, for any feasible x it holds that
cᵀx + (λδ1 + (1 − λ)δ2)x_i ≤ λz∗(δ1) + (1 − λ)z∗(δ2), and so:

z∗(δ) = z∗(λδ1 + (1 − λ)δ2) ≤ λz∗(δ1) + (1 − λ)z∗(δ2).
This proves that z∗(δ) is a convex function for δ1 ≤ δ ≤ δ2. Moreover, since both z∗(δ1) and
z∗(δ2) are finite, z∗(δ) is finite as well. Hence, the set of all values of δ for which the model has
a finite optimal solution is in fact a connected interval of R. Let I be this interval. According
to Theorem 2.2.4 and Theorem 2.1.5, the model also has an optimal feasible basic solution for
each δ ∈ I.
We will now prove that, if for both δ1 and δ2 the matrix B is an optimal basis matrix, then B is
an optimal basis matrix for all δ ∈ [δ1, δ2]. Let x∗ with x∗_BI = B⁻¹b be the optimal feasible basic
solution for both δ1 and δ2. Take any λ with 0 ≤ λ ≤ 1, and let δ = λδ1 + (1 − λ)δ2. Then,
z∗(δ) ≥ cᵀx∗ + (λδ1 + (1 − λ)δ2)x∗_i = λ(cᵀx∗ + δ1x∗_i) + (1 − λ)(cᵀx∗ + δ2x∗_i) = λz∗(δ1) +
(1 − λ)z∗(δ2). On the other hand, we know already that z∗(δ) ≤ λz∗(δ1) + (1 − λ)z∗(δ2).
Hence, z∗(δ) = cᵀx∗ + (λδ1 + (1 − λ)δ2)x∗_i. This shows that x∗ is an optimal solution for
each δ ∈ [δ1, δ2].
A consequence of the above discussions is that for any δ ∈ I , it holds that, if B is an optimal
basis matrix for this δ , then there is a connected subinterval in I for which B is optimal. Since
there are only finitely many basis matrices in A, the interval I can be partitioned into a finite
number of subintervals on which a fixed basis matrix is optimal. For any of these subintervals,
there is a feasible basic solution that is optimal for each δ of that subinterval. Note that it may
happen that a feasible basic solution is optimal for two (adjacent) subintervals.
Let B be any basis matrix such that x∗ is a feasible basic solution with x∗_BI = B⁻¹b. Let
[δ1, δ2] be an interval on which B is optimal. Then, z∗(δ) = cᵀx∗ + δx∗_i, which is a linear
function of δ on [δ1, δ2]. Hence, z∗(δ) is a piecewise linear function on the interval I.
The slope of the function z∗(δ) = cᵀx∗ + δx∗_i is x∗_i, which is in fact the optimal value of the
variable x_i of which the objective coefficient is perturbed with a factor δ.
The following theorem states a similar fact about right hand side value perturbations.
Theorem 5.4.2. (Piecewise linearity and concavity of right hand side value perturbation functions)
For any constraint (including equality, nonnegativity, and nonpositivity constraints) of
an LO-model, the following assertions hold.
(i) The perturbation function is a piecewise linear function; the slope of any of its
line segments is equal to the shadow price of the corresponding constraint of
which the right hand side is perturbed with a factor corresponding to an interior
point of that line segment.
(ii) The perturbation function is concave for maximizing LO-models, and convex
for minimizing LO-models.
Proof of Theorem 5.4.2. We prove the theorem for a maximizing LO-model; the case of
a minimizing LO-model is left to the reader. Let z ∗ (δ) be the optimal objective value of an
LO-model with the perturbation factor δ in the right hand side of constraint j . According
to Theorem 4.2.4 (see also Theorem 5.7.1), z ∗ (δ) is also the optimal objective value of the
corresponding dual LO-model. This dual model has δ in the j ’th objective coefficient. Since
the dual model is a minimizing LO-model, it follows from Theorem 5.4.1 that z ∗ (δ) is in fact
a concave piecewise linear function.
Let z∗(δ) = max{ cᵀx | Ax = b + δe_j }, with e_j the j'th unit vector in Rᵐ. Let x∗ be an optimal feasible basic solution of the perturbed model with optimal objective value z∗(δ), and let B be the corresponding basis matrix. Let [δ1, δ2] be an interval on which B is optimal. Then for each δ ∈ [δ1, δ2], it holds that z∗(δ) = cᵀx∗ = (b + δe_j)ᵀy∗ = bᵀy∗ + δy∗_j with
5.5. Perturbation of the technology matrix 219
y∗ = (B⁻¹)ᵀc_BI (see Theorem 4.2.4 and Theorem 5.7.1). Therefore the slope of z∗(δ) on [δ1, δ2] is equal to y∗_j, which is in fact the optimal value of the corresponding dual variable.
Note that the perturbation function for an objective coefficient corresponding to a non-
negative variable xi is nondecreasing, since the slope of this function on each subinterval is
equal to the optimal value x∗i of xi , which is nonnegative. The perturbation function for a
‘≤’ constraint in a maximizing model is nondecreasing, because the slope of this function is
equal to the optimal value of the complementary dual variable, which is nonnegative. The
perturbation function for an objective coefficient in a maximizing (minimizing) LO-model
is convex (concave). See also Exercise 5.8.20.
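Theorem 5.4.2 can be illustrated numerically with the following added Python sketch (not from the book; Model Dovetail's data from Section 1.1.1 are assumed, and the brute-force solver only suits tiny models). Perturbing the right hand side of the first constraint to 9 + δ yields a concave piecewise linear perturbation function:

```python
from fractions import Fraction as F
from itertools import combinations

def solve_lp2(c, A, b):
    # Maximize c[0]*x1 + c[1]*x2 over {x >= 0, A x <= b} by checking every
    # intersection of two constraint lines (exact arithmetic; tiny LPs only).
    rows = [([F(r[0]), F(r[1])], F(s)) for r, s in zip(A, b)]
    rows += [([F(-1), F(0)], F(0)), ([F(0), F(-1)], F(0))]  # x1 >= 0, x2 >= 0
    best = None
    for (a, p), (d, q) in combinations(rows, 2):
        det = a[0] * d[1] - a[1] * d[0]
        if det == 0:
            continue
        x = ((p * d[1] - q * a[1]) / det, (a[0] * q - d[0] * p) / det)
        if all(r[0] * x[0] + r[1] * x[1] <= s for r, s in rows):
            z = F(c[0]) * x[0] + F(c[1]) * x[1]
            if best is None or z > best[0]:
                best = (z, x)
    return best  # (optimal value, optimal vertex), or None if infeasible

def zstar(delta):
    # optimal value with the first constraint perturbed to x1 + x2 <= 9 + delta
    res = solve_lp2([3, 2], [[1, 1], [3, 1], [1, 0], [0, 1]],
                    [F(9) + delta, 18, 7, 6])
    return None if res is None else res[0]

for d in [-10, -9, -6, -3, -1, 0, 1, 2]:
    print(d, zstar(d))
```

The sampled values trace slopes 3, 3/2, and 0 on successive subintervals; the slope 3/2 at δ = 0 is the shadow price of this constraint, and the model is infeasible for δ < −9.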
The entries of the technology matrix may also change, for instance when the labor or the materials requirements change due to a change in technology. The following example elaborates on this case.
Example 5.5.1. Suppose that the company Dovetail decides to use less wood for the boxes. So the entries in the second row of the technology matrix are perturbed by, say, t1 and t2, respectively. The perturbed model of Model Dovetail becomes:

max 3x1 + 2x2
s.t. x1 + x2 ≤ 9
     (3 + t1)x1 + (1 + t2)x2 ≤ 18
     x1 ≤ 7
     x2 ≤ 6
     x1, x2 ≥ 0.

The basis matrix that is optimal for t1 = t2 = 0 remains optimal as long as:

9 − 9t2 ≥ 0,
9 + 9t1 ≥ 0,
5 + 7t1 + 2t2 ≥ 0,
3 − 3t1 − 6t2 ≥ 0,
1 + t2 ≥ 0, and 3 + t1 ≥ 0.
In Figure 5.10 the tolerance region, determined by the above inequalities, is drawn (the shaded region in this figure). The tolerance region, which is determined by the four vertices (0, 0)ᵀ, (−5/7, 0)ᵀ, (−3/7, −1)ᵀ, and (0, −1)ᵀ, denotes the feasible values in the case t1 ≤ 0 and t2 ≤ 0 (i.e., when less wood is used).
Note that we have not developed a general theory for determining tolerance intervals of
technology matrix entries. So we need to use computer packages and calculate a (sometimes
long) sequence of optimal solutions.
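In the absence of such a theory, the re-solving approach can be sketched as follows (an added Python fragment, not from the book; the perturbed entry (3 + t1) and the tiny exact solver are illustrative):

```python
from fractions import Fraction as F
from itertools import combinations

def solve_lp2(c, A, b):
    # Maximize c[0]*x1 + c[1]*x2 over {x >= 0, A x <= b} by checking every
    # intersection of two constraint lines (exact arithmetic; tiny LPs only).
    rows = [([F(r[0]), F(r[1])], F(s)) for r, s in zip(A, b)]
    rows += [([F(-1), F(0)], F(0)), ([F(0), F(-1)], F(0))]  # x1 >= 0, x2 >= 0
    best = None
    for (a, p), (d, q) in combinations(rows, 2):
        det = a[0] * d[1] - a[1] * d[0]
        if det == 0:
            continue
        x = ((p * d[1] - q * a[1]) / det, (a[0] * q - d[0] * p) / det)
        if all(r[0] * x[0] + r[1] * x[1] <= s for r, s in rows):
            z = F(c[0]) * x[0] + F(c[1]) * x[1]
            if best is None or z > best[0]:
                best = (z, x)
    return best  # (optimal value, optimal vertex), or None if infeasible

# Re-solve Model Dovetail for a sequence of perturbations of the entry a21 = 3:
for t1 in [F(-1), F(-1, 2), F(0), F(1, 2), F(1)]:
    z, x = solve_lp2([3, 2], [[1, 1], [F(3) + t1, 1], [1, 0], [0, 1]],
                     [9, 18, 7, 6])
    print(f"t1={t1}  z*={z}  x*=({x[0]}, {x[1]})")
```

Both the optimal vertex and the optimal objective value change nonlinearly with t1, which is one reason why no simple tolerance-interval theory is available for technology matrix entries.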
optimal solution of the dual model), the entries of the vector y_BIᶜ (∈ Rⁿ) of dual basic variables and the entries of the vector y_NIᶜ (∈ Rᵐ) of dual nonbasic variables satisfy:

y_BIᶜ = B⁻¹c ≥ 0, and y_NIᶜ = 0.

Here, BIᶜ and NIᶜ are the index sets corresponding to B and N, where N is the complement of B in [Aᵀ −Im] (see, e.g., Section 2.2.3 and Theorem 4.5.1).
vertices and feasible basic solutions. Note that an LO-model may have both degenerate and
nondegenerate optimal solutions; see Exercise 5.8.8.
(b) The logical reversion of (a) is: if there exists at least one nondegenerate optimal dual basic solution, then the optimal primal solution is unique. The uniqueness in (b) follows now by interchanging the terms 'primal' and 'dual' in this logical reversion. So we only need to show the nondegeneracy of the optimal dual solution. Suppose, to the contrary, that the optimal dual solution is degenerate. Let [x∗_BI ; x∗_NI] be the unique nondegenerate optimal primal basic solution with respect to the basis matrix B in [A Im]. Let [y∗_BIᶜ ; y∗_NIᶜ] be the optimal dual basic solution corresponding to the dual complement of B. Since the latter solution is degenerate, there exists an index k ∈ BIᶜ such that y∗_k = 0. Let x_NIα be the complementary dual variable of y_k (see Section 4.3). Let [A Im] ≡ [B N], and let e_α be the α'th unit vector in Rⁿ. We will show that the vector [x̂_BI ; x̂_NI] with x̂_BI = x∗_BI − δB⁻¹Ne_α and x̂_NI = δe_α satisfies Ax̂ ≤ b, x̂ ≥ 0 for small enough δ > 0. Since x∗_BI > 0 (because of the nondegeneracy), it follows that x̂_BI ≥ 0 for small enough δ > 0, so that x̂ is feasible. Moreover, since the complementary dual variable of x_NIα satisfies y∗_k = 0, the objective coefficient of x_NIα in the optimal simplex tableau is zero, and hence x̂ has the same objective value as x∗. So x̂ is a second optimal solution, contradicting the uniqueness of the optimal primal solution.
224 Chapter 5. Sensitivity analysis
(c) The degeneracy follows from (a), and the uniqueness from the logical reversion of (a) with
‘primal’ and ‘dual’ interchanged.
(d) Let [x∗_BI ; x∗_NI] be an optimal primal basic solution and let [y∗_BIᶜ ; y∗_NIᶜ] be the corresponding optimal dual basic solution. Since the optimal primal basic solution is degenerate, it follows that x∗_i = 0 for some i ∈ BI. Let y_j be the complementary dual variable of x_i. Since x_i is a basic primal variable, y_j is a nonbasic dual variable, and therefore we have that y∗_j = 0. On the other hand, Theorem 4.3.3 implies that there exists a pair of optimal (not necessarily basic) primal and dual solutions x and y for which either x_i > 0 or y_j > 0. Since the optimal primal solution is unique and x∗_i = 0, it follows that there is an optimal dual solution ỹ for which ỹ_j > 0. Hence, the dual model has multiple optimal solutions.
In case (a) in Theorem 5.6.1, there is either a unique optimal dual solution or there are
multiple optimal dual solutions; in case (d), the optimal dual solution is either degenerate
or nondegenerate. This can be illustrated by means of the following two examples. We first
give an example concerning (d).
Example 5.6.1. Consider the following primal LO-model and its dual model:
max x1 − x2                min y1 + y2
s.t. x1 − x2 + x3 ≤ 1      s.t. y1 + y2 ≥ 1
     x1 ≤ 1                     −y1 ≥ −1
     x1, x2, x3 ≥ 0,            y1 ≥ 0
                                y1, y2 ≥ 0.
The primal model has the unique degenerate optimal solution (1, 0, 0, 0, 0)ᵀ (the last two zeros of this vector are the values of the slack variables); see Figure 5.12. Using the graphical solution method (see Section 1.1.2), one can easily verify that the dual model has multiple optimal solutions, namely all points of the line segment with end points (1, 0)ᵀ and (0, 1)ᵀ. The end vertex (1, 0)ᵀ is degenerate, while the vertex (0, 1)ᵀ is nondegenerate.
Figure 5.12: Degenerate vertex (1, 0, 0)ᵀ.
Example 5.6.2. Consider the model with the same feasible region as in Example 5.6.1, but with objective max x1. From Figure 5.12, it follows that this model has multiple optimal solutions: all points in the 'front' face of the feasible region as drawn in Figure 5.12 are optimal. Mathematically, this set can be described as the cone with apex (1, 0, 0)ᵀ and extreme rays the halflines through (1, 1, 0)ᵀ and (1, 1, 1)ᵀ. (See Appendix D for the definitions of the concepts in this sentence.) Note that the optimal solution corresponding to the vertex (1, 0, 0)ᵀ is degenerate. The dual of this model reads:
min y1 + y2
s.t. y1 + y2 ≥ 1
−y1 ≥0
y1 ≥0
y1 , y2 ≥ 0.
Obviously, this model has a unique optimal solution, namely (0, 1, 0, 0, 0)ᵀ, which is degenerate (the last three zeros of this vector are the values of the slack variables).
Figure 5.13: Degenerate optimal vertex v and a positive perturbation of the redundant constraint vv1v2.
Figure 5.14: Degenerate optimal vertex v and a negative perturbation of vv1v2.
Example 5.6.3. Consider again the situation of Example 5.6.5. The LO-model corresponding
to Figure 5.11 in Section 5.6.1 is:
max x3
s.t. 3x1 + 2x3 ≤ 9
−3x1 + x3 ≤ 0
−3x2 + 2x3 ≤ 0
3x1 + 3x2 + x3 ≤ 12
x1 , x2 , x3 ≥ 0.
The feasible region of this model is depicted in Figure 5.13 and in Figure 5.14, namely the region
0vv1 v2 v3 . The optimal degenerate vertex is v. In Figure 5.13, the constraint 3x1 + 2x3 ≤ 9 is
perturbed by a positive factor, and in Figure 5.14 by a negative factor.
In the case of Figure 5.13, the constraint corresponding to vv1 v2 has moved ‘out of’ the feasible
region. This change enlarges the feasible region, while v remains the optimal vertex. Note that, once
the constraint has moved by a distance ε > 0, the optimal vertex v becomes nondegenerate.
On the other hand, when moving the constraint through vv1 v2 ‘into’ the feasible region, the feasible
region shrinks, and the original optimal vertex v is cut off. So, the objective value decreases; see Figure
5.14.
In Figure 5.15 the perturbation function for the constraint 3x1 + 2x3 ≤ 9 is depicted. It shows the optimal objective value z∗ as a function of δ, when in the above model the first constraint is replaced by 3x1 + 2x3 ≤ 9 + δ. For δ < −9 the model is infeasible. Note that z∗(δ) = 3 + (1/3)δ if −9 ≤ δ ≤ 0, and z∗(δ) = 3 if δ ≥ 0. Remarkable for this perturbation function is that its graph shows a 'kink' at δ = 0. Such kinks can only occur when the corresponding vertex is degenerate; see Theorem 5.6.2.
Compare this, for example, to Figure 5.6 and Figure 5.7. The reader may check that, at each kink
point in these figures, the corresponding optimal solution is degenerate.
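The kink at δ = 0 can be reproduced numerically (an added Python sketch, not from the book; the exhaustive basic-point enumeration below only suits tiny models). It samples z∗(δ) for the model above with the first right hand side replaced by 9 + δ:

```python
from fractions import Fraction as F
from itertools import combinations

def solve_square(M):
    # Gauss-Jordan on the augmented n x (n+1) matrix M; None if singular.
    n = len(M)
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def solve_lp(c, A, b):
    # Maximize c.x s.t. A x <= b, x >= 0, by enumerating every intersection
    # of n binding constraints (exact arithmetic; tiny models only).
    n = len(c)
    rows = [([F(v) for v in r], F(s)) for r, s in zip(A, b)]
    for i in range(n):  # x_i >= 0 written as -x_i <= 0
        e = [F(0)] * n
        e[i] = F(-1)
        rows.append((e, F(0)))
    best = None
    for sub in combinations(rows, n):
        x = solve_square([a[:] + [rhs] for a, rhs in sub])
        if x is None:
            continue
        if all(sum(a[j] * x[j] for j in range(n)) <= rhs for a, rhs in rows):
            z = sum(F(c[j]) * x[j] for j in range(n))
            if best is None or z > best[0]:
                best = (z, x)
    return best  # (optimal value, optimal point), or None if infeasible

def zstar(delta):
    # the model above, with the first right hand side perturbed to 9 + delta
    A = [[3, 0, 2], [-3, 0, 1], [0, -3, 2], [3, 3, 1]]
    res = solve_lp([0, 0, 1], A, [F(9) + delta, 0, 0, 12])
    return None if res is None else res[0]

for d in [-10, -9, -3, 0, 3, 6]:
    print(d, zstar(d))
```

The sampled values confirm slope 1/3 left of δ = 0, slope 0 right of it, and infeasibility for δ < −9.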
5.6. Sensitivity analysis for the degenerate case 227
Figure 5.15: Perturbation function for ‘v1 v2 v’ in Figure 5.13 and Figure 5.14.
Recall that, assuming that the optimal solutions are nondegenerate, the shadow price of
a constraint in a standard LO-model is precisely the slope of the perturbation function at
δ = 0. We have seen that shadow prices correspond to the optimal solution of the dual
model. The fact that the optimal dual solution is unique is guaranteed by the fact that the
optimal primal solution is assumed to be nondegenerate; see Theorem 5.6.1. As soon as
the optimal primal solution is degenerate, then (again see Theorem 5.6.1) there may be
multiple optimal dual solutions, and so shadow prices cannot, in general, be directly related
to optimal dual solutions.
Based on these considerations, we introduce left and right shadow prices. The right shadow
price of a constraint is the rate at which the objective value increases by positive perturbations
of the right hand side of that constraint. Similarly, the left shadow price of a constraint is the
rate at which the objective value decreases by negative perturbations of its right hand side.
Formally, we have that:

left shadow price of the constraint = lim_{δ↑0} (z∗(δ) − z∗(0))/δ, and
right shadow price of the constraint = lim_{δ↓0} (z∗(δ) − z∗(0))/δ,

where z∗(δ) is the optimal objective value of the model with the constraint perturbed by a factor δ. In other words, the left shadow price (right shadow price) is the slope of the line segment of z∗(δ) that lies 'left' ('right') of δ = 0. Compare in this respect the definition (5.5) of the (regular) shadow price of a constraint.
For example, the left shadow price of the constraint 3x1 + 2x3 ≤ 9 of the model in Example 5.6.3 is 1/3 (see Figure 5.15). On the other hand, the right shadow price is zero. This example
also shows that a binding constraint may have a zero left or right shadow price. In fact, it
may even happen that all binding constraints have zero right shadow prices; it is left to the
reader to construct such an example.
In Theorem 5.6.2, we show that the left and right shadow prices can be determined by solving the dual model. In fact, the expression max{ y_j | y is an optimal dual solution } in the theorem may be computed by determining the optimal objective value z∗ of the dual model, and then solving the dual model with the additional constraint bᵀy = z∗.
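This two-stage computation can be sketched as follows (an added Python fragment, not from the book), using the dual model of Example 5.6.1 as a small test case; a brute-force exact solver stands in for a real LO-package:

```python
from fractions import Fraction as F
from itertools import combinations

def solve_lp2(c, A, b):
    # Maximize c[0]*x1 + c[1]*x2 over {x >= 0, A x <= b} by checking every
    # intersection of two constraint lines (exact arithmetic; tiny LPs only).
    rows = [([F(r[0]), F(r[1])], F(s)) for r, s in zip(A, b)]
    rows += [([F(-1), F(0)], F(0)), ([F(0), F(-1)], F(0))]  # y1 >= 0, y2 >= 0
    best = None
    for (a, p), (d, q) in combinations(rows, 2):
        det = a[0] * d[1] - a[1] * d[0]
        if det == 0:
            continue
        x = ((p * d[1] - q * a[1]) / det, (a[0] * q - d[0] * p) / det)
        if all(r[0] * x[0] + r[1] * x[1] <= s for r, s in rows):
            z = F(c[0]) * x[0] + F(c[1]) * x[1]
            if best is None or z > best[0]:
                best = (z, x)
    return best  # (optimal value, optimal vertex), or None if infeasible

# Dual of Example 5.6.1: min y1 + y2 s.t. y1 + y2 >= 1, -y1 >= -1, y >= 0,
# rewritten in <=-form for the (maximizing) solver:
A = [[-1, -1], [1, 0]]   # -(y1 + y2) <= -1  and  y1 <= 1
b = [-1, 1]
z_star = -solve_lp2([-1, -1], A, b)[0]   # stage 1: minimize y1 + y2
# Stage 2: add b^T y = z*, i.e. y1 + y2 <= z* (>= z* is already a constraint):
A2, b2 = A + [[1, 1]], b + [z_star]
y1_max = solve_lp2([1, 0], A2, b2)[0]    # max y1 over the optimal duals
y1_min = -solve_lp2([-1, 0], A2, b2)[0]  # min y1 over the optimal duals
print(z_star, y1_min, y1_max)
```

Here y1_max = 1 and y1_min = 0; by Theorem 5.6.2 these are the left and right shadow prices of the primal constraint x1 − x2 + x3 ≤ 1 of Example 5.6.1.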
The theorem applies to the standard primal LO-model. It is left to the reader to make
the usual adaptations for nonstandard LO-models, and to formulate similar assertions for
nonnegativity and nonpositivity constraints; see Exercise 5.8.21.
Proof of Theorem 5.6.2. Take any j ∈ {1, . . . , m}. We only prove the expression in the theorem for y_j^r, because the proof for y_j^ℓ is similar. Perturbing the right hand side value of constraint j is equivalent to perturbing the objective coefficient of y_j in the dual model. So, we have that:

z∗(δ) = min{ bᵀy + δy_j | Aᵀy ≥ c, y ≥ 0 }.   (5.11)
Recall from Section 5.2 that every optimal feasible solution of the dual model (corresponding
to δ = 0) has a tolerance interval with respect to the perturbation of the objective coefficient
of yj . Let y∗ be an optimal dual solution with tolerance interval [0, δ0 ] such that δ0 > 0. Such
a solution exists, because if every optimal dual solution has a tolerance interval [0, 0], then the
dual model cannot have an optimal solution for any sufficiently small δ > 0, contrary to the
assumptions of the theorem.
Recall from Section 5.2 that for any value of δ in the tolerance interval [0, δ0], the optimal objective value of the perturbed dual model (5.11) is equal to bᵀy∗ + δy∗_j, i.e., we have that z∗(δ) = bᵀy∗ + δy∗_j for δ ∈ [0, δ0]. Hence, the right shadow price y_j^r of constraint j is equal to y∗_j.
It remains to show that y∗_j = min{ y_j | y is an optimal dual solution }. Suppose for a contradiction that this is not the case, i.e., suppose that there exists a dual optimal solution ỹ∗ such that ỹ∗_j < y∗_j. Since both y∗ and ỹ∗ are optimal dual solutions, their corresponding objective
values are equal. Moreover, the objective values corresponding to these dual solutions in the
perturbed model (5.11) with δ ∈ (0, δ0 ] satisfy:
(b + δe_j)ᵀỹ∗ = bᵀỹ∗ + δỹ∗_j < bᵀy∗ + δy∗_j = (b + δe_j)ᵀy∗.
Therefore, ỹ∗ has a smaller objective value in the perturbed model (5.11) than y∗. It follows that y∗ cannot be optimal for the perturbed model with δ ∈ (0, δ0], a contradiction.
of the corresponding dual decision variables, and the shadow prices of the nonnegativity constraints are
the negatives of the values of the corresponding dual slack variables. The last two columns contain the
upper and lower bounds of the respective tolerance intervals.
In Figure 5.17 the perturbation function for the first constraint is drawn. This perturbation function was
determined using an LO-package by calculating the optimal solutions for a number of right hand side
values of constraint (1), and using the bounds of the tolerance intervals which are available as output of
the package.
Based on the fact that there are two decision variables and three slack variables with optimal value
zero, it follows that one basic variable must have value zero, and so this optimal solution is degenerate.
Moreover, this solution is unique, because otherwise the dual solution would have been degenerate (see
Theorem 5.6.1(d)). Since the optimal values of x1 , x3 , and x8 are nonzero, the optimal solution
corresponds to the feasible basic solutions with BI = {1, 3, 8, i}, where i ∈ {2, 4, 5, 6, 7}. As it turns out, only the feasible basic solutions with BI = {1, 3, 6, 8} and BI = {1, 3, 7, 8} are optimal (i.e., have nonpositive objective coefficients). The corresponding simplex tableaus are:
I BI = {1, 3, 6, 8}.
x1 x2 x3 x4 x5 x6 x7 x8 −z
0 −1 0 −3 −1 0 −1 0 −27
1 1 0 −1 2 0 −1 0 9
0 0 1 2 −1 0 1 0 9
0 −3 0 −10 3 1 −4 0 0
0 5 0 −3 5 0 −3 1 32
The corresponding optimal dual solution reads: y1∗ = 1, y2∗ = 0, y3∗ = 1, and y4∗ = 0.
I BI = {1, 3, 7, 8}.
x1    x2     x3    x4     x5     x6     x7   x8   −z
0    −1/4    0    −1/2   −7/4   −1/4    0    0   −27
1     7/4    0     3/2    5/4   −1/4    0    0     9
0    −3/4    1    −1/2   −1/4    1/4    0    0     9
0     3/4    0     5/2   −3/4   −1/4    1    0     0
0    29/4    0     9/2   11/4   −3/4    0    1    32
The corresponding optimal dual solution reads: y1∗ = 7/4, y2∗ = −1/4, y3∗ = 0, and y4∗ = 0.
Note that the solution shown in Table 5.3 is the feasible basic solution corresponding to BI =
{1, 3, 6, 8}. Which feasible basic solution is found by a computer package depends on the imple-
mentation details of the package: it could well be that a different LO-package returns the other feasible
basic solution, i.e., the one corresponding to BI = {1, 3, 7, 8}.
Figure 5.17: Perturbation function for constraint (3) of model (5.12). The numbers in parentheses are the slopes.
Figure 5.18: Perturbation function for nonnegativity constraint x2 ≥ 0 of model (5.12). The numbers in parentheses are the slopes.
This fact can be seen in Figure 5.17, because the line segment left of δ = 0 has slope 7/4, and the line segment right of δ = 0 has slope 1.
vv′ is the intersection of the planes through 0v1v and v2v3v, and intersects the x1-axis in u. The optimal vertex of the new region is v′ = (0, 8/3, 4)ᵀ. The optimal objective value changes from z∗ = 3 to z∗ = 4.
Recall that a constraint is redundant with respect to the feasible region if removing the constraint
from the model does not change the feasible region. Removing a nonredundant (with
respect to the feasible region) constraint from the model enlarges the feasible region. If
the optimal objective value remains unchanged, then more (extreme) points of the feasible
region may become optimal. If the optimal objective value changes, a new set of points
becomes optimal, or the model becomes unbounded.
A constraint is called redundant with respect to the optimal point v, if v is also an optimal point of
the feasible region after removing the constraint from the model. Note that if a constraint
is redundant with respect to the feasible region, then it is also redundant with respect to
any optimal point. The converse, however, is not true: it may happen that a constraint is
redundant with respect to an optimal point v, while it is not redundant with respect to the
feasible region. Note that, in the latter case, removing the constraint may also change the
set of optimal points. However, the new set of optimal points includes v.
Example 5.6.6. Consider again the feasible region of Example 5.6.5. The constraint corresponding
to the face vv1 v2 is redundant with respect to v, because its removal does not change the optimality
of this vertex. The constraint is, however, not redundant with respect to the feasible region, because
removing it changes the feasible region into a pyramid with base 0uv3 and top v. In fact, none of the
constraints are redundant with respect to the feasible region.
From this example, we may also conclude that, in Rⁿ, if there are more than n binding constraints at an optimal vertex (i.e., if the optimal vertex is degenerate), then this does not mean that all of these constraints are redundant with respect to the optimal vertex.
However, in Theorem 5.6.3, we show that in the case of a degenerate optimal feasible
basic solution, corresponding to vertex v, say, there is at least one binding constraint that is
redundant with respect to v. Let k ≥ 0 be an integer. A vertex v of the feasible region of
an LO-model is called k -degenerate if the number of binding constraints at v is n + k . For
k = 0, v is nondegenerate, and so ‘0-degenerate’ is another way of saying ‘nondegenerate’.
Similarly, an optimal feasible basic solution is called k -degenerate if and only if the number
of zero basic variables is k . Note that the vertex v of the feasible region 0vv1 v2 v3 in Figure
5.14 is 1-degenerate, and that the vertex 0 is 2-degenerate. There is an interesting duality relationship in this respect: the dimension of the dual optimal face is equal to the degeneracy degree of the primal optimal face; see Tijssen and Sierksma (1998).
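The degree of degeneracy of a vertex can be checked by simply counting binding constraints, as in the following added Python sketch (not from the book) for the feasible region 0vv1v2v3 above, taking v = (1, 2, 3)ᵀ for the degenerate optimal vertex:

```python
from fractions import Fraction as F

# All constraints of the model with feasible region 0 v v1 v2 v3, written as
# a.x <= beta (the last three rows are the nonnegativity constraints):
A = [[3, 0, 2], [-3, 0, 1], [0, -3, 2], [3, 3, 1],
     [-1, 0, 0], [0, -1, 0], [0, 0, -1]]
b = [9, 0, 0, 12, 0, 0, 0]

def degeneracy_degree(x, n=3):
    # the k in 'k-degenerate': number of binding constraints at x, minus n
    binding = sum(1 for a, beta in zip(A, b)
                  if sum(F(ai) * xi for ai, xi in zip(a, x)) == beta)
    return binding - n

print(degeneracy_degree([1, 2, 3]))  # vertex v: 4 binding constraints -> 1-degenerate
print(degeneracy_degree([0, 0, 0]))  # vertex 0: 5 binding constraints -> 2-degenerate
```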
Proof of Theorem 5.6.3. Let k ≥ 0, let [x∗_BI ; x∗_NI] be a k-degenerate optimal basic solution, and let v be the corresponding optimal vertex. Then, x∗_NI = 0, and x∗_BI contains k zero entries. Let S ⊆ BI with |S| = k be the set of indices of the k zero entries, i.e., x∗_S = 0. The binding constraints at v can be partitioned into two subsets: the n binding constraints corresponding to NI, namely x_NI ≥ 0, and the k binding constraints corresponding to S (⊆ BI), namely x_S ≥ 0. The objective value of any point [x̂_BI ; x̂_NI] is (see Section 3.2):

cᵀ_BI B⁻¹b + (cᵀ_NI − cᵀ_BI B⁻¹N)x̂_NI.
Since [x∗_BI ; x∗_NI] is an optimal basic solution, we have that cᵀ_NI − cᵀ_BI B⁻¹N ≤ 0. It follows that the objective value of any point [x̂_BI ; x̂_NI] that satisfies x̂_NI ≥ 0 is at most the objective value of v (which corresponds to x̂_NI = 0). Therefore, if we remove the k hyperplanes
defined by x_S ≥ 0 (while retaining the ones defined by x_NI ≥ 0), the current optimal vertex v remains optimal, and the optimal objective value does not change. Therefore, the k hyperplanes corresponding to S are redundant with respect to the optimal solution. It also follows that the technology constraints among these k constraints have zero right shadow prices, and the nonnegativity constraints have zero left shadow prices. Note that, since removing these k constraints may enlarge the feasible region, they are not necessarily redundant with respect to the feasible region (see the beginning of this section).
Proof of Theorem 5.6.4. First, assume that the model has multiple optimal solutions. Let x∗ be any optimal vertex with corresponding optimal objective value z∗. Since the model has multiple optimal solutions, there exists a different optimal solution, x̃∗, say, that is not a vertex. The set of binding constraints at x∗ is different from the set of binding constraints at x̃∗. Hence, there exists a constraint j that is binding at x∗ but not binding at x̃∗. We will prove the statement for the case where constraint j is a technology constraint; the case of nonnegativity constraints is similar and is left for the reader. Because constraint j is not binding at x̃∗, perturbing it by a sufficiently small factor δ < 0 does not decrease the optimal objective value (because x̃∗ is still feasible and has the objective value z∗). Because the perturbation shrinks the feasible region, it also does not increase the optimal objective value. This means that constraint j has zero left shadow price, i.e., y_j^ℓ = 0. It follows now from Theorem 5.6.2 that 0 ≤ y_j^r ≤ y_j^ℓ = 0, and hence y_j^r = 0.
The reverse can be shown as follows. Let x∗ be any optimal feasible basic solution, and assume that there is a constraint j with y_j^r = y_j^ℓ = 0 that is binding at x∗. Let z∗(δ) be the optimal objective value of the model with constraint j perturbed by the factor δ. Since the right shadow price of constraint j is zero, it follows that z∗(δ) = z∗(0) for small enough δ > 0. Let x̃∗ be an optimal feasible basic solution corresponding to the perturbation factor δ. Since the unperturbed constraint j is binding at x∗ but not at x̃∗, it follows that x∗ ≠ x̃∗. Because x∗ and x̃∗ have the same corresponding objective value z∗(0), it follows that there are multiple optimal solutions.
Note that the requirement that the model has an optimal vertex is necessary. For example, the nonstandard LO-model max{5x1 | x1 ≤ 1, x2 free} has multiple optimal solutions because any point (1, x2)ᵀ with x2 ∈ R is optimal, but the left and right shadow prices of the only constraint x1 ≤ 1 are both equal to 5.
The binding constraints with zero shadow prices in the statement of Theorem 5.6.4 have to be binding at the optimal vertices of the feasible region. Consider, for example, the model min{x2 | x1 + x2 ≤ 1, x1, x2 ≥ 0}. The set { (x1, x2)ᵀ | 0 ≤ x1 ≤ 1, x2 = 0 } is optimal. In the case of the optimal solution x1 = 1/2, x2 = 0, the only binding constraint is x2 ≥ 0, which has nonzero right shadow price.
phenomenon is called the more-for-less paradox; see also Section 1.6.1. It will be explained
by means of the following example.
Example 5.7.1. Consider the LO-model:
max x1 + x2
s.t. −x1 + x2 = 1
x2 ≤ 2
x1 , x2 ≥ 0.
One can easily check (for instance by using the graphical solution method) that the optimal solution reads x∗1 = 1, x∗2 = 2, and z∗ = 3. When decreasing the (positive) right hand side of −x1 + x2 = 1 by 1/2, the perturbed model becomes:
max x1 + x2
s.t. −x1 + x2 = 1/2
     x2 ≤ 2
     x1, x2 ≥ 0.
The optimal solution of this model reads x∗1 = 3/2, x∗2 = 2, and z∗ = 7/2. Hence, the optimal
objective value increases when the right hand side of −x1 + x2 = 1 decreases; i.e., there is more
‘revenue’ for less ‘capacity’. Recall that increasing the right hand side of the second constraint results in
an increase of the optimal objective value.
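The paradox is easy to verify computationally (an added Python sketch, not from the book): write the equality constraint as a pair of inequalities and re-solve for both right hand side values with a tiny exact solver.

```python
from fractions import Fraction as F
from itertools import combinations

def solve_lp2(c, A, b):
    # Maximize c[0]*x1 + c[1]*x2 over {x >= 0, A x <= b} by checking every
    # intersection of two constraint lines (exact arithmetic; tiny LPs only).
    rows = [([F(r[0]), F(r[1])], F(s)) for r, s in zip(A, b)]
    rows += [([F(-1), F(0)], F(0)), ([F(0), F(-1)], F(0))]  # x1 >= 0, x2 >= 0
    best = None
    for (a, p), (d, q) in combinations(rows, 2):
        det = a[0] * d[1] - a[1] * d[0]
        if det == 0:
            continue
        x = ((p * d[1] - q * a[1]) / det, (a[0] * q - d[0] * p) / det)
        if all(r[0] * x[0] + r[1] * x[1] <= s for r, s in rows):
            z = F(c[0]) * x[0] + F(c[1]) * x[1]
            if best is None or z > best[0]:
                best = (z, x)
    return best  # (optimal value, optimal vertex), or None if infeasible

def zstar(b1):
    # -x1 + x2 = b1 written as -x1 + x2 <= b1 and x1 - x2 <= -b1
    return solve_lp2([1, 1], [[-1, 1], [1, -1], [0, 1]], [b1, -b1, 2])[0]

print(zstar(F(1)), zstar(F(1, 2)))  # 3 and 7/2: less 'capacity', more 'revenue'
```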
This ‘paradox’ is due to the fact that dual variables corresponding to primal equality con-
straints may be negative. This will be explained in Theorem 5.7.1. The theorem concerns
nondegenerate optimal solutions for LO-models with equality constraints.
Degeneracy in the case of LO-models with equality constraints means that the correspond-
ing feasible basic solution contains at least one zero entry, with the basis matrix defined as
in Section 1.3. Degeneracy in this case has the same geometric interpretation as in the case
of models with only inequality constraints: the number of hyperplanes corresponding to
binding equalities or inequalities is at least n + 1, with n the number of decision variables.
Note that rewriting the above model in standard form (i.e., with inequality constraints only)
leads to a degenerate optimal solution, because equality constraints are split into two linearly
dependent inequality constraints.
Proof of Theorem 5.7.1. Consider the following LO-model with one equality constraint
(the other equality constraints may be considered as hidden in Ax ≤ b):
max{ cᵀx | Ax ≤ b, aᵀx = b′, x ≥ 0 }.

Note that the dual variable y′, corresponding to the equality constraint, is free. Now let b′ be perturbed by a small amount δ (∈ R). The perturbed model becomes:

max{ cᵀx | Ax ≤ b, aᵀx = b′ + δ, x ≥ 0 }.

This is equivalent to:

max{ cᵀx | Ax ≤ b, aᵀx ≤ b′ + δ, −aᵀx ≤ −b′ − δ, x ≥ 0 }.
Note that the values of the optimal dual variables associated with an equality constraint are not present in optimal simplex tableaus. They can be calculated from y = (B⁻¹)ᵀc_BI.
We close this section with a remark on redundancy. Suppose that we have a maximizing
LO-model, and aT x = b is one of the restrictions. It can be seen from the sign of its shadow
price whether either aT x ≤ b or aT x ≥ b is redundant with respect to a given optimal
solution. Namely, if the shadow price of aT x = b is nonnegative, then aT x = b can be
replaced in the model by aT x ≤ b while the given optimal solution remains optimal, i.e.,
aT x ≥ b is redundant with respect to the given optimal solution. Similarly, if aT x = b has
a nonpositive shadow price then aT x ≤ b is redundant. We will explain this by means of
the following example.
Example 5.7.2. Consider Model Dovetail. Suppose the supplier of the shipment boxes for the long
matches offers these boxes against a special price, when precisely 500,000 of these boxes are purchased
per year. The number of boxes for long matches produced per year is x2 (×100,000). In the original
Model Dovetail (see Section 1.1.1) the constraint x2 ≤ 6 has to be replaced by the constraint x2 = 5.
One can easily check that the optimal solution of the new model is: x∗1 = 4, x∗2 = 5, with z ∗ = 22.
Moreover, the shadow price of the constraint x2 = 5 is negative (the corresponding dual variable
has the optimal value −1). This means that if the constraint x2 = 5 is replaced by the constraints
x2 ≤ 5 and x2 ≥ 5, then both constraints are binding, while x2 ≤ 5 has shadow price zero and
is redundant. The optimal solution can be improved when less than 500,000 'long' boxes have to be used. (We know this already, since we have solved the model for x2 ≤ 6; the optimal solution satisfies x∗1 = x∗2 = 9/2, z∗ = 45/2.)
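The redundancy claim in this example can be verified with a small added Python sketch (not from the book): write x2 = 5 as the pair x2 ≤ 5 and x2 ≥ 5, and re-solve after dropping either one.

```python
from fractions import Fraction as F
from itertools import combinations

def solve_lp2(c, A, b):
    # Maximize c[0]*x1 + c[1]*x2 over {x >= 0, A x <= b} by checking every
    # intersection of two constraint lines (exact arithmetic; tiny LPs only).
    rows = [([F(r[0]), F(r[1])], F(s)) for r, s in zip(A, b)]
    rows += [([F(-1), F(0)], F(0)), ([F(0), F(-1)], F(0))]  # x1 >= 0, x2 >= 0
    best = None
    for (a, p), (d, q) in combinations(rows, 2):
        det = a[0] * d[1] - a[1] * d[0]
        if det == 0:
            continue
        x = ((p * d[1] - q * a[1]) / det, (a[0] * q - d[0] * p) / det)
        if all(r[0] * x[0] + r[1] * x[1] <= s for r, s in rows):
            z = F(c[0]) * x[0] + F(c[1]) * x[1]
            if best is None or z > best[0]:
                best = (z, x)
    return best  # (optimal value, optimal vertex), or None if infeasible

A = [[1, 1], [3, 1], [1, 0]]  # Model Dovetail without the x2-constraint
c = [3, 2]
both  = solve_lp2(c, A + [[0, 1], [0, -1]], [9, 18, 7, 5, -5])  # x2 = 5
no_le = solve_lp2(c, A + [[0, -1]], [9, 18, 7, -5])             # only x2 >= 5
no_ge = solve_lp2(c, A + [[0, 1]], [9, 18, 7, 5])               # only x2 <= 5
print(both[0], no_le[0], no_ge[0])  # 22 22 45/2
```

Dropping x2 ≤ 5 leaves z∗ = 22 (this inequality is redundant with respect to the optimal solution), while dropping x2 ≥ 5 improves the optimal value to 45/2.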
5.8. Exercises 237
Hence, it can be seen from the sign of the shadow price of an equality constraint which
corresponding inequality is redundant with respect to the optimal solution.
5.8 Exercises
Exercise 5.8.1. Consider the LO-model:
max (1 + δ)x1 + x2
s.t. x2 ≤ 5
2x1 + x2 ≤ 8
3x1 + x2 ≤ 10
x1 , x2 ≥ 0.
Construct a table, similar to Table 5.1, for δ = −2, −1, 0, 1, 2, 3. Draw the graph of the
perturbation function z ∗ (δ) = (1 + δ)x∗1 + x∗2 .
Exercise 5.8.3. Give a proof of Theorem 5.3.2 (see Section 5.3.3): the shadow price of a
nonbinding nonnegativity constraint is zero.
Exercise 5.8.4.
(a) Consider the second constraint of Model Dovetail in Section 1.1, which has shadow price 1/2. Determine the tolerance interval for a perturbation of the right hand side value of this constraint.
(b) Give an example of an LO-model with a binding nonnegativity constraint; determine
the tolerance interval and the perturbation function for this binding nonnegativity con-
straint.
Exercise 5.8.5. Consider the following LO-model:

max x1 + x2
s.t. 2x1 + x2 ≤ 9
x1 + 3x2 ≤ 12
x1 , x2 ≥ 0.
(a) Determine the optimal solution and rewrite the model so that the objective function
is expressed in terms of the nonbasic variables of the optimal solution.
(b) Determine the tolerance interval of 2x1 + x2 ≤ 9. Use this interval to calculate its
shadow price. Check the answer by dualizing the model, and calculating the optimal
dual solution.
(a) Check that the primal model has multiple optimal solutions and one of these solutions
is degenerate.
(b) Check (directly, i.e., without using Theorem 5.6.1) that the dual of this model has
multiple optimal solutions, and contains a degenerate optimal solution.
(c) Which case of Theorem 5.6.1 describes the situation in this exercise?
Exercise 5.8.7. Use Theorem 5.6.1 to determine whether the following models have
a unique optimal solution or multiple optimal solutions, and whether these solutions are
degenerate or nondegenerate. (Hint: solve the dual models.)
(a) min −4x1 + 5x2 + 8x3
    s.t. (1/2)x1 − 2x2 + x3 ≥ 1
         −x1 + x2 + x3 ≥ 1
         x1, x2, x3 ≥ 0.
(b) min 11x1 + 3x2 + 10x3
    s.t. x1 + (1/3)x2 + 2x3 ≥ 0
2x1 + x2 + x3 ≥ 1
x1 , x2 , x3 ≥ 0.
(c) min 6x1 + 3x2 + 12x3
s.t. x1 + x2 + 3x3 ≥ 2
x1 + x3 ≥ 2
x1 , x2 , x3 ≥ 0.
Exercise 5.8.8. Construct an LO-model that has both a degenerate optimal solution and
a nondegenerate optimal solution.
Exercise 5.8.11. The management of the publishing company Book & Co wants to
design its production schedule for the next quarter, where three books A, B , and C can
be published. Book A can only be published as a paperback, books B and C can also be
published in a more fancy form with a hard cover. We label book A by the number 1, book
B as paperback by 2 and as hard cover by 4, book C as paperback by 3 and as hard cover
by 5. The profits (in $ units) per book are listed in the second column of the table.
The objective is the maximization of the total profit. There are the following constraints.
The inventory of paper is limited to 15,000 sheets of paper. In the table, the number of
sheets needed for the different types of books is listed in the third column. Book & Co has
only one machine, and all books are processed on this machine.
The machine processing times for the various types of books are given in the fourth column
of the table. During the next quarter, the machine capacity (the total time that the machine
is available), is 80,000 time units. The number of employees for finishing the books is
limited. During the next quarter, there are 120,000 time units available for finishing the
books. The finishing time per book is given in the fifth column of the table.
Finally, each type of book needs a certain number of covers, while the total amount of
covers for paperbacks and hard covers is limited to 1,800 and 400 covers, respectively. The
amounts of covers required per book are given in the last two columns of the table.
Since the sale of hard cover books is expected to go more slowly than the sale of paperbacks, Book & Co produces fewer hard cover books than paperbacks in the same time period. The target rule is that the number of paperbacks for each different book must be at least five times the number of hard cover books.
(a) Write this problem as an LO-model, and calculate the maximum amount of profit and
the optimal number of books to be produced.
Solve the following problems by using the sensitivity analysis techniques discussed in this
chapter.
(b) Book & Co can purchase ten paperback covers from another company for the price
of $100. Is this advantageous for Book & Co? From which result can you draw this
conclusion?
(c) The government wants book A to be published and is therefore prepared to provide a subsidy. What should be the minimum amount of subsidy per book such that it is profitable for Book & Co to publish it?
(d) Consider the effects of fluctuating prices and fluctuating profits on this decision con-
cerning the production. Is this decision very sensitive to a change in profits?
(e) Consider the effects of changing the right hand side values as follows. Vary the number
of paperback covers from 400 to 2,000; draw the graph of the profits as a function of
the varying number of paperback covers. Are there kinks in this graph? Relate this
perturbation function to shadow prices.
Exercise 5.8.12. Analyze the dual of the LO-model for the company Book & Co from
Exercise 5.8.11. What is the optimal value? Perform sensitivity analysis for this model by
perturbing successively the right hand side values and the objective coefficients. What do
shadow prices mean in this dual model? Determine the basis matrices for the optimal primal
and dual solutions and show that they are complementary dual (in the sense of Section 2.2.3).
(a) Determine the optimal solution and perform sensitivity analysis by perturbing the right
hand side values. Which constraints are binding?
(b) Determine the dual model, and its optimal solution. Check the optimality of the ob-
jective value. Determine the complementary slackness relations for this model.
(c) The vertex with x1 = 0, x2 = 4, x3 = 1, and x4 = 4 is a point of the feasible region.
Check this. What is the objective value in this vertex? Compare this value with the
optimal value.
Exercise 5.8.15. A farm with 25 acres can cultivate the following crops: potatoes, sugar
beets, oats, winter wheat, and peas. The farmer wants to determine the optimal cultivating
schedule, and should take into account the following factors. The expected yields and the
expected costs per acre are given in the table below.
Other facts to be taken into account are the crop-rotation requirements: to control the
development of diseases and other pests, the crops should not grow too often on the
same piece of land. In the county in which this farm is located, the following rules of
thumb are used:
- Potatoes: at most once every two years;
- Sugar beets: at most once every four years;
- Oats and winter wheat: individually, at most once every two years; together, at most
  three times every four years;
- Peas: at most once every six years.
The objective is to make a schedule such that the crop-rotation requirements are satisfied
and the profit is as high as possible. The crop-rotation requirements can be modeled in
different ways. A simple way is the following. If a certain crop is allowed to be cultivated
only once per two years on a given piece of land, then this is the same as requiring that the
crop is cultivated each year on one half of the land and the next year on the other half.
(a) Write this problem as an LO-model, and solve it. If a planning period of twelve years is
chosen, then there is a second solution with the same profit during these twelve years.
Determine this solution. Suppose that the average annual interest rate is 6.5% during
the twelve years. Which solution is the most preferable one?
A serious disease for potatoes increasingly infects the soil in the county. To control this
process, the government considers compensating farmers that satisfy stricter crop-rotation
requirements. The following is proposed. Every farmer that cultivates potatoes once in five
years (or even less) receives a subsidy that depends on the size of the farm. The amount of
the subsidy is $160 per acre. For our farmer, a reward of $4,000 is possible.
(b) Do you think the farmer will adapt his plan if the above government proposal is carried
out? How strict can the government make the crop-rotation requirements to ensure
that it is profitable for the farmer to adapt his original plans?
Exercise 5.8.17. Consider the diet problem as formulated in Section 5.7. Perform
sensitivity analysis to the model corresponding to this problem by perturbing successively the
right hand side values and the objective coefficients. Draw the graphs of the corresponding
perturbation functions. Analyze the results in terms of the original diet problem.
5.8. Exercises
Exercise 5.8.22. Let (P) be an LO-model, and (D) its dual. Answer the following
questions; explain your answers.
(a) If (P) is nondegenerate, can it be guaranteed that (D) is also nondegenerate?
(b) Is it possible that (P) and (D) are both degenerate?
(c) Is it possible that (P) has a unique optimal solution with finite objective value, but (D)
is infeasible?
(d) Is it possible that (P) is unbounded, while (D) has multiple optimal solutions?
Exercise 5.8.23. A company manufactures four types of TV-sets: TV1, TV2, TV3, and
TV4. Department D1 produces intermediate products which department D2 assembles to
end products. Department D3 is responsible for packaging the end products. The available
number of working hours for the three departments is 4,320, 2,880, and 240 hours per
time unit, respectively. Each TV1 set requires 12, 9, and 13 hours in the three departments,
respectively. For each TV2 set, these figures are 15, 12, and 1; for TV3: 20,
12, and 1; and for TV4: 24, 18, and 2. The sales prices are $350 for TV1, $400 for TV2,
$600 for TV3, and $900 for TV4. It is assumed that everything will be sold, and that there
are no inventories. The objective is to maximize the yield.
(a) Formulate this problem as an LO-model, and determine an optimal solution.
(b) Calculate and interpret the shadow prices.
(c) Suppose that one more employee can be hired. To which department is this new
employee assigned?
(d) Which variations in the sales prices of the four products do not influence the optimal
production schedule?
(e) What happens if all sales prices increase by 12%?
(f ) Formulate the dual model. Formulate the complementary slackness relations, and give
an economic interpretation.
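Part (a) can be set up directly from the data above. The following is a minimal sketch using scipy (the book's own models are written in GMPL, so this is an illustration, not the book's code):

```python
import numpy as np
from scipy.optimize import linprog

# Hours required per TV set in departments D1, D2, D3 (columns: TV1..TV4),
# taken from the exercise text.
A_ub = np.array([
    [12, 15, 20, 24],   # D1: intermediate products
    [ 9, 12, 12, 18],   # D2: assembly
    [13,  1,  1,  2],   # D3: packaging
])
b_ub = np.array([4320, 2880, 240])        # available hours per time unit
prices = np.array([350, 400, 600, 900])   # sales prices per set

# linprog minimizes, so negate the prices to maximize the yield.
res = linprog(-prices, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4)
x = res.x
print("production plan:", np.round(x, 2))
print("maximum yield: $", round(-res.fun, 2))
```

The shadow prices of parts (b) and (c) can be read from the dual values that the HiGHS backend reports alongside the optimal solution.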
Exercise 5.8.24. A company wants to produce 300 kilograms of cattle fodder with the
following requirements. First, it needs to consist of at least 75% so-called digestible nutrients.
Second, it needs to contain at least 0.65 grams of phosphorus per kilogram. Third, it needs
to consist of at least 15% proteins. (The percentages are weight percentages.) Five types of
raw material are available for the production, labeled 1, . . . , 5; see the table below.
                                          Raw material
                                        1      2      3      4      5
  Digestible nutrients (weight %)      80     60     80     80     65
  Phosphorus (grams per kg)          0.32   0.30   0.72   0.60   1.30
  Proteins (weight %)                  12      9     16     37     14
  Price ($ per kg)                   2.30   2.18   3.50   3.70   2.22
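The data in the table translate into a standard blending model: minimize cost subject to the three quality requirements and the total-weight requirement. A minimal sketch in Python (scipy is our choice here, not the book's):

```python
import numpy as np
from scipy.optimize import linprog

total = 300.0                                            # kg of fodder to produce
price      = np.array([2.30, 2.18, 3.50, 3.70, 2.22])    # $ per kg
digestible = np.array([80, 60, 80, 80, 65]) / 100.0      # weight fraction
phosphorus = np.array([0.32, 0.30, 0.72, 0.60, 1.30])    # grams per kg
protein    = np.array([12, 9, 16, 37, 14]) / 100.0       # weight fraction

# '>=' requirements are written as '<=' by negating both sides.
A_ub = -np.array([digestible, phosphorus, protein])
b_ub = -np.array([0.75 * total, 0.65 * total, 0.15 * total])
A_eq = np.ones((1, 5))
b_eq = [total]

res = linprog(price, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 5)
x = res.x
print("kg of each raw material:", np.round(x, 1))
print("minimum cost: $", round(res.fun, 2))
```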
min 4x3
s.t. x1 + 2x3 ≤ 7
−x1 + x3 ≤ −7
x1 − x2 + 3x3 ≤ 7
x1 , x2 , x3 ≥ 0.
Draw the simplex adjacency graph corresponding to all optimal feasible basic solutions. For
each feasible basic solution, determine the tolerance interval corresponding to a perturbation
of the objective coefficient of x1 . Note that this coefficient is 0. Which optimal feasible
basic solution corresponds to a tolerance interval consisting of only one point? Draw the
perturbation function.
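Before drawing the adjacency graph, it helps to know the optimal value. The constraints −x1 + x3 ≤ −7 and x1 + 2x3 ≤ 7 together force x3 = 0 and x1 = 7, so every optimal solution has objective value 0. A quick numerical check (scipy, as an illustration only):

```python
from scipy.optimize import linprog

c = [0, 0, 4]                      # minimize 4*x3
A_ub = [[ 1,  0, 2],               #  x1       + 2*x3 <= 7
        [-1,  0, 1],               # -x1       +   x3 <= -7
        [ 1, -1, 3]]               #  x1 - x2  + 3*x3 <= 7
b_ub = [7, -7, 7]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(res.x, res.fun)   # every optimal solution has x1 = 7 and x3 = 0
```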
Chapter 6. Large-scale linear optimization
Overview
Although Dantzig’s simplex algorithm has proven to be very successful in solving LO-
models of practical problems, its theoretical worst-case behavior is nevertheless very bad.
In 1972, Victor Klee (1925–2007) and George J. Minty (1929–1986) constructed examples
for which the simplex algorithm requires an exorbitant amount of computer running time;
see Chapter 9. The reason that the simplex algorithm works fast in practice is that LO-
models arising from most practical problems are usually so-called ‘average’ problems and are
fortunately not the rare 'worst-case' problems. Besides the theoretical question of whether
there exists a fast algorithm that solves large-scale linear optimization problems, the
practical need for such an efficient algorithm has also inspired researchers to try to answer
this question.
The first such efficient algorithm (the so-called ellipsoid algorithm) was published in 1979 by
Leonid G. Khachiyan (1952–2005). Although this algorithm is theoretically efficient in
the sense that it has polynomial running time (see Chapter 9), its performance in practice
is much worse than that of the simplex algorithm. In 1984, Narendra Karmarkar (born 1957)
presented a new algorithm which – as he claimed – could solve large-scale LO-models
as much as a hundred times faster than the simplex algorithm. Karmarkar’s algorithm is
a so-called interior point algorithm. The basic idea of this algorithm is to move towards an
optimal solution through points in the (relative) interior of the feasible region of the LO-
model. Since Karmarkar’s publication, a number of variants of this algorithm have been
constructed.
Recently, interior point algorithms have received a significant amount of attention because
of their applications in non-linear optimization. In particular, interior point algorithms
are fruitful for solving so-called semidefinite optimization models, which generalize linear op-
timization models in the sense that the variables in semidefinite optimization models are
positive semidefinite matrices. This is a generalization because any nonnegative number can
be viewed as a positive semidefinite (1, 1)-matrix.
In this chapter¹ we will discuss a particular interior point algorithm that is called the interior
path algorithm, developed by, among others, Cees Roos (born 1941) and Jean-Philippe Vial.
Roughly speaking, the interior path algorithm starts at an initial (relative) interior point of
the feasible region and uses the so-called interior path as a guide that leads this initial interior
point to an optimal point. The number of required iteration steps increases slowly in the
number of variables of the model. Experiments with specific versions of the interior path
algorithm have shown that, even for very large models, we often do not need more than
60 iterations. In spite of the fact that the time per iteration is rather long, the algorithm is
more or less insensitive to the size of the model; this phenomenon makes the interior path
algorithm useful for large-scale models.
F_P^+ = {x ∈ R^n | Ax = b, x > 0} ≠ ∅, and
F_D^+ = {y ∈ R^m | A^T y < c} ≠ ∅.
The expression a > b means that all entries of a − b are strictly positive. In Section 6.4.2,
we will see that it is possible to transform any LO-model into a larger equivalent model,
whose feasible regions contain interior points.
¹ Parts of this chapter are based on lecture notes by Cees Roos of Delft University of Technology.
6.1. The interior path
Suppose that x̂ and ŷ are feasible solutions of (P) and (D), respectively. By the comple-
mentary slackness relations (see Section 4.3), it follows that both x̂ and ŷ are optimal for
their respective models if and only if x̂^T ŵ = 0, where ŵ = c − A^T ŷ. Note that x̂ ≥ 0
and ŵ ≥ 0, because x̂ and ŷ are feasible. This follows directly from Theorem 4.3.1 by
rewriting (P) as

  − max { (−c)^T x | [A; −A] x ≤ [b; −b], x ≥ 0 }

(the matrix A stacked above −A, and similarly b above −b). Hence, x̂ and ŷ are optimal
solutions of (P) and (D), respectively, if and only if x̂, ŷ, and ŵ form a solution of the
following system of equations:
Ax = b, x ≥ 0, (primal feasibility)
AT y + w = c, w ≥ 0, (dual feasibility) (PD)
Xw = 0, (complementary slackness)
where X denotes the diagonal (n, n)-matrix with the entries of the vector x on its main di-
agonal. The conditions (PD) are called the Karush-Kuhn-Tucker conditions; see also Appendix
E.4. The system of equations Xw = 0 consists of the complementary slackness relations.
Relaxing the complementary slackness relations to Xw = µ1 for a parameter µ > 0 yields the relaxed Karush-Kuhn-Tucker conditions:
Ax = b, x ≥ 0, (primal feasibility)
AT y + w = c, w ≥ 0, (dual feasibility) (PDµ)
Xw = µ1. (relaxed complementary slackness)
Figure 6.1: The interior path of model (A), running from the point corresponding to µ = +∞ to the optimal point reached as µ = 0.
It is no coincidence that the system in Example 6.1.1 has a unique solution: this is actually
true in general. The following theorem formalizes this. We postpone proving the theorem
until the end of this section.
Theorem 6.1.1 also holds without the boundedness condition. However, without the bound-
edness condition, the proof is more complicated. On the other hand, in practical situations,
the boundedness condition is not a real restriction, since all decision variables are usually
bounded.
For every µ > 0, we will denote the unique solution found in Theorem 6.1.1 by x(µ),
y(µ), w(µ). The sets {x(µ) | µ > 0} and {y(µ) | µ > 0} are called the interior paths of
(P) and (D), respectively. For the sake of brevity, we will sometimes call x(µ) and y(µ) the
interior paths. The parameter µ is called the interior path parameter.
In general, interior paths cannot be calculated explicitly. The following simple example
presents a case in which calculating the interior path analytically is possible.
Example 6.1.2. Consider the following simple LO-model.
max x1 + 2x2
s.t. 0 ≤ x1 ≤ 1 (A)
0 ≤ x2 ≤ 1.
The dual of (A), with the slack variables y3 and y4 , reads:
min y1 + y2
s.t. y1 − y3 = 1
(B)
y2 − y4 = 2
y1 , y2 , y3 , y4 ≥ 0.
In the context of this chapter, we consider (B) as the primal model and (A) as its dual. The system
(PDµ) becomes:
  [ 1  0  −1   0 ;  0  1   0  −1 ] [y_1 y_2 y_3 y_4]^T = [1 2]^T,   y_1, y_2, y_3, y_4 ≥ 0,

  [ 1  0 ;  0  1 ;  −1  0 ;  0  −1 ] [x_1 x_2]^T + [w_1 w_2 w_3 w_4]^T = [1 1 0 0]^T,   w_1, w_2, w_3, w_4 ≥ 0,

  y_1w_1 = µ,  y_2w_2 = µ,  y_3w_3 = µ,  y_4w_4 = µ.
We will solve for x_1 and x_2 as functions of µ. Eliminating the y_i's and the w_i's results in the equations

  1/(1 − x_1) − 1/x_1 = 1/µ   and   1/(1 − x_2) − 1/x_2 = 2/µ.

Recall that µ > 0, and so x_1 > 0 and x_2 > 0. Solving the two equations yields:

  x_1 = x_1(µ) = −µ + 1/2 + (1/2)√(1 + 4µ²),  and
  x_2 = x_2(µ) = −(1/2)µ + 1/2 + (1/2)√(1 + µ²).

One can easily check that lim_{µ↓0} x_1(µ) = 1 = lim_{µ↓0} x_2(µ). The expression µ ↓ 0 means µ → 0
with µ > 0. Figure 6.1 shows the graph of this interior path.
The proof of Theorem 6.1.1 uses a so-called logarithmic barrier function. The logarithmic
barrier function for the primal model is defined as:
  B_P(x; µ) = (1/µ) c^T x − Σ_{i=1}^{n} ln(x_i),
with domain FP+ = {x ∈ Rn | Ax = b, x > 0}. Figure 6.2 depicts the graph of the
function B(x1 , x2 ) = − ln(x1 ) − ln(x2 ), corresponding to the last term in the definition
of BP (x; µ) (for n = 2). Since ln(xi ) approaches −∞ when xi → 0, it follows that,
for fixed µ > 0, BP (x; µ) → ∞ when xi → 0 (i = 1, . . . , n). Intuitively, the barrier
function is the objective function plus a term that rapidly tends to infinity as we get closer
to the boundary of the feasible region. The logarithmic barrier function for the dual model
is defined as

  B_D(y; µ) = −(1/µ) b^T y − Σ_{j=1}^{n} ln(w_j),
Figure 6.2: Logarithmic barrier function B(x1 , x2 ) = − ln(x1 ) − ln(x2 ), drawn for 0 < x1 ≤ 1 and
0 < x2 ≤ 1. The value of B(x1 , x2 ) diverges to infinity as x1 ↓ 0 and as x2 ↓ 0.
The proof of Theorem 6.1.1 uses an interesting connection between the logarithmic barrier
functions BP (· ; µ) and BD (· ; µ), and the system of equations (PDµ): as we will see, the
functions BP (· ; µ) and BD (· ; µ) each have a unique minimum that, together, correspond
to the unique solution of (PDµ). One of the key observations to show this connections is
the fact that the two logarithmic barrier functions are strictly convex (see Appendix D). We
show this fact in Theorem 6.1.2.
Proof of Theorem 6.1.2. We will show that B_P is strictly convex on F_P^+. Take any µ > 0.
We need to show that for any x′ and x″ in F_P^+ with x′ ≠ x″, and any λ with 0 < λ < 1, it holds that:

  B_P(λx′ + (1 − λ)x″; µ) < λB_P(x′; µ) + (1 − λ)B_P(x″; µ).

We use the fact that ln is strictly concave, i.e., that ln(λx′_i + (1 − λ)x″_i) > λ ln(x′_i) + (1 − λ) ln(x″_i)
whenever x′_i ≠ x″_i. It now follows that:

  B_P(λx′ + (1 − λ)x″; µ) = (1/µ) c^T (λx′ + (1 − λ)x″) − Σ_i ln(λx′_i + (1 − λ)x″_i)
    = λ(1/µ) c^T x′ + (1 − λ)(1/µ) c^T x″ − Σ_i ln(λx′_i + (1 − λ)x″_i)
    < λ(1/µ) c^T x′ + (1 − λ)(1/µ) c^T x″ − λ Σ_i ln(x′_i) − (1 − λ) Σ_i ln(x″_i)
    = λB_P(x′; µ) + (1 − λ)B_P(x″; µ).

The proof of the strict convexity of B_D(· ; µ) on F_D^+ is similar and is left to the reader.
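Strict convexity can also be observed numerically: pick two feasible points of a small system Ax = b, x > 0, and compare the barrier value at a convex combination with the combination of the barrier values. A small sketch (the data A, b, c, µ below are made up for illustration):

```python
import math

# B_P(x; mu) = (1/mu) c^T x - sum_i ln(x_i), on {x | Ax = b, x > 0}
def barrier(x, c, mu):
    return sum(ci * xi for ci, xi in zip(c, x)) / mu \
        - sum(math.log(xi) for xi in x)

# illustrative data: A = [1 1], b = 2, c = (1, 3), mu = 0.5
c, mu = (1.0, 3.0), 0.5
xp = (0.5, 1.5)     # feasible: 0.5 + 1.5 = 2
xq = (1.2, 0.8)     # feasible: 1.2 + 0.8 = 2
lam = 0.3
xmid = tuple(lam * a + (1 - lam) * b for a, b in zip(xp, xq))

lhs = barrier(xmid, c, mu)
rhs = lam * barrier(xp, c, mu) + (1 - lam) * barrier(xq, c, mu)
assert lhs < rhs    # strict convexity of the barrier function
```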
We can now prove Theorem 6.1.1. The main idea of the proof is that, for any µ > 0, a
minimizer of BP (· ; µ) together with a minimizer of BD (· ; µ) form a solution of (PDµ).
Proof of Theorem 6.1.1. Take any µ > 0. Since the feasible region F_P is bounded and
closed, and since B_P(· ; µ) is infinite on the boundary of F_P, it follows that the strictly convex
function B_P(· ; µ) has a unique minimum on F_P^+; see Appendix D. In order to determine this
unique minimum, which we call x(µ), we use the Lagrange function (see Section E.3):

  L_λ(x, µ) = B_P(x; µ) − λ_1(Σ_{i=1}^{n} a_{1i}x_i − b_1) − … − λ_m(Σ_{i=1}^{n} a_{mi}x_i − b_m),

with A = {a_ij}, and b = [b_1 … b_m]^T. Differentiating L_λ(x, µ) with respect to x_i for each
i = 1, …, n yields:

  ∂L_λ(x, µ)/∂x_i = ∂B_P(x; µ)/∂x_i − (λ_1 a_{1i} + … + λ_m a_{mi})
                 = (1/µ)c_i − 1/x_i − (λ_1 a_{1i} + … + λ_m a_{mi}).

Hence, the gradient vector of L_λ(x, µ) satisfies:

  ∇L_λ(x, µ) = [∂L_λ(x, µ)/∂x_1 … ∂L_λ(x, µ)/∂x_n]^T = (1/µ)c − X^{−1}1 − A^T λ,

with λ = [λ_1 … λ_m]^T, and X and 1 as defined before. From the theory of the Lagrange
multiplier method (see Section E.3), it follows that a necessary condition for x(µ) being a
minimizer of B_P(x; µ) is that there exists a vector λ = λ(µ) such that ∇L_λ(x(µ), µ) = 0,
Ax(µ) = b, and x(µ) ≥ 0. Since for x > 0 there is only one stationary point, it follows that
the following system has the unique solution x = x(µ):

  (1/µ)c − X^{−1}1 = A^T λ,  Ax = b,  x > 0.    (6.1)

Premultiplying the first equation of (6.1) by A, it follows, since AA^T is nonsingular (see Theorem
B.4.1 in Appendix B), that λ = (AA^T)^{−1}((1/µ)Ac − AX^{−1}1), and so λ is also uniquely
determined; this value of λ is called λ(µ). Defining y(µ) = µλ(µ) and w(µ) = µX^{−1}1, it
follows that (6.1) is now equivalent to the system:

  Ax = b, x > 0
  A^T y + w = c, w > 0
  Xw = µ1.
Because for µ > 0, (PDµ) has only solutions that satisfy x > 0 and w > 0 (this follows
from Xw = µ1 > 0), (PDµ) also has the unique solution x = x(µ) > 0, y = y(µ), and
w = w(µ) > 0.
Proof of Theorem 6.1.3. We first prove the monotonicity of c^T x(µ) as a function of µ. Take
any µ and µ′ with 0 < µ < µ′. We will show that c^T x(µ) < c^T x(µ′). Since x(µ) minimizes
B_P(x; µ), and x(µ′) minimizes B_P(x; µ′), it follows that:

  (1/µ) c^T x(µ) − Σ_i ln(x_i(µ)) < (1/µ) c^T x(µ′) − Σ_i ln(x_i(µ′)),  and
  (1/µ′) c^T x(µ′) − Σ_i ln(x_i(µ′)) < (1/µ′) c^T x(µ) − Σ_i ln(x_i(µ)).

Adding up these inequalities gives:

  (1/µ) c^T x(µ) + (1/µ′) c^T x(µ′) < (1/µ) c^T x(µ′) + (1/µ′) c^T x(µ),

and so (1/µ − 1/µ′)(c^T x(µ) − c^T x(µ′)) < 0. Since 0 < µ < µ′, it follows that 1/µ − 1/µ′ > 0, and
therefore c^T x(µ) < c^T x(µ′), as required. The monotonicity of b^T y(µ) can be shown similarly
by using B_D(y; µ).

We finally show that c^T x(µ) − b^T y(µ) = nµ. The proof is straightforward, namely:

  c^T x(µ) − b^T y(µ) = (A^T y(µ) + w(µ))^T x(µ) − b^T y(µ)
    = y(µ)^T Ax(µ) + w(µ)^T x(µ) − b^T y(µ)
    = y(µ)^T b + w(µ)^T x(µ) − b^T y(µ)
    = w(µ)^T x(µ) = Σ_{i=1}^{n} x_i(µ)w_i(µ) = nµ,

which proves the theorem.

6.2. Formulation of the interior path algorithm

Figure 6.3: The working of the interior path algorithm: starting from an interior point x_0, the steps p(x_0; µ_0), p(x_1; µ_1), … lead the iterates along the interior path toward the optimal point.
In Section 6.4.1, we will see how the duality gap for points on the interior path is used to
terminate the calculations of the interior path algorithm with a certain prescribed accuracy
for the optimal objective value.
xk+1 = xk + p(xk ; µk ).
Notice that xk+1 is in general not equal to x(µk ), so xk+1 need not lie on the interior
path. However, it will be ‘close to’ the interior path. Next, the algorithm decreases the
current value of the parameter µk with a given updating factor θ that satisfies 0 < θ < 1;
the new interior path parameter µk+1 is defined by:
µk+1 = (1 − θ)µk ,
and corresponds to the interior path point x(µk+1 ). The procedure is repeated for the pair
(xk+1 ; µk+1 ), until a pair (x∗ , µ∗ ) is reached for which:
nµ∗ ≤ e−t ,
where t is a prescribed accuracy parameter. Since this implies that µ∗ is close to 0 and nµ∗
is approximately (recall that x∗ need not be on the interior path) equal to the duality gap
cT x∗ − bT y∗ (see Theorem 6.1.3 and Theorem 6.4.1), it follows that x∗ and y∗ are ap-
proximate optimal solutions of (P) and (D), respectively; see Theorem 4.2.3. Therefore,
the interior points generated by the algorithm converge to an optimal point. Informally
speaking, the interior path serves as a ‘guiding hand’ that leads an initial interior point to the
vicinity of the optimal solution. Figure 6.3 schematically shows the working of the interior
path algorithm.
There remain a number of questions to be answered:

- How to determine an initial interior point x_0 and an initial interior path parameter µ_0?
  This will be discussed in Section 6.4.2.
- How to determine the search direction p(x; µ), and what should be the length of this
  vector? This is the topic of the next section.
- What is a good choice for the updating factor θ? This will also be discussed in the next
  section.
- Last but not least, we should prove that the algorithm is efficient, i.e., it runs in a poly-
  nomial amount of time. We postpone this discussion to Section 9.3.
The set N (A) is called the null space of A, and it consists of all vectors x such that Ax = 0.
Clearly, 0 ∈ N (A). The set R(A) is called the row space of A, and it consists of all vectors
x that can be written as a linear combination of the rows of A.
Take any u ∈ R^n. Let PA(u) be the point in N (A) that is closest to u. The point PA(u)
is called the projection of u onto N (A). Similarly, let QA (u) be the point in R(A) that is
closest to u. The point QA (u) is called the projection of u onto R(A). (See Figure 6.4.)
By definition, P_A(u) is the optimal solution of the following minimization model (the 1/2
and the square are added to make the calculations easier):

  min (1/2)‖u − x‖²
  s.t. Ax = 0                (6.2)
       x ∈ R^n.
Using the vector λ = [λ_1 … λ_m]^T of Lagrange multipliers, it follows that the optimal
solution satisfies the stationarity condition ∇((1/2)‖u − x‖² + λ^T Ax) = 0,
where the gradient is taken with respect to x. One can easily check that this expression is
equivalent to −(u − x) + A^T λ = 0 and, hence, to A^T λ = u − x. Premultiplying both
sides by A gives AA^T λ = A(u − x) = Au, where the last equality follows from the fact
that Ax = 0. Using the fact that AA^T is nonsingular (see Theorem B.4.1 in Appendix
B), it follows that λ = (AA^T)^{−1}Au. Substituting this expression into A^T λ = u − x
yields A^T(AA^T)^{−1}Au = u − x, and, hence:

  P_A(u) = u − A^T(AA^T)^{−1}Au = (I_n − A^T(AA^T)^{−1}A)u.
Note that P_A(u) is the product of a matrix that only depends on A and the vector u.
Hence, we may write P_A(u) = P_A u, where

  P_A = I_n − A^T(AA^T)^{−1}A.
The projection Q_A(u) of u onto the row space of A can be determined as follows. By
definition, Q_A(u) = A^T y, where y is the optimal solution of the model (the square is
again added to simplify the calculations):

  min { (1/2)‖u − A^T y‖² | y ∈ R^m }.    (6.3)

After straightforward calculations, it follows that in this case the gradient expression is equiv-
alent to (AA^T)y = Au. Therefore, we have that y = (AA^T)^{−1}Au, so that:

  Q_A(u) = A^T(AA^T)^{−1}Au.
Clearly, ‖u − Q_A(u)‖ = ‖u − A^T(AA^T)^{−1}Au‖ = ‖P_A(u)‖. Using (6.3), it follows
that:

  ‖P_A(u)‖ = min_y ‖u − A^T y‖.
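These two projections are easy to compute and sanity-check with a few lines of numpy (the matrix A and the vector u below are arbitrary illustrations):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])   # any full-row-rank matrix works
n = A.shape[1]

# Q_A = A^T (A A^T)^{-1} A projects onto the row space R(A);
# P_A = I_n - Q_A projects onto the null space N(A).
Q = A.T @ np.linalg.inv(A @ A.T) @ A
P = np.eye(n) - Q

u = np.array([3.0, -1.0, 2.0, 5.0])
assert np.allclose(A @ (P @ u), 0)      # P_A(u) lies in N(A)
assert np.allclose(P @ P, P)            # projections are idempotent
assert np.allclose(P @ u + Q @ u, u)    # u decomposes into the two parts
```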
Now, replace the vector x of decision variables by Xk x, where Xk denotes the diagonal
(n, n)-matrix with xk on its main diagonal. Thus, the transformation is
x := Xk x.
Applying this transformation can be thought of as measuring the value of each decision
variable xi in different units. The transformed model becomes:
  min { (X_k c)^T x | (AX_k)x = b, x ≥ 0 }.
Notice that 1 is an interior point of F̄_P^+ = {x ∈ R^n | AX_k x = b, x > 0}. Moreover, the
open hyperball

  B_1(1) = {x ∈ R^n | AX_k x = b, ‖x − 1‖ < 1}

with radius 1 and center 1 is contained in the affine space {x ∈ R^n | AX_k x = b}. Fur-
thermore, B_1(1) consists of interior points of F̄_P. In other words, B_1(1) ⊂ F̄_P^+. To prove
this, take any x = [x_1 … x_n]^T ∈ B_1(1). Then (x_1 − 1)² + … + (x_n − 1)² < 1. If,
say, x_1 ≤ 0, then (x_1 − 1)² ≥ 1, which is not possible. Hence, B_1(1) ⊂ F̄_P^+.
Example 6.2.1. Consider the situation as depicted in Figure 6.5, where

  F_P = { [x_1 x_2]^T | x_1 + x_2 = 4, x_1, x_2 ≥ 0 }.

Take x_k = [0.3 3.7]^T. After scaling with

  X_k = [ 0.3  0 ;  0  3.7 ],

we find that F̄_P = { [x_1 x_2]^T | 0.3x_1 + 3.7x_2 = 4, x_1, x_2 ≥ 0 }. The boundary points of
F̄_P are [13.3 0]^T and [0 1.1]^T. Note that inside F̄_P there is enough room to make a step of length
1 starting at 1 = [1 1]^T in any direction inside F̄_P.
As argued in the previous subsection, the point 1 is the current interior point of (P̄), and
X_k^{−1}x(µ_k) is the current point on the interior path in F̄_P. The logarithmic barrier function
for (P̄) is

  B_P̄(x; µ_k) = (1/µ_k)(X_k c)^T x − Σ_{i=1}^{n} ln(x_i),
Figure 6.7: The vector −∇B_P̄(1; µ_k) points 'more or less' in the direction of X_k^{−1}x(µ_k).
Let p(1; µ_k) be the search direction for the next interior point u in F̄_P when starting at 1;
i.e., u = 1 + p(1; µ_k). In the next section, we will determine the length of p(1; µ_k) such
that u is actually an interior point of F̄_P. Since u ∈ F̄_P, it follows that AX_k(u − 1) =
AX_k p(1; µ_k) = 0, and so p(1; µ_k) has to be in the null space N(AX_k); see Section
6.2.2. This can be accomplished by taking the projection of −∇B_P̄(1; µ_k) onto the null
space of AX_k. So, the search direction in the scaled model (P̄) becomes:

  p(1; µ_k) = P_{AX_k}(−∇B_P̄(1; µ_k)) = P_{AX_k}(1 − (1/µ_k)X_k c).

The search direction p(1; µ_k) corresponds to the search direction X_k p(1; µ_k) for the
original model, where x_k is the current interior point, and µ_k is the current value of the
interior path parameter. Define:

  p(x_k; µ_k) = X_k p(1; µ_k).

Note that p(x_k; µ_k) is in the null space N(A). It now follows that the search direction for
the unscaled model is:

  p(x_k; µ_k) = X_k P_{AX_k}(1 − (1/µ_k)X_k c),

and the next point x_{k+1} is chosen to be x_{k+1} = x_k + p(x_k; µ_k); see Figure 6.3.
Example 6.2.3. Consider again the data used for Figure 6.1 in Section 6.1. In order to apply
Dikin’s affine scaling procedure, we first need the (P)-formulation of model (A), namely:
− min − x1 − 2x2
s.t. x1 + x3 = 1
x2 + x4 = 1
x1 , x2 , x3 , x4 ≥ 0.
Take x_k = [0.4 0.8 0.6 0.2]^T, and µ_k = 2. After scaling with respect to x_k, the
projection matrix becomes:

  P_{AX_k} = [  0.69    0    −0.46    0   ;
                0     0.06     0    −0.24 ;
              −0.46     0     0.31    0   ;
                0    −0.24     0     0.94 ],

and therefore

  p(x_k; µ_k) = X_k P_{AX_k}(1 − (1/µ_k)X_k c) = [0.15 −0.10 −0.14 0.10]^T.

The projection of p(x_k; µ_k) on the X_1OX_2-plane is the vector [0.15 −0.10]^T. Hence, the new
point x_{k+1} satisfies:

  x_{k+1} = x_k + p(x_k; µ_k), which in the X_1OX_2-plane reads [0.4 0.8]^T + [0.15 −0.10]^T = [0.55 0.70]^T.
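The numbers in this example can be reproduced mechanically with a few lines of numpy (an illustration; the third component of p(x_k; µ_k) comes out as ≈ −0.148, which the text prints rounded):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
c = np.array([-1.0, -2.0, 0.0, 0.0])      # objective of the '- min' form
xk = np.array([0.4, 0.8, 0.6, 0.2])
mu = 2.0

Xk = np.diag(xk)
M = A @ Xk
P = np.eye(4) - M.T @ np.linalg.inv(M @ M.T) @ M    # P_{AX_k}
p = Xk @ (P @ (np.ones(4) - Xk @ c / mu))           # search direction

print(np.round(P, 2))       # the projection matrix of the example
print(np.round(p, 2))       # approximately [0.15, -0.10, -0.15, 0.10]
print(np.round((xk + p)[:2], 2))
```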
The entity δ(x; µ) is called the convergence measure of the interior path algorithm. It plays
a key role in proving both the correctness of the algorithm (meaning that at each iteration
of the algorithm the feasibility of the generated points is preserved, and that these points
converge to an optimal solution), and the polynomial running time of the algorithm.
Proof of Theorem 6.3.1. For any interior point x in F_P and any interior path parameter µ,
it holds that:

  min_{y,w} { ‖Xw/µ − 1‖ | A^T y + w = c }
    = min_y ‖(X/µ)(c − A^T y) − 1‖
    = min_y ‖(Xc/µ − 1) − (AX)^T(y/µ)‖
    = ‖P_{AX}(Xc/µ − 1)‖ = δ(x, µ).           (see Section 6.2.2)

This proves the first part. For the second part, take any x > 0 with Ax = b. Suppose first
that x is on the interior path, i.e., x = x(µ) for some µ > 0. Let y(µ) and w(µ) satisfy (PDµ).
Then Xw(µ) = µ1 implies that δ(x, µ) = 0, as required. Next, assume that δ(x, µ) = 0. Let
y(x, µ) and w(x, µ) be minimizing values of y and w in the above formula for δ(x, µ); i.e.,
δ(x, µ) = ‖Xw(x, µ)/µ − 1‖ = 0, and A^T y(x, µ) + w(x, µ) = c. Hence, Xw(x, µ) = µ1,
or w(x, µ) = µX^{−1}1 > 0. Therefore, x, y(x, µ), and w(x, µ) satisfy the relaxed
Karush-Kuhn-Tucker conditions (PDµ). Hence, x = x(µ). This proves the theorem.
For w(x, µ) and y(x, µ) being minimizing values satisfying w(x, µ) = c − A^T y(x, µ)
in the formula for δ(x, µ) in Theorem 6.3.1, we define:

  s(x, µ) = (1/µ)Xw(x, µ).
Proof of Theorem 6.3.2. We first show that x_{k+1} = x_k + p(x_k, µ) is an interior point of
F_P, provided that x_k is an interior point of F_P. Since p(x_k, µ) = X_k P_{AX_k}(1 − X_k c/µ), and
P_{AX_k}(1 − X_k c/µ) is in N(AX_k), it follows directly that Ax_{k+1} = b. So, we only need to
show that x_{k+1} > 0. Clearly, X_k^{−1}p(x_k, µ) = P_{AX_k}(1 − X_k c/µ). Since
‖p(1, µ)‖ = ‖P_{AX_k}(X_k c/µ − 1)‖ = δ(x_k, µ) < 1, and B_1(1) ⊂ F̄_P^+ (see Section 6.2.4), it follows that u = 1 + p(1, µ) is an interior point
of F̄_P. Hence, u > 0. Moreover, since x_k > 0, it follows that x_{k+1} = X_k u > 0. Therefore,
x_{k+1} is in fact an interior point of F_P.
We now show that δ(x_k, µ) < 1 for each k = 1, 2, 3, …, provided that δ(x_0, µ) < 1. To that
end, we show that:

  δ(x_{k+1}, µ) ≤ (δ(x_k, µ))².

For brevity, we will write s_k = s(x_k, µ); S_k denotes the diagonal matrix associated with s_k.
Then, it follows that

  δ(x_{k+1}, µ) = ‖(1/µ)X_{k+1}w(x_{k+1}, µ) − 1‖
    ≤ ‖(1/µ)X_{k+1}w(x_k, µ) − 1‖
    = ‖X_{k+1}X_k^{−1}s_k − 1‖.
Figure 6.8: Approximations of interior paths. The path denoted (∗) refers to Model Dovetail, and path
(∗∗) to the same model without the constraint x1 ≤ 7.
Hence, lim_{k→∞} δ(x_k, µ) = 0. Since δ(x, µ) = 0 if and only if x = x(µ) (see Theorem 6.3.1),
it follows that the sequence {x_k} converges to x(µ) as k → ∞.
Theorem 6.3.2 can be used to computationally approximate the interior path of an LO-
model. We can do this by choosing various initial interior points x0 and various values of
µ, and approximately calculating x(µ). For example, Figure 6.8 shows two interior paths
for Model Dovetail (see Section 1.1). The path denoted (∗) refers to Model Dovetail, and
path (∗∗) to the same model without the constraint x1 ≤ 7. It is left to the reader to repeat
some of the calculations for different choices of x0 and µ. In Exercise 6.5.16, the reader is
asked to draw the interior paths for different objective vectors, and for different right hand
side vectors.
Proof of Theorem 6.3.3. Let x_k, µ_k be the current pair, and x_{k+1}, µ_{k+1} the next pair
defined as before, with θ = 1/(6√n). Assuming that δ(x_k, µ_k) ≤ 1/2, we will show that
δ(x_{k+1}, µ_{k+1}) ≤ 1/2 as well.
Theorem 6.3.3 shows that if we choose the initial pair x_0, µ_0 such that δ(x_0, µ_0) ≤ 1/2 and
θ = 1/(6√n), then the value of δ(x_k, µ_k) remains ≤ 1/2 at each iteration step, and so at the end
of each iteration step the algorithm is actually back in its initial situation. The algorithm
proceeds as long as nµ_k > e^{−t}, where t is the accuracy parameter (see Section 6.2); this
will be explained in the next section.
We can now summarize the above discussions by formulating the interior path algorithm.
The expression for the dual slack variables wk follows from Xk wk = µ1, and the expression
for the dual decision variables yk will be explained in Section 6.4.1.
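Putting the pieces together — the search direction from Section 6.2, the update µ_{k+1} = (1 − θ)µ_k, and the stopping rule nµ_k ≤ e^{−t} — gives the following sketch of the interior path algorithm. It is a simplified illustration, not the book's formal statement: the step is damped whenever it would leave the positive orthant, and the initial pair is chosen by hand rather than by the initialization procedure of Section 6.4.2.

```python
import math
import numpy as np

def interior_path(A, b, c, x0, mu0, t=8.0):
    """Approximately solve min{c^T x | Ax = b, x >= 0} from an interior point x0."""
    n = len(c)
    theta = 1.0 / (6.0 * math.sqrt(n))     # updating factor
    x, mu = x0.astype(float), mu0
    while n * mu > math.exp(-t):           # stop when the gap n*mu is small
        Xk = np.diag(x)
        M = A @ Xk
        P = np.eye(n) - M.T @ np.linalg.inv(M @ M.T) @ M    # P_{AX_k}
        p = Xk @ (P @ (np.ones(n) - Xk @ c / mu))           # search direction
        # damping safeguard (not part of the theoretical algorithm):
        # shrink the step if it would leave the positive orthant
        alpha = 1.0
        neg = p < 0
        if neg.any():
            alpha = min(1.0, 0.9 * np.min(-x[neg] / p[neg]))
        x = x + alpha * p
        mu = (1.0 - theta) * mu
    return x

# the model of Example 6.1.2 in standard form (slacks x3, x4)
A = np.array([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([-1.0, -2.0, 0.0, 0.0])
x = interior_path(A, b, c, x0=np.array([0.5, 0.5, 0.5, 0.5]), mu0=1.0)
print(np.round(x[:2], 3))    # close to the optimal vertex (1, 1)
```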
6.4. Termination and initialization
Proof of Theorem 6.4.1. From Theorem 6.3.3, it follows that δ(x_k, µ_k) < 1 for each
k = 1, 2, …. Hence, δ(x_k, µ_k) = ‖s(x_k, µ_k) − 1‖ < 1. Define s_k = s(x_k, µ_k). It then follows
that s_k ≥ 0, because if, to the contrary, s_k ≤ 0, then ‖s_k − 1‖ = ‖−s_k + 1‖ ≥ ‖1‖ = √n > 1,
which is clearly false. Since x_k > 0 and µ_k > 0, it follows that c − A^T y_k = w_k = µ_k X_k^{−1}s_k ≥ 0
and, therefore, y_k is dual feasible. In order to show the inequality of the theorem, we apply the
Cauchy-Schwarz inequality (‖a‖‖b‖ ≥ |a^T b|, see Appendix B):

  √n δ(x_k, µ_k) = ‖1‖ ‖(1/µ_k)X_k w_k − 1‖ ≥ |1^T((1/µ_k)X_k w_k − 1)| = |x_k^T w_k/µ_k − n|.

Since µ_k > 0, it follows that |x_k^T w_k − nµ_k| ≤ µ_k √n δ(x_k, µ_k). Recall that c^T x_k − b^T y_k =
(A^T y_k + w_k)^T x_k − b^T y_k = w_k^T x_k = x_k^T w_k. This proves the theorem.
Let (x0, µ0) be an initial pair for the interior path algorithm such that δ(x0, µ0) ≤ 1/2, and θ = 1/(6√n). Moreover, let (x0, µ0), (x1, µ1), (x2, µ2), . . . be a sequence of pairs of interior points and values of the interior path parameter generated in successive steps of the interior path algorithm. Since lim_{k→∞} µk = lim_{k→∞} (1 − θ)^k µ0 = 0, Theorem 6.4.1 implies that:

0 ≤ lim_{k→∞} |c^T xk − b^T yk − nµk| ≤ lim_{k→∞} (1/2) µk √n = 0.

Hence,

lim_{k→∞} |c^T xk − b^T yk − nµk| = 0.

This implies that, for large enough values of k, it holds that:

c^T xk − b^T yk ≈ nµk

('≈' means 'approximately equal to'), and so nµk is a good measure of the current duality gap, and hence of the distance to optimality. If t is the desired accuracy parameter, then the algorithm stops as soon as:

nµk ≤ e^{−t}.
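Since µk = (1 − θ)^k µ0, the stopping rule nµk ≤ e^{−t} also yields an explicit iteration bound of order √n (t + ln(nµ0)). The sketch below (with hypothetical parameter values, not an instance from the text) computes the exact count:

```python
import math

def iterations_needed(n, mu0, t, theta=None):
    """Smallest k such that n * (1 - theta)**k * mu0 <= exp(-t),
    using the short-step choice theta = 1/(6*sqrt(n)) by default."""
    if theta is None:
        theta = 1.0 / (6.0 * math.sqrt(n))
    k, mu = 0, mu0
    while n * mu > math.exp(-t):
        mu *= 1.0 - theta
        k += 1
    return k

# Hypothetical instance: n = 4 variables, mu0 = 1, accuracy t = 8.
print(iterations_needed(n=4, mu0=1.0, t=8.0))  # → 108
```

Doubling the accuracy t roughly doubles the count, which is the familiar O(√n · t) iteration behavior of short-step path-following methods.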
6.4.2 Initialization
The interior path algorithm starts with an initial interior point x0 and an initial value µ0
for the interior path parameter which satisfy δ(x0, µ0) ≤ 1/2. It is not immediately clear
how x0 and µ0 should be chosen. In this section we will present an auxiliary model, based
on the LO-model (P) that needs to be solved. For this auxiliary model, an initial solution
is readily available. Moreover, it has the property that, from any optimal solution of the
auxiliary model, an optimal solution of (P) can be derived.
The idea behind the initialization procedure is similar to the big-M procedure (see Section
3.6.1) that may be used to initialize the simplex algorithm (see Chapter 3). Similar to the
big-M procedure, the LO-model at hand is augmented with an artificial variable with a large
objective coefficient. The value of the objective coefficient is taken to be large enough to
guarantee that, at any optimal solution, the value of the artificial variable will be zero. A
complication compared to the big-M procedure, however, is the fact that such an artificial
variable needs to be introduced in both the primal and the dual models. So, an artificial
variable with large objective coefficient MP is introduced in the primal model and an artifi-
cial variable with large objective coefficient MD is introduced in the dual model. Since the
objective coefficients of the primal model appear in the right hand side values of the dual
model, this means that MD will appear in the right hand side values of the primal model.
It is not immediately clear how to construct suitable primal and dual models together with
constants MP and MD . We will show in the remainder of this section that the following
auxiliary primal and dual models may be used. Let α > 0. Define MP = α² and MD = α²(n + 1) − αc^T 1. Consider the following pair of primal and dual LO-models:
min cT x + MP xn+1
s.t. Ax + (b − αA1)xn+1 = b
(P0 )
(α1 − c)T x + αxn+2 = MD
x ≥ 0, xn+1 ≥ 0, xn+2 ≥ 0,
and
max bT y + MD ym+1
s.t. AT y + (α1 − c)ym+1 + w = c
(b − αA1)T y + wn+1 = MP
(D0 )
αym+1 + wn+2 =0
y, ym+1 free,
w ≥ 0, wn+1 ≥ 0, wn+2 ≥ 0.
We first show that an initial primal solution x0 (∈ Rn+2 ) of (P0 ) that lies on the central path
is readily available. The following theorem explicitly gives such an initial primal solution
along with an initial dual solution.
Theorem 6.4.2.
Let α > 0. Define x0 ∈ R^{n+2}, y0 ∈ R^{m+1}, and w0 ∈ R^{n+2} by:

x0 = [x̂  x̂n+1  x̂n+2]^T = [α1^T  1  α]^T,   y0 = [ŷ  ŷm+1]^T = [0^T  −1]^T,   and
w0 = [ŵ  ŵn+1  ŵn+2]^T = [α1^T  α²  α]^T.

Let µ0 = α². Then, x0 is a feasible solution of (P0), (y0, w0) is a feasible solution of (D0), and δ(x0, µ0) = 0.
Proof. Feasibility of x0 for (P0) and of (y0, w0) for the first two constraint sets of (D0) follows by substituting the definitions into the constraints; for the third dual constraint, αŷm+1 + ŵn+2 = −α + α = 0.
Finally, consider the complementary slackness relations:
x̂i ŵi = α × α = µ0, for i = 1, . . . , n,
x̂n+1 ŵn+1 = 1 × α² = µ0, and   (6.4)
x̂n+2 ŵn+2 = α × α = µ0.
Hence, X0 w0 = µ0 1. It follows from Theorem 6.3.1 that:

δ(x0, µ0) = min_{y,w} { ‖X0 w/µ0 − 1‖ : A^T y + w = c } ≤ ‖X0 w0/µ0 − 1‖ = 0.

Since δ(x0, µ0) ≥ 0, we have that δ(x0, µ0) = 0, as required.
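The claims of Theorem 6.4.2 are easy to check numerically. The sketch below uses an arbitrarily chosen small instance (the matrix A, vectors b and c, and the value α = 10 are illustrative, not from the text) and verifies both primal constraints of (P0) as well as the central-path relation X0 w0 = µ0 1:

```python
# Hypothetical small instance: m = 2 constraints, n = 3 variables.
A = [[1.0, 2.0, 1.0],
     [2.0, 1.0, 3.0]]
b = [4.0, 5.0]
c = [1.0, 2.0, 3.0]
alpha = 10.0
m, n = len(A), len(c)

MP = alpha ** 2
MD = alpha ** 2 * (n + 1) - alpha * sum(c)

# Initial pair of Theorem 6.4.2.
x0 = [alpha] * n + [1.0] + [alpha]         # (alpha*1, 1, alpha)
w0 = [alpha] * n + [alpha ** 2] + [alpha]  # (alpha*1, alpha^2, alpha)
mu0 = alpha ** 2

# First primal constraint of (P0): A x + (b - alpha*A*1) x_{n+1} = b.
A1 = [sum(row) for row in A]               # A times the all-ones vector
for i in range(m):
    lhs = sum(A[i][j] * x0[j] for j in range(n)) + (b[i] - alpha * A1[i]) * x0[n]
    assert abs(lhs - b[i]) < 1e-9

# Second primal constraint: (alpha*1 - c)^T x + alpha x_{n+2} = MD.
lhs = sum((alpha - c[j]) * x0[j] for j in range(n)) + alpha * x0[n + 1]
assert abs(lhs - MD) < 1e-9

# X0 w0 = mu0 * 1, so the pair lies on the central path and delta = 0.
assert all(abs(x0[j] * w0[j] - mu0) < 1e-9 for j in range(n + 2))
print("initial pair of Theorem 6.4.2 verified")
```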
So, we have constructed auxiliary LO-models for which it is straightforward to find an initial
pair of primal and dual solutions that lie on the central path. What remains to show is that,
once an optimal solution of (P0 ) has been found, and the artificial primal variable xn+1 and
dual variable ym+1 have optimal value zero, then this optimal solution can be turned into
an optimal solution of the original LO-model.
Theorem 6.4.3.
Consider the primal model (P): min{c^T x | Ax = b, x ≥ 0}, and the corresponding dual model (D): max{b^T y | A^T y + w = c, w ≥ 0}. Let

x0∗ = [x∗  x∗n+1  x∗n+2]^T,   and   y0∗ = [y∗  y∗m+1]^T,   w0∗ = [w∗  w∗n+1  w∗n+2]^T

be optimal solutions of (P0) and (D0), respectively, satisfying x∗n+1 = 0 and y∗m+1 = 0. Then x∗ and (y∗, w∗) are optimal solutions of (P) and (D), respectively.
Proof. Because x∗n+1 = 0, we have that Ax∗ = b, x∗ ≥ 0, and hence x∗ is a feasible solution of the original primal model (P). Similarly, because y∗m+1 = 0, we have that A^T y∗ + w∗ = c and w∗ ≥ 0, and hence (y∗, w∗) is a feasible solution of the original dual model (D). Because x0∗, y0∗, and w0∗ are optimal for (P0) and (D0), respectively, it follows from the complementary slackness relations (6.4) that x∗i w∗i = 0 for i = 1, . . . , n. Hence, it follows from the complementary slackness relations for (P) and (D) that x∗ and y∗ are optimal solutions of (P) and (D), respectively.
Figure 6.9: The (negative of the) optimal objective value of the auxiliary LO-model for Model Dovetail, as a function of α.
Figure 6.10: Path taken by the interior path algorithm for the auxiliary LO-model for Model Dovetail with α = 2.5.
Since x3∗ = 0 and y5∗ = 0, it follows from Theorem 6.4.3 that these solutions correspond to optimal
solutions of Model Dovetail and its dual model. Figure 6.9 illustrates how the optimal objective value
of the auxiliary model depends on α. It can be checked that, for α ≥ 2.45, the optimal solution of
the auxiliary model coincides with the optimal solution of Model Dovetail. Figure 6.10 shows the path
taken by the interior path algorithm when applied to the auxiliary problem with α = 2.5.
The following theorem shows that it is possible to take α sufficiently large so as to guarantee
that any pair of optimal solutions of the auxiliary LO-models satisfies x∗n+1 = 0 and y∗m+1 = 0.
Theorem 6.4.4.
Consider the primal model (P): min{c^T x | Ax = b, x ≥ 0} and its dual model (D): max{b^T y | A^T y + w = c, w ≥ 0}. Assume that both models have optimal vertices, and let z∗ be their optimal objective value. For large enough values of α > 0,
(i) the models (P0) and (D0) have optimal objective value z∗ < α, and
(ii) any pair of optimal primal and dual solutions

[x∗  x∗n+1  x∗n+2]^T,   and   [y∗  y∗m+1]^T,   [w∗  w∗n+1  w∗n+2]^T

of (P0) and (D0) satisfies x∗n+1 = 0 and y∗m+1 = 0.
Proof. We will first show that, for every vertex x̂ (∈ R^n) of the feasible region of (P) and every vertex ŷ (∈ R^m) of the feasible region of (D), it holds that:

MP > (α1 − c)^T x̂,   and   MD > (b − αA1)^T ŷ.   (6.6)

To show this, note that, by the definition of α, we have that:

(α1 − c)^T x̂ ≤ α|1^T x̂| + |c^T x̂| < α(α − 1) + α − 1 = α² − 1 < α² = MP.

Similarly, we have that:

(b − αA1)^T ŷ ≤ |b^T ŷ| + α|(A1)^T ŷ| < α² + α² ≤ α² + α²(n − 1)
   = α²(n + 1) − α² < α²(n + 1) − αc^T 1 = MD.

This proves (6.6).
Now, let x∗ be an optimal vertex of (P), and let (y∗, w∗) be an optimal vertex of (D). Define:

x̂ = [x∗  0  x̂n+2]^T,   ŷ = [y∗  0]^T,   and   ŵ = [w∗  ŵn+1  0]^T,

where x̂n+2 = (MD − (α1 − c)^T x∗)/α and ŵn+1 = MP − (b − αA1)^T y∗ are exactly the values that make the second constraint of (P0) and the second constraint of (D0) hold. Note that, using (6.6) and the definition of α, we have that x̂n+2 > 0 and ŵn+1 > 0. It is easy to check feasibility, and optimality follows from the complementary slackness relations. Hence, x̂ and (ŷ, ŵ) are optimal solutions of (P0) and (D0), respectively. Moreover, it can be seen that z∗ < α.
(ii) It suffices to show that w∗n+1 > 0 and x∗n+2 > 0. To see that this is sufficient, suppose that w∗n+1 > 0 and x∗n+2 > 0. The complementary slackness relation x∗n+1 w∗n+1 = 0 immediately implies that x∗n+1 = 0. Similarly, the complementary slackness relation x∗n+2 w∗n+2 = 0 implies that w∗n+2 = 0 and, consequently, the dual constraint αym+1 + wn+2 = 0 implies that y∗m+1 = 0.

We first prove that w∗n+1 > 0. Suppose for a contradiction that w∗n+1 = 0. In that case, the dual constraint (b − αA1)^T y + wn+1 = MP implies that (b − αA1)^T y∗ = MP, and therefore:

b^T y∗ = MP + α(A1)^T y∗ = α² + α(A1)^T y∗ ≥ α² − α|(A1)^T y∗| ≥ α² − α(α − 1) = α.

Hence, the objective value of the dual auxiliary model satisfies z∗ = b^T y∗ + MD y∗m+1 ≥ α, a contradiction. So, we have shown that w∗n+1 > 0.

Next, we prove that x∗n+2 > 0. Suppose for a contradiction that x∗n+2 = 0. In that case, the primal constraint (α1 − c)^T x + αxn+2 = MD implies that (α1 − c)^T x∗ = MD, and therefore:

c^T x∗ = MD − α1^T x∗ = α²(n + 1) − αc^T 1 − α1^T x∗
   ≥ α²(n + 1) − α|c^T 1| − α|1^T x∗| ≥ α²(n + 1) − α² − α² = α²(n − 1) ≥ α².

Hence, the objective value of the primal auxiliary model satisfies z∗ = c^T x∗ + MP x∗n+1 ≥ α², a contradiction. So, we have shown that x∗n+2 > 0. This proves the theorem.
Note that we have shown that the initialization procedure is theoretically guaranteed to
work for any value of α that is larger than the right hand side of (6.5). In practice, however,
much lower values of α suffice.
6.5 Exercises
Exercise 6.5.1. There are several ways to formulate an LO-model in either the form (P)
or in the form (D); see Section 6.1.1. Consider for instance the LO-model:
min x1 − 2x2
s.t. x1 ≤1
−x1 + x2 ≤ 1
x1 , x2 ≥ 0.
(D)-formulation:
−max −x1 + 2x2
s.t.  x1 ≤ 1
      −x1 ≤ 0
      −x1 + x2 ≤ 1
      −x2 ≤ 0
      x1, x2 free.

(P)-formulation:
−min y1 + y3
s.t.  y1 − y2 − y3 = −1
      y3 − y4 = 2
      y1, y2, y3, y4 ≥ 0.

(P)-formulation:
min x1 − 2x2
s.t.  x1 + x3 = 1
      −x1 + x2 + x4 = 1
      x1, x2, x3, x4 ≥ 0.

(D)-formulation:
max y1 + y2
s.t.  y1 − y2 ≤ 1
      y2 ≤ −2
      y1 ≤ 0
      y2 ≤ 0
      y1, y2 free.
The number of variables of the above (P), (D) combinations can be determined as follows.
For the first (P), (D) combination: the (D)-formulation has two decision variables and
four slack variables; the (P)-formulation has four decision variables. Hence, the number of
variables is (4 + 6 =) 10.
For the second (P), (D) combination: the (P)-formulation has four decision variables; the
(D)-formulation has two decision plus four slack variables. So, there are (4 + 6 =) 10
variables.
Determine (P)- and (D)-formulations with the smallest total number of variables (decision
variables plus slack variables) for the following LO-models:
Exercise 6.5.2. Determine the following entities for both a (P)- and a (D)-formulation
with the smallest number of variables of the models formulated in Exercise 6.5.1.
(a) The sets of interior points FP+ and FD+ (as defined in Section 6.1); if the sets are
nonempty, determine an interior point.
(b) The Karush-Kuhn-Tucker conditions.
(c) The logarithmic barrier functions.
Exercise 6.5.3. Determine the interior paths for a (P)- and a (D)-formulation with the
smallest number of variables of the models of Exercise 6.5.1(b),(d)–(g); draw the interior
paths for the cases (e), (f), and (g).
Exercise 6.5.4. Calculate, for each of the models of Exercise 6.5.1(b),(d)–(g), the duality
gap as a function of µ (see Theorem 6.1.3), and draw the functions cT x(µ) and bT y(µ) in
one figure. Use the interior paths to obtain optimal solutions.
Exercise 6.5.5. Let x(µ) denote the interior path of the primal model (P), and suppose that the primal feasible region FP is bounded. Then xP = lim_{µ→∞} x(µ) is called the analytic center of FP. The analytic center of the dual feasible region is defined similarly.
(a) Show that xP ∈ FP .
(b) Show that xP = argmin{ −∑_{i=1}^n ln(xi) | x = [x1 . . . xn]^T ∈ FP }.
(c) Calculate, if they exist, the analytic centers of the feasible regions of the models of
Exercise 6.5.1(b)–(g).
Exercise 6.5.6. Draw the graph of BP(x1, x2; µ) = (1/µ)(x1 + 2x2) − ln(x1 x2) for several
values of the parameter µ > 0; compare Figure 6.2.
Exercise 6.5.7.
(a) Calculate PA (see Section 6.2.2), where A is the matrix corresponding to a (P)-formulation with the smallest number of variables of the models of Exercise 6.5.1.
(b) Draw the sets N(A) and R(A) for the case A = [1 −1]. Calculate, and draw in one figure, the points [0 1]^T and PA([0 1]^T).
Exercise 6.5.8. Apply Dikin’s affine scaling procedure with respect to an interior point x̂
to the following LO-models; show that the unit ball (circle in R2 , line segment in R) with
center 1 and radius 1 is completely contained in the transformed feasible region (compare
the example in Section 6.2.3).
Exercise 6.5.11. Consider model (A) in Section 6.1.2; see also Figure 6.1. Calculate the tangent at the point [0.4 0.8]^T of the level curve of BP([x1 x2 1−x1 1−x2]^T; 2) through the point [0.4 0.8]^T. Draw this tangent in Figure 6.1. Explain why 'starting at [0.4 0.8]^T and moving in the direction of the vector [0.15 −0.10]^T' is moving in the direction of the interior path. Answer the same questions for the points [0.1 0.2]^T and [0.5 0.4]^T.
Exercise 6.5.12. Consider model (A) in Section 6.1.2; see also Figure 6.1. Determine the search direction when x0 = [0.55 0.70]^T is taken as initial interior point and µ0 = 1.
Exercise 6.5.16. Draw the interior paths in case the right hand side values and objective
coefficients in Model Dovetail are perturbed. Give special attention to the cases where the
objective function is parallel to a constraint.
Chapter 7
Integer linear optimization
Overview
An integer linear optimization model (notation: ILO-model) is a linear optimization model in
which the values of the decision variables are restricted to be integers. The reason for
investigating integer optimization models is that many practical problems require integer-
valued solutions. Solving practical integer optimization models is usually very complicated,
and solution techniques usually require excessive amounts of (computer) calculations. We
give several examples of ILO-models, and we introduce binary variables, which can be used to
model certain nonlinear constraints and objective functions, and logical if-then constraints.
The most widely used solution technique for solving ILO-models is the so-called branch-
and-bound algorithm: a solution procedure that creates a ‘tree’ of (noninteger) LO-models
whose solutions ‘grow’ to an optimal integer solution of the initial model. The branch-and-
bound algorithm is applied to a knapsack problem, a machine scheduling problem (traveling
salesman problem), and a decentralization problem. We also discuss the cutting plane algorithm
discovered by Ralph E. Gomory (born 1929) for solving mixed integer linear optimization
models (MILO-models), in which some of the decision variables are constrained to be integer-
valued and the others are allowed to take arbitrary (nonnegative real) values.
7.1 Introduction
In this section we introduce a prototype example which will be used to explain and illustrate
the new concepts and techniques. We then introduce the standard forms of ILO- and
MILO-models, and the corresponding terminology. A round-off procedure is introduced
for complicated situations in which a quick solution is demanded (see also Chapter 17). We
will see that round-off procedures may lead to suboptimal or even infeasible outputs, and so
more sophisticated techniques are needed.
Of course, the demand for cheese and milk fluctuates over time. The management of Dairy
Corp has decided that the total vehicle capacity to be purchased should not exceed the
minimum daily demand, which is 2,425 crates of cheese and 510 crates of milk. Therefore
the constraints are:
100x1 + 50x2 ≤ 2425
20x2 ≤ 510.
Since it does not make sense to buy parts of vehicles, the variables x1 and x2 have to be
integer-valued. So the model becomes:
The variables of Model Dairy Corp are restricted to be integer-valued. For this reason, the
model is an example of an ILO-model. In Figure 7.1 the feasible region of Model Dairy
Corp is depicted. Note that the feasible region consists of points with integer coordinate
values, called integer points. The problem is to find a feasible integer point that maximizes
the objective function 1000x1 + 700x2 ; compare Section 1.2.3. This model can be solved
by means of the branch-and-bound algorithm, to be formulated and discussed in Section
7.2.
Figure 7.1: The feasible region of Model Dairy Corp.
7.1.2 ILO-models
In general, the standard form of an ILO-model is:
max{c^T x | Ax ≤ b, x ≥ 0, x integer},
7.1.3 MILO-models
The general form of a maximizing MILO-model can be written as follows:
Figure 7.2: Different solutions found by rounding the optimal LO-relaxation solution. Point (a) is the
optimal LO-relaxation solution, point (b) is the rounded-down solution, point (c) is the
rounded-up solution, and point (d) is the optimal ILO-solution.
The feasible region of this model and its LO-relaxation are shown in Figure 7.2. The shaded area is
the feasible region of the LO-relaxation and the stars represent the feasible points of the ILO-model.
(a) The optimal solution of the LO-relaxation is: x1 = 4 4/5, x2 = 3/5, with optimal objective value z = 11 2/5.
(b) Rounding down [4 4/5  3/5]^T to the closest feasible integer solution leads to x1 = 4, x2 = 0, with objective value z = 8.
(c) Rounding up [4 4/5  3/5]^T to the closest integer solution leads to x1 = 5, x2 = 1, which is infeasible.
Hence, rounding may lead to a large deviation of the optimal solution. If, for instance, in
the above example the production is in batches of 1,000 units, then the difference between
8,000 and 11,000 may be far too large.
Notice that, in the example, the optimal solution [4 1]^T can be obtained from the optimal
solution of the LO-relaxation by rounding down the value found for x1 and rounding up
the value found for x2 . This suggests that an optimal solution of an ILO-model can be found
by solving the corresponding LO-relaxation and appropriately choosing for each variable
whether to round its value up or down. Unfortunately this, in general, does not work
either. The reader is presented with an example in Exercise 7.5.2. The conclusion is that
more specific techniques are needed for solving ILO-models.
Figure 7.3: Region of Figure 7.1 that is excluded by requiring that x2 ≤ 25 or x2 ≥ 26.
x2 , is chosen to branch on. This means the following. Since there are no integers between
25 and 26, any optimal integer solution must satisfy either x2 ≤ 25, or x2 ≥ 26. The
feasible region is accordingly split into two parts by adding (a) x2 ≤ 25, and (b) x2 ≥ 26,
respectively, to the current model, resulting in submodels M2 and M3, respectively. So the
region determined by 25 < x2 < 26, and hence also the solution x1 = 11 1/2, x2 = 25 1/2,
is excluded from the original feasible region. This excluded (shaded) region is shown in
Figure 7.3.
In the next step of the branch-and-bound algorithm, the following two LO-models are
solved:
M1: z = 29,350; x1 = 11 1/2, x2 = 25 1/2
├─ x2 ≤ 25 → M2: z = 29,250; x1 = 11 3/4, x2 = 25
│   ├─ x1 ≤ 11 → M4∗: z = 28,500; x1 = 11, x2 = 25
│   └─ x1 ≥ 12 → M5: z = 29,150; x1 = 12, x2 = 24 1/2
└─ x2 ≥ 26 → M3: infeasible (z = −∞)
Figure 7.4: The first iterations of the branch-and-bound algorithm for Model Dairy Corp. The models
marked with a star have integer optimal solutions.
branching on M4; therefore, the process is not continued here. However, model M5 (with
x1 = 12, x2 = 24 1/2) may give rise – by further branching – to a value of z that is larger
than the current largest one (namely, z = 28,500). Therefore, we continue by branching
on model M5.
In Figure 7.5, the full branch-and-bound tree of Model Dairy Corp is shown. In model M8,
we find a new candidate integer solution, namely x1 = 12 and x2 = 24, with z = 28,800,
which is better than the integer solution found in model M4. This solution gives the lower
bound 28,800, i.e., z ∗ ≥ 28,800. Notice that the optimal solution found for model M9
is not integer yet. So we might consider branching on it. However, branching on model
M9 will only lead to models that are either infeasible, or have optimal objective values at
most 28,750. So, it does not make sense to continue branching on model M9. Since
there are no models left to branch on, the branch-and-bound algorithm stops here. The
optimal solution of Model Dairy Corp is therefore the one found in model M8, namely
z ∗ = 28,800, x∗1 = 12, and x∗2 = 24.
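Because Model Dairy Corp has only a few hundred feasible integer points, the branch-and-bound outcome can be cross-checked by brute-force enumeration (a sanity check, not a replacement for the algorithm):

```python
# max 1000*x1 + 700*x2  s.t.  100*x1 + 50*x2 <= 2425,
#                             20*x2 <= 510,  x1, x2 >= 0 integer.
best = None
for x1 in range(25):            # 100*x1 <= 2425 implies x1 <= 24
    for x2 in range(26):        # 20*x2 <= 510 implies x2 <= 25
        if 100 * x1 + 50 * x2 <= 2425:
            z = 1000 * x1 + 700 * x2
            if best is None or z > best[0]:
                best = (z, x1, x2)
print(best)  # → (28800, 12, 24), matching the branch-and-bound result
```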
The branch-and-bound algorithm for ILO-models can easily be modified for MILO-models.
Namely, branching should only be carried out on variables that are required to be integer-
valued. Also for a solution of a submodel to be a candidate solution, it need only assign
integer values to those variables that are required to be integer-valued.
Example 7.2.1. Consider the following MILO-model:
max x1 + 2x2
s.t. x1 + x2 ≤ 3
2x1 + 5x2 ≤ 8
x1 , x2 ≥ 0, x2 integer.
We ask for an optimal solution for which x2 is the only integer-valued variable. First, the LO-
relaxation of this model is solved: x1 = 2 1/3, x2 = 2/3, z = 3 2/3. Since x2 is the only variable that
M1: z = 29,350; x1 = 11 1/2, x2 = 25 1/2
├─ x2 ≤ 25 → M2: z = 29,250; x1 = 11 3/4, x2 = 25
│   ├─ x1 ≤ 11 → M4∗: z = 28,500; x1 = 11, x2 = 25
│   └─ x1 ≥ 12 → M5: z = 29,150; x1 = 12, x2 = 24 1/2
│       ├─ x2 ≤ 24 → M6: z = 29,050; x1 = 12 1/2, x2 = 24
│       │   ├─ x1 ≤ 12 → M8∗: z = 28,800; x1 = 12, x2 = 24
│       │   └─ x1 ≥ 13 → M9: z = 28,750; x1 = 13, x2 = 22 1/2
│       └─ x2 ≥ 25 → M7: infeasible (z = −∞)
└─ x2 ≥ 26 → M3: infeasible (z = −∞)
Figure 7.5: Branch-and-bound tree for Model Dairy Corp. The models marked with a star have integer
optimal solutions.
has to be an integer, we are forced to branch on x2. This yields one submodel with x2 ≤ 0 (that is, x2 = 0), and one with x2 ≥ 1. Next, we choose to solve the model on the 'x2 ≤ 0'-branch; the optimal solution is: x1 = 3, x2 = 0, z = 3. The LO-relaxation of the submodel on the 'x2 ≥ 1'-branch has the optimal solution: x1 = 1 1/2, x2 = 1, z = 3 1/2. The submodel with solution z = 3 can be excluded from consideration, because the other submodel already has a larger z-value, namely z = 3 1/2. Therefore, the optimal solution is x∗1 = 1 1/2, x∗2 = 1, z∗ = 3 1/2. It is left to the reader to draw the corresponding branch-and-bound tree.
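The outcome of Example 7.2.1 can also be verified directly: since 5x2 ≤ 8 forces x2 ∈ {0, 1}, one may enumerate the integer variable and, for each value, take the largest x1 allowed by the two constraints (the objective is increasing in x1):

```python
# max x1 + 2*x2  s.t.  x1 + x2 <= 3,  2*x1 + 5*x2 <= 8,
# x1 >= 0 continuous, x2 >= 0 integer.
best = None
for x2 in range(2):                      # 5*x2 <= 8 implies x2 <= 1
    x1 = min(3 - x2, (8 - 5 * x2) / 2)   # largest feasible x1 for this x2
    if x1 >= 0:
        z = x1 + 2 * x2
        if best is None or z > best[0]:
            best = (z, x1, x2)
print(best)  # → (3.5, 1.5, 1), as found by branch-and-bound
```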
subnode of Mk . The algorithm also keeps track of upper and lower bounds on the optimal
objective value z ∗ of the original (M)ILO–model.
In the case of Model Dairy Corp, the relaxation concerned removing the restriction that
the variables should be integer-valued. In general, the relaxation should be chosen such that
the new model is easier to solve than the original problem.
We distinguish the following phases of a branch-and-bound iteration.
I Node selection phase. Select a node Mk (k ≥ 1) of the branch-and-bound tree that has
not been selected so far. In the case of Model Dairy Corp, we chose to always select the
most recently created unselected node. In general there are a number of different node
selection rules. We mention three possibilities; see also Exercise 7.5.5:
I Backtracking. This is a procedure where branching is performed on the most recently
created unselected node of the branch-and-bound tree. Backtracking is a so-called
LIFO search procedure (last-in-first-out search procedure). Note that the above solution
procedure for Model Dairy Corp uses backtracking.
I A FIFO search procedure (first-in-first-out search procedure) continues the branching
process with the unselected node that was created earliest.
I Jumptracking. Here, branching is performed on the branch with the current best (finite)
objective value. Jumptracking is a form of a so-called best-first search procedure.
I Bounding phase. The optimal objective value zj of the submodel corresponding to sub-
node Mj of Mk is at most the optimal objective zk of the submodel corresponding to
node Mk . This means that, if we select node Mj in the node selection phase, and the
current lower bound is greater than zk , then the submodel corresponding to node Mj
can be excluded from further consideration, because we have already found a solution
that is better than the best that we can obtain from this submodel. In that case, we return
to the node selection phase, and select a different node (if possible). In the case of Model
Dairy Corp, we used LO-relaxations to determine these upper bounds.
I Solution phase. The submodel corresponding to node Mk (which was selected in the
node selection phase) is solved. In Model Dairy Corp, this entails solving the LO-model
using a general purpose linear optimization algorithm like the simplex algorithm or an
interior point algorithm. Sometimes, however, there is a simpler algorithm to solve the
submodel (see, for instance, Section 7.2.3). If the solution satisfies the constraints of the
original model (e.g., the integrality constraints in the case of Model Dairy Corp), then
we have found a candidate solution for the original model, along with a lower bound for
the optimal value z ∗ of the original (M)ILO-model. In that case, we return to the node
selection phase. Otherwise, we continue with the branching phase.
I Branching phase. Based on the solution found in the solution phase, the feasible region of
the submodel corresponding to node Mk is partitioned into (usually) mutually exclusive
subsets, giving rise to two or more new submodels. Each of these submodels is repre-
sented by a new node that is added to the branch-and-bound tree and that is connected
to node Mk by an arc that points in the direction of the new node. In Model Dairy
288 C h a p t e r 7. I n t e g e r l i n e a r o p t i m i z at i o n
Determining the branch-and-bound tree in the case of (M)ILO-models uses these phases.
We may therefore formulate the branch-and-bound algorithm for any maximizing opti-
mization model to which the various phases can be applied. The following assumptions are
made. First, it is assumed that an algorithm is available for calculating optimal solutions of
the submodels. In the case of (M)ILO-models this can be the simplex algorithm, applied
to an LO-model of which the feasible region is a subset of the feasible region of the LO-
relaxation of the original model. Moreover, it is assumed that a branching rule is available.
The process is repeated until a feasible solution is determined whose objective value is no
smaller than the optimal objective value of any submodel.
The general form of the branch-and-bound algorithm for maximizing models can now be
formulated as follows. We use the set N S to hold the nodes that have not been selected in
the node selection phase; the variable zL denotes the current lower bound on the optimal
solution of the original model. The function f : F → R, where F is the feasible region,
is the objective function. The feasible region of the LO-relaxation of the original model
is denoted by F R . Optimal solutions of submodels are denoted by (z, x), where x is the
optimal solution of the submodel, and z is the corresponding objective value.
Input: Values for the parameters of the model max{f (x) | x ∈ F }, where F is
the feasible region.
Output: Either
(a) the message: the model has no optimal solution; or
(b) an optimal solution of the model.
I Step 0: Initialization. Define F0 = F R , N S = {0}, and zL = −∞.
I Step 1: Optimality test and stopping rule. If N S ≠ ∅, then go to Step 2. Stop the
procedure when N S = ∅; the current best solution is optimal. If there is
no current best solution, i.e., zL = −∞, then the original model has no
optimal solution.
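The three node selection rules differ only in the data structure holding the unselected nodes N S: backtracking uses a stack (LIFO), the FIFO rule a queue, and jumptracking a priority queue keyed on each node's bound. A minimal sketch, using a hypothetical tree whose bounds echo Figure 7.4:

```python
from collections import deque
import heapq

# Hypothetical branch-and-bound tree: node -> (bound, children).
tree = {
    "M1": (29350, ["M2", "M3"]),
    "M2": (29250, ["M4", "M5"]),
    "M3": (None, []),            # infeasible node: no bound, no children
    "M4": (28500, []),
    "M5": (29150, []),
}

def backtracking_order(tree, root):
    """LIFO: always expand the most recently created node."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(tree[node][1])   # children pushed; last child popped first
    return order

def fifo_order(tree, root):
    """FIFO: expand the node that was created earliest."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node][1])
    return order

def jumptracking_order(tree, root):
    """Best-first: expand the node with the best (largest) bound."""
    order, heap = [], [(-tree[root][0], root)]
    while heap:
        _, node = heapq.heappop(heap)
        order.append(node)
        for child in tree[node][1]:
            bound = tree[child][0]
            if bound is not None:     # skip infeasible children
                heapq.heappush(heap, (-bound, child))
    return order

print(backtracking_order(tree, "M1"))  # → ['M1', 'M3', 'M2', 'M5', 'M4']
print(jumptracking_order(tree, "M1"))  # → ['M1', 'M2', 'M5', 'M4']
```

Note that best-first never expands the infeasible node M3, while the LIFO and FIFO orders, which ignore bounds, visit it.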
7.2. The branch-and-bound algorithm 289
In the case of a minimizing model, the above described branch-and-bound algorithm can
easily be adapted: e.g., zL has to be replaced by the ‘current upper bound’ zU , −∞ by ∞,
and the inequality signs reversed.
Regardless of how branch-and-bound algorithms are implemented, their drawback is that
often they require excessive computing time and storage. In the next section we will apply
the branch-and-bound algorithm to a problem with a specific structure through which the
algorithm works very effectively, whereas in Chapter 9 we give an example for which the
algorithm needs an enormous amount of computing time.
exceed the size of the knapsack. So we need to decide which objects are to be packed in
the knapsack and which ones are not. Knapsack problems can usually effectively be solved
by means of the branch-and-bound algorithm.
Example 7.2.2. Consider a knapsack of fifteen liters and suppose that we are given five objects,
whose values and volumes are listed in Table 7.1. One can easily check that the best selection of objects
is an optimal solution to the ILO-model:
For knapsack problems in which the decision variables are not required to be in {0, 1}, see
for instance Exercise 7.5.9 and Section 16.2.
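With only five 0-1 variables, the knapsack instance of Example 7.2.2 can also be solved by enumerating all 2^5 = 32 selections, which gives a reference answer against which a branch-and-bound run can be checked. (The values and volumes below are an assumption: Table 7.1 is not reproduced here, so they were reconstructed to be consistent with the computations in Examples 7.2.3 and 7.2.4.)

```python
from itertools import product

c = [5, 3, 6, 6, 2]   # object values (assumed data, see lead-in)
a = [5, 4, 7, 6, 2]   # object volumes in liters (assumed data)
b = 15                # knapsack capacity in liters

# Enumerate every 0-1 selection that fits and keep the most valuable one.
best = max(
    (sum(ci * xi for ci, xi in zip(c, x)), x)
    for x in product([0, 1], repeat=len(c))
    if sum(ai * xi for ai, xi in zip(a, x)) <= b
)
print(best)  # → (14, (1, 1, 0, 1, 0)): value 14, e.g. objects 1, 2, and 4
```

For these data the integer optimum, 14, indeed lies below the LO-relaxation bound 14 5/7 computed in Example 7.2.3.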
This type of knapsack problems can be solved by the branch-and-bound algorithm. In fact,
because of the special structure of the model, the branching phase and the solution phase
in the branch-and-bound algorithm are considerably simplified. The branching phase is
simplified because each variable must be equal to either 0 or 1, which means that branching
on the value of xi (i = 1, . . . , n) yields exactly two branches, namely an ‘xi = 0’-branch
and an ‘xi = 1’-branch. More importantly, the solution phase can be simplified because the
LO-relaxation of a knapsack problem can be solved by inspection. To see this, observe that
ci /ai may be interpreted as the value of object i per size unit. Thus, the most promising
objects have the largest values of ci /ai . In order to solve the LO-relaxation of a knapsack
problem, we compute the ratios ci /ai and order the objects in nonincreasing order of ci /ai ;
the largest has the best ranking and the smallest the worst. The algorithm continues by first
packing the best ranked object in the knapsack, then the second best, and so on, until the
best remaining object will overfill the knapsack. The knapsack is finally filled with as much
as possible of this last object. In order to make this more precise we show the following
theorem.
Theorem 7.2.1.
The LO-relaxation of Model 7.2.1 has the optimal solution:

x∗1 = . . . = x∗r = 1,   x∗r+1 = (b − a1 − . . . − ar)/ar+1,   x∗r+2 = . . . = x∗n = 0,

with r such that a1 + . . . + ar ≤ b and a1 + . . . + ar+1 > b, under the assumption
that c1/a1 ≥ . . . ≥ cn/an.
(Informally, the interpretation of r is that the first r objects may leave some space in the
knapsack but the (r + 1)’th object overfills it.)
Proof. Clearly, 0 ≤ (b − a1 − . . . − ar)/ar+1 < 1. One can easily check that x∗, as defined in the statement of the theorem, is feasible. We will use Theorem 4.2.3 to prove Theorem 7.2.1. In order to use this theorem, we need a dual feasible solution, say [y∗1 . . . y∗n+1]^T, for which c1 x∗1 + . . . + cn x∗n = by∗1 + y∗2 + . . . + y∗n+1. The dual model reads:
min by1 + y2 + . . . + yn+1
s.t. a1 y1 + y2 ≥ c1
.. ..
. .
an y1 + yn+1 ≥ cn
y1 , . . . , yn+1 ≥ 0.
It is left to the reader to check that the choice

y∗1 = cr+1/ar+1,   y∗k = ck−1 − ak−1 (cr+1/ar+1) for k = 2, . . . , r + 1,   and
y∗k = 0 for k = r + 2, . . . , n + 1

is dual feasible, and that the objective values for the points x∗ = [x∗1 . . . x∗n]^T and y∗ = [y∗1 . . . y∗n+1]^T are the same. Hence, x∗ is an optimal solution of the LO-relaxation of the knapsack problem.
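Theorem 7.2.1 translates into a short greedy routine: sort by ci/ai, pack whole objects while they fit, then take the right fraction of the first object that does not. The data below are not Table 7.1 itself (the table is not reproduced here) but are consistent with all the numbers in Examples 7.2.3 and 7.2.4; exact fractions are used to avoid rounding:

```python
from fractions import Fraction

def knapsack_relaxation(c, a, b):
    """Optimal solution of the LO-relaxation of the knapsack problem
    (Theorem 7.2.1): fill by nonincreasing value/size ratio c_i/a_i,
    taking a fraction of the first object that no longer fits."""
    order = sorted(range(len(c)), key=lambda i: Fraction(c[i], a[i]), reverse=True)
    x = [Fraction(0)] * len(c)
    room = Fraction(b)
    for i in order:
        if a[i] <= room:           # object fits entirely
            x[i] = Fraction(1)
            room -= a[i]
        else:                      # fractional last object, then stop
            x[i] = room / a[i]
            break
    z = sum(Fraction(c[i]) * x[i] for i in range(len(c)))
    return x, z

# Data consistent with Examples 7.2.3-7.2.4 (reconstructed), b = 15.
c = [5, 3, 6, 6, 2]
a = [5, 4, 7, 6, 2]
x, z = knapsack_relaxation(c, a, 15)
print(z)  # → 103/7, that is, z = 14 5/7 as in Example 7.2.3
```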
Example 7.2.3. We will solve the LO-relaxation of the knapsack problem from Example 7.2.2.
We start by computing the ratios ci /ai and assigning rankings from best to worst (see Table 7.2). The
LO-relaxation is now solved as follows. There are three objects with the highest ranking 1; choose object
x1 = 1, x2 = 0, x3 = 2/7, x4 = 1, x5 = 1, and z = 14 5/7.
To show that this solution is actually optimal, we have to calculate y as defined in the proof of Theorem
7.2.1. Clearly, y1 = c3/a3 = 6/7, y2 = c1 − a1 (c3/a3) = 5/7, y5 = c4 − a4 (c3/a3) = 6/7,
y6 = c5 − a5 (c3/a3) = 2/7 (because c1/a1 = c4/a4 = c5/a5 ≥ c3/a3 ≥ c2/a2), and
y3 = y4 = 0. Note that z = by1 + y2 + y3 + y4 + y5 + y6 = 14 5/7, and so the feasible solution
above is in fact an optimal solution of the LO-relaxation.
We have shown how to solve the LO-relaxation of a knapsack problem. What remains is
to deal with the variables that have been branched on. When solving a submodel in an
‘xi = 0’-branch, we simply ignore object i. On the other hand, when solving a submodel
in an ‘xi = 1’-branch, object i is forced to be in the knapsack; we solve the LO-relaxation
of a knapsack problem for the remaining capacity b − ai .
Example 7.2.4. Consider again the knapsack problem of Example 7.2.2. We illustrate the above
by solving a submodel in which we have that x1 = x3 = 1 and x4 = 0. Since x4 = 0, object
4 is ignored in the analysis. Because x1 = x3 = 1, objects 1 and 3 are already packed in the
knapsack. What remains to be solved is the LO-relaxation of a knapsack problem with knapsack
capacity b − a1 − a3 = 15 − 5 − 7 = 3, where objects 2 and 5 can still be packed into the
knapsack. We have that c2/a2 = 3/4 and c5/a5 = 1, and therefore object 5 has ranking 1 and
object 2 has ranking 2. Now it follows that an optimal solution is x5 = 1 and x2 = 1/4, along with
x1 = 1, x3 = 1, and x4 = 0.
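The branching rules just described fit naturally into the same greedy routine. The sketch below fixes some variables to 0 or 1 and solves the LO-relaxation over the free objects and the remaining capacity; the data are the same assumed values as before, and object indices are 0-based, so x1, …, x5 correspond to indices 0, …, 4.

```python
def solve_submodel(c, a, b, fixed):
    """LO-relaxation of a knapsack submodel in the branch-and-bound tree.

    fixed: dict mapping object index -> 0 or 1. Objects fixed to 1
    consume capacity up front; objects fixed to 0 are ignored; the
    remaining objects are packed greedily by ratio c[i]/a[i].
    """
    n = len(c)
    remaining = b - sum(a[i] for i, v in fixed.items() if v == 1)
    if remaining < 0:
        return None, float('-inf')   # the branch is infeasible
    x = {i: float(v) for i, v in fixed.items()}
    free = sorted((i for i in range(n) if i not in fixed),
                  key=lambda i: c[i] / a[i], reverse=True)
    for i in free:
        if a[i] <= remaining:
            x[i] = 1.0
            remaining -= a[i]
        else:
            x[i] = remaining / a[i]  # fractional fill of the last object
            break                    # everything ranked below stays at 0
    for i in free:
        x.setdefault(i, 0.0)
    z = sum(c[i] * x[i] for i in range(n))
    return x, z
```

Fixing x1 = x3 = 1 and x4 = 0 as in Example 7.2.4 (`{0: 1, 2: 1, 3: 0}` in 0-based indices) reproduces the solution x5 = 1, x2 = 1/4 with objective value 13 3/4.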
The process is repeated until an optimal solution is reached in the branch-and-bound tree. A possible(!)
tree is presented in Figure 7.6 (the variables with a zero value are not mentioned in the blocks of
the tree). Note that if we use the backtracking node selection rule, then we would successively solve
submodels M1, M2, M4, M5, and M3, and we could already stop after solving submodel M3.
[Figure 7.6 shows the branch-and-bound tree: M1 (z = 14 5/7; x1 = x4 = x5 = 1, x3 = 2/7)
branches on x3 = 0 and x3 = 1 into M2 (z = 14 1/2; x1 = x4 = x5 = 1, x2 = 1/2) and
M3 (z = 14; x3 = x4 = 1, x1 = 2/5); further branching on x2, x1, and x4 leads, among others,
to M8 (z = 13 3/4; x1 = x3 = x5 = 1, x2 = 1/4) and to the infeasible submodel M9 (z = −∞).]
Figure 7.6: Branch-and-bound tree for the knapsack problem. The models marked with a star have
integer optimal solutions.
Note that no unit of the knapsack remains unused in these optimal solutions. The reader may check
that these are the only optimal solutions.
Table 7.3: Setup times cij (in minutes) from job i to job j.

From      To job j
job i      0    1    2    3    4    5    6
  0        −    1    1    5    4    3    2
  1        1    −    2    5    4    3    2
  2        1    5    −    4    2    5    4
  3        5    4    6    −    6    2    5
  4        5    2    6    3    −    5    4
  5        5    3    5    1    5    −    3
  6        6    5    4    6    6    5    −
Example 7.2.5. Suppose that we have the five jobs labeled 1, 2, 3, 4, and 5, each requiring
a processing time of two minutes. Hence, n = 5 and pi = 2 for i = 1, . . . , 5. The setup times
are given in Table 7.3. One possible schedule is to process the jobs in the order 35241. The
schedule is presented in Figure 7.7. In this figure, time is drawn from left to right. The left edge
corresponds to the beginning of the schedule, and the right edge to the end of the schedule. The numbers
are minute numbers; for example, the number 1 corresponds to the first minute of the schedule. In
the first five minutes, the machine is being set up to start processing job 3. The next two minutes are
spent on processing job 3. The two minutes following this are spent on setting up the machine to start
processing job 5, and so on. It should be clear that the corresponding total processing time is 10 and
the total setup time is 17. This type of chart is called a Gantt chart; see also Section 8.2.6.
We introduce the following variables. For each i, j = 0, . . . , n, define the decision variable
δij , with the following interpretation:
δij =  1   if job j is processed immediately after job i,
       0   otherwise.
Since δij can only take the values 0 and 1, δij is called a binary variable, or a {0, 1}-variable.
The objective of the problem can now be formulated as:
min  Σi=0..n Σj=0..n δij cij .
The constraints can also be expressed in terms of the δij ’s. Since each job has to be processed
precisely once, we have that:
Σi=0..n δij = 1    for j = 0, . . . , n,    (7.1)
and because after each job exactly one other job has to be processed, we also have that:
Σj=0..n δij = 1    for i = 0, . . . , n.    (7.2)
Figure 7.7: Gantt chart for Example 7.2.5. The horizontal axis denotes time in minutes (1 through 27); the blocks denoted by ‘ij’
are the setup times; the blocks denoted by ‘i’ are the processing times (i, j = 0, . . . , 5).
Example 7.2.6. Consider again the machine scheduling problem of Example 7.2.5. The solution
depicted in Figure 7.7 can be represented by taking δ03 = δ35 = δ52 = δ24 = δ41 = δ10 = 1
and all other δij ’s equal to zero. It is left to the reader to check that this solution satisfies constraints
(7.1) and (7.2). This solution can also be represented as a directed cycle in a graph; see Figure 7.8(a).
The nodes of this graph are the jobs 0, . . . , 5, and an arc from job i to job j (i ≠ j) means that job
j is processed directly after job i.
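Finding the cycles in such a solution is a matter of following successor arcs. A minimal sketch (the dictionary encoding of δij = 1 as `successor[i] = j` is our own convention, not from the text):

```python
def extract_cycles(successor):
    """Decompose a job-to-successor assignment into directed cycles.

    successor[i] = j encodes delta_ij = 1: job j is processed directly
    after job i. Constraints (7.1) and (7.2) make this a permutation,
    so the arcs split into disjoint cycles; a feasible schedule is a
    single cycle through all jobs.
    """
    cycles, seen = [], set()
    for start in successor:
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:         # follow arcs until the cycle closes
            seen.add(i)
            cycle.append(i)
            i = successor[i]
        cycles.append(cycle)
    return cycles
```

The solution of Example 7.2.6 gives a single cycle, `extract_cycles({0: 3, 3: 5, 5: 2, 2: 4, 4: 1, 1: 0})` → `[[0, 3, 5, 2, 4, 1]]`, whereas an assignment containing a subtour such as 3→5→3 splits into several cycles.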
A feasible schedule should consist of only one cycle. Therefore, the model needs constraints
that exclude subtours from optimal solutions. Such a set of constraints is:
Σi,j∈S δij ≤ |S| − 1    for S ⊂ {0, . . . , n}, S ≠ ∅;

here |S| denotes the number of elements of the set S , and ‘S ⊂ {0, . . . , n}’ means that S is a
subset of {0, . . . , n} with S ≠ {0, . . . , n}. These constraints are called subtour-elimination
constraints. In Exercise 7.5.19 the reader is asked to show that the above subtour-elimination
constraints really exclude subtours. One may now ask the question whether including the
subtour-elimination constraints is sufficient, or whether even more constraints need to be
added. Exercise 7.5.19 asks to show that the subtour-elimination constraints are in fact
sufficient, i.e., once the subtour-elimination constraints have been included, every feasible
(a) Feasible schedule. (b) Infeasible schedule.
Figure 7.8: Solutions to the machine scheduling problem. The arrows represent the order in which the
jobs are processed on the machine.
solution of the ILO-model corresponds to a feasible schedule, and vice versa. So, the ILO-
model for the machine-scheduling problem can now be formulated as follows.
[Figure 7.9 shows the branch-and-bound tree for the machine scheduling problem: M1 (z = 14,
subtours 062410 and 353) branches on δ53 = 0 and δ35 = 0 into M2 (z = 16, subtours 010
and 243562) and M3 (z = 17, subtours 02410 and 3653); M2 branches on δ01 = 0 and δ10 = 0
into M4 (z = 17, subtours 02410 and 3563) and M5 (z = 18, subtours 01620 and 3543);
M4 branches into M6 (z = 19), M7 (z = 17, tour 06243510), and M8 (z = 17, tour 02435610).]
(by giving the weight on this arc a very large value), the value of an optimal solution to
the relaxation with this restriction cannot be any better than the previous optimal solution.
Hence, the value of the lower bound cannot decrease. If the value of the new lower bound
is at least as large as the value of any known feasible schedule, then we need not consider
restricting any more arcs, since this can only increase the value of the optimal solution. This
process of restricting arcs is repeated until a better feasible schedule is found, or until no further
restriction can lead to a better one.
Example 7.2.8. Consider again the machine scheduling problem of Example 7.2.5. We will apply
the above described branch-and-bound process to the data of Table 7.3. The branch-and-bound tree is
depicted in Figure 7.9. The solution procedure is as follows:
I Iteration 1. The original model without the subtour-elimination constraints is solved, resulting in a
lower bound z for the optimal objective value. The calculations are carried out with a computer pack-
age. The solution that is found has z = 14 and contains two subtours, namely 062410
and 353. Since optimal solutions do not contain either of these subtours, we may choose an
arbitrary one of them to branch on, meaning that all branches of the tree below this branch do not
contain the subtour that is chosen for branching. We choose the subtour 353, corresponding to
δ35 = 1 and δ53 = 1. In order to exclude this subtour, we branch by adding the constraints
δ35 = 0 and δ53 = 0, respectively, in the next two submodels.
I Iteration 2. The new submodels correspond to the nodes M2 and M3 in Figure 7.9. Model
M2 has optimal objective value z = 16 and there are again two subtours, namely 010
and 243562. Model M3 has as optimal objective value z = 17 with the subtours
02410 and 3653. Since model M2 has a smaller optimal objective value than model
M3, we select M2 for further branching. The branches in the tree correspond to δ01 = 0 and
δ10 = 0, respectively.
I Iteration 3. Model M4 has optimal objective value z = 17 and subtours 02410 and
3563. Model M5 has as optimal objective value z = 18 and subtours 01620 and
3543. Note that at this stage of the calculation procedure the submodels M3, M4, and M5
have not been selected yet. We select M4 for further branching. The cycle 3563 is used to
exclude the solution of M4. This gives three new branches.
I Iteration 4. Model M6 has a larger optimal objective value (z = 19) than model M7 (z = 17)
and M8 (z = 17), while models M7 and M8 both give a feasible solution. Hence, M6 is excluded
from further consideration. Similarly, M5 is excluded. Model M3 may give rise to a feasible solution
with objective value z = 17, but since models M7 and M8 already have feasible solutions with the
same objective value z = 17, M3 is excluded from further consideration as well.
The conclusion is that M7 provides an optimal solution of the original model; the solution corresponds
to the schedule 06243510, and the total setup time is 17 time units. Note that
M8 gives an alternative optimal solution. It is left to the reader to check whether or not M3 also gives
rise to alternative optimal solutions.
7.2.5 The traveling salesman problem; the quick and dirty method
The ILO-model for the machine scheduling problem in Section 7.2.4 can also be viewed as
a model for the famous traveling salesman problem, which can be described as follows. Given a
number of cities along with distances between them, the objective is to determine a shortest
route through the cities, such that each city is visited precisely once, and the route starts and
ends in the same city. Let the cities be labeled 1 through n (with n ≥ 1), and let cij be the
distance between city i and city j .
Traveling salesman problems (TSPs), such as the machine scheduling problem in Section
7.2.4, can usually not be solved to optimality within reasonable time limits. Part of the
reason of this phenomenon is the fact that when the number n of ‘cities’ increases, the
number of constraints in the ILO-formulation of the problem increases by an exorbitant
amount. In particular, the number of constraints that avoid subtours may become very large.
Note that the number of subtour-elimination constraints is 2^n − 2, because the number of
subsets of {1, . . . , n} is 2^n, minus the empty set and the set {1, . . . , n} itself. In Example
7.2.5, we have n = 7 (six jobs plus the dummy job 0), and so there are 2^7 − 2 = 126
subtour-elimination constraints. Note that the number of subtour-elimination constraints
grows exponentially in n; for instance, if n = 40, then 2^40 − 2 = 1,099,511,627,774 ≈
10^12. The computational problems arising from exponentiality are discussed in more detail
in Chapter 9.
Although the formulation requires a large number of subtour-elimination constraints, it
turns out that many of them are nonbinding at any given optimal solution. This suggests a
so-called quick and dirty method (Q&D method), which can be formulated as follows.
I Step 1. Solve the ILO-model for the traveling salesman problem without the subtour-
elimination constraints. To be precise, solve the following model:
min  Σi=1..n Σj=1..n δij cij
s.t. Σj=1..n δij = 1          for i = 1, . . . , n
     Σi=1..n δij = 1          for j = 1, . . . , n
     δij ∈ {0, 1}             for i, j = 1, . . . , n.
I Step 2. The current solution may contain subtours. If so, we first remove all possible
subtours of length two by adding the following constraints:

δij + δji ≤ 1    for i, j = 1, . . . , n with i < j.    (7.3)

After adding these constraints to the model, the resulting model is solved.
I Step 3. If, after Step 2, the solution still contains subtours, then we eliminate them one
by one. For instance, if n = 8, then the subtour 12341 can be eliminated by the
requirement that from at least one of the cities 1, 2, 3, 4 a city different from 1, 2, 3,
4 (i.e., one of the cities 5, 6, 7, 8) has to be visited. This is achieved by the following
constraint:
Σi=1..4 Σj=5..8 δij ≥ 1.    (7.4)
In general, let X ⊂ {1, . . . , n} be the subset of cities visited by some subtour in the
current optimal solution. Then, we add the following constraint:
Σi∈X Σj∈{1,...,n}\X δij ≥ 1.    (7.5)
Step 3 is repeated until the solution consists of only one tour, which is then an optimal
solution of the problem.
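Generating constraint (7.5) for a given subtour is mechanical; the sketch below (the function name is ours) returns the index pairs (i, j) whose δij variables appear in the cut:

```python
def subtour_cut(X, n):
    """Index pairs of constraint (7.5) for a subset X of {1, ..., n}.

    The cut requires that at least one arc leaves X:
        sum over i in X, j outside X of delta_ij >= 1.
    """
    outside = set(range(1, n + 1)) - set(X)
    return {(i, j) for i in X for j in outside}
```

For the subtour 12341 with n = 8, `subtour_cut({1, 2, 3, 4}, 8)` contains exactly the 4 × 4 = 16 pairs appearing in constraint (7.4), matching the term count |X| × (n − |X|).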
When prohibiting certain subtours, some new subtours may appear. So when the Q&D
method is applied, it is possible that a lot of constraints have to be added. Also notice that
if the number of cities is large, then the subtour-elimination constraints as described above
may contain a lot of terms. Indeed, the number of terms in (7.5) is |X| × (n − |X|). So,
if at some point during the Q&D method, we encounter a subtour that contains, e.g., n/2
cities, then the number of terms in (7.5) is n^2/4.
the number of iterations might be enormous; see Section 9.4. For many practical problems,
an optimal solution cannot be found within reasonable time limits. This fact has led to the
development of algorithms that do not guarantee optimality, but (hopefully) good feasible
solutions within (hopefully) reasonable time limits. Such algorithms are called heuristics. Al-
though heuristics generally do not guarantee anything regarding the quality of the solutions
they find or the time needed to find them, they do work well in many practical
situations.
Branch-and-bound algorithms can be used as heuristics, when one stops the calculation
process before it is finished (provided that a feasible solution has been found when the
calculation is stopped). The current best feasible solution (if there is one) is then used as
the solution of the problem. There are two commonly used criteria to stop the branch-and-
bound algorithm before it finishes. The first one simply sets a time limit so that the algorithm
terminates after the given amount of time. If a feasible solution has been found, then the
current best solution is used. Notice that, because the branch-and-bound algorithm keeps
track of upper bounds for the (unknown) optimal objective value z ∗ of the original (M)ILO-
model, we are able to assess the quality of the current best solution. Let zU be the current
best known upper bound for the optimal objective value, and let zL be the objective value
corresponding to the best known feasible solution. Then, clearly, zL ≤ z ∗ ≤ zU . Hence,
assuming that zL > 0, the objective value of the current best feasible solution lies within
((zU − zL)/zU) × 100%
of the optimal objective value. To see this, notice that

(1 − (zU − zL)/zU) z∗ = (zL/zU) z∗ ≤ (zL/zU) zU = zL .
Example 7.2.9. Suppose that we had set a time limit for finding a solution for the example
knapsack problem in Section 7.2.3, and suppose that the branch-and-bound algorithm was terminated
after solving models M1, M2, M3, and M4; see Figure 7.6. Then the current best solution is the
one found in M4 with zL = 13. The best known upper bound at that point is the one given by
M2, i.e., zU = 14 21 (recall that, since we have not solved M5 yet, its optimal objective value is not
known). Thus, we know at this point that the objective value of the current best solution lies within
100 × (14 21 − 13)/14 12 ≈ 10.4% of the optimal objective value. Notice that we have not used the
value of z ∗ , the actual optimal objective value, to establish this percentage. The solution with z = 13
actually lies within 100 × (14 − 13)/14 ≈ 7.2% of the optimal solution.
The discussion about the quality of the current best solution motivates the second stopping
criterion: sometimes we are satisfied (or we are forced to be satisfied!) with a solution that
lies within a certain fraction α of the optimal solution. In that case, we can stop the branch-
and-bound algorithm as soon as it finds a feasible solution with objective value zL and an
upper bound zU satisfying (zU − zL )/zU ≤ α.
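Both stopping criteria reduce to one number, the relative gap between the two bounds. A one-line sketch:

```python
def gap(z_lower, z_upper):
    """Relative optimality gap (z_U - z_L) / z_U, assuming z_L > 0.

    The current best feasible solution is guaranteed to lie within
    this fraction of the (unknown) optimal objective value z*.
    """
    return (z_upper - z_lower) / z_upper
```

With the bounds of Example 7.2.9, `gap(13, 14.5)` is about 0.103, so the solution with zL = 13 is within roughly 10.3% of optimal; the algorithm may stop as soon as `gap(zL, zU) <= alpha`.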
In practice, it often happens that the branch-and-bound algorithm finds a feasible solution
that is actually very close to an optimal solution (or, even better, it is an optimal solution).
However, the algorithm cannot conclude optimality without a good upper bound on the
optimal objective value, and as a result it sometimes spends an excessive amount of time
branching nodes without finding improved solutions. So the branch-and-bound algorithm
often spends much more time on proving that a solution is optimal than on finding this
solution. One way to improve the upper bounds is by using the so-called cutting plane
algorithm which will be introduced in Section 7.4.
The function T (x3 ) becomes T (δ, x3 ) = 5000x3 − 2000δ . The nonlinearity is now
hidden in the fact that δ can only take the values 0 and 1. This turns the problem into:
Proof of Theorem 7.3.1. (i) ⇒ (ii): We need to show that f (x) ≤ M δ for each x ∈ D and
each δ ∈ {0, 1}, given that the implication [δ = 0 ⇒ f (x) ≤ 0] holds for each x ∈ D and
each δ ∈ {0, 1}. Take any x ∈ D. For δ = 1, we have to show that f (x) ≤ M . This follows
from the definition of M . For δ = 0, we have to show that f (x) ≤ 0. But this follows from
the given implication.
(ii) ⇒ (i): We now have to show that the implication [δ = 0 ⇒ f (x) ≤ 0] holds for each x ∈ D
and each δ ∈ {0, 1}, given that f (x) ≤ M δ for each x ∈ D and each δ ∈ {0, 1}. Again take
any x ∈ D. If δ = 0, then f (x) ≤ 0 follows from f (x) ≤ M δ with δ = 0. For δ = 1, the
implication becomes [1 = 0 ⇒ f (x) ≤ 0], which is true because 1 = 0 is false; see Table 7.5.
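The equivalence in Theorem 7.3.1 can also be checked numerically on a small example. The function f, the finite sample domain D, and all concrete values below are our own illustrative assumptions, not from the text:

```python
# Big-M linearization check: with M = max{f(x) : x in D}, the single
# constraint f(x) <= M * delta enforces [delta = 0  =>  f(x) <= 0].
D = [-2.0, -0.5, 0.0, 1.0, 3.0]          # assumed finite sample domain
f = lambda x: 2 * x + 1                  # assumed function on D
M = max(f(x) for x in D)                 # M = 7 for this choice

for x in D:
    for delta in (0, 1):
        if f(x) <= M * delta:            # the linearized constraint holds
            assert delta == 1 or f(x) <= 0   # ... so the implication holds
```

Every pair (x, δ) that satisfies the linear constraint also satisfies the original implication; for δ = 1 the constraint is vacuous, because f never exceeds M on D.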
x3 − M1 δ ≤ 0.
(1 − x3 ) − M2 (1 − δ) ≤ 0, or, equivalently, x3 − M2 δ ≥ 1 − M2 .
x3 − mδ ≥ 0.
necting propositions with logical connectives such as ‘and’, ‘or’, ‘if-then’, and ‘if and only if ’.
Table 7.4 lists a number of logical connectives with their names and interpretations; P1 and
P2 are propositions. The meaning of the ‘Constraint’ column will be explained in Section
7.3.3.
Example. For i = 1, 2, let Pi = ‘item i is produced’. Then, ¬P1 = ‘item 1 is not produced’,
(P1 ∧ P2 ) = ‘both item 1 and item 2 are produced’, (P1 ∨ P2 ) = ‘either item 1 or item 2 is
produced, or both’, (P1 Y P2 ) = ‘either item 1 or item 2 is produced, but not both’, (P1 ⇒ P2 ) =
‘if item 1 is produced then item 2 is produced’, (P1 ⇔ P2 ) = ‘item 1 is produced if and only if item
2 is produced’.
In Table 7.4, the connective ‘disjunction’ is used in two meanings. The term inclusive disjunc-
tion means that at least one proposition in the disjunction is true, allowing for the possibility
that both are true, and the term exclusive disjunction means that exactly one of the propositions
in the disjunction is true. The exclusive disjunction can be written as:

P1 Y P2 ≡ (¬P1 ∧ P2 ) ∨ (P1 ∧ ¬P2 ).    (7.6)
In words, the expression on the right hand side states that either (1) P1 is false and P2 is
true, or (2) P1 is true and P2 is false, or (3) both. Clearly, (1) and (2) cannot be true at the
same time.
The proofs of relationships of this kind lie beyond the scope of this book. However, since
the theory needed for such proofs is so elegant and easy, we will give a short introduction.
Such proofs can be given by using so-called truth tables. The truth tables for the connectives
introduced above are called the connective truth tables; they are listed in Table 7.5. Table 7.5
means the following. P1 and P2 denote propositions, ‘1’ means ‘true’ and ‘0’ means ‘false’.
Now consider, for instance, the implication P1 ⇒ P2 . If P1 is ‘true’ and P2 is ‘true’ then
P1 ⇒ P2 is ‘true’, if P1 is ‘true’ and P2 is ‘false’ then P1 ⇒ P2 is ‘false’, if P1 is ‘false’ then
P1 ⇒ P2 is ‘true’, regardless of whether P2 is ‘true’ or ‘false’. In order to prove that two
compound propositions are equal (also called logically equivalent), we have to show that the
corresponding truth tables are the same.
We will give a proof for (7.6). To that end, we have to determine the truth table of (¬P1 ∧
P2 ) ∨ (P1 ∧ ¬P2 ); see Table 7.6. This truth table is determined as follows. The first two
columns of Table 7.6 represent all ‘true’ and ‘false’ combinations for P1 and P2 . The fifth
P1   P2   ¬P1   P1 ∧ P2   P1 ∨ P2   P1 Y P2   P1 ⇒ P2   P1 ⇔ P2
1    1     0       1         1         0         1         1
1    0     0       0         1         1         0         0
0    1     1       0         1         1         1         0
0    0     1       0         0         0         1         1

Table 7.5: The connective truth tables.

P1   P2   (¬P1   ∧   P2)   ∨   (P1   ∧   ¬P2)
1    1      0    0    1    0    1    0    0
1    0      0    0    0    1    1    1    1
0    1      1    1    1    1    0    0    0
0    0      1    0    0    0    0    0    1

Table 7.6: Truth table of the compound proposition (¬P1 ∧ P2 ) ∨ (P1 ∧ ¬P2 ).
column, labeled ‘P2 ’, follows immediately; it is just a copy of the second column. The
seventh column, labeled ‘P1 ’, follows similarly. The columns corresponding to ¬P1 and
¬P2 then follow by using the connective truth table for ‘¬’. The fourth column, whose
column label is ‘∧’, corresponds to the formula ¬P1 ∧ P2 ; it is obtained from the third
and the fifth column by using the connective truth table for ‘∧’. The eighth column,
also labeled ‘∧’, is determined similarly. Finally, the sixth column, labeled ‘∨’, is the main
column. It follows from the fourth and the eighth column by applying the connective truth
table for ‘∨’. This sixth column is the actual truth table for this compound proposition. It
is equal to the column corresponding to P1 Y P2 in Table 7.5, as required.
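This column-by-column computation is exactly what a small program does when it enumerates the rows. A sketch verifying the equivalence (7.6):

```python
def truth_table(formula):
    """Truth table of a 2-ary Boolean formula, as a tuple over the rows
    (1,1), (1,0), (0,1), (0,0) -- the row order of Table 7.5."""
    rows = [(1, 1), (1, 0), (0, 1), (0, 0)]
    return tuple(int(formula(bool(p), bool(q))) for p, q in rows)

# P1 Y P2 versus (¬P1 ∧ P2) ∨ (P1 ∧ ¬P2):
exclusive_or = truth_table(lambda p, q: p != q)
compound = truth_table(lambda p, q: (not p and q) or (p and not q))
assert exclusive_or == compound == (0, 1, 1, 0)
```

Two compound propositions are logically equivalent precisely when these tuples coincide, which is the truth-table proof in executable form.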
P1 ∨ . . . ∨ Pn ≡ δ1 + . . . + δn ≥ 1;
P1 Y . . . Y Pn ≡ δ1 + . . . + δn = 1;
P1 ∧ . . . ∧ Pk ⇒ Pk+1 ∨ . . . ∨ Pn ≡ (1−δ1 )+ . . . +(1−δk )+δk+1 + . . . +δn ≥1;
at least k out of n are ‘true’ ≡ δ1 + . . . + δn ≥ k;
exactly k out of n are ‘true’ ≡ δ1 + . . . + δn = k.
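Each of these equivalences can be verified by enumerating all 0-1 vectors: the assignments satisfying the linear constraint must be exactly those making the compound proposition true. A sketch for two of them (the helper name is ours):

```python
from itertools import product

def satisfying(n, predicate):
    """All 0-1 vectors of length n for which the predicate holds."""
    return {d for d in product((0, 1), repeat=n) if predicate(d)}

# P1 ∨ P2 ∨ P3 is captured by d1 + d2 + d3 >= 1:
assert satisfying(3, lambda d: sum(d) >= 1) == satisfying(3, lambda d: any(d))

# P1 ∧ P2 ⇒ P3 is captured by (1-d1) + (1-d2) + d3 >= 1:
assert (satisfying(3, lambda d: (1 - d[0]) + (1 - d[1]) + d[2] >= 1)
        == satisfying(3, lambda d: not (d[0] and d[1]) or bool(d[2])))
```

The same enumeration pattern works for the ‘at least k’ and ‘exactly k’ rows of the list above.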
Substituting these values into (7.7), it follows that the expression [x = 0 Y x ≥ 10] is equivalent to:
x + 25δ ≤ 25
x + 10δ ≥ 10
x + εδ ≥ ε
x + (15 + ε)δ ≤ 25.
One can easily check that δ = 0 corresponds to 10 ≤ x ≤ 25, and δ = 1 to x = 0. So the last
two inequalities are actually redundant. It is left to the reader to check that the above formulation is
equivalent to [x > 0 ⇐⇒ x ≥ 10].
(XA ∨ XB ) ⇒ (XC ∨ XD ∨ XE ).
The compound proposition XA ∨XB can then be represented as δA +δB ≥ 1, and XC ∨XD ∨XE
as δC + δD + δE ≥ 1; see Table 7.4. In order to linearize the above implication, we introduce a
dummy proposition X which satisfies:
XA ∨ XB ⇒ X ⇒ XC ∨ XD ∨ XE ,
(Explain why we need a dummy variable.) Hence, we need to model the implications:
δA + δB ≥ 1 ⇒ δ = 1 ⇒ δC + δD + δE ≥ 1.
Using Theorem 7.3.1 one can easily check that these implications are equivalent to:
δA + δB − 2δ ≤ 0, and δ − δC − δD − δE ≤ 0.
It may happen that an expression of the form δ1 δ2 appears somewhere in the model, for instance in the
objective function (see Section 7.3.4). Note that the value of δ1 δ2 is also binary. Clearly, δ1 δ2 = δ3 is
equivalent to [δ3 = 1 ⇔ δ1 = 1 and δ2 = 1], which is equivalent to [δ3 = 1 ⇔ δ1 + δ2 = 2],
and can be split into:
1 − δ3 = 0 ⇒ 2 − δ1 − δ2 ≤ 0, together with δ1 + δ2 = 2 ⇒ δ3 = 1.
Since max(2 − δ1 − δ2 ) = 2 − min(δ1 + δ2 ) = 2, it follows from Theorem 7.3.1 that the first
one is equivalent to (2 − δ1 − δ2 ) − 2(1 − δ3 ) ≤ 0, or to:
δ1 + δ2 − 2δ3 ≥ 0.    (7.8)

Similarly, the second implication is equivalent to:

δ1 + δ2 − δ3 ≤ 1.    (7.9)
The inequalities (7.8) and (7.9) linearize the expression of this example.
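Enumerating all eight 0-1 assignments confirms that (7.8) and (7.9) together capture δ3 = δ1 δ2:

```python
from itertools import product

# delta3 = delta1 * delta2 holds iff both (7.8) and (7.9) hold:
#   (7.8)  d1 + d2 - 2*d3 >= 0        (7.9)  d1 + d2 - d3 <= 1
for d1, d2, d3 in product((0, 1), repeat=3):
    linearized = (d1 + d2 - 2 * d3 >= 0) and (d1 + d2 - d3 <= 1)
    assert linearized == (d3 == d1 * d2)
```

If either inequality is dropped, the check fails: without (7.8) the point (0, 0, 1) slips through, and without (7.9) the point (1, 1, 0) does.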
Example 7.3.5. Recall that the graph of a piecewise linear function is composed of a number of line
segments connected by kink points. We will illustrate the linearization of a piecewise linear function by
means of the following example.

[Figure 7.10: the graph of the piecewise linear cost function f , with function values running from 0
up to 6,000 dollars.]

Suppose that at most 800 pounds of a commodity may be purchased.
The cost of the commodity is $15 per pound for the first 200 pounds, $10 for the next 200 pounds,
and $2.5 for the remaining 400 pounds. Let x be the amount of the commodity to be purchased (in
pounds), and f (x) the total costs (in dollars) when x pounds are purchased. The function f can be
represented as a piecewise linear function in the following way:
f (x) =  15x            if 0 ≤ x ≤ 200,
         1000 + 10x     if 200 < x ≤ 400,
         4000 + 2.5x    if 400 < x ≤ 800.
The graph of this function is given in Figure 7.10. Note that the kink points occur at x1 = 200, and
x2 = 400. Let x0 and x3 be the start and end points, respectively, of the definition interval of the
function; i.e., x0 = 0 and x3 = 800. For any x with 0 ≤ x ≤ 800, the following equation
holds.
x = λ0 x0 + λ1 x1 + λ2 x2 + λ3 x3 ,
where λ0 , λ1 , λ2 , λ3 ≥ 0, λ0 + λ1 + λ2 + λ3 = 1, and at most two adjacent λi ’s are positive
(λi and λj are adjacent if either j = i + 1 or j = i − 1). It can also be easily checked that

f (x) = λ0 f (x0 ) + λ1 f (x1 ) + λ2 f (x2 ) + λ3 f (x3 )

for each x with x0 ≤ x ≤ x3 . The statement that at most two adjacent λi ’s are positive can be
linearized by applying Theorem 7.3.1. To that end we introduce binary variables δ1 , δ2 , and δ3 , one
for each segment, with the following interpretation (with i = 1, 2, 3):
δi = 1 ⇒ λk = 0 for each k ≠ i − 1, i,

together with

δ1 + δ2 + δ3 = 1.

This definition means that if, for instance, δ2 = 1, then δ1 = δ3 = 0 and λ0 = λ3 = 0, and so
x = λ1 x1 + λ2 x2 and f (x) = λ1 f (x1 ) + λ2 f (x2 ) with λ1 , λ2 ≥ 0 and λ1 + λ2 = 1.
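For comparison, a direct Python evaluation of this cost function (the function name is ours):

```python
def cost(x):
    """Total purchase cost f(x), in dollars, for 0 <= x <= 800 pounds:
    $15/lb for the first 200 lb, $10/lb for the next 200 lb, and
    $2.50/lb for the remaining 400 lb (Example 7.3.5)."""
    if not 0 <= x <= 800:
        raise ValueError("at most 800 pounds may be purchased")
    if x <= 200:
        return 15 * x
    if x <= 400:
        return 1000 + 10 * x
    return 4000 + 2.5 * x
```

The kink values are cost(0) = 0, cost(200) = 3000, cost(400) = 5000, and cost(800) = 6000; any x in between yields the matching convex combination, e.g. cost(300) = 4000 = ½ · cost(200) + ½ · cost(400), which is exactly what the λ-formulation with δ2 = 1 expresses.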
Benefits Bij :
             A      B      C
Large DC   1,000  1,200  1,500
Small DC     400    700    800

Transportation costs Kjl between the cities:
      A      B      C
A     −    1,100  1,500
B   1,100    −    1,600
C   1,500  1,600    −
λ2 + λ3 + 2δ1 ≤ 2. (7.10)
λ0 + λ3 + 2δ2 ≤ 2, (7.11)
λ0 + λ1 + 2δ3 ≤ 2. (7.12)
The objective function consists of two parts. The first part concerns the cost reductions
with respect to the present-day situation. The total cost reduction is:
Σi=1..2 Σj=1..3 Bij δij .
Each term in this summation is Bij if δij = 1, and zero otherwise. The second part
concerns the transportation costs. These are:
(1/2) Σi=1..2 Σk=1..2 Σj=1..3 Σl=1..3 δij δkl Kjl .
Each term in this summation is Kjl if δij = 1 and δkl = 1, and zero otherwise. Note that
if δij δkl = 1, then the term for the index combination (i′, j ′, k′, l′ ) = (k, l, i, j) contributes
Klj = Kjl as well, and hence each Kjl appears in the sum with coefficient either 0 or 2. To
counter this double-counting, we multiply the sum by 1/2. The objective becomes:
max ( Σi=1..2 Σj=1..3 δij Bij  −  (1/2) Σi=1..2 Σk=1..2 Σj=1..3 Σl=1..3 δij δkl Kjl )
There are two sets of constraints. First, each DC has to be assigned to precisely one city.
Hence,
Σj=1..3 δij = 1    for i = 1, 2.
In the objective function the quadratic term δij δkl appears. In Section 7.3.3 (Example 7.3.3),
it is described how such a term can be linearized by introducing binary variables. Namely,
define binary variables νijkl (for i, k = 1, 2 and j, l = 1, 2, 3) with the interpretation:
νijkl =  1   if δij = 1 and δkl = 1,
         0   otherwise.
This can be rewritten as [νijkl = 1 ⇔ δij = 1 ∧ δkl = 1], which is equivalent to:
This ILO-model can be solved by means of a computer package. It turns out that there are
two optimal solutions:
The management of the supermarket chain can choose either of the two optimal solutions,
and take – if desired – the one that best satisfies any criterion that has not
been included in the model.
It is left to the reader to formulate the general form of the decentralization problem for m
DCs and n locations (m, n ≥ 1).
Figure 7.11: Feasible region consisting of integer points. The stars are the integer feasible points, the
shaded area is the feasible region of the LO-relaxation, and the dot at (7/4, 11/4) is the optimal
solution of the LO-relaxation.
max x1 + x2
s.t. 7x1 + x2 ≤ 15
(P)
−x1 + x2 ≤ 1
x1 , x2 ≥ 0, and integer.
The feasible region of this model consists of the integer points depicted in Figure 7.11.
Gomory’s algorithm starts by solving the LO-relaxation of the original ILO-model. Let x3
and x4 be the slack variables. The optimal simplex tableau corresponds to the following
model. It is left to the reader to carry out the calculations leading to the final simplex
tableau.
max − (1/4)x3 − (3/4)x4 + 4 1/2
s.t. x1 + (1/8)x3 − (1/8)x4 = 1 3/4
     x2 + (1/8)x3 + (7/8)x4 = 2 3/4        (7.13)
     x1 , x2 , x3 , x4 ≥ 0.
So an optimal solution of the LO-relaxation of model (P) is x′1 = 1 3/4, x′2 = 2 3/4, x′3 =
x′4 = 0, and z ′ = 4 1/2. Note that this solution does not correspond to an integer point of
the feasible region.
Gomory’s algorithm now derives a constraint that is satisfied by all feasible integer solutions,
but that is not satisfied by the optimal solution of the LO-relaxation. In other words, the
halfspace corresponding to this new constraint contains all feasible integer points, but not
the optimal vertex of the LO-relaxation. So adding the constraint ‘cuts off’ the optimal
solution of the LO-relaxation. This new constraint is therefore called a cut constraint. The
cut constraint is added to the current LO-model (the LO-relaxation of the original ILO-
model), and the new model is solved. The process is then repeated until an integer solution
is found.
Such a cut constraint is calculated as follows:
I Step 1. Choose one of the noninteger-valued variables, say x1 . Notice that x1 is a basic
variable (because its value is noninteger, and hence nonzero). We select the constraint
in model (7.13) in which x1 appears (there is a unique choice for this constraint because
every basic variable appears in exactly one constraint):
x1 + (1/8)x3 − (1/8)x4 = 1 3/4.    (7.14)
The variables (other than x1 ) that appear in this equation are all nonbasic. Note that
this constraint is binding for every feasible solution, including the solution of the LO-
relaxation, x′1 = 1 3/4, x′2 = 2 3/4, x′3 = x′4 = 0. We call this constraint the source constraint.
I Step 2. Rewrite (7.14) by separating the integer and fractional parts of the coefficients
of the nonbasic variables. That is, we write each coefficient, p, say, as p = ⌊p⌋ + ε with
ε ∈ [0, 1). Doing this gives the following equivalent formulation of (7.14):

x1 + (0 + 1/8)x3 + (−1 + 7/8)x4 = 1 3/4.

Separating the terms of this expression into integer and fractional parts, we obtain:

x1 − x4 = 1 3/4 − (1/8)x3 − (7/8)x4 .    (7.15)
Because this constraint is equivalent to (7.14), it holds for all feasible points of the LO-
relaxation. So, up to this point, we have merely derived a constraint that is satisfied
by all feasible points of the LO-relaxation. But now we are going to use the fact that
the variables may only take on integer values. First notice that, because x3 and x4 are
nonnegative variables, the right hand side of (7.15) is at most 1 43 . That is,
1 34 − 18 x3 − 78 x4 ≤ 1 43 . (7.16)
On the other hand, because x1 , x3 , and x4 are only allowed to take on integer values,
the left hand side of (7.15) has to be integer-valued, which implies that the right hand
side has to be integer-valued as well. Because there are no integers in the interval (1, 1 3/4],
this means that, in fact, the right hand side has to be at most 1. This gives the first cut
constraint:

1 3/4 − (1/8)x3 − (7/8)x4 ≤ 1.    (7.17)

This constraint is satisfied by every integer feasible point, but not by the optimal solution
of the LO-relaxation (observe that 1 3/4 − (1/8)x′3 − (7/8)x′4 = 1 3/4 ≰ 1).
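The floor-and-fraction split of Step 2 is mechanical, so it is easy to sketch in code (the function name and the dictionary encoding of the tableau row are ours):

```python
import math

def gomory_cut(b_k, row):
    """Derive a Gomory cut from the source constraint
        x_k + sum_j row[j] * x_j = b_k.

    Returns (eps, eps0) encoding the cut  sum_j eps[j] * x_j >= eps0,
    i.e. the fractional parts of the tableau row and right hand side.
    """
    eps = {j: a - math.floor(a) for j, a in row.items()}
    eps0 = b_k - math.floor(b_k)
    return eps, eps0
```

Applied to the source constraint (7.14), `gomory_cut(1.75, {3: 0.125, 4: -0.125})` yields eps = {3: 0.125, 4: 0.875} and eps0 = 0.75, i.e. (1/8)x3 + (7/8)x4 ≥ 3/4, which is the cut (7.17) rearranged.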
Figure 7.12: Adding cut constraints. The points marked by a dot represent the optimal solutions of
the corresponding LO-relaxations: (13/7, 2) after adding cut (7.17), and (1, 2) after adding the
second cut.
The cut constraint cuts off the optimal solution [1 3/4  2 3/4]T of the LO-relaxation. However,
since all integer feasible points satisfy the cut constraint (which, in the original variables x1 and x2 ,
reads x2 ≤ 2), no integer points are cut off.
The procedure to derive a cut constraint from its source constraint is described formally
in the next section. We will first finish the example.
I Step 3. Introducing the slack variable x5 for this cut constraint, the new LO-model
becomes:

max − (1/4)x3 − (3/4)x4 + 4 1/2
s.t. x1 + (1/8)x3 − (1/8)x4 = 1 3/4
     x2 + (1/8)x3 + (7/8)x4 = 2 3/4
     − (1/8)x3 − (7/8)x4 + x5 = − 3/4
     x1 , x2 , x3 , x4 , x5 ≥ 0.
The basic solution with BI = {1, 2, 5}, i.e., x1 = 1 3/4, x2 = 2 3/4, x5 = −3/4, is
not feasible. On the other hand, all objective coefficients are nonpositive, and so we may
apply the simplex algorithm to the dual model (see Section 4.6). The solution is x1 = 1 6/7,
x2 = 2, x4 = 6/7, x3 = x5 = 0, and the objective value is z = 3 6/7. The optimal simplex
tableau corresponds to the model:
max −1/7 x3 − 6/7 x5 + 3 6/7
s.t. x1 + 1/7 x3 − 1/7 x5 = 1 6/7
     x2 + x5 = 2
     1/7 x3 + x4 − 1 1/7 x5 = 6/7
     x1, x2, x3, x4, x5 ≥ 0.
The optimal solution of this model is still noninteger. The source constraint will now
be x1 + 1/7 x3 − 1/7 x5 = 1 6/7. The corresponding cut constraint in terms of the nonbasic
variables x3 and x5 is −1/7 x3 − 6/7 x5 ≤ −6/7. One can easily check that this cut constraint
316 Chapter 7. Integer linear optimization
Let BI be the index set of the (current) basic variables and N I the index set of (current) non-
basic variables (see Section 2.1.1) corresponding to an optimal solution of the LO-relaxation.
Let xk be a variable that has a noninteger value at the current optimal feasible basic solution.
Consider the unique constraint in the optimal simplex tableau in which xk appears:
xk = bk − Σ_{j∈NI} akj xj , (7.18)
where akj is the (k, j)’th entry of A. This is the (current) source constraint. Notice that bk is
necessarily noninteger because xk = bk (recall that, for all j ∈ N I , the nonbasic variable
xj has value zero). Define εkj by akj = ⌊akj⌋ + εkj for each j ∈ NI, where ⌊p⌋ is the
largest integer such that ⌊p⌋ ≤ p. Clearly, 0 ≤ εkj < 1. By using this definition
and rearranging, the source constraint (7.18) can equivalently be written as:
xk + Σ_{j∈NI} ⌊akj⌋ xj = bk − Σ_{j∈NI} εkj xj . (7.19)
Since xj ≥ 0 and εkj ≥ 0 for all k ∈ BI and j ∈ NI, it follows that every feasible
solution satisfies Σ_{j∈NI} εkj xj ≥ 0, and so every feasible solution satisfies:

bk − Σ_{j∈NI} εkj xj ≤ bk .
On the other hand, because xk, xj, and ⌊akj⌋ are all integers, the left hand side of (7.19)
has to be integer-valued. Therefore, bk − Σ_{j∈NI} εkj xj is integer-valued as well. Hence,

bk − Σ_{j∈NI} εkj xj ≤ ⌊bk⌋ ,
I Phase 1. The LO-phase. The current LO-relaxation is solved. If the optimal solution
x0 is integer-valued, then x0 is an optimal solution of the original ILO-model.
I Phase 2. The cutting plane phase. A cut constraint is derived that is not satisfied by x0,
but is satisfied by every integer feasible point of the original model. This cut constraint is added to the current model
and solved by means of the simplex algorithm applied to the dual model (see Section 4.6).
If the resulting optimal solution is integer-valued, then the algorithm stops. Otherwise,
Phase 2 is repeated. If, at some iteration, the simplex algorithm indicates that no feasible
solution exists, then the original ILO-model has no solution.
In practice, solving ILO-models using the cutting plane algorithm alone needs a lot of
computer time and memory. Also, it is not immediately clear from the above that the
algorithm will ever terminate. It turns out that the algorithm can be adapted in such a
way that it will always terminate and find an optimal solution (provided that it exists). This
analysis lies beyond the scope of this book; the interested reader is referred to, e.g., Wolsey
(1998).
After a number of iterations one can stop the algorithm (for instance, if not much is gained
anymore in the objective value by adding new cut constraints), and use the current objective
value as an upper bound. However, unlike in the case of the branch-and-bound algorithm,
intermediate stopping does not yield a feasible integer solution, and so intermediate stopping
can mean having invested a lot of computer time with no useful result. To overcome this
situation, algorithms have been developed that produce suboptimal solutions that are both
integer and feasible in the original constraints.
The cutting plane algorithm can also be used in combination with the branch-and-bound
algorithm. In particular, cut constraints can be added during the iterations of the branch-
and-bound algorithm to yield better upper bounds on the optimal objective value than the
branch-and-bound algorithm alone can provide.
Consider the row of the optimal simplex tableau corresponding to a basic variable xk with
noninteger value bk, i.e., xk + Σ_{j∈NI} akj xj = bk, with NI the index set of nonbasic
variables. Clearly, bk is noninteger. We will write bk = ⌊bk⌋ + β, where 0 < β < 1. Hence,

xk − ⌊bk⌋ = β − Σ_{j∈NI} akj xj . (source constraint)
In a MILO-model, some of the nonbasic variables may not be restricted to integer values,
and so we cannot simply apply the cutting plane algorithm from Section 7.4.1. However, a
new cut constraint can be obtained as follows.
Because xk is an integer variable, every integer feasible point satisfies either xk ≤ ⌊bk⌋, or
xk ≥ ⌊bk⌋ + 1. Using the source constraint, this condition is equivalent to:

either Σ_{j∈NI} akj xj ≥ β, or Σ_{j∈NI} akj xj ≤ β − 1. (7.20)
In its current form, this constraint states that at least one out of the two constraints should be
satisfied, but not necessarily both. Such a constraint is called a disjunctive constraint (see also
Table 7.4). In particular, it is not linear, and, therefore, it cannot be used as a cut constraint.
In what follows, we will use the disjunctive constraint (7.20) to construct a linear constraint
that can be used as a cut constraint.
We first divide both sides of the first inequality in (7.20) by β , and both sides of the second
one by 1 − β . This gives a disjunctive constraint that is equivalent to (7.20) (recall that both
β > 0 and 1 − β > 0):
either Σ_{j∈NI} (akj/β) xj ≥ 1, or −Σ_{j∈NI} (akj/(1 − β)) xj ≥ 1. (7.21)
Determining the maximum of akj/β and −akj/(1 − β) is straightforward, because for
each j ∈ NI, the values of these expressions have opposite signs, which depend only on
the sign of akj. To be precise, define J+ = {j ∈ NI | akj ≥ 0} and J− = {j ∈ NI | akj < 0}.
Then every integer feasible point satisfies:

Σ_{j∈J+} (akj/β) xj − Σ_{j∈J−} (akj/(1 − β)) xj ≥ 1.

This is called a mixed cut constraint. It represents a necessary condition for xk to be integer-valued.
Since xj = 0 for each j ∈ NI, it follows that the slack variable of this mixed cut
constraint has a negative value (namely, −1) in the current optimal primal solution. So, the
dual simplex algorithm (see Section 4.7) can be applied to efficiently reoptimize the model
with the added cut constraint.
The following example illustrates the procedure.
Example 7.4.1. Suppose that x1 is the only integer variable in the example of Section 7.4. The
constraint corresponding to the basic variable x1 is:
x1 + 1/8 x3 − 1/8 x4 = 1 3/4.
Then, J+ = {3}, J− = {4}, b1 = 1 3/4, and β = 3/4. Hence, the mixed cut constraint is:
(1/8)/(3/4) x3 − (−1/8)/(1 − 3/4) x4 ≥ 1,

which is equivalent to

1/6 x3 + 1/2 x4 ≥ 1.
Since x3 = 15 − 7x1 − x2 , and x4 = 1 + x1 − x2 , it follows that the mixed cut constraint in
terms of the decision variables reads x1 + x2 ≤ 3.
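The computation in Example 7.4.1 can be replicated in a few lines. The following Python sketch is our own (the helper name `mixed_cut` is ours, not the book's); it follows the J+/J− recipe above with exact rational arithmetic.

```python
from math import floor
from fractions import Fraction

def mixed_cut(coeffs, b):
    """From the source row  x_k + sum_j a_j x_j = b  with beta = b - floor(b),
    build the mixed cut
      sum_{j in J+} (a_j/beta) x_j - sum_{j in J-} (a_j/(1-beta)) x_j >= 1,
    where J+ = {j : a_j >= 0} and J- = {j : a_j < 0}."""
    beta = b - floor(b)
    return {j: a / beta if a >= 0 else -a / (1 - beta) for j, a in coeffs.items()}

# Source row of Example 7.4.1: x1 + 1/8 x3 - 1/8 x4 = 1 3/4, so beta = 3/4.
cut = mixed_cut({3: Fraction(1, 8), 4: Fraction(-1, 8)}, Fraction(7, 4))
# cut == {3: 1/6, 4: 1/2}: the mixed cut constraint 1/6 x3 + 1/2 x4 >= 1.
```

The returned dictionary maps each nonbasic index to its cut coefficient, reproducing the constraint derived by hand in the example.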
7.5 Exercises
Exercise 7.5.1. A company wants to produce two commodities, X and Y. Commodity X
will be produced on type A machines, each with a capacity of six units per hour. Commodity
Y will be produced on type B machines, each with a capacity of four units per hour. The
costs for the production of X and Y are partly fixed (machine costs), namely $30 per hour
for a type A machine and $40 per hour for a type B machine. The variable (raw material)
costs are $40 per unit of X and also $40 per unit of Y. The returns are $37 per unit for
commodity X and $55 per unit for commodity Y. The maximum number of machines that
can be deployed in the factory is ten. The company wants to maximize the production such
that the costs do not exceed the returns.
Figure 7.13: A branch-and-bound tree. The numbers in the nodes are objective values
(29; 28.5, 28; 26, 27.5, 27.5, 27; 27.5, 26, 26, −∞; 27, −∞); nodes marked with a star
correspond to integer-valued solutions.
min x1
s.t. 15x1 − 20x2 ≤ 4
21x1 − 20x2 ≥ 10
x1 , x2 ≥ 0, and integer.
Show that rounding off the optimal solution of the LO-relaxation of this model to a closest
integer solution does not lead to an optimal integer solution.
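A brute-force enumeration (our own Python check, not part of the exercise) confirms what the exercise asks you to show. One can verify that the LO-relaxation optimum is x1 = 1, x2 = 0.55, yet no rounding of it is feasible; the integer optimum lies much further away.

```python
# Enumerate small integer points of
#   min x1  s.t.  15 x1 - 20 x2 <= 4,  21 x1 - 20 x2 >= 10,  x1, x2 >= 0.
best = None
for x1 in range(21):
    for x2 in range(21):
        if 15 * x1 - 20 * x2 <= 4 and 21 * x1 - 20 * x2 >= 10:
            if best is None or x1 < best[0]:
                best = (x1, x2)
print(best)  # (4, 3): no point with x1 in {1, 2, 3} is feasible, so rounding
             # the LO-relaxation optimum (1, 0.55) cannot give the integer optimum.
```

The bound of 20 on the search range is an assumption of this sketch; it is large enough here because a feasible point with x1 = 4 is found.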
Exercise 7.5.4. Consider the branch-and-bound tree in Figure 7.13. The numbers in the
nodes are the various values of the objective function. The optimal objective values marked
with a star indicate that the corresponding solution is integer-valued.
(a) Show that further branching is not needed and that therefore the branch-and-bound
algorithm stops.
(b) Which of the two optimal solutions will be found first in the case of ‘backtracking’?
Exercise 7.5.6. Verify the complementary slackness relations for the pair of primal and
dual solutions given in the proof of Theorem 7.2.1.
Exercise 7.5.7. In this exercise, ILO-models have to be solved by using the branch-and-
bound algorithm. Draw the branch-and-bound trees and use a computer package to solve
the subproblems. Compare your solutions by directly using an ILO-package.
(a) max 23x1 + 14x2 + 17x3
    s.t. −x1 + x2 + 2x3 ≤ 7
         2x1 + 2x2 − x3 ≤ 9
         3x1 + 2x2 + 2x3 ≤ 13
         x1, x2, x3 ≥ 0, and integer.

(c) max 19x1 + 27x2 + 23x3
    s.t. 4x1 − 3x2 + x3 ≤ 7
         2x1 + x2 − 2x3 ≤ 11
         −3x1 + 4x2 + 2x3 ≤ 5
         x1, x2, x3 ≥ 0, and integer.
Exercise 7.5.8. Use the branch-and-bound algorithm to solve the following MILO-
models.
Exercise 7.5.9. A vessel has to be loaded with batches of N items (N ≥ 1). Each unit of
item i has a weight wi and a value vi (i = 1, . . . , N ). The maximum cargo weight is W .
It is required to determine the most valuable cargo load without exceeding the maximum
weight W .
(a) Formulate this problem as a knapsack problem and determine a corresponding ILO-
model.
(b) Determine, by inspection, the optimal solution for the case where N = 3, W = 5,
w1 = 2, w2 = 3, w3 = 1, v1 = 65, v2 = 80, and v3 = 30.
(c) Carry out sensitivity analysis for the value of v3 .
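For part (b), a brute-force enumeration can serve as a check on the inspection. The sketch below is our own; it assumes that any nonnegative integer number of units of each item may be loaded (an integer knapsack reading of the problem).

```python
# Exercise 7.5.9(b): N = 3, W = 5, weights w, values v per unit of each item.
w, v, W = (2, 3, 1), (65, 80, 30), 5

best = 0
for a in range(W // w[0] + 1):          # units of item 1
    for b in range(W // w[1] + 1):      # units of item 2
        for c in range(W // w[2] + 1):  # units of item 3
            if a * w[0] + b * w[1] + c * w[2] <= W:
                best = max(best, a * v[0] + b * v[1] + c * v[2])
print(best)  # 160: two units of item 1 plus one unit of item 3 (weight 5)
```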
Exercise 7.5.10. A parcel delivery company wants to maximize its daily total revenue.
For the delivery, the company has one car with a volume of eleven (volume units). On
the present day, the following packages have to be delivered: package 1 with volume two,
package 2 with volume three, package 3 with volume four, package 4 with volume six, and
package 5 with volume eight. The revenues of the packages are $10, $14, $31, $48, and
$60, respectively.
(a) Formulate this problem in terms of an ILO-model. What type of problem is this? Solve
it by means of the branch-and-bound algorithm, and draw the branch-and-bound tree.
(b) Formulate the dual model of the LO-relaxation of the ILO-model, together with the
corresponding complementary slackness relations. Use these relations and a result ob-
tained in (a) to determine an optimal solution of this dual model.
(c) Determine the optimal integer solution of the dual of (b), and show that the com-
plementary slackness relations do not hold for the primal and dual integer optimal
solutions.
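The ILO-model of part (a) is a knapsack model, so its optimum can be checked by enumerating all subsets of packages. This Python sketch is our own sanity check, independent of the branch-and-bound tree the exercise asks for.

```python
from itertools import combinations

volume = {1: 2, 2: 3, 3: 4, 4: 6, 5: 8}
revenue = {1: 10, 2: 14, 3: 31, 4: 48, 5: 60}
capacity = 11

best_value, best_set = 0, ()
for r in range(len(volume) + 1):
    for subset in combinations(volume, r):          # iterate over package labels
        if sum(volume[p] for p in subset) <= capacity:
            value = sum(revenue[p] for p in subset)
            if value > best_value:
                best_value, best_set = value, subset
print(best_value, best_set)  # 79 (3, 4): deliver packages 3 and 4 (volume 10)
```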
Exercise 7.5.11. Let P , Q, and R be simple propositions. Prove the following equivalences:
(a) P ⇒ Q ≡ ¬P ∨ Q
(b) ¬(P ∨ Q) ≡ ¬P ∧ ¬Q (De Morgan’s first law1 )
(c) ¬(P ∧ Q) ≡ ¬P ∨ ¬Q (De Morgan’s second law)
(d) P ⇒ Q ∧ R ≡ (P ⇒ Q) ∧ (P ⇒ R)
(e) (P ∧ Q) ⇒ R ≡ (P ⇒ R) ∨ (Q ⇒ R)
(f ) (P ∨ Q) ⇒ R ≡ (P ⇒ R) ∧ (Q ⇒ R)
(g) P ∧ (Q ∨ R) ≡ (P ∧ Q) ∨ (P ∧ R) (first distributive law)
(h) P ∨ (Q ∧ R) ≡ (P ∨ Q) ∧ (P ∨ R) (second distributive law)
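Each equivalence can be machine-checked over all eight truth assignments. Proofs are still required, of course, but the following Python sketch (our own) is a useful sanity check before writing them.

```python
from itertools import product

def impl(p, q):
    """Material implication p => q."""
    return (not p) or q

equivalences = [
    lambda P, Q, R: impl(P, Q) == ((not P) or Q),                      # (a)
    lambda P, Q, R: (not (P or Q)) == ((not P) and (not Q)),           # (b)
    lambda P, Q, R: (not (P and Q)) == ((not P) or (not Q)),           # (c)
    lambda P, Q, R: impl(P, Q and R) == (impl(P, Q) and impl(P, R)),   # (d)
    lambda P, Q, R: impl(P and Q, R) == (impl(P, R) or impl(Q, R)),    # (e)
    lambda P, Q, R: impl(P or Q, R) == (impl(P, R) and impl(Q, R)),    # (f)
    lambda P, Q, R: (P and (Q or R)) == ((P and Q) or (P and R)),      # (g)
    lambda P, Q, R: (P or (Q and R)) == ((P or Q) and (P or R)),       # (h)
]
all_hold = all(f(*t) for f in equivalences
               for t in product([False, True], repeat=3))
print(all_hold)  # True: every equivalence holds on all eight assignments
```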
Exercise 7.5.14. In this exercise, we consider models that include arbitrary piecewise
linear functions and investigate two methods for rewriting such models as MILO-models.
Let a, b ∈ R with a < b. Let g : [a, b] → R be a (continuous) piecewise linear function
defined on the interval [a, b], with n (≥ 1) kink points α1 , . . . , αn . Write α0 = a and
αn+1 = b. Consider a (nonlinear) optimization model that includes g(x). First, introduce
¹Named after the British mathematician and logician Augustus De Morgan (1806–1871).
Figure 7.14: The nonconvex shaded region for Exercise 7.5.15, with vertices labeled A through I
in the (x1, x2)-plane.
n + 2 variables λ0, . . . , λn+1, satisfying Σ_{i=0}^{n+1} λi = 1 and λi ≥ 0 for i = 0, . . . , n + 1,
and introduce n + 1 binary variables δ1, . . . , δn+1, satisfying Σ_{i=1}^{n+1} δi = 1.
(a) Show that the original model may be solved by adding the above variables, replacing
every occurrence of g(x) by Σ_{i=0}^{n+1} λi g(αi), every occurrence of x by Σ_{i=0}^{n+1} λi αi,
and adding the following set of linear constraints:

λ0 ≤ δ1
λi ≤ δi + δi+1 for i = 1, . . . , n
λn+1 ≤ δn+1.
(b) Show that, instead of the set of linear constraints under (a), the following smaller set of
constraints may be used:
λi−1 + λi ≥ δi for i = 1, . . . , n + 1.
(c) Derive the n + 1 constraints in part (b) from the n + 2 constraints in part (a).
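A small numeric check of the formulation in part (a): the sketch below is our own, with hypothetical breakpoint data for g. It encodes a point x by choosing the indicator δi of the segment containing x and interpolating between the two endpoint weights, then verifies that the adjacency constraints hold and that the λ-combinations reproduce x and g(x).

```python
from fractions import Fraction as F

# Hypothetical data: g is piecewise linear on [0, 3] with kinks at 1 and 2,
# so alpha = (alpha_0, ..., alpha_{n+1}) with n = 2, and g(alpha_i) as below.
alpha = [F(0), F(1), F(2), F(3)]
g_at = [F(0), F(2), F(1), F(3)]

def encode(x):
    """Set delta_i = 1 for the segment [alpha_{i-1}, alpha_i] containing x and
    interpolate x between the two endpoint lambdas; all other lambdas are 0."""
    lam = [F(0)] * len(alpha)
    delta = [0] * (len(alpha) - 1)        # delta[i-1] plays the role of delta_i
    for i in range(1, len(alpha)):
        if alpha[i - 1] <= x <= alpha[i]:
            t = (x - alpha[i - 1]) / (alpha[i] - alpha[i - 1])
            lam[i - 1], lam[i] = 1 - t, t
            delta[i - 1] = 1
            break
    return lam, delta

x = F(3, 2)
lam, delta = encode(x)
constraints_hold = (sum(lam) == 1 and sum(delta) == 1
                    and lam[0] <= delta[0]                      # lambda_0 <= delta_1
                    and all(lam[i] <= delta[i - 1] + delta[i]   # lambda_i <= delta_i + delta_{i+1}
                            for i in range(1, len(alpha) - 1))
                    and lam[-1] <= delta[-1])                   # lambda_{n+1} <= delta_{n+1}
x_back = sum(l * a for l, a in zip(lam, alpha))   # reconstructs x
g_back = sum(l * gv for l, gv in zip(lam, g_at))  # reconstructs g(x)
# constraints_hold is True, and x_back == g_back == 3/2 for this sample g.
```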
Exercise 7.5.15. Describe the nonconvex shaded region of Figure 7.14 in terms of linear
constraints.
Exercise 7.5.16. Which of the following statements are true, and which are not? Explain
the answers.
(a) If an ILO-model has an optimal solution, then its LO-relaxation has an optimal solution.
(b) If the LO-relaxation of an ILO-model has an optimal solution, then the ILO-model
itself has an optimal solution.
(c) If the feasible region of the LO-relaxation is unbounded, then the feasible region of the
corresponding ILO-model is unbounded. Remark: Recall that the feasible region of an
ILO-model is a set of integer points. The feasible region of an ILO-model is bounded
(unbounded) if the convex hull (see Appendix D) of the feasible region (i.e., the convex
hull of the integer points) is bounded (unbounded).
(d) If an ILO-model has multiple optimal solutions, then the corresponding LO-relaxation
has multiple optimal solutions.
Exercise 7.5.19. The machine scheduling problem of Section 7.2.4 can be interpreted as
a traveling salesman problem (TSP).
(a) What is the meaning of the variables, the objective function and the constraints, when
considering this machine scheduling problem as a TSP?
(b) Show that the subtour-elimination constraints exclude subtours (including loops; see
Appendix C) from optimal solutions.
(c) Show that every feasible solution of Model 7.2.2 (which includes the subtour-elimination
constraints) corresponds to a traveling salesman tour, and vice versa.
Exercise 7.5.20.
(a) Explain why the constraints in (7.3) exclude all subtours of length 2.
(b) Explain why constraint (7.4) eliminates the subtour 12341. Does it also eliminate
the subtour 13241?
(c) Explain why constraint (7.5) eliminates any subtour that visits exactly the cities in X .
Exercise 7.5.21. The company Whattodo has $100,000 available for investments in six
projects. If the company decides to invest in a project at all, it has to invest a multiple
of a given minimum amount of money (in $). These minimum amounts along with the
(expected) rate of return for each project are given in Table 7.8. For example, if the company
decides to invest in project 1, it may invest $10,000, $20,000, $30,000, . . ., or $100,000
in the project. The rate of return is the amount of money gained as a percentage of the
investment. For example, if the company decides to invest $20,000 in project 1, then it
receives $20,000 × 1.13 = $26,600 when the project finishes.
Project 1 2 3 4 5 6
Minimum investment (in $) 10,000 8,000 20,000 31,000 26,000 32,000
Rate of return 13% 12% 15% 18% 16% 18%
(a) Formulate an ILO-model that can be used to find an optimal investment strategy. De-
termine such an optimal strategy.
(b) This problem can be seen as a special case of a problem described in this chapter. Which
problem is that?
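For part (a), a brute-force enumeration over multiples of the minimum amounts gives the optimal strategy to check an ILO-model against. This Python sketch is our own (the ILO formulation itself is still required by the exercise); it maximizes the gain, i.e., the rate of return times the amount invested.

```python
from itertools import product

minimum = [10_000, 8_000, 20_000, 31_000, 26_000, 32_000]
rate_pct = [13, 12, 15, 18, 16, 18]      # rates of return, in percent
budget = 100_000

best_gain = 0
for mult in product(*(range(budget // m + 1) for m in minimum)):
    amounts = [k * m for k, m in zip(mult, minimum)]
    if sum(amounts) <= budget:
        # amounts are multiples of $1,000, so the division below is exact
        gain = sum(a * r for a, r in zip(amounts, rate_pct)) // 100
        best_gain = max(best_gain, gain)
print(best_gain)  # 17280: invest 3 x $32,000 = $96,000 in project 6 at 18%
```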
Table 7.9: Weekly use of different types of pigment and corresponding prices per kg.
Exercise 7.5.22. A paint factory in Rotterdam needs pigments for its products and places
an order with a supplier in Germany. The delivery time is four weeks. The transportation
of the pigments is done by trucks with a capacity of 15,000 kgs. The company always
orders a quantity of 15,000 kgs or multiples of it. The trucks can transport several types of
pigments. The question is when and how much should be ordered, so that the inventory
costs are as low as possible and the amount of pigment in stock is always sufficient for the
daily production of paint. For a planning period of a given number of weeks, the production
scheme of paint is fixed, and the amount of pigment necessary for the production of paint is
known for each week of the planning period. In Table 7.9, the weekly use of the different
types of pigment during a planning period of eight weeks is given, together with the prices
per kg. We assume that there are six different types of pigment. In order to formulate
this problem as a mathematical model, the pigments are labeled from 1 through n, and the
Furthermore, we assume that the orders will be placed at the beginning of the week and that
the pigments are delivered four weeks later at the beginning of the week. The requirement
that enough pigment is delivered to meet the weekly demands can be expressed as follows:
si0 + Σ_{k=1}^{t} xik ≥ Σ_{k=1}^{t} dik for i = 1, . . . , n, and t = 1, . . . , T,
where si0 is the inventory of pigment i at the beginning of the planning period. It is assumed
that the initial inventory is zero for all pigments. All trucks have the same capacity w (i.e.,
w = 15,000). For t = 1, . . . , T, we define pt as the number of truckloads ordered for delivery in week t (a nonnegative integer decision variable).
In order to use the full capacity of the trucks, the following equation must hold:
Σ_{i=1}^{n} xit = pt w for t = 1, . . . , T.
The inventory costs depend on the amount of inventory and on the length of the time period
that the pigment is in stock. We assume that the weekly cost of holding one kilogram of
pigment in the inventory is a fixed percentage, 100β %, say, of the price of the pigment.
It is assumed that β = 0.08. The average inventory level of pigment i during week t is
(approximately) (si,t−1 + sit )/2. Thus, the average inventory cost for pigment i during
week t is:
βci (si,t−1 + sit )/2,
where ci is the price per kilogram of pigment i; see Table 7.9. The inventory of pigment i
at the end of week t is given by:
sit = si0 + Σ_{k=1}^{t} xik − Σ_{k=1}^{t} dik for i = 1, . . . , n, and t = 1, . . . , T.
The total inventory costs during the planning period can be written as:
Σ_{i=1}^{n} Σ_{t=1}^{T} β ci (si,t−1 + sit)/2.
The costs of transportation are taken to be constant and hence do not play a part in deter-
mining the delivery strategy. Also the administration costs of placing orders are not taken
into account. So the objective is to minimize the total inventory costs.
δi is equal to 1 if the final inventory si8 of pigment i is larger than 3,000 kgs, and 0
otherwise.
(c) Consider the models formulated under (a) and (b). For which of these models is the
value (in €) of the final inventory smallest? Is it possible to decrease the final inventory?
Exercise 7.5.23. A lot sizing problem is a production planning problem for which one has
to determine a schedule of production runs when the product’s demand varies over time.
Let d1 , . . . , dn be the demands over an n-period planning horizon. At the beginning of
each period i, a setup cost ai is incurred if a positive amount is produced, as well as a unit
production cost ci . Any amount in excess of the demand for period i is held in inventory
until period i + 1, incurring a per-unit cost of hi ; i = 1, . . . , n (period n + 1 is understood
as the first period of the next planning horizon).
(a) Solve the LO-relaxation of this model, and determine a feasible integer solution by
rounding off the solution of the LO-relaxation.
(b) Show that z∗ = 176, x∗1 = x∗3 = x∗6 = x∗9 = 0, x∗2 = x∗4 = x∗5 = x∗7 = x∗8 = 1
is an optimal solution of the model, by using the branch-and-bound algorithm. Draw
the corresponding branch-and-bound tree.
(c) Draw the perturbation functions for the right hand side of the first constraint.
(d) Draw the perturbation function for the objective coefficient 77 of x1 .
Table 7.10: The stars indicate which of the employees A–G are involved in which of the projects 1–14
(see Exercise 7.5.27).
x1 , x2 ≥ 0, x3 ∈ {0, 1}.
(a) Determine an optimal solution by means of the branch-and-bound algorithm, and draw
the corresponding branch-and-bound tree. Is the optimal solution unique?
(b) Draw the perturbation function for the right hand side of the ‘=’ constraint. Why is
this a piecewise linear function? Explain the kink points.
Exercise 7.5.26. This question is about the quick and dirty (Q&D) method for the
traveling salesman problem. (See Section 7.2.5.)
(a) Explain why the constraints in (7.3) exclude all subtours of length 2.
(b) Explain why constraint (7.4) eliminates the subtour 34563.
(c) Show that the constraint:
δ24 + δ25 + δ26 + δ42 + δ45 + δ46 + δ52 + δ54 + δ56 + δ62 + δ64 + δ65 ≤ 3
eliminates any subtour that visits exactly the cities 2, 4, 5, and 6.
Exercise 7.5.27. Mr. T is the supervisor of a number of projects. From time to time he
reserves a full day to discuss all his projects with the employees involved. Some employees are
involved in more than one project. Mr. T wants to schedule the project discussions in such
a way that the number of employees entering and leaving his office is as small as possible.
There are fourteen projects (labeled 1 to 14) and seven employees (labeled A to G). The stars
in Table 7.10 show which employees are involved in which projects. This problem can be
formulated as a traveling salesman problem in which each ‘distance’ cij denotes the number
of employees entering and leaving Mr. T’s office if project j is discussed immediately after
project i.
(a) Formulate the above problem as a traveling salesman problem. (Hint: introduce a
dummy project 0.)
(b) Use both the Q&D method from Section 7.2.5 and the branch-and-bound algorithm
to solve the meeting scheduling problem.
Exercise 7.5.28. Every year, a local youth association organizes a bridge drive for 28
pairs in 14 locations in downtown Groningen, the Netherlands. In each location, a table
for two pairs is reserved. One session consists of three games played in one location. After
three games, all pairs move to another location and meet a different opponent. At each table,
one pair plays in the North-South direction (NS-pair) and the other pair in the East-West
direction (EW-pair).
The locations are labeled 1 through 14. After each session, the NS-pairs move from location
i to location i + 1, and the EW-pairs from location i to location i − 1. During the whole
bridge drive, the NS-pairs always play in the NS direction and the EW-pairs in the EW
direction. The problem is to determine an order in which the locations should be visited
such that the total distance that each pair has to walk from location to location is minimized.
This problem can be formulated as a traveling salesman problem. The distances between the
14 locations, in units of 50 meters, are given in Table 7.11.
Use both the Q&D method and the branch-and-bound algorithm to find an optimal solu-
tion of this problem.
Exercise 7.5.29. The organizer of a cocktail party has an inventory of the following
drinks: 1.2 liters of whiskey, 1.5 liters of vodka, 1.6 liters of white vermouth, 1.8 liters of
red vermouth, 0.6 liters of brandy, and 0.5 liters of coffee liqueur. He wants to make five
different cocktails: Chauncy (2/3 whiskey, 1/3 red vermouth), Black Russian (3/4 vodka,
1/4 coffee liqueur), Sweet Italian (1/4 brandy, 1/2 red vermouth, 1/4 white vermouth),
Molotov Cocktail (2/3 vodka, 1/3 white vermouth), and Whiskey-on-the-Rocks (1/1
whiskey). The numbers in parentheses are the amounts of liquor required to make the
cocktails. All cocktails are made in glasses of 0.133 liters. The organizer wants to mix the
different types of drinks such that the number of cocktails that can be served is as large as
possible. He expects that Molotov Cocktails will be very popular, and so twice as many
Molotov Cocktails as Chauncies have to be mixed.
(a) Formulate this problem as an ILO-model.
To
From 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 − 4 10 10 11 10 9 7 8 11 12 12 12 11
2 − 6 6 7 6 5 6 7 9 11 11 11 13
3 − 1 1 2 2 4 11 10 12 14 15 19
4 − 1 2 2 5 12 11 13 15 16 20
5 − 1 2 5 12 11 14 16 16 20
6 − 1 4 11 10 13 15 15 19
7 − 4 11 10 12 14 15 19
8 − 7 6 9 11 11 15
9 − 2 4 4 4 8
10 − 2 4 5 9
11 − 2 3 6
12 − 1 4
13 − 4
14 −
Table 7.11: The distances between the 14 locations (in units of 50 meters) for Exercise 7.5.28.
The organizer himself likes to drink vodka. He decides to decrease the number of served
cocktails by five, if at least 0.25 liters of vodka are left over.
(b) How would you use binary variables to add this decision to the model formulated under
(a)? Solve the new model.
Exercise 7.5.30. Solve the following ILO-model by using Gomory’s cutting plane algo-
rithm.
max 2x1 + x2
s.t. 3x1 + x2 ≤ 21
7x1 + 4x2 ≤ 56
x1 + x2 ≤ 12
x1 , x2 ≥ 0, and integer.
Overview
A network LO-model is a model of which the technology matrix is based on a network (i.e., a
graph). In this chapter we consider network models that can be formulated as ILO-models,
and use methods from linear optimization to solve them. Many network models, such as the
transportation problem, the assignment problem, and the minimum cost flow problem, can
be formulated as an ILO-model with a technology matrix that has a special property, namely
the so-called ‘total unimodularity’ property. It is shown in this chapter that, for these types
of problems, any optimal solution produced by the simplex algorithm (see Chapter 3) when
applied to the LO-relaxation is integer-valued. Moreover, we derive an important theorem
in combinatorial optimization, namely the so-called max-flow min-cut theorem. We also
show how project scheduling problems can be solved by employing dual models of network
models.
Finally, we introduce the network simplex algorithm, which is a modification of the usual
simplex algorithm that is specifically designed for network models, including the transship-
ment problem. The network simplex algorithm is in practice up to 300 times faster than
the usual simplex algorithm. We will see that, like the usual simplex algorithm, the network
simplex algorithm may cycle, and we will show how to remedy cycling behavior.
334 Chapter 8. Linear network models
In this chapter, we will show that some LO-models have a special structure which guarantees
that they have an integer-valued optimal solution. Besides the fact that the simplex algorithm
applied to these problems generates integer-valued solutions, the usual sensitivity analysis
(see Chapter 5) can also be applied for these problems.
Proof of Theorem 8.1.1. Suppose that A is unimodular and b is an integer vector. Consider
any vertex of the region {x ∈ Rn | Ax = b, x ≥ 0}. Since A has full row rank, this vertex
corresponds to a feasible basic solution with xBI = B−1 b ≥ 0 and xN I = 0 for some
nonsingular basis matrix B in A; see Section 3.8. We will show that xBI is an integer vector.
By Cramer’s rule (see Appendix B), we have that, for each i ∈ BI :
xi = det(B(i; b)) / det(B),

where B(i; b) is the matrix obtained from B by replacing the column corresponding to xi
by b. Since A is unimodular, it follows that the nonsingular matrix B satisfies det(B) = ±1.
Moreover, all entries of B(i; b) are integers, and hence det(B(i; b)) is an integer. It follows
that xi is integer-valued for all i ∈ BI. So, xBI is an integer vector.
To show the converse, suppose that, for every integer vector b (∈ Rm), all vertices of the
feasible region {x ∈ Rn | Ax = b, x ≥ 0} are integer vertices. Let B be any basis matrix in
A. To show that A is unimodular, we will show that det(B) = ±1. We claim that a sufficient
condition for this is that:

(∗) B−1 t is an integer vector for every integer vector t ∈ Rm.

Indeed, if (∗) holds, then B−1 is an integer matrix, so det(B) and det(B−1) are both
integers; since det(B) det(B−1) = 1, this implies det(B) = ±1.
8.1. LO-models with integer solutions; total unimodularity 335
So it remains to show that (∗) holds. Take any integer vector t ∈ Rm. Then there obviously
exists an integer vector s such that B−1 t + s ≥ 0. Define b′ = t + Bs, and define the feasible
basic solution ũ of {x ∈ Rn | Ax = b′, x ≥ 0} by ũBI = B−1 b′ = B−1 t + s and ũNI = 0.
Because b′ is an integer vector, all vertices of {x ∈ Rn | Ax = b′, x ≥ 0} are integer vertices.
Hence, ũBI is an integer vector as well. Therefore, B−1 t = ũBI − s is an integer vector, as
required.
Note that Theorem 8.1.1 applies to models with equality constraints. The analogous concept
for models with inequality constraints is ‘total unimodularity’. A matrix A is called totally
unimodular if the determinant of every square submatrix of A is either 0, +1, or −1. Any
totally unimodular matrix consists therefore of 0 and ±1 entries only (because the entries are
determinants of (1, 1)-submatrices). Note that total unimodularity implies unimodularity,
and that any submatrix of a totally unimodular matrix is itself totally unimodular. Also note
that a matrix A is totally unimodular if and only if AT is totally unimodular.
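Total unimodularity can be checked by brute force for small matrices. The sketch below is our own Python illustration (exponential in the matrix size, so only suitable for tiny examples); it enumerates every square submatrix and computes its determinant exactly with integer arithmetic.

```python
from itertools import combinations

def det_int(M):
    """Exact integer determinant by cofactor expansion (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det_int([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(len(M)))

def is_totally_unimodular(A):
    """True if every square submatrix of A has determinant 0, +1, or -1."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if det_int([[A[i][j] for j in cols] for i in rows]) not in (-1, 0, 1):
                    return False
    return True

# An interval matrix (consecutive ones in each row) is totally unimodular:
tu_interval = is_totally_unimodular([[1, 1, 0], [0, 1, 1], [0, 0, 1]])   # True
# This matrix is not: it is itself a 2x2 submatrix with determinant 2:
tu_counter = is_totally_unimodular([[1, -1], [1, 1]])                    # False
```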
The following theorem deals with LO-models in standard form, i.e., with models for which
the feasible region is defined by inequality constraints.
Proof of Theorem 8.1.2. Let A be any (m, n) integer matrix. We first show that:

A is totally unimodular if and only if [A Im] is unimodular. (∗)

First, assume that [A Im] is unimodular. Take any (k, k)-submatrix C of A (k ≥ 1). We will
show that det(C) ∈ {0, 1, −1}. If C is singular, then det(C) = 0. So, we may assume that C
is nonsingular, because otherwise there is nothing left to prove. If k = m, then det(C) = ±1,
because C is an (m, m)-submatrix of [A Im]. So we may also assume that k < m. Let I be
the set of row indices of C in A, and J the set of column indices of C in A. So, C = AI,J .
required. Therefore, A is totally unimodular. The converse follows by similar reasoning and is
left as an exercise (Exercise 8.4.3). Hence, (∗) holds.
It is left to the reader to check that for any integer vector b ∈ Rm, it holds that the vertices of
{x | Ax ≤ b, x ≥ 0} are integer if and only if the vertices of

{ [x; xs] | [A Im] [x; xs] = b, [x; xs] ≥ 0 }

are integer. The theorem now follows directly from Theorem 8.1.1 by taking [A Im] instead
of A.
In general, in order to solve such an ILO-model, we have to use the techniques from Chapter
7. We have seen in that chapter that the general techniques for solving ILO-models can be
computationally very time consuming.
Suppose, however, that we know that the matrix A is totally unimodular and that b is an
integer vector. Then, solving (8.1) can be done in a more efficient manner. Recall that the
LO-relaxation (see Section 7.1.2) of (8.1) is found by removing the restriction that x should
be an integer vector. Consider what happens when we apply the simplex algorithm to the
LO-relaxation max {cT x | Ax ≤ b, x ≥ 0}. If the algorithm finds an optimal solution,
then Theorem 8.1.2 guarantees that this optimal solution corresponds to an integer vertex
of the feasible region, because the technology matrix A of this model is totally unimodular
and b is integer-valued. Clearly, any optimal solution of the LO-relaxation is also a feasible
solution of the original ILO-model. Since the optimal objective value of the LO-relaxation
is an upper bound for the optimal objective value of the original ILO-model, this means that
the solution that we find by solving the LO-relaxation is actually an optimal solution of the
original ILO-model. So the computer running time required for determining an optimal
solution is not affected by the fact that the variables are required to be integer-valued.
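The integrality claim can be observed directly on a small instance: enumerate all basis matrices of [A Im] with exact rational arithmetic and check that every feasible basic solution is integer-valued. This is our own illustrative Python sketch, using a sample totally unimodular interval matrix.

```python
from fractions import Fraction
from itertools import combinations

A = [[1, 1, 0],
     [0, 1, 1]]        # a totally unimodular interval matrix
b = [3, 5]             # an integer right hand side
m, n = len(A), len(A[0])
# Build [A I], the technology matrix of the standard-form model with slacks.
AI = [row + [1 if i == j else 0 for j in range(m)] for i, row in enumerate(A)]

def solve2(B, rhs):
    """Solve the (2, 2)-system B y = rhs exactly; return None if B is singular."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    if det == 0:
        return None
    return [Fraction(rhs[0] * B[1][1] - rhs[1] * B[0][1], det),
            Fraction(rhs[1] * B[0][0] - rhs[0] * B[1][0], det)]

every_vertex_integer = True
for cols in combinations(range(n + m), m):
    B = [[AI[i][j] for j in cols] for i in range(m)]
    y = solve2(B, b)
    if y is not None and all(v >= 0 for v in y):       # a feasible basic solution
        every_vertex_integer &= all(v.denominator == 1 for v in y)
print(every_vertex_integer)  # True: every vertex of {x | Ax <= b, x >= 0} is integer
```

Replacing A by a non-totally-unimodular matrix (e.g., with a 2 in some entry) generally breaks this property, which is exactly the content of Theorem 8.1.2.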
The following theorem makes this precise. In the statement of the theorem, ‘two-phase
simplex algorithm’ means the simplex algorithm together with the two-phase procedure;
see Section 3.6.2.
Theorem 8.1.3.
Consider the standard ILO-model (8.1). If A is totally unimodular and b is an integer
vector, then applying the two-phase simplex algorithm to the LO-relaxation of (8.1)
either produces an optimal solution of the ILO-model, or establishes that the model is
infeasible or unbounded.
Proof. Let (P) and (RP) denote the ILO-model and its LO-relaxation, respectively. Applying
the two-phase simplex algorithm to (RP) results in one of the following three outcomes: either
the algorithm establishes that (RP) is infeasible, or it finds an optimal vertex x∗ of (RP), or it
establishes that (RP) is unbounded.
First, suppose that the algorithm establishes that (RP) is infeasible. Since (RP) is the LO-
relaxation of (P), it immediately follows that (P) is infeasible as well.
Next, suppose that the algorithm finds an optimal vertex x∗ of (RP). Since A is totally unimodular and b is an integer vector, it follows from Theorem 8.1.2 that x∗ is in fact an integer vertex. Let z∗P and z∗RP be the optimal objective values of (P) and (RP), respectively. Since z∗P ≤ z∗RP, x∗ is a feasible point of (P), and x∗ has objective value z∗RP, it follows that x∗ is in fact an optimal solution of (P).
Finally, suppose that the algorithm establishes that (RP) is unbounded. Let B be the basis matrix at the last iteration of the algorithm, and let α ∈ {1, . . . , n} be the index found in step 2 in the last iteration of the simplex algorithm (Algorithm 3.3.1). We have that (cT_NI − cT_BI B^−1 N)_α > 0 and (B^−1 N)_{⋆,α} ≤ 0. Theorem 3.5.2 implies that the set L0 defined by:

L0 = { x ∈ R^n | [x; xs] ≡ [x_BI; x_NI] = [B^−1 b; 0] + λ [−(B^−1 N)_{⋆,α}; e_α], λ ≥ 0, λ integer }

is contained in the feasible region of (RP). Since A is totally unimodular, B^−1 t is integer-valued for any integer vector t ∈ R^m; see the proof of Theorem 8.1.1. It follows that both B^−1 b and (B^−1 N)_{⋆,α} = B^−1 (N_{⋆,α}) are integer-valued. Hence, all points in L0 are integer-valued, and therefore L0 is also contained in the feasible region of (P). Because the objective function takes on arbitrarily large values on L0, it follows that (P) is unbounded. □
Note that the interior path algorithm described in Chapter 6 does not necessarily terminate
at a vertex. Hence, Theorem 8.1.3 does not hold if the simplex algorithm is replaced by the
interior path algorithm.
338 Chapter 8. Linear network models
(Note that the word ‘partitioned’ is in quotes because, usually, the parts of a partition are
required to be nonempty.)
Proof of Theorem 8.1.4. We will show that, if A is a matrix that satisfies conditions (i), (ii),
(iii)a, and (iii)b, then every (k, k)-submatrix Ck of A has determinant 0, ±1 (k ≥ 1). The
proof is by induction on k. For k = 1, we have that every (1, 1)-submatrix C1 consists of a
single entry of A, and hence det(C1 ) ∈ {0, ±1} for every such C1 . So, the statement is true
for k = 1.
Now assume that det(Ck ) ∈ {0, ±1} for all (k, k)-submatrices Ck of A. Take any (k+1, k+1)-
submatrix Ck+1 of A. If Ck+1 contains a column consisting of zeros only, then det(Ck+1) = 0. If Ck+1 contains a column with precisely one nonzero entry, say aij, then expanding the determinant along that column gives det(Ck+1) = ±aij det(Ck) for some (k, k)-submatrix Ck of A, and hence det(Ck+1) ∈ {0, ±1} by the induction hypothesis.
So we may assume that each column of Ck+1 has precisely two nonzero entries. We will prove that this implies that det(Ck+1) = 0. According to condition (iii), the rows of Ck+1 can be partitioned into two subsets, corresponding to two submatrices, say C′ and C″, such that each column of Ck+1 either has two equal nonzero entries with one entry in C′ and the other one in C″, or has two different nonzero entries which are either both in C′ or both in C″. Thus, for each column (Ck+1)⋆,j, we have the following four possibilities (up to reordering of the rows):

            #1                   #2                   #3                #4
C′⋆,j ≡  [ 1 −1 0 . . . 0 ]T  [ 0 0 0 . . . 0 ]T   [ 1 0 . . . 0 ]T  [ −1 0 . . . 0 ]T
C″⋆,j ≡  [ 0 0 0 . . . 0 ]T   [ 1 −1 0 . . . 0 ]T  [ 1 0 . . . 0 ]T  [ −1 0 . . . 0 ]T

Now notice that, for each of these four possibilities, the sum of the entries of C′⋆,j is equal to the sum of the entries of C″⋆,j. This means that the sum of the rows of C′ is equal to the sum of the rows of C″ and, hence, the rows of Ck+1 are linearly dependent. Therefore, det(Ck+1) = 0, as required. □
This theorem provides a useful way to check whether a given matrix (with entries 0, ±1)
is totally unimodular. We will use the conditions in the next few sections. The following example shows, however, that the conditions of Theorem 8.1.4 are sufficient but not
necessary.
Example 8.1.1. The following matrix is totally unimodular, but does not satisfy the conditions of
Theorem 8.1.4:
1 0 0 1 0 0
A = 1 1 0 0 1 0 .
1 0 1 0 0 1
In Exercise 8.4.4, the reader is asked to show that all vertices of {x ∈ R^6 | Ax ≤ b, x ≥ 0} are integer-valued for every integer vector b.
Figure 8.1: Graph G1 with a totally unimodular node-edge incidence matrix.
Figure 8.2: Graph G2 with a node-edge incidence matrix that is not totally unimodular.
Example 8.1.2. Consider the graph G1 drawn in Figure 8.1. The node-edge incidence matrix
MG1 associated with this graph is:
MG1 =
      e1  e2  e3  e4
  1 [  1   0   0   1 ]
  2 [  1   1   0   0 ]
  3 [  0   1   1   0 ]
  4 [  0   0   1   1 ]
Before stating and proving Theorem 8.1.6, which gives necessary and sufficient conditions for a node-edge incidence matrix to be totally unimodular, the following example shows
that not all undirected graphs have a node-edge incidence matrix that is totally unimodular.
Example 8.1.3. Consider the graph G2 drawn in Figure 8.2. The node-edge incidence matrix
MG2 associated with this graph is:
MG2 =
      e1  e2  e3  e4  e5
  1 [  1   0   0   1   0 ]
  2 [  1   1   0   0   1 ]
  3 [  0   1   1   0   0 ]
  4 [  0   0   1   1   1 ]
The matrix MG2 is not totally unimodular, because, for instance, the submatrix corresponding to the
columns with labels e2 , e3 , e4 , e5 has determinant −2. In fact, the submatrix corresponding to
columns with labels e2 , e3 , e5 , and the rows with labels 2, 3, 4 has determinant 2. Note that the
edges e2 , e3 , e5 , together with the nodes 2, 3, 4 form the cycle 2-e2 -3-e3 -4-e5 -2 of length three in
the graph in Figure 8.2.
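For small matrices, total unimodularity can be checked directly from the definition by enumerating all square submatrices. The following sketch (in Python; the function names are ours, and the enumeration is only feasible for very small matrices) verifies that MG1 is totally unimodular while MG2 is not:

```python
from itertools import combinations

def det(m):
    """Determinant of an integer matrix by cofactor expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        if m[0][j] != 0:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * m[0][j] * det(minor)
    return total

def is_totally_unimodular(a):
    """Check the determinant of every square submatrix of a."""
    n_rows, n_cols = len(a), len(a[0])
    for k in range(1, min(n_rows, n_cols) + 1):
        for rows in combinations(range(n_rows), k):
            for cols in combinations(range(n_cols), k):
                sub = [[a[i][j] for j in cols] for i in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True

# Node-edge incidence matrices of G1 (Figure 8.1) and G2 (Figure 8.2).
MG1 = [[1, 0, 0, 1],
       [1, 1, 0, 0],
       [0, 1, 1, 0],
       [0, 0, 1, 1]]
MG2 = [[1, 0, 0, 1, 0],
       [1, 1, 0, 0, 1],
       [0, 1, 1, 0, 0],
       [0, 0, 1, 1, 1]]

print(is_totally_unimodular(MG1))  # True
print(is_totally_unimodular(MG2))  # False
```

In particular, the submatrix of MG2 with rows 2, 3, 4 and columns e2, e3, e5 has determinant 2, matching the computation in Example 8.1.3.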
The fact that G2 in Figure 8.2 contains a cycle of length three is the reason why the node-edge incidence matrix of G2 is not totally unimodular. In general, as soon as an undirected graph contains a cycle of odd length ≥ 3, its node-edge incidence matrix is not totally unimodular. The following theorem shows that the determinant of the node-edge incidence matrix of any odd cycle graph is equal to ±2. A cycle graph is a graph consisting of a single cycle.
Proof. Let k ≥ 3 and let Gk be a cycle graph of length k. The node-edge incidence matrix of Gk can be written as:

Mk ≡ [ 1  0  · · ·  0  1 ]
     [ 1  1  0  · · ·  0 ]
     [ 0  1  1  · · ·  0 ]
     [ ·  ·  ·  ·  ·  · ]
     [ 0  · · ·  0  1  1 ]

that is, Mk is the (k, k)-matrix with (Mk)i,i = 1 for i = 1, . . . , k, (Mk)i+1,i = 1 for i = 1, . . . , k − 1, (Mk)1,k = 1, and all other entries equal to 0.
Moreover, for n ≥ 1, define:

Dn = [ 1  0  · · ·  0 ]
     [ 1  1  · · ·  0 ]
     [ 0  ·  ·  ·  0 ]
     [ 0  · · ·  1  1 ]

that is, Dn is the (n, n)-matrix with ones on the main diagonal and on the diagonal directly below it, and zeroes elsewhere.
Note that det(Dn) = 1 for all n ≥ 1. Next, by developing the determinant of Mk along its first row, we have that |det(Mk)| = |1 × det(Dk−1) + (−1)^{k−1} det(Dk−1)| = |1 + (−1)^{k−1}|. Thus, det(Mk) = 0 if k is even, and |det(Mk)| = 2 if k is odd. □
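The determinant computation in this proof can be checked numerically. The sketch below (function names are ours) builds the node-edge incidence matrix of a cycle graph of length k and evaluates its determinant by cofactor expansion:

```python
def cycle_incidence(k):
    """Node-edge incidence matrix Mk of a cycle graph with k nodes and k edges."""
    m = [[0] * k for _ in range(k)]
    for e in range(k):
        m[e][e] = 1            # edge e is incident with node e ...
        m[(e + 1) % k][e] = 1  # ... and with node e+1 (mod k)
    return m

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)) if m[0][j] != 0)

for k in range(3, 8):
    print(k, abs(det(cycle_incidence(k))))  # 2 for odd k, 0 for even k
```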
Theorem 8.1.5 shows that the node-edge incidence matrix of an odd cycle graph is not totally unimodular. Hence, the node-edge incidence matrix of an undirected graph that contains an odd cycle is not totally unimodular either, because the submatrix corresponding to the odd cycle has a determinant that is not equal to 0 or ±1. Therefore, if a graph has a totally unimodular node-edge incidence matrix, then it does not contain any odd cycles, i.e., it must be bipartite (see Appendix C). The following theorem makes this precise, and shows that the converse also holds.
Proof of Theorem 8.1.6. Let G = (V, E) be any undirected graph. First assume that G is a
bipartite graph. Then V can be partitioned into two disjoint subsets V1 and V2 with V1 ∪V2 = V ,
and every edge in E has one end in V1 and the other end in V2 . It can be easily checked that
the node-edge incidence matrix MG associated with G is a {0, 1}-matrix with precisely two 1’s
per column, namely one in the rows corresponding to V1 and one in the rows corresponding
to V2 . Applying Theorem 8.1.4 with row partition V1 and V2 , it follows that MG is totally
unimodular.
Now, assume that G is not bipartite. Then, because a graph is bipartite if and only if it does not
contain an odd cycle, G must contain a cycle v1 -e1 -v2 -e2 - . . . -vk -ek -v1 with k odd. It follows
from Theorem 8.1.5 that the node-edge submatrix associated with this cycle has determinant
equal to ±2. This contradicts the definition of total unimodularity of MG . Hence, MG is not
totally unimodular.
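By Theorem 8.1.6, checking whether a node-edge incidence matrix is totally unimodular amounts to checking whether the underlying graph is bipartite, which can be done in linear time by 2-coloring. A sketch, using the edge lists read off from Figures 8.1 and 8.2 (the function name is ours):

```python
from collections import deque

def is_bipartite(nodes, edges):
    """2-color the graph by breadth-first search; succeeds iff there is no odd cycle."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = {}
    for start in nodes:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False   # an edge inside one color class: odd cycle
    return True

# Edge lists of G1 (Figure 8.1, a 4-cycle) and G2 (Figure 8.2).
G1_edges = [(1, 2), (2, 3), (3, 4), (1, 4)]
G2_edges = G1_edges + [(2, 4)]   # the extra edge e5 creates the odd cycle 2-3-4
print(is_bipartite([1, 2, 3, 4], G1_edges))  # True
print(is_bipartite([1, 2, 3, 4], G2_edges))  # False
```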
Proof of Theorem 8.1.7. Let G = (V, A) be a digraph without loops, and let MG be its
node-arc incidence matrix. Since each column of MG corresponds to some arc (i, j) ∈ A,
we have that each column of MG contains exactly one 1 (namely, the entry corresponding
to i) and one −1 (namely, the entry corresponding to j ), and all other entries are 0. Hence,
if we choose the (trivial) partition {V, ∅} of V , all conditions of Theorem 8.1.4 are satisfied.
Therefore, Theorem 8.1.4 implies that MG is totally unimodular.
Applications of Theorem 8.1.6 and Theorem 8.1.7 are given in the next section. The following example illustrates Theorem 8.1.7.
Figure 8.3: A loopless digraph G with nodes 1, 2, 3, 4 and arcs a1, . . . , a6.
Example 8.1.4. Consider the loopless digraph G of Figure 8.3. The node-arc incidence matrix
MG associated with G reads:
MG =
      a1  a2  a3  a4  a5  a6
  1 [  1   1   0   0   1   0 ]
  2 [ −1  −1   1   0   0   1 ]
  3 [  0   0  −1   1   0   0 ]
  4 [  0   0   0  −1  −1  −1 ]
According to Theorem 8.1.7, MG is totally unimodular. Note that this also follows directly from
Theorem 8.1.4.
Node-arc incidence matrices of loopless digraphs do not have full row rank, since the sum
of the rows is the all-zero vector (why is this true?). However, if the digraph G is connected,
then deleting any row of MG leads to a matrix with full row rank, as the following theorem
shows.
Proof of Theorem 8.1.8. Let m ≥ 2. We will first prove the special case that if Tm is a directed
tree with m nodes, then deleting any row from MTm results in an invertible matrix. Note that
MTm is an (m, m − 1)-matrix, and therefore deleting one row results in an (m − 1, m − 1)-
matrix. The proof is by induction on m. The statement is obviously true for m = 2. So suppose
that the statement holds for directed trees on m − 1 nodes. Let Tm be a directed tree with m
nodes and let r be the index of the row that is deleted. Since Tm is a tree, it has at least two leaf nodes (a leaf node is a node with exactly one arc incident to it). So, Tm has a leaf node v that is different from r. Let a be the index of the unique arc incident with v. Row v consists of zeros, except for (MTm)v,a = ±1. Note that MTm−1, the node-arc incidence matrix of the directed tree obtained from Tm by deleting node v and arc a, can be obtained from MTm by deleting row v and column a. Now expand the determinant of the matrix obtained from MTm by deleting row r along column a: this column contains the single nonzero entry ±1 (in row v), so the determinant equals ± the determinant of MTm−1 with row r deleted, which is nonzero by the induction hypothesis. Hence, deleting row r from MTm results in an invertible matrix.
Now, consider the general case. Let G = (V, A) be a connected digraph with node set V and arc
set A, and let m = |V|. Furthermore, let MG be the node-arc incidence matrix associated with
G. We need to show that, after removing an arbitrary row from MG , the remaining submatrix
has full row rank. Because G is connected, G contains a spanning tree T (see Appendix C). Let
MT denote the node-arc incidence matrix corresponding to T . Clearly, MT is an (m, m − 1)-
submatrix of MG. Let r be the index of the row that is deleted, and let M′G and M′T be obtained from MG and MT, respectively, by deleting row r. Then, M′T is an (m − 1, m − 1)-submatrix of M′G. It follows from the result for trees (above) that M′T has rank m − 1, and hence M′G has rank m − 1, as required. □
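Theorem 8.1.8 can be checked numerically for the digraph of Figure 8.3: deleting any single row of MG leaves a matrix of rank m − 1 = 3. A sketch using exact Gaussian elimination over the rationals (the function name is ours):

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix, computed exactly by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(n_rows):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Node-arc incidence matrix of the digraph G of Figure 8.3.
MG = [[ 1,  1,  0,  0,  1,  0],
      [-1, -1,  1,  0,  0,  1],
      [ 0,  0, -1,  1,  0,  0],
      [ 0,  0,  0, -1, -1, -1]]

print(rank(MG))  # 3: the four rows sum to the zero vector
for r in range(4):
    print(r + 1, rank([row for i, row in enumerate(MG) if i != r]))  # always 3
```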
Example 8.1.5. Consider the node-arc incidence matrix MG of the graph G in Figure 8.3.
Suppose that we take r = 4, i.e., the fourth row of MG will be deleted. Let M′G be the matrix obtained from MG by deleting the fourth row. In order to show that M′G is invertible, the second part of the proof shows that it suffices to consider an arbitrary spanning tree in G. So, we may use the tree
T with arc set {a2 , a3 , a5 }. The node-arc incidence matrix MT of T is:
MT =
      a2  a3  a5
  1 [  1   0   1 ]
  2 [ −1   1   0 ]
  3 [  0  −1   0 ]
  4 [  0   0  −1 ]
To show that deleting the fourth row from MT results in a matrix of rank 3, the proof of Theorem 8.1.8 proceeds by choosing a leaf node v of T that is not vertex (r =) 4, and the unique arc a incident with v. Since T has only two leaf nodes, we have to take v = 3, and hence a = a3. Deleting the fourth row yields:

M′T =
      a2  a3  a5
  1 [  1   0   1 ]
  2 [ −1   1   0 ]
  3 [  0  −1   0 ]

and det(M′T) = −1 × det [ 1  1 ; −1  0 ] = −1 ≠ 0.
8.2. ILO-models with totally unimodular matrices 345
Figure 8.4: Network of the transportation problem, with three depots (supply nodes) and five customers (demand nodes); the accompanying table lists the supplies, the demands, and the unit transportation costs.
Therefore M′T is indeed invertible. It is left to the reader to check that deleting any other row of MG results in an invertible matrix as well.
It is also left to the reader to check that the node-arc incidence matrices of the transportation
problem (see Section 8.2.1), the minimum cost flow problem (see Section 8.2.4), and the
transshipment problem (see Section 8.3.1) have full row rank when an arbitrary row is deleted
from the corresponding incidence matrix.
T =
             (1,1) (1,2) (1,3) (1,4) (1,5) (2,1) (2,2) (2,3) (2,4) (2,5) (3,1) (3,2) (3,3) (3,4) (3,5)
Depot 1    [   1     1     1     1     1     0     0     0     0     0     0     0     0     0     0  ]
Depot 2    [   0     0     0     0     0     1     1     1     1     1     0     0     0     0     0  ]
Depot 3    [   0     0     0     0     0     0     0     0     0     0     1     1     1     1     1  ]
Customer 1 [  −1     0     0     0     0    −1     0     0     0     0    −1     0     0     0     0  ]
Customer 2 [   0    −1     0     0     0     0    −1     0     0     0     0    −1     0     0     0  ]
Customer 3 [   0     0    −1     0     0     0     0    −1     0     0     0     0    −1     0     0  ]
Customer 4 [   0     0     0    −1     0     0     0     0    −1     0     0     0     0    −1     0  ]
Customer 5 [   0     0     0     0    −1     0     0     0     0    −1     0     0     0     0    −1  ]
The matrix T is almost, but not exactly, equal to the technology matrix of the ILO-model
(8.4). By multiplying both sides of each of the last five constraints by −1 and flipping the
inequality signs from ‘≥’ to ‘≤’, we obtain an equivalent (standard) ILO-model that has
technology matrix T. Since T is totally unimodular according to Theorem 8.1.7, this im-
plies that (8.4) is equivalent to an ILO-model with a totally unimodular technology matrix.
Since the right hand side values are integers, it follows from Theorem 8.1.3 that (8.4) can be
solved by applying the simplex algorithm to its LO-relaxation. However, it is not immedi-
ately obvious that the LO-relaxation is feasible. Finding an initial feasible basic solution for
the simplex algorithm can in general be done using the big-M or the two-phase procedure,
but in this case it can also be done directly. In Exercise 8.4.6, the reader is asked to find a
feasible solution that shows that this model is feasible.
Using a linear optimization computer package, one may verify that the minimum trans-
portation costs are 9,000; the optimal nonzero values of the xij ’s are attached to the arcs
Figure 8.5: The transportation network with the optimal nonzero values of the xij's attached to the arcs.
in Figure 8.5. In Section 8.3, this problem will be solved more efficiently by means of the
so-called network simplex algorithm.
We now formulate the general form of the transportation problem. Let m (≥ 1) be the number
of depots and n (≥ 1) the number of customers to be served. The cost of transporting
goods from depot i to customer j is cij , the supply of depot i is ai , and the demand of
customer j is bj (i = 1, . . . , m and j = 1, . . . , n). The model can now be formulated as
follows:
Since the right hand side values ai and bj are integers for all i and j , it follows from Theorem
8.1.1 and Theorem 8.1.7 that the above ILO-model of the transportation problem has an
integer optimal solution. (Why is the model not unbounded?) In Exercise 8.4.8, the reader is asked to give a method that produces a feasible solution and an initial feasible basic solution for the LO-relaxation.
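One classical way to construct an initial feasible basic solution, and a possible answer to the exercise, is the northwest corner rule: repeatedly ship as much as possible from the current depot to the current customer, then move to the next depot or customer. The sketch below uses hypothetical supplies and demands (the exact data of Figure 8.4 are not reproduced here), and applies to the balanced case in which total supply equals total demand (discussed below):

```python
def northwest_corner(supply, demand):
    """Initial basic feasible solution of a balanced transportation problem."""
    assert sum(supply) == sum(demand), "requires the balance equation (8.5)"
    a, b = list(supply), list(demand)
    x = [[0] * len(b) for _ in a]
    i = j = 0
    while i < len(a) and j < len(b):
        amount = min(a[i], b[j])    # ship as much as possible from depot i to customer j
        x[i][j] = amount
        a[i] -= amount
        b[j] -= amount
        if a[i] == 0:
            i += 1                  # depot i is exhausted: move to the next depot
        else:
            j += 1                  # customer j is satisfied: move to the next customer
    return x

# Hypothetical balanced data for m = 3 depots and n = 5 customers.
supply = [8, 7, 5]
demand = [5, 2, 6, 3, 4]
x = northwest_corner(supply, demand)
for row in x:
    print(row)
```

The resulting solution has (at most) m + n − 1 positive entries, one for each basic variable of the balanced model.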
be at least as large as the total demand. Suppose now that the total supply and demand are
actually exactly equal, i.e.,
∑_{i=1}^{m} ai = ∑_{j=1}^{n} bj.   (8.5)
This equation is called a supply-demand balance equation. If this equation holds, then it should
be clear that no customer can receive more than the corresponding demand, and every
depot has to exhaust its supply. This means that, under these circumstances, the inequalities
in the technology constraints of Model 8.2.1 may without loss of generality be replaced by
equality constraints. The resulting model is called the balanced transportation problem.
The reader is asked in Exercise 8.4.7 to show that Model 8.2.2 has an optimal solution if
and only if the supply-demand balance equation (8.5) holds. Because the technology matrix
of Model 8.2.2 is again the node-arc incidence matrix T, it is totally unimodular. We can
therefore use the simplex algorithm for models with equality constraints (see Section 3.8)
to solve the model. Note that the model has m + n constraints. The simplex algorithm
requires an initial basis matrix, i.e., an invertible (m+n, m+n)-submatrix of T. However,
according to Theorem 8.1.8, T does not have full row rank. Hence, no such invertible
submatrix exists. So, in order to be able to apply the simplex algorithm, we should drop
at least one of the constraints. Following Theorem 8.1.8, we may drop any one of the
constraints in order to obtain a new technology matrix that has full row rank. This results
in an LO-model that can be solved using the simplex algorithm.
The important advantage of the formulation of Model 8.2.2 compared to Model 8.2.1 is
the fact that the formulation of Model 8.2.2 is more compact. In fact, the LO-relaxation
of Model 8.2.2 has nm variables and n + m − 1 constraints, whereas the LO-relaxation of
Model 8.2.1 has nm + n + m variables and n + m constraints.
Note that each of the m+n constraints of Model 8.2.2 is automatically satisfied once the re-
maining constraints are satisfied. In other words, each individual constraint is redundant with
respect to the feasible region. Suppose, for example, that we drop the very last constraint,
i.e., the constraint with right hand side bn. Assume that we have a solution x = {xij} that satisfies the first m + n − 1 constraints of Model 8.2.2. Summing the m supply constraints and substituting the first n − 1 demand constraints, it follows that:

∑_{i=1}^{m} ai = ∑_{j=1}^{n−1} bj + ∑_{i=1}^{m} xin.

Combined with the balance equation (8.5), this gives ∑_{i=1}^{m} xin = bn, i.e., the dropped constraint is satisfied automatically as well.
Note that the assignment problem is a special case of the transportation problem; see Section
8.2.1. The constraints of (8.6) express the fact that each person has precisely one job to do,
while the constraints of (8.7) express the fact that each job has to be done by precisely
one person. Because of (8.6) and (8.7), the expression (8.8) can be replaced by ‘xij ≥
0 and integer for i, j = 1, . . . , n'. (Why is this true?) One can easily verify that the
technology matrix corresponding to (8.6) and (8.7) is precisely the same as the one of the
transportation problem. Therefore, Theorem 8.1.1 and Theorem 8.1.6 can be applied, and
so the assignment problem has an integer optimal solution. (Why is this model feasible and
bounded?) Also note that the ILO-model of the assignment problem is the same as the one
for the traveling salesman problem without the subtour-elimination constraints; see Section
7.2.4.
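Because the assignment problem has an integer optimal solution, its optimum can be compared, for tiny instances, with plain enumeration of all n! possible assignments. A sketch with a hypothetical cost matrix (the data and function names are ours):

```python
from itertools import permutations

def assignment_by_enumeration(cost):
    """Optimal assignment by enumerating all n! permutations (tiny n only)."""
    n = len(cost)
    best_perm, best_cost = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_cost is None or total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

# Hypothetical data: cost[i][j] is the cost of person i performing job j.
cost = [[4, 2, 8],
        [4, 3, 7],
        [3, 1, 6]]
perm, value = assignment_by_enumeration(cost)
print(perm, value)  # optimal cost 12
```

For realistic sizes this enumeration is of course hopeless; the point of the total unimodularity results is precisely that the LO-relaxation already produces an integer optimal assignment.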
be the external flow vector, i.e., the entries of b are the differences between the external
Figure 8.6: Network of a minimum cost flow problem. Each arc is labeled with its cost c(i,j) and its capacity k(i,j); the external flows are b1 = 11 (into source node 1) and −b7 = 11 (out of sink node 7).
flow out of the nodes and the external flow into the nodes. This implies that the value of
bm is negative. For example, in Figure 8.6, we have that b7 = −11. It is assumed that there
is no loss of flow on the various nodes, so that b2 = . . . = bm−1 = 0. This implies that
the following supply-demand balance equation holds:

∑_{i=1}^{m} bi = 0.
In particular, because there is one source node and one sink node, it follows that b1 = −bm .
We will write b1 = v0 (the flow that leaves the source node 1), and bm = −v0 (the flow
that enters the sink node m). Let x(i,j) be the flow (number of units of flow) through the
arc (i, j). The vector x (∈ Rn ) with entries x(i,j) , (i, j) ∈ A, is called the flow vector. The
minimum cost flow problem can now be formulated as follows.
The reason why we have written a '−' sign in the second constraint is that we want the technology matrix, excluding the capacity constraints, to be equal to the corresponding node-arc incidence matrix. The objective function cT x = ∑_{(i,j)∈A} c(i,j) x(i,j) is the total cost of the flow vector x. The first and the second constraints guarantee that both the total flow out of the source node and the total flow into the sink node are v0. The third set of
                          Arcs
  b    i   (1,2) (1,3) (2,5) (2,4) (3,4) (3,7) (3,6) (4,7) (5,7) (6,7)
  11   1  [  1     1     0     0     0     0     0     0     0     0  ]
   0   2  [ −1     0     1     1     0     0     0     0     0     0  ]
   0   3  [  0    −1     0     0     1     1     1     0     0     0  ]
   0   4  [  0     0     0    −1    −1     0     0     1     0     0  ]  = MG
   0   5  [  0     0    −1     0     0     0     0     0     1     0  ]
   0   6  [  0     0     0     0     0     0    −1     0     0     1  ]
 −11   7  [  0     0     0     0     0    −1     0    −1    −1    −1  ]
             7     7     3     4     2     7     3     5     5     4     = kT
             6    13    10     5     3     6     6    10     3     4     = cT
constraints guarantees that there is no loss of flow on the intermediate nodes. To understand these constraints, notice that, for i = 1, . . . , m,

∑_{(i,j)∈A} x(i,j) = total flow out of node i,  and  ∑_{(j,i)∈A} x(j,i) = total flow into node i.
The last constraints of Model 8.2.4 are the nonnegativity constraints and the capacity con-
straints.
Note that if both v0 = 1 and the constraints x(i,j) ≤ k(i,j) are removed from the model,
then the minimum cost flow problem reduces to the problem of determining a shortest path
from the source node to the sink node; see Appendix C.
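The shortest path special case can be solved directly. The sketch below applies Dijkstra's algorithm (applicable here because all arc costs are nonnegative) to the arc costs c of Table 8.1, ignoring the capacities; the function name is ours:

```python
import heapq

def dijkstra(arc_costs, source, target):
    """Length of a shortest directed path; requires nonnegative arc costs."""
    adj = {}
    for (u, v), c in arc_costs.items():
        adj.setdefault(u, []).append((v, c))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue        # stale heap entry
        for v, c in adj.get(u, []):
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(heap, (d + c, v))
    return float("inf")

# Arc costs c(i,j) of Table 8.1; the capacities k(i,j) play no role here.
costs = {(1, 2): 6, (1, 3): 13, (2, 5): 10, (2, 4): 5, (3, 4): 3,
         (3, 7): 6, (3, 6): 6, (4, 7): 10, (5, 7): 3, (6, 7): 4}
print(dijkstra(costs, 1, 7))  # 19
```

The minimum cost of sending one unit from node 1 to node 7 is 19, attained for instance by the path 1-2-5-7.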
Consider again the network of Figure 8.6. The values of the input data vectors b, c, k,
and the matrix MG of the minimum cost flow problem of Figure 8.6 are listed in Table 8.1.
The corresponding ILO-model now reads as follows:
of Figure 8.6. The capacity constraints give rise to the identity matrix I|A| (|A| = 10 in the above example), and so the technology matrix is

[ MG   ]
[ I|A| ] .

If we are able to prove that

[ MG    0   ]
[ I10   I10 ]

is totally unimodular, then Theorem 8.1.8 and Theorem 8.1.1 can be used to show that there is an integer optimal solution. (Note that the model contains both equality and inequality constraints, and the 'second' I10 in the latter matrix refers to the slack variables of the capacity constraints. Also note that the model is bounded.) That is exactly what the following theorem states.
Theorem 8.2.1.
If M is a totally unimodular (p, q)-matrix (p, q ≥ 1), then so is

[ M   0  ]
[ Iq  Iq ] ,

where 0 is the all-zero (p, q)-matrix.
Proof. Take any square submatrix B of [ M 0 ; Iq Iq ]. We may write

B = [ B1  0  ]
    [ B2  B3 ] ,

where B1 is a submatrix of M, and B2 and B3 are submatrices of Iq. If B1 is an empty matrix, then B = B3, and hence det B = det B3 ∈ {0, ±1}. So we may assume that B1 is not an empty matrix. If B3 contains a column of zeroes, then B contains a column of zeroes, and hence det B = 0. So, we may assume that B3 is an identity matrix, say B3 = Ir, with r ≥ 0. It follows that B1 is a square submatrix of M. Hence,

B = [ B1  0  ]  ∼  [ B1  0  ]
    [ B2  Ir ]     [ 0   Ir ] ,

so that det B = ± det B1 ∈ {0, ±1}, because B1 is a square submatrix of M, and M is totally unimodular. □
So, it follows from Theorem 8.2.1 that the technology matrix of the minimum cost flow
problem is totally unimodular.
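Theorem 8.2.1 can be illustrated by brute force on a small totally unimodular matrix M; the construction of [M 0; Iq Iq] and the enumeration of all square submatrices below are our own sketch, feasible only for very small matrices:

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)) if m[0][j] != 0)

def is_totally_unimodular(a):
    """Brute-force check of all square submatrices."""
    n_rows, n_cols = len(a), len(a[0])
    for k in range(1, min(n_rows, n_cols) + 1):
        for rows in combinations(range(n_rows), k):
            for cols in combinations(range(n_cols), k):
                if det([[a[i][j] for j in cols] for i in rows]) not in (-1, 0, 1):
                    return False
    return True

def extend(m):
    """Build the matrix [M 0; Iq Iq] of Theorem 8.2.1."""
    q = len(m[0])
    top = [row + [0] * q for row in m]
    bottom = [[1 if j == i else 0 for j in range(q)] * 2 for i in range(q)]
    return top + bottom

M = [[1, 1],
     [0, 1]]                         # a small totally unimodular matrix
print(is_totally_unimodular(M))          # True
print(is_totally_unimodular(extend(M)))  # True
```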
Figure 8.7: Network with capacities. The numbers in parentheses are the capacities k(i,j) .
x(i,j) = the amount of flow from node i to node j along arc (i, j), for (i, j) ∈ A;
z = the net flow out of node 1.
For every arc (i, j), we have a capacity k(i,j) (> 0). The flow variables should satisfy
We assume that flow is preserved at the intermediate nodes, meaning that, at each intermediate node k (i.e., k ∈ {2, . . . , m − 1}), the total flow ∑_{(i,k)∈A} x(i,k) into node k equals the total flow ∑_{(k,j)∈A} x(k,j) out of node k. The objective of the maximum flow problem
is to find the maximum amount of flow that can be sent from the source node 1 to the sink
node m. Since flow is preserved at the intermediate nodes, it should be intuitively clear
that the amount of flow that is sent through the network from node 1 to node k is equal to
the total flow z out of the source node, and also to the total flow into the sink node. The
reader is asked to prove this fact in Exercise 8.4.17.
The maximum flow problem can now be formulated as the following LO-model:
max z
s.t. ∑_{(i,j)∈A} x(i,j) − ∑_{(j,i)∈A} x(j,i) − αi z = 0   for i = 1, . . . , m   (8.9)
     0 ≤ x(i,j) ≤ k(i,j) for all (i, j) ∈ A, and z ≥ 0,   (8.10)

where α1 = 1, αm = −1, and αi = 0 for i = 2, . . . , m − 1.
The first set of constraints (8.9) has the following meaning. For each i = 1, . . . , m, the first
summation in the left hand side is the total flow out of node i, and the second summation
is the total flow into node i. Hence, the left hand side is the net flow out of node i. So,
for the intermediate nodes 2, . . . , m − 1, these constraints are exactly the flow preservation
constraints. For i = 1, the constraint states that the net flow out of the source node should
be equal to z . Similarly, for i = m, the constraint states that the net flow into the sink node
should be equal to z . The second set of constraints (8.10) contains the capacity constraints
and the nonnegativity constraints.
Example 8.2.1. Consider the network in Figure 8.7. The network has six nodes: the source node
1, the sink node 6, and the intermediate nodes 2, 3, 4, and 5. The objective is to send z units of flow
from node 1 to node 6, while maximizing z . Analogously to the minimum cost flow problem, every
arc (i, j) has a capacity k(i,j) . However, there are no costs associated with the arcs. The LO-model
corresponding to this network is:
max z
s.t. x(1,2) + x(1,3) −z=0
−x(1,2) + x(2,4) =0
−x(1,3) + x(3,5) − x(4,3) =0
−x(2,4) + x(4,3) + x(4,6) =0
−x(3,5) + x(5,6) =0
−x(4,6) − x(5,6) + z = 0
x(1,2) ≤5
x(1,3) ≤1
x(2,4) ≤4
x(3,5) ≤4
x(4,3) ≤6
x(4,6) ≤2
x(5,6) ≤5
x(1,2) , x(1,3) , x(2,4) , x(3,5) , x(4,3) , x(4,6) , x(5,6) , z ≥ 0.
Figure 8.8(a) shows an example of a feasible solution for the network in Figure 8.7. Notice that the
capacities are respected and no flow is lost. Figure 8.8(b) shows a feasible solution with objective value
5. We will see in the next section that, in fact, this feasible solution is optimal.
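The maximum flow of this example can also be computed combinatorially. As an independent check of the LO-model, the sketch below applies the Edmonds-Karp algorithm (breadth-first augmenting paths; one standard max-flow method, not described in this section) to the capacities of Figure 8.7:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along a shortest path in the residual network."""
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)       # reverse arc for cancelling flow
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow                    # no augmenting path left
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

# Capacities of the network in Figure 8.7.
caps = {(1, 2): 5, (1, 3): 1, (2, 4): 4, (3, 5): 4,
        (4, 3): 6, (4, 6): 2, (5, 6): 5}
print(max_flow(caps, 1, 6))  # 5
```

The computed value 5 agrees with the objective value of the feasible solution in Figure 8.8(b).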
Figure 8.8: A feasible and an optimal solution of the maximum flow problem of Figure 8.7. The numbers
on the arcs are the flow values and the numbers in parentheses are the capacities.
of A, such that after deleting the arcs of C from the network G, there are no directed paths
(see Appendix C) from the source node 1 to the sink node m (with m = |V|). For example,
Figure 8.9 depicts two cuts in the network of Figure 8.7. The sets C1 = {(1, 2), (1, 3)}
(Figure 8.9(a)) and C2 = {(2, 4), (5, 6)} (Figure 8.9(b)) are cuts. The arcs in the cuts are
drawn with solid lines. The dotted lines show how the sets C1 and C2 ‘cut’ the paths from
the source node to the sink node. Note that the definition of a cut states that after removing
the arcs in the cut, there are no directed paths from node 1 to node m left. Hence, in Figure
8.9(b), the arc (4, 3) is not part of the cut C2 (although adding (4, 3) results in another cut).
On the other hand, for instance, {(1, 3), (3, 5), (5, 6)} is not a cut, because after deleting
the arcs in that set there is still a path from node 1 to node 6 in the graph, namely the path
1-2-4-6.
Cuts have the following useful property. If C is a cut, then every unit of flow from the
source node to the sink node has to go through at least one of the arcs in C . This fact is
also illustrated by the cuts in Figure 8.9: every unit of flow from node 1 to node m has to
cross the dotted line, and hence has to go through one of the arcs in the cut. This property
implies that if C is a cut, then we can use the capacities of the arcs of C to find an upper
bound on the maximum amount of flow that can be sent from node 1 to node m. To be
precise, define the capacity k(C) of any cut C as:

k(C) = ∑_{(i,j)∈C} k(i,j).
Then, the total amount of flow from node 1 to node m can be at most k(C). Therefore, the optimal objective value z∗ of Model 8.2.5 is at most k(C). So, we have that:

z∗ ≤ k(C) for every cut C of G.   (8.11)
Figure 8.9: Two cuts, C1 (left) and C2 (right), in the network of Figure 8.7.
Example 8.2.2. Consider again Figure 8.8(b), and take C ∗ = {(1, 3), (2, 4)}. The reader
should check that C ∗ is a cut. The capacity of C ∗ is (1 + 4 =) 5. Hence, we know from (8.11)
that no feasible solution has an objective value larger than 5. Since the objective value of the feasible
solution in Figure 8.8(b) is exactly 5, this implies that this feasible solution is in fact optimal.
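The reasoning of Example 8.2.2 can be checked exhaustively: every set S of nodes with 1 ∈ S and 6 ∉ S yields a cut consisting of the arcs leaving S, and (by the max-flow min-cut theorem proved below) the minimum capacity over all such S equals the minimum cut capacity. A brute-force sketch (the function name is ours):

```python
from itertools import combinations

# Capacities of the network in Figure 8.7.
caps = {(1, 2): 5, (1, 3): 1, (2, 4): 4, (3, 5): 4,
        (4, 3): 6, (4, 6): 2, (5, 6): 5}

def min_cut_by_enumeration(caps, source, sink):
    """Minimum capacity over all node sets S with source in S and sink outside S."""
    nodes = {u for arc in caps for u in arc}
    others = sorted(nodes - {source, sink})
    best = float("inf")
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            s = {source} | set(extra)
            # Capacity of the cut: all arcs leaving S.
            cap = sum(c for (u, v), c in caps.items() if u in s and v not in s)
            best = min(best, cap)
    return best

print(min_cut_by_enumeration(caps, 1, 6))  # 5
```

The minimum value 5 is attained, for example, by S = {1, 2}, whose leaving arcs are exactly the cut C∗ = {(1, 3), (2, 4)} of Example 8.2.2.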
The above discussion raises the question: can we find the best upper bound on z∗? In other
words, can we find a cut that has the minimum capacity? This is the so-called minimum
cut problem. It asks the question: which cut C minimizes k(C)? In other words, it asks to
determine a cut C ∗ ∈ argmin{k(C) | C is a cut of G}. We will show that the minimum
cut problem corresponds to the dual model of the maximum flow problem.
We first derive the dual model of Model 8.2.5 as follows. Recall that this primal model has
two sets of constraints: the flow preservation constraints (8.9), and the ‘capacity constraints’
(8.10). Since there is a dual decision variable for each constraint of the primal problem, we
introduce the following two sets of dual variables:
In order to find the constraints of the dual model, we need to transpose the technology
matrix of the primal model. This primal technology matrix A reads:
A = [ T    em − e1 ]
    [ In      0    ] ,
where m = |V|, n = |A|, T is the node-arc incidence matrix of G, and ei is the usual i’th
unit vector in Rm (i ∈ V ). The columns of A correspond to the primal decision variables
x(i,j) with (i, j) ∈ A, except for the last column, which corresponds to z . The rows of A
correspond to the primal constraints.
Recall that there is a constraint in the dual model for every decision variable in the primal
model, and that the coefficients of any dual constraint are given by the corresponding col-
umn of the primal technology matrix. Consider the decision variable x(i,j) with (i, j) ∈ A.
The column in A corresponding to x(i,j) contains three nonzero entries: the entry corre-
sponding to the flow preservation constraint for node i (which is 1), the entry corresponding
to the flow preservation constraint for node j (which is −1), and the entry corresponding
to the capacity constraint x(i,j) ≤ k(i,j) (which is 1). Thus, the left hand side of the dual
constraint corresponding to the primal decision variable x(i,j) is yi − yj + w(i,j) . The
primal decision variable x(i,j) is nonnegative, so the constraint is a ‘≥’ constraint. The right
hand side is the primal objective coefficient of x(i,j), which is 0. Hence, the dual constraint corresponding to the primal decision variable x(i,j) is:

yi − yj + w(i,j) ≥ 0, i.e., w(i,j) ≥ yj − yi.
The following theorem relates the optimal solutions of Model 8.2.6 to the minimum cut(s)
in the graph G. The theorem states that Model 8.2.6 solves the minimum cut prob-
lem.
Theorem 8.2.2.
Any integer-valued optimal solution of Model 8.2.6 corresponds to a minimum cut in
G, and vice versa.
Proof. Consider any integer-valued optimal solution (y∗, w∗) of Model 8.2.6, with corresponding optimal objective value z∗. Let N1 be the set of nodes i such that yi∗ ≤ y1∗, and let N2 be the set of nodes i such that yi∗ ≥ y1∗ + 1. Note that since (y∗, w∗) is integer-valued, we have that N1 ∪ N2 = {1, . . . , m}. Clearly, we also have that 1 ∈ N1 and, since ym∗ − y1∗ ≥ 1, we have that m ∈ N2. Define C = {(i, j) ∈ A | i ∈ N1, j ∈ N2}. Because 1 ∈ N1 and m ∈ N2, any path v1-v2-. . .-vk with v1 = 1 and vk = m must contain an arc (vp, vp+1) ∈ C (with p ∈ {1, . . . , k − 1}). Therefore, removing the arcs of C from G removes all directed paths from node 1 to node m, and hence C is a cut.
To show that C is a minimum cut, we first look at the capacity of C . Observe that it follows
from the above definition of C that yj∗ − yi∗ ≥ (y1∗ + 1) − y1∗ ≥ 1 for (i, j) ∈ C . Moreover,
(8.13) implies that w∗(i,j) ≥ yj∗ − yi∗ for all (i, j) ∈ A. Hence, w∗(i,j) ≥ 1 for (i, j) ∈ C . Using
this observation, we find that:

z ∗ = Σ(i,j)∈A k(i,j) w∗(i,j) ≥ Σ(i,j)∈C k(i,j) w∗(i,j)    (because k(i,j) , w∗(i,j) ≥ 0 for (i, j) ∈ A)
    ≥ Σ(i,j)∈C k(i,j) = k(C).    (because w∗(i,j) ≥ 1 for (i, j) ∈ C )
Now suppose for a contradiction that there exists some other cut C ′ with k(C ′ ) < k(C). Let
N1′ be the set of nodes of G that are reachable from node 1 by a directed path that does not
use any arc in C ′ , and let N2′ be the set of all other nodes. Define yi′ = 0 for i ∈ N1′ , yi′ = 1
for i ∈ N2′ , and w′(i,j) = max{0, yj′ − yi′ } for (i, j) ∈ A. We have that y1′ = 0 and, because
C ′ is a cut, we have that m ∈ N2′ and hence ym′ = 1. Thus, (y′ , w′ ) satisfies the constraints of
Model 8.2.6. Moreover,
yj′ − yi′ = 1 if i ∈ N1′ , j ∈ N2′ ; 0 if either i, j ∈ N1′ , or i, j ∈ N2′ ; and −1 if i ∈ N2′ , j ∈ N1′ .
Hence,

w′(i,j) = max{0, yj′ − yi′ } = 1 if (i, j) ∈ C ′ , and 0 otherwise.
This means that the objective value z ′ corresponding to (y′ , w′ ) satisfies:

z ′ = Σ(i,j)∈A k(i,j) max{0, yj′ − yi′ } = Σ(i,j)∈C ′ k(i,j) = k(C ′ ) < k(C) ≤ z ∗ ,

contrary to the fact that z ∗ is the optimal objective value of Model 8.2.6. So, C is indeed a
minimum cut.
It now follows from the duality theorem, Theorem 4.2.4, that Model 8.2.5 and Model 8.2.6
have the same objective value. So, we have proved the following important theorem, the
so-called max-flow min-cut theorem, which is a special case of Theorem 4.2.4:
Theorem 8.2.3. (Max-flow min-cut theorem)
The maximum value of a flow from the source node to the sink node of a capacitated network
is equal to the minimum capacity of a cut in that network.
360    Chapter 8. Linear network models
The following example illustrates the maximum flow problem and Theorem 8.2.3.
Example 8.2.3. (Project selection problem) In the project selection problem, there are m
projects, labeled 1, . . . , m, and n machines, labeled 1, . . . , n. Each project i yields revenue ri and
each machine j costs cj to purchase. Every project requires the usage of a number of machines that
can be shared among several projects. The problem is to determine which projects should be selected and
which machines should be purchased in order to execute the selected projects. The projects and machines
should be selected so that the total profit is maximized. Table 8.2 lists some example data for the project
selection problem.
Let P ⊆ {1, . . . , m} be the set of projects that are not selected, and let Q ⊆ {1, . . . , n} be the set
of machines that should be purchased. Then, the project selection problem can be formulated as follows:
z ∗ = max [ Σi=1,...,m ri − Σi∈P ri − Σj∈Q cj ].
Note that the term Σi=1,...,m ri is a constant, and hence it is irrelevant for determining an optimal
solution. Also, recall that instead of maximizing the objective function, we may equivalently minimize
the negative of that function. So, the above problem can be formulated as a minimizing model as
follows:

min Σi∈P ri + Σj∈Q cj .    (8.15)
The above minimizing model can now be formulated as a minimum cut problem as follows. Construct
a network G with nodes s, u1 , . . . , um , v1 , . . . , vn , and t. The nodes u1 , . . . , um represent the
projects, and the nodes v1 , . . . , vn represent the machines. Let i = 1, . . . , m and j = 1, . . . , n.
The source node s is connected to the ‘project node’ ui by an arc (s, ui ) with capacity ri . The
‘machine node’ vj is connected to the sink node t by an arc (vj , t) with capacity cj . The arc
(ui , vj ), with infinite capacity, is added if project i requires machine j .
Since the arc from ui to vj has infinite capacity, this arc does not appear in a minimum cut (otherwise
the capacity of that cut would be infinite, whereas the set of all arcs incident with s already constitutes
a cut with finite capacity). Hence, a minimum cut consists of a set of arcs incident with s and arcs incident
Figure 8.10: The project selection problem as a maximum flow model. The network has arcs (s, u1 ),
(s, u2 ), (s, u3 ) with capacities 100, 200, and 150, arcs (v1 , t), (v2 , t), (v3 , t) with capacities 200, 100,
and 50, and arcs (u1 , v1 ), (u1 , v2 ), (u2 , v2 ), (u3 , v3 ) with capacity ∞. The dotted curve represents
the unique minimum cut.
with t. The arcs in the cut that are incident with s represent the projects in P , and the arcs in the cut
incident with t represent the machines in Q. The capacity of the cut is Σi∈P ri + Σj∈Q cj , which
is the objective function of (8.15). By Theorem 8.2.3, one can solve this minimum cut problem as a
maximum flow problem.
Figure 8.10 shows a network representation of the project selection problem corresponding to the data in
Table 8.2. The minimum cut C ∗ = {(s, u1 ), (v2 , t), (v3 , t)} is represented by the dotted curve in
Figure 8.10. Note that the arc (u1 , v2 ) crosses the dotted line, but it crosses from the side containing
t to the side containing s. So, its capacity (∞) is not included in the capacity of the cut. Hence,
the corresponding capacity k(C ∗ ) is 250. The corresponding set P consists of the indices i such that
(s, ui ) ∈ C ∗ , and the set Q consists of the indices j such that (vj , t) ∈ C ∗ . So, P = {1} and
Q = {2, 3}. The total revenue of the projects is 450. Therefore, the corresponding maximum profit
z ∗ is (450 − 250 =) 200, which is attained by selecting projects 2 and 3, and purchasing machines
2 and 3.
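The construction above can be checked computationally. The sketch below solves the project
selection instance of Figure 8.10 as a maximum flow problem using the Edmonds–Karp algorithm
(BFS-based augmenting paths). The project–machine requirements in `needs` are an assumption
read off from Figure 8.10, since Table 8.2 itself is not reproduced here.

```python
from collections import deque

INF = float("inf")

def max_flow(arcs, s, t):
    """Edmonds-Karp maximum flow. arcs: {(u, v): capacity}.
    Returns the flow value and the set of nodes on the source side of a
    minimum cut (the nodes reachable in the final residual network)."""
    res = {}
    for (u, v), cap in arcs.items():
        res.setdefault(u, {})
        res.setdefault(v, {})
        res[u][v] = res[u].get(v, 0) + cap
        res[v].setdefault(u, 0)
    value = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:      # BFS for an augmenting path
            u = queue.popleft()
            for v, cap in res[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:                   # no augmenting path left: done
            return value, set(parent)
        path, v = [], t
        while parent[v] is not None:          # recover the path s -> t
            path.append((parent[v], v))
            v = parent[v]
        delta = min(res[u][v] for u, v in path)
        for u, v in path:                     # push delta units along the path
            res[u][v] -= delta
            res[v][u] += delta
        value += delta

# Data of Figure 8.10: revenues r_i, machine costs c_j, and the
# (assumed) project-machine requirements.
r = {1: 100, 2: 200, 3: 150}
c = {1: 200, 2: 100, 3: 50}
needs = {1: [1, 2], 2: [2], 3: [3]}

arcs = {("s", ("u", i)): r[i] for i in r}
arcs.update({(("v", j), "t"): c[j] for j in c})
arcs.update({(("u", i), ("v", j)): INF for i in needs for j in needs[i]})

value, source_side = max_flow(arcs, "s", "t")
P = {i for i in r if ("u", i) not in source_side}   # projects not selected
Q = {j for j in c if ("v", j) in source_side}       # machines purchased
profit = sum(r.values()) - value
```

For these data the computed cut capacity is 250 and the profit 450 − 250 = 200, matching
the example.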
Figure 8.11: Project network with nodes 1–6 and arc execution times c(1,2) = 15, c(1,3) = 10,
c(1,4) = 8, c(2,3) = 3, c(2,6) = 17, c(3,5) = 7, c(4,5) = 6, c(4,6) = 3, c(5,6) = 10. The thick
arcs form a critical path.
A goal is realized when all activities corresponding to its incoming arcs have been completed.
Moreover, the activities corresponding to a goal’s outgoing arcs can start only when the goal
has been realized. Clearly, the goals should be chosen in such a way that the precedence
constraints of the activities are represented in the network.
Example 8.2.4. Consider the network of Figure 8.11. There are six nodes, corresponding to the
goals 1, 2, 3, 4, 5, and 6. The numbers attached to the arcs refer to the execution times, i.e., the
number of time units (days) required to perform the activities corresponding to the arcs. For instance,
the activity of arc (4, 5) requires six days. The network depicts the precedence relationships between
the activities. For instance, activity (3, 5) cannot start until both the activities (1, 3) and (2, 3) are
completed.
As the following example illustrates, it is not always immediately clear how to construct a
project network for a given project.
Example 8.2.5. Consider the following project with four activities, A, B , C , and D. Activity
C can be started when activity A is completed, and activity D can be started when both activities A
and B are completed. How should the arcs corresponding to the four activities be arranged? Because
of the precedence constraints, it seems that the arcs corresponding to A and C should share a node
(representing a goal) in the network, and similarly for A and D, and for B and D. There is only one
arrangement of the arcs that respects these requirements, and this arrangement is depicted in the network
of Figure 8.12(a). This network, however, does not correctly model the precedence constraints of the
project, because the network indicates that activity C can only be started when B is completed, which
is not a requirement of the project. To model the project’s precedence constraints correctly, a dummy arc
needs to be added. This dummy arc corresponds to a dummy activity with zero execution time. The
resulting network is depicted in Figure 8.12(b). The reader should verify that the precedence constraints
are correctly reflected by this project network.
The arcs cannot form a directed cycle, because otherwise some activities can never be carried
out. For example, if the network has three nodes, labeled 1, 2, and 3, and the arcs are (1, 2),
(2, 3), and (3, 1), then goal 1 needs to be realized before goal 2, goal 2 before goal 3, and
goal 3 before goal 1; hence, goal 1 must be realized before itself, which is clearly impossible. Thus,
Figure 8.12: Invalid and valid project networks for Example 8.2.5.
the network cannot contain a directed cycle. Such a network is also called an acyclic directed
graph; see Appendix C.
Project networks contain initial goals and end goals. An initial goal is a node without incoming
arcs, and so an initial goal is realized without doing any activities. An end goal is a node with
only incoming arcs. It follows from the fact that project networks do not contain directed
cycles that any such network has at least one initial goal and at least one end goal. If there
is more than one initial goal, then we can introduce a dummy initial goal connected to the
initial goals by arcs with zero execution time. So we may assume without loss of generality
that there is exactly one initial goal. Similarly, we may assume that there is exactly one end
goal. The objective is then to determine a schedule such that the time between realizing
the initial goal and realizing the end goal, called the total completion time, is minimized.
Example 8.2.6. An optimal solution for the project network of Figure 8.11 can be found by
inspection. Assume that the time units are days. Realize goal 1 on day 0. Then goal 2 can be realized
on day 15, and not earlier than that. Goal 3 can be realized on day (15 + 3 =) 18. Note that,
although activity (1, 3) takes ten days, goal 3 cannot be realized on day 10, because activities (1, 2)
and (2, 3) need to be completed first. Goal 4 can be realized on day 8. In the case of goal 5, there are
two conditions. Its realization cannot take place before day (18 + 7 =) 25 (because goal 3 cannot be
realized earlier than on day 18), and also not before day (8 + 6 =) 14 (compare goal 4). So, goal
5 can be realized on day 25. In the case of goal 6, three conditions need to be fulfilled, and these lead
to a realization on day (25 + 10 =) 35. Hence, the optimal solution is 35 days, and all calculated
realization times are as early as possible.
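The day-by-day reasoning above is a forward pass over the goals in topological order; it can be
sketched in a few lines, with the execution times taken from Figure 8.11:

```python
# Execution times (days) of the activities in the project network of Figure 8.11.
c = {(1, 2): 15, (1, 3): 10, (1, 4): 8, (2, 3): 3, (2, 6): 17,
     (3, 5): 7, (4, 5): 6, (4, 6): 3, (5, 6): 10}

# Forward pass: a goal is realized as soon as all activities on its incoming
# arcs are completed. The labels 1..6 happen to be a topological order here.
t = {1: 0}
for goal in range(2, 7):
    t[goal] = max(t[i] + d for (i, j), d in c.items() if j == goal)
```

This reproduces the realization times found by inspection, in particular t[6] = 35.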
Figure 8.13 shows this solution in the form of a so-called Gantt chart¹. This chart can be read as
follows. Time runs from left to right. The days 1 through 35 are shown in the top-most row. The
bottom row shows when the several goals of the project are realized. The three rows in between show
when the various activities are carried out. So, the project starts at the left of the chart by performing
activities (1, 2), (1, 3), and (1, 4). Since goal 1 is the initial goal, it is realized immediately. This
fact is illustrated by the flag labeled 1 in the bottom row. After eight days, activity (1, 4) is completed.
This means that, at that time, goal 4 is realized. At that time, activities (1, 2) and (1, 3) continue
to be carried out, and activity (4, 5) is started. The end goal, goal 6, is realized on day 35.
1. Named after the American mechanical engineer and management consultant Henry L. Gantt (1861–1919).
It is not necessary to complete all activities as early as possible. For example, for the project
schedule in Figure 8.13, it should be clear that activities (1, 4) and (4, 5) may both be
delayed by a few days without affecting the other activities and the total completion time
of the project. Hence, goal 4 may also be delayed by a few days without affecting the total
completion time.
In general, given an optimal solution for the project network, the earliest starting time of
an activity is the earliest time at which that activity may be started without violating the
precedence constraints, and the latest starting time of an activity is the latest time at which
that activity may be started without increasing the total completion time of the project. The
earliest completion time of an activity is the earliest start time plus the execution time of that
activity, and similarly for the latest completion time. The difference between the earliest and
the latest starting time of an activity is called the slack of that activity.
Example 8.2.7. The earliest starting time of activity (4, 5) in Figure 8.13 is the beginning of
day 9, because (4, 5) can only be started when activity (1, 4) is finished, and activity (1, 4) requires
eight days. In order to not delay the project, activity (4, 5) needs to be completed before the end of
day 25, because otherwise the start of activity (5, 6) would have to be delayed, which in turn would
increase the total completion time of the project. This means that the latest starting time for activity
(4, 5) is the beginning of day 20. Hence, the slack of activity (4, 5) is (20 − 9 =) 11. On the
other hand, activity (2, 3) has zero slack. This activity cannot start earlier, because of activity (1, 2)
that precedes it, and it cannot start later because that would delay activity (3, 5), hence activity (5, 6),
and hence the whole project. So, in order to realize the end goal as early as possible, it is crucial that
activity (2, 3) starts on time.
So, an important question is: which activities cannot be postponed without affecting the
total completion time? These are exactly the activities with zero slack. For the example
above, one can easily check that activities (1, 2), (2, 3), (3, 5), and (5, 6) have zero slack,
while the activities (1, 3), (1, 4), (2, 6), (4, 5), and (4, 6) have nonzero slack.
A sequence of goals, from the initial goal to the end goal, consisting of goals that cannot be
postponed without increasing the total completion time is called a critical path of the activity
network. For instance, in the project network of Figure 8.11, 12356 is a critical path.
Note that critical paths are in general not unique. Activities on a critical path have zero slack,
and – conversely – an activity with zero slack is on a critical path (Exercise 8.4.29). It should
be clear that determining the minimum total completion time is equivalent to determining
the total completion time of a critical path.
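Combining a forward pass (earliest realization times) with a backward pass (latest realization
times) yields all slacks, and hence the critical path. A sketch for the Figure 8.11 data:

```python
# Execution times (days) of the activities in the project network of Figure 8.11.
c = {(1, 2): 15, (1, 3): 10, (1, 4): 8, (2, 3): 3, (2, 6): 17,
     (3, 5): 7, (4, 5): 6, (4, 6): 3, (5, 6): 10}
n = 6

t = {1: 0}                       # earliest realization times (forward pass)
for g in range(2, n + 1):
    t[g] = max(t[i] + d for (i, j), d in c.items() if j == g)

l = {n: t[n]}                    # latest realization times (backward pass)
for g in range(n - 1, 0, -1):
    l[g] = min(l[j] - d for (i, j), d in c.items() if i == g)

# Slack of activity (i, j): latest start (l_j - c_(i,j)) minus earliest start (t_i).
slack = {(i, j): (l[j] - d) - t[i] for (i, j), d in c.items()}
critical = sorted(a for a, s in slack.items() if s == 0)
```

The zero-slack activities come out as (1, 2), (2, 3), (3, 5), and (5, 6), in agreement with the
discussion above.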
In order to formulate this problem as an LO-model, let 1 be the initial goal and n the end
goal; the set of nodes is V = {1, . . . , n}, and the set of arcs is A. The time needed to carry
out the activity corresponding to the arc (i, j) is denoted by c(i,j) , and it is assumed that
c(i,j) ≥ 0 (i, j ∈ V ). Let ti be the realization time of goal i (i ∈ V ). The LO-model can
Figure 8.13: Gantt chart of the optimal schedule for the project network of Figure 8.11. Time runs
from left to right over the days 1 through 35; the rows in between show when the activities (such as
(1, 4) and (4, 5)) are carried out, and the flags in the bottom row mark the realization of goals 1, 4,
2, 3, 5, and 6.
be formulated as follows:
min tn − t1
(P)
s.t. tj − ti ≥ c(i,j) for (i, j) ∈ A.
Note that model (P) does not contain nonnegativity constraints, and we do not require that
t1 = 0. This is because the model contains only differences tj − ti with (i, j) ∈ A, and so
an optimal solution with t1 ≠ 0 can always be changed into one with t1 = 0 by subtracting
the optimal value of t1 from the optimal value of ti for all i = 2, . . . , n.
Example 8.2.8. In the case of the data of Figure 8.11, the LO-model becomes:
min t6 − t1
s.t. t2 − t1 ≥ 15 t3 − t1 ≥ 10 t4 − t1 ≥ 8
(P’)
t3 − t2 ≥ 3 t6 − t2 ≥ 17 t5 − t3 ≥ 7
t5 − t4 ≥ 6 t6 − t4 ≥ 3 t6 − t5 ≥ 10.
Using a computer package, we found the following optimal solution of (P’):
t1∗ = 0, t2∗ = 15, t3∗ = 18, t4∗ = 8, t5∗ = 25, t6∗ = 35.
The corresponding optimal objective value is z ∗ = 35. Compare this with the Gantt chart of Figure
8.13.
The optimal solution of model (P) contains the realization times of the several goals. How-
ever, it does not list which arcs are on a critical path. These arcs may be determined by
solving the dual of model (P) as follows. Note that we may increase the execution time
c(i,j) of any activity (i, j) with nonzero slack by a small amount without affecting the total
completion time of the project, and hence without affecting the optimal objective value of
model (P). On the other hand, increasing the execution time c(i,j) of an activity (i, j) that
has zero slack leads to an increase of the total completion time, and hence to an increase
of the optimal objective value of model (P). This means that we can determine whether an
arc (i, j) ∈ A is on a critical path of the project by determining whether the (nonnegative)
shadow price of the constraint tj −ti ≥ c(i,j) is strictly positive or zero. Recall from Section
5.3.2 that the shadow price of a constraint is the optimal value of the corresponding dual
variable (assuming nondegeneracy). Therefore, arc (i, j) ∈ A is on a critical path if and only
if the dual variable corresponding to the constraint tj − ti ≥ c(i,j) has an optimal value that
is strictly greater than zero.
To construct the dual of (P), we introduce for each arc (i, j) a dual decision variable y(i,j) .
The dual model then reads:
max  Σ(i,j)∈A c(i,j) y(i,j)
s.t. −Σ(1,j)∈A y(1,j) = −1
      Σ(i,k)∈A y(i,k) − Σ(k,j)∈A y(k,j) = 0    for k = 2, . . . , n − 1        (D)
      Σ(i,n)∈A y(i,n) = 1
      y(i,j) ≥ 0 for (i, j) ∈ A.
So the dual model of the original LO-model is the model of a minimum cost flow problem
without capacity constraints, in which one unit of flow is sent through the network at
maximum cost; see Section 8.2.4. This is nothing other than a model for determining a
longest path from the source node to the sink node in the network.
Example 8.2.9. The dual of model (P’) is:

max 15y(1,2) + 10y(1,3) + 8y(1,4) + 3y(2,3) + 17y(2,6) + 7y(3,5) + 6y(4,5) + 3y(4,6) + 10y(5,6)
s.t. −y(1,2) − y(1,3) − y(1,4) = −1
      y(1,2) − y(2,3) − y(2,6) = 0
      y(1,3) + y(2,3) − y(3,5) = 0                  (D’)
      y(1,4) − y(4,5) − y(4,6) = 0
      y(3,5) + y(4,5) − y(5,6) = 0
      y(2,6) + y(4,6) + y(5,6) = 1
      y(i,j) ≥ 0 for all arcs (i, j).

An optimal solution is y(1,2) = y(2,3) = y(3,5) = y(5,6) = 1, with all other variables equal to
zero; the optimal objective value is 15 + 3 + 7 + 10 = 35. From this optimal solution, it follows
that arcs (1, 2), (2, 3), (3, 5), and (5, 6) are on a critical path. Since they form a path, there is
a unique critical path, namely 12356.
The reader may have noticed that both the primal and the dual model have integer-valued
optimal solutions. The fact that there is an integer-valued optimal dual solution follows
8.3. The network simplex algorithm    367
from Theorem 8.1.1. Moreover, since the technology matrix of (P’) is the transpose of
the technology matrix of (D’) and the right hand side of (P’) is integer-valued, it follows
from Theorem 8.1.2 that (P’) also has an integer-valued optimal solution. So, an optimal
solution can be determined by means of the usual simplex algorithm. However, because
of the network structure of the model, the network simplex algorithm, to be introduced
in Section 8.3, is a more efficient method to solve this model. So the dual formulation
of the project scheduling problem has two advantages: the dual model can be solved more
efficiently than the primal model, and it allows us to directly determine the critical paths of
the project.
The project scheduling problem described in this section is an example of the problems
that are typically studied in the field of scheduling, which deals with problems of assigning
groups of tasks to resources or machines under a wide range of constraints. For example,
the number of tasks that can be carried out simultaneously may be restricted, there may be
setup times involved (see also the machine scheduling problem of Section 7.2.4), or it may
be possible to interrupt a task and resume it at a later time. Also, the objective may be varied.
For example, in the problem described in the current section, the objective is to complete
the project as quickly as possible, but alternatively there may be a deadline, and as many
tasks as possible need to be completed before the deadline. There exists a large literature on
scheduling; the interested reader is referred to Pinedo (2012).
integers. For (i, j) ∈ A, let cij be the unit shipping cost along arc (i, j), and, for i ∈ V ,
let bi be the supply at i if i is a source node, and the negative of the demand at i if i is a
sink node. We assume that they satisfy the supply-demand balance equation Σi∈V bi = 0.
If x(i,j) is interpreted as the flow on the arc (i, j), then any feasible solution satisfies the
following condition: for each node i, the total flow out of node i minus the total flow
into node i is equal to bi . These conditions, which are expressed in (8.16), are called the
flow-preservation conditions.
The model can be written in a compact way using matrix notation as follows. Let A be
the node-arc incidence matrix associated with G, and let x ∈ Rn be the vector of the
flow variables. Moreover, let c (∈ Rn ) be the flow cost vector consisting of the entries c(i,j)
for each arc (i, j) ∈ A, and let b (∈ Rm ) be the supply-demand vector consisting of the
entries bi . Then, Model 8.3.1 can be compactly written as:
min cT x
s.t. Ax = b (P)
x ≥ 0.
Recall that we assume that Σi∈V bi = 0. Without this assumption, the system Ax = b
may not have a solution at all, because the rows of A sum to the all-zero vector; see Appendix
B, and Theorem 8.1.8.
The rows of A correspond to the nodes of G, and the columns to the arcs of G. Each
column of A contains two nonzero entries. Let i, j ∈ V and (i, j) ∈ A. The column
corresponding to the arc (i, j) contains a +1 in the row corresponding to node i, and a
−1 in the row corresponding to node j . In this section, A(i,j) refers to the column of A
corresponding to the arc (i, j) ∈ A. Hence, the columns of A satisfy:
A(i,j) = ei − ej
Figure 8.14: A transshipment problem. The supply and demand values bi (b1 = 8, b2 = −5,
b3 = 3, b4 = −2, b5 = 0, b6 = −4) are written next to the nodes, and the costs c(i,j) next to
the arcs. In the network of Figure 8.14, nodes 1 and 3 are supply
nodes; 2, 4, and 6 are demand nodes; and 5 is a transshipment node. It is left to the reader to
determine the node-arc incidence matrix A of the transshipment problem of Figure 8.14. Note that the
sum of the rows of A is the all-zero vector.
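The structure of the node-arc incidence matrix can be checked with a few lines of code. The
arc list below is an assumption reconstructed from the arcs that appear in the examples of this
section, since Figure 8.14 is not fully reproduced here:

```python
# Arcs of the transshipment network of Figure 8.14 (as used in the examples).
arcs = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 3), (4, 6), (5, 4), (5, 6)]
m = 6

# Node-arc incidence matrix: the column of arc (i, j) equals e_i - e_j.
A = [[0] * len(arcs) for _ in range(m)]
for k, (i, j) in enumerate(arcs):
    A[i - 1][k] = 1       # +1 in the row of the tail node i
    A[j - 1][k] = -1      # -1 in the row of the head node j

# The rows of A sum to the all-zero vector (componentwise, per column).
row_sum = [sum(A[r][k] for r in range(m)) for k in range(len(arcs))]
```

Each column indeed contains exactly one +1 and one −1, and the rows sum to the zero vector.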
Note that B has rank 5, and removing any row from B results in an invertible (5, 5)-submatrix of
B.
Proof of Theorem 8.3.1. Let T be any spanning tree in the network G = (V, A) with
network matrix A. We first show that the node-arc incidence matrix B associated with T is a
network basis matrix in A. Clearly, B is an (m, m−1)-submatrix of A. Since B is the node-arc
incidence matrix of a connected digraph (a tree is connected; see Appendix C), it follows from
Theorem 8.1.8 that the rank of B is m − 1. Hence, B is a network basis matrix of A.
We now show that each network basis matrix B in A corresponds to a spanning tree in G.
Let B be a network basis matrix, and let GB be the subgraph of G whose node-arc incidence
matrix is B. Since B is an (m, m − 1)-matrix of full column rank m − 1, GB has m nodes and
m − 1 arcs. Hence, GB is a spanning subgraph of G. If GB has no cycles, then it follows that
GB is a tree (see Appendix C) and we are done. Therefore, we need to show that GB contains
no cycles.
Assume for a contradiction that GB does contain some cycle C . Let v1 , . . . , vp be the nodes
of C in consecutive order, and define vp+1 = v1 . Let i ∈ {1, . . . , p}. Because C is a cycle, we
have that either (vi , vi+1 ) ∈ C , or (vi+1 , vi ) ∈ C . Define:
w(vi , vi+1 ) = A(vi ,vi+1 ) if (vi , vi+1 ) ∈ C ;   w(vi , vi+1 ) = −A(vi+1 ,vi ) if (vi+1 , vi ) ∈ C .
Note that w(vi , vi+1 ) is a column of B, multiplied by 1 or −1. Recall that A(i,j) = ei − ej ,
and observe that, for any two consecutive nodes vi , vi+1 , we have two possibilities:
(a) (vi , vi+1 ) ∈ C . Then, w(vi , vi+1 ) = A(vi ,vi+1 ) = evi − evi+1 .
(b) (vi+1 , vi ) ∈ C . Then, w(vi , vi+1 ) = −A(vi+1 ,vi ) = −(evi+1 − evi ) = evi − evi+1 .
Thus, in either case, we have that w(vi , vi+1 ) = evi − evi+1 and hence:
Σi=1,...,p w(vi , vi+1 ) = Σi=1,...,p (evi − evi+1 ) = Σi=1,...,p evi − Σi=1,...,p evi+1 = 0,

where the last equality holds because vp+1 = v1 , so that the two sums contain exactly the same
unit vectors.
This means that the columns of B corresponding to the arcs of C are linearly dependent, con-
tradicting the fact that B has full column rank m − 1.
In the following theorem, BI is, as usual, the index set of the columns of B, and N I is
the index set of the remaining columns of A. The theorem states that for any network basis
matrix of the network matrix A, the set of m equalities BxBI = b has a (not necessarily
nonnegative) solution, provided that the sum of all entries of b is zero. If N consists of the
Proof of Theorem 8.3.2. Let G = (V, A) be a network with m nodes, and let A be the
associated network matrix. Moreover, let B be a network basis matrix, and let TB be the
corresponding spanning tree in G. Let v ∈ V , and construct B′ from B by deleting the row
corresponding to v . So, B′ is an (m − 1, m − 1)-matrix. Similarly, construct b′ from b by
deleting the entry corresponding to v . Let bv be that entry. We have that:

B ≡ [ B′ ; Bv,? ] , and b ≡ [ b′ ; bv ] ,

stacking B′ on top of the row Bv,? , and b′ on top of the entry bv . (Recall that Bv,? denotes
the v ’th row of B.) Since B is the node-arc incidence matrix of a connected digraph (namely,
the spanning tree TB ), it follows from Theorem 8.1.8 that B′ is nonsingular and has rank
m − 1. Let x̂ = (B′ )−1 b′ . We need to show that x̂ satisfies Bx̂ = b.
Obviously, it satisfies all equations of Bx̂ = b that do not correspond to node v . So it remains
to show that Bv,? x̂ = bv . To see that this holds, recall that the rows of B add up to the
all-zero vector, and that the entries of b add up to zero. Hence, Bv,? = −Σi≠v Bi,? , and
therefore Bv,? x̂ = −Σi≠v Bi,? x̂ = −Σi≠v bi = bv .
There is an elegant way to calculate a solution of the system Bx = b by using the tree
structure of the matrix B. Recall that the separate equations of the system Ax = b
represent the flow-preservation conditions for the network. Since we want to solve the
system Bx = b, this means that we want to find a flow vector x such that the flow-
preservation conditions hold, and all nonbasic variables (i.e., the variables corresponding to
arcs that are not in the tree TB ) have value zero. This means that we can only use the arcs
in the tree TB to satisfy the flow-preservation conditions.
Consider the tree of Figure 8.15. It is a spanning tree in the network of Figure 8.14. We can
find a value for x(i,j) with (i, j) ∈ BI as follows. The demand at node 2 is 5. The flow-
preservation condition corresponding to node 2 states that this demand has to be satisfied by
the flow vector x. Since (1, 2) is the only arc incident with node 2 that is allowed to ship
a positive number of units of flow, arc (1, 2) has to be used to satisfy the demand at node
2. This means that we are forced to choose x(1,2) = 5. Similarly, we are forced to choose
x(5,4) = 2 and x(5,6) = 4. This process can now be iterated. We have that x(1,2) = 5,
the total supply at node 1 is 8, and all supply at node 1 should be used. There are two
arcs incident with node 1 in the tree TB , namely (1, 2) and (1, 3). Since we already know
the value of x(1,2) , this means that arc (1, 3) has to be used to satisfy the flow-preservation
condition for node 1. That is, we are forced to choose x(1,3) = 3. This leaves only one arc
with an undetermined flow value, namely arc (3, 5). Because b3 = 3 and x(1,3) = 3, we
have again no choice but to set x(3,5) = 6 in order to satisfy the flow-preservation condition
for node 3. Thus, we have determined a solution of the system Bx = b. Note that while
constructing this solution, for each (i, j) ∈ A, x(i,j) was forced to have a particular value.
So, in fact, the solution that we find is unique.
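The leaf-peeling procedure just described can be sketched as follows, with the supplies and
demands of Figure 8.14 and the tree arcs of Figure 8.15:

```python
b = {1: 8, 2: -5, 3: 3, 4: -2, 5: 0, 6: -4}       # supply (+) / demand (-)
tree = [(1, 2), (1, 3), (3, 5), (5, 4), (5, 6)]   # basic arcs of the spanning tree

flow = {}
remaining = dict(b)          # balance still to be routed at each node
active = list(tree)          # tree arcs whose flow is still undetermined
while active:
    for arc in active:
        i, j = arc
        deg_i = sum(1 for a in active if i in a)
        deg_j = sum(1 for a in active if j in a)
        if deg_i == 1 or deg_j == 1:              # arc incident with a leaf node
            leaf = i if deg_i == 1 else j
            # the leaf's remaining balance forces the flow value on this arc
            flow[arc] = remaining[leaf] if leaf == i else -remaining[leaf]
            remaining[i] -= flow[arc]             # flow leaves node i ...
            remaining[j] += flow[arc]             # ... and enters node j
            active.remove(arc)
            break
```

Every arc's flow is forced in turn, so the resulting solution is unique, as noted above.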
Note that the system of equations (8.17) is equivalent to BT y = cBI . The reduced cost of
arc (i, j) ∈ A (with respect to the node potential vector y) is denoted and defined as:

c̄(i,j) = c(i,j) − yi + yj .    (8.18)
According to (8.17), the reduced cost of any arc (i, j) in the tree TB is zero, whereas the
reduced cost of any other arc may be positive, negative, or zero.
The node potential vector and the reduced costs of the arcs can be calculated in a simple
manner. First note that, if y is a node potential vector, then we may add the same number
to each of its entries, and obtain a new node potential vector. Therefore, there are in general
infinitely many node potential vectors for any given network basis matrix. However, setting
one of the entries of the node potential vector (e.g., y1 ) equal to zero results in a unique
solution. Now, expression (8.18) together with (8.17) enable us to calculate c̄(i,j) for all
(i, j) ∈ A, and yi for all i ∈ V . This process is best illustrated with an example.
Figure 8.16: Calculating the node potential vector and the reduced costs. (a) The node potentials:
y1 = 0, y2 = −3, y3 = −2, y4 = −12, y5 = −8, y6 = −11. (b) The reduced costs of the non-tree
arcs: c̄(2,3) = 2, c̄(2,4) = −4, c̄(4,3) = 12, c̄(4,6) = 2.
Example 8.3.3. Consider the network basis matrix B associated with the tree in Figure 8.15. The
entries of the node potential vector are written next to the corresponding nodes in Figure 8.16. The values
of the reduced costs c̄(i,j) are written next to the arcs. The calculations are as follows (see Figure 8.16(a)).
Take y1 = 0. Applying (8.17) to arc (1, 2) yields y2 = y1 − c(1,2) = 0 − 3 = −3. Doing this
for all arcs in the tree TB , we find that: y1 = 0, y2 = −3, y3 = −2, y4 = −12, y5 = −8,
y6 = −11.
For each (i, j) that is not in the tree TB , the value of c̄(i,j) is calculated as follows (see Figure 8.16(b)):
c̄(2,3) = c(2,3) − y2 + y3 = 1 + 3 − 2 = 2,
c̄(2,4) = c(2,4) − y2 + y4 = 5 + 3 − 12 = −4,
c̄(4,3) = c(4,3) − y4 + y3 = 2 + 12 − 2 = 12,
c̄(4,6) = c(4,6) − y4 + y6 = 1 + 12 − 11 = 2.
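These calculations are mechanical. The sketch below propagates the potentials along the tree
arcs and then evaluates (8.18) for the non-tree arcs; the arc costs are the ones read from the
figures of this section (an assumption, since Figure 8.14 is not fully reproduced here):

```python
# Arc costs of the Figure 8.14 network and the tree arcs of Figure 8.15.
cost = {(1, 2): 3, (1, 3): 2, (2, 3): 1, (2, 4): 5, (3, 5): 6,
        (4, 3): 2, (4, 6): 1, (5, 4): 4, (5, 6): 3}
tree = [(1, 2), (1, 3), (3, 5), (5, 4), (5, 6)]

# Node potentials: fix y1 = 0 and propagate y_i - y_j = c_(i,j) (equation
# (8.17)) along tree arcs until every node has received a potential.
y = {1: 0}
todo = list(tree)
while todo:
    for arc in todo:
        i, j = arc
        if i in y:
            y[j] = y[i] - cost[arc]
        elif j in y:
            y[i] = y[j] + cost[arc]
        else:
            continue            # neither endpoint labeled yet; try another arc
        todo.remove(arc)
        break

# Reduced costs (8.18) of the arcs outside the tree.
cbar = {a: cost[a] - y[a[0]] + y[a[1]] for a in cost if a not in tree}
```

The loop terminates because the tree is connected, so in every pass at least one arc has a
labeled endpoint.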
As we will show below, if all arcs have nonnegative reduced cost, then the current tree
solution is optimal.
If

c̄(i,j) = 0 for (i, j) ∈ TB , and c̄(i,j) ≥ 0 for (i, j) ∈ A \ TB ,    (8.19)

then B corresponds to an optimal feasible solution.
Proof. Let x̂ (∈ Rn ) be the tree solution corresponding to B. We will use the complementary
slackness relations to show that x̂ is an optimal solution of Model 8.3.1. The dual model of
Model 8.3.1 is:

max  Σi∈V bi yi
s.t. yi − yj ≤ c(i,j) for (i, j) ∈ A    (8.20)
     yi free for i ∈ V.
Note that ŷi − ŷj = c(i,j) − c̄(i,j) ≤ c(i,j) for (i, j) ∈ A. Hence, the node potential vector
ŷ is a feasible solution of this dual model. The complementary slackness relations state that,
for each arc (i, j) ∈ A, we should have that either x̂(i,j) = 0, or ŷi − ŷj = c(i,j) . Note
that if (i, j) ∈ A \ TB , then x̂(i,j) = 0. Moreover, for each arc (i, j) ∈ TB , we have that
ŷi − ŷj = c(i,j) . Hence, the complementary slackness relations hold, and Theorem 4.3.1
implies that x̂ is an optimal solution of Model 8.3.1, and ŷ is an optimal solution of (8.20).
If (8.19) is not satisfied, then this means that there is an arc (i, j) that is not in the tree and
that satisfies c̄(i,j) = c(i,j) − yi + yj < 0. This is, for instance, the case in the example
above, where c̄(2,4) < 0, and so B does not correspond to an optimal feasible tree solution.
In the next section it will be explained how to proceed from here.
Figure 8.17: (a) Maintaining feasibility: the flows on the arcs of the cycle C(2,4) become 5 + t on
(1, 2), t on (2, 4), 3 − t on (1, 3), 6 − t on (3, 5), and 2 − t on (5, 4). (b) The updated feasible
tree solution.
Let C(k,l) be the cycle that arises when the entering arc (k, l) is added to the tree TB , and
let C−(k,l) be the set of arcs of C(k,l) that point in the direction opposite to the direction of
(k, l). Define:

∆ = ∞ if C−(k,l) = ∅, and ∆ = min{ x(i,j) | (i, j) ∈ C−(k,l) } otherwise.
Note that this is a special case of the minimum-ratio test of the simplex algorithm (see
Section 3.3). The reader is asked to show this in Exercise 8.4.12. An arc (u, v) ∈ C−(k,l) for
which x(u,v) − ∆ = 0 is a candidate for leaving the tree TB in favor of (k, l), creating a new
spanning tree in G. The objective value corresponding to the new feasible tree solution is
at most the current objective value, since c̄(k,l) < 0. The new value of x(k,l) satisfies
x(k,l) = ∆ ≥ 0.
If ∆ = 0, then the entering arc (k, l) maintains a zero flow, and so the objective value does
not improve. The case ∆ = 0 happens if the current tree TB contains an arc for which the
current flow is zero; such a tree is called a degenerate spanning tree. Clearly, the corresponding
feasible tree solution is then degenerate, i.e., at least one of the basic variables has a zero
value.
If all arcs in the cycle C(k,l) point in the direction of (k, l), then the value of t can be
chosen arbitrarily large, i.e., ∆ = ∞. This means that the objective value can be decreased
arbitrarily, i.e., the model is unbounded; see Section 3.7. On the other hand, if c ≥ 0,
then for every feasible tree solution x, it holds that cT x ≥ 0. Therefore the model is never
unbounded if c ≥ 0.
Example 8.3.4. In the case of the cycle C(2,4) of Figure 8.17(a), the leaving arc is (5, 4), because

∆ = max{ t | 2 − t ≥ 0, 6 − t ≥ 0, 3 − t ≥ 0 } = 2;

the maximum is attained for arc (5, 4). The new tree, with the updated feasible tree solution, is
depicted in Figure 8.17(b).
Algorithm 8.3.1. (Network simplex algorithm)

Input: Values for the entries of the (m, n) network matrix A associated with a
network with m nodes and n arcs, the cost vector c (∈ Rⁿ), and the supply-
demand vector b (∈ Rᵐ) with the sum of the entries equal to 0, and an ini-
tial feasible tree solution for the LO-model min{ cT x | Ax = b, x ≥ 0 }.

Output: Either
▶ the message: the model is unbounded; or
▶ an optimal solution of the model.

▶ Step 1: Calculation of the node potential vector.
Let B be the current network basis matrix. Determine values of the node
potentials y1, . . . , ym such that y1 = 0 and yi − yj = c(i,j) for each arc
(i, j) in the tree TB; i, j ∈ {1, . . . , m}; see Section 8.3.3.

▶ Step 2: Selection of an entering arc.
Calculate c̄(i,j) = c(i,j) − yi + yj for each arc (i, j) not in TB. Select
an arc (k, l) with c̄(k,l) < 0. If such an arc does not exist, then stop: the
current feasible tree solution is optimal.

▶ Step 3: Selection of a leaving arc.
Augment the current tree with the arc (k, l). Let C(k,l) be the resulting
cycle, and let C⁻(k,l) be the set of arcs of C(k,l) that point in the direction
opposite to (k, l). Define:

    ∆ = ∞ if C⁻(k,l) = ∅,  and  ∆ = min{ x(i,j) | (i, j) ∈ C⁻(k,l) } otherwise.

If ∆ = ∞, then stop: the model is unbounded. Otherwise, select a leaving
arc (u, v) ∈ C⁻(k,l) with x(u,v) = ∆, remove it from the tree, and set:
x(k,l) := ∆;
x(i,j) := x(i,j) + ∆ for (i, j) ∈ C⁺(k,l) (the set of arcs of C(k,l) that have
the same direction as (k, l));
x(i,j) := x(i,j) − ∆ for (i, j) ∈ C⁻(k,l).
Return to Step 1.
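Step 1 amounts to solving the triangular system y1 = 0, yi − yj = c(i,j) for the tree arcs, which can be done by a single traversal of TB. A minimal Python sketch (the dictionary representation of the tree is an assumption made here for illustration):

```python
from collections import deque

def node_potentials(m, tree_arcs, cost):
    """Node potentials: y[1] = 0 and y[i] - y[j] = cost[(i, j)] for every
    arc (i, j) of the spanning tree TB (nodes are numbered 1, ..., m)."""
    adj = {v: [] for v in range(1, m + 1)}
    for (i, j) in tree_arcs:
        adj[i].append((j, cost[(i, j)]))    # leaving i along (i, j): y[j] = y[i] - c
        adj[j].append((i, -cost[(i, j)]))   # leaving j against (i, j): y[i] = y[j] + c
    y = {1: 0}
    queue = deque([1])
    while queue:                            # breadth-first traversal of the tree
        u = queue.popleft()
        for v, c in adj[u]:
            if v not in y:
                y[v] = y[u] - c
                queue.append(v)
    return y

# Toy spanning tree on 3 nodes with y1 - y2 = 5 and y2 - y3 = 3.
y = node_potentials(3, [(1, 2), (2, 3)], {(1, 2): 5, (2, 3): 3})
print(y)   # {1: 0, 2: -5, 3: -8}
```

Since TB is a spanning tree, the traversal reaches every node exactly once, so the system always has a unique solution once y1 is fixed.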
Example 8.3.5. We will apply Algorithm 8.3.1 to the example of Figure 8.14. The procedure
starts with the spanning tree drawn in Figure 8.15. Figure 8.18 shows the various iterations of the
network simplex algorithm that lead to an optimal solution. Column 1 of Figure 8.18 contains the
(primal) feasible tree solutions. The supply and demand values in the vector b are written next to the
8.3. The network simplex algorithm    377

[Figure 8.18, whose caption appears below: for each of the three iterations, column 1 shows the feasible tree solution (supply/demand values bᵢ next to the nodes, flows x(i,j) next to the arcs), column 2 the node potentials yᵢ together with the arc costs c(i,j), column 3 the reduced costs c̄(i,j), and column 4 the pivot, i.e., the cycle whose flows are updated by ±∆ (∆ = 2 in iteration 1 and ∆ = 1 in iteration 2).]
Figure 8.18: Three iterations of the network simplex algorithm, and an optimal solution. The arc above
each column explains the meaning of the node and arc labels in that column.
nodes, and the flow values x(i,j) are written next to the arcs. The arcs corresponding to the nonbasic
variables, which have flow value zero, are omitted. Column 2 contains the current node potential vector.
The values of the yi ’s are written next to the nodes. The numbers next to the arcs are the values of
the flow cost c(i,j) . The node potential vector is calculated by the formula c(i,j) = yi − yj (see
Step 1). Column 3 contains the values of the current reduced cost c̄(i,j) , which are calculated by the
formula c̄(i,j) = c(i,j) − yi + yj . An arc with the most negative value of c̄(i,j) enters the current
tree, resulting in a unique cycle (see Step 2). Column 4 shows this unique cycle. The values of x(i,j)
are updated by subtracting and adding ∆. An arc (i, j) with x(i,j) = ∆ leaves the current tree (see
Step 3).
We will briefly describe the different iterations for the data of Figure 8.14.
I Iteration 1. There is only one negative value of c̄(i,j) , namely c̄(2,4) = −4. Hence, (k, l) =
(2, 4) is the entering arc. The leaving arc is (u, v) = (5, 4). The values of x(i,j) are updated
and used as input for Iteration 2.
I Iteration 2. One can easily check that (k, l) = (4, 6), and (u, v) = (1, 3).
378    Chapter 8. Linear network models
I Iteration 3. Now all values of c̄(i,j) are nonnegative, and so an optimal solution has been reached.
(P)   min  cT x                (P̃)   min  cT x
      s.t. Ax = b,                   s.t.  Ãx = b̃,
           x ≥ 0.                          x ≥ 0.
Model (P) is the original transshipment model, and model (P̃) is the same model, but
with one of its (redundant) constraints removed. The following theorem shows that the two models are
equivalent, in the sense that there is a one-to-one correspondence between the feasible tree
solutions of (P) and the feasible basic solutions of (P̃), and the corresponding objective values
coincide.
Theorem 8.3.4.
Every feasible tree solution of (P) corresponds to a feasible basic solution of (P̃), and
vice versa.
Proof. Let B be a network basis matrix corresponding to the feasible tree solution [x_BI; x_NI] of
(P). Construct B̃ from B by removing row m; let aT be this row. Construct b̃ analogously.
Clearly,

    B x_BI  =  [ B̃ x_BI  ]  =  [ b̃  ]
               [ aT x_BI ]     [ b_m ],

and hence we have that B̃ x_BI = b̃. Therefore, [x_BI; x_NI] is a feasible basic solution of model (P̃).
Moreover, since the objective vectors of (P) and (P̃) are the same, the objective value of the
feasible tree solution of (P) equals the objective value of the feasible basic solution of (P̃). The
converse is left to the reader.
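The reason one row may be dropped is that every column of a node-arc incidence matrix has the form eᵢ − eⱼ, so the rows sum to the zero vector and any single row is implied by the others. A quick numeric illustration (the helper function below is ours, not from the book):

```python
def incidence(m, arcs):
    """Node-arc incidence matrix: the column for arc (i, j) is e_i - e_j."""
    A = [[0] * len(arcs) for _ in range(m)]
    for col, (i, j) in enumerate(arcs):
        A[i - 1][col] = 1     # flow leaves node i
        A[j - 1][col] = -1    # flow enters node j
    return A

arcs = [(1, 2), (1, 3), (2, 3), (3, 4)]
A = incidence(4, arcs)
# Every column sums to zero, so the last row equals minus the sum of the
# others: dropping row m loses no information.
col_sums = [sum(A[r][c] for r in range(4)) for c in range(len(arcs))]
print(col_sums)   # [0, 0, 0, 0]
```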
Since we consider minimizing models in this section, feasible basic solutions are optimal
if and only if the corresponding objective coefficients are nonnegative. Applying this to
(P̃), this means that the feasible basic solution [x_BI; x_NI] with corresponding basis matrix B̃ is
optimal if and only if:

    c_NIα − c_BI^T B̃⁻¹ Ñ_{?,α} ≥ 0   for each α ∈ {1, . . . , n − m + 1}.   (8.21)
In order to determine optimality using the network simplex algorithm, we use system
(8.19) in Theorem 8.3.3, which states that as soon as we have found a feasible tree solution
such that, for all (i, j) ∈ A, the reduced cost c̄(i,j) has a nonnegative value, we have found
an optimal tree solution. Theorem 8.3.5 shows that optimality criterion (8.19), used
in the network simplex algorithm, is in fact equivalent to condition (8.21), used in the (regular)
simplex algorithm. This means that the network simplex algorithm can be viewed as a special
case of the regular simplex algorithm.
Theorem 8.3.5.
A feasible tree solution of (P) together with a node potential vector satisfies (8.19) if
and only if the corresponding feasible basic solution of (P̃) satisfies conditions (8.21).
Proof of Theorem 8.3.5. Let α ∈ {1, . . . , n − m + 1}. We may assume without loss of generality that y_m = 0. Recall
that, since y is a node potential vector, y is a solution of the system BT y = c_BI;
see Section 8.3.3. Since y_m = 0, we may write y = [ỹ; 0], and hence we have that:

    c_BI^T  =  yT B  =  [ ỹT  0 ] [ B̃  ]  =  ỹT B̃.
                                  [ aT ]

Postmultiplying both sides of this equation by B̃⁻¹ gives ỹT = c_BI^T B̃⁻¹. Recall that NIα is
the index of arc (i, j) in A \ TB. Hence, substituting c_BI^T B̃⁻¹ = ỹT, we find that (8.21) is
equivalent to:

    c_NIα − c_BI^T B̃⁻¹ Ñ_{?,α}  =  c_NIα − ỹT Ñ_{?,α}  =  c_NIα − [ ỹT  0 ] [ Ñ_{?,α} ]
                                                                           [    u    ]
                                =  c_NIα − yT N_{?,α}  =  c_NIα − yT (e_i − e_j)
                                =  c(i,j) − y_i + y_j ≥ 0,

where we have used the fact that N_{?,α} = e_i − e_j, with u the last entry of N_{?,α}. This proves the theorem.
Note also that it follows from the proof of Theorem 8.3.5 that the value of the reduced cost
c̄(i,j) of arc (i, j) is exactly the current objective coefficient of the variable x(i,j) .
The flow costs on all dummy arcs are taken to be some sufficiently large number M . This
means that these arcs are so highly ‘penalized’ that the flow on these arcs should be zero
for any optimal solution. If an optimal solution is found that has a positive flow value on
a dummy arc, then the original problem is infeasible. Compare this with the working of
the big-M procedure in Section 3.6.
As an initial feasible tree solution we take the flow values on the dummy arcs equal to the
absolute values of the corresponding bi ’s, and the remaining flows are taken to be zero. Note
that, if one of the bi ’s has value zero, then the initial solution is degenerate. In Figure 8.19(c)
the initial spanning tree and the corresponding flow values are depicted. Since the total
supply and demand are (assumed to be) balanced, the net flow into node 0 is exactly zero.
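The initial tree construction can be sketched as follows. This is an illustrative fragment: the function name, the choice of BIG_M, and the toy data are our assumptions, not the data of Figure 8.19.

```python
BIG_M = 10**6   # stands in for the 'sufficiently large' cost M

def big_m_start(b):
    """Dummy arcs (via the auxiliary node 0) and the initial feasible tree
    flows for the network big-M procedure. `b` maps node -> b_i, sum zero."""
    cost, flow = {}, {}
    for i, bi in b.items():
        # supply nodes send their surplus to node 0; demand nodes draw from it
        arc = (i, 0) if bi >= 0 else (0, i)
        cost[arc] = BIG_M
        flow[arc] = abs(bi)    # initial flow equals |b_i|
    return cost, flow

# Toy balanced instance: supply 10 at node 1, demands 4 and 6 at nodes 3 and 4;
# node 2 is a transshipment node (b_2 = 0), so the initial tree is degenerate.
cost, flow = big_m_start({1: 10, 2: 0, 3: -4, 4: -6})
print(flow)   # {(1, 0): 10, (2, 0): 0, (0, 3): 4, (0, 4): 6}
```

Since the supplies and demands balance, the net flow into node 0 is zero, exactly as observed above.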
[Figure 8.19: (a) the transshipment network; (b) the network augmented with artificial arcs, via the artificial node 0, each with cost M; (c) the initial feasible tree for the augmented network, in which each artificial arc carries the absolute value of the supply/demand of its node.]
Figure 8.19: Initial feasible tree for the network big-M procedure.
It is left to the reader to carry out the different iterations of the network simplex algorithm
applied to the network of Figure 8.19(b) with the initial feasible tree solution of Figure
8.19(c); see Exercise 8.4.13. Analogously to the network big-M procedure, the network two-
phase procedure can be formulated. The reader is asked to describe this procedure in Exercise
8.4.14, and to use it to find an optimal feasible tree solution for the network of Figure 8.19.
[Figure 8.20: a network on three nodes whose arcs are labeled a through j; each label is shown together with the cost of the arc.]
Figure 8.20: Cunningham-Klincewicz network that may cause cycling of the network simplex algorithm.
This example is due to William H. Cunningham (born 1947) and John G. Klincewicz (born
1954). In Table 8.3, we have listed the feasible trees that occur at each iteration, in case the entering and
leaving arcs are the ones listed in the third and fifth column, respectively. In this example, the selection
rule for the entering arc is not relevant; the leaving arcs are selected arbitrarily. The fourth column of
Table 8.4(a) contains the cycle that occurs when the feasible tree is augmented with the entering arc. It
is left to the reader to carry out the various calculations. After ten iterations, we arrive at the initial
feasible tree, consisting of the arcs a and i.
It is also left to the reader to determine the network simplex adjacency graph of the Cunningham-
Klincewicz example. This graph consists of (3 × 3 × 4 =) 36 nodes, namely all possible spanning
trees, and there is an arc between two nodes u and v , if v can be derived from u by means of an
iteration step of the network simplex algorithm. Compare in this respect the definition of the simplex
adjacency graph; see Section 3.4.
Now that we have seen that cycling may occur, we will show how it can be avoided. As
said, strongly feasible spanning trees (to be defined below) play a key role.
In Step 1 of the network simplex algorithm, we have chosen y1 = 0 as the start of the
calculations of the remaining entries of the node potential vector. This choice is arbitrary,
since choosing any value of any entry yi leads to a node potential vector; see Section 8.3.3.
Crucial in the cycle-avoiding procedure is the fact that one node is kept fixed during the
whole iteration process, and that for this node (called the root node of the network) the value
of the corresponding entry of the node potential vector is also kept fixed during the process.
Node 1 is taken as the root node.
From Theorem C.3.1 in Appendix C, we know that there exists a unique path between any
pair of nodes in the tree, where the directions of the arcs are not taken into account (so in
such a path the arcs may have opposite directions). We call the arc (i, j) directed away from
the root in the spanning tree TB , if node i is encountered before node j when traversing the
unique path from the root. For any network basis matrix B, a feasible tree TB is called a
strongly feasible spanning tree if every arc in TB that has zero flow is directed away from the
root in TB. The following theorem holds.

Theorem 8.3.6.
If every spanning tree encountered by the network simplex algorithm is strongly feasible,
then the algorithm does not cycle.
Proof of Theorem 8.3.6. It can be easily checked that cycling can only occur in a sequence
of degenerate steps; i.e., the corresponding feasible tree solutions contain at least one arc with
a zero flow. After a degenerate iteration, the deletion of the last entered arc, say (k, l), splits the
current strongly feasible spanning tree TB into two disjoint subtrees. Let Tr denote the subtree
containing the root, and let To denote the other subtree. Note that TB \ {(k, l)} = Tr ∪ To ,
and that Tr and To are disjoint trees, both containing one end node of the arc (k, l).
Since TB is a strongly feasible spanning tree, both the root node 1 and node k are in Tr , and
l ∈ To . The fact that 1 ∈ Tr implies that the entries of the node potential vector corresponding
to the nodes in Tr did not change in the last iteration. For the ‘new’ value of yl , denoted by
yl,new , the following holds (the variables yk,new , yk,old , and yl,old have obvious meanings):
yl,new = yk,new − c(k,l) = yk,old − c(k,l) = yl,old − c̄(k,l) ,
with c̄(k,l) = c(k,l) − yk,old + yl,old . For each node j ∈ To , the ‘new’ value of yj can be
determined by subtracting c̄(k,l) from its ‘old’ value. Hence,
    y_j,new  =  { y_j,old              if j ∈ Tr,
                { y_j,old − c̄(k,l)    if j ∈ To.
Since c̄(k,l) < 0 (because (k, l) is an entering arc), it follows that:
    Σ_{j∈V} y_j,new  >  Σ_{j∈V} y_j,old.
[Figure 8.21: two cycles, each oriented in the direction of the entering arc (k, l), with the node h closest to the root marked: (a) a nondegenerate iteration with ∆ = 3, in which several arcs reach zero flow; (b) a degenerate iteration with ∆ = 0. The entering and leaving arcs are indicated.]
Figure 8.21: Leaving arc rule. The orientation of the cycle is in the direction of the entering arc (k, l).
Recall that, in each iteration, we choose y1 = 0, and so for any spanning tree, the values of
the yj ’s are fixed. Hence, the ‘new’ spanning tree is different from the ‘old’ one. In fact, we
have proved that, in each iteration, the sum of the entries of the node potential vector strictly
increases. Therefore, the algorithm never encounters the same spanning tree twice and, hence,
cycling does not occur.
The remaining question is how to find and maintain a strongly feasible spanning tree. De-
termining an initial strongly feasible spanning tree poses no problem when enough artificial
arcs are added to the network. This can be done by following the approach of the big-M
procedure (see Section 3.6.1). Once a strongly feasible spanning tree solution has been ob-
tained, it can be maintained by selecting the leaving arc according to the following leaving
arc rule.
Leaving arc rule: Let (k, l) be the entering arc of the current strongly feasible spanning
tree TB , and let C(k,l) be the unique cycle in TB ∪ {(k, l)}. There are two cases to be
considered:
(a) The iteration is nondegenerate. In order to maintain a strongly feasible spanning tree,
the first zero-flow arc in C(k,l) , when traversing C(k,l) in the direction of (k, l) and
starting in the node closest to the root, has to be the leaving arc. At least one arc obtains
a zero flow after adding and subtracting ∆ (> 0), as in Step 3 of the network simplex
algorithm. For example, consider Figure 8.21(a), where ∆ = 3 and three arcs obtain
zero flow after subtracting ∆. Note that, in fact, all arcs that have been assigned zero
flow are directed away from the root.
(b) The iteration is degenerate, and at least one arc has zero flow in the direction opposite
to (k, l). The leaving arc should be the first one that is encountered when traversing
C(k,l) in the direction of (k, l) and starting in (k, l). In this case ∆ = 0. The zero-
flow arcs are directed away from the root. For example, in Figure 8.21(b), there are four
zero-flow arcs of which two are in the direction opposite to (k, l).
In both cases (a) and (b), the new spanning tree has all its zero-flow arcs pointing away from
the root, and so it is in fact a strongly feasible spanning tree.
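The defining property of a strongly feasible spanning tree — every zero-flow tree arc is directed away from the root — can be checked mechanically. A sketch (representation and names are illustrative assumptions):

```python
def is_strongly_feasible(tree_arcs, flow, root=1):
    """True iff every zero-flow arc (i, j) of the tree is directed away from
    the root, i.e. i is encountered before j on the path from the root."""
    adj = {}
    for (i, j) in tree_arcs:          # build the tree ignoring arc directions
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    depth = {root: 0}
    stack = [root]
    while stack:                      # depth-first traversal from the root
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in depth:
                depth[v] = depth[u] + 1
                stack.append(v)
    # arc (i, j) points away from the root exactly when j is the deeper endpoint
    return all(depth[j] > depth[i] for (i, j) in tree_arcs if flow[(i, j)] == 0)

# Zero-flow arc (1, 2) points away from root 1: strongly feasible.
print(is_strongly_feasible([(1, 2), (3, 2)], {(1, 2): 0, (3, 2): 5}))   # True
# Zero flow on (3, 2), which points toward the root: not strongly feasible.
print(is_strongly_feasible([(1, 2), (3, 2)], {(1, 2): 4, (3, 2): 0}))   # False
```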
Example 8.3.8. We apply the anti-cycling procedure described above to the example of Figure
8.20. The following iterations may occur. In all iterations, we take y1 = 0.
I Iteration 1. Start with the strongly feasible spanning tree {a, d}. Then y2 = −1 and y3 = −1.
Since c̄b = −2, arc a leaves and b enters the tree.
I Iteration 2. The new strongly (!) feasible spanning tree is {b, d}. Then y2 = 1 and y3 = −1.
Since c̄h = −5, arc d leaves and h enters the tree.
I Iteration 3. The new strongly feasible spanning tree is {b, h}. Then y2 = 1 and y3 = 4. Take
j with c̄j = −6 as the entering arc. Extending the tree {b, h} with j , leads to the cycle consisting
of j and h. This cycle does not contain arcs that point opposite to j , and so the flow on this cycle
can increase without bound, while decreasing the cost without bound. Hence, as could have been
expected, this problem has no bounded solution.
Note that in fact a strongly feasible tree has been maintained, and that cycling did not occur.
8.4 Exercises
Exercise 8.4.1. Determine whether or not the following matrices are totally unimodular.
1 0 −1 1 0 −1 0 0
(a) 1 1 0 1 −1 0 −1 0
(d)
1 0 −1 0 1 −1 0 −1
0 0 0 1 −1
1 −1 0 −1 0
−1 0 0 0 1
(b)
1 1 0
0 0 1 1 0 (e) 0
1 0
0 0 −1 0 0 1 1 −1
0 1 0 0 1
−1 1 1 1
−1 1 0 0 0
(f ) 0 −1 −1 −1
(c) 0 0 1 −1 0
−1 0 0 −1
0 0 −1 −1 1
1 0 0 0 0
Exercise 8.4.2.
(a) Give an example of a full row rank unimodular matrix that is not totally unimodular.
(b) Give an example of a full row rank totally unimodular matrix with at least one positive
and at least one negative entry in each column, that does not satisfy the conditions of
Theorem 8.1.4.
Exercise 8.4.4. Show, by calculating all feasible basic solutions, that the region
    { x ∈ R⁶ | Ax ≤ b, x ≥ 0 },   where   A =  [ 1 0 0 1 0 0 ]
                                               [ 1 1 0 0 1 0 ]
                                               [ 1 0 1 0 0 1 ],
has only integer vertices when b is an integer vector (see Section 8.1.1).
Exercise 8.4.5. Consider the transportation problem as formulated in Section 4.3.3 for
m = 3 and n = 4. Consider the problem with the following data:
    C =  [  8   6  10   9 ]
         [  9  12  13   7 ] ,   a = (35, 50, 40)T,   and   b = (45, 20, 30, 30)T,
         [ 14   9  16   5 ]

with C the cost matrix, a the supply vector, and b the demand vector. Find an optimal
feasible solution by using the complementary slackness relations, given that an optimal dual
solution is {yᵢ} = (2, 5, 5)T, with i the index of the supply nodes, and {yⱼ} = (4, 4, 8, 0)T,
with j the index of the demand nodes.
Exercise 8.4.6. Consider the transportation problem in Figure 8.4 of Section 8.2.1 and
the corresponding ILO-model (8.4).
(a) Find a feasible solution of the ILO-model (and hence of its LO-relaxation).
(b) Find an initial feasible basic solution for the LO-relaxation that can start the simplex
algorithm.
Exercise 8.4.7. Show that Model 8.2.2 has an optimal solution if and only if the supply-
demand balance equation (8.5) holds.
Exercise 8.4.8. This exercise is a continuation of Exercise 8.4.6. Consider the (general)
transportation problem of Section 8.2.1 and the corresponding ILO-model, Model 8.2.1.
(a) Show how to construct a feasible solution for Model 8.2.1.
(b) Show how to construct an initial feasible basic solution for the LO-relaxation of Model
8.2.1.
Exercise 8.4.10. Show that for every feasible flow in the network of a transshipment
problem, there exists a spanning tree corresponding to a flow with at most the same cost.
Exercise 8.4.11. Give two examples of feasible flows that contain a cycle, and show that
the corresponding node-arc incidence matrices have linearly dependent columns; namely
in the following two cases.
(a) The network has two nodes and two arcs.
(b) The network has three nodes, four arcs, and two cycles.
Exercise 8.4.12. Show that step 3 of the network simplex algorithm, Algorithm 8.3.1,
when applied to (P) is equivalent to the minimum-ratio test when the simplex algorithm is
applied to (P̃).
Exercise 8.4.13. Use the big-M procedure to calculate an optimal solution of the trans-
shipment problem corresponding to the data of the network of Figure 8.19.
Exercise 8.4.14. Describe how the two-phase procedure can be applied to the transship-
ment problem.
Exercise 8.4.15. Consider a digraph G = (V, A). Let s and t be different nodes in
V , and let cij be defined as the distance from node i to node j . Show that the problem of
finding a shortest path from s to t can be formulated as a minimum cost flow problem.
Exercise 8.4.16. In Section 8.3.8, it is claimed that the number of spanning trees in any
given network is finite.
(a) Consider again the network in Figure 8.20. Calculate the total number of spanning
trees in this network.
(b) Let G be any network with m nodes and n edges. Show that the number of spanning
trees in G is finite.
Exercise 8.4.17. Consider the network G with node set {1, . . . , n}, where 1 is the
source node, and n is the sink node. Assume that flow is preserved at all nodes 2, . . ., n − 1.
Show that the net flow out of node 1 equals the net flow into node n.
Exercise 8.4.18. Consider the supply-demand problem with the following data:
Supply Demand
Node A 12 0
Node B 0 0
Node C 0 3
Node D 0 9
There are only deliveries from A to B (notation AB ), AC , BC , BD, and CD; the costs
per unit flow are 3, 5, 4, 6, 2, respectively. The problem is to satisfy the demand against
minimum costs.
(a) Formulate this problem as a transshipment problem, and determine the node-arc inci-
dence matrix.
(b) Determine an initial feasible tree solution by applying the network big-M procedure.
(c) Determine an optimal solution by means of the network simplex algorithm. Why is
this solution optimal?
Production capacity and demand (units):

                   Month 1   Month 2   Month 3
    Regular time      100       200       150
    Overtime           40        80        60
    Demand             90       200       270

Unit production costs, by month of production and month of delivery:

                              Month 1   Month 2   Month 3
    Month 1   Regular time       8        10        12
              Overtime          11        13        15
    Month 2   Regular time       -         8        10
              Overtime           -        11        13
    Month 3   Regular time       -         -         8
              Overtime           -         -        11
(a) Formulate this problem as an ILO-model, and determine an optimal production sched-
ule.
(b) This problem can be considered as a special case of a problem described in this chapter.
Which problem is that?
Exercise 8.4.22. A salesperson has to visit five companies in five different cities. She
decides to travel from her home town to the five cities, one after the other, and then to return
home. The distances between the six cities (including the salesperson's home town) are
given in the following matrix:
Home 1 2 3 4 5
Home − 12 24 30 40 16
1 12 − 35 16 30 25
2 24 35 − 28 20 12
3 30 16 28 − 12 40
4 40 30 20 12 − 32
5 16 25 12 40 32 −
(a) Write an ILO-model that can be used to find a shortest route along the six cities with
the same begin and end city; determine such a shortest route.
(b) This problem can be seen as a special case of a problem described in this chapter. Which
problem is that?
(c) Suppose that the salesperson wants to start her trip in her home town and finish her
trip in town 4, while visiting all other cities. Formulate an ILO-model of this problem
and determine an optimal solution.
Exercise 8.4.23. The distances between four cities are given in the following table:
Cities
1 2 3 4
City 1 0 6 3 9
City 2 6 0 2 3
City 3 3 2 0 6
City 4 9 3 6 0
(a) Write an ILO-model that can be used to determine a shortest path from city 1 to 4;
determine a shortest path from 1 to 4.
(b) Why are all feasible basic solutions integer-valued?
(c) Formulate this problem as a minimum cost flow problem. Show that the technology
matrix is totally unimodular. Solve the problem.
Exercise 8.4.24. In the harbor of Rotterdam, five ships have to be unloaded by means
of five cranes. The ships are labeled (i =) 1, . . . , 5, and the cranes (j =) 1, . . . , 5. Each
ship is unloaded by one crane, and each crane unloads one ship. The cargo of ship i is wi
volume units. Each crane j can handle cj volume units of cargo per hour. The problem is
to assign ships to cranes such that the total amount of unloading time is minimized.
(a) What type of problem is this?
(b) Determine the coefficients in the objective function, and formulate the problem as an
ILO-model.
In the table below, values for wi and cj are given.
i, j 1 2 3 4 5
wi 5 14 6 7 10
cj 4 2 8 1 5
(c) In order to determine a solution for this data set, one may expect that the ship with
the largest amount of cargo should be unloaded by the crane with the highest capacity.
Check this idea by determining an optimal solution.
(d) Another idea could be: minimize the maximum amount of time needed for a crane to
unload a ship. Formulate this new objective in mathematical terms, and determine an
optimal solution of the model with this new objective.
Exercise 8.4.25. During the summer season, a travel agency organizes daily sightseeing
tours. For each day, there are m (≥ 1) trips planned during the morning hours, and m trips
during the afternoon hours. There are also exactly m buses (with bus driver) available. So,
each bus has to be used twice per day: once for a morning trip, and once for an afternoon
trip.
As an example, take m = 5, and consider the following time table:
The bus drivers normally work eight hours per day. Sometimes, however, working more
than that is unavoidable and one or more bus drivers need to work overtime. (Overtime is
defined as the number of hours worked beyond the regular hours on a day.) Since overtime
hours are more expensive than regular hours, we want to keep the amount of overtime to
a minimum.
The problem is to find combinations of morning and afternoon trips for the bus drivers,
such that the total overtime is minimized.
(a) This problem can be seen as a special case of a problem described in this chapter. Which
problem is that?
(b) Formulate this problem as an ILO-model. Explain what the constraints mean in terms
of the original problem. Also pay attention to the fact that several combinations of trips
are not allowed.
Now assume that the bus drivers must have a 30-minute break between two trips.
(c) How can this constraint be incorporated into the model formulated under (b)?
(d) Solve the problem formulated in (c).
(e) Perform sensitivity analysis on the input parameters.
Exercise 8.4.26. Draw perturbation graphs of the values of the completion times for
the activities corresponding to the arcs (2, 3), (4, 5), and (5, 6) of the project network in
Section 8.2.6.
Exercise 8.4.27. Consider the project scheduling problem described in Section 8.2.6.
Prove or disprove the following statements.
(a) Increasing the execution time of an activity (i, j) on a critical path always increases the
total completion time of the project.
(b) Decreasing the execution time of an activity (i, j) on a critical path always decreases
the total completion time of the project.
Exercise 8.4.28. Consider a project network. Show that the fact that the network has no
directed cycles implies that there is at least one initial and at least one end goal.
Exercise 8.4.29. Consider the project scheduling problem described in Section 8.2.6.
(a) Show that any activity with zero slack is on a critical path.
(b) Show that every critical path of a project has the same total completion time.
Exercise 8.4.30. The publishing company Book & Co has recently signed a contract
with an author to publish and market a new textbook on linear optimization. The man-
agement wants to know the earliest possible completion date for the project. The relevant
data are given in the table below. A total of eight activities, labeled A1, . . ., A8, have to be
completed. The descriptions of these activities are given in the second column of this table.
The third column lists the estimated numbers of weeks needed to complete the activities.
Some activities can only be started when others have been finished. The last column of the
table lists the immediate predecessors of each activity.
Time Immediate
Label Description estimate predecessors
A1 Author preparation of manuscript 25 None
A2 Copy edit the manuscript 3 A1
A3 Correct the page proofs 10 A2
A4 Obtain all copyrights 15 A1
A5 Design marketing materials 9 A1
A6 Produce marketing materials 5 A5, A4
A7 Produce the final book 10 A3, A4
A8 Organize the shipping 2 A6, A7
Exercise 8.4.31. The company PHP produces medical instruments, and has decided to
open a new plant. The management has identified eleven major project activities, to be
completed before the actual production can start. The management has also specified the
activities (the immediate predecessors) that must be completed before a given activity can
begin. For each of the eleven activities, the execution time has been estimated. In the table
below, the results are listed.
Time Immediate
Activity Description estimate predecessors
A Select staff 13 None
B Select site 26 None
C Prepare final construction plans and layout 11 B
D Select equipment 11 A
E Bring utilities to site 39 B
F Interview applicants and fill positions 11 A
G Purchase equipment 36 C
H Construct the building 41 D
I Develop information system 16 A
J Install equipment 5 E, G, H
K Train staff 8 F, I, J
Exercise 8.4.32. There are three cities, labeled A, B, and C. The families in these three
cities are partitioned into three categories: “no children”, “one child”, and “at least two
children”. The table below lists census data on these families in the cities A, B, and C in
the form of percentages.
The entries of the table are percentages of the population in different categories. The bold
entries show the row and column sums, except for the bottom right entry which gives the
total of all entries.
The government wants to publish this table, but it wants to round all percentages in the table
(i.e., both the bold and the nonbold entries) to an integer. To make sure that the table does
not display any incorrect numbers, the numbers may only be rounded up or down, but not
necessarily to the nearest integer. At the same time, however, the resulting table needs to be
consistent, in the sense that the rows and columns have to add up to the listed totals.
(a) Show that the following simple approaches do not work: (i) rounding all percentages
to the nearest integer, (ii) rounding down all percentages, and (iii) rounding up all
percentages.
This problem can be viewed as a maximum flow problem on a network G = (V, A) with
V = {1, . . . , n}, where node 1 is the source, node n is the sink, and A is the set of arcs
of G. In addition to the capacity k(i,j), this problem also has a lower bound l(i,j) on each
arc (i, j). The general LO-formulation of this problem reads:

    max   Σ_{(1,j)∈A} x(1,j)
    s.t.  Σ_{(j,i)∈A} x(j,i) = Σ_{(i,j)∈A} x(i,j)   for i = 2, . . . , n − 1
          l(i,j) ≤ x(i,j) ≤ k(i,j)                   for each (i, j) ∈ A.
(b) Formulate a maximum flow model with lower bounds that solves the rounding problem.
(Hint: construct a graph that has a source and a sink node, one node for every row, and
one node for every column.) Why is it true that this model solves the rounding problem?
Exercise 8.4.33. In open-pit mining, blocks of earth are dug, starting from the surface,
to excavate the ore contained in them. During the mining process, the surface of the
land is excavated, forming a deeper and deeper pit until the mining operation terminates.
Usually, the final shape of this so-called open pit is determined before the mining operation
begins. A common approach in designing an optimal pit (i.e., one that maximizes profit) is
to divide the entire mining area into 3-dimensional blocks. Using geological information
from drill cores, the value of the ore in each block is estimated. Clearly, there is also a cost
of excavating each particular block. Thus, to each block in the mine, we can assign a profit
value. The objective of designing an optimal pit is then to choose blocks to be excavated
while maximizing the total profit. However, there are also constraints on which blocks can
be dug out: blocks underlying other blocks can only be excavated after the blocks on top
of them have been excavated.
As a special case, consider the two-dimensional pit drawn in Figure 8.22. The values in the
figure refer to the value of the ore in each block (×$1,000). Suppose that excavating one
block costs the same for each block, namely $2,500 per block.
      0   2   1   0   0
        4   0   6   2
          5   3   8

Figure 8.22: Block values of the two-dimensional pit (×$1,000).
This pit design problem can be represented using a directed graph, G = (V, A). Let n
(= 12) be the number of blocks. Let i = 1, . . . , n. We create a node for each block in
the mining area, and assign a weight bi representing the profit value of excavating block i.
There is a directed arc from node i to node j if block i cannot be excavated before block j ,
which is on a layer immediately above block i. If there is an arc from i to j , then j is called
a successor of i. To decide which blocks to excavate in order to maximize profit, we need to
find a maximum weight set of nodes in the graph such that all successors of all nodes in the
set are also included in the set. Such a set is called a maximum closure of G.
The maximum weight closure problem can be solved by a minimum cut algorithm on a
related graph G0 = (V 0 , A0 ). To construct the graph G0 , we add a source node s and sink
node t, and A0 consists of all arcs in A, as well as an arc from s to each node in V with
positive value (bi > 0), and an arc from each node in V with negative value (bi < 0) to t.
The capacity of each arc in A is set to ∞. For each arc (s, i), we set the capacity to bi and
for arc (j, t) we set the capacity to −bj .
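This construction is easy to carry out computationally. The sketch below (not from the book) solves the pit of Figure 8.22 in Python: it builds G′, computes a maximum flow with the Edmonds-Karp algorithm, and uses the fact that the maximum closure weight equals the total positive node weight minus the minimum cut capacity. The pyramid-shaped precedence pattern (each block lies under the two blocks diagonally above it) is an assumption about the figure's geometry.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly augment along a shortest path in the
    residual network. cap is a dict-of-dicts of residual capacities."""
    flow = 0
    while True:
        # BFS for an augmenting path from s to t.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        # Reconstruct the path, find the bottleneck, update residuals.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v].setdefault(u, 0)
            cap[v][u] += bottleneck
        flow += bottleneck

def max_closure(weights, successors):
    """Maximum-weight closure of a directed graph, via a minimum cut."""
    INF = float('inf')
    cap = {i: {} for i in weights}
    cap['s'], cap['t'] = {}, {}
    for i, b in weights.items():
        for j in successors.get(i, []):
            cap[i][j] = INF          # arcs of A get capacity infinity
        if b > 0:
            cap['s'][i] = b          # arc (s, i) with capacity b_i
        elif b < 0:
            cap[i]['t'] = -b         # arc (i, t) with capacity -b_i
    total_positive = sum(b for b in weights.values() if b > 0)
    return total_positive - max_flow(cap, 's', 't')

# Profits (x $1,000): ore value minus the $2,500 excavation cost.
values = [[0, 2, 1, 0, 0], [4, 0, 6, 2], [5, 3, 8]]
weights, successors = {}, {}
for r, row in enumerate(values):
    for c, v in enumerate(row):
        weights[(r, c)] = v - 2.5
        if r > 0:  # assumed geometry: (r, c) lies under (r-1, c) and (r-1, c+1)
            successors[(r, c)] = [(r - 1, c), (r - 1, c + 1)]

print(max_closure(weights, successors))  # maximum profit of the pit (x $1,000)
```

Running it reports the maximum attainable profit (×$1,000) under the assumed block geometry.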
(b) Draw the network G0 for the mining problem described above.
(c) Show that solving the mining problem as described above indeed solves the open pit
(maximum closure) problem.
(d) Solve the problem using a computer package.
This page intentionally left blank
9
Chapter
Computational complexity
Overview
In this chapter we give a brief introduction to the theory of computational complexity.
For a more extensive account on this subject we refer the reader to Schrijver (1998). We
will successively examine Dantzig’s simplex algorithm from Chapter 3, the interior path
algorithm from Chapter 6, and the branch-and-bound algorithm from Chapter 7.
problem has a solution, determine a point in the feasible region of the model that optimizes
the objective function’. Both the simplex algorithm (see Chapter 3) and the interior path
algorithm (see Chapter 6) solve linear optimization problems. Even an algorithm that simply
enumerates all (exponentially, but finitely, many) basic solutions and saves the current best
one, solves LO models.
In this chapter, we are not interested in the details of a specific implementation of the algo-
rithm used, but rather in its running time, defined as the number of arithmetic operations
(such as additions, subtractions, multiplications, divisions, and comparisons) required by the
algorithm. For instance, when computing the inner product aT b of two n-vectors a and
b, we need n multiplications and n − 1 additions, resulting in a total of 2n − 1 arith-
metic operations. The parameter n in this example is called the size of the problem. In the
case of linear optimization, the size is the total number of entries in the technology matrix,
the objective coefficient vector, and the right hand side vector. So, the size of a standard
LO-model with n variables and m constraints is nm + n + m.
A measure for the computer running time of an algorithm required to solve a certain instance
of a problem is its computational complexity, denoted using the so-called big-O notation. Let n
(n ≥ 1) be the size of the problem, and let f and g be functions that map positive numbers
to positive numbers. The notation
f (n) = O(g(n))
means that there exist positive numbers n0 and α such that f (n) ≤ αg(n) for each n ≥ n0 .
If the running time of a certain algorithm applied to a certain instance of a problem of size n
needs f (n) arithmetic operations and f (n) = O(g(n)), then we say that this algorithm runs
in order g(n) (notation: O(g(n)) time). For instance, the above ‘algorithm’ used to calculate
aT b for the n-vectors a and b has a running time of order n, because 2n − 1 = O(n).
The latter holds because for f (n) = 2n − 1 and g(n) = n, we may take n0 = 1 and
α = 2. Note that 3n^3 + 3567n^2 = O(n^3), 2766n^8 + 2^n = O(2^n), and 2^n = O(n!).
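The defining inequality can be checked directly for the inner-product example. A small sketch (not from the book), counting the arithmetic operations of aᵀb and verifying f(n) ≤ αg(n) with n0 = 1 and α = 2:

```python
def inner_product_with_count(a, b):
    """Compute a^T b, counting multiplications and additions."""
    ops = 0
    total = 0.0
    for x, y in zip(a, b):
        total += x * y       # one multiplication per pair...
        ops += 2             # ...and one addition
    return total, ops - 1    # the first 'addition' (0 + x*y) does not count

# f(n) = 2n - 1 and g(n) = n: f(n) <= 2*g(n) for every n >= 1.
for n in range(1, 50):
    _, f = inner_product_with_count([1.0] * n, [1.0] * n)
    assert f == 2 * n - 1 and f <= 2 * n
```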
The above definition of running time depends on the numerical values of the input data.
Instead of trying to estimate the running time for all possible choices of the input (i.e., all
possible instances of the problem), it is generally accepted to estimate the running time for a
worst-case instance of the problem, i.e., for an input data set of size n for which the running
time of the algorithm is as large as possible. If this worst-case running time is f (n) and
f (n) = O(g(n)), then we say that the algorithm has a (worst-case) complexity of order g(n).
The notation is again O(g(n)), and g(n) is now called the complexity of the algorithm
applied to the problem of size n. This worst-case complexity, however, has the obvious
disadvantage that ’pathological’ instances determine the complexity of the algorithm. In
practice, the average-case complexity might be more relevant. However, one of the drawbacks
of using the average-case approach is that the average running time over all instances is in
general very difficult to estimate. For this reason we shall take the worst-case complexity
approach when comparing algorithms with respect to their computer running times.
              Running time
      n       1          log n         n           n^2        n^3        n^10         2^n          n!
      1       10^-13 s   0             10^-13 s    10^-13 s   10^-13 s   10^-13 s     2×10^-13 s   10^-13 s
     10       10^-13 s   2×10^-13 s    10^-12 s    10^-11 s   10^-10 s   10^-3 s      10^-10 s     4×10^-7 s
    100       10^-13 s   5×10^-13 s    10^-11 s    10^-9 s    10^-7 s    115 d        4×10^9 y     10^137 y
  1,000       10^-13 s   7×10^-13 s    10^-10 s    10^-7 s    10^-4 s    10^9 y       10^280 y     10^2547 y
 10,000       10^-13 s   9×10^-13 s    10^-9 s     10^-5 s    10^-1 s    10^19 y      10^2989 y    10^35638 y
100,000       10^-13 s   10^-12 s      10^-8 s     10^-3 s    1 m        10^29 y      10^30082 y   10^456552 y
Table 9.1: Growth of functions. (‘s’ = seconds, ‘m’ = minutes, ‘d’ = days, ‘y’ = years.) We assume that the
computer processor can perform 10^13 operations per second. All times are approximate. For
comparison, the number of atoms in the universe has been estimated at approximately 10^80,
and the age of the universe has been estimated at approximately 1.38 × 10^10 years.
significant. For example, an algorithm with complexity O(n^100) can hardly be considered
practically efficient.
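The entries of Table 9.1 are easily reproduced. A sketch (not from the book), assuming the 10^13 operations-per-second processor of the table:

```python
import math

OPS_PER_SECOND = 1e13
SECONDS_PER_YEAR = 3600 * 24 * 365

def running_time_seconds(ops):
    """Time (in seconds) to perform `ops` arithmetic operations."""
    return ops / OPS_PER_SECOND

# n^3 for n = 100,000: about 100 seconds -- still fine in practice.
print(running_time_seconds(1e5 ** 3))

# 2^n for n = 100: about 4 x 10^9 years -- hopeless.
years = running_time_seconds(2.0 ** 100) / SECONDS_PER_YEAR
print(f"{years:.1e}")  # 4.0e+09

# n! for n = 100 overflows a float; use logarithms instead.
log10_seconds = (math.lgamma(101) / math.log(10)) - 13
print(log10_seconds)   # about 145, i.e., 10^145 s, roughly 10^137 years
```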
The j’th constraint (with j = 1, . . . , n) in this model is: x_j + 2 Σ_{k=1}^{j−1} x_k ≤ 3^{j−1}.
Figure 9.1 shows the feasible region of the Klee-Minty example for n = 3. The corre-
sponding simplex adjacency graph (see Section 3.5.1) has eight nodes (namely, one for each
vertex of the feasible region), and twelve arcs (depicted in the figure). The general feasible
region has 2^n vertices. We will show that the simplex algorithm, when using the following
pivot rules, passes through all 2^n vertices before reaching the optimal solution. These pivot
rules are:
Figure 9.1: The feasible region of the Klee-Minty example for n = 3. The arrows show the arcs of the
simplex adjacency graph.
Since the feasible region contains no degenerate vertices, no tie-breaking pivot rule is needed
for the variables that leave the current set of basic variables.
We calculate the first four simplex tableaus for the case n = 3:
Initialization: BI = {4, 5, 6}.
x1 x2 x3 x4 x5 x6 −z
1 1 1 0 0 0 0
1 0 0 1 0 0 1
2 1 0 0 1 0 3
2 2 1 0 0 1 9
−1 1 1 0 0 0 −1
1 0 0 1 0 0 1
−2 1 0 0 1 0 1
−2 2 1 0 0 1 7
402 C h a p t e r 9 . C o m p u tat i o na l c o m p l e x i t y
1 −1 1 0 0 0 −2
1 0 0 1 0 0 1
−2 1 0 0 1 0 1
2 −1 1 0 0 1 5
column remains nonbasic during the first three iterations, so that the simplex algorithm first
passes through the vertices in the x1 x2 -plane before jumping to the vertices that are formed
by the third constraint. The simplex algorithm successively visits the vertices:
(0, 0, 0)ᵀ, (1, 0, 0)ᵀ, (1, 1, 0)ᵀ, (0, 3, 0)ᵀ, (0, 3, 3)ᵀ, (1, 1, 5)ᵀ, (1, 0, 7)ᵀ, (0, 0, 9)ᵀ.
This example, for n = 3, hints at the situation we face for arbitrary n: the simplex al-
gorithm first passes through the 2^k vertices formed by the first k constraints, and then
passes through the 2^k vertices in the hyperplane formed by the (k + 1)’th constraint (for
k = 2, . . . , n − 1).
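The 2^n vertex count can be verified by enumeration. In the sketch below (not from the book), each vertex corresponds to choosing, for j = 1, . . . , n, whether x_j = 0 or the j’th constraint x_j + 2 Σ_{k<j} x_k ≤ 3^{j−1} is tight:

```python
from itertools import product

def klee_minty_vertices(n):
    """All vertices of the Klee-Minty feasible region, one per choice of
    which constraints are tight (x_j = 0 versus x_j = 3^(j-1) - 2*sum)."""
    vertices = []
    for tight in product([False, True], repeat=n):
        x = []
        for j in range(n):
            x.append(3 ** j - 2 * sum(x) if tight[j] else 0)
        vertices.append(tuple(x))
    return vertices

V = klee_minty_vertices(3)
assert len(set(V)) == 8                       # 2^3 distinct vertices
assert all(v[j] + 2 * sum(v[:j]) <= 3 ** j and v[j] >= 0
           for v in V for j in range(3))      # all are feasible
assert max(V, key=sum) == (0, 0, 9)           # optimum of max x1 + x2 + x3
```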
The ‘usual’ pivot rules force the simplex algorithm to perform 2^n − 1 iterations, and to
traverse all 2^n vertices of the feasible region. The Klee-Minty example therefore shows that
the simplex algorithm is not a theoretically efficient method for solving linear optimization
models, and we have the following theorem.
Theorem 9.2.1.
The worst-case complexity of the simplex algorithm for solving linear optimization
problems is exponential.
At present, for almost all pivot rules that have been formulated for the simplex algorithm,
there exist examples (often pathological) in which the algorithm is forced to traverse
an exponential number of vertices. These examples also show that the simplex algorithm is
already exponential in the number of decision variables, even if we do not count the number
of arithmetic operations and the number of constraints. Hence, the simplex algorithm is
theoretically ‘bad’ (although in most practical situations very effective, as we have mentioned
already!). For a detailed account on this subject, we refer the reader to Terlaky and Zhang
(1993).
One may ask whether there exists a pivot rule (different from Rules 1
and 2 above) for which the simplex algorithm finds an optimal solution of the Klee-Minty
example in polynomial time. Actually, when pivoting on the x_n-row in the Klee-Minty
example, the optimum is reached in only one pivot step.
Because of such disappointing discoveries, and also because an increasing number of prob-
lems (including many large-scale crew scheduling problems) could not be solved by means
of the simplex algorithm within reasonable time limits, the scientific hunt for a theoretical,
as well as practical, efficient algorithm for linear optimization was opened. Among other
algorithms, this hunt has yielded the class of interior point algorithms (see Chapter 6).
9.3 The interior path algorithm has polynomial running time
c^T x∗ − b^T y(x∗, µ∗) ≤ (3/2)e^{−t}.
Proof of Theorem 9.3.1. Let κ be the number of iterations needed to reach optimality with
an accuracy of t, and let µ∗ be the interior path parameter at the κ’th iteration. Hence,
        nµ∗ = n(1 − θ)^κ µ0 < e^{−t}.
Taking logarithms on both sides in the above inequality, we find that ln(nµ0) + κ ln(1 − θ) <
−t. Since ln(1 − θ) < −θ, this inequality certainly holds if κθ > t + ln(nµ0). This last
inequality is equivalent to κ > (6√n)(t + ln(nµ0)). Since t and µ0 are constants, it follows
that (6√n)(t + ln(n) + ln(µ0)) = O(√n ln(n)).
As an illustration, consider the example of Section 6.4.2. The extended matrix A has
six decision variables; so in Theorem 9.3.1, n = 6. Let µ0 = 3, and t = 5. Then,
(6√n)(t + ln(nµ0)) ≈ 115.97. So 116 is a very rough upper bound for the number of
iterations needed to reach an optimum with an accuracy of 5. An accuracy of 5 means that
the duality gap is at most (3/2)e^{−5} ≈ 0.01 when the algorithm stops.
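This bound is a one-line computation. A sketch (not from the book):

```python
import math

def interior_path_iteration_bound(n, mu0, t):
    """Upper bound 6*sqrt(n)*(t + ln(n*mu0)) on the number of iterations
    needed to reach an accuracy of t (cf. Theorem 9.3.1)."""
    return 6 * math.sqrt(n) * (t + math.log(n * mu0))

bound = interior_path_iteration_bound(n=6, mu0=3, t=5)
print(round(bound))   # 116 iterations for the example of Section 6.4.2
```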
max x1 + . . . + xn
s.t. 2x1 + . . . + 2xn ≤ n
xi ∈ {0, 1} for i = 1, . . . , n,
where n is an odd integer ≥ 3. An optimal solution of this model is easily found; namely,
take ⌊n/2⌋ = (n − 1)/2 decision variables x_i equal to 1, and the remaining ⌊n/2⌋ +
1 = (n + 1)/2 decision variables x_i equal to 0. The optimal objective value z∗ satisfies
z∗ = ⌊n/2⌋. However, solving this model by means of the branch-and-bound algorithm
takes an exorbitant amount of computer time, since the number of subproblems it solves
turns out to be exponential in n.
As an illustration, consider the case n = 3. The branch-and-bound tree for this case is
depicted in Figure 9.2. Note that both backtracking and jumptracking give rise to the
same tree. If a finite solution of an LO-relaxation of this problem is not integer-valued,
then exactly one variable has the noninteger value ½. In the next iteration, branching is
applied on this noninteger variable, say x_i, giving rise to an ‘x_i = 0’-branch and an ‘x_i = 1’-
branch. In our illustration, the algorithm does not terminate before all integer solutions are
calculated, leading to eleven solved subproblems.
We will now show that, for arbitrary n (odd, ≥ 3), the number of subproblems solved is
at least 2^{(n+1)/2}. In order to prove this assertion, we show that an integer solution is found
only at a node of the branch-and-bound tree for which the corresponding subproblem has at least
(n + 1)/2 decision variables with the value zero obtained from previous branchings. This
means that the path in the tree from the initial node to this node has at least (n + 1)/2 zero-
variable branches. Assume, to the contrary, that an integer solution was found for which
fewer than (n + 1)/2 variables x_i are 0 on the previous branches in the tree. Then there are
at least (n + 1)/2 variables with a value of either 1 or ½. Hence, the optimal solution of the
current subproblem contains precisely (n − 1)/2 variables with the value 1 and one variable equal
to ½, and is therefore not integer-valued, contradicting the assumption. Hence, the subproblems with an
integer solution occur on a path on which at least (n + 1)/2 previous branches correspond
to a zero variable. On the other hand, infeasible subproblems occur on a path on which at
least (n + 1)/2 previous branches correspond to variables with the value 1. There is no
need to branch on subproblems that are either infeasible or have an integer optimal solution.
Hence the branch-and-bound tree is ‘complete’ for the first (n + 1)/2 ‘levels’, i.e., contains
at least 2^{(n+1)/2} subproblems. So, the total number of subproblems in the branch-and-bound
tree of our example is at least 2^{(n+1)/2} ≥ (√2)^n, which is an exponential function
of n.
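The growth of the tree can be simulated. The sketch below (not from the book) solves each LP-relaxation in closed form — set as many still-free variables to 1 as the capacity n/2 allows, leaving at most one variable at the fractional value ½ — and counts the subproblems:

```python
def count_subproblems(n):
    """Count the subproblems solved by branch-and-bound on
    max x1+...+xn s.t. 2x1+...+2xn <= n, xi in {0,1}, for odd n."""
    assert n % 2 == 1

    def solve(ones, zeros):
        """One subproblem: `ones` variables fixed to 1, `zeros` fixed to 0."""
        free = n - ones - zeros
        if 2 * ones > n:
            return 1                  # infeasible: prune
        capacity = n / 2 - ones       # room left for the free variables
        if free <= capacity:
            return 1                  # LP optimum is integer: prune
        # Otherwise one free variable is fractional (value 1/2, since n is
        # odd); branch on it: an 'xi = 0'-branch and an 'xi = 1'-branch.
        return 1 + solve(ones, zeros + 1) + solve(ones + 1, zeros)

    return solve(0, 0)

assert count_subproblems(3) == 11     # eleven subproblems, as in Figure 9.2
for n in (3, 5, 7, 9):
    assert count_subproblems(n) >= 2 ** ((n + 1) // 2)
```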
We have now shown that there exist problems for which the branch-and-bound algorithm
demands an exponential number of subproblems to solve. This fact is expressed in the
following theorem.
Theorem 9.4.1.
The worst-case complexity of the branch-and-bound algorithm for solving integer
linear optimization problems is exponential.
[Figure 9.2: The branch-and-bound tree for the case n = 3:]
M1: z = 1½; x1 = 1, x2 = ½, x3 = 0. Branch on x2.
  x2 = 0 → M2: z = 1½; x1 = 1, x2 = 0, x3 = ½. Branch on x3.
    x3 = 0 → M4 (∗): z = 1; x1 = 1, x2 = 0, x3 = 0 (integer).
    x3 = 1 → M5: z = 1½; x1 = ½, x2 = 0, x3 = 1. Branch on x1 = 0 and x1 = 1.
  x2 = 1 → M3: z = 1½; x1 = 0, x2 = 1, x3 = ½. Branch on x3.
    x3 = 0 → M6: z = 1½; x1 = ½, x2 = 1, x3 = 0. Branch on x1 = 0 and x1 = 1.
    x3 = 1 → M7: z = −∞ (infeasible).
Recall that the size of the branch-and-bound tree depends on the branching rule used
(compare, in this respect, the role of pivot rules in Dantzig’s simplex algorithm; see Section
9.2).
Branch-and-bound algorithms are almost as frequently applied to integer linear optimization
problems as the simplex algorithm (see Section 9.2) to linear optimization problems. But,
frequently, users cannot wait until an optimal solution has been reached. In such situations,
the calculations can be stopped at some stage and the best solution obtained thus far can
be used. This illustrates a positive aspect to branch-and-bound algorithms: we often have a
feasible solution at hand at a very early stage of the algorithmic calculations. However, simi-
larly to the simplex algorithm, the worst-case complexity of branch-and-bound algorithms
is exponential.
So it is to be expected that a hunt for an efficient algorithm for ILO-models is on. However,
there are very few hunters working in that direction, because the belief that such an
algorithm exists has decreased drastically. Why is that so?
In the early 1970s, a class of optimization problems was defined for which the existence of
efficient algorithms is considered extremely unlikely. An interesting feature of this class of
problems is their strong interrelation: if one finds an efficient algorithm to solve any one
problem in this class, one can easily modify it to solve all the problems in this class efficiently.
This class is usually referred to as the class of N P -hard problems. ILO problems were among the earliest
members of this class. Other N P -hard problems include the traveling salesman problem
(see Section 7.2.4), the machine scheduling problem (see Section 7.2.4), and the knapsack
problem (see Section 7.2.3). So far, everyone who has tried to find a ‘good’ algorithm for,
e.g., the traveling salesman problem has failed. The conjecture that no N P -hard problem
is efficiently solvable is denoted by
is efficiently solvable is denoted by
P ≠ N P,
where P refers to the class of problems for which a polynomial algorithm is known. Hence,
linear optimization problems are in this class (see Section 9.3). So, what remains in this
particular area of research is to prove (or disprove) the P ≠ N P conjecture. Until someone
disproves this conjecture, algorithms for N P -hard problems are designed that generate good
feasible solutions in acceptable computer running times. Such algorithms may be classified
into heuristics and approximation algorithms. A heuristic is an algorithm that in practice finds
reasonably good feasible solutions in a reasonable amount of time. Heuristics are often
based on intuition and practical experience, and they usually do not have any performance
guarantees in terms of computation time and solution quality. An approximation algorithm,
on the other hand, is an algorithm that yields feasible solutions of a provable quality within
a provable time limit.
For example, a special case of the traveling salesman problem is the so-called Euclidean trav-
eling salesman problem. This problem imposes the additional restriction on the input that the
distances between cities should obey the triangle inequality d(A, C) ≤ d(A, B) + d(B, C)
for any three cities A, B , and C . This means that the distance required to travel from city
A to city C should be at most the distance required to travel from city A, through city B ,
to city C ; see also Section 17.3. Just like the general traveling salesman problem, the Eu-
clidean traveling salesman problem is N P -hard. An often used heuristic for the Euclidean
traveling salesman problem is the nearest neighbor heuristic, which chooses an arbitrary initial
city, iteratively adds a nearest unvisited city to the list of visited cities, and finally returns
to the initial city; see also Exercise 17.10.5. Although the nearest neighbor heuristic usually
yields reasonably good traveling salesman tours, there are situations in which the heuristic
yields very bad (i.e., extremely long) tours. It may even happen that the heuristic does not
yield a feasible solution at all. On the other hand, the approximation algorithm discovered
by Nicos Christofides for the Euclidean traveling salesman problem yields, in polynomial
time, a traveling salesman tour that is guaranteed to be at most 50% longer than the opti-
mal tour. For the general traveling salesman problem, however, it is known that no such
approximation algorithm exists (unless P = N P).
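The nearest neighbor heuristic takes only a few lines of code. A sketch (not from the book) for cities given as points in the plane, where Euclidean distances automatically satisfy the triangle inequality:

```python
import math

def nearest_neighbor_tour(cities, start=0):
    """Greedy TSP heuristic: from the current city, always travel to a
    nearest unvisited city; finally return to the starting city."""
    dist = lambda a, b: math.dist(cities[a], cities[b])
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        nxt = min(unvisited, key=lambda c: dist(tour[-1], c))
        unvisited.remove(nxt)
        tour.append(nxt)
    length = sum(dist(tour[i], tour[(i + 1) % len(tour)])
                 for i in range(len(tour)))
    return tour, length

cities = [(0, 0), (0, 1), (2, 0), (3, 1), (5, 0)]
tour, length = nearest_neighbor_tour(cities)
print(tour, round(length, 2))
```

The tour visits each city exactly once and closes back at the start; as the text notes, on unlucky inputs the resulting tour may be far longer than the optimum.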
The popularity of heuristics and approximation algorithms has led to a new branch of re-
search, in which these algorithms are classified and tested, and their performance is com-
pared. It is beyond the scope of this book to elaborate on the design, application, and
performance of heuristics and approximation algorithms. The interested reader is referred
to, e.g., Williamson and Shmoys (2011).
9.5 Exercises
Exercise 9.5.1. Write each of the following functions in the big-O notation O(f (n)), with f (n)
a simple polynomial or exponential function (e.g., n^p, q^n, n!, or (log n)^p).
(a) 800n + n^4
(b) n^3 3^n + 2^{2n}
(c) 10800n + (n − 1)!
(d) n^4 log n
Exercise 9.5.3. Let A and B be (n, n)-matrices (with n ≥ 1). A simple (but not the
most efficient) algorithm to compute the product AB (see Appendix B) is to calculate the
inner product of row i of A and column j of B to obtain entry (i, j) of AB for each
i = 1, . . . , n and j = 1, . . . , n. Determine the complexity of this algorithm. Is it ‘good’
or ‘bad’?
inverse does not exist; see Appendix B. Count the number of arithmetic operations as a
function of n, and show that the complexity of this algorithm is O(n3 ).
Exercise 9.5.6. Let A be any (m, n)-matrix with m ≤ n, and with rank m. Show that
PA can be calculated with O(n3 ) arithmetic operations (+, −, ×, /).
Exercise 9.5.7. Let n ≥ 1 be the size of some problem, and let n^{log n} be the number
of calculations (as a function of n) needed by some algorithm to calculate a solution of a
worst-case instance of the problem. Show that this algorithm has exponential complexity.
Exercise 9.5.8. Solve Model Dovetail from Section 1.1.1 by considering all basic solutions.
How many iterations are needed in a worst-case situation?
Exercise 9.5.9. In a worst-case situation, how many steps are needed when all integer
points are considered for calculating an optimal solution of Model Dairy Corp in Section
7.1.1?
10
Chapter
Designing a reservoir for irrigation
Overview
This case study is based on a research project in Tanzania carried out by Caspar Schweigman
(born 1938); see Schweigman (1979). The LO-model used in this section can partly be solved
by inspection; the more advanced analysis of the model needs computer calculations.
[Figure 10.1: Schematic representation of the reservoir: the river flows into the reservoir (content Vt, inflow Qt,in), and water leaves it as Qt,irr to the irrigation fields and as Qt,out downstream back into the river.]
The level of water in the reservoir will vary during the year. Define for t = 1, . . . , T :
Vt = the quantity of water in the reservoir at the end of month t (in m3).
River water flows into the reservoir and can be made to flow through lock gates. For
t = 1, . . . , T , define:
Qt,in = the amount of water that flows into the reservoir during month t (in m3 );
Qt = the amount of water that leaves the reservoir in month t (in m3 );
Qt,irr = the amount of water that flows to the irrigation fields in month t (in m3 );
Qt,out = the amount of water that flows downstream back into the river (in m3 ).
In Figure 10.1, this situation is depicted schematically. It follows immediately from the above
definitions that:
Qt = Qt,irr + Qt,out for t = 1, . . . , T. (10.1)
The water in the reservoir consists of rain water and river water. The water leaving the
reservoir is water that either leaves it through the locks, or evaporates, or penetrates the
soil, or leaves the reservoir via leaks. In the present case study, the contributions of rainfall
and evaporation are negligible with respect to the quantity of river water that flows into
the reservoir. We therefore assume that at the end of month t the quantity of water in the
reservoir is equal to the quantity of water in the reservoir at the end of the previous month
t − 1, plus the quantity of river water that flows into the reservoir, minus the quantity of
water that leaves the reservoir, i.e.:
Vt = Vt−1 + Qt,in − Qt for t = 1, . . . , T. (10.3)

Month t       αt (in m3 water per m2 land)    Qt,in (×10^6 m3)
January           0.134                            41
February          0.146                            51
March             0.079                            63
April             0.122                            99
May               0.274                            51
June              0.323                            20
July              0.366                            14
August            0.427                            12
September         0.421                             2
October           0.354                            14
November          0.140                            34
December          0.085                            46
Table 10.1: The values of αt and Qt,in.
The most complicated aspect of building a model for this problem is to find reliable values
for the coefficients αt . These values depend on the assimilation of water by the plants,
the growth phase of the plants, the loss of water, the efficiency of the irrigation system,
and several other factors. One more reason for this difficulty is that the values need to be
known before the actual irrigation takes place. Sometimes it is possible to obtain reasonable
estimates from experiments on test fields. In the present case study, we use estimates obtained
for growing corn in a region in the western part of Tanzania; see Table 10.1. In Figure 10.2
these data are depicted in graphs. These graphs show a certain negative correlation between
Qt,in and αt : in April the inflow is maximum while the least irrigation water is needed, and
in September the situation is the other way around.
Besides the water that is used for irrigation, there should be sufficient water flowing down
the river, because people, cattle, and crops are dependent on it. Determining the minimum
quantity M of water that should go down the river is a profound and difficult political
[Figure 10.2: Graphs of αt (×10^−2 m3 per m2), Qt,in (×10^6 m3), and Vt∗ (×10^6 m3) for t = 1, . . . , 12.]
question. One has to take into account a number of aspects, such as the population density,
the future plans, and all kinds of conflicting interests. In the present case study, the minimum
quantity is fixed at 2.3 × 10^6 m3, and so:
Qt,out ≥ 2.3 × 10^6 for t = 1, . . . , T. (10.5)
Note that only in September is the inflow less than 2.3 × 10^6. This means that in September
more water flows out of the reservoir than flows into it. The quantity of water in the reservoir
cannot exceed its capacity, i.e.:
Vt ≤ V for t = 1, . . . , T. (10.6)
In order to keep the differences between successive planning periods as small as possible, we
assume that at the end of the planning period the quantity of water in the reservoir is the
same as at the beginning of the next period. Hence,
V0 = VT . (10.7)
This formula implies that, during the whole planning period, the quantity of water that
flows into the reservoir is equal to the quantity that leaves it. Moreover, all parameters
introduced so far need to be nonnegative, i.e.:
x ≥ 0, V ≥ 0, Vt ≥ 0 for t = 1, . . . , T. (10.8)
The decision variables Qt , Qt,irr , and Qt,out are implicitly nonnegative because of (10.1),
(10.4), (10.5), and x ≥ 0. Since V0 = VT , V0 is also nonnegative.
max x
(10.9)
s.t. (10.1), (10.3)–(10.8).
This is an LO-model that can be solved as soon as the values of Qt,in are available. These
values are given in Table 10.1. Before solving (10.9), we will investigate (10.3) in more detail.
It follows from (10.3), (10.7), and Table 10.1 that:
        Σ_{t=1}^{12} Qt = Σ_{t=1}^{12} Qt,in = 447 × 10^6. (10.10)
Eliminating x yields:
        x = (Σ_{t=1}^{12} Qt − Σ_{t=1}^{12} Qt,out) / (Σ_{t=1}^{12} αt)
          = (447 × 10^6 − Σ_{t=1}^{12} Qt,out) / 2.871 (10.11)
          = 155.7 × 10^6 − 0.348 Σ_{t=1}^{12} Qt,out. (10.12)
Taking Qt,out = 2.3 × 10^6 for each t, it follows from (10.11) that x∗ = 146.1 × 10^6. This
means that 146.1 km2 is irrigated. We will show that this choice of Qt,out yields an optimal solution of the LO-model. To
that end we only have to check its feasibility.
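The value x∗ = 146.1 × 10^6 follows directly from the data of Table 10.1. A sketch (not from the book), working in units of 10^6 m3:

```python
# Data from Table 10.1 (alpha_t in m^3 per m^2, Q_t,in in 10^6 m^3).
alpha = [0.134, 0.146, 0.079, 0.122, 0.274, 0.323,
         0.366, 0.427, 0.421, 0.354, 0.140, 0.085]
q_in = [41, 51, 63, 99, 51, 20, 14, 12, 2, 14, 34, 46]
M = 2.3  # minimum monthly downstream outflow, in 10^6 m^3

# Formula (10.11) with Q_t,out = M for every month:
x_star = (sum(q_in) - 12 * M) / sum(alpha)
print(round(x_star, 1))  # 146.1 (x 10^6 m^2, i.e., about 146.1 km^2)
```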
The conditions 0 ≤ Vt ≤ V for t = 1, . . . , 12 become, after the substitution:
        0 ≤ V0 ≤ V, and
        0 ≤ V0 + Σ_{k=1}^{t} Qk,in − x∗ Σ_{k=1}^{t} αk − (2.3 × 10^6)t ≤ V for t = 1, . . . , T.
0 ≤ V0 ≤ V
0 ≤ V0 + 19.1 × 106 ≤ V
0 ≤ V0 + 46.5 × 106 ≤ V
0 ≤ V0 + 95.7 × 106 ≤ V
0 ≤ V0 + 174.5 × 106 ≤ V
0 ≤ V0 + 183.2 × 106 ≤ V
0 ≤ V0 + 153.7 × 106 ≤ V (10.14)
0 ≤ V0 + 112.0 × 106 ≤ V
0 ≤ V0 + 59.3 × 106 ≤ V
0 ≤ V0 − 2.5 × 106 ≤ V
0 ≤ V0 − 42.5 × 106 ≤ V
0 ≤ V0 − 31.3 × 106 ≤ V
0 ≤ V0 − 0.1 × 106 ≤ V.
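The constants in these inequalities can be reproduced by computing Vt − V0 month by month. A sketch (not from the book), again in units of 10^6 m3; note that the t = 12 entry comes out as exactly 0 here, while the −0.1 × 10^6 above stems from the rounded value x∗ = 146.1 × 10^6:

```python
alpha = [0.134, 0.146, 0.079, 0.122, 0.274, 0.323,
         0.366, 0.427, 0.421, 0.354, 0.140, 0.085]
q_in = [41, 51, 63, 99, 51, 20, 14, 12, 2, 14, 34, 46]
M = 2.3
x_star = (sum(q_in) - 12 * M) / sum(alpha)   # about 146.1

# V_t - V_0 = sum_{k<=t} Q_k,in - x* sum_{k<=t} alpha_k - M*t, cf. (10.14).
v_minus_v0 = []
for t in range(1, 13):
    v_minus_v0.append(sum(q_in[:t]) - x_star * sum(alpha[:t]) - M * t)
print([round(v, 1) for v in v_minus_v0])

# The most negative entry dictates V0* = 42.5; the spread between the
# largest and the most negative entry dictates V* = 225.7 (x 10^6 m^3).
print(round(-min(v_minus_v0), 1), round(max(v_minus_v0) - min(v_minus_v0), 1))
```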
In order to assure feasibility, we have to determine values of V0 and V such that all in-
equalities in (10.14) are satisfied. Actually, there are a lot of possibilities: a large reservoir
can certainly contain all the irrigation water. We are interested in the minimum value for
V , so that the building costs are minimal. It is immediately clear from (10.14) that the
minimum value V∗ for V is found for V0∗ = 42.5 × 10^6 (m3), and so V∗ = V0∗ + 183.2 × 10^6 = 225.7 × 10^6 (m3).
The relationship between x∗ and the minimum downstream flow M is:
        x∗ = x∗(M) = (447 × 10^6 − 12M)/2.871. (10.15)
[Figures 10.3 and 10.4 plot x∗ (×10^6 m2) against M (×10^6 m3) and against γ, respectively.]
Figure 10.3: Sensitivity with respect to M . Figure 10.4: Sensitivity with respect to γ.
Formula (10.15) follows immediately from (10.5) and (10.11). The graph is depicted in Figure
10.3. If the value of M were taken to be 0, then x∗ = 155.7 × 10^6 (m2), so the choice
M = 2.3 × 106 (see (10.5)) is certainly not a bad one. Note that for M = 37.25 × 106 , it
is not possible to use water for irrigation. In Exercise 10.5.2, the reader is asked to calculate
the relationship between M and V ∗ .
Suppose we change all values of αt by a certain fraction γ; so instead of αt we take γαt.
Formula (10.4) then becomes: Qt,irr = (γαt)x. Hence, the relationship between x∗ and
γ is x∗ = x∗(γ) = (146.1 × 10^6)/γ. If, for instance, all values of αt are 10% higher, then γ = 1.1
and x∗ = (146.1/1.1) × 10^6 ≈ 133 × 10^6 (m2), which corresponds to 133 km2. The relationship between
γ and x∗ is depicted in Figure 10.4.
When we change the values of Qt,in by a certain fraction β , then formula (10.3) becomes
Vt = Vt−1 + βQt,in − Qt . Summation over t yields:
        β Σ_{t=1}^{12} Qt,in = Σ_{t=1}^{12} Qt = Σ_{t=1}^{12} Qt,irr + Σ_{t=1}^{12} Qt,out = x∗ Σ_{t=1}^{12} αt + 27.6 × 10^6.
Hence,
x∗ = 155.695β − 9.6134. (10.16)
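Formula (10.16) can be checked against the data. A sketch (not from the book), in units of 10^6 m3:

```python
alpha_sum = 2.871          # sum of the alpha_t from Table 10.1
total_in = 447.0           # annual inflow
total_out = 27.6           # 12 months x M = 2.3

def x_star_of_beta(beta):
    """x* as a function of the inflow scaling factor beta, cf. (10.16)."""
    return (beta * total_in - total_out) / alpha_sum

print(round(x_star_of_beta(1.0), 1))      # 146.1: the original optimum
print(round(x_star_of_beta(0.06175), 2))  # about 0: below this, infeasible
# Slope and intercept of (10.16):
print(round(total_in / alpha_sum, 3), round(total_out / alpha_sum, 4))
```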
So the relationship between x∗ and β, the rate of the change of the Qt,in’s, is linear. Note
that x∗ = 0 for β = 0.06175; in that case the annual inflow is 0.06175 × 447 × 10^6 = 27.6 ×
10^6 = 12M, as expected. For β < 0.06175 the model is infeasible. In Exercise 10.5.4 the
reader is asked to draw a graph showing the relationship between x∗ and β .
We finally consider the case that one of the Qt,in ’s is changed. If one of the Qt,in ’s is changed
P12 P12
into Qt,in + δ , then the total t=1 Qt,in changes into δ + t=1 Qt,in . It now follows from
(10.11) that:
x∗ = 146.1 × 106 + 0.348δ. (10.17)
For instance, if in April the inflow changes from 99×10^6 to 60×10^6, and so the peak in the
graph of Figure 10.2 disappears, then δ = 39 × 106 , and x∗ (δ = 39 × 106 ) = 125.1 × 106 .
The change of δ is 39/99, which is approximately 39.4%. On the other hand, the change
of x∗ is (146.1 − 125.1)/146.1, which is approximately 14.4%. Hence, the area that can
be irrigated is certainly dependent on the inflow, but the rate of the change of the area is
significantly lower than the rate of the change of the inflow.
The dependency for the capacity V ∗ is somewhat more complicated. Suppose the inflow
in April is changed by δ . The formulas (10.14) then become:
0 ≤ V0 ≤ V
0 ≤ V0 + 19.1 × 106 − 0.047δ ≤V
0 ≤ V0 + 46.5 × 106 − 0.097δ ≤V
0 ≤ V0 + 95.7 × 106 − 0.125δ ≤V
0 ≤ V0 + 174.5 × 106 + δ − 0.167δ ≤V
0 ≤ V0 + 183.2 × 106 + δ − 0.263δ ≤V
0 ≤ V0 + 153.7 × 106 + δ − 0.375δ ≤V
0 ≤ V0 + 112.0 × 106 + δ − 0.503δ ≤V
0 ≤ V0 + 59.3 × 106 + δ − 0.615δ ≤V
0 ≤ V0 − 2.5 × 106 + δ − 0.798δ ≤V
0 ≤ V0 − 42.5 × 106 + δ − 0.921δ ≤V
0 ≤ V0 − 31.3 × 106 + δ − 0.970δ ≤V
0 ≤ V0 − 0.1 × 106 + δ − 0.999δ ≤ V.
For δ ≤ (42.5 × 106 )/(1 − 0.921) = 538.0 × 106 , it follows that V0∗ = 42.5 × 106 −
0.079δ , and so V ∗ = V0∗ + 183.2 × 106 + δ − 0.263δ = 225.7 × 106 + 0.658δ . So, if
δ = 39 × 106 , then the value of V ∗ changes by 0.658 × 39/225.7 (11.4%). Hence, x∗
and V ∗ are more or less relatively equally dependent on δ .
param T;
param M;
param alpha {1..T};
param Qin {1..T};

var Qout {t in 1..T} >= 0;
var Qirr {t in 1..T} >= 0;
var Q {t in 1..T} >= 0;
var V {t in 0..T} >= 0;
var W >= 0;
var x >= 0;

minimize objective:
    sum {t in 1..T} Qout[t];
subject to defQt {t in 1..T}:     # (10.1)
    Q[t] = Qirr[t] + Qout[t];
subject to defVt {t in 1..T}:     # (10.3)
    V[t] = V[0] + sum {k in 1..t} (Qin[k] - Q[k]);
subject to defQirr {t in 1..T}:   # (10.4)
    Qirr[t] = alpha[t] * x;
subject to Qout_lb {t in 1..T}:   # (10.5)
    Qout[t] >= M;
subject to Vt_ub {t in 1..T}:     # (10.6)
    V[t] <= W;
subject to defV0:                 # (10.7)
    V[0] = V[T];

data;

param T := 12;
param M := 2.3;
param alpha :=
    1 0.134   2 0.146   3 0.079   4 0.122   5 0.274   6 0.323
    7 0.366   8 0.427   9 0.421  10 0.354  11 0.140  12 0.085;
param Qin :=
    1 41   2 51   3 63   4 99   5 51   6 20
    7 14   8 12   9  2  10 14  11 34  12 46;
end;
10.5 Exercises
Exercise 10.5.1. Consider the irrigation problem of this chapter.
(a) Calculate the number of constraints and decision variables of model (10.9).
(b) Calculate the optimal solution of model (10.9), and draw in one figure the optimal
values of Vt , Qt , Qt,irr , and Qt,out .
Exercise 10.5.2. Draw the graph of the relationship between M and V ∗ . Why is the
model (10.9) infeasible for M > 37.25? Calculate V ∗ for M = 37.25.
Exercise 10.5.3. Consider model (10.9). If the values of αt are changed by the same
(multiplicative) factor γ , then the value of x∗ changes as well; see Section 10.3. Show that
the optimal values of V and Vt do not change.
Chapter 10. Designing a reservoir for irrigation
Exercise 10.5.4. Consider model (10.9). Draw the graph of the relationship between β
(see Section 10.3) and x∗ , and between β and V ∗ .
Exercise 10.5.5. Consider model (10.9). Draw the graph of the relationship between δ
(see Section 10.3) and x∗ , and between δ and V ∗ . Show that, for δ ≥ 537.97 × 106 , the
maximum value of Vt∗ is attained for t = 4 (in April), and V ∗ = 174.5 × 106 + 0.83δ .
Exercise 10.5.6. As in Exercise 10.5.2, change the value of Qt,in for t ≠ 4; pay extra
attention to the cases t = 9, 10, 11, 12.
Exercise 10.5.7. Consider model (10.9). As in Exercise 10.5.5, let δ be the change of Q4,in .
Show that V10 = 0 for 0 ≤ δ ≤ 230.68 × 106 , and that V11 = 0 for 230.68 × 106 ≤
δ ≤ 611.81 × 106 . For the other values of t, calculate the range of values of δ (possibly
including negative values) for which Vt = 0.
Chapter 11
Classifying documents by language
Overview
In this chapter we will show how linear optimization can be used in machine learning. Ma-
chine learning is a branch of artificial intelligence that deals with algorithms that identify
(‘learn’) complex relationships in empirical data. These relationships can then be used to
make predictions based on new data. Applications of machine learning include spam email
detection, face recognition, speech recognition, webpage ranking in internet search engines,
natural language processing, medical diagnosis based on patients’ symptoms, fraud detection
for credit cards, control of robots, and games such as chess and backgammon.
An important driving force behind the development of machine learning algorithms has been
the commercialization of the internet in the past two decades. Large internet businesses, such
as search engine and social network operators, process large amounts of data from around
the world. To make sense of these data, a wide range of machine learning techniques are
employed. One important application is ad click prediction; see, e.g., McMahan et al. (2013).
In this chapter, we will study the problem of automated language detection of text docu-
ments, such as newspaper articles and emails. We will develop a technique called a support
vector machine for this purpose. For an elaborate treatment of support vector machines in
machine learning, we refer to, e.g., Cristianini and Shawe-Taylor (2000).
A classification algorithm requires a set of examples, each of which has a label. An example of a classification problem
is the problem faced by email providers to classify incoming email messages into spam and
non-spam messages. The examples for such a classification problem would be a number of
email messages, each labeled as either ‘spam’ or ‘not spam’. The goal of the classification
algorithm is to find a way to accurately predict labels of new observations. The labels for
the examples are usually provided by persons. This could be someone who explicitly looks
at the examples and classifies them as ‘spam’ or ‘not spam’, or this could be provided by
the users of the email service. For example, when a user clicks on the ‘Mark this message
as spam’ button in an email application, the message at hand is labeled as ‘spam’, and this
information can be used for future predictions.
Other areas of machine learning include: regression, where the goal is to predict a number
rather than a label (e.g., tax spending based on income, see also Section 1.6.2); ranking, where
the goal is to learn how to rank objects (for example in internet search engines); clustering,
which means determining groups of objects that are similar to each other. In the cases of
classification and regression, the examples usually have labels attached to them, and these
labels are considered to be the ‘correct’ labels. The goal of the algorithms is then to ‘learn’ to
predict these labels. Such problems are sometimes categorized as supervised machine learning.
In contrast, in ranking and clustering, the examples usually do not have labels, and they are
therefore categorized as unsupervised machine learning.
The current chapter is a case study of a (supervised) classification algorithm. To make a classifi-
cation algorithm successful, certain features of the messages are determined that (hopefully)
carry predictive knowledge. For example, for spam classification, it can be helpful to count
the number of words in the message that relate to pharmaceutical products, or whether or
not the email message is addressed to many recipients rather than to one particular recipient.
Such features are indicative of the message being spam. Other words, such
as the names of the recipient's friends, may indicate that the message is not spam.
The features are represented as numbers, and the features of a single example can hence be
grouped together as a vector. Usually, it is not one particular feature that determines the
label of an example, but it is rather the combination of them. For example, an email message
that is sent to many different recipients and that contains five references to pharmaceutical
products can probably be classified as spam, whereas a message that has multiple recipients
and includes ten of the main recipient’s friends should probably be classified as non-spam. A
classification algorithm attempts to make sense of the provided features and labels, and uses
these to classify new, unlabeled, examples.
Clearly, the design of features is crucial and depends on the problem at hand. Features should
be chosen that have predictive power, and hence the design uses prior knowledge about the
problem. In many cases it may not be immediately clear how to choose the features.
11.2 Classifying documents using separating hyperplanes
As an example, we have taken 31 English and 39 Dutch newspaper articles from the internet
and calculated the letter frequencies. Table 11.1 shows the relative frequencies of the 26
letters for six English and six Dutch newspaper articles. The columns of the table are the
twelve corresponding feature vectors f d (d = 1, . . . , 12).
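Such feature vectors are straightforward to compute. The sketch below (the function name and the sample text are our own, not from the book) extracts the 26 relative letter frequencies from a piece of text:

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Relative frequencies (in percent) of the letters A-Z in `text`,
    ignoring case and all non-letter characters."""
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    counts = Counter(letters)
    return [100.0 * counts[c] / len(letters) for c in string.ascii_uppercase]

f = letter_frequencies("The quick brown fox jumps over the lazy dog.")
print(len(f))             # 26 features, one per letter
print(round(sum(f), 6))   # the percentages sum to 100
```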
Our goal is to construct a function g : Rm → R, a so-called classifier (also called a support
vector machine), which, for each document d ∈ D, assigns to the feature vector f d a real
number that will serve as a tool for deciding in which language document d was written.
The interpretation of the value g(f d ) is as follows. For any document d ∈ D, if g(f d ) > 0,
then we conclude that the text was written in English; if g(f d ) < 0, then we conclude that
the text was written in Dutch.
To construct such a classifier, we assume that for a small subset of the documents, the lan-
guage is known in advance (for example, the articles have been read and classified by a
person). We partition this subset into two subsets, L and V . The subset L is called the
learning set, and it will be used to construct a classifier. The subset V is called the validation
set, and it will be used to check that the classifier constructed from the learning set correctly
predicts the language of given documents. If the classifier works satisfactorily for the valida-
tion set, then it is accepted as a valid classifier, and it may be used to determine the language
of the documents that are not in L ∪ V (i.e., for the documents for which the language is
currently unknown). Let L1 be the subset of documents in L that are known to be written in English
and, similarly, let L2 be the subset of documents in L that are known to be written in Dutch. Define V1
and V2 analogously. In our example of newspaper articles, we will use the data in Table 11.1
as the learning set.
Table 11.1: Relative letter frequencies (in percentages) of several newspaper articles.
We will restrict our attention to linear classifiers, i.e., the classifier g is restricted to have the form

    g(f) = w_1 f_1 + . . . + w_m f_m + b = w^T f + b   for f = (f_1, . . . , f_m)^T ∈ R^m,

where w (∈ R^m \ {0}) is called the weight vector of the classifier, b (∈ R) is the intercept,
and f is any feature vector. Note that we exclude the possibility that w = 0, because the
corresponding classifier does not take into account any feature, and therefore is not of any
use to predict the language of a document. Our goal is to construct a weight vector w and
an intercept b such that:
    d ∈ L1 =⇒ w^T f^d + b > 0, and
    d ∈ L2 =⇒ w^T f^d + b < 0.        (11.1)
Linear classifiers have the following geometric interpretation. For any w ∈ R^m \ {0} and
b ∈ R, define the hyperplane H(w, b) = {f ∈ R^m | w^T f + b = 0}, and the two (strict)
halfspaces H^+(w, b) = {f ∈ R^m | w^T f + b > 0} and H^−(w, b) = {f ∈ R^m | w^T f + b < 0}.
Figure 11.1: Separable learning set with 40 documents. The solid and the dashed lines are separating hyperplanes (labeled H1 and H2 in the figure).
Figure 11.2: Nonseparable learning set with 40 documents. The convex hulls of the learning sets intersect.
So, we want to construct a hyperplane in R^m such that the feature vectors corresponding
to documents in L1 lie in the halfspace H^+(w, b), and the vectors corresponding to L2 in
H^−(w, b).
If there exist a weight vector w and an intercept b such that the conditions of (11.2) are
satisfied, then F(L1) and F(L2) are said to be separable; they are called nonseparable otherwise.
The corresponding hyperplane H(w, b) is called a separating hyperplane for F(L1)
and F(L2), and the function w^T f^d + b is called a separator for F(L1) and F(L2); see also
Appendix D. We make the following observations (see Exercise 11.8.2):

▶ H^+(−w, −b) = H^−(w, b) for w ∈ R^m \ {0}, b ∈ R.
▶ H(λw, λb) = H(w, b) for w ∈ R^m \ {0}, b ∈ R, and λ ≠ 0.
▶ If w and b define a separating hyperplane for F(L1) and F(L2) such that F(L1) ⊆
  H^+(w, b) and F(L2) ⊆ H^−(w, b), then we also have that conv(F(L1)) ⊆ H^+(w, b)
  and conv(F(L2)) ⊆ H^−(w, b); therefore, w and b also define a separating hyperplane
  for conv(F(L1)) and conv(F(L2)).
Note that even for a small learning set L, it is not clear beforehand whether or not F(L1)
and F(L2) are separable. So the first question that needs to be addressed is: does there
exist a separating hyperplane for F (L1 ) and F (L2 )? Figure 11.1 shows an example of a
separable learning set with (m =) 2 features. The squares correspond to the feature vectors
in F (L1 ), and the circles to the feature vectors in F (L2 ). Also, the convex hulls of square
points and the circle points are shown. The solid and the dashed lines represent two possible
hyperplanes. Figure 11.2 shows a learning set which is not separable.
Figure 11.1 illustrates another important fact. Suppose that we discard feature f2 and only
consider feature f1 . Let F 0 (L1 ) (⊂ R1 ) and F 0 (L2 ) (⊂ R1 ) be the feature ‘vectors’
obtained from discarding feature f2 . Then, the vectors in F 0 (L1 ) and F 0 (L2 ) are one-
dimensional and can be plotted on a line; see Figure 11.3. (This graph can also be constructed
by moving all points in Figure 11.1 straight down onto the horizontal axis.) A hyperplane
Figure 11.3: The learning set of Figure 11.1 after discarding feature f2 .
Figure 11.4: Scatter plots of relative letter frequencies (in percentages). The squares represent the vectors
in F (L1 ) and the circles are the vectors in F (L2 ). Here, L1 is the set of English documents,
and L2 is the set of Dutch documents.
Because these inequalities are strict inequalities, they cannot be used in an LO-model. To
circumvent this ‘limitation’, we will show that it suffices to use the following ‘≥’ and ‘≤’
inequalities instead:
    w^T f^d + b ≥ 1    for d ∈ L1, and
    w^T f^d + b ≤ −1   for d ∈ L2.        (11.4)
Clearly, the solution set (in terms of w and b) of (11.4) is in general a strict subset of the
solution set of (11.3). However, the sets of hyperplanes defined by (11.3) and (11.4) coincide.
To be precise, let H1 = {H(w, b) | w and b satisfy (11.3)}, i.e., H1 is the collection of hy-
perplanes defined by the solutions of (11.3). Let H2 = {H(w, b) | w and b satisfy (11.4)}.
We claim that H1 = H2 . It is easy to check that H2 ⊆ H1 . To see that H1 ⊆ H2 ,
take any w and b that satisfy (11.3). Then, because L1 and L2 are finite sets, there exists
ε > 0 such that w^T f^d + b ≥ ε for d ∈ L1 and w^T f^d + b ≤ −ε for d ∈ L2. Define
ŵ = (1/ε)w and b̂ = (1/ε)b. Then, it is straightforward to check that ŵ and b̂ satisfy (11.4)
and that H(ŵ, b̂) = H(w, b), as required.
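This scaling argument can be replayed numerically. In the sketch below the four feature vectors and the initial (w, b) are made-up illustration data, not taken from the book:

```python
# Tiny two-feature learning set (made-up data for illustration).
L1 = [(3.0, 1.0), (4.0, 2.0)]    # feature vectors of the 'English' documents
L2 = [(1.0, 3.0), (0.5, 4.0)]    # feature vectors of the 'Dutch' documents
w, b = (1.0, -1.0), 0.0          # satisfies the strict inequalities (11.3)

def value(w, b, f):
    return w[0] * f[0] + w[1] * f[1] + b

# epsilon = the smallest margin over the (finite) learning set
eps = min([value(w, b, f) for f in L1] + [-value(w, b, f) for f in L2])
assert eps > 0                   # (w, b) strictly separates the two sets

# Rescaling by 1/eps keeps the hyperplane but enforces (11.4):
w_hat, b_hat = (w[0] / eps, w[1] / eps), b / eps
assert all(value(w_hat, b_hat, f) >= 1 for f in L1)
assert all(value(w_hat, b_hat, f) <= -1 for f in L2)
print("(w_hat, b_hat) defines the same hyperplane and satisfies (11.4)")
```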
From now on, we will only consider the inequalities of (11.4). For each w ∈ R^m \ {0}
and b ∈ R, define the following halfspaces:

    H^{+1}(w, b) = {f ∈ R^m | w^T f + b ≥ 1}, and
    H^{−1}(w, b) = {f ∈ R^m | w^T f + b ≤ −1}.

If the halfspaces H^{+1}(w, b) and H^{−1}(w, b) satisfy the conditions of (11.5), then the set
{f ∈ R^m | −1 ≤ w^T f + b ≤ 1} is called a separation for F(L1) and F(L2), because it
'separates' F(L1) from F(L2). Figure 11.5 illustrates this concept.
Figure 11.5: Separation for a learning set. The area between the dashed lines is the separation.
It follows from the discussion above that, in order to find a separating hyperplane for F (L1 )
and F (L2 ), the system of inequalities (11.4) needs to be solved. This can be done by solving
the following LO-model:
    min  0
    s.t. w_1 f_1^d + . . . + w_m f_m^d + b ≥ 1     for d ∈ L1
         w_1 f_1^d + . . . + w_m f_m^d + b ≤ −1    for d ∈ L2        (11.6)
         w_1, . . . , w_m, b free.
In this LO-model, the decision variables are the weights w1 , . . . , wm and the intercept b of
the classifier. The values of fid with i ∈ {1, . . . , m} and d ∈ L1 ∪ L2 are parameters of
the model.
Once a classifier (equivalently, a separating hyperplane) for the learning set L1 ∪ L2 has
been constructed by solving the LO-model (11.6), this classifier may be used to predict
the language of any given document d ∈ D. This prediction is done as follows. Let
w_1^∗, . . . , w_m^∗, b^∗ be an optimal solution of model (11.6). This optimal solution defines the
classifier value w_1^∗ f_1^d + . . . + w_m^∗ f_m^d + b^∗ for document d, based on the feature values of
that document. If the classifier value is ≥ 1, then the document is classified as an English
document; if the value is ≤ −1, then the document is classified as a Dutch document. If
the value lies between −1 and 1, then the classifier does not clearly determine the language
of the document. In that case, the closer the value lies to 1, the more confident we can be
that d is an English document. Similarly, the closer the value lies to −1, the more confident
we can be that d is a Dutch document.
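The decision rule described above can be written down directly; the weights below are made-up illustration values, not the optimal solution of the example:

```python
def classify(f, w, b):
    """Apply a linear classifier w^T f + b with the decision rule above."""
    value = sum(wi * fi for wi, fi in zip(w, f)) + b
    if value >= 1:
        return "English"
    if value <= -1:
        return "Dutch"
    return "inconclusive"        # value strictly between -1 and 1

w, b = [0.8, -0.6], -0.1         # hypothetical two-feature classifier
print(classify([3.0, 1.0], w, b))   # value 1.7  -> English
print(classify([1.0, 3.0], w, b))   # value -1.1 -> Dutch
print(classify([1.0, 1.0], w, b))   # value 0.1  -> inconclusive
```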
Example 11.3.1. Consider the learning set of Table 11.1, where L1 is the set of the six newspaper
articles written in English, and L2 is the set of the six newspaper articles written in Dutch. Solving
model (11.6) using a computer package (e.g., the online solver for this book) yields the following optimal
solution:
(See Section 11.7 for the GMPL code for this model.) All other decision variables have value zero at
this optimal solution. The corresponding classifier is:
The weights correspond to the letters H, O, Q, U, and Z, respectively. Thus, the classifier bases its
calculations only on the relative frequencies of the letters H, O, Q, U, and Z. Note that the weight
w_17^∗ assigned to the letter Q is positive and relatively large compared to the other positive weights.
This means that, for any given document d ∈ D, the expression w_1^∗ f_1^d + . . . + w_m^∗ f_m^d + b^∗
tends to be more positive if the document contains relatively many occurrences of the letter Q.
Consequently, such a document is more likely to be classified as an English newspaper article.
On the other hand, the weight w_26^∗ assigned to the letter Z is negative, and so a document
containing relatively many occurrences of the letter Z is likely to be classified as a Dutch
newspaper article.
The above example illustrates the fact that the validation step may reveal problems with the
classifier constructed using model (11.6). One way to improve the classifier is to increase
Table 11.2: Validation results for the classifier. The articles 1, 2, 7, and 8 are in the learning set; the articles
21, 30, 57, 66, and 67 are in the validation set. The question marks in the row ‘Predicted
language’ indicate that the classifier is inconclusive about the language.
the learning set. In the example, we used only six documents per language. In real-life
applications the learning set is usually taken to be much larger.
In the next sections, we present another way to improve the classification results. Note that
the objective function of model (11.6) is the zero function, which means that any feasible
solution of the model is optimal. So, the objective is in a sense ‘redundant’, because it can be
replaced by maximizing or minimizing any constant objective function. In fact, in general,
the model has multiple optimal solutions. Hence, there are in general multiple separating
hyperplanes. Figure 11.1 shows two hyperplanes corresponding to two feasible solutions,
namely a dashed line and a solid line. In the next section, we study the ‘quality’ of the
hyperplanes.
To measure the robustness of a given separating hyperplane, we calculate its so-called separation
width. Informally speaking, the separation width is the m-dimensional generalization
of the width of the band between the dashed lines in Figure 11.5. For given w ∈ R^m \ {0}
and b ∈ R, the separation width of the hyperplane H(w, b) = {f ∈ R^m | w^T f + b = 0}
is defined as the distance between the halfspaces H^{+1}(w, b) and H^{−1}(w, b), i.e.,

    width(w, b) = min{ ‖f − f′‖ | f ∈ H^{+1}(w, b), f′ ∈ H^{−1}(w, b) },

where ‖f − f′‖ is the Euclidean distance between the vectors f and f′ (∈ R^m). Note that,
for any w ∈ R^m \ {0} and b ∈ R, width(w, b) is well-defined because the minimum in
the right-hand side of the above expression is attained. In fact, the following theorem gives
an explicit formula for the separation width.
Theorem 11.5.1. For any w ∈ R^m \ {0} and b ∈ R, it holds that width(w, b) = 2/‖w‖.
Proof. Take any point f̂ ∈ R^m such that w^T f̂ + b = −1. Note that f̂ ∈ H^{−1}(w, b). Define
f̂′ = f̂ + w^∗, with w^∗ = (2/‖w‖^2)w. Then, we have that ‖w^∗‖ = 2/‖w‖. It follows that:

    w^T f̂′ + b = w^T (f̂ + (2/‖w‖^2)w) + b = w^T f̂ + b + 2 w^T w/‖w‖^2 = −1 + 2 = 1,

where we have used the fact that w^T w = ‖w‖^2. Therefore, f̂′ ∈ H^{+1}(w, b). So, we have
that f̂ ∈ H^{−1}(w, b) and f̂′ ∈ H^{+1}(w, b). Hence, width(w, b) ≤ ‖f̂ − f̂′‖ = ‖w^∗‖ = 2/‖w‖.

To show that width(w, b) ≥ 2/‖w‖, take any f̂ ∈ H^{+1}(w, b) and f̂′ ∈ H^{−1}(w, b). By the
definitions of H^{+1}(w, b) and H^{−1}(w, b), we have that:

    w^T f̂ + b ≥ 1, and w^T f̂′ + b ≤ −1.

Subtracting the second inequality from the first one gives the inequality w^T (f̂ − f̂′) ≥ 2. The
cosine rule (see Appendix B) implies that:

    cos θ = w^T (f̂ − f̂′) / (‖w‖ ‖f̂ − f̂′‖) ≥ 2 / (‖w‖ ‖f̂ − f̂′‖),

where θ is the angle between the vectors w and f̂ − f̂′. Since cos θ ≤ 1, we have that:

    2 / (‖w‖ ‖f̂ − f̂′‖) ≤ 1.

Rearranging, this yields that ‖f̂ − f̂′‖ ≥ 2/‖w‖ for all f̂ ∈ H^{+1}(w, b) and all f̂′ ∈ H^{−1}(w, b).
Hence, also min{ ‖f − f′‖ | f ∈ H^{+1}(w, b), f′ ∈ H^{−1}(w, b) } ≥ 2/‖w‖, i.e.,
width(w, b) ≥ 2/‖w‖, as required.
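The construction used in the first half of the proof of Theorem 11.5.1 can be checked numerically (the vector w and intercept b below are arbitrary example values):

```python
import math

def proof_points(w, b):
    """The pair of points from the proof of Theorem 11.5.1: f_hat on the
    boundary of H^{-1}(w, b), and f_hat2 = f_hat + (2/||w||^2) w."""
    nw2 = sum(wi * wi for wi in w)                   # ||w||^2
    f_hat = [-(1 + b) * wi / nw2 for wi in w]        # w^T f_hat + b = -1
    f_hat2 = [fi + 2 * wi / nw2 for fi, wi in zip(f_hat, w)]
    return f_hat, f_hat2

w, b = [3.0, 4.0], 1.0
f1, f2 = proof_points(w, b)
norm_w = math.hypot(*w)                              # ||w|| = 5
# f1 and f2 lie on the two boundary hyperplanes, at distance 2/||w||:
print(round(sum(wi * fi for wi, fi in zip(w, f1)) + b, 9))   # -1.0
print(round(sum(wi * fi for wi, fi in zip(w, f2)) + b, 9))   # 1.0
print(round(math.dist(f1, f2), 9), 2 / norm_w)               # 0.4 0.4
```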
We conclude that the separation direction is determined by the direction of the vector w
and that, according to Theorem 11.5.1, the separation width is inversely proportional to
the length of w. Figure 11.1 depicts two separating hyperplanes. From this figure, we can
see that the separation width corresponding to hyperplane H2 is much smaller than the
separation width corresponding to hyperplane H1 .
    min  ‖w‖
    s.t. w^T f^d + b ≥ 1     for d ∈ L1
         w^T f^d + b ≤ −1    for d ∈ L2        (11.7)
         b, w_j free for j = 1, . . . , m.

The objective function ‖w‖ = (w_1^2 + . . . + w_m^2)^{1/2} is obviously a nonlinear function
of the decision variables w_1, . . . , w_m, so that (11.7) is a nonlinear optimization model. Such models
may be hard to solve, especially when the number of documents (and, hence, the number
of constraints) is very large. Therefore, we look for a linear objective function. In general,
this will result in a classifier of lower quality, i.e., the hyperplane corresponding to the
resulting (w, b) has a smaller separation width than the hyperplane corresponding to an
optimal solution (w^∗, b^∗) of (11.7).
The objective function of the above (nonlinear) optimization model is the Euclidean norm
(see Appendix B.1) of the vector w. A generalization of the Euclidean norm is the so-called
p-norm. The p-norm of a vector w = (w_1 . . . w_m)^T ∈ R^m is denoted and defined as
(for integer p ≥ 1):

    ‖w‖_p = (|w_1|^p + . . . + |w_m|^p)^{1/p}.
Clearly, the Euclidean norm corresponds to the special case p = 2, i.e., the Euclidean
norm is the 2-norm. Since the 2-norm is a nonlinear function, it cannot be included in an
LO-model. Below, however, we will see that two other choices for p lead to LO-models,
namely the choices p = 1 and p = ∞. In the remainder of this section, we consecutively
discuss LO-models that minimize the 1-norm and the ∞-norm of the weight vector.
Theorem 11.6.1. Let w = (w_1 . . . w_m)^T be a vector. Then, lim_{p→∞} ‖w‖_p = max{|w_1|, . . . , |w_m|}.
Proof. Define M = max{|w_i| | i = 1, . . . , m}, and let p be any positive integer. We have that:

    ‖w‖_p = (|w_1|^p + . . . + |w_m|^p)^{1/p} ≥ (M^p)^{1/p} = M.

On the other hand, we have that:

    ‖w‖_p = (|w_1|^p + . . . + |w_m|^p)^{1/p} ≤ (m M^p)^{1/p} = m^{1/p} M.

It follows that M ≤ ‖w‖_p ≤ m^{1/p} M. Letting p → ∞ in this expression, we find that
M ≤ lim_{p→∞} ‖w‖_p ≤ M, which is equivalent to lim_{p→∞} ‖w‖_p = M, as required.
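Theorem 11.6.1 can also be observed numerically: as p grows, ‖w‖_p approaches the largest absolute entry of w (the example vector below is chosen arbitrarily):

```python
def p_norm(w, p):
    """The p-norm of the vector w."""
    return sum(abs(wi) ** p for wi in w) ** (1.0 / p)

w = [0.3, -2.0, 1.5, 0.7]
for p in (1, 2, 10, 100):
    print(p, round(p_norm(w, p), 4))       # decreases towards max |w_i|
print("limit:", max(abs(wi) for wi in w))  # 2.0, by Theorem 11.6.1
```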
The objective function max{|w_1|, . . . , |w_m|} is clearly not linear. However, it can be
incorporated in an LO-model by using the following ‘trick’. First, a new decision variable
x is introduced, which will represent max{|w_1|, . . . , |w_m|}. The objective is then replaced
by: ‘min x’, and the following constraints are added:

    |w_j| ≤ x for j = 1, . . . , m.

Because the value of the variable x is minimized at any optimal solution, the optimal value
x^∗ will be as small as possible, while satisfying x^∗ ≥ |w_j^∗| for j = 1, . . . , m. This means
that in fact x^∗ = max{|w_1^∗|, . . . , |w_m^∗|} at any optimal solution. Combining this ‘trick’
with the treatment of absolute values as in model (11.8), we find the following LO-model:
    min  x
    s.t. Σ_{j=1}^{m} (w_j^+ f_j^d − w_j^− f_j^d) + b ≥ 1     for d ∈ L1
         Σ_{j=1}^{m} (w_j^+ f_j^d − w_j^− f_j^d) + b ≤ −1    for d ∈ L2        (11.9)
         w_j^+ + w_j^− ≤ x                                   for j = 1, . . . , m
         x ≥ 0, w_j^+ ≥ 0, w_j^− ≥ 0 (j = 1, . . . , m), b free.
The values of the f_j^d's are parameters of the model, and the decision variables are b, x, w_j^+,
and w_j^− for j = 1, . . . , m.
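The two ‘tricks’ combined in model (11.9) — splitting each weight into nonnegative parts and bounding them by x — can be illustrated on a made-up weight vector:

```python
# Splitting w_j = w_j^+ - w_j^- with w_j^+, w_j^- >= 0: at an optimal
# solution one of the two parts is zero, so w_j^+ + w_j^- = |w_j|.
w = [0.5, -1.2, 0.0, 2.3]                      # made-up weight vector
w_plus = [max(wj, 0.0) for wj in w]            # positive parts
w_minus = [max(-wj, 0.0) for wj in w]          # negative parts

assert all(p - m == wj for p, m, wj in zip(w_plus, w_minus, w))
assert all(p + m == abs(wj) for p, m, wj in zip(w_plus, w_minus, w))

# The smallest x satisfying w_j^+ + w_j^- <= x for all j is the infinity-norm:
x = max(p + m for p, m in zip(w_plus, w_minus))
print(x)                                       # 2.3 = max(|w_1|, ..., |w_m|)
```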
We now apply models (11.8) and (11.9) to the learning set of Table 11.1. (See Section 11.7
for the GMPL code corresponding to model (11.8).)
Model (11.8), corresponding to minimizing the 1-norm of the weight vector, has the optimal
solution (for the learning set of Table 11.1):
All other decision variables have optimal value zero. Therefore, the corresponding linear
classifier g1 (f ) for this learning set is:
Note that this classifier uses very little information from the feature vector f. Only two
features are taken into account, namely the relative frequencies of occurrences of the letters
E and O. Because w_15^∗ > 0, a newspaper article with relatively many occurrences of the letter
O is more likely to be categorized as written in English, whereas the letter E is considered
an indication that the article is written in Dutch.
In contrast, consider model (11.9), corresponding to minimizing the ∞-norm of the weight
vector. This model has the optimal solution:
    w_j^∗ = 0.0765 for j = 1, 3, 6, 8, 9, 13, 15, 17, 19, 20, 21, 23, 24, 25,
    w_j^∗ = −0.0765 for j = 2, 4, 5, 10, 11, 12, 14, 16, 18, 22, 26,
    w_7^∗ = 0.0463,
    b^∗ = −0.5530.
Let g∞ (f ) be the corresponding linear classifier. As opposed to g1 (f ), the classifier g∞ (f )
takes into account all features to make a prediction for the language of a given article. The
first set of weights in the solution above corresponds to the letters A, C, F, H, I, M, O, Q,
S, T, U, W, X, and Y. Since these weights are all positive, the classifier treats a relatively high
frequency of occurrences of any of these letters in a given article as evidence that the article
may be written in English. On the other hand, the second set of weights, corresponding
to the letters B, D, E, J, K, L, N, P, R, V, and Z, are negative. This means that a relatively
high frequency of occurrences of any of these is treated as evidence that the article may be
written in Dutch. Note that weight w7∗ corresponds to the letter G.
It is left to the reader to carry out the validation steps (see Section 11.4) for these classifiers,
i.e., to verify that these classifiers correctly predict the language of all newspaper articles in
the learning set and in the validation set, although both classifiers have values between −1
and 1 for some newspaper articles. (The data are available on the website of this book.) Note
that both models assign a negative weight to the letter E, and a positive weight to the letter
O. Hence, both classifiers treat frequent occurrences of the letter E as evidence towards the
article being written in Dutch, and frequent occurrences of the letter O as evidence towards
it being written in English.
Figure 11.6: Comparison of classifiers based on minimizing the 1-norm of the weight vector, versus
minimizing the ∞-norm.
An interesting question is: is one of the two classifiers significantly better than the other? To
answer this question, we have calculated the values of the two classifiers for all documents in
the learning set and in the validation set. Figure 11.6 shows the results of these calculations.
Each point in the figure represents a newspaper article. On the horizontal axis we have
plotted the values of g1 (f d ), and on the vertical axis we have plotted the values of g∞ (f d )
(d ∈ D). From the figure, we see that the ‘north west’ and ‘south east’ quadrants contain
no points at all. This means that the two classifiers have the same sign for each d ∈ D:
whenever g1 (f d ) is positive, g∞ (f d ) is positive, and vice versa. The ‘north east’ quadrant
contains points for which both classifiers are positive, i.e., these are the newspaper articles
that are predicted to be in English by both classifiers. Similarly, the ‘south west’ quadrant
contains the newspaper articles that are predicted to be in Dutch. It can be seen from the
figure that the values of the two classifiers are more or less linearly related, meaning that
they result in roughly the same predictions.
The horizontal gray band in the figure is the area in which the classifier g1 (f ) has a value
between −1 and 1, i.e., the area in which the classifier does not give a clear-cut prediction.
Similarly, the vertical gray band is the area in which the classifier g∞ (f ) does not give a
clear-cut prediction. The horizontal gray band contains 25 points, whereas the vertical gray
band contains only 9 points. From this, we can conclude that the classifier g∞ (f ) tends to
give more clear-cut predictions than g1 (f ). So, in that sense, the classifier g∞ (f ) is a better
classifier than g1 (f ).
param N1;                          # number of English documents (declaration restored)
param N2;                          # number of Dutch documents (declaration restored)

set DOCUMENTS := 1..(N1+N2);       # set of all documents
set L1 := 1..N1;                   # set of English documents
set L2 := (N1+1)..(N1+N2);         # set of Dutch documents
set FEATURES;                      # set of features

param f{FEATURES, DOCUMENTS};      # values of the feature vectors

var wp{FEATURES} >= 0;             # positive part of weights
var wm{FEATURES} >= 0;             # negative part of weights
var b;                             # intercept

minimize obj:                      # objective
    sum {j in FEATURES} (wp[j] + wm[j]);

subject to cons_L1{i in L1}:       # constraints for English documents
    sum {j in FEATURES} (wp[j] - wm[j]) * f[j, i] + b >= 1;

subject to cons_L2{i in L2}:       # constraints for Dutch documents
    sum {j in FEATURES} (wp[j] - wm[j]) * f[j, i] + b <= -1;

data;

param N1 := 6;
param N2 := 6;

set FEATURES := A B C D E F G H I J K L M N O P Q R S T U V W X Y Z;

param f :
        1      2      3      4      5      6      7      8      9     10     11     12 :=
A   10.40   9.02   9.48   7.89   8.44   8.49   8.68   9.78  12.27   7.42   8.60  10.22
B    1.61   1.87   1.84   1.58   1.41   1.55   2.03   1.08   0.99   1.82   1.79   2.62
C    2.87   2.95   4.86   2.78   3.85   3.13   0.80   1.37   1.10   1.03   2.15   1.83
D    4.29   3.52   3.16   4.18   3.91   5.04   5.50   6.05   6.13   7.11   5.97   6.46
E   12.20  11.75  12.69  11.24  11.88  11.82  18.59  17.83  17.74  17.85  15.29  18.25
F    2.12   2.31   2.97   2.00   2.55   1.90   0.76   0.89   0.77   1.26   1.19   1.48
G    2.23   1.67   2.17   2.16   1.79   2.58   2.75   2.99   2.96   2.21   2.39   4.10
H    4.45   4.36   4.25   5.56   4.45   5.43   1.69   3.29   2.96   1.66   2.27   2.62
I    9.20   7.61   7.59   7.62   8.33   8.25   5.25   6.89   6.24   8.14   7.29   5.68
J    0.13   0.22   0.14   0.25   0.22   0.40   1.35   1.30   0.88   2.05   1.55   1.57
K    0.75   0.64   0.57   0.91   0.33   0.91   3.90   1.90   1.86   2.13   1.91   2.10
L    4.05   3.28   4.81   4.74   4.31   3.77   4.11   4.44   3.50   3.40   3.82   3.76
M    2.41   3.46   2.55   3.10   2.74   2.58   2.50   2.21   3.40   3.00   1.67   1.75
N    7.03   7.70   6.60   7.02   7.16   7.34  10.63   8.80  10.30  11.45   9.32  11.53
O    5.85   6.82   8.11   6.74   6.76   6.54   6.48   5.51   4.27   4.82   4.78   4.28
P    1.53   2.51   1.79   1.93   2.41   1.43   1.31   1.23   1.42   1.11   2.51   0.52
Q    0.11   0.02   0.14   0.12   0.19   0.08   0.00   0.05   0.00   0.00   0.00   0.09
R    6.44   7.00   6.04   5.82   6.35   6.07   7.75   6.24   5.04   6.24   6.69   6.20
S    7.35   7.68   5.28   7.22   6.35   6.66   3.43   4.66   3.18   5.13   6.57   3.14
T    8.50   8.10   8.92   9.03   8.98   7.61   5.42   6.77   7.12   4.82   6.33   5.94
U    2.25   3.17   2.12   2.87   2.88   3.09   1.74   1.32   1.20   1.58   2.75   1.40
V    0.80   1.01   1.08   0.89   1.22   0.95   3.30   2.56   3.61   4.19   2.75   2.45
W    1.26   1.21   1.51   1.61   1.47   2.62   0.97   1.51   1.53   0.55   1.31   0.87
X    0.05   0.22   0.09   0.05   0.19   0.28   0.04   0.00   0.00   0.00   0.00   0.00
Y    1.72   1.87   1.04   2.56   1.71   1.43   0.08   0.14   0.11   0.00   0.24   0.09
Z    0.40   0.02   0.19   0.14   0.14   0.08   0.93   1.18   1.42   1.03   0.84   1.05;

end;
11.8 Exercises
Exercise 11.8.1.
(a) Show that if a learning set with a certain set of features is separable, then adding a feature
keeps the learning set separable.
(b) Give an example of a separable learning set with the property that removing a feature
makes the learning set nonseparable.
Exercise 11.8.4. In real-life applications, the learning set is usually not separable. One
way to deal with this problem is to allow some points in the data set to lie ‘on the wrong
side’ of the hyperplane. Whenever this happens, however, it should be heavily penalized.
How can model (11.8) be generalized to take this into account?
Chapter 12
Production planning: a single product case
Overview
Linear optimization is applied on a large scale in the field of production planning. In partic-
ular, when there is demand for a broad variety of products or when the demand fluctuates
strongly, the design of a good production plan may be very difficult, and linear optimization
may be a useful analysis and solution tool. When planning the production, both social and
economic factors have to be taken into account. A number of questions may be asked in this
respect. Is working in overtime acceptable? Is it profitable to hire employees on a temporary
basis? Is idle time acceptable?
In this chapter, we restrict ourselves to the production of a single product. In Chapter 13,
we describe the situation of several products.
s0 = the inventory of the product at the beginning of the planning period, i.e., the
initial inventory.
It is assumed that the demand dt is known for all t ∈ {1, . . . , T } and, within each period,
this demand is uniformly distributed over the time span of the period. Also, the initial
inventory s0 is assumed to be known. Moreover, we assume that the amount of production
xt becomes uniformly available during the planning period. The decision variables are xt
and st for t = 1, . . . , T .
The inventory st at the end of period t satisfies:

st = st−1 + xt − dt for t = 1, . . . , T. (12.1)
The equations (12.1) are called the inventory equations. Note that when the demand exceeds
the production, then the value of st in (12.1) is negative; −st is then the shortage at the end
of period t. We will consider a shortage as a negative inventory, and therefore continue to
use the term inventory. The demand has to be satisfied immediately, so that shortages are
not allowed. Hence, st has to satisfy st ≥ 0 for t = 1, . . . , T .
The equations (12.1) can be rewritten as follows:

st = s0 + ∑_{k=1}^{t} xk − ∑_{k=1}^{t} dk for t = 1, . . . , T. (12.2)
The demand considered in this chapter is the expected demand. So assuming that we have
‘good’ predictions, we have that in about fifty percent of the periods the realized demand
will be higher than the expected demand, and therefore in about fifty percent of the periods
there will be a shortage. When a high shortage rate is unacceptable, one can build in a
so-called safety stock. The safety stock for period t + 1 has to be available at the end of period t. Define, for each t = 1, . . . , T:

σt = the safety stock that has to be available at the end of period t.

The inventory then has to satisfy:

st ≥ σt for t = 1, . . . , T. (12.3)
One way to determine the value of σt is to make it dependent on the expected demand in period t + 1. For instance, by means of the following relationship:

σt = γ dt+1 for t = 1, . . . , T, with 0 ≤ γ ≤ 1, (12.4)

where dT+1 is the expected demand in the first period after the planning period; i.e., σT is the initial inventory for the next planning period. It is assumed that σT ≥ 0. The parameter γ may depend on, for instance, the service level (the percentage of the number of periods without shortages), or the variance of the expected demand. Alternative relationships are possible as well.
Define:

at = ∑_{k=1}^{t} dk + σt − s0 for t = 1, . . . , T. (12.5)
We call at the desired production during the first t periods. Note that if at < 0 for certain
t, then the initial inventory s0 is large enough to satisfy the total demand during the first
t periods plus the safety inventory σt . Although at does not agree with our intuition of
‘desired production’ if at is negative, we nevertheless use the term ‘desired production’ in
this case. Note that the values of at are known in advance for each period t, because they
depend only on the values of the dt ’s and σt ’s. If at ≥ 0, then at is the minimum amount
that has to be produced in the periods 1, . . . , t in order to assure that there is no shortage at
the end of period t and the desired safety stock is available. Using (12.5), we may combine
(12.2) and (12.3) into:
x1 + x2 + . . . + xt ≥ at for t = 1, . . . , T ; (12.6)
i.e., the total production during the first t periods is at least equal to the desired production
during the first t periods.
For each t = 2, . . . , T , it follows that:
at − at−1 = ∑_{k=1}^{t} dk + σt − s0 − ∑_{k=1}^{t−1} dk − σt−1 + s0
          = dt + σt − σt−1
          = (1 − γ)dt + γdt+1 ≥ 0.
Hence,
at ≥ at−1 for t = 2, . . . , T ;
i.e., the sequence a1 , . . . , aT is nondecreasing. Throughout, we will assume that aT > 0,
because, since at is a nondecreasing function of t, aT ≤ 0 would mean that the company
does not need to produce anything at all during the planning period. Define:

wt = the number of working hours needed for the production in period t.
We assume that, for each period, the number of working hours is proportional to the pro-
duction, i.e.:
wt = αxt for t = 1, . . . , T, (12.7)
where the parameter α (> 0) denotes the number of working hours required to produce
one unit of the product. The labor time wt can be supplied by regular employees during
regular working hours, or by means of overtime, and also by means of temporary employees.
In the next section, we will determine the number of regular employees required for the
production in each period during regular working hours. We do not consider restrictions
on the machine capacity. So it is assumed that in each period the production is feasible,
provided that the number of regular employees is sufficiently large.
The question is how to realize the production with a minimum number of regular employees.
Based on (12.6), (12.7), and (12.8), we may write the following LO-model to answer that
question:
min x
s.t. tx ≥ at for t = 1, . . . , T (PP1)
x ≥ 0.
The minimum number of regular employees is given by the optimal value x∗ of the decision
variable x in (PP1). Using our assumption that aT > 0, it follows directly that this optimal
value x∗ satisfies:

x∗ = max{at /t | t = 1, . . . , T} = at∗ /t∗, (12.9)
with t∗ the value of t for which at /t attains its maximum value.
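The values of at and the optimal solution (12.9) can be reproduced with a few lines of code. The sketch below uses the data of Table 12.1 (s0 = 140, γ = 0.2, and d13 = d1), exact rational arithmetic avoiding any rounding of the ratios at /t:

```python
from itertools import accumulate
from fractions import Fraction

# Data of Table 12.1: monthly demands; s0 = 140, gamma = 0.2, and d13 = d1.
d = [30, 40, 70, 60, 70, 40, 20, 20, 10, 10, 20, 20]
s0, gamma, T = 140, Fraction(1, 5), 12

# sigma_t = gamma * d_{t+1}, with the wrap-around d_{13} = d_1.
sigma = [gamma * d[t % T] for t in range(1, T + 1)]
# (12.5): a_t = sum_{k<=t} d_k + sigma_t - s0.
a = [cd + sig - s0 for cd, sig in zip(accumulate(d), sigma)]

# (12.9): x* = max{a_t / t | t = 1, ..., T}, attained at t = t*.
ratios = [at / t for t, at in enumerate(a, start=1)]
xstar = max(ratios)
tstar = 1 + ratios.index(xstar)
print(a[5], xstar, tstar)  # a_6 = 174, x* = 29, t* = 6
```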
Example 12.2.1. As an illustration, consider the following example. The relevant data are listed
in Table 12.1. The periods are the twelve months in a year, i.e., T = 12. The largest number
in the column labeled ‘at /t’ of Table 12.1 is 29, so that x∗ = 29. Note that σ12 is the desired
initial inventory for the next planning period. In Table 12.1, we assume that d13 = d1 , so that
σ12 = 0.2 × 30 = 6. This means that the actual total demand is 416 units. The columns labeled
‘tx∗’ and ‘tx∗ − at’ follow by taking x∗ = 29. In Figure 12.1 we have drawn the graphs of at and of tx∗. Using formula (12.2), it follows that:

st = s0 + tx∗ − ∑_{k=1}^{t} dk = tx∗ − at + σt,

and so:

tx∗ − at = st − σt,

which is the additional inventory on top of the safety stock in month t, or the overproduction in month t. In month 6, the overproduction is zero, while at the end of the planning period the overproduction is 72 units.

Table 12.1: The input data and the derived values at, at/t, tx∗, and tx∗ − at; s0 = 140, γ = 0.2.

 t    dt   σt   Σdk    at     at/t   tx∗   tx∗ − at
 1    30    8    30   −102   −102     29    131
 2    40   14    70    −56    −28     58    114
 3    70   12   140     12      4     87     75
 4    60   14   200     74     18.5  116     42
 5    70    8   270    138     27.6  145      7
 6    40    4   310    174     29    174      0
 7    20    4   330    194     27.7  203      9
 8    20    2   350    212     26.5  232     20
 9    10    2   360    222     24.7  261     39
10    10    4   370    234     23.4  290     56
11    20    4   390    254     23.1  319     65
12    20    6   410    276     23    348     72
If the total production T x∗ equals the desired production aT , then the production exactly
matches the total demand for the planning period. However, if T x∗ > aT , then too much is
produced. This is usually not a desired situation, particularly because aT already contains the
safety stock demanded at the end of the planning period. If one does not want to produce
the surplus T x∗ − aT , then some employees should not take part in the production. In
other words, we accept idle time. In order to minimize the idle time, we need to solve model
(PP2) below.
Define:
x = the number of units that can be produced when all employees are working during
the regular working hours; i.e., the regular production.
In order to plan the production process with the least possible number of employees, the
following LO-model has to be solved.
min x
s.t. x1 + . . . + xt ≥ at for t = 1, . . . , T − 1
(PP2)
x1 + . . . + xT = aT
0 ≤ xt ≤ x for t = 1, . . . , T.
Let t̄ be such that max{at /t | t = 1, . . . , T } = at̄ /t̄, and let x̄ = at̄ /t̄. An optimal
solution of (PP2) is easy to find if at −at−1 ≤ x̄ for t = t̄+1, . . . , T . It can be easily verified
that these conditions hold for the input data of Table 12.1, where t̄ = 6; see Exercise 12.7.5.
Let x∗ be the optimal value of the decision variable x. Since tx∗ ≥ x1 + . . . + xt ≥ at, we find that x∗ ≥ x̄. It can be easily checked that an optimal solution in this case is (see Exercise 12.7.5):

x∗ = x̄, and x∗t = x̄ for t ∈ {1, . . . , t̄}, and x∗t = at − at−1 for t ∈ {t̄ + 1, . . . , T}.

Figure 12.1: The graphs of at and tx∗.
Example 12.2.2. An optimal solution of model (PP2), using the data of Table 12.1, can now
readily be calculated: x∗ = x∗1 = . . . = x∗6 = 29, x∗7 = 20, x∗8 = 18, x∗9 = 10, x∗10 = 12, x∗11 = 20, x∗12 = 22. This means that, in each period, sufficiently many employees are available to
produce 29 units of the product during regular hours. Since in periods 7, . . . , 12, the production level
is lower than that, some employees will have nothing to do during those periods. In Exercise 12.7.2,
the reader is asked to check that the given optimal solution is not unique.
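As a check on this closed-form solution, the sketch below constructs the plan from the at values of Table 12.1 (with t̄ = 6, x̄ = 29) and verifies that it is feasible for (PP2); it is an illustration for this data set, not a general solver:

```python
# a_t values from Table 12.1 and the threshold period t_bar = 6 with x_bar = 29.
a = [-102, -56, 12, 74, 138, 174, 194, 212, 222, 234, 254, 276]
t_bar, x_bar, T = 6, 29, 12

# Optimal plan: x_t = x_bar up to t_bar, then the increments a_t - a_{t-1}.
x = [x_bar if t <= t_bar else a[t - 1] - a[t - 2] for t in range(1, T + 1)]
print(x)  # [29, 29, 29, 29, 29, 29, 20, 18, 10, 12, 20, 22]

# Feasibility for (PP2): partial sums cover a_t, total equals a_T, 0 <= x_t <= x_bar.
total = 0
for t in range(1, T + 1):
    total += x[t - 1]
    assert total >= a[t - 1]
assert total == a[-1] and all(0 <= xt <= x_bar for xt in x)
```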
Since α is the number of working hours required for the production of one unit, it follows
that the idle time in period t equals α(x − xt ). The total idle time for the whole planning
period is therefore α(x−x1 )+. . .+α(x−xT ) = αT x−α(x1 +. . .+xT ) = α(T x−aT ).
Since α, T , and aT are known parameters, it follows that the objective of (PP2) can be
replaced by min(αT x − αaT ). So, as soon as the minimum value of x∗ in (PP2) is found,
we also know the minimum idle time αT x∗ − αaT. The number of working hours needed per period is therefore αx∗.
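For the data of Table 12.1 these expressions evaluate as follows; the value α = 8 is only an illustration here (it is the value used later in Example 12.3.1):

```python
# Idle time for model (PP2): alpha * (T * x_star - a_T), with Table 12.1 data.
alpha = 8                 # working hours per unit of product (illustrative value)
T, x_star, a_T = 12, 29, 276

idle_hours = alpha * (T * x_star - a_T)   # total idle time over the planning period
hours_per_period = alpha * x_star         # labor needed per period at full capacity
print(idle_hours, hours_per_period)       # 576 232
```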
Example 12.2.3. Using the input data of Table 12.1, we found that x∗ = 29, and so αx∗ =
29α. Assuming that the number of working hours in one month is 180 hours, it follows that the
minimum number of regular employees required to take care of the production in the case of model
(PP2) is αx∗ /180 = 0.16α. In case the company does not want to work with part-time employees,
0.16α has to be an integer. When this constraint is taken into consideration, the model becomes a
mixed integer LO-model; see Section 12.5.
12.3 Overtime
A possible way of avoiding idle time is to control the fluctuations in demand by using
overtime. This is of course only possible in case the employees are willing to work more
than the regular working hours. The management could make overtime attractive by raising
the hourly wage during overtime hours. We assume that all employees have the same hourly
wage during regular hours, and the hourly wages during overtime are β (> 1) times the
hourly wage during the regular working hours. Define:

c = the hourly wage during the regular working hours.
If x is again the production by regular employees during regular working hours, then the
total labor costs per period for regular hours is cαx. Moreover, the company needs xt − x
hours of overtime during period t, which means additional labor costs of cαβ(xt − x).
Hence, the total wage cost over the whole planning period is:
cαT x + ∑_{t=1}^{T} cαβ(xt − x).
It is in the interest of the management to know how the production can be realized with
the total wage costs as low as possible. This problem can be formulated as the following
LO-model.
min cαT x + ∑_{t=1}^{T} cαβ(xt − x)
s.t. x1 + . . . + xt ≥ at for t = 1, . . . , T − 1 (PP3)
x1 + . . . + xT = aT
0 ≤ x ≤ xt for t = 1, . . . , T.
Note the difference: we have xt ≤ x in (PP2), whereas we have xt ≥ x in (PP3). By using
x1 + . . . + xT = aT, the objective function in (PP3) can be rewritten as cαβaT − cα(β − 1)T x.
This means that, since α > 0, β > 1, and c > 0, instead of (PP3), we may equivalently
solve the following LO-model.
max x
s.t. x1 + . . . + xt ≥ at for t = 1, . . . , T − 1
(PP4)
x1 + . . . + xT = aT
0 ≤ x ≤ xt for t = 1, . . . , T.
Note that the parameters α, β , and c do not appear in model (PP4), and hence the optimal
solution does not depend on their values. We will show that the optimal objective value x∗
of (PP4) satisfies:
x∗ = min{ aT /T, min_{t=1,...,T−1} (aT − at)/(T − t) }. (12.10)
First observe that for t = 1, . . . , T − 1, it holds that (T − t)x∗ ≤ x∗t+1 + . . . + x∗T ≤
aT − at , so that x∗ ≤ min{(aT − at )/(T − t) | t = 1, . . . , T − 1}. Moreover, from
Figure 12.2: The two situations: (a) the graph of at lies completely ‘below’ the line 0A1; (b) the graph of at lies completely ‘below’ the line 0B2.
In Figure 12.2, the two situations are schematically depicted. In case 1, the graph of at is
completely ‘below’ the line 0A1; see Figure 12.2(a). It can be easily verified that at /t ≤ aT /T is equivalent to aT /T ≤ (aT − at)/(T − t) (see Exercise 12.7.9), and that:

x∗ = x∗1 = . . . = x∗T = aT /T

is an optimal solution. In case 2, the graph of at in Figure 12.2(b) is completely ‘below’ the line 0B2. So in this particular case, all the overtime
work can be done in the first t̄ periods. The overproduction during the first t̄ periods is
then equal to:
at̄ /t̄ − (aT − at̄)/(T − t̄).
The case that the graph of at is somewhere ‘above’ 0B2 can be considered similarly as the
situation above when taking T := t̄, i.e., t̄ is taken as the last period instead of T . It is
left to the reader to check that the production function is a piecewise linear function; see
Exercise 12.7.10.
Example 12.3.1. In the case of the data of Table 12.1, we are in the situation that overtime can
be used to avoid the idle time. A possible optimal solution of model (PP4) is: x∗ = 16, x∗1 = 100, x∗2 = . . . = x∗12 = 16 (with t̄ = 8). Note that in this solution all the overtime is carried out
in the first period; the overtime production is 100 − 16 = 84 units. Taking α = 8, β = 2, and c = 10,
then the total wage cost is $28,800, which is $15,360 for the regular employees and $13,440 for the
overtime hours.
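Both formula (12.10) and the cost figures of this example are easy to reproduce. In the sketch below, β = 2 is used, the value implied by the stated cost split ($13,440 for 84 overtime units at cα = 80 per unit):

```python
from fractions import Fraction

# a_t values from Table 12.1; T = 12.
a = [-102, -56, 12, 74, 138, 174, 194, 212, 222, 234, 254, 276]
T = 12

# (12.10): x* = min{ a_T/T, min_{t<T} (a_T - a_t)/(T - t) }.
candidates = [Fraction(a[-1], T)] + [
    Fraction(a[-1] - a[t - 1], T - t) for t in range(1, T)
]
x_star = min(candidates)
print(x_star)  # 16, attained at t = 8

# Wage cost of the plan x_1 = 100, x_2 = ... = x_12 = 16 (alpha = 8, beta = 2, c = 10).
alpha, beta, c = 8, 2, 10
overtime_units = 100 - 16
regular_cost = c * alpha * T * x_star               # 15360
overtime_cost = c * alpha * beta * overtime_units   # 13440
print(regular_cost + overtime_cost)                 # 28800
```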
There are several reasons to be careful when designing a model like model (PP4). For
instance, if aT = aT −1 then there is no demand in period T . It then follows from (12.10)
that x∗ = 0, which means that the whole production should be done in overtime. It is left
to the reader to argue that this result is not acceptable, and how the model can be adapted
such that this situation is avoided; see also Exercise 12.7.11.
x = the number of units to be produced in each period with regular hour labor;
yt = the number of units to be produced in period t with overtime labor.
This means that, as with the model in Section 12.3, the values of the parameters α and c are
irrelevant. However, the parameter β (i.e., the ratio between the overtime hourly wage and
regular time hourly wage) remains in the objective function and therefore β plays a role in
the optimal solution.
Example 12.4.1. Suppose that β = 1.7. Since α and c are irrelevant, let us take α = c = 1.
We use a computer package to find an optimal solution of model (PP5), using the data of Table 12.1.
The optimal solution that we find satisfies x∗ = 18, and the values of x∗t and yt∗ are given in the
following table:
 t    1   2   3   4   5   6   7   8   9  10  11  12
x∗t  84  18  18  18  18  18  20  18  18  18  18  10
y∗t  66   0   0   0   0   0   2   0   0   0   0   0
The corresponding optimal objective value is 331.6. It can be seen from the table that the overtime
production is carried out during periods 1 and 7. Moreover, period 12 is the only period in which idle
time occurs.
In model (PP4), the optimal number of products produced during regular hours is 16, two fewer than
the optimal solution of (PP5). So, the optimal solution of (PP5) shows that it is cheaper to produce
two more products during regular hours (i.e., to hire more regular workers), at the expense of having
some idle time in period 12. In fact, it can be checked that the labor costs of producing 16 products
during regular hours, rather than 18, are 334.8, which is about 1% more expensive.
Similarly, in (PP2), the optimal number of products produced during regular hours is 29. It can be
checked that the labor costs of producing 29 products during regular hours, rather than 18, are 348.0,
which is about 5% more expensive.
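These cost comparisons can be verified directly. With α = c = 1 and β = 1.7, the wage cost of a plan is T x plus β times the number of overtime units; the sketch below evaluates the three plans discussed:

```python
T, beta = 12, 1.7

def wage_cost(x, overtime_units):
    # alpha = c = 1: regular cost T*x plus overtime units at rate beta.
    return T * x + beta * overtime_units

cost_18 = wage_cost(18, 66 + 2)   # the (PP5) optimum: y_1 = 66, y_7 = 2
cost_16 = wage_cost(16, 84)       # the (PP4) plan: 84 overtime units in period 1
cost_29 = wage_cost(29, 0)        # the (PP2) plan: no overtime, some idle time
print(cost_18, cost_16, cost_29)  # approximately 331.6, 334.8, 348.0

assert abs(cost_16 / cost_18 - 1.01) < 0.005   # about 1% more expensive
assert abs(cost_29 / cost_18 - 1.05) < 0.005   # about 5% more expensive
```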
Note that (PP4) and (PP5) do not specify an upper bound on the amount of overtime.
This may not be correct, since employees may be willing to work only a certain amount of overtime. For example, they may be willing to spend at most 20% of the regular working hours as overtime. This constraint can be included in (PP4) as:

x ≤ xt ≤ 1.2x for t = 1, . . . , T.

Figure 12.3: Perturbation function for γ in model (PP1).
Figure 12.4: Perturbation function for s0 in model (PP1).
On the other hand, (PP4) may now become infeasible; this is in fact the case for the input data of Table 12.1. The model only becomes feasible when the allowed percentage of overtime is at least 82%. This is a rather high percentage,
and so overtime does not seem to be an appropriate solution tool in this particular case. In
the case of model (PP5), the maximum fraction of overtime hours can be incorporated by
including the following set of constraints:
yt ≤ 0.20x for t = 1, . . . , T.
Figure 12.5: The labor input versus the number of regular employees in model (PP2).
The perturbation function for s0 in model (PP1) is depicted in Figure 12.4. Notice that it
follows from (12.9) that the optimal objective value of (PP1) as a function of s0 satisfies:
x∗(s0) = max{ (∑_{k=1}^{t} dk + σt − s0)/t | t = 1, . . . , T }
       = max{ (38 − s0)/1, (84 − s0)/2, (152 − s0)/3, (214 − s0)/4, (278 − s0)/5, (314 − s0)/6,
              (334 − s0)/7, (352 − s0)/8, (362 − s0)/9, (374 − s0)/10, (394 − s0)/11, (416 − s0)/12 }.
The reader is asked in Exercise 12.7.4 to check that:
x∗(s0) = (278 − s0)/5   if 0 ≤ s0 ≤ 98
         (314 − s0)/6   if 98 < s0 ≤ 194
         (334 − s0)/7   if 194 < s0 ≤ 208        (12.11)
         (352 − s0)/8   if 208 < s0 ≤ 224
         (416 − s0)/12  if s0 > 224.
If s0 = 0, then the total demand plus the initial inventory σ12 for the next planning period
(416 units) needs to be satisfied completely by the production. It follows from the expression
above that x∗(0) = 278/5 = 55.6. If s0 = 416, then the initial inventory is enough, and
nothing needs to be produced. As the expression for x∗ (s0 ) shows, the kink points of the
perturbation graph in Figure 12.4 occur at s0 = 98, s0 = 194, s0 = 208, and s0 = 224.
For s0 = 98, at /t is maximum for t = 5, 6; for s0 = 194, at /t is maximum for t = 6, 7;
for s0 = 208, at /t is maximum for t = 7, 8; for s0 = 224, at /t is maximum for t = 8, 12.
The perturbation functions of γ and s0 in the case of model (PP2) are the same as the
ones for model (PP1), since the optimal solution x∗ for both models is determined by the
tangent tx∗ to the graph of at ; see Figure 12.1. The relationship between the minimum
number u∗ = αx∗/180 of regular employees and the number α of working hours per unit production is the dotted line determined by u∗ = 0.16α (since x∗ = 29) in Figure 12.5. In case we demand that u∗ is an integer, then the graph becomes the ‘staircase’ in Figure 12.5. Clearly, if the labor input increases, then the number of regular employees must increase in order to reach the demanded production.

Figure 12.6: Perturbation function for γ in model (PP4).
Figure 12.7: Perturbation function for s0 in model (PP4).
In the case of model (PP4), the perturbation function for γ is more or less constant. The graph of Figure 12.6 has a kink at γ = 1/3; see also Exercise 12.7.12.
The perturbation function for s0 for model (PP4), depicted in Figure 12.7, is constant for
0 ≤ s0 ≤ 224, because for these values of s0 the right hand side values in model (PP4) all
change by the same amount, and so x∗ = 16 for 0 ≤ s0 ≤ 224.
When the value of s0 increases to 224, then the line through (8, 212)ᵀ and (12, 276)ᵀ with slope x∗ = 16 in Figure 12.1 moves ‘downwards’ until (0, 0)ᵀ is on this line. This happens for s0 = a12 − 12x∗ + 140 = 224. Increasing the value of s0 from 224, x∗ becomes the slope of the line through (0, 0)ᵀ and (12, a12)ᵀ. For s0 = 416, we have that a12 = 0, and the values of a1, . . . , a11 are all negative. For s0 > 416, model (PP4) is infeasible.
We conclude the section with the perturbation function for β in model (PP5). As was
mentioned, the optimal solution depends on the value of β. Since β is the ratio between the overtime hourly wage and the regular hourly wage, only values of β with β ≥ 1 realistically make
sense. However, as a model validation step we can also consider values below 1. Figure 12.8
shows how the optimal objective value changes as β changes, with β ∈ [0.9, 2.5]. Figure
12.9 shows the corresponding optimal values of x. Clearly, if β < 1, then overtime hours
are cheaper than regular hours, and therefore in the optimal solution the number of regular
hours equals zero. If, on the other hand, β > 2, then it can be seen that overtime hours are
too expensive, and hence the optimal solution satisfies x∗ = 29, i.e., it is optimal to use no
overtime at all.
Figure 12.8: The optimal objective value of model (PP5) as a function of β.
Figure 12.9: The optimal value of x in model (PP5) as a function of β.
param T;
param s0;
param c;
param alpha;
param gamma;
param d{1..T};
param sigma{t in 1..T} := gamma * d[1 + t mod T];
param a{t in 1..T} := sum {k in 1..t} d[k] + sigma[t] - s0;

var xstar >= 0;
var x{1..T} >= 0;

# Objective of model (PP3); the overtime wage factor beta = 2 is hard-coded.
minimize z:
  c * alpha * T * xstar + 2 * c * alpha * sum {t in 1..T} (x[t] - xstar);
subject to acons{t in 1..T-1}:
  sum {k in 1..t} x[k] >= a[t];
subject to aconsT:
  sum {k in 1..T} x[k] = a[T];
subject to xcons{t in 1..T}:
  xstar <= x[t];

data;

param T := 12;
param s0 := 140;
param alpha := 8;
param c := 10;
param gamma := 0.2;
param: d :=
  1 30  2 40  3 70  4 60  5 70  6 40
  7 20  8 20  9 10  10 10  11 20  12 20 ;
end;
12.7 Exercises
Exercise 12.7.1. Determine the dual of model (PP1), as well as its optimal solution. Why
is the optimal dual solution nondegenerate and unique?
Exercise 12.7.2. Show that the optimal solution of model (PP2) found in Example 12.2.2
is not the only optimal solution, i.e., the model has multiple optimal solutions.
Exercise 12.7.3. Consider the perturbation function of the parameter γ in model (PP1).
The function is depicted in Figure 12.3.
(a) Show that the kink point of the perturbation function is located at γ = 1/2.
(b) Show that the perturbation function is steeper for 1/2 ≤ γ < 1 than for 0 < γ ≤ 1/2.
Exercise 12.7.10. Consider Figure 12.2. The case that the graph of at is somewhere
‘above’ the line 0A2 can be treated similarly to the situation described in Section 12.3,
when taking t̄ instead of T ; i.e., t̄ is taken as the last period. Draw an example of such a
graph, and show that the production function is a piecewise linear function.
Exercise 12.7.12. Consider model (PP4) in Section 12.3. Draw the graphs of at for
γ = 0 and γ = 1. Show that x∗(γ = 0) = 15, and that x∗(γ = 1) = 18; see Figure 12.6. Why does the perturbation function of Figure 12.6 have a kink at γ = 1/3?
Exercise 12.7.14. Consider model (PP4) in Section 12.3. Show that, for any optimal
solution, the production level x∗T is never carried out in overtime, i.e., it is cheaper to
produce during the last period in regular working time than in overtime.
Exercise 12.7.15. The optimal solution of model (PP4) given in Example 12.3.1 has
a large amount of overtime production during the first period. Perhaps a more desirable
solution is: x∗ = 16, x∗1 = . . . = x∗6 = 29, x∗7 = 20, x∗8 = 18, x∗9 = . . . = x∗12 = 16. In
this solution, the overtime production is distributed more equally over the periods. Modify
model (PP4) so that its optimal solution has more equally distributed overtime production.
Explain how your model achieves this.
13
C hap te r
Production of coffee machines
Overview
In this chapter we will describe the situation where more than one product is produced.
This case study is based on the M.S. thesis (University of Groningen, The Netherlands): Jan
Weide, The Coffee Makers Problem; Production Planning at Philips (in Dutch). The data used
in this section are fictional.
si0 = the initial inventory of design i, i.e., the inventory at the beginning of the
planning period.
The decision variables of the model that we will construct in this chapter are xit and sit ,
whereas the values of dit and si0 are assumed to be known for i = 1, . . . , m and t =
1, . . . , T .
Next, we need to deal with inventory. If the inventory of design i at the end of period
t − 1 is si,t−1 , then the inventory at the end of period t will be si,t−1 plus the number
xit of coffee machines of design i produced in period t, minus the number of such coffee
machines that are sold in period t. Therefore we have the following inventory equations:

sit = si,t−1 + xit − dit for i = 1, . . . , m, and t = 1, . . . , T, (13.1)

or, equivalently:

sit = si0 + ∑_{k=1}^{t} xik − ∑_{k=1}^{t} dik for i = 1, . . . , m, and t = 1, . . . , T. (13.2)
The value of sit can become positive if too much is produced, and negative if too little is
produced (i.e., there is a backlog). In order to minimize the total backlog, we introduce the
following new variables:
s+it = sit if sit ≥ 0, and s+it = 0 otherwise; s−it = −sit if sit ≤ 0, and s−it = 0 otherwise.

Hence, sit = s+it − s−it with s+it ≥ 0 and s−it ≥ 0 for all i and t, (13.3)

and these variables satisfy:

s+it × s−it = 0 for i = 1, . . . , m, and t = 1, . . . , T. (13.4)
If s−it > 0 (hence, s+it = 0) for some i and t, then s−it is a backlog. The restrictions stated in
(13.4) cannot be included in an LO-model, since they are not linear. Fortunately, we do not
need to include constraints (13.4) in the model, because the optimal solution always satisfies
(13.4); see Section 1.5.2, where a similar statement is proved.
The objective of minimizing backlogs can be formulated as min ∑_{i=1}^{m} ∑_{t=1}^{T} s−it. However, backlogs of some particular products may be more severe than backlogs of other products. Therefore we introduce the positive weight coefficient cit for s−it. The LO-model becomes:
min ∑_{i=1}^{m} ∑_{t=1}^{T} cit s−it
s.t. (13.1), (13.2), (13.3)                      (13.5)
xit ≥ 0 and integer, for i = 1, . . . , m, and t = 1, . . . , T.
It is not difficult to show that in an optimal solution of (13.5), the restrictions (13.4) are
always satisfied; the reader is asked to show this in Exercise 13.7.8.
Table 13.2 and Table 13.3 list the results of computer calculations for the solution of model
(13.5), where we have taken cit = 1 for all i and t. The first row contains the labels of the
fourteen weeks, and the first column the labels of the different designs. The figures in the
central part of Table 13.2 are the optimal values of xit ; the central part of Table 13.3 contains
the optimal values of sit . Note that design 6 is demanded only in week 9, where the total
production (1,600 coffee machines) is executed, and there is no inventory or backlog of
design 6 during the whole planning period. Design 17 is demanded only in week 1; the
production (6,476) in week 1 covers the backlog (2,476) plus the demand (4,000), and
there is no inventory or backlog during the planning period.
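The observation about design 17 follows directly from the inventory equations (13.1); a small check:

```python
# Design 17: initial backlog of 2,476; demand of 4,000 in week 1, zero afterwards.
s0 = -2476
d = [4000] + [0] * 13   # demands for weeks 1..14
x = [6476] + [0] * 13   # optimal production from Table 13.2

s, inventories = s0, []
for xt, dt in zip(x, d):
    s = s + xt - dt     # inventory equation (13.1)
    inventories.append(s)

# The week-1 production covers the backlog plus the demand; afterwards there
# is neither inventory nor backlog of design 17.
assert inventories == [0] * 14
print(inventories[0])   # 0
```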
The demands dit of the twenty-one designs during the fourteen weeks, together with the initial inventories si0:

i  si0  1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 813 0 0 1500 7400 1300 1200 1000 1000 1000 1000 1000 1300 1200 1300
2 −272 3400 3600 0 9700 1400 700 800 700 700 700 900 1000 900 900
3 −2500 1500 0 5000 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 2900 0 1000 0 400 0 400 0 400 0 0
5 220 800 2600 2400 1200 1300 1200 700 700 700 700 700 400 400 500
6 0 0 0 0 0 0 0 0 0 1600 0 0 0 0 0
7 −800 400 300 300 300 300 200 100 200 200 100 200 200 200 200
8 16 0 1000 2500 3200 600 600 700 600 700 500 600 500 600 300
9 0 0 0 0 2500 0 0 0 1500 0 0 0 0 0 0
10 1028 0 0 0 800 200 200 200 200 200 100 200 200 200 100
11 1333 600 600 900 1700 1000 1200 500 1000 500 1000 500 1200 600 500
12 68 600 0 0 0 0 0 0 0 0 0 0 0 0 0
13 97 1400 0 0 600 300 300 300 500 400 300 200 300 200 100
14 1644 0 1000 0 0 0 300 300 400 300 400 400 300 400 400
15 0 0 0 0 1200 300 100 100 100 100 200 0 100 100 100
16 0 80 0 0 1300 400 400 400 500 400 300 400 300 400 200
17 −2476 4000 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 0 2500 0 2500 0 0
19 86 900 0 0 1000 500 200 300 200 200 300 300 300 300 200
20 1640 0 0 0 0 0 1500 0 0 0 0 0 200 0 0
21 0 0 0 0 2250 0 750 0 0 0 0 0 0 0 0
Table 13.2: The optimal values of xit in model (13.5).

i\t 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 0 0 687 2250 0 0 8650 1000 1000 1000 1000 1300 1200 1300
2 0 7272 0 600 10500 633 0 1567 2200 2000 0 0 0 900
3 3859 0 5141 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 3900 0 400 0 400 0 400 0 0
5 0 3180 2400 1200 1300 1200 0 1400 700 700 700 400 400 500
6 0 0 0 0 0 0 0 0 1600 0 0 0 0 0
7 0 1500 300 0 600 200 0 300 500 0 0 400 0 200
8 0 984 2500 3200 600 600 0 1300 700 1600 0 0 600 300
9 0 0 0 0 0 0 2500 1500 0 0 0 0 0 0
10 0 0 0 0 0 172 0 400 200 100 200 200 200 100
11 0 0 1372 0 0 3045 750 1000 500 1000 500 1200 600 500
12 532 0 0 0 0 0 0 0 0 0 0 0 0 0
13 1303 0 600 0 0 600 300 500 400 500 0 300 200 100
14 0 0 0 0 0 0 0 356 300 800 0 300 400 400
15 0 0 0 1200 0 400 100 100 100 200 0 100 100 100
16 80 0 0 1300 0 800 400 500 400 1000 0 0 400 200
17 6476 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 2500 0 2500 0 0
19 750 64 0 1000 0 700 300 200 200 1200 0 0 0 200
20 0 0 0 0 0 0 0 0 0 0 0 60 0 0
21 0 0 0 2250 0 750 0 0 0 0 0 0 0 0
The computer calculations show that the optimal objective value of model (13.5) is 55,322.
In Section 13.4 we will perform sensitivity analysis on the number of conveyor belts used
for the production. The result shows that the total backlog is rather sensitive to the number
of conveyor belts.
Table 13.3: The optimal values of sit in model (13.5).

i\t 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 813 813 0 −5150 −6450 −7650 0 0 0 0 0 900 0 0
2 −3672 0 0 −9100 0 −67 −867 0 1500 2800 1900 0 0 0
3 −141 −141 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 −2900 −2900 0 0 0 0 0 0 0 0 0
5 −580 0 0 0 0 0 −700 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 −1200 0 0 −300 0 0 −100 0 300 200 0 200 0 0
8 16 0 0 0 0 0 −700 0 0 1100 500 0 0 0
9 0 0 0 −2500 −2500 −2500 0 0 0 0 0 0 0 0
10 1028 1028 1028 228 28 0 −200 0 0 0 0 0 0 0
11 733 133 605 −1095 −2095 −250 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 0 0 600 0 −300 0 0 0 0 200 0 0 0 0
14 1644 644 644 644 644 344 44 0 0 400 0 0 0 0
15 0 0 0 0 −300 0 0 0 0 0 0 0 0 0
16 0 0 0 0 −400 0 0 0 0 700 300 0 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 −64 0 0 0 −500 0 0 0 0 900 600 300 0 0
20 1640 1640 1640 1640 1640 140 140 140 140 140 140 0 0 0
21 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Hence, dit ≥ r+it, i.e., the demand for design i in period t is at least the available quantity r+it in t (namely, si,t−1 + xit ≥ 0).

(b) There is a backlog at the beginning of period t, but no old backlog at the end of period t. Then s−i,t−1 > 0, s+i,t−1 = 0, s−it > 0, s+it = 0, and r−it = 0, r+it ≥ 0. Hence, r+it = −s−i,t−1 + xit ≥ 0, and s−it = −r+it + dit > 0, and so the demand at the end of period t is larger than r+it.

(c) There is a backlog at the beginning of period t, as well as an old backlog at the end of period t. Then s−i,t−1 > 0, s+i,t−1 = 0, and r−it > 0, r+it = 0. Hence, −r−it = −s−i,t−1 + xit ≤ 0, and so s+it − s−it = −s−i,t−1 + xit − dit ≤ 0. Since s+it = 0, we find that the recent backlog is s−it − r−it = dit ≥ 0.
The expression ‘old backlogs are more severe than recent backlogs’ can mathematically be formulated by introducing the parameters γ1 and γ2 with γ1 > γ2 > 0 in the following LO-model:

min ∑_{i=1}^{m} ∑_{t=1}^{T} γ1 cit r−it + ∑_{i=1}^{m} ∑_{t=1}^{T} γ2 cit (s−it − r−it)
s.t. (13.1), (13.2), (13.3), (13.6)                      (13.7)
xit ≥ 0 and integer, for i = 1, . . . , m, and t = 1, . . . , T.
i\t 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 0 0 687 7400 0 428 2300 1772 3000 0 0 3800 0 0
2 0 7131 141 5600 3200 3000 800 700 2300 0 0 2800 0 0
3 2829 1171 5000 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 2900 1000 0 400 400 0 0 400 0 0
5 580 2600 2400 0 2500 1200 700 700 2500 0 0 0 900 0
6 0 0 0 0 0 0 0 0 1600 0 0 0 0 0
7 1200 300 300 0 600 200 100 200 700 0 0 0 400 0
8 0 984 2500 0 3800 600 700 600 2500 0 0 0 700 0
9 0 0 0 0 0 2500 0 1500 0 0 0 0 0 0
10 0 0 0 0 0 172 200 477 0 423 0 0 300 0
11 0 0 767 0 0 3900 500 1500 0 2700 0 0 1100 0
12 532 0 0 0 0 0 0 0 0 0 0 0 0 0
13 1303 0 0 0 0 0 1500 900 0 1000 0 0 0 100
14 0 0 0 0 0 0 0 656 0 1777 0 0 0 123
15 0 0 0 0 0 0 1700 200 0 500 0 0 0 0
16 80 0 0 0 0 0 2500 900 0 1600 0 0 0 0
17 6476 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 5000 0 0 0 0
19 0 814 0 0 0 0 2000 700 0 0 1100 0 0 0
20 0 0 0 0 0 0 0 0 0 0 0 60 0 0
21 0 0 1205 0 0 0 0 1795 0 0 0 0 0 0
i\t 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 0 813 1500 7400 0 −872 228 1000 3000 2000 1000 3800 2500 1300
2 0 3459 0 5600 −900 700 800 700 2300 1600 900 2800 1800 900
3 0 0 5000 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 1000 0 400 400 400 0 400 0 0
5 0 2600 2400 0 1300 1200 700 700 2500 1800 1100 400 900 500
6 0 0 0 0 0 0 0 0 1600 0 0 0 0 0
7 0 300 300 0 300 200 100 200 700 500 400 200 400 200
8 0 1000 2500 0 600 600 700 600 2500 1800 1300 700 900 300
9 0 0 0 0 −2500 0 0 1500 0 0 0 0 0 0
10 0 1028 1028 1028 288 200 200 477 277 500 400 200 300 100
11 0 733 900 0 −1700 1200 500 1500 500 2700 1700 1200 1100 500
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 0 0 0 0 −600 −900 300 900 400 1000 700 500 200 100
14 0 1644 644 644 644 644 344 700 300 1777 1377 977 677 400
15 0 0 0 0 −1200 −1500 100 200 100 500 300 300 200 100
16 0 0 0 0 −1300 −1700 400 900 400 1600 1300 900 600 200
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 5000 2500 2500 0 0
19 0 0 0 0 −1000 −1500 300 700 500 300 1100 800 500 200
20 0 1640 1640 1640 1640 1640 140 140 140 140 140 200 0 0
21 0 0 1205 1205 −1045 −1045 −1795 0 0 0 0 0 0 0
The LO-model (13.7) looks rather complicated. However, it is not too difficult to calculate
a feasible solution with a low objective value. This can be done as follows. Since the
objective is to avoid backlogs, there is a tendency to use the full capacity. Therefore, we
replace Σi xit ≤ cp by Σi xit = cp. The rewritten objective of (13.7) suggests that it seems
profitable to choose many r⁻it’s equal to 0. A solution for which many r⁻it’s are 0 can be
constructed quite easily. Rearrange all designs with respect to the value of s⁻i,t−1; start with
the design with the largest backlog. For the time being, choose xit = s⁻i,t−1. If Σi s⁻i,t−1 ≤
cp, then this choice can be made for all designs. Then consider the designs with the largest
demand in period t, and choose, according to this new ordering, xit = s⁻i,t−1 + dit − s⁺i,t−1.
This production can be realized for all designs as long as Σi (s⁻i,t−1 + dit − s⁺i,t−1) ≤ cp.
If Σi (s⁻i,t−1 + dit − s⁺i,t−1) > cp, then the demand of a number of designs cannot be
satisfied in period t, because only s⁻i,t−1 units are produced. If Σi s⁻i,t−1 > cp, then for a
number of designs even the backlog s⁻i,t−1 cannot be replenished.
The reader is asked to formalize the procedure described above, and to apply it to the data of
Table 13.1. One should take into account that in a number of periods there are no backlogs
but only inventories. Also compare this solution with the optimal solution presented below;
see Exercise 13.7.5.
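The single-period allocation step of this heuristic can be sketched in Python as follows (a sketch, not code from the book; the function and argument names `greedy_schedule`, `s_minus`, `s_plus`, `d`, `cap` are our own, and ties are broken naively):

```python
# Greedy single-period allocation: first replenish backlogs (largest first),
# then cover demand (largest demand first), subject to total capacity `cap`.
def greedy_schedule(s_minus, s_plus, d, cap):
    """s_minus[i]: backlog of design i at the start of the period,
    s_plus[i]: inventory at the start, d[i]: demand in the period.
    Returns the production quantities x[i] with sum(x) <= cap."""
    m = len(d)
    x = [0] * m
    remaining = cap
    # Step 1: for the time being, choose x[i] = s_minus[i], largest backlog first.
    for i in sorted(range(m), key=lambda j: -s_minus[j]):
        x[i] = min(s_minus[i], remaining)
        remaining -= x[i]
    # Step 2: top up towards x[i] = s_minus[i] + d[i] - s_plus[i],
    # largest demand first, as long as capacity is left.
    for i in sorted(range(m), key=lambda j: -d[j]):
        extra = max(s_minus[i] + d[i] - s_plus[i] - x[i], 0)
        add = min(extra, remaining)
        x[i] += add
        remaining -= add
    return x
```

For example, `greedy_schedule([100, 0], [0, 50], [30, 60], 200)` first replenishes the backlog of design 1 and then tops both designs up towards their demand.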
In Table 13.4, Table 13.5, and Table 13.6 we present an optimal solution of model (13.7) in
the case γ1 = γ2 = 1. In Section 13.4 we present optimal solutions for various values of the
γ’s. The reason for taking γ1 = γ2 = 1 is that we are now able to compare this solution
with the optimal solution of model (13.5); in particular, the occurrence of old backlogs in
model (13.5) can be investigated; see Exercise 13.7.6.
Chapter 13. Production of coffee machines
i\t 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 813 813 0 0 −1300 −2072 −772 0 2000 1000 0 2500 1300 0
2 −3672 −141 0 −4100 −2300 0 0 0 1600 900 0 1800 900 0
3 −1171 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 −2900 0 0 0 0 400 0 0 0 0 0
5 0 0 0 −1200 0 0 0 0 1800 1100 400 0 500 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 −300 0 0 0 0 500 400 200 0 200 0
8 16 0 0 −3200 0 0 0 0 1800 1300 700 200 300 0
9 0 0 0 −2500 −2500 0 0 0 0 0 0 0 0 0
10 1028 1028 1028 288 28 0 0 277 77 400 200 0 100 0
11 733 133 0 −1700 −2700 0 0 500 0 1700 1200 0 500 0
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 0 0 0 −600 −900 −1200 0 400 0 700 500 200 0 0
14 1644 644 644 644 644 344 44 300 0 1377 977 677 277 0
15 0 0 0 −1200 −1500 −1600 0 100 0 300 300 200 100 0
16 0 0 0 −1300 −1700 −2100 0 400 0 1300 900 600 200 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 2500 2500 0 0 0
19 −814 0 0 −1000 −1500 −1700 0 500 300 0 800 500 200 0
20 1640 1640 1640 1640 1640 140 140 140 140 140 140 0 0 0
21 0 0 1205 −1045 −1045 −1795 −1795 0 0 0 0 0 0 0
In Section 13.4 we calculate the influence of different choices of γ1 and γ2 on the total
optimal backlog.
For instance, δit = 3 means that design i in period t is manufactured on three conveyor belts,
during full weeks; so the total production is 2,600 × 3 units (2,600 is the maximum weekly
production on any conveyor belt). Condition (13.1) in model (13.7) should be replaced by:
Σ_{i=1..m} δit ≤ p for t = 1, . . . , T,
δit ∈ {0, 1, . . . , p} for i = 1, . . . , m, and t = 1, . . . , T.    (13.8)
The model has now become a mixed integer linear optimization model; see Chapter 7.
The optimal solution, which was obtained by using the commercial solver Cplex, is given
in Table 13.7 (for γ1 = γ2 = 1, and cit = 1 for all i and t). The entries of Table 13.7
refer to the number of conveyor belts used for the production of the design with label
13.5. Sensitivity analysis
i\t 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 - - 1 1 1 1 - 1 - 1 - 1 1 -
2 1 2 - 3 1 - 1 - - 1 - 1 - 1
3 2 - 1 - - - - 1 - - - - - -
4 - - - 1 - - 1 - - - - - 1 -
5 - 1 1 - 1 - 1 - 1 - - 1 - -
6 - - - - - - - - 1 - - - - -
7 - 1 - - - - - - - - 1 - - -
8 - - 2 - - 1 - 1 - - 1 - - -
9 - - - - 1 - - 1 - - - - - -
10 - - - - - - - - - - 1 - - -
11 - - - - 1 - 1 - 1 - - 1 - 1
12 - - - - - - - - - - 1 - - -
13 - 1 - - - - - - 1 - - - - 1
14 - - - - - - - - - 1 - - - 1
15 - - - - - - 1 - - - - - 1 -
16 - - - - - 1 - - 1 - - - - 1
17 2 - - - - - - 1 - - - - - -
18 - - - - - - - - - 1 - 1 1 -
19 - - - - - 1 - - - 1 - - - -
20 - - - - - - - - - - - - 1 -
21 - - - - - 1 - - - - 1 - - -
number in the corresponding row and in the week corresponding to the label number in
the corresponding column. For instance, the 3 in row 2 and column 4 means that design
2 is manufactured during the whole week 4 on three conveyor belts. The total backlog for
the schedule of Table 13.7 is 84,362. The sum of the entries of each column is 5, and so –
as may be expected – the full capacity is used.
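The claim that every column of Table 13.7 sums to 5, so that all p = 5 conveyor belts are in use each week, can be verified directly (a small Python check; the table is transcribed from Table 13.7 with ‘-’ standing for 0):

```python
# Belt schedule of Table 13.7: rows are designs 1..21, columns are weeks 1..14;
# '-' means that no conveyor belt is assigned to that design in that week.
table_13_7 = """
- - 1 1 1 1 - 1 - 1 - 1 1 -
1 2 - 3 1 - 1 - - 1 - 1 - 1
2 - 1 - - - - 1 - - - - - -
- - - 1 - - 1 - - - - - 1 -
- 1 1 - 1 - 1 - 1 - - 1 - -
- - - - - - - - 1 - - - - -
- 1 - - - - - - - - 1 - - -
- - 2 - - 1 - 1 - - 1 - - -
- - - - 1 - - 1 - - - - - -
- - - - - - - - - - 1 - - -
- - - - 1 - 1 - 1 - - 1 - 1
- - - - - - - - - - 1 - - -
- 1 - - - - - - 1 - - - - 1
- - - - - - - - - 1 - - - 1
- - - - - - 1 - - - - - 1 -
- - - - - 1 - - 1 - - - - 1
2 - - - - - - 1 - - - - - -
- - - - - - - - - 1 - 1 1 -
- - - - - 1 - - - 1 - - - -
- - - - - - - - - - - - 1 -
- - - - - 1 - - - - 1 - - -
"""
schedule = [[0 if e == '-' else int(e) for e in row.split()]
            for row in table_13_7.strip().splitlines()]
col_sums = [sum(row[t] for row in schedule) for t in range(14)]
print(col_sums)   # fourteen 5's: the full belt capacity is used every week
```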
and, hence, the total backlog stotal can be derived from z∗ by using the equation:

stotal = z∗ − (c24 − 1) s⁻24.
It is remarkable that all total backlogs, except for the case c24 = 0, are equal to 55,322.
This is due to the fact that the model has many optimal solutions. The fact that s⁻24 = 0 for
c24 > 1 is not unexpected, because c24 is then the only cit with a value larger than 1, and
the value of each other cit is 1, and so c24 has the largest value. For c24 = 0, the backlog of
design 2 in week 4 can take on any value without affecting the optimal solution. However,
more of design 2 should be produced in week 5 in order to reduce the backlog of week 4.
The values of ‘Min.’ and ‘Max.’ in Table 13.9 determine the tolerance interval of c24 (see
also Section 5.2). For instance, for the values of c24 with 0.00 ≤ c24 ≤ 1.00 (the situation
in the second row of Table 13.9), it holds that s⁻24 = 9,700; outside this interval the value
of s⁻24 may be different from 9,700.
For c24 ≤ 1, it holds that s⁻24 > 0, so s⁻24 is a basic variable, which implies that its optimal
dual value (shadow price) is 0 (see Theorem 4.3.1). On the other hand, s⁻24 is nonbasic
for c24 > 1, since its shadow price is positive (see Table 13.9). Recall that, in the case of
nondegeneracy, the dual value is precisely the amount that needs to be subtracted from the
corresponding objective coefficient in order to change s⁻24 into a basic variable.
p Total backlog
4 189,675
5 55,322
6 16,147
7 702
8 0
c24   s⁻24    Obj. value   Total backlog   Shadow price   Min.   Max.
0.00 9,841 45,622 55,463 0.00 0.00 0.00
0.10 9,700 46,592 55,322 0.00 0.00 1.00
0.50 9,700 50,472 55,322 0.00 0.00 1.00
0.90 9,700 54,352 55,322 0.00 0.00 1.00
1.00 9,100 55,322 55,322 0.00 1.00 1.00
1.10 0 55,322 55,322 0.10 1.00 ∞
2.00 0 55,322 55,322 1.00 1.00 ∞
3.00 0 55,322 55,322 2.00 1.00 ∞
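The relation stotal = z∗ − (c24 − 1) s⁻24 can be checked against every row of Table 13.9 (a direct numerical verification of the table data):

```python
# Rows of Table 13.9: (c24, s24_minus, objective value, total backlog).
rows = [
    (0.00, 9841, 45622, 55463),
    (0.10, 9700, 46592, 55322),
    (0.50, 9700, 50472, 55322),
    (0.90, 9700, 54352, 55322),
    (1.00, 9100, 55322, 55322),
    (1.10,    0, 55322, 55322),
    (2.00,    0, 55322, 55322),
    (3.00,    0, 55322, 55322),
]
# s_total = z* - (c24 - 1) * s24_minus must hold in every row.
for c24, s24m, z_star, s_total in rows:
    assert abs(z_star - (c24 - 1) * s24m - s_total) < 1e-6
print("all rows consistent")
```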
13.5.3 Changing the weights of the ‘old’ and the ‘new’ backlogs
Consider model (13.7). The values of the weight parameters γ1 and γ2 determine which
backlogs are eliminated first. In Table 13.10, we have calculated the objective values for
several combinations of values of γ1 and γ2 .
As in Table 13.9, the total backlog in terms of the number of coffee machines is the same
for the values of γ1 and γ2 of Table 13.10. Again the reason is that productions can be
transferred from one week to the other. This multiplicity of the optimal solution implies
that the model is rather insensitive to changes in the values of γ1 and γ2 .
param m;
param nperiods;
set I := {1 .. m};
set T := {1 .. nperiods};
param c;
param p;
param s0{I};
param d{I, T};

var x{i in I, t in T} >= 0;
var sp{i in I, t in T} >= 0;
var sm{i in I, t in T} >= 0;

minimize total_shortage:
  sum{i in I} sum{t in T} sm[i,t];
subject to beltcap {t in T}:
  sum{i in I} x[i,t] <= c * p;
subject to inventory {i in I, t in T}:
  sp[i,t] - sm[i,t] =
    (if t = 1 then s0[i] else sp[i,t-1] - sm[i,t-1])
    + x[i,t] - d[i,t];

data;

param m := 21;
param nperiods := 14;
param c := 2600;
param p := 5;
param s0 :=
   1   813   2  -272   3 -2500   4     0   5   220   6     0   7  -800   8    16
   9     0  10  1028  11  1333  12    68  13    97  14  1644  15     0  16     0
  17 -2476  18     0  19    86  20  1640  21     0;
param d :    1    2    3    4    5    6    7    8    9   10   11   12   13   14
 :=  1       0    0 1500 7400 1300 1200 1000 1000 1000 1000 1000 1300 1200 1300
     2    3400 3600    0 9700 1400  700  800  700  700  700  900 1000  900  900
     3    1500    0 5000    0    0    0    0    0    0    0    0    0    0    0
     4       0    0    0 2900    0 1000    0  400    0  400    0  400    0    0
     5     800 2600 2400 1200 1300 1200  700  700  700  700  700  400  400  500
     6       0    0    0    0    0    0    0    0 1600    0    0    0    0    0
     7     400  300  300  300  300  200  100  200  200  100  200  200  200  200
     8       0 1000 2500 3200  600  600  700  600  700  500  600  500  600  300
     9       0    0    0 2500    0    0    0 1500    0    0    0    0    0    0
    10       0    0    0  800  200  200  200  200  200  100  200  200  200  100
    11     600  600  900 1700 1000 1200  500 1000  500 1000  500 1200  600  500
    12     600    0    0    0    0    0    0    0    0    0    0    0    0    0
    13    1400    0    0  600  300  300  300  500  400  300  200  300  200  100
    14       0 1000    0    0    0  300  300  400  300  400  400  300  400  400
    15       0    0    0 1200  300  100  100  100  100  200    0  100  100  100
    16      80    0    0 1300  400  400  400  500  400  300  400  300  400  200
    17    4000    0    0    0    0    0    0    0    0    0    0    0    0    0
    18       0    0    0    0    0    0    0    0    0 2500    0 2500    0    0
    19     900    0    0 1000  500  200  300  200  200  300  300  300  300  200
    20       0    0    0    0    0 1500    0    0    0    0    0  200    0    0
    21       0    0    0 2250    0  750    0    0    0    0    0    0    0    0;
end;
Note that the if ... then ... else expression in the inventory equations in this listing is
evaluated when the solver constructs the technology matrix. See also Section F.3.
13.7 Exercises
Exercise 13.7.1. What are the decision variables of model (13.5)? Calculate the number
of decision variables, and the number of constraints.
Exercise 13.7.3. Show that any optimal solution of model (13.5) satisfies r⁺it × r⁻it = 0
for all i and t.
Exercise 13.7.4. Calculate the number of decision variables and constraints of model
(13.7). Show that if γ1 = γ2 , then model (13.7) is equivalent to model (13.5). Explain why
the optimal solutions of model (13.7) with γ1 = γ2 and of model (13.5) need not be the
same; compare Table 13.2 and Table 13.3 with Table 13.4 and Table 13.6.
Exercise 13.7.5. Apply the method described in Section 13.3 to find a feasible solution
of model (13.7) for the data in Table 13.1.
Exercise 13.7.6. Consider Table 13.5 and Table 13.6 in Section 13.3.
(a) Why is it plausible that old backlogs usually only occur in the first part of the planning
period?
(b) Calculate the total backlog. Why is this quantity the same as in the optimal solution of
model (13.5)?
(b) How can the constraint that ‘only certain designs are produced during full week periods’
be modeled?
(c) Why is the total backlog in the case of full week productions larger than without this
constraint? Recall that the total backlog of 97,114, found in Section 13.4, might be
lower.
Exercise 13.7.8. Show that, in any optimal solution of (13.5), the restrictions (13.4) are
always satisfied.
Overview
This case study describes an example of multiobjective optimization. In multiobjective optimiza-
tion, models are considered that involve multiple objectives subject to both ‘hard’ and ‘soft’
constraints. The multiple objectives usually conflict with each other, and so optimizing one
objective function is at the expense of the others. Therefore, one cannot expect to achieve
optimal values for all objective functions simultaneously, but a ‘best’ value somewhere in
between the individual optimal values. We will use a method called goal optimization to
determine this ‘best’ value.
Chapter 14. Conflicting objectives
expand the production facilities. It also does not want to delay the deliveries or to lose this
order.
In order to meet the demand, the management considers purchasing tubes from suppliers
abroad at a delivered cost of $7 per meter of tube type T1 , $7 per meter of T2 , and $9 per
meter of T3 . The imported tubes arrive ready-made, and so no finishing material is needed.
The data are summarized in Table 14.1.
There are two objectives:
I The first objective is to determine how much of each tube type need be produced and
how much purchased from abroad, in order to meet the demands and to maximize the
company’s profit.
I A second objective stems from a request by the government, which wants to reduce the
amount of money spent on imports. So, in addition to maximizing the total profit, the
management of Plastics International also needs to minimize the total cost of the imports.
I Objective 2. The cost of imports, to be minimized, is equal to the total cost of importing
the tube types T1 , T2 , and T3 . Hence, this objective becomes: minimize w = 7t1 + 7t2 + 9t3 ,
subject to the demand and resource constraints (1)–(5) and
x1 , x2 , x3 , t1 , t2 , t3 ≥ 0.
The conflict between the two objectives (O1) and (O2) can be illustrated by solving the
above model, separately for the two objectives. The optimal solutions are obtained by using
a linear optimization computer package. In Table 14.2 we have summarized the results. The
variables s1 and s2 are the slack variables of the ‘machine time’ constraint and the ‘finishing
material’ constraint, respectively.
The following conclusions can be drawn from Table 14.2.
I To maximize the profit, while neglecting the amount of import, the company should pro-
duce 3,000 meters of tube type T1 , produce 1,250 meters of T3 , import 5,000 meters of
T2 , and import 5,750 meters of T3 . The profit is $198,500. The total cost of importing
these quantities of tubes is equal to 0 × $7 + 5,000 × $7 + 5,750 × $9 = $86,750.
I To minimize the amount of import, while neglecting the profit, the company should
produce 5,000 meters of T2 , produce 666.67 meters of T3 , import 3,000 meters of T1 ,
and import 6,333.33 meters of T3 . The profit then is 0× $16+5,000× $18+666.67×
$11 + 3,000 × $13 + 0 × $17 + 6,333.33 × $9 = $193,333.
I In both cases, there is no machine idle-time (i.e., s1 = 0). The inventory of finishing
material is better used in the second case (namely, s2 = 333.33 in the case of (O2)
against s2 = 1,750 in the case of (O1)), because in the second case more is produced,
and so more finishing material is used, while the imported tubes are ready-made.
I If the cost of the imports is neglected (objective (O1)), then the costs of the imports
increase from $78,000 to $86,750; if the profit is neglected (objective (O2)), then the
profit drops from $198,500 to $193,333. Hence, the two objectives are in fact conflicting.
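The profits and import costs quoted above can be recomputed directly from the per-meter profit coefficients in the objective z = 16x1 + 18x2 + 11x3 + 13t1 + 17t2 + 9t3 and the import prices of $7, $7, and $9 per meter (the helper names `profit` and `import_cost` are ours):

```python
def profit(x, t):
    # Per-meter profits: 16, 18, 11 for produced tubes T1, T2, T3;
    # 13, 17, 9 for imported tubes T1, T2, T3.
    return 16*x[0] + 18*x[1] + 11*x[2] + 13*t[0] + 17*t[1] + 9*t[2]

def import_cost(t):
    # Import prices per meter: $7 (T1), $7 (T2), $9 (T3).
    return 7*t[0] + 7*t[1] + 9*t[2]

# (O1): produce 3,000 m of T1 and 1,250 m of T3; import 5,000 m of T2
# and 5,750 m of T3.
print(profit((3000, 0, 1250), (0, 5000, 5750)),
      import_cost((0, 5000, 5750)))            # 198500 86750
# (O2): produce 5,000 m of T2 and 666.67 m of T3; import 3,000 m of T1
# and 6,333.33 m of T3.
print(round(profit((0, 5000, 666.67), (3000, 0, 6333.33))),
      round(import_cost((3000, 0, 6333.33))))  # 193333 78000
```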
For LO-models with multiple objectives, it might be very useful if one knows all possible
values of the different objectives. The present model has only two objectives, and is therefore
small enough to determine the feasible region for the two objective functions. We use the
following model:
max αz + βw
s.t. Constraints (1)–(5)
     z = 16x1 + 18x2 + 11x3 + 13t1 + 17t2 + 9t3
     w = 7t1 + 7t2 + 9t3
     x1 , x2 , x3 , t1 , t2 , t3 ≥ 0
     z, w free.
The choice α = 1 and β = 0 corresponds to objective (O1), and the choice α = 0
and β = −1 corresponds to objective (O2). Since the demand is fixed, the two objective
functions are bounded. The feasible region, as depicted in Figure 14.1, consists of the
optimal points (w∗ z∗)ᵀ that arise when the values of α and β are varied. The vertices of
the feasible region are determined by trial and error. It turns out that the vertices are:
Point   α    β         w∗           z∗
A       0   −1   78,000.00   193,333.33
B       3   −2   78,909.09   194,181.82
C       2   −1   84,875.00   197,875.00
D       1    0   86,750.00   198,500.00
E       3    1   98,000.00   196,000.00
F      −1    1  119,000.00   187,000.00
G      −5   −1   84,000.00   192,000.00
Note that points in the interior of the feasible region never correspond to optimal solu-
tions, since the objectives are maximizing the profit and minimizing the import. Also note
that the points of interest are the ones where neither of the objectives can be improved
without losing something on the other objective. They correspond to nonnegative values of
α and nonpositive values of β. In Figure 14.1, this is the piecewise linear curve ABCD;
14.3. Goal optimization for conflicting objectives
[Figure 14.1: The feasible region of the objective values, with the profit z∗ (×1,000) on the vertical axis and the vertices A, B, C, D, E, F, and G.]
these points are called Pareto optimal points.¹ In general, this set of points is very difficult to
determine.
In the following section, we will show how Plastics International can deal with the conflict-
ing objectives of maximizing the profits and minimizing the costs of imports, by using the
technique of goal optimization.
¹ Named after the Italian engineer, sociologist, economist, political scientist, and philosopher Vilfredo
F.D. Pareto (1848–1923).
responding goal has a higher priority. If a deviation from the goal is not unwanted, then
there need not be a penalty. However, if there is an unwanted deviation from the goal, then
there should be a penalty; the larger the unwanted deviation, the higher the penalty. The
function that describes the relationship between the deviation from the goal and the penalty,
is called the penalty function. We will assume that penalty functions are linearly dependent
on the deviation from the corresponding goals. So the penalty is the slope of the penalty
function: the steeper the slope, the higher the penalty.
The management of Plastics International has decided to incur the following penalties:
Obviously, these decision variables have to be nonnegative. Moreover, we must ensure that
at least one of z⁺ and z⁻, and one of w⁺ and w⁻, is 0. In other words:

z⁺ × z⁻ = 0, and w⁺ × w⁻ = 0.
However, similar to the discussion in Section 1.5.2, it can be shown that these (nonlinear)
constraints are redundant (with respect to the optimal solution) in the LO-model to be
formulated below. This means that any solution that does not satisfy these constraints cannot
be optimal. See also Exercise 14.9.1.
The objective is to minimize the total penalty for the unwanted deviations from the goals,
which is equal to two times the penalty for being under the profit goal, plus the penalty
for being above the import cost goal. Hence, the penalty function to be minimized
is 2z⁻ + 1w⁺. Besides the three demand constraints and the two resource constraints, we
need so-called goal constraints, one for each goal. To understand how z + and z − represent
the amount by which the profit is above or below the goal of $198,500, the following can
be said. First recall that the profit is:

z = 16x1 + 18x2 + 11x3 + 13t1 + 17t2 + 9t3 ;

see Section 14.2. If the profit z is above the goal, then the value of z⁻ should be 0 and the
value of z⁺ should satisfy z⁺ = z − 198,500. Hence,

z − z⁺ = 198,500, and z⁻ = 0.

On the other hand, if the profit z is below the goal, then the value of z⁺ should be 0, and
the value of z⁻ should satisfy z⁻ = 198,500 − z. Hence,

z + z⁻ = 198,500, and z⁺ = 0.

Since z⁺ × z⁻ = 0, the above two cases can be written as a single constraint, namely:

z − z⁺ + z⁻ = 198,500.
In general, a goal constraint has the form:

(value obj. function) − (amount above goal) + (amount below goal) = goal.

Similarly, the goal constraint for the import costs reads:

w − w⁺ + w⁻ = 78,000.
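The splitting of a goal deviation into its ‘above’ and ‘below’ parts can be illustrated with a short sketch (the helper name `deviations` is ours; the values used are the deviations reported later for the optimal solution of (O3)):

```python
def deviations(value, goal):
    """Split the deviation of `value` from `goal` into nonnegative parts
    (above, below) with value - above + below == goal and above * below == 0."""
    above = max(value - goal, 0)
    below = max(goal - value, 0)
    return above, below

print(deviations(197875, 198500))  # (0, 625): profit is below its goal
print(deviations(84875, 78000))    # (6875, 0): imports are above their goal
```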
min 2z⁻ + w⁺                                                       (O3)
s.t. x1 + t1 = 3,000                                                (1)
     x2 + t2 = 5,000                                                (2)
     x3 + t3 = 7,000                                                (3)
     0.55x1 + 0.40x2 + 0.60x3 ≤ 2,400                               (4)
     x1 + x2 + x3 ≤ 6,000                                           (5)
     16x1 + 18x2 + 11x3 + 13t1 + 17t2 + 9t3 − z⁺ + z⁻ = 198,500     (6)
     7t1 + 7t2 + 9t3 − w⁺ + w⁻ = 78,000                             (7)
     x1 , x2 , x3 , t1 , t2 , t3 , z⁺, z⁻, w⁺, w⁻ ≥ 0.
We have solved Model (O3) by means of a linear optimization package. In Table 14.3 the
results of the calculations are listed. From this table it can be concluded that the optimal
production-import schedule is the one given in Table 14.4. It also follows from Table 14.3
that neither goal is completely achieved. The fact that z⁻ = 625 indicates that the profit
goal of $198,500 is not met by $625 (note that 198,500 − 625 = 197,875), and the fact
w+ = 6,875 indicates that the import goal of $78,000 is exceeded by $6,875 (note that
78,000 + 6,875 = 84,875). Hence, this goal optimization model achieves lower import
cost at the expense of decreasing profit, when comparing this to the profit-maximization
model (O1); see Table 14.2. Similarly, this goal optimization model achieves a larger profit at
the expense of increasing import cost, when compared to the import-minimization model
(O2); see also Table 14.2.
Since the value of the slack variable of constraint (5) is 1,125, it follows that 1,125 grams
of the initial inventory of 6,000 grams is left over. Also notice that this solution contains no
machine idle-time.
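The reported optimum of (O3) can be verified arithmetically. The production quantities below (x1 = 3,000, x2 = 1,875, x3 = 0, with the remaining demand imported) are reconstructed from the binding machine time constraint and the reported slack of constraint (5), so treat them as an assumption rather than a quotation from the book:

```python
# Candidate optimal solution of model (O3): produce x, import t = demand - x.
x1, x2, x3 = 3000, 1875, 0
t1, t2, t3 = 3000 - x1, 5000 - x2, 7000 - x3
# Demand constraints (1)-(3) hold by construction; check (4) binding and
# that the finishing-material constraint (5) has slack 1,125.
assert abs(0.55*x1 + 0.40*x2 + 0.60*x3 - 2400) < 1e-6
assert 6000 - (x1 + x2 + x3) == 1125
profit = 16*x1 + 18*x2 + 11*x3 + 13*t1 + 17*t2 + 9*t3
import_cost = 7*t1 + 7*t2 + 9*t3
z_minus = 198500 - profit       # amount below the profit goal
w_plus = import_cost - 78000    # amount above the import goal
assert (z_minus, w_plus) == (625, 6875)
print(2*z_minus + w_plus)       # total penalty of (O3): 8125
```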
The dual values in the above solution can be interpreted as shadow prices, because this
solution is nondegenerate; see Chapter 5. Increasing any of the demand values leads to a
lower penalty, since the corresponding shadow price is negative. This also holds for the
(binding!) ‘machine time’ constraint. The ‘finishing material’ constraint is nonbinding,
which means a zero shadow price. Increasing the profit goal leads to a higher penalty (the
situation becomes worse), while increasing the import goal lowers the penalty (the situation
improves). See also Section 14.4.
14.4. Soft and hard constraints
In Exercise 14.9.2, the reader is asked to explain the signs of the shadow prices of Table 14.3.
and suppose that this constraint is soft. This means that some deviation from the goal 2,400
is tolerable. In order to indicate the deviation from the right hand side of (4), we introduce
nonnegative decision variables m⁺ and m⁻, such that:

0.55x1 + 0.40x2 + 0.60x3 − m⁺ + m⁻ = 2,400.                        (4’)

Since the machine time constraint is a ‘≤’ constraint with a nonnegative right hand side
value, there is no penalty for being below the goal 2,400, but there should be a penalty for
exceeding this goal. Hence, m+ reflects the unwanted deviation from the goal, and this
deviation multiplied by a certain penalty has to be minimized. This penalty is a subjective
value, reflecting the relative importance of exceeding the goal when compared to the devi-
ation from the profit and the import goals. The importance of not meeting the machine
time goal (2,400 minutes) is compared to that of not meeting the profit goal ($198,500)
and the import goal ($78,000); see Section 14.3. The question then is: what is a reasonable
decrease of the value of 2z⁻ + w⁺ if production runs for one extra hour? Suppose the
management has decided that one extra minute of production per week is allowed for every
$20 improvement of the previous penalty function. Hence, the machine time penalty is
set to 20, and the new penalty function becomes 2z⁻ + w⁺ + 20m⁺. The corresponding
LO-model, called (O4), can now be formulated as follows.

min 2z⁻ + w⁺ + 20m⁺                                                (O4)
s.t. x1 + t1 = 3,000                                                (1)
     x2 + t2 = 5,000                                                (2)
     x3 + t3 = 7,000                                                (3)
     0.55x1 + 0.40x2 + 0.60x3 − m⁺ + m⁻ = 2,400                     (4’)
     x1 + x2 + x3 ≤ 6,000                                           (5)
     16x1 + 18x2 + 11x3 + 13t1 + 17t2 + 9t3 − z⁺ + z⁻ = 198,500     (6)
     7t1 + 7t2 + 9t3 − w⁺ + w⁻ = 78,000                             (7)
     x1 , x2 , x3 , t1 , t2 , t3 , z⁺, z⁻, w⁺, w⁻, m⁺, m⁻ ≥ 0.
The optimal solution of model (O4), which has again been calculated using a computer
package, is as follows.
This production-import schedule can only be realized if (366.07/60 =) 6.10 extra hours
per week are worked. The solution reflects the situation in which both the production and
the import are optimal ((O1) and (O2), respectively), at the cost of expanding the capacity
with 6.10 hours.
Suppose that the management of Plastics International considers 6.10 additional hours as
unacceptable, and has decided that only four hours extra per week can be worked. This
leads to the constraint:
m+ ≤ 240. (8)
Adding constraint (8) to model (O4) results in model (O5), and solving it leads to the
following optimal solution.
When comparing the solution of model (O3) with this solution, it appears that both the
profit and the import goal are now better satisfied:
Also the inventory of finishing material is much better used: in the case of (O3) the slack
of constraint (5) is 1,125, and in the case of (O5) the slack is 525. Based on these facts,
the management may consider extending the man-power capacity by three hours per week.
In Exercise 14.9.6 the reader is asked to calculate the influence on the optimal solution of
model (O5), when the penalty 20 of m+ is changed.
[Plot of the optimal profit z∗ and the import costs w∗ (both ×$1,000) against α, with the corresponding optimal solutions annotated at the breakpoints; the profit takes the values 193,333, 194,182, 197,875, and 198,500, and the import costs the values 78,000, 78,909, 84,875, and 86,750.]
Figure 14.2: Profit and import costs as functions of the profit penalty α.
for all values of PG; see Exercise 14.9.8. The function z⁺ decreases for PG ≤ 193,333,
and is 0 for all other values of PG. If both z⁺ and z⁻ are 0, while PG > 193,333, then
w⁺ ≥ w⁻ − 78,000 + w, and so, since w⁺ is minimized, w⁻ = 0. Because (6) is now
equivalent to z = PG, it follows that the values of t1 , t2 , t3 increase when PG increases,
because (4) remains binding. Hence, w⁺ will increase. Similar arguments can be used to
explain the behavior of the other functions in Figure 14.3. Note that a decrease of the
original profit goal by at least (198,500 − 197,875 =) 625 units changes the penalty
values of z⁻ and w⁺.
Figure 14.4 shows the relationships between the import goal IG and the penalties. In this
case, z⁺(IG) = 0 for all values of IG. Moreover, the penalties are not sensitive to small
changes of the original value (78,000) of IG. The analysis of the behavior of the four
penalty functions is left to the reader.
[Figure 14.3: The four penalty functions z⁺, z⁻, w⁺, and w⁻ plotted against the profit goal; the breakpoints occur at the values 193,333, 194,181, and 198,500.]
lexmin{u | u ∈ P },

u1∗ = min{u1 | u ∈ P },
u2∗ = min{u2 | u ∈ P, u1 = u1∗ },
. . .
uk∗ = min{uk | u ∈ P, u1 = u1∗ , u2 = u2∗ , . . . , uk−1 = uk−1∗ }.
The optimal values of the remaining variables uk+1 , . . . , un are determined in the k-th
model. The sequence stops after k steps, where k is the smallest index for which the
corresponding LO-model has a unique solution.
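The staged minimization above can be sketched with a small pure-Python example over a finite set P (a toy sketch; in this chapter each stage would be an LO-model, solved with the previously found optimal values fixed):

```python
# Lexicographic minimization over a finite set P of equal-length tuples:
# fix u1 at its minimum, then minimize u2 over the remaining candidates,
# and so on, stopping as soon as the solution is unique.
def lexmin(P):
    candidates = list(P)
    k = len(candidates[0])
    for j in range(k):
        best = min(u[j] for u in candidates)
        candidates = [u for u in candidates if u[j] == best]
        if len(candidates) == 1:   # unique solution: the sequence stops
            break
    return candidates[0]

print(lexmin([(1, 5, 2), (1, 3, 9), (2, 0, 0), (1, 3, 4)]))  # (1, 3, 4)
```

For tuples, Python's built-in `min(P)` gives the same answer directly; the staged loop mirrors the sequence of models u1∗, u2∗, . . . described in the text.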
[Figure 14.4: The four penalty functions z⁺, z⁻, w⁺, and w⁻ plotted against the import goal; the breakpoints occur at the values 84,875 and 86,750.]
In lexicographic optimization, u1 is always the sum of the unwanted deviations from the
goals of the soft constraints. If u1∗ = 0, then there is a solution for which all the unwanted
deviations in the ‘soft’ constraints are zero. This means that there is a feasible solution if all
the ‘soft’ constraints are considered as ‘hard’ constraints. For the remaining part, u2 , . . . , un ,
we discuss the following two options.
As an illustration of option 1, consider the following situation. Suppose that all constraints of
model (O4) are soft. The unwanted deviations are a− , b− , c− , m+ , n+ , z − , and w+ . The
management considers deviations from the profit goal as more unwanted than deviations
from the import goal. A problem may arise, because in (1’), (2’), and (3’) it is allowed that
the production exceeds the demand. We leave the discussion of this problem to the reader.
14.6. Alternative solution techniques
The model obtained by choosing option 1, which we call model (O6), then becomes:
Since the objective ‘function’ consists of three factors, the solution procedure can be carried
out in at most three steps.
When comparing this solution to the optimal solution of model (O3) we see that both the
profit and the import goals are better approximated (compare also the optimal solutions
of (O1) and (O2)), mainly at the cost of producing an additional 310 meters of T2 tubes
(b+ = 310). We leave to the reader the discussion of the problem that there is no immediate
demand for these 310 meters.
The model obtained by choosing option 2, which we call model (O7), reads as follows:
The optimal solution of model (O7) turns out to be the same as the optimal solution of
model (O6).
The expression ‘solving all LO-models separately’ means solving the LO-models successively
for all objectives. In the case of our production-import problem, the two models are (O1)
and (O2), where in the latter case the objective is max(−7t1 − 7t2 − 9t3 ). Hence, U1 =
198,500, L1 = 193,333, U2 = −78,000, and L2 = −86,750.
The general form of a fuzzy LO-model can then be formulated as follows.
min δ
s.t. δ ≥ (Uk − zk)/dk for k = 1, . . . , m
     the original constraints
     δ ≥ 0.
The right hand side vector, with entries (U1 − z1)/d1 , . . . , (Um − zm)/dm , is called the fuzzy mem-
bership vector function of the model. We consider two choices for the fuzzy parameter, namely:
dk = Uk − Lk , and dk = 1.
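For the production-import model, the membership values with dk = Uk − Lk can be computed directly from the bounds derived above (U1 = 198,500, L1 = 193,333.33, U2 = −78,000, L2 = −86,750; the helper name `membership` is ours):

```python
def membership(z, U, L):
    """Fuzzy membership value (U - z) / (U - L) of objective value z,
    using the fuzzy parameter d = U - L."""
    return (U - z) / (U - L)

# At the profit-maximal solution: z1 = 198,500 and z2 = -86,750.
print(membership(198500, 198500, 193333.33))         # 0.0: goal 1 fully met
print(round(membership(-86750, -78000, -86750), 3))  # 1.0: goal 2 at its worst
```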
min δ                                                              (O8)
s.t. δ ≥ (198,500 − 16x1 − 18x2 − 11x3 − 13t1 − 17t2 − 9t3)/5,166.67
     δ ≥ (−78,000 + 7t1 + 7t2 + 9t3)/8,750
     Constraints (1) – (5)
     x1 , x2 , x3 , t1 , t2 , t3 , δ ≥ 0.
14.7. A comparison of the solutions
min δ (O9)
s.t. δ ≥ 198,500 − 16x1 − 18x2 − 11x3 − 13t1 − 17t2 − 9t3
δ ≥ −78,000 + 7t1 + 7t2 + 9t3
Constraints (1) – (5)
x1 , x2 , x3 , t1 , t2 , t3 , δ ≥ 0.
[Figure: The solutions compared in the (W, Z)-plane, both axes ×1,000, with W ranging from 78 to 87.]
14.9 Exercises
Exercise 14.9.1. Consider model (O3). Determine a feasible solution of model (O3) that
does not satisfy the conditions z + × z − = 0 and w+ × w− = 0. Show that this solution
is not optimal.
Exercise 14.9.2. Explain the signs of the optimal dual values in Table 14.3.
Exercise 14.9.3. What are the effects on the optimal solutions of the models (O1) – (O9),
when the quantity of finishing material is restricted to 4,000 grams? (The original quantity
is 6,000 grams; see Table 14.1.)
Exercise 14.9.4. Show that the optimal solution of model (O3) is nondegenerate and
unique.
Exercise 14.9.5. Consider the production-import problem (O3). Suppose that the
constraint x1 + t1 = 3,000 is soft. Let a+ and a− be the deviations from the goal 3,000,
such that x1 + t1 − a+ + a− = 3,000. Determine the smallest positive value of the penalty
γ in the objective min 2z − + w+ + γa+ such that the optimal production-import schedule
is equal to the optimal schedule when the constraint x1 + t1 = 3,000 was still hard.
Exercise 14.9.6. Consider model (O5). Determine the smallest positive value of the
penalty β of m⁺ in min 2z⁻ + w⁺ + βm⁺ such that in the optimal solution it holds that
z = 197,875 and w = 84,875 (being the optimal solution of (O3)).
Exercise 14.9.9. Repeat the analysis in Section 14.5.1 for the case that the profit penalty
is set to 1, and the import penalty to a parameter λ.
Chapter 15
Coalition formation and profit distribution
Overview
This case study is based on Bröring (1996), and deals with a ‘game theoretical’ approach to a
problem in which a group of farmers considers cooperating, provided that the negotiations
about the profit distribution are favorable.
Each farmer can individually decide on his biscuit production. For example, a Type 2 farmer
can use his butter, flour, and sugar to produce 1,000 packets of Bis1 biscuits, which results
in a (1,000 × €5 =) €5,000 profit. Similarly, a Type 1 farmer has enough butter, flour,
and sugar to produce 2,000 packets of Bis3 biscuits, resulting in a profit of (2,000 × €7 =)
€14,000. In fact, the production of 2,000 packets of Bis3 biscuits leaves 2,000 ounces of
flour unused, which can be used for an additional 500 packets of Bis2 biscuits, resulting in
an additional profit of (500 × €2 =) €1,000. So, a Type 1 farmer can make a total profit
of €15,000.
However, since each farmer produces different amounts of butter, flour, and sugar, it may be
more profitable for farmers to combine their yields in order to increase the total production
(i.e., the number of packets of biscuits). Consider again the Type 1 and Type 2 farmers. If a
Type 1 farmer and a Type 2 farmer decide to produce biscuits individually, the two farmers
will have a joint profit of (€5,000 + €15,000 =) €20,000. Suppose, however, that they
decide to combine their yields. This means that they will have combined yields of 4,000
ounces of butter, 10,000 ounces of flour, and 3,000 ounces of sugar. With the combined
yields, they can jointly produce 3,000 packets of Bis3 biscuits, which gives a joint profit
of (3,000 × €7 =) €21,000. In addition, they can produce 250 packets of Bis2 biscuits,
yielding another €500. So, combining their yields leads to a joint profit of €21,500, which
is €1,500 more than their combined profit if they decide to work separately. Therefore, it
is profitable for them to work together.
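The comparison above can be checked mechanically. The sketch below brute-forces the most profitable production plan over a grid of packet counts; the per-packet input requirements and the farmers' yields are inferred from the numbers in this worked example (they are assumptions for illustration, not values quoted from Table 15.1):

```python
from itertools import product

# Input use per packet (rows: butter, flour, sugar in ounces; columns:
# Bis1, Bis2, Bis3) and profits in euro per packet.  These numbers are
# inferred from the worked example in the text, not quoted from Table 15.1.
USE = [(2, 0, 1),
       (2, 4, 3),
       (1, 0, 1)]
PROFIT = (5, 2, 7)

def best_profit(resources, step=250, limit=4000):
    """Brute-force the most profitable plan on a grid of packet counts."""
    best = 0
    for x in product(range(0, limit + 1, step), repeat=3):
        if all(sum(u * xi for u, xi in zip(row, x)) <= r
               for row, r in zip(USE, resources)):
            best = max(best, sum(p * xi for p, xi in zip(PROFIT, x)))
    return best

type1 = (2000, 8000, 2000)   # yields of a Type 1 farmer (butter, flour, sugar)
type2 = (2000, 2000, 1000)   # yields of a Type 2 farmer
joint = tuple(a + b for a, b in zip(type1, type2))

print(best_profit(type1))    # 15000
print(best_profit(type2))    # 5000
print(best_profit(joint))    # 21500, i.e. 1,500 euro more than 15000 + 5000
```

The grid step of 250 packets happens to contain the optimal plans of these small instances, so the brute force reproduces the profits computed by hand above.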
An important question that arises is: if farmers decide to combine their yields, how do we
find a fair distribution of the total earned profit? Clearly, the Type 1 farmer will only agree
to combine his yields with the Type 2 farmer if he receives at least as much profit as when
he produces biscuits individually. That is, the Type 1 farmer will want to receive at least
€15,000. Similarly, the Type 2 farmer will want to receive at least €5,000. As long as
the farmers agree on a profit distribution that satisfies these restrictions, the Type 1 farmer
and the Type 2 farmer will agree to cooperate. So, the two farmers could agree to split the
profits so that the Type 1 farmer receives, say, €16,000, and the Type 2 farmer receives
€5,500.
15.2. Game theory; linear production games
Another question is whether or not a farmer should cooperate with just one farmer, or
with a number of farmers, or even with all other farmers. A group of farmers that work
together is called a coalition. We argued above that a farmer will only cooperate with a
coalition if his total profit turns out to be at least as large as his profit when he organizes
the production of biscuits individually. If this is not the case, the farmer will ‘split off’ and
work by himself. More generally, it could happen that a group of farmers within a coalition
(a so-called subcoalition) realizes that it can make more profit by splitting off and forming a
coalition by itself. In general, a subcoalition will only cooperate with a larger coalition if its
joint profit turns out to be at least as large as its joint profit when it splits off and organizes
the production of biscuits within the subcoalition.
So, the question is to determine a distribution of the total profit, such that no subcoalition
has an incentive to split off. This means that no subcoalition can make each of its members
better off by working together. In the following section we will show how this problem can
be formulated as a so-called game theory problem.
players, we say that the players have complete information. If, in addition, all players know
that all players have complete information, then we speak of common knowledge.
▸ The possibility of making binding agreements.
One can distinguish between cooperative and noncooperative games. In noncooperative
games, every player attempts to maximize his/her individual pay-off by competing with
his opponents. In cooperative games, however, it is possible to make mutual agreements.
A set of players that cooperate is called a coalition. Within a coalition, the agreements
about the decisions to be made are binding for all players in the coalition.
The current case study deals with a cooperative game in which the players are the three
types of farmers. The number of farmers of the individual types is not known. It is assumed
that if one farmer of a certain type is in a coalition, then all farmers of that type are in
that coalition. Before the actual production of biscuits starts, the farmers decide about their
cooperation with other farmers. These decisions depend on the agreed upon distribution of
the total profit. Since each farmer offers a given amount of input goods, to be transformed
into output products with a certain market value, we are in the situation of a zero-sum
game. Furthermore, since the number of farmers per type is not known there is incomplete
information.
A linear production game is a cooperative game in which the players have the possibility to
form coalitions, and there are no limitations concerning the possibility of communication
between the players. Each player has a bundle of resources, which are used for production of
output goods. It is assumed that the production process is linear, meaning that the amounts
of required input goods are linear functions of the desired production level. This means, for
instance, that doubling the amount of input goods results in a doubling of the production
level. Furthermore, we assume that the products can be sold at fixed market prices and that
the selling process is linear. The owners of the resources can work either individually, or
cooperate and possibly increase their total profits.
We define the following symbols and concepts.
c = the profit vector, which is an r-vector, whose j'th entry denotes the profit
of output good j (in € per packet).
P = the capacity matrix, which is an (m, n)-matrix, whose entry (i, k) is the
amount of input good i of the farmers of Type k.
We assume that none of the rows of A consist of all zeroes, since this would imply that the
corresponding input good is not used at all for production, and it can just be eliminated.
Similarly, we assume that none of the columns of A consist of all zeros, since this would
imply that the corresponding output good does not use any input goods.
For two nonnegative n-vectors a = [a1 . . . an]^T and a′ = [a′1 . . . a′n]^T, we say that a′
is a subvector of a, written a′ ⊆ a, if a′k ≤ ak for each k = 1, . . . , n.
The coalition that corresponds to the vector a is called the grand coalition. Any coalition
that corresponds to a nonnegative integer-valued subvector a′ of a is called a subcoalition of
the grand coalition. For instance, in the subcoalition corresponding to the vector [7 0 5]^T
(which is a subvector of the grand coalition vector [7 2 5]^T), only the Type 1 and the Type
3 farmers form a coalition. Moreover, the coalition corresponding to the vector 0 is called
the empty (sub)coalition.
For any coalition vector a, v(a) is the maximum possible total profit if the farmers form
the coalition corresponding to a. This value can be calculated by solving the following
LO-model:
v(a) = max cT x
s.t. Ax ≤ Pa (GP)
x ≥ 0,
Hence, the LO-model that determines the optimal value of the coalition vector a =
[a1 a2 a3]^T reads:

v(a) = max 5x1 + 2x2 + 7x3
       s.t. 2x1 + x3 ≤ 2,000a1 + 2,000a2 + 2,000a3
            2x1 + 4x2 + 3x3 ≤ 8,000a1 + 2,000a2 + 4,000a3
            x1 + x3 ≤ 2,000a1 + 1,000a2 + 3,000a3
            x1, x2, x3 ≥ 0.

With three farmer types there are (2³ − 1 =) seven possible coalitions (including the
grand coalition, and excluding the empty coalition). Table 15.2 lists these seven possibilities,
together with the corresponding optimal production schedule x∗, and the corresponding
value v(a′) of the coalition.

The maximum value of v(a′) in Table 15.2 is attained by the grand coalition; see Exercise
15.6.1. In this case the production schedule consists of 1,000 packets of Bis1, 0 packets
of Bis2, and 4,000 packets of Bis3. The corresponding total profit is €33,000. The last
column of Table 15.2 contains, for each subcoalition, the average profit v_avg(a′) (×€1,000)
per farmer, i.e., the profit for each farmer when the total profit is divided equally over the
farmers in the corresponding subcoalition.
From Table 15.2, we can deduce that if the farmers decide to divide profits equally, then
not all farmers will be happy with the coalition. Indeed, suppose that they decide to do
so. In the grand coalition, each farmer receives €11,000 profit. However, consider the
subcoalition corresponding to a′ = [1 0 1]^T, in which the Type 1 farmer cooperates with
the Type 3 farmer, and the Type 2 farmer works alone. In that subcoalition, the Type 1 and
Type 3 farmers both receive €14,000. So, it is profitable for the Type 1 and 3 farmers to
abandon the grand coalition and start their own subcoalition. But, in fact, this subcoalition
will not work either: the Type 1 farmer can decide to work alone and make a €15,000
profit. This corresponds to the subcoalition [1 0 0]^T. Note that the Type 1 farmer will not
cooperate with any other farmer unless his share is at least €15,000.
The question is: how to divide the maximum possible total profit of €33,000 in such a
way that the farmers are satisfied with the grand coalition? Is this even possible? In general,
suppose that it has been decided to cooperate according to the (sub)coalition vector a. The
question then is: how to distribute the total profit earned by this coalition, such that the
farmers of the coalition are all satisfied with their shares of the total profit?
15.3. How to distribute the total profit among the farmers?
Table 15.2 (all values ×1,000):

  a′          x∗           v(a′)    v_avg(a′)
[1 1 1]    [1  0  4]       33       11
[1 1 0]    [0  ¼  3]       21½      10¾
[1 0 1]    [0  0  4]       28       14
[0 1 1]    [1½ 0  1]       14½      7¼
[1 0 0]    [0  ½  2]       15       15
[0 1 0]    [1  0  0]       5        5
[0 0 1]    [½  0  1]       9½       9½
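Because each coalition model has only three variables, the values v(a′) in Table 15.2 can be recomputed exactly by enumerating the vertices of the feasible region {x ≥ 0 | Ax ≤ Pa′} and keeping the best objective value. The matrices A and P below (in units of 1,000) are inferred from the worked examples in this chapter, not copied from Table 15.1:

```python
from fractions import Fraction as F
from itertools import combinations

# Data in units of 1,000 (packets, ounces, euro), inferred from this chapter:
# A[i][j] = input i per packet of product j, P[i][k] = input i owned by a
# farmer of Type k, c[j] = profit of product j.
A = [[2, 0, 1], [2, 4, 3], [1, 0, 1]]
P = [[2, 2, 2], [8, 2, 4], [2, 1, 3]]
c = [5, 2, 7]

def solve3(M, rhs):
    """Solve a 3x3 linear system exactly; return None if it is singular."""
    M = [[F(v) for v in row] + [F(r)] for row, r in zip(M, rhs)]
    for col in range(3):
        piv = next((r for r in range(col, 3) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(3):
            if r != col:
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [row[3] for row in M]

def v(a):
    """Coalition value max{c^T x | Ax <= Pa, x >= 0} by vertex enumeration."""
    rows = A + [[-1 if j == i else 0 for j in range(3)] for i in range(3)]
    rhs = [sum(F(P[i][k]) * a[k] for k in range(3)) for i in range(3)] + [F(0)] * 3
    best = F(0)  # x = 0 is always feasible
    for tight in combinations(range(6), 3):
        x = solve3([rows[t] for t in tight], [rhs[t] for t in tight])
        if x is None:
            continue
        if all(sum(r * xi for r, xi in zip(rows[i], x)) <= rhs[i] for i in range(6)):
            best = max(best, sum(ci * xi for ci, xi in zip(c, x)))
    return best

for a in [(1,1,1), (1,1,0), (1,0,1), (0,1,1), (1,0,0), (0,1,0), (0,0,1)]:
    print(a, v(a))  # reproduces the v(a') column (21½ prints as 43/2, etc.)
```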
The set of equalities a^T u = v(a) means that the full profit is distributed, while the
inequalities in C(a) mean that there is no incentive for any coalition to split off: (a′)^T u is
the total profit received by the members of coalition a′ when all of them are collaborating
according to the coalition a, and the payouts are given by u. The right hand side of the
inequalities, v(a′), is the total profit that coalition a′ can make if they split off. Instead of
C([a1 . . . an]^T) we simply write C(a1, . . . , an). If there is one farmer of each type and
there are three types, then the core is the solution set of the following collection of equations
Figure 15.1: Core of the game with a = [1 1 1]^T. The figure shows the values of u1
(between 16.5 and 18.5) and u3 (between 9.5 and 11.5); all points of the core have u2 = 5.
The marked point is the Owen point u¹.
and inequalities:

u1 + u2 + u3 = 33
u1 + u2 ≥ 21½
u1 + u3 ≥ 28
u2 + u3 ≥ 14½
u1 ≥ 15
u2 ≥ 5
u3 ≥ 9½.
It is easy to verify that in this simple case the core satisfies:

C(1, 1, 1) = { [u1 u2 u3]^T ∈ R³ | 16½ ≤ u1 ≤ 18½, u2 = 5, 9½ ≤ u3 ≤ 11½, u1 + u3 = 28 }.
This set is depicted in Figure 15.1. The interpretation is that, if the profits are divided
according to any vector [u1 u2 u3 ]T ∈ C(1, 1, 1), no group of farmers is better off by not
participating in the grand coalition. If, on the other hand, the profits are divided according
to a vector that is not in C(1, 1, 1), then there is some group of farmers that is not satisfied,
and which will not want to be part of the grand coalition.
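The defining conditions of C(1,1,1) are easy to test directly. A minimal sketch, using the coalition values of Table 15.2 (in units of 1,000 euro):

```python
# Coalition values v(a') from Table 15.2, in units of 1,000 euro.
v = {(1, 1, 1): 33.0, (1, 1, 0): 21.5, (1, 0, 1): 28.0, (0, 1, 1): 14.5,
     (1, 0, 0): 15.0, (0, 1, 0): 5.0, (0, 0, 1): 9.5}

def in_core(u, tol=1e-9):
    """u is in C(1,1,1) iff the whole profit is distributed and no
    subcoalition receives less than it could earn on its own."""
    if abs(sum(u) - v[(1, 1, 1)]) > tol:
        return False
    return all(sum(ui for ui, ai in zip(u, a) if ai) + tol >= val
               for a, val in v.items())

print(in_core((18.5, 5.0, 9.5)))    # True: this payout satisfies every group
print(in_core((11.0, 11.0, 11.0)))  # False: Types 1 and 3 prefer to split off
print(in_core((16.0, 5.5, 11.5)))   # False: u1 + u3 = 27.5 < v([1 0 1]) = 28
```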
Since the number of constraints in the definition of C(a) may be very large when the
number of farmers is large, it is in general not at all straightforward to find the core, or even
a point in the core. In the remainder of this section, we will use the dual of model (GP) to
determine a so-called Owen point of the core, and show that it is in fact a point in the core.
For any (sub)coalition vector a ∈ Nⁿ, the dual of model (GP) reads:

v(a) = min a^T P^T y
       s.t. A^T y ≥ c        (GD)
            y ≥ 0.
Both the primal and the dual model are in standard form; see Section 1.2.1.
Theorem 15.3.1.
Model (GP) and (GD) are both feasible.
Proof. Model (GP) is feasible, because x = 0 is a feasible point. To see that model (GD) is
feasible as well, notice first that A ≥ 0 and c ≥ 0. We will show that the vector y whose
entries satisfy:

y_i = max { c_ℓ / a_iℓ | ℓ = 1, . . . , r with a_iℓ > 0 }

is a feasible point of model (GD). Note that y_i in the above expression is well-defined, because
we assumed that no row of A consists of all zeroes, and hence there exists at least one ℓ such that
a_iℓ > 0. To prove that y is indeed a feasible point of model (GD), take any
j ∈ {1, . . . , r}. Let I = { i | a_ij > 0 }. Since none of the columns of A consist of all zeroes,
it follows that I ≠ ∅. By the definition of y_i, we have that y_i ≥ c_j / a_ij for i ∈ I. Therefore, we
have that:

(A^T y)_j = Σ_{i=1}^m a_ij y_i = Σ_{i∈I} a_ij y_i ≥ Σ_{i∈I} a_ij (c_j / a_ij) = |I| c_j ≥ c_j.

This proves that model (GD) is indeed feasible.
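The feasible dual point constructed in this proof can be computed directly for the case-study data (A and c as inferred earlier in this chapter):

```python
# Technology matrix and profit vector of the case study (inferred data;
# columns = Bis1, Bis2, Bis3, rows = butter, flour, sugar).
A = [[2, 0, 1], [2, 4, 3], [1, 0, 1]]
c = [5, 2, 7]

# The dual point from the proof: y_i = max over l with a_il > 0 of c_l / a_il.
y = [max(cl / ail for cl, ail in zip(c, row) if ail > 0) for row in A]
print(y)  # [7.0, 2.5, 7.0]

# Feasibility for (GD): every output good j satisfies (A^T y)_j >= c_j.
for j in range(3):
    assert sum(A[i][j] * y[i] for i in range(3)) >= c[j]
```

This y is far from optimal (its objective value is much larger than v(a)), but it certifies that (GD) is feasible, which is all the proof needs.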
Because model (GP) and (GD) are both feasible, they have the same optimal objective
value. Now suppose that y∗ is an optimal solution of model (GD). Define the vector
u∗ = [u∗1 . . . u∗n]^T by:

u∗ = P^T y∗.
The vector u∗ is called an Owen point of the game (with coalition vector a). The following
theorem shows that Owen points are points in the core of the game.
Theorem 15.3.2.
For any Owen point u∗ of a linear production game with coalition vector a, we have
that u∗ ∈ C(a).
Proof. Let y∗ be any optimal solution of model (GD), and let u∗ = P^T y∗. Note that
v(a) is the optimal objective value of both model (GP) and model (GD). Therefore, v(a) =
a^T P^T y∗ = a^T u∗. So, to show that u∗ ∈ C(a), it suffices to show that (a′)^T u∗ ≥ v(a′) for
every a′ ≥ 0 with a′ ⊆ a. Replace a by a′ in model (GD). Since y∗ is a feasible (but not
necessarily optimal) solution of the new model, it follows that (a′)^T u∗ = (a′)^T P^T y∗ ≥ v(a′).
This proves the theorem.
For the case of the grand coalition of three farmers, an Owen point is easily calculated from
the following model:

v(1, 1, 1) = min 6,000y1 + 14,000y2 + 6,000y3
             s.t. 2y1 + 2y2 + y3 ≥ 5
                  4y2 ≥ 2
                  y1 + 3y2 + y3 ≥ 7
                  y1, y2, y3 ≥ 0,

with optimal solution y∗ = [¼ 2¼ 0]^T. Hence, in the case with one farmer of each type,
we obtain the Owen point:

u∗ = P^T y∗ = [18½ 5 9½]^T (×€1,000).
This is the point u¹ in Figure 15.1. Thus, when the total profit is divided so that the
farmers of Type 1, Type 2, and Type 3 receive 18½, 5, and 9½ (×€1,000), respectively,
then no farmer can make a higher profit by forming other coalitions. The above model for
v(1, 1, 1) has a unique solution, and so the game has a unique Owen point. This does not
mean that there are no other points in the core for which, for instance, the Type 3 farmer
obtains a larger share; see Figure 15.1. So, in general, not every (extreme) point of the core
corresponds to an Owen point. However, it may happen that a game has more than one
Owen point. This happens if model (GD) has multiple optimal solutions; see, for example,
the model corresponding to a² in Figure 15.2 of the next section. Also, note that Owen points
are of particular interest if the number of farmers is large, because for Owen points not
all (exponentially many!) coalition values have to be determined. In Exercise 15.6.11, an
example is given of a linear production game with several Owen points. Actually, in this
example the core is equal to the set of Owen points.
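For the grand coalition, the Owen point can be verified with a weak-duality certificate: a primal feasible x∗ and a dual feasible y∗ with equal objective values are both optimal, and u∗ = P^T y∗ then follows. A sketch with the (inferred) case-study data:

```python
# Weak-duality certificate for the grand coalition a = (1, 1, 1).
# Data (inferred from this chapter) in units of 1,000; rows of P are
# butter, flour, sugar, columns are the three farmer types.
A = [[2, 0, 1], [2, 4, 3], [1, 0, 1]]
P = [[2, 2, 2], [8, 2, 4], [2, 1, 3]]
c = [5, 2, 7]
b = [sum(row) for row in P]       # total resources of the grand coalition

x_star = [1, 0, 4]                # optimal production plan (Table 15.2)
y_star = [0.25, 2.25, 0.0]        # optimal solution of (GD)

assert all(sum(A[i][j] * x_star[j] for j in range(3)) <= b[i] for i in range(3))
assert all(sum(A[i][j] * y_star[i] for i in range(3)) >= c[j] for j in range(3))
assert sum(ci * xi for ci, xi in zip(c, x_star)) == 33   # primal value
assert sum(bi * yi for bi, yi in zip(b, y_star)) == 33   # dual value

owen = [sum(P[i][k] * y_star[i] for i in range(3)) for k in range(3)]
print(owen)  # [18.5, 5.0, 9.5] -- the Owen point u* = P^T y*
```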
Hence, for a p-replica game with coalition vector pa, an Owen point is calculated from the
following model:
Clearly, v(pa) = pv(a), and model (GD) and model (pGD) have the same set of optimal
solutions. Hence, model (GD) and model (pGD) produce the same set of Owen points.
Moreover, any Owen point u∗ = PT y∗ of (GD) is an Owen point of (pGD) for each
p ≥ 1. This means that the set of Owen points of a game depends only on the proportions
of Type 1, Type 2, and Type 3 farmers, and not on the total number of farmers. Hence, in
order to gain insight in the set of Owen points as a function of the coalition vector, we may
restrict our attention to fractional coalition vectors [a1 a2 a3]^T satisfying a1 ≥ 0, a2 ≥ 0,
a3 ≥ 0, and a1 + a2 + a3 = 1.
u³ = [15 12 13]^T, corresponding to y3∗ = [5½ ½ 0]^T,
u⁴ = [15 6½ 18½]^T, corresponding to y4∗ = [0 ½ 5½]^T.
It is possible to analytically derive the values of a1, a2, a3 for which each of these points is an
Owen point; see Section 15.4.2. We have summarized these derivations in Figure 15.2. Each
point [a1 a2]^T in the graph corresponds to the coalition vector [a1 a2 1−a1−a2]^T. Thus,
the midpoint of the triangle represents the coalition vector [⅓ ⅓ ⅓]^T, which represents all
coalitions with equal numbers of Type 1, Type 2, and Type 3 farmers. The Owen point we
find for such coalitions is u¹. This corresponds to payouts of €18,500 to each Type 1 farmer,
€5,000 to each Type 2 farmer, and €9,500 to each Type 3 farmer.

The point a² = [⅗ ⅕]^T represents all coalitions in which 60% of the farmers are of Type 1,
20% are of Type 2, and 20% are of Type 3. For such a coalition, all four vectors y1∗, y2∗, y3∗,
and y4∗ are optimal for model (GD), and so all points of the set conv{y1∗, y2∗, y3∗, y4∗} are
optimal as well. In Exercise 15.6.4 the reader is asked to draw the feasible region of model
(GD) in this case, and to show that the optimal set is a facet (see Appendix D) of the feasible
region. Observe also that, in Figure 15.2, each region for which a particular Owen point
Figure 15.2: Owen points for different distributions of farmers of Type 1, Type 2, and Type 3.
Each point [a1 a2]^T with a1 ≥ 0, a2 ≥ 0, a1 + a2 ≤ 1, corresponds to a coalition vector
[a1 a2 a3]^T with a3 = 1 − a1 − a2. The marked points are a¹ = [⅓ ⅔]^T and a² = [⅗ ⅕]^T.
occurs is convex. In Exercise 15.6.5, the reader is asked to prove that this observation holds
in general.
From Figure 15.2, we can draw the following conclusions. In the case study of this chapter,
we are considering the situation in which, at the beginning of the season, three types of
farmers simultaneously decide to sell their products to a factory where these products are
used to produce other goods with a certain market value. It is not known in advance how
many farmers offer products and what share of the total profit (which will only become
known at the end of the season) they can ask. These uncertainties imply that at the beginning
of the season, the optimal proportions of shares can only be estimated. We have solved the
problem by determining for every possible coalition a reasonable profit distribution scheme
in which not too many farmers are (not too) dissatisfied with their shares.
The proportion of each type of farmer may be estimated from historical data. For example,
if we know from earlier years that the number of farmers is roughly the same for each type,
then we know that the fractional coalition vector will be ‘close’ to [⅓ ⅓ ⅓]^T, and hence
it is safe to agree to divide the profit according to the proportions corresponding to Owen
point u¹, which means paying out €18,500 to each Type 1 farmer, €5,000 to each Type
2 farmer, and €9,500 to each Type 3 farmer.
However, if the proportions of farmers do not correspond to a point in the area marked 1
in Figure 15.2, then Owen point u¹ is not in the core. In that case the share of one of the
farmer types is too high, and the other types can obtain a larger share of the total profits by
forming a subcoalition.

15.4. Profit distribution for arbitrary numbers of farmers
[−A^T  I3] = [ −2 −2 −1  1 0 0 ]
             [  0 −4  0  0 1 0 ]
             [ −1 −3 −1  0 0 1 ].
T
To illustrate the calculation process, let i = 1 and consider the dual solution y1∗ = 41 2 41 0 .
The corresponding optimal values of the slack variables are y4∗ = y6∗ = 0, and y5∗ = 7.
Since the model has three constraints, any feasible basic solution has three basic variables.
Thus, the basic variables corresponding to y1∗ are y1 , y2 , and y5 . We therefore have that
BI = {1, 2, 5} and N I = {3, 4, 6}. The basis matrix B and the matrix N corresponding
to this choice of basic variables are:
−2 −2 0 −1 1 0
B = 0 −4 1 , and N = 0 0 0 .
−1 −3 0 −1 0 1
Recall that a feasible basic solution of a maximizing LO-model is optimal if and only if it is
feasible and the current objective coefficients are nonpositive; see also Section 5.1. Since the
constraints of (GD) do not depend on a, we have that yi∗ is a feasible solution of (GD) for
all a. In other words, the feasibility conditions do not restrict the set of vectors a for which
yi∗ is optimal. This means that we only have to look at the current objective coefficients.
c̄^T_NI = c^T_NI − c^T_BI B^{−1}N

  = [ −2a1 − a2 − 3a3   0   0 ]
    − [ −2a1 − 2a2 − 2a3   −8a1 − 2a2 − 4a3   0 ] · [ −¾  0  ½ ] · [ −1 1 0 ]
                                                    [  ¼  0 −½ ]   [  0 0 0 ]
                                                    [  1  1 −2 ]   [ −1 0 1 ]

  = [ ½a1 − (3/2)a3   ½a1 − a2 − ½a3   −3a1 − a3 ],
where the second line is obtained by straightforward (but tedious) calculations. Since the
current objective coefficients need to be nonpositive, we conclude from these calculations
that y1∗ is an optimal solution of (GD) if and only if:

½a1 − (3/2)a3 ≤ 0, and ½a1 − a2 − ½a3 ≤ 0

(the third coefficient, −3a1 − a3, is nonpositive for every a ≥ 0). Together with a1 ≥ 0
and a2 ≥ 0, these are the inequalities that bound the region for 1 in Figure 15.2. We have
carried out the same calculations for y2∗, y3∗, and y4∗. The results are listed below.
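The reduced-cost computation above can be repeated numerically for any fractional coalition vector a. The sketch below solves B^T w = c_BI exactly with rational arithmetic and evaluates c̄_NI(a); the capacity matrix P is the one inferred from the worked examples of this chapter:

```python
from fractions import Fraction as F

B = [[-2, -2, 0], [0, -4, 1], [-1, -3, 0]]   # basis matrix (columns y1, y2, y5)
N = [[-1, 1, 0], [0, 0, 0], [-1, 0, 1]]      # nonbasic columns (y3, y4, y6)
P = [[2, 2, 2], [8, 2, 4], [2, 1, 3]]        # capacity matrix (inferred data)

def solve3(M, rhs):
    """Solve a 3x3 linear system exactly with rational arithmetic."""
    M = [[F(x) for x in row] + [F(r)] for row, r in zip(M, rhs)]
    for col in range(3):
        piv = next(r for r in range(col, 3) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(3):
            if r != col:
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [row[3] for row in M]

def reduced_costs(a):
    """Current objective coefficients of y3, y4, y6 for the basis BI = {1,2,5}
    (the minimization written as maximization of the negated objective)."""
    Pa = [sum(F(P[i][k]) * a[k] for k in range(3)) for i in range(3)]
    cB = [-Pa[0], -Pa[1], F(0)]   # coefficients of y1, y2 and slack y5
    cN = [-Pa[2], F(0), F(0)]     # coefficients of y3 and slacks y4, y6
    Bt = [[B[i][j] for i in range(3)] for j in range(3)]
    w = solve3(Bt, cB)            # w solves B^T w = cB, i.e. w^T = cB^T B^{-1}
    return [cN[j] - sum(w[i] * N[i][j] for i in range(3)) for j in range(3)]

print(reduced_costs((F(1, 3), F(1, 3), F(1, 3))))   # all negative: y1* optimal
print(reduced_costs((F(3, 5), F(1, 5), F(1, 5))))   # two zeroes: the point a2
```

At the midpoint all three reduced costs are strictly negative, so y1∗ is the unique optimal basis there; at a² = (⅗, ⅕, ⅕) two of them vanish, which is exactly the degeneracy that makes all four dual vectors optimal.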
In Table 15.3(a), we list the Owen points and the corresponding total profit v(1) for ∆ =
0, 1, . . . , 13, with c(∆) = [∆ 2 7]^T. So only the first entry of c is subject to
change.
15.5. Sensitivity analysis
In Table 15.3(a) we may observe the interesting fact that for ∆ ≥ 12 it holds that for any
subcoalition S it does not make a difference whether or not a farmer who is not in S joins
S : the value of S plus this farmer is equal to the value of S plus the value of the coalition
consisting of this single farmer. So the addition of this farmer has no added value for the
coalition S . In such situations we call this farmer a dummy player.
In general, a farmer of Type k ∈ {1, . . . , n} is called a dummy player with respect to the
coalition vector a if it holds that
Note that v(ak ek) = ak v(ek). We will show that if ∆ = 12, i.e., if the market price of Bis1 is
€12, then all three farmer types are in fact dummy players. It is left to the reader to carry out
the various calculations; we only present the results. We start with the optimal values of all
subcoalitions of the grand coalition.
α > 0.
Finally, we investigate how a change of the production process influences the Owen point.
For instance, what happens if the proportions of the input factors needed for the production
of Bis3 are changed? Consider the matrix:

A(∆) = [ 2 0 1 ]
       [ 2 4 ∆ ]
       [ 1 0 1 ].
Table 15.3(b) presents for ∆ ≥ 0 the corresponding Owen points and the values v(1, 1, 1)
of the total profit. It can be shown that for ∆ ≥ 10 all the farmers are dummy players. For
0 ≤ ∆ ≤ 3, there are multiple Owen points. In Exercise 15.6.9 the reader is asked to design
similar tables for the other entries of the technology matrix A.
In all cases considered above, there is a certain value of ∆ for which all values, either larger
or smaller than ∆, give rise to a situation in which all the farmers are dummy players. In
these situations the shares assigned are the same for all subcoalitions; cooperation does not
make sense from a profit distribution point of view. We leave it to the reader to analyze the
case when there are four or more types of farmers.
15.6 Exercises
Exercise 15.6.1. Consider model (GP) in Section 15.2. Let P ≥ 0 (meaning that each
entry of P is nonnegative). Show that, for each two coalition vectors a′ ≥ 0 and a″ ≥ 0
with a′ ⊆ a″, it holds that v(a′) ≤ v(a″). Show that the maximum total profit is attained
by the grand coalition.
Exercise 15.6.2. Let a ∈ Nn . Let C(a) be the core corresponding to a; see Section
15.3. Show that C(a) is a polyhedral set with dimension at most n − 1.
Exercise 15.6.3. Let a ∈ Nn , and let S(a) = {k | ak > 0, k = 1, . . . , n}, called the
support of the vector a. Define for each k ∈ S(a):
(a) Show that Mk (a)/ak is the highest possible share a farmer of Type k can ask without
risking the withdrawal of another farmer, and mk (a)/ak is the lowest possible share
with which a farmer of Type k is satisfied.
The Tijs point¹ τ(a) of a linear production game is an n-vector whose entries τk(a), k =
1, . . . , n, are defined by:
Exercise 15.6.4. Consider the feasible region of model (GD) for the data used in the case
study in this chapter.
(a) Draw the feasible region. Determine the five extreme points, the three extreme rays,
and the two bounded facets of this feasible region.
(b) Show that the vectors y1∗, y2∗, y3∗, y4∗ (see Section 15.4) correspond to Owen points of
the grand coalition (i.e., a1, a2, a3 > 0), and that the vector [0 2½ 0]^T does not.
¹ In honor of Stef H. Tijs (born 1937), who introduced the concept in 1981, although he called it the
compromise value.
Exercise 15.6.5. Assume that the matrix P in model (GD) in Section 15.3 is nonsingular.
Show that if the coalition vectors a′ and a″ correspond to the same point u in the core,
then all convex combinations of a′ and a″ correspond to u.
Exercise 15.6.6. Consider Figure 15.2. Let the total profit be distributed according to
the proportions 18½ : 5 : 9½ for a Type 1 farmer, a Type 2 farmer, and a Type 3 farmer,
respectively. Calculate, in all cases for which 1 is not an Owen point: the core, the values
of mk(a) and Mk(a) for all k = 1, . . . , n (see Exercise 15.6.3), the Tijs point (see Exercise
15.6.3), and the deviations from the satisfactory shares.
Exercise 15.6.9. Calculate tables similar to Table 15.3(a) for the cases c = [5 ∆ 7]^T and
c = [5 2 ∆]^T with ∆ ∈ R, and answer similar questions as the ones in Exercise 15.6.8;
see Section 15.5. Determine and analyze similar tables when the entries of the production
matrix A are subject to change; compare in this respect Table 15.3(b) and the corresponding
remarks in Section 15.5. Pay special attention to the situations where the values of the Owen
points do not increase or decrease monotonically with ∆. See for example Table 15.3(b):
the values of the Owen points decrease for ∆ = 1, 2, 3, increase for ∆ = 3, 4, and decrease
again for ∆ ≥ 4.
Exercise 15.6.10. Consider a linear production game with

A = [ 12  1  3 ]        P = [ 100   0 ]
    [ 40 10 90 ],           [ 240 480 ],

and c = [40 6 30]^T.

(a) Solve the LO-model corresponding to this linear production game; see Section 15.2,
model (GP).
(b) Determine all optimal solutions of the dual model and the set of Owen points; see
Section 15.3, model (GD).
(c) Determine the core for the grand coalition of this game, and show that the set of Owen
points is a proper subset of the core.
Exercise 15.6.11. Show that, for the linear production game defined by

A = [ 1 0 1 ]        P = [ 1 0 ]
    [ 0 1 1 ],           [ 0 1 ],

and c = [296 0 488]^T, the set of Owen points is equal to the core.
Chapter 16

Minimizing trimloss when cutting cardboard
Overview
Cutting stock problems arise, among other places, when materials such as paper, cardboard,
and textiles are manufactured in rolls of large widths. These rolls have to be cut into subrolls
of smaller widths. It is not always possible to cut the rolls without leftovers; these leftovers
are called trimloss. In this chapter we discuss a cutting stock problem where rolls of
cardboard need to be cut such that the trimloss is minimized. This type of trimloss problem
is one of the oldest industrial applications of Operations Research. The solution algorithm
of Paul C. Gilmore (born 1925) and Ralph E. Gomory (born 1929), published in 1963,
will be discussed here.
[Figure 16.1: a cutting pattern for a standard roll; the trimloss of this pattern is 22.5 cm.]
Standard rolls are cut as follows. First, the knives of the cutting machine are set up in a
certain position. Next, the cardboard goes through the cutting machine as it is unrolled.
This generates a cutting pattern of the cardboard in the longitudinal direction. Once the
subrolls have attained the desired length, the standard roll is cut off in the perpendicular
direction. An example is depicted in Figure 16.1.
Since the knives of the cutting machine can be arranged in any position, standard rolls can be
cut into subrolls in many different ways. We consider two cutting patterns to be equal if they
both consist of the same combination of demanded subrolls. Table 16.2 lists thirteen cutting
patterns for the case described in Table 16.1. So, for example, cutting pattern 7 consists of
cutting a roll into two subrolls of 128.5 cm and one subroll of 100.5 cm (resulting in 22.5
cm of trimloss); this cutting pattern 7 is the pattern depicted in Figure 16.1. The orders are
labeled i = 1, . . . , m, and the cutting patterns are labeled j = 1, . . . , n. In the case of
Table 16.2, we have that m = 6; the number n of cutting patterns is certainly more than
the thirteen listed in this table.
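For a single pattern the trimloss is simple arithmetic: the standard roll width minus the total width actually cut. For cutting pattern 7:

```python
# Standard roll width and cutting pattern 7 (two subrolls of 128.5 cm and
# one of 100.5 cm), as described in the text.
ROLL_WIDTH = 380.0
pattern7 = {128.5: 2, 100.5: 1}

cut = sum(width * count for width, count in pattern7.items())
print(ROLL_WIDTH - cut)  # 22.5 cm of trimloss
```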
For i = 1, . . . , m, and j = 1, . . . , n, define:

The values of the entries aij are given in the body of Table 16.2, and the values of the bi's
are the entries in the second column of Table 16.1. The aij's are integers, and the bi's are
not necessarily integers. The columns A1, . . . , An (∈ R^m) of the matrix A = {aij} refer
to the various feasible cutting patterns. Define b = [b1 . . . bm]^T.
with z∗ the optimal objective value of model (16.1). In this definition the trimloss is
expressed in cm units; the company expresses trimloss in tons of cardboard. Since

min ( 380 Σ_{j=1}^n xj − Σ_{i=1}^m Wi bi ) = 380 ( min Σ_{j=1}^n xj ) − Σ_{i=1}^m Wi bi,
minimizing the trimloss is equivalent to minimizing the number of standard rolls needed
for the demanded production. Note that any cardboard that remains on the roll, i.e., that is
not cut, is not counted as trimloss, because this remaining cardboard may still be used for a
future order.
Input: Model (16.1), with an order package containing the amounts of demanded
subrolls and the corresponding widths.

Output: An optimal solution of model (16.1).

▸ Step 0: Initialization. Choose an initial full row rank matrix A(1), of which the
columns correspond to cutting patterns. For instance, take A(1) = Im.
Go to Step 1.

▸ Step 1: Simplex algorithm step. Let A(k), k ≥ 1, be the current technology ma-
trix (after k iterations of the Gilmore-Gomory algorithm), of which the
columns correspond to cutting patterns; let J(k) be the index set of the
16.2. Gilmore-Gomory's solution algorithm
▸ Step 0. The initial matrix can always be chosen such that it contains a nonsingular
(m, m)-submatrix, for instance Im. This takes care of the feasibility of model (Pk) for
k = 1. In order to speed up the calculations, A(1) can be chosen such that it contains
many good cutting patterns.
I Step 1. In this step, the original model (16.1) is solved for the submatrix A(k) of the
‘cutting pattern’ matrix A. Recall that A exists only virtually; the number of columns
(i.e., cutting patterns) is generally very large. Any basis matrix B(k) , corresponding to
an optimal solution of (P k ), is a basis
matrix in A, but need not be an optimal basis
matrix of A. Let A ≡ [B(k) N(k)]. The objective coefficients of the current simplex
tableau corresponding to B(k) are zero (by definition), and the objective coefficient
vector corresponding to N(k) is:
Since B(k) is optimal for A(k), it follows that, for each σ ∈ BI, the current objective
coefficient c̄σ corresponding to A(k) is zero, i.e., c̄σ = 0. If c̄σ ≥ 0 for all
σ ∈ NI, then B(k) is optimal for (16.1). We try to find, among the possibly millions of
c̄σ's with σ ∈ NI, one with a negative value. To that end, we determine:
$$\min_{\sigma \in NI} \bar{c}_\sigma = \min_{\sigma \in NI}\left(1 - \sum_{i=1}^{m} \left(\mathbf{1}^T (B^{(k)})^{-1}\right)_i a_{i\sigma}\right) = 1 - \max_{\sigma \in NI} \mathbf{1}^T (B^{(k)})^{-1} A_\sigma,$$
with Aσ the σ ’th (cutting pattern) column of A. So the problem is to determine a cutting
pattern Aσ with a smallest current objective coefficient c̄σ . Hence, the entries of Aσ
have to be considered as decision variables, say u = Aσ , in the following model:
I Step 2. Substituting 1T (B(k) )−1 = yT in (K), the model (Kk ) follows. It now follows
immediately from the remarks in Ad Step 1 that an optimal solution of the knapsack
model (Kk ) is some column of the matrix A. This column is denoted by u(k) , and the
optimal objective value of (Kk ) is denoted by αk .
I Step 3. The smallest current nonbasic objective coefficient is 1 − αk. Hence, if 1 − αk ≥
0, then all current nonbasic objective coefficients are nonnegative, and so an optimal
solution of (16.1) has been reached; see Theorem 3.5.1. If 1 − αk < 0, then the current
matrix A(k) is augmented with the column u(k), and the process is repeated for A(k+1) =
[A(k) u(k)] instead of A(k). Obviously, u(k) is not a column of A(k), since otherwise the
model (Pk) would not have been solved to optimality.
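The interplay of Steps 1–3 can be sketched as a compact column-generation loop. The sketch below is illustrative rather than the book's implementation: the function names are ours, the LP step is solved with SciPy's linprog (HiGHS backend) using '≥' demand constraints (equivalent to '=' by Exercise 16.4.1), the knapsack pricing step is solved by dynamic programming over the roll width (assuming integer widths), and the data in the usage example is made up.

```python
import numpy as np
from scipy.optimize import linprog

def price_pattern(widths, duals, roll_width):
    """Knapsack pricing step: maximize sum(y_i * a_i) over cutting patterns a
    with sum(widths[i] * a_i) <= roll_width, by DP over the used width."""
    best = [(0.0, (0,) * len(widths))]   # best[w] = (value, pattern) within width w
    for w in range(1, roll_width + 1):
        val, pat = best[w - 1]
        for i, wi in enumerate(widths):
            if wi <= w and best[w - wi][0] + duals[i] > val:
                v, p = best[w - wi]
                val = v + duals[i]
                pat = p[:i] + (p[i] + 1,) + p[i + 1:]
        best.append((val, pat))
    return best[roll_width]

def gilmore_gomory(widths, demands, roll_width, max_iter=50):
    """Column generation for the LO-relaxation of the cutting stock model."""
    m = len(widths)
    # A(1): one pure pattern per width (cut as many copies as fit on a roll).
    patterns = [tuple(roll_width // widths[i] if j == i else 0 for j in range(m))
                for i in range(m)]
    for _ in range(max_iter):
        A = np.array(patterns, dtype=float).T          # rows: orders, cols: patterns
        res = linprog(np.ones(len(patterns)),
                      A_ub=-A, b_ub=-np.array(demands, dtype=float),
                      method="highs")
        duals = -np.asarray(res.ineqlin.marginals)     # y >= 0 for the '>=' rows
        alpha, pattern = price_pattern(widths, duals, roll_width)
        if alpha <= 1 + 1e-9:                          # 1 - alpha_k >= 0: optimal
            return res.fun, patterns
        patterns.append(pattern)
    return res.fun, patterns
```

For example, with widths [6, 4], demands [2, 3], and roll width 10, the loop generates the mixed pattern (1, 1) and terminates with the LP optimum of 2.5 rolls.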
The knapsack model (Kk ) is an extension of the (binary) knapsack problem as formulated in
Section 7.2.3. The binary knapsack problem formulated in Section 7.2.3 has {0, 1}-variables,
whereas model (Kk ) has nonnegative integer variables. Such a problem is called a bounded
knapsack problem. The general form of a bounded knapsack problem is:
max  c1 x1 + . . . + cn xn
s.t. a1 x1 + . . . + an xn ≤ b
     0 ≤ xi ≤ ui            for i = 1, . . . , n
     x1 , . . . , xn integer,
with n the number of objects, ci (≥ 0) the value obtained if object i is chosen, b (≥ 0)
the amount of an available resource, ai (≥ 0) the amount of the available resource used
by object i, and ui (≥ 0) the upper bound on the number of objects i that are allowed
to be chosen; i = 1, . . . , n. Clearly, the binary knapsack problem is a special case of the
bounded knapsack problem, namely the case where ui = 1 for i = 1, . . . , n. A branch-
and-bound algorithm for solving {0, 1}-knapsack problems has been discussed in Section
7.2.3. In Exercise 16.4.2, the reader is asked to describe a branch-and-bound algorithm for
the bounded knapsack problem.
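Besides branch-and-bound, the bounded knapsack problem above can be solved by dynamic programming over the capacity. A minimal sketch (our own function, an alternative to the branch-and-bound approach of Exercise 16.4.2, assuming integer weights and capacity):

```python
def bounded_knapsack(values, weights, bounds, capacity):
    """Maximize sum(c_i * x_i) subject to sum(a_i * x_i) <= b and
    0 <= x_i <= u_i with x_i integer, by DP over the remaining capacity."""
    best = [0] * (capacity + 1)
    for c, a, u in zip(values, weights, bounds):
        nxt = best[:]
        for w in range(capacity + 1):
            for k in range(1, u + 1):      # take k copies of the current object
                if k * a > w:
                    break
                nxt[w] = max(nxt[w], best[w - k * a] + k * c)
        best = nxt
    return best[capacity]
```

With all bounds ui = 1 this reduces to the binary knapsack problem; for instance, values (60, 100, 120), weights (10, 20, 30), and capacity 50 give the classic optimum 220.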
min x1 + . . . + x13
s.t. 2x1 + 2x7 + 2x8 + 2x9 + 2x10 + x11 + x12 = 20
3x2 + x11 + x12 = 9
3x3 + x7 + x11 = 22
4x4 + x8 = 18
4x5 + x9 = 56
5x6 + x10 + 5x13 = 23
x1 , . . . , x13 ≥ 0.
An optimal solution can be calculated using a computer package: x1 = x8 = x9 =
x10 = x11 = x12 = x13 = 0, x2 = 3, x3 = 4, x4 = 4.5, x5 = 14, x6 = 4.6,
x7 = 10, with optimal objective value z = 40.1. The knapsack problem to be solved in
the next step uses as objective coefficients the optimal dual values of the current constraints:
y1 = y2 = y3 = 0.3333, y4 = y5 = 0.25, y6 = 0.20. (Usually, these optimal dual
values are reported in the output of a linear optimization computer package.)
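This LO-model is small enough to reproduce with any LP solver. The snippet below is our illustration using SciPy's linprog with the HiGHS backend (an assumption; the book provides its models in GMPL). It recovers z = 40.1 and the dual values quoted above; note that HiGHS reports the duals as marginals, whose sign convention may differ from the y values in the text.

```python
import numpy as np
from scipy.optimize import linprog

# Technology matrix of the model above: rows = the six orders, cols = x1..x13.
A_eq = np.zeros((6, 13))
A_eq[0, [0, 6, 7, 8, 9]] = 2;  A_eq[0, [10, 11]] = 1
A_eq[1, 1] = 3;                A_eq[1, [10, 11]] = 1
A_eq[2, 2] = 3;                A_eq[2, [6, 10]] = 1
A_eq[3, 3] = 4;                A_eq[3, 7] = 1
A_eq[4, 4] = 4;                A_eq[4, 8] = 1
A_eq[5, 5] = 5;                A_eq[5, 9] = 1;  A_eq[5, 12] = 5
b_eq = [20, 9, 22, 18, 56, 23]

res = linprog(np.ones(13), A_eq=A_eq, b_eq=b_eq, method="highs")
print(round(res.fun, 4))     # 40.1
print(res.eqlin.marginals)   # optimal dual values y1..y6 (HiGHS sign convention)
```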
Optimal solution    3.027  4.676  14.973  4.216  9.000  0.351

Order   Cutting pattern      Optimal dual value
  1     0  1  1  0  0  1     0.3514
  2     0  0  0  0  1  0     0.3243
  3     0  0  0  3  1  1     0.2703
  4     1  0  1  0  0  0     0.2162
  5     1  3  2  0  1  0     0.2162
  6     3  0  0  1  1  2     0.1892
In the next iteration, Step 1 and Step 2 are repeated for the matrix A(2) . The algorithm
stops at iteration k if the optimal objective value αk of model (Kk ) is ≤ 1. It is left to
the reader to carry out the remaining calculations leading to an optimal solution. With
our calculations, the Gilmore-Gomory algorithm performed ten iterations. The resulting
optimal solution is listed in Table 16.3.
The order in which the six cutting patterns (the six columns in Table 16.3) are actually
carried out is left to the planning department of the company; they can, for instance, take
into account the minimization of the number of adjustments of the knives. The first decision
variable in Table 16.3 has the value 3.027. If this is the first pattern that will be cut from
standard rolls, then the number 3.027 means that three standard rolls are needed plus a
0.027 fraction of a standard roll. This last part is very small, and may not be delivered to
the customer. The reason for this is the following. On average, standard rolls contain about
1,000 meters of cardboard. This means that this 0.027 fraction is approximately 27 meters.
The margin in an order is usually about 0.5%, i.e., the customer may expect either 0.5% less
or 0.5% more of his order. So, BGH may decide to round down the number of rolls cut
according to the first pattern to 3, and round up the number of rolls cut according to the
third pattern to 15. Whether or not this change is acceptable depends on how much 27
meters is compared to the size of the affected orders. The sizes of the affected orders are:
I order 1: 20 × 1.285 × 1,000 = 25,700 m2 cardboard,
I order 5: 56 × 0.820 × 1,000 = 45,920 m2 cardboard, and
For all three orders, the 27 meters is certainly less than 0.5%, and so the customers of orders
1, 5, and 6 can expect to receive a slightly different amount than what they ordered. The
price of the order is of course adjusted to the quantity that is actually delivered.
Note that the optimal solution is nondegenerate. Hence, the optimal dual values in the
rightmost column of Table 16.3 are the shadow prices for the six constraints. It may be
noted that all shadow prices are positive. This means that if the number of demanded
subrolls in an order increases by one, then the number of needed standard rolls increases
by the value of the shadow price. In Exercise 16.4.1, the reader is asked to show that the
shadow prices are nonnegative.
The company BGH may use shadow prices to deviate (within the set margin of 0.5%)
from the demanded orders. Consider for instance order 4 in which eighteen subrolls of
width 86.5 cm are demanded. The shadow price of this order is 0.2162. This means that
if one more subroll is produced, then a 0.2162 fraction of a standard roll will be needed
(assuming that the other orders remain the same). This 0.2162 fraction of a standard roll amounts to
0.2162 × 3.80 × 1,000 = 821.56 m2 . However, one subroll of width 86.5 cm contains
0.865 × 1,000 = 865.00 m2 of cardboard. So producing more of order 4 is profitable, and
the customer of order 4 may expect 0.5% more cardboard. In case this additional 0.5% is
produced and delivered, the order package has actually been changed, and so the model
has to be solved for the changed order package (with 18.09 instead of 18 for order 4). The
shadow prices will change, and based on these new shadow prices it can be decided whether
or not another order will be decreased or increased. In practice, these additional calculations
are not carried out; based on the shadow prices corresponding to the original order package
some orders are slightly decreased and some increased if that is profitable for the company.
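The profitability comparison behind this reasoning is a one-line computation. A hedged sketch (the variable names are ours; the numbers are those from the text):

```python
# Marginal cost vs. content of one extra subroll of order 4.
shadow_price = 0.2162     # extra fraction of a standard roll per extra subroll
roll_width_m = 3.80       # standard roll width: 380 cm
roll_length_m = 1000.0    # a standard roll contains about 1,000 m of cardboard
subroll_width_m = 0.865   # order 4: subrolls of width 86.5 cm

extra_cardboard_used = shadow_price * roll_width_m * roll_length_m   # in m^2
extra_cardboard_sold = subroll_width_m * roll_length_m               # in m^2
profitable = extra_cardboard_sold > extra_cardboard_used
print(extra_cardboard_used, extra_cardboard_sold, profitable)
```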
16.4 Exercises
Exercise 16.4.1. Consider the cutting stock problem as described in this chapter.
(a) Show that model (16.1) is equivalent to a model where the ‘=’ signs of the restrictions
are replaced by ‘≥’ signs.
(b) Show that the constraints of model (16.1) have nonnegative optimal dual values.
Exercise 16.4.2. Consider the bounded knapsack problem.
(a) Generalize Theorem 7.2.1 to the case of the bounded knapsack problem.
(b) Explain how the branch-and-bound algorithm of Section 7.2.3 can be generalized to
the case of the bounded knapsack problem.
(c) Use the branch-and-bound algorithm described in (b) to solve the following model:
Exercise 16.4.3. Use a computer package to solve the model in Exercise 16.4.2(c).
Exercise 16.4.4. The factory PAPYRS produces paper in standard rolls of width 380 cm.
For a certain day, the order package is as follows:
These orders need to be cut from the standard rolls with a minimum amount of trimloss.
(a) Design an LO-model that describes this problem, and determine all possible cutting
patterns.
(b) Solve the problem by means of the Gilmore-Gomory algorithm.
(c) What are the shadow prices of the demand constraints? One of the shadow prices
turns out to be zero; what does this mean? How much can the corresponding order be
increased without increasing the number of subrolls needed?
(d) Suppose that the customers want subrolls with the same diameter as the standard rolls.
Therefore, the optimal solution needs to be integer-valued. Solve this problem by
means of an ILO-package. Are all cutting patterns necessary to determine this integer
optimal solution? Show that the problem has multiple optimal solutions.
Exercise 16.4.5. For the reinforcement of concrete, the following iron bars are needed: 24 bars
with a length of 9.50 m, 18 bars of 8 m, 44 bars of 6.20 m, 60 bars of 5.40 m, and 180
bars of 1.20 m. All bars have the same diameter. The company that produces the bars has
a large inventory; all bars in this inventory have a standard length of 12 m. The problem is how to cut
the demanded bars out of the standard ones, such that the waste is minimized.
(a) Formulate this problem in terms of a mathematical model.
(b) Discuss the differences with the cutting stock problem discussed in this chapter.
(c) Determine an optimal solution, and the percentage of trimloss.
Chapter 17
Off-shore helicopter routing
Overview
This case study deals with the problem of determining a flight schedule for helicopters
to off-shore platform locations, for exchanging the crews employed on these platforms.
The model is solved by means of an LO-model in combination with a column generation
technique. Since the final solution needs to be integer-valued, we have chosen a round-off
procedure to obtain an integer solution.
[Figure 17.1: Continental shelves of the North Sea (map; not reproduced here).]

Table 17.1: Coordinates of the platforms (in units of 4.5 km). (# = platform index)

 #  Coord.     #  Coord.     #  Coord.
 1   1 35     18  76 20     35  61 32
 2   2 36     19  73 21     36  65 32
 3  21 10     20  77 23     37  69 35
 4  43 21     21  76 24     38  71 28
 5  46 29     22  84 25     39  79 32
 6  27 35     23  81 27     40  35 23
 7  29 37     24  63 18     41  37 24
 8  58 20     25  70 18     42  38 22
 9  57 29     26  66 19     43  68 10
10  58 33     27  68 19     44  65 52
11  57 41     28  65 20     45  63 53
12  68 49     29  67 20     46  75  7
13  72 47     30  63 21     47  73  8
14  73 47     31  64 21     48  75 12
15  71 49     32  67 22     49  55 55
16  75 51     33  69 28     50  60 55
17  72 20     34  68 30     51  58 59
There are two types of crews working on the platforms: regular crew, consisting of
employees that have their regular jobs on the platforms, and irregular crew, such as surveyors,
repair people, or physicians. The regular crew on the Dutch owned platforms work every
other week.
The helicopters that fly from the airport to the platforms transport new people, and heli-
copters that leave the platforms and return to the airport transport the leaving people. The
number of people leaving a platform does not have to be equal to the number of newly arriving
people. However, in this case study, we will assume that each arriving person always
replaces a leaving person, and so the number of people in a helicopter is the same
throughout a flight. A crew exchange is the exchange of an arriving person for a leaving person.
In the present study, the transportation is carried out by helicopters with 27 seats; the rental
price is about $6,000 per week. We assume that there are enough helicopters available to
carry out all demanded crew exchanges.
The helicopters are not allowed to fly at night, and the amount of fuel that can be carried
limits the range of the helicopter. It is assumed that the range is large enough to reach any
platform and to return to the airport. One major difference from the usual VRP is that the
demanded crew exchanges of a platform need not be carried out by one helicopter. This
is an example of the so-called split delivery VRP, in which the demand of one customer can
be delivered in multiple parts. VRPs face the same computational difficulty as the traveling
salesman problem (see Section 4.8), and so optimal solutions are usually not obtainable
within a reasonable amount of time. We will design a linear optimization model that can be
solved by means of so-called column generation; the obtained solution is then rounded off
to an integer solution, which is in general not optimal but which may be good enough for
practical purposes.
Informally, this inequality states that flying straight from A to C is shorter (or, at least, not
longer) than flying from A to B , and then to C . Clearly, this inequality holds in the case
of helicopter routing.
The helicopters have a capacity C , which is the number of seats (excluding the pilot) in the
helicopter, and a range R, which is the maximum distance that the helicopter can travel in
one flight. We assume that all helicopters have the same capacity and the same range. In
the case study, for the capacity C of all helicopters we take C = 23, so that four seats are
always left available in case of emergencies or for small cargo. For the helicopter range, we
take R = 200 (×4.5 km).
A flight f is defined as a vector wf = [w1f . . . wNf]T, where the value of wif indicates
the number of crew exchanges performed at platform Pi during flight f.
Let F = {f1 , . . . , fF } be the set of all feasible flights. Clearly, for a set of locations
together with the airport there are usually several possibilities of feasible flights, and even
several possibilities of shortest feasible flights. In fact, the number F of feasible flights grows
exponentially with the number of platform locations, and the range and capacity of the
helicopter (see Exercise 17.10.2).
Let j ∈ {1, . . . , F }. With slight abuse of notation, we will write wij instead of wifj for
each platform Pi and each flight fj . Similarly, we will write dj instead of dfj .
A feasible flight schedule is a finite set S of feasible flights such that the demand for crew
exchanges on all platforms is satisfied by the flights in S . To be precise, S is a feasible flight
schedule if:
$$\sum_{f_j \in S} w_{ij} = D_i \qquad \text{for } i = 1, \ldots, N.$$
It may be necessary to use the same flight multiple times, so that it is possible that the same
flight fj appears multiple times in S . Such a set in which elements are allowed to appear
multiple times is also called a multiset.
Example 17.3.1. Let P = {P1 , P2 , P3 , P4 }, W = {15, 9, 3, 8}, and C = 10. Let,
S = {f1 , f2 }, where flight f1 visits the platforms P1 (10), P2 (2), P3 (3), and f2 visits the
platforms P1 (5), P2 (7), P4 (8). The numbers in parentheses refer to the number of crew exchanges
during that flight. That is, we have w1 = [w11 w21 w31 w41]T = [10 2 3 0]T and w2 =
[w12 w22 w32 w42]T = [5 7 0 8]T. Then S is a feasible flight schedule. Notice that F is
much larger than S , because, for example, it contains all flights consisting of the same platforms as f1 ,
but with smaller numbers of crew exchanges. It can be calculated that, in this case, F = |F| = 785,
assuming there is no range restriction, i.e., R = ∞ (see also Exercise 17.10.2).
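The count |F| = 785 can be checked by brute-force enumeration. The convention below for what counts as a feasible flight (any integer exchange vector w with 0 ≤ wi ≤ Di and w1 + . . . + wN ≤ C, the zero vector included) is our assumption; with R = ∞ there is no range restriction to check, and this convention reproduces the stated count:

```python
from itertools import product

C = 10              # helicopter capacity in Example 17.3.1
D = [15, 9, 3, 8]   # demanded crew exchanges per platform

# Enumerate all exchange vectors w with 0 <= w_i <= D_i and sum(w) <= C
# (assumption: the zero vector counts as a flight as well).
flights = [w for w in product(*(range(d + 1) for d in D)) if sum(w) <= C]
print(len(flights))   # 785
```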
It is assumed that the cost of operating a helicopter is a linear function of the total traveled
distance. Therefore, we are looking to solve the problem of finding a feasible flight schedule
which minimizes the total traveled distance of the flights.
The number of demanded crew exchanges on a certain platform may exceed the capacity of
the helicopter. In that case a helicopter could visit such a platform as many times as needed
with all seats occupied until the demand is less than the capacity. However, in general, this
strategy does not lead to an optimal feasible flight schedule, since the helicopters have a
limited range. This will become clear from the following simple example.
Example 17.3.2. Consider three platforms: P1 with demand 1, P2 with demand 18, and P3
with demand 1; see Figure 17.3. Suppose that the capacity of the helicopter is 10. If the helicopter first
flies to P2 and performs 10 crew exchanges there, this leaves 8 crew members to be exchanged at P2 ,
and 1 on each of P1 and P3. Thus, the flight schedule becomes (see the left diagram in Figure 17.3):
Airport → P2 (10) → Airport,
Airport → P3 (1) → P2 (8) → P1 (1) → Airport.
The numbers given in parentheses are the numbers of crew exchanges that are performed at the corre-
sponding platforms. Now suppose that the pairwise distances between each of P1 , P2 , P3 , and the
airport is 1, and the range of the helicopter is 3. Then, the flight Airport → P3 → P2 → P1 →
Airport exceeds the range of the helicopter. Therefore, the helicopter needs multiple flights to exchange
the crews remaining after the flight Airport → P2 → Airport. The flight schedule becomes (see the
middle diagram in Figure 17.3):
Figure 17.3: The limited range of the helicopters. The numbers next to the nodes are the demands. The
different flights are drawn using the solid, dashed, and dotted arrows. The number of crew
exchanges are next to each arc.
Because of the computational difficulty of the off-shore transportation problem (and VRPs
in general), we cannot hope to design an algorithm that solves the problem within a reason-
able amount of time. We therefore resort to a heuristic algorithm that (hopefully) produces a
solution whose objective value is reasonably close to the optimal objective value, and which
runs in an acceptable amount of computing time. In the remainder of this chapter, we
will formulate an integer linear optimization model that solves the off-shore transportation
problem. We will describe an algorithm to solve the LO-relaxation of this model, and show
how an optimal solution of the LO-relaxation can be rounded off to find a feasible solution
for the off-shore transportation problem. We will then show some computational results to
assess the quality of the solutions produced by this procedure.
The objective function is the sum of dj xj over all feasible flights fj ∈ F, where dj is the
length of flight fj and xj is the number of times it is carried out. The total number
of passengers with destination platform Pi on all helicopter flights in the schedule has to
be equal to the number Di of demanded crew exchanges of platform Pi . The technology
matrix W = {wij } of the model has N rows and F columns; one can easily check that
W contains the identity matrix IN , and hence it has full row rank.
Although the formulation (FF) may look harmless, it is in fact a hard model to solve. Not
only are we dealing with an ILO-model, but since the value of F is in general very large,
the number of columns of the matrix W is very large. In fact, W may be too large to
fit in the memory of a computer. This means that the usual simplex algorithm cannot be
applied to solve even the LO-relaxation of model (FF) in which the variables do not need
to be integer-valued. Even if the matrix W fits in the memory of a computer, finding an
entering column for the current basis may take an exorbitant amount of computer time.
We will take the following approach to solving model (FF). We will first show how to solve
the relaxation of model (FF), which we denote by (RFF), using a column generation procedure,
as in Chapter 16. This procedure starts with a feasible basic solution of (RFF) and in each
iteration determines an ‘entering column’ without explicitly knowing the matrix W. In
Section 17.7 a round-off procedure is formulated, which derives integer solutions from the
solutions of model (RFF).
We will explain the method by means of model (RFF). For this model, finding an initial
feasible basic solution is straightforward. Indeed, an initial feasible basic solution requires N
basic variables; for i = 1, . . . , N , we just take the flight fji that performs min{C, Di } crew
exchanges at platform Pi and immediately flies back to the airport. Let W(1) be the basis
matrix corresponding to this feasible basic solution. Note that W(1) is a diagonal matrix
with min{C, Di } (i = 1, . . . , N ) as the diagonal entries, and hence it is invertible. Note
also that W(1) is a submatrix of W = {wij }.
Next consider the k ’th iteration step of the procedure (k = 1, 2, . . .). At the beginning of
the k ’th iteration, we have a submatrix W(k) of W, which is – during the next iteration
step – augmented with a new column of W. This entering column is generated by means
of solving a knapsack problem plus a traveling salesman problem.
Let B be an optimal basis matrix of W(k) in model (RFF), with W replaced by W(k) ,
and let y = (B−1 )T dBI be an optimal dual solution corresponding to B of this model;
see Theorem 4.2.4. Notice that B is also a basis matrix for model (RFF). Therefore, we
may write, as usual, W ≡ [B N]. Now consider any nonbasic variable xj (corresponding
to flight fj). The current objective coefficient corresponding to xj is $d_j - \sum_{i=1}^{N} w_{ij} y_i$;
see Exercise 17.10.1. Suppose that we can determine a column with the smallest current
objective coefficient, c∗ . If c∗ < 0, then we have determined a column corresponding to a
variable xj that has a negative current objective coefficient and we augment W(k) with this
column. If, however c∗ ≥ 0, then we conclude that no such column exists, and hence an
optimal solution has been found. We can indeed determine such a column with the smallest
current objective coefficient, by solving the following model:
$$
\begin{aligned}
c^* = \min\;& d_{S(w)} - \sum_{i=1}^{N} y_i w_i \\
\text{s.t.}\;& \sum_{i=1}^{N} w_i \le C & (\mathrm{CG}) \\
& w_i \le D_i & \text{for } i = 1, \ldots, N \\
& d_{S(w)} \le R, \\
& w_i \ge 0 \text{ and integer} & \text{for } i = 1, \ldots, N,
\end{aligned}
$$
where S(w) = {i | wi > 0} and dS(w) is the length of a shortest route of a flight visit-
ing platforms Pi in S(w), starting and ending at the airport. In this model, the decision
variables are the wi ’s (i.e., the number of crew exchanges to be performed at each platform
that is visited). Recall that the yi ’s are known optimal dual values obtained from the basis
matrix B, and they are therefore parameters of model (CG). This model is a well-defined
optimization model, but it is by no means a linear optimization model or even an integer
linear optimization model. In fact, model (CG) is a nonlinear model, since it contains the
term dS(w), which is defined as the shortest total traveled distance of a flight from the airport
to all platforms in S(w) and back to the airport. This means that dS(w) is the optimal objective
value of a traveling salesman problem.
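For small sets S(w), the value dS(w) can be computed exactly by enumerating all visiting orders. A brute-force sketch (our own helper, using Euclidean distances between coordinate pairs):

```python
from itertools import permutations
from math import dist, inf

def shortest_flight_length(airport, platforms):
    """d_S(w): length of a shortest tour that starts at the airport, visits
    every given platform once, and returns to the airport. Brute force over
    all visiting orders; fine for the small sets S(w) that occur here."""
    best = inf
    for order in permutations(platforms):
        stops = [airport, *order, airport]
        best = min(best, sum(dist(a, b) for a, b in zip(stops, stops[1:])))
    return best
```

For |S(w)| = n this examines n! orders (n!/2 up to reversal of the tour); for larger sets, more advanced traveling salesman techniques are needed, as discussed later in the chapter.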
Assume for the time being that we can solve model (CG). The model determines a column
[w1* . . . wN*]T of W. The optimal objective value c* is the current objective coefficient
of this column. If c* < 0, then this vector is not a column of W(k), because otherwise B
would have been an optimal basis matrix in model (RFF) with W replaced by W(k). This
means that we find a column of W that is not in W(k). We therefore construct W(k+1) by
augmenting W(k) with the column vector [w1* . . . wN*]T. Notice that if the corresponding
column index (i.e., the flight index) is denoted by j*, then wij* = wi* for i = 1, . . . , N.
In model (CG), the minimum of all current objective coefficients is determined, including
the ones that correspond to basic variables. Since the current objective coefficients corre-
sponding to the basic variables all have value zero, the fact that the minimum over all
current objective coefficients is determined does not influence the solution. Recall that, if
all current objective coefficients $d_j - \sum_{i=1}^{N} w_{ij} y_i$ in the current simplex tableau are nonnega-
tive, which means that the optimal objective value of (CG) is nonnegative, then an optimal
solution of (RFF) has been reached. Therefore, if c* ≥ 0, then no column with a negative
current objective coefficient exists, and the current feasible basic solution is optimal.
The question that remains to be answered is: how should model (CG) be solved? Before
we can solve model (CG), note that the model can be thought of as a two-step optimization
model. In the first step, we consider all possible subsets S of P and compute, for every
such subset S, the corresponding optimal values of the wi's, and the corresponding optimal
objective value c(S). Next, we choose a set S for which the value of c(S) is minimized
(subject to the constraint dS ≤ R). To be precise, for each subset S of P, we define the
following model:
$$
\begin{aligned}
c(S) = d_S - \max\;& \sum_{i=1}^{N} y_i w_i \\
\text{s.t.}\;& \sum_{i=1}^{N} w_i \le C & (\mathrm{CG}_S) \\
& w_i \le D_i & \text{for } i = 1, \ldots, N \\
& w_i = 0 \text{ if } P_i \notin S, & \text{for } i = 1, \ldots, N \\
& w_i \ge 0 \text{ and integer} & \text{for } i = 1, \ldots, N.
\end{aligned}
$$
Now, (CG) is equivalent to solving the following model:
$$c^* = \min_{\emptyset \ne S \subseteq \mathcal{P}} c(S) \quad \text{s.t.} \quad d_S \le R, \qquad (\mathrm{CG}^*)$$
where the minimum is taken over all nonempty subsets S of P . In order to solve (CG∗ ), we
may just enumerate all nonempty subsets of P , solve model (CGS ) for each S , and choose,
among all sets S that satisfy dS ≤ R, the one that minimizes c(S). In fact, it turns out that
we do not need to consider all subsets S . Thus, to solve model (CG), we distinguish the
following procedures:
(A) For each nonempty subset S of the set P of platforms, we formulate and solve a
traveling salesman problem and a knapsack problem that solves model (CGS ).
(B) We design a procedure for generating subsets S of P , while discarding – as much as
possible – subsets S that only delay the solution procedure.
We will discuss (A) and (B) in the next two subsections.
An optimal solution wi∗ (i ∈ S ) of (KPS ) can easily be calculated in the following way.
First, sort the values of the yi ’s with i ∈ S in nonincreasing order: say yj1 ≥ yj2 ≥ . . . ≥
yjN ≥ 0. Then, take wj1 = min{C, Dj1 }. If C ≤ Dj1 , then take wj1 = C , and
wj2 = . . . = wjN = 0. Otherwise, take wj1 = Dj1 and wj2 = min{C − Dj1 , Dj2 },
and proceed in a way similar to the calculation of wj1 . In terms of a knapsack problem,
this means that the knapsack is filled with as much as possible of the remaining item with
the highest profit. See also Section 7.2.3. Note that this optimal solution of (KPS ) is
integer-valued.
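The greedy rule just described is only a few lines of code. A sketch (the dictionary representation of y, D, and S is our own choice); greedy is exact for (KPS) because every crew exchange occupies exactly one seat, so the platforms with the highest dual values should be served first:

```python
def solve_kps(y, D, C, S):
    """Greedy optimum of (KP_S): maximize sum(y_i * w_i) subject to
    sum(w_i) <= C, 0 <= w_i <= D_i, and w_i = 0 for platforms outside S."""
    w = {i: 0 for i in S}
    room = C                                   # seats still available
    for i in sorted(S, key=lambda i: y[i], reverse=True):
        w[i] = min(room, D[i])                 # fill with the most profitable item
        room -= w[i]
    return w
```

For example, with duals y = {1: 0.5, 2: 0.3, 3: 0.2}, demands D = {1: 10, 2: 8, 3: 30}, and C = 23, the greedy rule exchanges 10 crews at platform 1, 8 at platform 2, and the remaining 5 at platform 3.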
Model (CG) can be solved by individually considering each possible subset S , and solving for
this fixed S the knapsack problem (KPS ) (and therefore model (CGS )). This is certainly a
time-consuming procedure, since the number of nonempty subsets S of P is 2^N − 1. Since
N = 51 in the present case study, this amounts to roughly 2.2 × 10^15 subsets. In Section
17.5.2, we will present a cleverer method that excludes a large number of subsets S
from consideration when solving (KPS ).
C , then all lex-supersets of S can be excluded from consideration. The reason is that,
when adding a new platform, say Pt , to the current S , then the optimal value of wt is
zero, because the value of yt is less than the values of all yi ’s with Pi ∈ S . Hence, Pt is
not visited during the flight corresponding to S ∪ {t}. Excluding all lex-supersets
of S has the advantage that only platform subsets are considered for
which the optimal values of the wi's in (KPS ) are strictly positive, and so the traveling
salesman tour for such a set does not visit platforms that have zero demand for crew
exchanges.
If c(S) ≥ 0 for all S , then an optimal solution of model (RFF) with technology matrix
W(k) is also an optimal solution of the original model (RFF). It turns out that the calcu-
lations can be sped up by augmenting the current matrix W(k) with several new
columns for which c(S) < 0 at once.
The solution procedure of model (RFF) can now be formulated as follows. For the subma-
trices W(k) of W (k = 1, 2, . . .), define Jk as the set of flight indices corresponding to
the columns of W(k) .
Input: Models (RFF) and (CG); the coordinates of the platforms, the vector of
demanded numbers of crew exchanges [D1 . . . DN]T, the range R, and the capacity
C.
Output: An optimal solution of model (RFF).
I Step 0: Initialization.
Let W(1) = AW , with AW = {aij } the diagonal matrix with aii =
min{Di , C}. Calculate df for each f ∈ J1 . Go to Step 1.
I Step 1: Simplex algorithm step.
Let W(k) = {wij(k)}, k ≥ 1, be the current 'flight' matrix (a submatrix of
the virtually known matrix W). Solve the LO-model:
$$
\begin{aligned}
\min\;& \sum_{j \in J_k} d_j x_j \\
\text{s.t.}\;& \sum_{j \in J_k} w_{ij}^{(k)} x_j = D_i & \text{for } i = 1, \ldots, N \qquad (\mathrm{FF}_k) \\
& x_j \ge 0 & \text{for } j \in J_k.
\end{aligned}
$$
Let y(k) = [y1(k) . . . yN(k)]T be an optimal dual solution corresponding
to the current optimal basis matrix of model (FFk). Go to Step 2.
I Step 2: Column generation step.
Label the platforms 1, . . . , N according to nonincreasing values of
yi(k). Using this labeling, order all platform subsets S according to the
lexicographical ordering.
Determine the optimal objective value $c(S^*) = d_{S^*} - \sum_{i \in S^*} y_i^{(k)} w_i^*$ of
$W^{(k+1)} \equiv [W^{(k)}\; w^*]$,
Number of platforms: 51
Coordinates of the platforms: see Table 17.1
Capacity of the helicopter: 23
Range of the helicopter: 200 (units of 4.5 km)
Table 17.2: Demanded crew exchanges. (# = Platform index, Exch. = Number of demanded crew
exchanges)
Note that this algorithm always returns an optimal solution, because a feasible solution
is constructed in Step 0 (meaning that the model is feasible), and the model cannot be
unbounded because the optimal objective value cannot be negative.
It has already been mentioned that the traveling salesman problem in Step 2 can be solved
by considering all tours and picking a shortest one. If the sets S become large, then more
advanced techniques are needed, since the number of tours becomes too large. Recall that
if |S| = n, the number of tours is n!/2. Also, recall that the calculations are sped up
when, in each iteration, multiple columns for which c(S) < 0 are added to W(k).
 j     xj*      Platform visits    Crew exchanges    Distance (×4.5 km)
 1*    1.000    2                  20                 72.111
2 0.172 3, 40, 7, 6 3, 8, 8, 4 104.628
3 0.250 4, 42 12, 11 96.862
4 1.000 4, 5 9, 14 110.776
5 0.182 8, 30 4, 19 132.858
6 0.909 8, 30 8, 15 132.858
7 0.828 3, 42, 40 3, 16, 4 89.111
8 1.000 9, 35, 10 1, 14, 8 138.846
9 1.261 9 23 127.906
10 0.217 11, 49 8, 15 162.138
11 0.783 11, 44, 45 8, 5, 10 168.380
12 0.739 12, 15, 13 8, 7, 8 175.034
13 0.298 13, 14, 16 7, 8, 8 182.152
14 0.298 14, 16, 12 8, 8, 7 182.398
15 0.404 14, 16, 15 8, 8, 7 182.033
16 1.000 17, 19 10, 13 152.101
17 1.000 18, 20, 19 10, 12, 1 162.182
18 1.348 19 23 151.921
19 1.304 21 23 159.399
20 0.800 24, 31 10, 13 136.041
21 0.200 24, 26, 28 10, 9, 4 138.105
22 1.000 26, 27, 25 5, 6, 12 145.194
23 0.550 26, 29, 28 4, 15, 4 140.102
24 0.250 28, 32, 31 4, 14, 5 141.355
25 0.750 29, 32 9, 14 142.441
26 0.272 30, 31 7, 16 134.765
27 0.522 33, 34, 36 7, 11, 5 152.756
28 0.217 33, 34 12, 11 151.024
29 0.478 33, 34, 36 12, 6, 5 152.765
30 1.000 37, 39, 23, 22 5, 13, 2, 3 184.442
31 0.828 40, 7, 6 11, 8, 4 104.145
32 1.139 40 23 83.762
33 1.739 41 23 88.204
34 1.739 43 23 137.463
35 0.217 44, 45 5, 18 167.805
36 0.435 45, 50 19, 4 167.328
37 0.261 46, 47 18, 5 150.999
38 0.261 46, 48 19, 4 156.280
39 0.739 48, 46, 47 4, 14, 5 156.627
40 0.435 49, 51 19, 4 165.516
41 0.565 49, 51, 50 15, 4, 4 168.648
The total traveled distance is 3,903.62 (×4.5 km)
Table 17.3: Noninteger optimal solution of model (RFF). Flight 1 (marked with an asterisk) is the only
one that does not use the full capacity of the helicopter.
Table 17.4: Dual optimal values (# = Platform number, Dual = dual value).
visited during the corresponding flight, while the fourth column contains the performed
crew exchanges. The last column contains the total traveled distance during that flight.
Platform 4, for instance, is visited by flight f = 3 and by flight f = 4; the demanded
number of crew exchanges satisfies 12x*_3 + 9x*_4 = 12 (see Table 17.2).
Table 17.4 lists the values of our optimal dual solution. A dual value gives an indication of
the price in distance units per crew exchange on the corresponding platform. For instance,
the crew exchanges for platform 22 have the highest optimal dual value, namely 11.377
km per crew exchange. This means that a crew exchange for platform 22 is relatively the
most expensive one. This is not very surprising, since platform 22 is far from the airport,
and the number of demanded crew exchanges is relatively low (namely, three); compare in
this respect flight number 7 in Table 17.3. Note that the platforms 1 and 38 do not demand
crew exchanges.
The optimal dual values of Table 17.4 give only an indication for the relative ‘price’ per crew
exchange, because they refer to the LO-relaxation, and not to an optimal integer solution
of the model.
Input: The coordinates of the platforms, the demanded crew exchanges, the heli-
copter range and the capacity.
Output: A feasible flight schedule.
The values of the right hand side vector d = (D1 … DN)ᵀ of (RFF) are
updated in each iteration. Let (RFFd) denote the model with right hand
side values given by d.
▶ Step 1: Solve model (RFFd) by means of the ‘simplex algorithm with column gen-
eration’. If the solution of (RFFd) is integer-valued, then stop. Otherwise,
go to Step 2.
▶ Step 2: Choose, arbitrarily, a variable of (RFFd) with a positive optimal value, say
xj > 0.
If xj < 1, then set xj := ⌈xj⌉ = 1; if xj > 1, then set xj := ⌊xj⌋.
The flight fj is carried out either ⌈xj⌉ or ⌊xj⌋ times, and the current
right hand side values Di are updated by subtracting the number of crew
exchanges carried out by flight fj . (The new LO-model now contains a
number of Di ’s whose values are smaller than in the previous model.)
If the new right hand side values Di are all 0, then stop; all crew exchanges
are performed. Otherwise, remove all columns from W that do not corre-
spond to feasible flights anymore; moreover, extend W with the diagonal
matrix AW with aii = min{Di , C} on its main diagonal.
Return to Step 1.
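The Step 2 update can be sketched in Python as follows (hypothetical flight data; the LP solve of Step 1 is omitted):

```python
import math

def round_off_step(x_j, exchanges, demand):
    """Round the chosen fractional variable x_j and subtract the crew
    exchanges performed by flight f_j from the right-hand side values D_i."""
    times = math.ceil(x_j) if x_j < 1 else math.floor(x_j)
    for platform, e in exchanges.items():
        demand[platform] = max(0, demand[platform] - times * e)
    return times, demand

# Hypothetical flight serving platforms 4 and 5 with x_j = 0.25.
times, demand = round_off_step(0.25, {4: 12, 5: 9}, {4: 12, 5: 14, 8: 23})
print(times, demand)  # 1 {4: 0, 5: 5, 8: 23}
```

The `max(0, …)` guard keeps the updated demands nonnegative, matching the remark that the new model contains Di values smaller than before.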
Table 17.5 lists the results of the round-off algorithm. For instance, flight f1 in Table 17.5
(this is flight 27 in Table 17.3) is Airport → P33 (7) → P34 (11) → P36 (5) → Airport; the
distance traveled is 152.7564 (×4.5 km).
Table 17.5: A feasible flight schedule. The flights marked with an asterisk do not use the full capacity of
the helicopter.
The flight schedule in Table 17.5 is feasible: no flight distance exceeds the range of the heli-
copters and its capacities. Moreover, the capacity of the helicopters is not always completely
used. For instance, flight 4 visits only platform 2, where all 20 demanded crew exchanges
are performed, so three seats are not used.
Table 17.6: Comparing fifteen input instances for the round-off algorithm. Each instance was solved
sixteen times using the column generation method and the round-off procedure. The lower
bound is given by the optimal objective value of the LO-relaxation (RFF). The column ‘Dev.’
contains the relative deviation from the lower bound.
exchanges. We do not specify these instances here, but they are available online (see http:
//www.lio.yoriz.co.uk/). The C++ code that we used is also available on that website.
The helicopter range remains 200, and the capacity 23 seats. The results are listed in Table
17.6. The first column contains the labels of the 15 input instances. The second column
contains the lower bounds, which are the optimal solutions of model (RFF) in combination
with model (CG). For each instance we have applied the round-off algorithm sixteen times.
Columns three and four contain the best results (‘Best’) and the corresponding deviations
(in %) from the lower bound (‘Dev.’). Similarly, the fifth and the sixth columns contain
the average optimal objective values (‘Average’) and the corresponding deviations from the
lower bound (‘Dev.’). Finally, the seventh and the eighth columns contain the worst of the
sixteen results (‘Worst’) and the corresponding deviations from the lower bound (‘Dev.’).
It can be seen from Table 17.6 that the average integer solution deviates by about 4 to 6%
from the noninteger lower bound, while all integer solutions are within 11% of this
lower bound. On the other hand, the gap between the worst and the best solution is at most
7%. The conclusion is that the method developed in this section generates solutions that
are practically useful, in spite of the fact that they are usually not optimal. The fact that
the column generation procedure and the round-off procedure require a lot of computer
time makes the method suitable only when there is enough time available. In the case of
short term planning, caused for instance by sudden changes in the crew exchange demand,
a faster method is needed, and it may be necessary to resort to heuristics.
Figure 17.4: Sensitivity with respect to the helicopter capacity C, for C = 10, …, 30.
(a) The objective value z*(C): round-off solution versus the LO-relaxation.
(b) The number of flights nf(C): round-off solution versus the lower bound 650/C.
Notice that choosing C = R = ∞ gives a lower bound on the optimal objective value
z*(·). This case, however, is equivalent to finding an optimal traveling salesman tour through
all platforms, starting and finishing at the airport. We have used a TSP solver to compute
such an optimal tour. This tour, which has length 334.7 (×4.5 km), is shown in Figure
17.5. Since this is a lower bound on the optimal objective value z*(C), we have that
z*(C) ≥ 334.7 for all C. This means that relationship (17.1) is not valid for large values of
C.
17.9. Sensitivity analysis
Figure 17.5: An optimal traveling salesman tour that visits each platform exactly once,
starting and finishing at the airport.
Figure 17.6: Sensitivity with respect to the helicopter range R.
We are also interested in how the number of flights that need to be carried out depends on
C. Figure 17.4(b) shows this dependence. The total demand is 650. Since each flight has
capacity 23, this means that at least ⌈650/23⌉ = 29 flights need to be carried out. Note that
the solution given in Table 17.5 consists of exactly 29 flights. In general, a lower bound on
the number of flights to be carried out is ⌈650/C⌉. The dotted line in Figure 17.4(b) shows
this lower bound. It is clear from the figure that, for this particular instance, the number of
flights is usually close to the lower bound. Thus, the capacity of the helicopter is crucial for
the optimal objective value.
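This lower bound on the number of flights is easily tabulated (a quick sketch):

```python
import math

total_demand = 650  # total number of demanded crew exchanges
for capacity in (10, 15, 20, 23, 25, 30):
    # A helicopter with `capacity` seats needs at least ceil(650/capacity)
    # flights to perform all crew exchanges.
    print(capacity, math.ceil(total_demand / capacity))
```

For the actual capacity of 23 seats this yields the bound of 29 flights mentioned above.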
Now consider the sensitivity of the optimal solution with respect to the range of the heli-
copters. We first consider increasing the value of the range R, starting at R = 200. Let
z*(R) denote the optimal objective value of the LO-relaxation (RFF) when setting the heli-
copter range to R. When increasing the value of R, the set of all possible flights F becomes
larger. So, we expect that z*(R) is nonincreasing in R. Our calculations show that, in fact,
z*(R) is constant for R ≥ 200, i.e., the optimal objective value of model (RFF) does not
change when increasing the value of R. Because round-off solutions for (FF) depend on
the optimal solution of (RFF), this means that increasing the value of R has no effect on
the round-off solutions of model (FF) either. This makes sense, because the flights
that are carried out in the solutions are all significantly shorter than 200 (×4.5 km); see
Table 17.3 and Table 17.5. This indicates that most flights are individually restricted not by
the limited range of the helicopters, but by their limited capacity. Therefore, it makes sense
that increasing the helicopter range will not enable us to choose longer flights, and hence
increasing the value of R has no impact on the schedule.
Next, we consider decreasing the value of the range R, again starting at R = 200. Note
that platform P22 has coordinates (75, 51), which means that the distance from the airport
to this platform is √(75² + 51²) ≈ 90.697 (×4.5 km). Since every flight that carries out
a crew exchange at platform P22 has to travel at least twice this distance, it follows that for
R < 2 × √(75² + 51²) ≈ 181.395 the model is infeasible. On the other hand, consider the
solution of the LO-relaxation given in Table 17.3. From this table, we see that the longest
flight is flight f30 , which has distance 184.442 (×4.5 km). Therefore, this solution is an
optimal solution for model (RFF) for all values of R with 184.442 ≤ R ≤ 200. This
means that decreasing the value of R, starting from 200, has no impact on the schedule,
except when 181.395 ≤ R ≤ 184.442. Figure 17.6 shows how the optimal objective
value depends on the value of R in this range. For 181.395 ≤ R ≤ 181.51, there is
a sharp drop of the optimal objective value, starting at z*(181.395) = 4,022.62, down
to z*(181.51) = 3,934.92. For R ≥ 181.51, the value of z*(R) slowly decreases to
3,903.62.
17.10 Exercises
Exercise 17.10.1. Consider the relaxation (RFF) of model (FF) in Section 17.4. Show
that d_j − Σ_{i=1}^{N} w_ij y_i is the j’th current nonbasic objective coefficient corresponding to
some basis matrix B in W, with y = (B⁻¹)ᵀ d_BI and W ≡ (B N).
Exercise 17.10.2. Show that the number F of feasible flights may grow exponentially
with each of the following parameters individually:
(a) The number of platform locations.
(b) The helicopter range R.
(c) The helicopter capacity C .
Exercise 17.10.3. Consider the column generation procedure described in Section 17.5.
Let (RFFk) be model (RFF), but with W replaced by W(k) . Show that any optimal basis
matrix of model (RFFk) is an optimal basis matrix of model (RFF).
Exercise 17.10.4. Calculate the percentage of platform subsets that is excluded by the
procedures (B1) and (B2) in Section 17.5.2.
Exercise 17.10.5. Apply the following heuristics to determine feasible flight schedules for
the input data of Section 17.1.
(a) Nearest neighbor heuristic:
Start at the airport and fly to the platform that is closest to the airport. Then go to
the next closest platform, and continue until either the capacity or the range of the
helicopter has been reached (taking into account that the helicopter has to return to the
airport). The heuristic stops when all demanded crew exchanges have been performed.
Overview
The catering service problem arises when the services of a caterer are to be
scheduled. It is based on a classical paper by William Prager (1903–1980); see also Prager
(1956). The version discussed in this section will be formulated as a transshipment model
(see Section 8.3.1). It will be solved by means of the network simplex algorithm.
Chapter 18. The catering service problem
the laundry. At the end of each day, the company has to make four decisions concerning
the stock of dirty napkins:
▶ How many dirty napkins should be sent to the slow laundry service?
▶ How many dirty napkins should be sent to the fast laundry service?
▶ How many dirty napkins should be carried over to the next day?
▶ How many new napkins should be purchased?
The objective is to find a sequence of decisions such that the total cost is minimized. We
assume that the caterer has no napkins (clean or dirty) on the first day of the planning
period. Moreover, all demanded napkins can be purchased. A feasible sequence of decisions
is summarized in Table 18.2.
The figures in the ‘Laundered’ column of Table 18.2 refer to the number of napkins returning
from the laundry. For example, on day 5 there are four napkins returning from the laundry;
they were sent to the slow laundry service on day 1 (see column ‘Slow’). The fourteen
napkins that were sent to the fast laundry service on day 3 (see column ‘Fast’) also return on
day 5. Hence, there are eighteen napkins returning from the laundry on day 5. The column
‘Purchased’ indicates the number of purchased napkins, while the ‘Carry over’ column shows
how many dirty napkins are carried over to the next day. For example, at the beginning
of day 5 the company has a stock of seven dirty napkins since they were held on day 4;
during that day, eighteen napkins are used. So, at the end of day 5, the company has twenty-
five dirty napkins. The ‘Fast’ column shows that ten of these dirty napkins are sent to the
fast laundry service, while the ‘Carry over’ column indicates that fifteen dirty napkins are
carried over to the next day.
The cost associated with the decision sequence of Table 18.2 is $3 × 44 = $132 for buying
new napkins, $0.75 × 71 = $53.25 for fast cleaning, and $0.50 × 9 = $4.50 for slow
cleaning. So the total cost is $189.75. Note that at the end of the planning period the
caterer has 44 dirty napkins.
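These figures can be checked mechanically (totals taken from the text above):

```python
# Totals of the decision sequence of Table 18.2: 44 napkins purchased,
# 71 sent to the fast laundry, and 9 sent to the slow laundry.
purchased, fast, slow = 44, 71, 9
total = 3.00 * purchased + 0.75 * fast + 0.50 * slow
print(total)  # 189.75
```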
18.2. The transshipment problem formulation
Furthermore, define the set {P0 , . . . , P7 } of supply nodes or sources of napkins; these nodes
have the property that the number of ‘leaving’ napkins exceeds the number of ‘entering’
napkins. We define P0 as the source where enough new napkins, say 125, are available
during the whole planning period; these napkins can be supplied by P0 any day of the
planning period. For j ∈ {1, . . . , 7}, Pj may be thought of as the basket containing
the dirty napkins at the end of day j , and P0 as the company where the new napkins are
purchased. When the caterer decides to send fj napkins to the fast laundry service on day
j , then we can think of it as source Pj supplying these fj napkins as clean napkins on day
j + 2; when on day j the caterer sends sj napkins to the slow laundry service, then Pj
supplies sj clean napkins on day j + 4. Moreover, the caterer can decide to carry over hj
napkins to the next day; source Pj then transfers dirty napkins to Pj+1 . So for example, the
source P1 can supply clean napkins either on day 3 (if fast service is used), or on day 5 (if
slow service is used), or can transfer dirty napkins to P2 (if some dirty napkins are carried
over to the next day).
In addition to the sources, we define the set {Q0 , Q1 , . . . , Q7 } of demand nodes or sinks.
Sinks are defined as nodes where the number of ‘entering’ napkins exceeds the number of
‘leaving’ napkins. For j = 1, . . . , 7, let Qj denote the sink where the napkins are demanded
for day j . For reasons that will become clear later on, we introduce the sink Q0 . At the end
of the planning period all napkins in the network are transported to Q0 at zero cost.
We will now present a transshipment problem (see Section 8.3.1) formulation of the catering
service problem. Let G = (V, A) be the network with node set V = {P0 , P1 , . . . , P7 ,
Q0 , Q1 , . . . , Q7 } (the Pi ’s are the sources and the Qi ’s the sinks), and arc set A joining
pairs of nodes in V ; see Figure 18.1. Hence, |V| = 16, and |A| = 23.
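The counts |V| = 16 and |A| = 23 can be verified by generating the arc set programmatically (a sketch following the arc description above; the container names are our own):

```python
# Arcs of the catering network G = (V, A) for T = 7 days (as in Figure 18.1).
T = 7
sources = [f"P{j}" for j in range(T + 1)]
sinks = [f"Q{j}" for j in range(T + 1)]
arcs = {}
arcs[("P0", "Q0")] = ("p0", 0.0)                 # surplus napkins, free
for j in range(1, T + 1):
    arcs[("P0", f"Q{j}")] = (f"p{j}", 3.00)      # buy new napkins
for j in range(1, T - 1):                        # fast laundry takes 2 days
    arcs[(f"P{j}", f"Q{j + 2}")] = (f"f{j}", 0.75)
for j in range(1, T - 3):                        # slow laundry takes 4 days
    arcs[(f"P{j}", f"Q{j + 4}")] = (f"s{j}", 0.50)
for j in range(1, T):                            # carry over dirty napkins
    arcs[(f"P{j}", f"P{j + 1}")] = (f"h{j}", 0.0)
arcs[(f"P{T}", "Q0")] = (f"h{T}", 0.0)           # end-of-horizon arc
print(len(sources + sinks), len(arcs))  # 16 23
```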
Figure 18.1: The network of the catering service problem. The source P0 (supply 125) is
joined to each sink Qj by an arc carrying the flow pj; the arc p0 (into Q0) has cost [0],
and the arcs p1, …, p7 have cost [3]. Each source Pj (with supplies 23, 14, 19, 21, 18, 14,
15 for j = 1, …, 7) is joined to Qj+2 by an arc fj with cost [0.75] (fast laundry, j ≤ 5),
to Qj+4 by an arc sj with cost [0.50] (slow laundry, j ≤ 3), and to Pj+1 by an arc hj with
cost [0]; the arc h7 joins P7 to Q0. The sinks Q1, …, Q7 have demands 23, 14, 19, 21, 18,
14, 15, and Q0 has demand 125.
In Figure 18.1, the horizontal arcs correspond to fast laundry service: source Pj−2 can supply
to sink Qj (j = 2, . . . , 7). The slanted arcs to the right of the column of sinks represent
slow laundry services: source Pj−4 can supply to sink Qj (j = 4, . . . , 7). The vertical arcs,
between the sources, represent the option of carrying over dirty napkins to the next day. So,
for sink Qj the required napkins can be delivered by either source P0 (buying napkins), or
by source Pj−2 (dirty napkins cleaned by the fast laundry service), or by source Pj−4 (dirty
napkins cleaned by the slow laundry service).
The supply-demand vector b = (b0 … b15)ᵀ is defined as follows. Let i, j ∈ V. If node
i is a source, then define bi as the supply of napkins at node i; for node j being a sink,
define bj as the negative demand of napkins at node j. The introduction of the sink Q0
yields that Σ_{i∈V} bi = 0. Transportation at the end of day 7 to sink Q0 is free, and so
the introduction of this node does not increase the total cost. In Figure 18.1, the
supply-demand vector corresponding to the set of nodes {P0, P1, …, P7, Q0, Q1, …, Q7} is
b = (125, 23, 14, 19, 21, 18, 14, 15, −125, −23, −14, −19, −21, −18, −14, −15)ᵀ.
     p0  p1  p2  p3  p4  p5  p6  p7  f1  f2  f3  f4  f5  s1  s2  s3  h1  h2  h3  h4  h5  h6  h7
P0    1   1   1   1   1   1   1   1
P1                                    1                   1           1
P2                                        1                   1      -1   1
P3                                            1                   1      -1   1
P4                                                1                          -1   1
P5                                                    1                          -1   1
P6                                                                                   -1   1
P7                                                                                       -1   1
Q0   -1                                                                                      -1
Q1       -1
Q2           -1
Q3               -1                  -1
Q4                   -1                  -1
Q5                       -1                  -1           -1
Q6                           -1                  -1           -1
Q7                               -1                  -1           -1
Figure 18.2: The transshipment matrix A corresponding to the data of Table 18.1. (The zeros have been
omitted for readability.)
The first eight entries of b correspond to the sources, while the last eight entries of b
correspond to the sinks in the network of Figure 18.1.
The flow of napkins through G is represented by the variables pj , fj , sj , and hj . In Figure
18.1, these flow variables are written next to the arcs. Between square brackets are denoted
the costs of shipping one napkin through the corresponding arc. In the case of Figure 18.1,
we have that: the arcs p1, …, p7 each have cost $3, the arcs f1, …, f5 each have cost $0.75,
the arcs s1, s2, s3 each have cost $0.50, and the arcs p0 and h1, …, h7 each have cost $0.
The variables f6 , f7 , s4 , s5 , s6 , and s7 are not present in the vector x for reasons mentioned
below. Note that there is a difference between the graph of Figure 18.1 and the graph of
the transportation problem of Section 8.2.1. In a transportation problem, goods can only be
supplied by sources and delivered to sinks; there are no shipments of goods between sources
and between sinks, nor are there ‘intermediate’ nodes between sources and sinks. However,
in Figure 18.1, source P1 can, for instance, deliver to source P2 . The network presented in
Figure 18.1 is in fact the network of a transshipment problem; see Section 8.3.1. Note that
the network of Figure 18.1 has no intermediate or transshipment nodes.
The network of Figure 18.1 contains no arcs that correspond to sending dirty napkins to
the slow laundry service after day 3, and no arcs that correspond to sending dirty napkins to
the fast laundry service after day 5. The reason for this phenomenon is that napkins sent to
the slow laundry service after day 3 do not return within the planning period. Also napkins
sent to the fast laundry service after day 5 do not return within the planning period.
The transshipment matrix A corresponding to the data of Table 18.1 is shown in Figure
18.2. The ILO-model corresponding to the data of the catering service problem of Figure
18.1 can now be formulated as follows:
min 3p1 + 3p2 + 3p3 + 3p4 + 3p5 + 3p6 + 3p7 + 0.75f1 + 0.75f2
+ 0.75f3 + 0.75f4 + 0.75f5 + 0.50s1 + 0.50s2 + 0.50s3
s.t. p0 + p1 + p2 + p3 + p4 + p5 + p6 + p7 = 125
f1 + s1 + h1 = 23
f2 + s2 − h1 + h2 = 14
f3 + s3 − h2 + h3 = 19
f4 − h3 + h4 = 21
f5 − h4 + h5 = 18
−h5 + h6 = 14
−h6 + h7 = 15
−p0 − h7 = −125
−p1 = −23
−p2 = −14
−p3 − f1 = −19
−p4 − f2 = −21
−p5 − f3 − s1 = −18
−p6 − f4 − s2 = −14
−p7 − f5 − s3 = −15
p0 , p1 , . . . , p7 , f1 , . . . , f5 , s1 , s2 , s3 , h1 , . . . , h7 ≥ 0, and integer.
The technology matrix A is totally unimodular, and so – since b is an integer vector – this
model has an integer optimal solution; see Theorem 4.2.4. (Why is the model feasible?) In
the next section we will solve the model.
▶ Initialization. An initial feasible tree solution is easily found. Namely, buy all demanded
napkins, and keep the dirty ones until the end of the planning period. Obviously, this
solution is feasible, but not optimal. The corresponding values of the basic variables are:
p0 = 1 p1 = 23 p2 = 14 p3 = 19
p4 = 21 p5 = 18 p6 = 14 p7 = 15
h1 = 23 h2 = 37 h3 = 56 h4 = 77
h5 = 95 h6 = 109 h7 = 124.
The cost of this solution is 124 × $3 = $372. The spanning tree associated with this
feasible tree solution is shown in Figure 18.3.
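The stated values follow from accumulating the demands; a quick check in Python (demand figures from Table 18.1):

```python
demand = [23, 14, 19, 21, 18, 14, 15]   # demanded napkins on days 1..7
h = []                                   # h1..h7: dirty napkins carried over
stock = 0
for d in demand:
    stock += d
    h.append(stock)
p0 = 125 - h[-1]                         # surplus napkins shipped P0 -> Q0
cost = 3 * sum(demand)                   # every demanded napkin is purchased
print(h, p0, cost)  # [23, 37, 56, 77, 95, 109, 124] 1 372
```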
18.3. Applying the network simplex algorithm
▶ Iteration 1. Let (yP0, yP1, . . . , yP7, yQ0, yQ1, . . . , yQ7) be the vector of dual decision
variables. Select node P0 as the root of the spanning tree, and take yP0 = 0. The
remaining dual variables are determined according to the procedure formulated in Step 1
of the network simplex algorithm. Namely, first determine the values for the nodes for
which the path to the root consists of only one arc. These are the nodes Q0 , Q1 , . . . , Q7 .
From node Q0 we work our way up to node P1 , resulting in the following dual values:
All these costs are negative, and so we can arbitrarily select an entering arc with the
smallest value. We choose arc (P1 , Q5 ), corresponding to the flow variable s1 , as the
entering arc. The resulting cycle is Q5 -P0 -Q0 -P7 -P6 -P5 -P4 -P3 -P2 -P1 . Arc (P0 , Q0 ),
corresponding to the flow variable p0 , is the only arc that points in the direction of the
entering arc (P1 , Q5 ). So, the flow on (P1 , Q5 ) is increased by a certain amount, while
the flow on the other arcs of the cycle is decreased by the same amount. We change the
flow on the cycle until the flow on one of the arcs becomes zero. The first arc with a
zero flow is (P0 , Q5 ). Hence arc (P0 , Q5 ) leaves the set of basic variables. The new basic
variables have the following values:
p0 = 19 p1 = 23 p2 = 14 p3 = 19
p4 = 21 p6 = 14 p7 = 15 s1 = 18
h1 = 5 h2 = 19 h3 = 38 h4 = 59
h5 = 77 h6 = 91 h7 = 106.
The cost of this solution is: 106 × $3 = $318 for buying new napkins, and 18 × $0.50 =
$9.00 for cleaning eighteen napkins via the slow laundry service. So the total cost is
$327.00.
▶ Iteration 2. Subtracting the dual cost of the entering arc from the current value of yQ5,
the following new dual values are obtained:
The dual costs of the arcs that are not in the current tree are:
(c̄f1, c̄f2, c̄f3, c̄f4, c̄f5, c̄s1, c̄s2, c̄s3) = (2.50, −2.25, −2.25, 0.25, −2.25, −2.25, −2.50, −2.50).
We choose s2 as the entering basic variable. The flow along the corresponding arc is 14.
One can easily check that the arc corresponding to p6 leaves the set of basic variables.
▶ Iteration 10. It is left to the reader to check that the network simplex algorithm terminates
after ten iterations. The spanning tree with optimal flow is shown in Figure 18.4. The
corresponding optimal basic variables satisfy:
p0 = 85 p1 = 23 p2 = 14 p4 = 3
f1 = 19 f2 = 18 f3 = 18 f4 = 14
f5 = 14 s3 = 1 h1 = 4 h4 = 7
h5 = 11 h6 = 25 h7 = 40.
The cost of this optimal solution consists of the following parts: 40 × $3 = $120 for buying
new napkins, 83 × $0.75 = $62.25 for cleaning napkins by the fast cleaning service, and
$0.50 for using the slow cleaning service. The total cost is $182.75. Table 18.3 lists the
corresponding optimal decision schedule.
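As a sanity check, the optimal basic solution above can be tested against all sixteen flow-balance constraints, and its cost recomputed (plain Python; all values copied from the text):

```python
x = {"p0": 85, "p1": 23, "p2": 14, "p4": 3,
     "f1": 19, "f2": 18, "f3": 18, "f4": 14, "f5": 14,
     "s3": 1, "h1": 4, "h4": 7, "h5": 11, "h6": 25, "h7": 40}

def v(name):
    return x.get(name, 0)        # nonbasic variables equal zero

d = {1: 23, 2: 14, 3: 19, 4: 21, 5: 18, 6: 14, 7: 15}
assert sum(v(f"p{j}") for j in range(8)) == 125                  # node P0
for j in range(1, 8):                                            # nodes P1..P7
    assert v(f"f{j}") + v(f"s{j}") - v(f"h{j-1}") + v(f"h{j}") == d[j]
assert v("p0") + v("h7") == 125                                  # node Q0
for j in range(1, 8):                                            # nodes Q1..Q7
    assert v(f"p{j}") + v(f"f{j-2}") + v(f"s{j-4}") == d[j]

cost = (3.00 * sum(v(f"p{j}") for j in range(1, 8))
        + 0.75 * sum(v(f"f{j}") for j in range(1, 6))
        + 0.50 * sum(v(f"s{j}") for j in range(1, 4)))
print(cost)  # 182.75
```

The recomputation confirms the cost breakdown: 40 purchased napkins ($120), 83 fast launderings ($62.25), and one slow laundering ($0.50).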
The fact that on days 1 and 2 all demanded napkins are purchased is not unexpected,
since it is assumed that the caterer has no napkins at the beginning of the planning period
and the fast laundry service already takes two days. It is also clear that if the caterer already
has a number of napkins at the beginning of day 1 (but not more than 37), then this number
can be subtracted from the number of purchased napkins on days 1 and 2.
When one is interested in changes of the demand bi, two right hand side values need to be
changed, namely both bi and −bi.
Figure 18.6: The number of purchased napkins as a function of the demand b3; the number
is constant (39) for 0 ≤ b3 ≤ 18, and increases for larger values of b3.
No napkins need be purchased on day 3 when b1 ≥ 19, and no napkins need be pur-
chased on days 3 and 4 when b1 ≥ 26. In Exercise 18.6.2, the reader is asked to draw the
perturbation function for b1 , i.e., the total costs as a function of b1 .
Now consider day 3. The perturbation function for b3 , the number of demanded napkins
on day 3, is depicted in Figure 18.5. For b3 ≥ 18 (i.e., ε ≥ −1) the total cost increases more
than for 0 ≤ b3 ≤ 18 (i.e., −19 ≤ ε ≤ −1). This is caused by the fact that for b3 > 18
more napkins need to be purchased, while for 0 ≤ b3 ≤ 18 the number of purchased
napkins is constant (namely, equal to 39); see Figure 18.6.
Using Theorem 3.5.1, the current tree solution is optimal if and only if
Figure 18.7: Tolerance region for the costs of fast (F) and slow (S) laundry; the optimal
objective values at selected cost combinations are z* = 120, 182.75, 203, 246, and 372.
Note that the vector in parentheses in this expression contains the current objective coeffi-
cients of the nonbasic variables p3, p5, p6, p7, s1, s2, h2, and h3. It can be checked that
this system of inequalities is equivalent to
(3/2)(F − 1) ≤ S ≤ F,
Figure: The total costs for a sequence of purchase schedules over days 1–7; the plotted
values range from 62 to 222.75.
param T;                 # number of days in the planning period
param M;                 # stock of new napkins available at source P0
param p_napkin;          # price of a new napkin
param p_fast;            # price per napkin of the fast laundry service
param p_slow;            # price per napkin of the slow laundry service
param d{t in 1..T} >= 0; # number of napkins demanded on day t

var p{t in 0..T} >= 0;   # napkins purchased (p[0] is the unused surplus)
var f{t in 1..T-2} >= 0; # napkins sent to the fast laundry on day t
var s{t in 1..T-4} >= 0; # napkins sent to the slow laundry on day t
var h{t in 1..T} >= 0;   # dirty napkins carried over at the end of day t

minimize costs:
  sum{t in 1..T} p_napkin * p[t] + sum{t in 1..T-2} p_fast * f[t]
  + sum{t in 1..T-4} p_slow * s[t];
subject to P0:
  sum{t in 0..T} p[t] = M;
subject to P{t in 1..T}:
  (if t <= T-2 then f[t] else 0) + (if t <= T-4 then s[t] else 0)
  - (if t >= 2 then h[t-1] else 0)
  + h[t] = d[t];
subject to Q0:
  -p[0] - h[T] = -M;
subject to Q{t in 1..T}:
  -p[t] - (if t >= 3 then f[t-2] else 0)
  - (if t >= 5 then s[t-4] else 0) = -d[t];

data;

param M := 125;
param T := 7;
param p_napkin := 3;
param p_fast := 0.75;
param p_slow := 0.5;
param d :=
  1 23
  2 14
  3 19
  4 21
  5 18
  6 14
  7 15;
end;
18.6 Exercises
Exercise 18.6.1. Consider the catering service problem of this chapter. Determine the
perturbation functions of:
(a) the demand b1 on day 1, and
(b) the demand b2 on day 2.
Give for each interval, between kink points of the graphs, the schedule of amounts of napkins
to be purchased.
Exercise 18.6.2. Draw the perturbation function for b1 , i.e., the total costs as a function
of b1 .
Exercise 18.6.3. Complete the graph of Figure 18.7 by considering also the nonoptimal
fast/slow laundering combinations.
Exercise 18.6.5. In Section 18.3 (at the end of the section), what would happen if more
than 37 napkins were available at the beginning of day 1?
Appendices
Appendix A
Mathematical proofs
A proof demonstrates that a statement is always true, and so it is not sufficient to just enumer-
ate a large number of confirmatory cases. Instead, a proof consists of sentences that logically
follow from the assumptions, definitions, earlier sentences in the proof, and other previously
established theorems or lemmas. A proof can in principle be traced back to self-evident or
assumed statements that are known as axioms or postulates.
There are usually multiple, very different, ways of proving a statement, and we therefore
usually refer to ‘a proof ’ of a given statement, rather than ‘the proof ’ (unless, of course,
we refer to a particular proof). Often, there is one proof that is regarded as ‘more elegant’
than all others. The famous mathematician Paul Erdős (1913–1996) spoke of The Book,
a visualization of a book in which God has written down the most elegant proof of every
theorem. Using this metaphor, the job of a mathematician is to ‘discover’ the pages of The
Book; see also Aigner and Ziegler (2010).
Although proofs use logical and mathematical symbols, they usually consist of a considerable
amount of natural language (e.g., English). This means that any proof usually admits a certain
degree of ambiguity. It is certainly possible to write proofs that are completely unambiguous.
Such proofs are considered in so-called proof theory, and they are usually written in a symbolic
language without natural language. These proofs are suitable for verification with the help
of a computer, but they are usually very hard to read for humans. The interested reader is
referred to, e.g., von Plato (2008).
So, providing a proof of a theorem is usually a matter of balancing the right level of detail
and the right degree of readability. On the one hand, purely formal proofs are usually hard to read
for humans, because they are too detailed. On the other hand, a proof that contains too few
details or skips important details is hard to understand because it leaves out critical insights.
Note that this also depends on the intended audience. An extreme case of ‘too few details’
is a so-called proof by intimidation. Wikipedia gives the following definition:
The phrase is used, for example, when the author is an authority in a field, presenting a
proof to people who a priori respect the author’s insistence that the proof is valid, or when
the author claims that a statement is true because it is trivial, or simply because he or she
says so.
In this appendix, we give an overview of the three most often used forms of (serious)
mathematical proofs, and illustrate these with examples.
Theorem A.1.1.
Let x and y be even integers. Then, x + y is even.
Proof. Since x and y are even, there exist integers a and b such that x = 2a and y = 2b.
Hence, we have that x + y = 2a + 2b = 2(a + b). Therefore, x + y has 2 as a factor, so that
x + y is even. This proves the theorem.
Most proofs in this book are direct proofs. Note that the proof of Theorem A.1.1 is con-
sidered a proof because (1) each sentence is a precise statement that logically follows from
the assumptions of the theorem and the preceding sentences (notice the words “hence” and
“therefore”), and (2) the final sentence of the proof is exactly the conclusion of the theorem.
A . 2 . P r o o f b y c o n t ra d i c t i o n 565
Theorem A.2.1.
There is no largest even integer.
Proof. Suppose for a contradiction that the statement of the theorem is false, i.e., suppose that
there does exist a largest even integer. Let M be that largest even integer. Define N = M + 2.
Then, N is an even integer, and N > M . Hence, N is an even integer that is larger than M .
This contradicts the fact that M is the largest even integer. This proves the theorem.
Mathematical induction consists of two elements in an analogous way. The first element
is the so-called base case, which is an (often obvious) choice for the lowest number n0 for
which the statement should be proved. The second element is the so-called induction step,
which asserts that if the statement holds for n, then it also holds for n + 1. The assumption
that the statement holds for n is called the induction hypothesis. Once the validity of these two
elements has been proved, the principle of mathematical induction states that the statement
holds for all natural numbers n ≥ n0 .
More formally, let P (n) denote, for each natural number n ≥ 1, the statement to be proved. The base case consists of showing that P (1) is a true statement. The proof then proceeds by taking any n ∈ N, and showing the induction step that if P (n) is true, then P (n + 1) is true, i.e., that the induction hypothesis “P (n) is true” implies “P (n + 1) is true”. Once these two elements have been proved, we may conclude that P (n) is true for all natural numbers n ≥ 1.
Theorem A.3.1.
For any n ≥ 1, it holds that 1 + 2 + . . . + n = n(n + 1)/2.

Proof. For n ≥ 1, let P (n) be the statement that 1 + 2 + . . . + n = n(n + 1)/2. The base case P (1) holds, because 1 = 1(1 + 1)/2. For the induction step, take any n ≥ 1 and suppose that P (n) is true. Then:

1 + 2 + . . . + n + (n + 1)
  = n(n + 1)/2 + (n + 1)              (by the induction hypothesis)
  = (n(n + 1) + 2(n + 1))/2
  = (n + 2)(n + 1)/2.
Thus, P (n + 1) is indeed true. Since we have proved the base case and the induction step, the
principle of mathematical induction implies that P (n) is true for all n ≥ 1.
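A statement proved by induction can also be spot-checked computationally. The following sketch (the function names are ours, not the book's) compares the sum 1 + 2 + . . . + n with the closed form of Theorem A.3.1:

```python
def sum_first_n(n):
    """Compute 1 + 2 + ... + n by direct summation."""
    return sum(range(1, n + 1))

def closed_form(n):
    """The closed form n(n+1)/2 from Theorem A.3.1."""
    return n * (n + 1) // 2

# The two expressions agree for every n tried, as the induction proof guarantees.
assert all(sum_first_n(n) == closed_form(n) for n in range(1, 200))
```

Of course, such a check covers only finitely many cases; it is the induction proof that establishes the identity for all n.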
The principle of mathematical induction allows us to prove a statement for infinitely many
integers n. It is important to note, however, that this does not necessarily mean that the
conclusion holds for n = ∞. The interested reader may use mathematical induction to prove
that “for any integer n ≥ 0, the number n is finite”. Clearly, this statement is false for
n = ∞.
Appendix B
Linear algebra
In this appendix we review some basic results from vector and matrix algebra. For a more
extensive treatment of linear algebra, the reader is referred to, e.g., Strang (2009).
B.1 Vectors
The set of real numbers is denoted by R. For any positive integer n, the elements of the
space Rn are column arrays of size n, called vectors of size n, also called n-vectors; n is called
the dimension of the space. Vectors are denoted by lower case boldface letters. For example,
a = [3 5 1 0 0]T is a vector of size 5; the superscript ‘T’ stands for ‘transpose’ and means that the row array [3 5 1 0 0] should be read as the column array with the entries 3, 5, 1, 0, 0;
see also Section B.2 on ‘matrices’. In R2 and R3 vectors can be represented by an arrow
pointing from the origin to the point with coordinates the entries of the vector. Vectors in
Rn are also called points. The all-zero vector, denoted by 0, is a vector with all entries equal
to zero. For any positive integer i, the i’th unit vector, denoted by ei , is a vector in which
all entries are zero except for a one in the i’th position. For example, the third unit vector in R5 is e3 = [0 0 1 0 0]T . The all-ones vector, denoted by 1, is a vector with each entry
equal to 1. The sizes of the vectors 0, ei , and 1 will be clear from the context.
Vectors of the same size can be added. For instance, let a1 and a2 be two vectors in Rn ,
denoted by a1 = [a11 a21 . . . an1 ]T and a2 = [a12 a22 . . . an2 ]T . Then the addition a1 + a2 is defined as the vector [a11 + a12 a21 + a22 . . . an1 + an2 ]T . Similarly, for any k ∈ R, the scalar multiplication ka1 is defined as the vector [ka11 ka21 . . . kan1 ]T .
The number k in this scalar product is called the scalar. Note that addition, as well as scalar multiplication, is performed componentwise. For a given k ≠ 0 and a ≠ 0, the vectors ka and −ka are called opposite.
The inner product of the vectors a = [a1 . . . an ]T and b = [b1 . . . bn ]T is denoted and defined by:

aT b = a1 b1 + . . . + an bn = ∑nk=1 ak bk .

For example, if x = [0 1 −5]T and y = [6 4 −2]T , then xT y = 0 × 6 + 1 × 4 + (−5) × (−2) = 14. The Euclidean norm1 (or length) of the vector a = [a1 . . . an ]T is denoted and defined by:

‖a‖ = √(a1² + . . . + an²).
It can be easily checked that ‖a + b‖² = ‖a‖² + ‖b‖² + 2aT b. The angle α (0 ≤ α ≤ π) between two nonzero vectors a and b is defined by:

cos α = aT b / (‖a‖ ‖b‖),

which is the cosine rule for a and b. For any two vectors a and b of the same size, the following inequality holds:

|aT b| ≤ ‖a‖ ‖b‖,

where | · | is the absolute value. This inequality is called the Cauchy-Schwarz inequality2 . The
inequality is a direct consequence of the cosine-rule for the vectors a and b. Two vectors
a and b of the same size are called orthogonal or perpendicular if aT b = 0. Note that the
cosine-rule for orthogonal vectors a and b implies that cos α = 0, and so α = ½π, i.e., the angle between a and b is in fact 90°. Note that for each i and j , with i ≠ j , it holds
1 Named after the Greek mathematician Euclid of Alexandria, who lived from the mid-4th century to the mid-3rd century B.C.
2 Also known as the Cauchy-Bunyakovsky-Schwarz inequality, in honor of the French mathematician Augustin-Louis Cauchy (1789–1857), the Ukrainian mathematician Viktor Y. Bunyakovsky (1804–1889), and the German mathematician Hermann A. Schwarz (1843–1921).
that ei and ej are perpendicular, where ei and ej are the i’th and j ’th unit vectors of the
same size, respectively.
The set Rn , equipped with the above defined ‘addition’, ‘scalar multiplication’, ‘Euclidean
norm’, and ‘inner product’, is called the n-dimensional Euclidean vector space. From now on
we refer to Rn as the Euclidean vector space of dimension n. Note that R0 = {0}.
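The componentwise operations, the inner product, and the Euclidean norm defined above translate directly into code. A small sketch in Python (function names ours, not the book's):

```python
import math

def add(a, b):
    """Componentwise addition of two vectors of the same size."""
    return [ai + bi for ai, bi in zip(a, b)]

def scalar_mul(k, a):
    """Scalar product k*a, performed componentwise."""
    return [k * ai for ai in a]

def inner(a, b):
    """Inner product a^T b = a1*b1 + ... + an*bn."""
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    """Euclidean norm ||a|| = sqrt(a1^2 + ... + an^2)."""
    return math.sqrt(inner(a, a))

x = [0, 1, -5]
y = [6, 4, -2]
assert inner(x, y) == 14                      # 0*6 + 1*4 + (-5)*(-2)
assert abs(inner(x, y)) <= norm(x) * norm(y)  # the Cauchy-Schwarz inequality
assert inner([1, 0, 0], [0, 0, 1]) == 0       # e1 and e3 are orthogonal
```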
Let a1 , . . . , ak , and b be vectors of the same size. The vector b is called a linear combination
of the vectors a1 , . . . , ak if there are scalars λ1 , . . . , λk such that:

b = λ1 a1 + . . . + λk ak .

The collection of vectors {a1 , . . . , ak } ⊂ Rn is called linearly independent if:

λ1 a1 + . . . + λk ak = 0 implies that λ1 = . . . = λk = 0.
On the other hand, the collection of vectors {a1 , . . . , ak } ⊂ Rn is called linearly dependent
if it is not linearly independent, i.e., if there exist scalars λ1 , . . . , λk , not all zero, such that:
λ1 a1 + . . . + λk ak = 0.
For example, the set {ei , ej }, with i ≠ j , is a linearly independent collection of vectors, while the set {[1 4 1]T , [2 −3 1]T , [3 1 2]T } is a linearly dependent collection of vectors, because 1 × [1 4 1]T + 1 × [2 −3 1]T + (−1) × [3 1 2]T = [0 0 0]T . Note that any set of vectors that contains the all-zero vector 0 is linearly dependent.
It can be shown that the maximum number of linearly independent vectors in Rn is equal
to n. A set of n linearly independent vectors in Rn is called a basis of Rn . For example, the
set {e1 , . . . , en } is a basis. The reason for using the term ‘basis’ of Rn is that any vector x
in Rn can be written as a linear combination of n basis vectors. For instance, for any vector
x = [x1 . . . xn ]T ∈ Rn , it holds that:
x = x1 e1 + . . . + xn en .
More generally, if {a1 , . . . , an } is any basis of Rn and x = λ1 a1 + . . . + λn an = λ′1 a1 + . . . + λ′n an , then 0 = x − x = (λ1 − λ′1 )a1 + . . . + (λn − λ′n )an , and so, by the linear independence of a1 , . . . , an , in fact λ1 − λ′1 = . . . = λn − λ′n = 0; that is, the representation of a vector with respect to a given basis is unique.
The following theorem shows that, if we are given a basis of Rn and an additional nonzero
vector, then another basis can be constructed by exchanging one of the vectors of the basis
with the additional vector.
Theorem B.1.1.
Let n ≥ 1. Let a1 , . . . , an ∈ Rn be linearly independent vectors, and let b ∈ Rn
be any nonzero vector. Then, there exists k ∈ {1, . . . , n} such that a1 , . . . , ak−1 , b,
ak+1 , . . . , an are linearly independent.
Proof. Because a1 , . . . , an are n linearly independent vectors in Rn , they form a basis of Rn , and hence there exist scalars λ1 , . . . , λn such that b = λ1 a1 + . . . + λn an . Since b ≠ 0, there exists an integer k ∈ {1, . . . , n} such that λk ≠ 0. Without loss of generality, we may assume that k = 1. We will show that b, a2 , . . . , an
are linearly independent. To that end, let µ1 , . . . , µn be such that µ1 b+µ2 a2 +. . .+µn an = 0.
Substituting b = λ1 a1 + . . . + λn an , we obtain that:
µ1 λ1 a1 + (µ1 λ2 + µ2 )a2 + . . . + (µ1 λn + µn )an = 0.
Since a1 , . . . , an are linearly independent, it follows that µ1 λ1 = 0, µ1 λ2 + µ2 = 0, . . ., µ1 λn + µn = 0. Since λ1 ≠ 0, we have that µ1 = 0, and this implies that µ2 = . . . = µn = 0. Hence, b, a2 , . . . , an are linearly independent.
B.2 Matrices
Let m, n ≥ 1. A matrix is a rectangular array of real numbers, arranged in m rows and n
columns; it is called an (m, n)-matrix. The set of all (m, n)-matrices is denoted by Rm×n .
Matrices are denoted by capital boldface letters. Two matrices are said to be of the same
size if they have the same number of rows and the same number of columns. The entry in
row i and column j , called position (i, j), of the matrix A is denoted by aij ; we also write
A = {aij }. An (m, 1)-matrix is a column vector in Rm , and a (1, n)-matrix is a row
vector whose transpose is a vector in Rn . If m = n, then the matrix is called a square matrix.
The vector [a11 . . . amm ]T of the (m, n)-matrix A = {aij } with m ≤ n is called the main diagonal of A; if m ≥ n, then the main diagonal is [a11 . . . ann ]T .
Similarly, an (m, n)-matrix A = {aij } with m ≤ n is called lower triangular if the nonzero entries are in the triangle on and below the positions (1, 1), . . . , (m, m); i.e., aij = 0 for all i ∈ {1, . . . , m} and j ∈ {1, . . . , n} with i < j . A diagonal matrix is a square matrix which is upper as well as lower triangular.
The transpose of the (m, n)-matrix A = {aij } is the matrix that is denoted and defined by:

AT = {aji }.
Notice that if A is an (m, n)-matrix, then AT is an (n, m)-matrix. Note also that (AT )T =
A, and that for any diagonal matrix A, it holds that AT = A. A matrix A is called symmetric
if AT = A.
The addition or sum of two matrices A = {aij } and B = {bij } of the same size is denoted
and defined by:
A + B = {aij + bij }.
The scalar product of the matrix A = {aij } with the scalar k is denoted and defined by:
kA = {kaij }.
Let A = {aij } be an (m, p)-matrix, and B = {bij } a (p, n)-matrix. Then the matrix product of A and B is an (m, n)-matrix, which is denoted and defined by:

AB = { ∑pk=1 aik bkj };
i.e., the (i, j)’th entry of AB is equal to the inner product of the i’th row of A and the
j ’th column of B. Note that (AB)T = BT AT .
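The entrywise definition of the matrix product, and the rule (AB)T = BT AT , can be sketched in code as follows (function names ours):

```python
def transpose(A):
    """A^T = {a_ji}: rows become columns."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of an (m,p)-matrix A and a (p,n)-matrix B; entry (i,j) is the
    inner product of the i'th row of A and the j'th column of B."""
    p = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4], [5, 6]]          # a (3,2)-matrix
B = [[1, 0, 1], [0, 1, 1]]            # a (2,3)-matrix
assert matmul(A, B) == [[1, 2, 3], [3, 4, 7], [5, 6, 11]]
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))  # (AB)^T = B^T A^T
```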
A21 = [a41 a42 ; a51 a52 ], A22 = [a43 a44 ; a53 a54 ], A23 = [a45 a46 ; a55 a56 ],

where the semicolon separates the rows of each submatrix. The original matrix is then written as:

A = [A11 A12 A13 ; A21 A22 A23 ].
Partitioned matrices can be multiplied if the various submatrices in the partition have ap-
propriate sizes. This can be made clear by means of the following example. Let A and B
be partitioned as follows:
A = [A11 A12 ; A21 A22 ], and B = [B11 B12 B13 ; B21 B22 B23 ],
where A11 is an (m1 , n1 ) matrix, A12 is an (m1 , n2 ) matrix, A21 is an (m2 , n1 ) matrix,
A22 is an (m2 , n2 ) matrix, B11 is a (p1 , q1 ) matrix, B12 is a (p1 , q2 ) matrix, B13 is a
(p1 , q3 ) matrix, B21 is a (p2 , q1 ) matrix, B22 is a (p2 , q2 ) matrix, and B23 is a (p2 , q3 )
matrix, with n1 = p1 and n2 = p2 . Then the product AB can be calculated as follows:
AB = [A11 B11 + A12 B21   A11 B12 + A12 B22   A11 B13 + A12 B23 ;
      A21 B11 + A22 B21   A21 B12 + A22 B22   A21 B13 + A22 B23 ].
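The partitioned product can be checked on a small instance: multiplying the assembled matrices agrees with assembling the products of the blocks. A sketch, with hypothetical blocks chosen so that the sizes match (here one block row in A and one block column in B):

```python
def matmul(A, B):
    """Entrywise matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def madd(A, B):
    """Entrywise matrix addition."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def hstack(*blocks):
    """Place blocks with equal row counts side by side."""
    return [sum((blk[i] for blk in blocks), []) for i in range(len(blocks[0]))]

def vstack(*blocks):
    """Stack blocks with equal column counts on top of each other."""
    return [row for blk in blocks for row in blk]

# A = [A11 A12] and B = [B11 ; B21], with n1 = p1 = 2 and n2 = p2 = 1.
A11, A12 = [[1, 2]], [[3]]
B11, B21 = [[1, 0], [0, 1]], [[2, 2]]
A, B = hstack(A11, A12), vstack(B11, B21)
assert matmul(A, B) == madd(matmul(A11, B11), matmul(A12, B21))  # both are [[7, 8]]
```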
A square matrix A is called a block diagonal matrix if it has the form A = [A1 0 . . . 0 ; 0 A2 . . . 0 ; . . . ; 0 0 . . . Ak ], where A1 , . . . , Ak are square matrices and the off-diagonal blocks are all-zero matrices. The matrices A1 , . . . , Ak are called the blocks of the matrix A.
Gaussian elimination3 is the process of transforming a nonzero entry into a zero entry using a fixed nonzero entry as pivot entry, where the transformation is carried out by means of elementary row and/or column operations. More precisely, Gaussian elimination can be
described as follows. Let aij be the selected (nonzero!) pivot entry. Gaussian elimination allows us to create zero entries in row i as well as in column j as follows. Suppose that akj ≠ 0, and we want a zero entry in position (k, j); k ≠ i. The transformation that
accomplishes this is:
(GR) Subtract (akj /aij ) times row i from row k , and take the result as the new row k .
Similarly, we can apply Gaussian elimination to the columns. Suppose we want a zero in
position (i, l); l ≠ j . Then rule (GR) is applied on the transposed matrix. Hence, the rule
for Gaussian elimination on columns is:
(GC) Subtract (ail /aij ) times column j from column l, and take the result as the new
column l.
The reader may check that rule (GR) uses (R2) and (R3), while rule (GC) uses (C2) and
(C3). Hence, the matrices that appear after applying Gaussian elimination are equivalent to
the original matrix.
We will now show how matrices are transformed into equivalent matrices for which all en-
tries below the main diagonal are zero. Such an equivalent matrix is called a Gauss form of
the original matrix, and the transformation that leads to a Gauss form is called a Gaussian
reduction. The procedure uses the entries on the main diagonal as pivot entries. The proce-
dure can be repeated for the transposed matrix. This leads to an equivalent matrix with all
nonzero entries on the main diagonal. In a further step, all nonzero main diagonal entries
can be made equal to one. A Gauss-Jordan form4 of a matrix is an equivalent 0-1 matrix (a
matrix consisting of only zeros and/or ones) with all ones on the main diagonal.
3 In honor of the German mathematician Carl F. Gauss (1777–1855), although the method was known to the Chinese as early as 179 A.D.
4 Named after the German mathematicians Carl F. Gauss (1777–1855) and Wilhelm Jordan (1842–1899).
The algorithm successively uses the current entries in the positions (1, 1), . . . , (m, m)
as pivot entries; if m ≤ n, then only the positions (1, 1), . . . , (m − 1, m − 1) need
to be used. Suppose the algorithm has already considered the columns 1 to k − 1. So
the current column is k (1 ≤ k ≤ m).
• Step 1. If all entries below position (k, k) in the current column k are zero, then if k < m go to Step 4, and if k = m then stop. Otherwise, go to Step 2.
• Step 2. Select a nonzero entry below position (k, k) in the current column k , say in position (i, k) with i > k . Then interchange (elementary row operation (R1)) row k with row i in the current matrix. This yields that the entry in position (k, k) becomes nonzero. Go to Step 3.
• Step 3. Apply Gaussian elimination on the rows below row k , by using the current entry in position (k, k) as pivot entry. The result is that all entries in column k , below position (k, k), become equal to zero. Go to Step 4.
• Step 4. As long as k < m, return to Step 1 with k := k + 1. If k = m, then stop.
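Steps 1-4 above can be sketched in code. Exact rational arithmetic avoids rounding; for brevity, the pivot search also accepts a pivot that already sits in position (k, k) (function names ours):

```python
from fractions import Fraction

def gauss_form(A):
    """Bring A into a Gauss form (all entries below the main diagonal zero),
    using row interchanges (R1) and rule (GR), as in Steps 1-4 above."""
    A = [[Fraction(x) for x in row] for row in A]   # exact arithmetic
    m = len(A)
    for k in range(m):
        # Steps 1/2: find a nonzero entry in column k on or below position (k, k).
        pivot = next((i for i in range(k, m) if A[i][k] != 0), None)
        if pivot is None:
            continue                                # nothing to eliminate here
        A[k], A[pivot] = A[pivot], A[k]             # row interchange (R1)
        # Step 3: rule (GR) with pivot entry a_kk.
        for i in range(k + 1, m):
            f = A[i][k] / A[k][k]
            A[i] = [aij - f * akj for aij, akj in zip(A[i], A[k])]
    return A

G = gauss_form([[0, 1, 1], [2, 0, 2], [1, 1, 1]])
assert all(G[i][j] == 0 for i in range(3) for j in range(3) if i > j)
```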
The rank of a matrix A, denoted by rank(A), is defined as the maximum number of linearly
independent columns of A. It can be shown that the rank of a matrix is also equal to the
maximum number of linearly independent rows, and so A and AT have the same rank.
Although a Gauss-Jordan form of a matrix is not uniquely determined, the number of entries that are equal to 1 is fixed. Based on this fact, it can be shown that the rank of a matrix A is equal to the number of ones in any Gauss-Jordan form of A.
If m ≤ n and rank(A) = m, then A is said to have full row rank. Similarly, the matrix
A, with m ≥ n, has full column rank if rank(A) = n. A square matrix A with full row
(column) rank is called nonsingular or regular, whereas A is called singular otherwise. Note
that any Gauss-Jordan form of a singular matrix with m ≤ n contains an all-zero row,
whereas any Gauss-Jordan form of a regular matrix is (equivalent to) the identity matrix.
Theorem B.4.1.
Let A be an (m, n)-matrix with m ≤ n, and rank(A) = m. Then, AAT is square,
symmetric, and nonsingular.
Proof. We leave it to the reader to show that AAT is square and symmetric. The fact that AAT is nonsingular is shown by proving that the columns of AAT are linearly independent. Let the entries of the m-vector y form a linear combination of the columns of AAT such that AAT y = 0. (The reader may check that AAT y is in fact a shorthand notation for a linear combination of the columns of AAT with as ‘weights’ the entries of y.) Multiplying AAT y = 0 by yT gives: 0 = yT AAT y = (AT y)T (AT y) = ‖AT y‖². Hence, AT y = 0. Since the rows of A are linearly independent, it follows that y = 0. Hence, AAT has independent columns, and this means that AAT is nonsingular.
B.5 Solving sets of linear equalities

Consider a system of m linear equalities in the n unknowns x1 , . . . , xn :

a11 x1 + . . . + a1n xn = b1
   . . .
am1 x1 + . . . + amn xn = bm .
Such a system can be written in the following compact form:

Ax = b,

where A = {aij } is the (m, n)-matrix of the coefficients, b = [b1 . . . bm ]T , and x = [x1 . . . xn ]T . Define:

X = {x ∈ Rn | Ax = b}.
The set X is called the solution space of the system of equalities. If X ≠ ∅, then the system Ax = b is called solvable, or consistent; otherwise, Ax = b is called nonsolvable, or inconsistent. It can be shown that if [A1 b1 ] and [A2 b2 ] are row-equivalent matrices, which means that one can be obtained from the other by means of elementary row operations, then the systems A1 x = b1 and A2 x = b2 have the same solution space.
The following examples illustrate the procedure. In the first example the solution space is
nonempty, and in the second one it is empty.
x1 − x2 + x3 − x4 = 6,
−2x1 + x2 − x3 + x4 = 3,
−x1 + 2x2 − 3x3 = 5.
The calculations are carried out on the extended matrix [A b]. Applying Algorithm B.4.1 leads to the following sequence of row-equivalent matrices:

[ 1 −1  1 −1 |  6 ]   [ 1 −1  1 −1 |  6 ]   [ 1  0  0  0 |  −9 ]
[−2  1 −1  1 |  3 ] ∼ [ 0 −1  1 −1 | 15 ] ∼ [ 0  1 −1  1 | −15 ]
[−1  2 −3  0 |  5 ]   [ 0  1 −2 −1 | 11 ]   [ 0  0 −1 −2 |  26 ]
Rewriting the last matrix as a system of linear equalities gives:

x1 = −9,
x2 − x3 + x4 = −15,
−x3 − 2x4 = 26.
For any fixed value of x4 , the values of x1 , x2 , and x3 are fixed as well. Take x4 = α. Then, x3 = −26 − 2α, x2 = −15 + (−26 − 2α) − α = −41 − 3α, and x1 = −9. The reader may check that these values for x1 , x2 , x3 , and x4 satisfy the original system of equalities. The
solution space can be written as:

X = { [−9 −41 −26 0]T + α [0 −3 −2 1]T | α ∈ R }.
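The general solution can be verified by substituting it back into the system; every choice of the parameter yields the same right-hand side. A sketch (the right-hand side vector (6, 3, 5) is the one consistent with the worked reduction above):

```python
def substitute(alpha):
    """Evaluate the left-hand sides of the system at the general solution
    x = (-9, -41 - 3*alpha, -26 - 2*alpha, alpha)."""
    x1, x2, x3, x4 = -9, -41 - 3 * alpha, -26 - 2 * alpha, alpha
    return (x1 - x2 + x3 - x4,
            -2 * x1 + x2 - x3 + x4,
            -x1 + 2 * x2 - 3 * x3)

# Every value of the parameter alpha yields the same right-hand side vector.
assert all(substitute(a) == (6, 3, 5) for a in (-3, 0, 1, 10))
```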
x1 + x2 + x3 + x4 = 1,
2x1 + 2x4 = 4,
x2 + x3 = 1.
Applying Algorithm B.4.1, we find the following sequence of row-equivalent matrices:

[ 1  1  1  1 | 1 ]   [ 1  1  1  1 | 1 ]   [ 1  1  1  1 | 1 ]
[ 2  0  0  2 | 4 ] ∼ [ 0 −2 −2  0 | 2 ] ∼ [ 0 −2 −2  0 | 2 ]
[ 0  1  1  0 | 1 ]   [ 0  1  1  0 | 1 ]   [ 0  0  0  0 | 2 ]
Rewriting the last matrix as a system of linear equalities gives:

x1 + x2 + x3 + x4 = 1,
−2x2 − 2x3 = 2,
0 = 2.
This system of linear equalities is obviously inconsistent, and so the original system has no solution,
i.e., its solution space is empty.
B.6 The inverse of a matrix

Let A be a nonsingular (n, n)-matrix. The inverse of A is the (n, n)-matrix, denoted by A−1 , that satisfies AA−1 = In . Writing A−1 = [x1 . . . xn ] column by column, this condition reads:

Ax1 = e1 , . . . , Axn = en .

These n equalities can be solved by means of Gaussian elimination, applied to the matrices:

[A e1 ], . . . , [A en ],

or, more efficiently, simultaneously on the extended matrix [A In ], by reducing A to the identity matrix by means of elementary row operations. In other words, the following transformation is carried out:

[A In ] ∼ [In A−1 ].
Since the inverse exists only for nonsingular matrices, nonsingular matrices are also called
invertible. The following example illustrates the procedure.
" #
133
Example B.6.1. We will determine the inverse of A = 1 4 3 . The matrix A is extended
134
with the identity matrix I3 . Applying elementary row operations, we obtain the following sequence of
equivalent matrices:
1 3 3 1 0 0 1 3 3 1 0 0
1 4 3 0 1 0 ∼ 0 1 0 −1 1 0
1 3 5 0 0 1 0 0 2 −1 0 1
4 −3 0 11 −6 −3
1 0 53 0 2 0
∼ 0 1 0 −1 1 0 ∼ 0 1 0 −1 1 0
0 0 2 −1 0 1 0 0 2 −1 0 1
B . 7. Th e d e t e r m i na n t o f a m at r i x 579
0 5 12 −3 −1 12
1 0
0 = I3 A−1 .
∼ 0
1 0 −1 1
0 0 1 − 12 0 1
2
1
5 2 −3 −1 12
Therefore, the inverse of A is A−1 = −1 1 0. The reader may check that AA−1 =
1 1
−2 0 2
I3 = A−1 A.
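The reduction [A I3 ] ∼ [I3 A−1 ] is mechanical and can be sketched in code, using exact fractions; the example matrix below is the one above, with third row [1 3 5] as in the displayed reduction (function names ours):

```python
from fractions import Fraction

def inverse(A):
    """Reduce the extended matrix [A I_n] to [I_n A^{-1}] with elementary row
    operations (Gauss-Jordan); assumes A is nonsingular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        pivot = next(i for i in range(k, n) if M[i][k] != 0)
        M[k], M[pivot] = M[pivot], M[k]        # row interchange (R1)
        M[k] = [x / M[k][k] for x in M[k]]     # scale the pivot row to get a 1
        for i in range(n):
            if i != k and M[i][k] != 0:        # clear the rest of column k
                M[i] = [xi - M[i][k] * xk for xi, xk in zip(M[i], M[k])]
    return [row[n:] for row in M]              # the right half is A^{-1}

Ainv = inverse([[1, 3, 3], [1, 4, 3], [1, 3, 5]])
assert Ainv == [[Fraction(11, 2), -3, Fraction(-3, 2)],
                [-1, 1, 0],
                [Fraction(-1, 2), 0, Fraction(1, 2)]]   # i.e., [5½ -3 -1½ ; -1 1 0 ; -½ 0 ½]
```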
B.7 The determinant of a matrix

The determinant of an (n, n)-matrix A = {aij } is denoted and defined recursively by:

det(A) = a11 (−1)1+1 det((A){1,...,n}\{1},{2,...,n} ) + . . . + an1 (−1)n+1 det((A){1,...,n}\{n},{2,...,n} ),

with det([a]) = a for any (1, 1)-matrix [a]. In this expression, (A)I,J is the submatrix of A that consists of the rows with indices in I , and the columns with indices in J , with I, J ⊆ {1, . . . , n}. The right hand side in the expression above is called a Laplace expansion5 of the determinant. The cofactor of the matrix entry aij of the (n, n)-matrix A is denoted and defined by:

cofij (A) = (−1)i+j det((A){1,...,n}\{i},{1,...,n}\{j} ).

So the cofactor of aij in the (n, n)-matrix A is equal to (−1)i+j times the determinant of the matrix A without row i and column j . In the above definition of the determinant, we have taken the cofactors of the entries in the first column of A. It can be shown that the Laplace expansion can be done along any column or row of A.
Example B.7.1. We will calculate the determinant of a (2, 2)- and a (3, 3)-matrix. It is left to the reader to check the results for Laplace expansions along other rows and columns. For A = [a11 a12 ; a21 a22 ], it holds that:

det(A) = a11 (−1)1+1 det(a22 ) + a21 (−1)2+1 det(a12 ) = a11 a22 − a12 a21 .
For A = [a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 ], it holds that:

det(A) = a11 (−1)1+1 det([a22 a23 ; a32 a33 ]) + a21 (−1)2+1 det([a12 a13 ; a32 a33 ])
       + a31 (−1)3+1 det([a12 a13 ; a22 a23 ])
= a11 (a22 a33 − a23 a32 ) − a21 (a12 a33 − a13 a32 ) + a31 (a12 a23 − a13 a22 )
= a11 a22 a33 + a12 a23 a31 + a13 a21 a32
− a13 a22 a31 − a12 a21 a33 − a11 a32 a23 .
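The Laplace expansion along the first column gives a direct, if inefficient, recursive procedure for computing determinants. A sketch (function name ours):

```python
def det(A):
    """Determinant by Laplace expansion along the first column:
    det(A) = sum over i of a_i1 * (-1)^(i+1) * det(A without row i and column 1)."""
    if len(A) == 1:
        return A[0][0]
    return sum(A[i][0] * (-1) ** i *           # (-1)^i is (-1)^(i+1) with 1-based i
               det([row[1:] for r, row in enumerate(A) if r != i])
               for i in range(len(A)))

assert det([[1, 2], [3, 4]]) == 1 * 4 - 2 * 3
assert det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]) == -3
```

Note that this recursion evaluates on the order of n! products, so it only serves to illustrate the definition; in practice determinants are computed via Gaussian reduction.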
• If A = [B 0 ; C D], where B and D are square matrices, then det(A) = det(B) × det(D).
• If A is a block diagonal matrix (see Section B.3) with blocks A1 , . . . , Ak , then det(A) = det(A1 ) × · · · × det(Ak ).
• (Cramer’s rule6 ) Consider the system Ax = b, with A a nonsingular (n, n)-matrix, b ∈ Rn , and x a vector consisting of n variables. Let A(i; b) be the matrix consisting of the columns of A except for column i, which is replaced by the vector b. Then the unique solution to this system is given by:

xi = det(A(i; b)) / det(A)   for i = 1, . . . , n.
From the fact that det(AT ) = det(A), it follows that all properties concerning columns
hold for rows as well.
Example B.7.2. We will solve a system of equalities by means of Cramer’s rule. Consider the
following system of three linear equalities in three variables:
4x2 + x3 = 2,
−2x1 + 5x2 − 2x3 = 1,
3x1 + 4x2 + 5x3 = 6.
" #
0 4 1
We first check that the coefficient matrix is nonsingular: det −2 5 −2 = −7 6= 0, and so
3 4 5
this matrix is in fact nonsingular. Using Cramer’s rule, we find that:
2 4 1 0 2 1
x1 = − 71 det 1 5 −2 = 4, x2 = − 17 det −2 1 −2 = 1, and
6 4 5 3 6 5
0 4 2
1
x3 = − 7 det −2 5 1 = −2.
3 4 6
By substituting these values into the initial system, we find that this is in fact a solution.
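Cramer's rule, column replacement included, can be sketched as follows (function names ours; the determinant is again computed by Laplace expansion along the first column):

```python
def det(A):
    """Determinant by Laplace expansion along the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum(A[i][0] * (-1) ** i *
               det([row[1:] for r, row in enumerate(A) if r != i])
               for i in range(len(A)))

def cramer(A, b):
    """Solve Ax = b for nonsingular A via x_i = det(A(i; b)) / det(A),
    where A(i; b) is A with column i replaced by b."""
    d = det(A)
    return [det([row[:i] + [bi] + row[i + 1:] for row, bi in zip(A, b)]) / d
            for i in range(len(A))]

A = [[0, 4, 1], [-2, 5, -2], [3, 4, 5]]
assert cramer(A, [2, 1, 6]) == [4, 1, -2]   # the solution of Example B.7.2
```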
It is left to the reader to show that for any nonsingular (n, n)-matrix A and any vector
b ∈ Rn , the system Ax = b has the unique solution x = A−1 b.
The adjoint of the (n, n)-matrix A = {aij } is denoted and defined by:

adjA = {cofji (A)};

that is, the (i, j)’th entry of adjA is the cofactor of the entry aji of A.
6 Named after the Swiss mathematician Gabriel Cramer (1704–1752).
Since det(A) ≠ 0 for a nonsingular matrix A, it follows that the inverse of A exists. The adjoint of A satisfies A(adjA) = det(A)In , so that A−1 = (1/ det(A)) adjA.
A nonempty subset S of Rn is called a linear subspace of Rn if:

(i) If x, y ∈ S , then x + y ∈ S ;
(ii) If x ∈ S and λ ∈ R, then λx ∈ S .

For example, the following subsets of R3 are linear subspaces of R3 :

• R3 itself.
• The singleton set {0}.
• The set { [x1 x2 x3 ]T | 2x1 + 3x2 = 0 }.
Theorem B.8.1.
Let S be a linear subspace of Rn with S ≠ {0}. Then there exists a unique smallest positive integer d such that every vector in S can be expressed as a linear combination of a fixed set of d linearly independent vectors in S . Such a set of vectors is called a basis for S , and d is called the dimension of S , denoted by dim S .
Proof. We will show that if v1 , . . . , vr and w1 , . . . , ws are both bases for S , then r = s.
Suppose, to the contrary, that r < s. Since v1 , . . . , vr is a basis for S, it follows for each
j = 1, . . . , s that there are scalars a1j , . . . , arj such that:
wj = a1j v1 + . . . + arj vr . (B.2)
Let W be the matrix with columns w1 , . . . , ws , let V be the matrix with columns v1 , . . . , vr ,
and let A = {aij } be the corresponding (r, s)-matrix. Then (B.2) is equivalent to: W = VA. Since r < s, the system Ax = 0 must have a nonzero solution x0 ≠ 0 (this can be seen by writing Ax = 0 in Gauss-Jordan form). Hence, Wx0 = V(Ax0 ) = V0 = 0. This means that the columns of
W are linearly dependent, which contradicts the fact that the columns form a basis. Thus, we
have that r = s. For r > s, a similar argument holds.
The row space R(A) of the (m, n)-matrix A consists of all linear combinations of the rows of A; i.e., R(A) = { AT y | y ∈ Rm }. The reader may check that R(A) is a linear
subspace of Rn . Similarly, the column space of A consists of all linear combinations of the
columns of A. Note that the column space of A is a linear subspace of Rm , and that it is
equal to R(AT ). Also note that if dim R(A) = m, then A has full row rank and the rows
of A form a basis for R(A); if dim R(AT ) = n, then A has full column rank and the
columns of A form a basis for R(AT ).
The null space of the (m, n)-matrix A is denoted and defined by:
N (A) = {x ∈ Rn | Ax = 0}.
The reader may check that N (A) is a linear subspace of Rn . For any (m, n)-matrix A the following properties hold:

(1) R(A) ∩ N (A) = {0};
(2) R(A) ⊥ N (A);
(3) dim R(A) + dim N (A) = n.

Here, R(A) ⊥ N (A) means that the linear spaces R(A) and N (A) are perpendicular, i.e., for every r ∈ R(A) and every s ∈ N (A), we have that rT s = 0.
Property (1) is a direct consequence of property (2). For the proof of property (2), let r ∈ R(A) and s ∈ N (A). Because r ∈ R(A), there exists a vector y ∈ Rm such that r = AT y. Hence, rT s = (AT y)T s = yT As = yT 0 = 0, as required.

It follows from properties (1), (2), and (3) that any vector x ∈ Rn can be written as:

x = r + s, with r ∈ R(A) and s ∈ N (A).
Example B.8.1. Consider the matrix A = [1 0 −1 ; 0 1 2]. Then,

R(A) = { AT y | y ∈ R2 } = { y1 [1 0 −1]T + y2 [0 1 2]T | y1 , y2 ∈ R },

and N (A) = { α [1 −2 1]T | α ∈ R }.
Therefore, we have that dim R(A) = 2 and dim N (A) = 1. Note that property (3) holds.
Property (2) also holds:

(α1 [1 0 −1] + α2 [0 1 2]) (α [1 −2 1]T )
  = α1 α ([1 0 −1] [1 −2 1]T ) + α2 α ([0 1 2] [1 −2 1]T ) = 0,

since both inner products [1 0 −1] [1 −2 1]T and [0 1 2] [1 −2 1]T are equal to 0.
Property (1) now holds trivially. One can easily check that the vectors [1 0 −1]T , [1 1 1]T , and [1 −2 1]T are linearly independent and span R3 .
An affine subspace of Rn is any subset of the form a+L, with a ∈ Rn and L a linear subspace
of Rn . By definition, dim(a + L) = dim L. Examples of affine subspaces are all linear
subspaces (a = 0). Affine subspaces of Rn can be written in the form {x ∈ Rn | Ax = b}
with A ∈ Rm×n , b ∈ Rm , and the system Ax = b is consistent. Clearly, the system
Ax = b is consistent if and only if the vector b can be expressed as a linear combination
of the columns of A. Namely, if a1 , . . . , an are the columns of A and x = [x1 . . . xn ]T , then Ax = b reads:

b = x1 a1 + . . . + xn an .
Appendix C
Graph theory
In this appendix some basic terminology and results from graph theory are introduced. For
a more detailed treatment of graph theory, we refer to Bondy and Murty (1976) and West
(2001).
Let n, m ≥ 1 be integers. The complete graph Kn is the simple graph with n nodes and all (n choose 2) = ½n(n − 1) edges. A bipartite graph is a graph G = (V, E) of which the node set V
can be partitioned into two sets V1 and V2 with V1 ∪ V2 = V, V1 ∩ V2 = ∅ and each edge
has one end in V1 and one in V2 . The complete bipartite graph Km,n is the simple bipartite
graph with |V1 | = m and |V2 | = n such that there is an edge between each node in V1
and each node in V2 . The degree d(x) of a node x of a graph G = (V, E) is the number
of edges incident with x, loops being counted twice. An isolated node is a node of degree 0.
Counting the number of incident edges at each node (so that each edge is counted at both of its ends), we obtain the formula:

∑x∈V(G) d(x) = 2|E|.
It can be shown that in any graph the number of nodes of odd degree is even. A graph in
which each node has the same degree is called regular. Hence, for each n ≥ 1, the complete
graph Kn is regular of degree n − 1 (see Figure C.2), and the complete bipartite graph
Kn,n is regular of degree n (see Figure C.3).
[Figure C.2: the complete graph K5 . Figure C.3: the complete bipartite graph K3,3 .]
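The degree-sum formula is easy to verify on, for example, the complete graph K4 , which is regular of degree 3. A sketch, representing a graph by a node list and an edge list of unordered pairs (function name ours):

```python
def degrees(nodes, edges):
    """d(x): the number of edges incident with x; a loop (u == v) counts twice."""
    d = {v: 0 for v in nodes}
    for u, v in edges:
        d[u] += 1
        d[v] += 1
    return d

# K4: an edge between every pair of distinct nodes.
nodes = [1, 2, 3, 4]
edges = [(u, v) for u in nodes for v in nodes if u < v]
d = degrees(nodes, edges)
assert sum(d.values()) == 2 * len(edges)   # the degree-sum formula
assert all(d[v] == 3 for v in nodes)       # K4 is regular of degree 4 - 1 = 3
```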
The relationship between the concepts of connectivity and path is expressed in the following theorem.
Theorem C.2.1.
For any graph G = (V, E), the following assertions are equivalent:
(i) G is connected;
(ii) For each u, v ∈ V with u ≠ v , there is a path between u and v .
Proof of Theorem C.2.1. (i) ⇒ (ii): Take any u, v ∈ V . Let V1 consist of all nodes w ∈ V such that there is a path between u and w (together with the node u itself). We have to show that v ∈ V1 . Assume to the contrary that v ∉ V1 . Define V2 = V \ V1 . Since G is connected, there exist adjacent u1 ∈ V1 and v1 ∈ V2 . Hence, there is a path from u to v1 , namely a path from u to u1 followed by the edge {u1 , v1 }. But this means that v1 ∈ V1 , which is a contradiction. Hence, there is a path between u and v .
(ii) ⇒ (i): Assume to the contrary that G is not connected. Then there are nodes u and v and
sets V1 , V2 ⊂ V with u ∈ V1 , v ∈ V2 , V1 ∪ V2 = V, V1 6= ∅, V2 6= ∅, V1 ∩ V2 = ∅ and no edges
between V1 and V2 . However, since there is a path between u and v , there has to be an edge
between V1 and V2 as well. So we arrive at a contradiction, and therefore G is connected.
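The equivalence in Theorem C.2.1 suggests an algorithmic connectivity test: search for paths from a fixed node, and check that every node is reached. A sketch using breadth-first search (function name ours):

```python
from collections import deque

def connected(nodes, edges):
    """By Theorem C.2.1, G is connected iff every node can be reached from a
    fixed node by a path; here paths are found by breadth-first search."""
    adj = {w: [] for w in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, queue = {nodes[0]}, deque([nodes[0]])
    while queue:
        for x in adj[queue.popleft()]:
            if x not in seen:
                seen.add(x)
                queue.append(x)
    return len(seen) == len(nodes)

assert connected([1, 2, 3], [(1, 2), (2, 3)])
assert not connected([1, 2, 3, 4], [(1, 2), (3, 4)])
```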
Using the concept of cycle, the following characterization of bipartite graphs can be formu-
lated.
Theorem C.2.2.
A graph is bipartite if and only if it contains no odd cycles.
Proof of Theorem C.2.2. First suppose that G = (V, E) is bipartite with bipartition (V1 , V2 ). Let C = v1 - · · · -vk -v1 be a cycle in G (so {v1 , v2 }, . . . , {vk−1 , vk }, {vk , v1 } ∈ E(C)). Without loss of generality, we may assume that v1 ∈ V1 . Since G is bipartite, it follows that v2 ∈ V2 . Similarly, v3 ∈ V1 , and – in general – v2i ∈ V2 and v2i+1 ∈ V1 . Clearly, v1 ∈ V1 and vk ∈ V2 . Since v2i+1 ∈ V1 for every i, the fact that vk ∈ V2 implies that k is even, and so the cycle C has even length. Hence, G contains no odd cycles.
Theorem C.3.1.
Let G = (V, E) be a simple graph. Then the following assertions hold.
(1) G is a tree if and only if there is exactly one path between any two nodes.
(2) If G is a tree, then |E| = |V| − 1.
(3) Every tree T in G with |V(T )| ≥ 2 has at least two nodes of degree one.
Notice that it follows from Theorem C.3.1 that the addition of a new edge (between nodes that are already in the tree) leads to a unique cycle in that tree. A spanning tree of a graph G is a spanning subgraph of G that is a tree. So a spanning tree of G is a tree in G that contains all nodes of G.

C.4 Eulerian tours, Hamiltonian cycles and paths
An Eulerian tour in a graph G is a cycle in G that traverses each edge of G exactly once. A graph is called Eulerian if it contains an Eulerian tour. (The terms ‘tour’ and ‘cycle’ are used interchangeably here.)
Theorem C.4.1.
A graph G is Eulerian if and only if:
(a) every node of G has even degree, and
(b) G is connected, except possibly for isolated nodes.
Proof of Theorem C.4.1. The necessity of the conditions (a) and (b) is obvious. So, we
need to show that (a) and (b) are sufficient for a graph being Eulerian. We first decompose G
into cycles C1 , . . . , Ck . Construct a cycle C in G as follows. Start at an arbitrary non-isolated
node v1 . Since v1 is not isolated, it has a neighbor v2 . Because the degree of v2 is even and
at least one (because v2 is adjacent to v1 ), v2 has degree at least two. So, v2 has a neighbor v3 .
By repeating this argument, we can continue this process until we reach a node that has been
visited before. This must happen, because the graph has a finite number of nodes. We thus
construct a path v1 -v2 - . . . -vq . Let p be such that vp = vq . Then, C = vp -vp+1 - . . . -vq is a cycle in G. Now remove the edges of C from G, and repeat this procedure until there are no
remaining edges. Note that removing these edges decreases the degree of each of the nodes vp ,
vp+1 , . . ., vq−1 by two and therefore, after removing them, the degree of each node remains
even, and the procedure can be repeated until all nodes have zero degree.
The above process yields cycles C1 , . . . , Ck such that E(G) = E(C1 ) ∪ . . . ∪ E(Ck ), and no two cycles share an edge. Order C1 , . . . , Ck by keeping a current (ordered) set of cycles, starting with the set {C1 }, and iteratively adding a cycle that has a node in common with at least one of the cycles in the current set of cycles. If, in iteration t, we cannot add a new cycle, then the cycles are partitioned into {Ci1 , . . . , Cit } and {Cit+1 , . . . , Cik } such that no cycle of the former set has a node in common with any cycle of the latter set, which implies that G is disconnected, contrary to the assumptions of the theorem. So, we may assume that C1 , . . . , Ck are ordered such that, for j = 2, . . . , k, the cycle Cj has a node in common with at least one of C1 , . . . , Cj−1 .
Now, iteratively construct new cycles C′1 , . . . , C′k as follows. Let C′1 = C1 . For i = 2, . . . , k, write C′i−1 = v1 -v2 - . . . -vr - . . . -vl−1 -v1 and Ci = vr -u1 - . . . -us -vr , where vr is a common node of C′i−1 and Ci , and define C′i = v1 -v2 - . . . -vr -u1 - . . . -us -vr - . . . -vl−1 -v1 . After k iterations, C′k is an Eulerian cycle.
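The splicing argument in this proof is, in essence, Hierholzer's algorithm for constructing an Eulerian tour. A compact sketch (function name ours), which walks along unused edges and, when stuck, backtracks while recording nodes:

```python
def eulerian_tour(nodes, edges):
    """Construct an Eulerian tour by the splicing idea in the proof of
    Theorem C.4.1 (Hierholzer's algorithm); assumes G satisfies the
    conditions of the theorem."""
    adj = {v: [] for v in nodes}
    for i, (a, b) in enumerate(edges):
        adj[a].append((b, i))
        adj[b].append((a, i))
    used = [False] * len(edges)
    start = next(v for v in nodes if adj[v])   # any non-isolated node
    stack, tour = [start], []
    while stack:
        v = stack[-1]
        while adj[v] and used[adj[v][-1][1]]:  # discard already-used edges
            adj[v].pop()
        if adj[v]:
            w, i = adj[v].pop()
            used[i] = True
            stack.append(w)                    # extend the current walk
        else:
            tour.append(stack.pop())           # dead end: record the node
    return tour                                # closed walk using every edge once

# Two triangles sharing node 1: all degrees are even, so a tour exists.
tour = eulerian_tour([1, 2, 3, 4, 5],
                     [(1, 2), (2, 3), (3, 1), (1, 4), (4, 5), (5, 1)])
assert len(tour) == 7 and tour[0] == tour[-1]  # 6 edges give a 7-node closed walk
```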
Using Theorem C.4.1, it is now obvious that the graph of Figure C.6 is not Eulerian, and so
it is not possible to make a walk through Kaliningrad by crossing the seven bridges precisely
once.
A Hamiltonian cycle1 in G = (V, E) is a cycle in G of length |V| that traverses each node of
G precisely once. The three Hamiltonian cycles of K4 are depicted in Figure C.6.
A graph is said to be Hamiltonian if it contains a Hamiltonian cycle. It can be shown that the
number of Hamiltonian cycles in Kn is ½(n − 1)!. A Hamiltonian path in G = (V, E) is a
path in G of length |V| − 1 that traverses all nodes of G precisely once. Note that a graph
may contain a Hamiltonian path but not a Hamiltonian cycle (for example the leftmost
graph in Figure C.4). On the other hand, a graph that has a Hamiltonian cycle certainly has
a Hamiltonian path.
1 Named after the Irish mathematician Sir William Rowan Hamilton (1805–1865).
In contrast with the case of Eulerian graphs, no nontrivial necessary and sufficient condition for a graph to be Hamiltonian is known; actually, the problem of finding such a condition is one of the main unsolved problems of graph theory.

C.5 Matchings and coverings
A node covering of a graph G is a subset K of V(G) such that every edge of G has at least
one end in K . A node-covering K is called a minimum node covering if G has no node-
covering K 0 with |K 0 | < |K|; see Figure C.9 and Figure C.10. The number of elements
in a minimum node-covering of G is denoted by τ (G).
The following theorem shows that there is an interesting relationship between ν(G) and
τ (G). The first part of the theorem asserts that the size of a maximum matching is at most
the size of a minimum node covering. The second part asserts that, if G is bipartite, then
the two are in fact equal. The latter is known as Kőnig's theorem². The proof is omitted
here.
Theorem C.5.1.
For any graph G, it holds that ν(G) ≤ τ (G). Moreover, equality holds when G is
bipartite.
² Named after the Hungarian mathematician Dénes Kőnig (1884–1944).
Proof of Theorem C.5.1. The first assertion of the theorem follows directly from the fact
that edges in a matching M of a graph G are disjoint, whereas a node-covering contains at least
one node of each edge of the matching. For a proof of the second statement, we refer to, e.g.,
West (2001).
For example, let G be the unit-cube graph corresponding to the unit cube in R³; see
Figure C.11. Because this graph is bipartite, it follows that ν(G) = τ(G). Also, it can easily
be shown that ν(Kn) = ⌊n/2⌋, and that τ(Kn) = n − 1 (n ≥ 1). Notice that if a graph G is
not bipartite, this does not necessarily mean that ν(G) ≠ τ(G). See Figure C.12 for an
example.
[Figure C.11: (a) the unit cube in R³ with vertices (0, 0, 0), . . . , (1, 1, 1); (b) the corresponding unit-cube graph.]
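For small graphs, ν(G) and τ(G) can be computed by brute force, which makes Theorem C.5.1 easy to verify experimentally. A sketch (the test graphs K4 and the 4-cycle below are illustrative choices):

```python
from itertools import combinations

def nu(edges):
    """nu(G): maximum number of pairwise disjoint edges (maximum matching)."""
    for size in range(len(edges), 0, -1):
        for M in combinations(edges, size):
            ends = [x for e in M for x in e]
            if len(ends) == len(set(ends)):   # edges pairwise disjoint
                return size
    return 0

def tau(nodes, edges):
    """tau(G): minimum size of a node set meeting every edge."""
    for size in range(len(nodes) + 1):
        for K in combinations(nodes, size):
            if all(u in K or v in K for u, v in edges):
                return size

K4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
C4 = [(1, 2), (2, 3), (3, 4), (4, 1)]              # bipartite 4-cycle
print(nu(K4), tau([1, 2, 3, 4], K4))   # 2 3: strict inequality
print(nu(C4), tau([1, 2, 3, 4], C4))   # 2 2: equality (Konig)
```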
For convenience, we shall abbreviate 'directed graph' to digraph. A digraph is sometimes also
called a network. A digraph D′ is a subdigraph of D if V(D′) ⊆ V(D), A(D′) ⊆ A(D),
and each arc of D′ has the same head node and tail node in D′ as in D.
With every digraph D, we associate an undirected graph G on the same node set but for
which the arc-directions are discarded. To be precise, V(G) = V(D) and for each arc
a ∈ A with head node u and tail node v , there is an edge of G with end nodes u and v .
This graph G is called the underlying graph of D; see Figure C.13.
Just as with graphs, digraphs (and also mixed graphs, in which some edges are directed and
some edges are undirected) have a simple pictorial representation. A digraph is represented
by a diagram of its underlying graph together with arrows on its edges, each arrow pointing
towards the head node of the corresponding arc.
Every concept that is valid for graphs automatically applies to digraphs as well. Thus the
digraph in Figure C.13 is connected, because the underlying graph has this property. However,
there are many notions that involve the orientation, and these apply only to digraphs.
Two nodes u and v are diconnected in D if there exist both a directed path from u to v and
a directed path from v to u. A digraph D is called strongly connected if each two nodes in D
are diconnected. The digraph in Figure C.14 is not strongly connected.
The indegree d−D(v) of a node v in D is the number of arcs with head node v; the outdegree
d+D(v) of v is the number of arcs with tail node v. In Figure C.14, e.g., d−(v1) = 2,
d+(v2) = 0, d+(v3) = 2, and d−(v4) = 1. The subscript D will usually be omitted from
indegree and outdegree expressions.
Directed cycles are defined similarly to (undirected) cycles. A directed Hamiltonian path (cycle) is a
directed path (cycle) that includes every node of D. A directed Eulerian tour of D is a directed
[Figure C.14: a digraph on the node set {v1, v2, v3, v4}.]
tour that traverses each arc of D exactly once. The proof of the following theorem is left to
the reader.
Theorem C.6.1.
A digraph D = (V, A) contains a directed Eulerian tour if and only if D is strongly
connected and d+ (v) = d− (v) for each v ∈ V .
The node-arc incidence matrix MVA = {mia} associated with the digraph D = (V, A) is
defined as follows:

mia = 1 if i ∈ V, a ∈ A, and i = t(a);
mia = −1 if i ∈ V, a ∈ A, and i = h(a);
mia = 0 otherwise.
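The definition translates into code directly. A sketch, using as illustration the arcs of the digraph of Figure C.15 and Example C.8.1 (the weights are irrelevant for the incidence matrix):

```python
def incidence_matrix(nodes, arcs):
    """Build the node-arc incidence matrix: +1 in the row of the tail
    t(a) and -1 in the row of the head h(a) of each arc a, 0 elsewhere."""
    index = {v: i for i, v in enumerate(nodes)}
    M = [[0] * len(arcs) for _ in nodes]
    for a, (tail, head) in enumerate(arcs):
        M[index[tail]][a] = 1
        M[index[head]][a] = -1
    return M

nodes = ["v1", "v2", "v3", "v4", "v5"]
arcs = [("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v4", "v1"),
        ("v5", "v1"), ("v3", "v5"), ("v5", "v4")]
M = incidence_matrix(nodes, arcs)
# each column contains exactly one +1, one -1, and zeros elsewhere
assert all(sorted(col) == [-1, 0, 0, 0, 1] for col in zip(*M))
```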
Note that, in each column of a node-arc incidence matrix, exactly one entry is +1, exactly
one entry is −1, and all other entries are 0. See also Section 4.3.4.

C.8 Network optimization models

[Figure C.15: a weighted digraph on the nodes v1, . . . , v5; the arc weights are listed in Example C.8.1.]
Example C.8.1. Consider the weighted digraph of Figure C.15 (arcs with weights ∞ and 0 are
omitted). The weight function ℓ in this example is defined by ℓ(v1, v2) = −1, ℓ(v2, v3) = 2,
ℓ(v3, v4) = 1, ℓ(v4, v1) = 3, ℓ(v5, v1) = 2, ℓ(v3, v5) = −3, and ℓ(v5, v4) = 2. Note
that some weights are negative.
The weight of a path or cycle is defined as the sum of the weights of the individual arcs
comprising the path or cycle.
We will give the formulation of the most common problems in network optimization theory.
For the algorithms that solve these problems, we only mention the names of the most
common ones. The interested reader is referred to, e.g., Ahuja et al. (1993) and Cook et al.
(1997).
The shortest path problem is the problem of determining a minimum weight (also called a
shortest) path between two given nodes in a weighted graph. The most famous algorithm
that solves the shortest path problem is due to Edsger W. Dijkstra (1930–2002). Dijkstra's
shortest path algorithm (1959) finds a shortest path from a specified node s
(the source) to all other nodes, provided that the arc weights are nonnegative. In the case
where arc weights are allowed to be negative, it may happen that there is a cycle of negative
length in the graph. In that case, finite shortest paths may not exist for all pairs of nodes.
An algorithm due to Robert W. Floyd (1936–2001) and Stephen Warshall (1935–2006),
596 A p p e n d i x C . G ra p h t h e o r y
the so-called Floyd-Warshall algorithm (1962), finds a shortest path between all pairs of nodes
and stops whenever a cycle of negative length occurs.
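As a sketch of how Dijkstra's algorithm operates, here is a compact implementation using a binary heap; the five-arc digraph at the bottom is a fabricated example with nonnegative weights:

```python
import heapq

def dijkstra(arcs, source):
    """Dijkstra's shortest path algorithm for nonnegative arc weights:
    returns the shortest distance from `source` to every reachable node."""
    adj = {}
    for u, v, w in arcs:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, [])
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                 # stale heap entry, skip it
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

arcs = [("s", "a", 2), ("s", "b", 5), ("a", "b", 1),
        ("a", "t", 6), ("b", "t", 2)]
print(dijkstra(arcs, "s")["t"])   # 5  (s -> a -> b -> t)
```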
The minimum spanning tree problem is the problem of finding a spanning tree of minimum
length in a weighted connected graph. Prim’s minimum spanning tree algorithm (1957), due to
Robert C. Prim (born 1921), solves the problem for undirected graphs. In case the graph
is directed, the spanning trees are usually so-called spanning arborescences, which are trees
in which no two arcs are directed into the same node. The minimum arborescence algorithm
(1968), due to Jack R. Edmonds (born 1934), solves this problem.
The maximum flow problem (see Section 8.2.5) is the problem of determining the largest
possible amount of flow that can be sent through a digraph from a source to a sink taking
into account certain capacities on the arcs. The minimum cost flow problem is the problem of
sending a given amount of flow through a weighted digraph against minimum costs. These
problems may be solved with the flow algorithms (1956) discovered by Lester R. Ford (born
1927) and Delbert R. Fulkerson (1924–1976).
The maximum cardinality matching problem is the problem of finding a matching in a graph with
maximum cardinality. The minimum weight matching problem is the problem of determining a
perfect matching in a weighted graph with minimum weight. These problems can be solved
by means of Jack R. Edmonds’ matching algorithm (1965).
The postman problem, also called the Chinese postman problem, is the problem of finding a
shortest tour in a weighted graph such that all edges and/or arcs are traversed at least once
and the begin and end node of the tour coincide. If the graph is either completely directed
or completely undirected, the postman problem can be solved by means of algorithms
designed in 1973 by Jack R. Edmonds and Ellis L. Johnson (born 1937). In the case of a
mixed graph, an optimal postman route – if one exists – is much more difficult to determine.
Actually, there are cases in which no efficient optimal solution procedure is currently available.
The traveling salesman problem is the problem of finding a shortest Hamiltonian cycle in a
weighted graph. For this problem, too, no efficient solution technique is available. On the
other hand, an extensive number of heuristics and approximation algorithms (efficient algorithms
that generate feasible solutions) have been designed which can usually solve practical problems
satisfactorily. For further discussion, we refer to, e.g., Applegate et al. (2006). William
J. Cook (born 1957) published an interesting book describing the history of the traveling
salesman problem, and the techniques that are used to solve this problem; see Cook (2012).
See also Section 9.4.
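To illustrate the heuristic approach, here is a sketch of the simple nearest-neighbor heuristic: start somewhere, always travel to the closest unvisited node, and finally return to the start. The distance matrix below is fabricated, and the heuristic only guarantees a feasible tour, not an optimal one:

```python
def nearest_neighbor_tour(dist, start=0):
    """Nearest-neighbor heuristic for the traveling salesman problem.
    Returns (tour, total weight); the tour is feasible but in general
    not optimal."""
    n = len(dist)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda v: dist[last][v])
        tour.append(nxt)
        unvisited.remove(nxt)
    total = sum(dist[tour[i]][tour[i + 1]] for i in range(n - 1))
    total += dist[tour[-1]][start]            # close the cycle
    return tour + [start], total

dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
tour, length = nearest_neighbor_tour(dist)
print(tour, length)   # [0, 1, 3, 2, 0] 18
```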
Appendix D

Convexity
This appendix contains basic notions and results from the theory of convexity. We start
by developing some properties of sets, and formulating the celebrated theorem of Karl
Weierstrass (1815–1897). For a more extensive account of convexity, we refer to Grünbaum
(2003).
For example, aff({v}) = {v} for any v ∈ Rn . Moreover, if a and b are different points
in R2 , then aff({a, b}) = {λa + (1 − λ)b | λ ∈ R} is the line through a and b. So,
the affine hull of a set can be considered as the smallest linear extension of that set. See also
Appendix B, Section B.1.
Let a ∈ Rⁿ and let ε > 0. The set

Bε(a) = {x ∈ Rⁿ | ‖x − a‖ < ε}

is called the ε-ball around a. Let S ⊆ Rⁿ. The closure of S is denoted and defined by:

cl(S) = {x ∈ Rⁿ | Bε(x) ∩ S ≠ ∅ for every ε > 0}.
Note that int({λ[1 0]ᵀ + (1 − λ)[3 0]ᵀ | 0 ≤ λ ≤ 1}) = ∅. The relative interior of a
set S in Rⁿ is defined and denoted by:

relint(S) = {x ∈ S | Bε(x) ∩ aff(S) ⊆ S for some ε > 0};

in contrast with the interior, the relative interior of a nonempty convex set is always
nonempty. If S = int(S), then the set S is called open; if S = cl(S), then S is called
closed. The point a is said to be in the boundary of S , if for each ε > 0, Bε (a) contains
at least one point in S and at least one point not in S . The set S is called bounded if there
exists x ∈ Rn and a number r > 0 such that S ⊂ Br (x).
A sequence (x1 , x2 , . . .) of points is called convergent with limit point x if and only if for
each ε > 0 there exists an integer M such that xk ∈ Bε (x) for each k ≥ M . We
use the notation limk→∞ xk = x. The following theorem holds for bounded sets. Any
sequence (x1 , x2 , . . .) of points in a bounded set S contains a convergent subsequence
(xi(1) , xi(2) , . . .).
It can be shown that S is closed if and only if S contains all its boundary points. Another
characterization is the following one. The set S is closed if and only if for any convergent
sequence (x1 , x2 , . . .) of points in S , it holds that limk→∞ xk ∈ S . Moreover, S is open
if and only if S does not contain any of its boundary points. A set may be neither open nor
closed, and the only sets in Rn that are both open and closed are ∅ and Rn (why?).
Let S ⊆ Rⁿ and let f : S → R. A number U ∈ R is called an upper bound of f on S if
f(x) ≤ U for each x ∈ S. The smallest upper bound of f on S is called the supremum of f
on S; we use the notation U = sup{f(x) | x ∈ S}. If f(x) takes on arbitrarily large
(positive) values, then sup{f(x) | x ∈ S} = ∞; if S is the empty set, then
sup{f(x) | x ∈ S} = −∞.
With these definitions, we can formulate and prove the following famous theorem of Karl
Weierstrass (1815–1897).

Theorem D.1.1.
Let S be a nonempty closed and bounded subset of Rⁿ, and let f : S → R be a continuous
function. Then f attains a maximum and a minimum on S.

Proof of Theorem D.1.1. We only show that f attains a minimum on S. First note
that, since f is a continuous function on the closed and bounded set S, it follows that f
is bounded below on S. Moreover, since S is nonempty, the largest lower bound of f on
S exists, say α = inf{f(x) | x ∈ S}. For any positive integer k and 0 < ε < 1, define
S(ε, k) = {x ∈ S | α ≤ f(x) ≤ α + εᵏ}. Since α is the infimum of f on S, it follows that
S(ε, k) ≠ ∅ for each k.
[Figure D.1: Examples of a convex and a nonconvex set in R². Panel (a) shows a convex set, panel (b) a nonconvex set; in each, two points x1 and x2 and the point ½x1 + ½x2 are marked.]
The intervals [a, b) and (a, b) are defined analogously. We also define:

[a, ∞) = {x ∈ R | x ≥ a} and (−∞, b] = {x ∈ R | x ≤ b}.

Again, (−∞, a) and (a, ∞) are defined analogously. Notice that (−∞, ∞) = R.
A set S in the Euclidean vector space (see Appendix B) Rⁿ is said to be convex if for any two
points x1 and x2 in S, the line segment [x1, x2] = {λx1 + (1 − λ)x2 | 0 ≤ λ ≤ 1},
joining x1 and x2, also belongs to S. Thus, S is convex if and only if x1, x2 ∈ S ⟹
[x1, x2] ⊆ S. Figure D.1(a) shows an example of a convex set; Figure D.1(b) shows an
example of a nonconvex set.
Let k ≥ 1, and let x1, . . . , xk be points in Rⁿ. For any set of scalars {λ1, . . . , λk} with
λ1, . . . , λk ≥ 0 and λ1 + . . . + λk = 1, the point λ1x1 + . . . + λkxk is called a convex combination
of x1, . . . , xk. Moreover, the convex hull of the points x1, . . . , xk is denoted and defined
by:

conv({x1, . . . , xk}) = {λ1x1 + . . . + λkxk | λi ≥ 0 for i = 1, . . . , k, and λ1 + . . . + λk = 1};

i.e., the convex hull of a set S is the set of all convex combinations of the points of S.
Instead of conv({·}), we will also write conv(·). Note that singletons (sets consisting of one
point), as well as the whole space Rⁿ, are convex. The empty set ∅ is convex by definition.
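Because the constraints of a polyhedral set are linear, convexity can be checked numerically: every convex combination of two feasible points is again feasible. A small sketch (the two-inequality system below is an arbitrary illustration):

```python
import random

def in_F(x):
    """Membership test for the illustrative polyhedral set
    F = {x in R^2 | x1 + x2 <= 9, 3*x1 + x2 <= 18, x1 >= 0, x2 >= 0}."""
    x1, x2 = x
    tol = 1e-9
    return (x1 + x2 <= 9 + tol and 3 * x1 + x2 <= 18 + tol
            and x1 >= -tol and x2 >= -tol)

random.seed(1)
x, y = (5.0, 3.0), (0.0, 6.0)            # two points of F
assert in_F(x) and in_F(y)
for _ in range(100):
    lam = random.random()
    z = (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])
    assert in_F(z)                        # the segment [x, y] lies in F
print("all sampled convex combinations are feasible")
```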
Some more convex sets are introduced below.
The set H = {x ∈ Rⁿ | aᵀx = b} with a ≠ 0 is called a hyperplane in Rⁿ; the vector a
is called the normal vector of H. Note that for any two points x1 and x2 in the hyperplane
H, it holds that aᵀ(x1 − x2) = b − b = 0, and so a is perpendicular to H; see Appendix
B. A set of hyperplanes is called linearly dependent (linearly independent, respectively) if the
corresponding normal vectors are linearly dependent (independent, respectively). With
each hyperplane H = {x ∈ Rⁿ | aᵀx = b} are associated two (closed) halfspaces:

H⁺ = {x ∈ Rⁿ | aᵀx ≤ b} and H⁻ = {x ∈ Rⁿ | aᵀx ≥ b}.
The feasible region of a standard LO-model, F = {x ∈ Rⁿ | Ax ≤ b, x ≥ 0}, being an
intersection of finitely many halfspaces, is an example of a convex set.
A hyperplane H = {x ∈ Rⁿ | wᵀx + b = 0}, with w ∈ Rⁿ and b ∈ R, is called a
separating hyperplane for X and Y if wᵀx + b > 0 for all x ∈ X and wᵀy + b < 0 for all
y ∈ Y (or vice versa). The following theorem shows that if X and Y are disjoint closed
bounded nonempty convex sets, such a separating hyperplane always exists.
Theorem D.2.1.
Let X, Y ⊆ Rⁿ be closed bounded nonempty convex sets such that X ∩ Y = ∅.
Then, there exist w ∈ Rⁿ and b ∈ R such that wᵀx + b > 0 for all x ∈ X and
wᵀy + b < 0 for all y ∈ Y.
Proof. Because X and Y are closed and bounded nonempty sets, the Cartesian product (see
Section D.4) X × Y = {(x, y) | x ∈ X, y ∈ Y} is closed, bounded, and nonempty as well.
Define f : X × Y → R by f(x, y) = ‖x − y‖. Then, by Weierstrass' Theorem (Theorem
D.1.1), f attains a minimum on X × Y, say at (x*, y*). Therefore, for ε > 0 small enough,
(D.1) implies that

(x″(ε) − y*)ᵀ(x″(ε) − y*) < (x* − y*)ᵀ(x* − y*),
contrary to the choice of x∗ and y∗ . This proves that wT x + b > 0 for all x ∈ X . The proof
that wT y + b < 0 for all y ∈ Y is similar and is left to the reader.
The requirement in Theorem D.2.1 that X and Y are closed and bounded is necessary.
Indeed, the sets

X = {[x1 x2]ᵀ ∈ R² | x1 ≤ 0} and Y = {[x1 x2]ᵀ ∈ R² | x2 > 0, x1 ≥ 1/x2}

are both closed and convex (but not bounded), but there does not exist a separating hyper-
plane for X and Y. It can be shown, however, that it suffices that one of the sets X, Y is
closed and bounded.
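The failure of (strict) separation can be made concrete numerically: although X and Y are disjoint, they contain points that are arbitrarily close together. A quick sketch:

```python
# X = {x in R^2 : x1 <= 0} and Y = {x in R^2 : x2 > 0, x1 >= 1/x2}.
# For every t > 0, the point (0, t) lies in X and (1/t, t) lies in Y,
# so the distance between X and Y is at most 1/t, which tends to 0.
def witness_distance(t):
    x = (0.0, t)            # a point of X
    y = (1.0 / t, t)        # a point of Y
    return ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) ** 0.5

print([witness_distance(t) for t in (1.0, 10.0, 1000.0)])
```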
D.3 Faces, extreme points, and adjacency

In Section 2.1.2, a different definition of extreme point is used. In Theorem D.3.1, we give
three equivalent definitions of extreme point.
Theorem D.3.1.
Let H1, . . . , Hp be hyperplanes in Rⁿ. Let p ≥ n, and let F = H1⁺ ∩ . . . ∩ Hp⁺ (≠ ∅). For
each v ∈ F, the following assertions are equivalent:

(i) v is a 0-face of F;
(ii) v is not a convex combination of two other points of F;
(iii) There exist n independent hyperplanes Hi(1), . . . , Hi(n) (among the hyperplanes
H1, . . . , Hp) corresponding to n halfspaces Hi(1)⁺, . . . , Hi(n)⁺, respectively, such
that {v} = Hi(1) ∩ . . . ∩ Hi(n).
Proof of Theorem D.3.1. (i) ⟹ (ii): Assume to the contrary that there are points u and
w in F, both different from v, such that v ∈ conv(u, w). We may choose u and w such
that v = ½(u + w). Since v is a 0-face of F, there is a hyperplane H such that F ⊂ H⁺
and H ∩ F = {v}. Let H = {x | aᵀx = b} and H⁺ = {x | aᵀx ≤ b}. Assume that u ∉ H.
If aᵀu < b, then aᵀv = aᵀ(½(u + w)) = ½aᵀu + ½aᵀw < ½b + ½b = b. This contradicts
the fact that v ∈ H. For aᵀu > b, the same contradiction is obtained. Hence, u ∈ H. But
this contradicts the fact that v is the only point in H ∩ F.
(ii) ⟹ (iii): Let F = {x | Ax ≤ b}, with the rows of A being the normal vectors of
the hyperplanes H1, . . . , Hp. Let A(v) be the submatrix of A consisting of the rows of A that
correspond to the hyperplanes Hi that are binding at v; i.e., if ai is a row of A(v) then aiᵀv = bi,
and if ai is a row of A that is not a row of A(v) then aiᵀv < bi, where Hi = {x | aiᵀx = bi}
is the corresponding hyperplane. Let I be the index set of the rows of A that are in A(v), and
J the index set of the rows of A that are not in A(v). Note that |I ∪ J| = p. We will show
that rank(A(v)) = n. Assume, to the contrary, that rank(A(v)) ≤ n − 1. This means that the
columns of A(v) are linearly dependent; see Appendix B. Hence, there exists a vector λ ≠ 0
such that A(v)λ = 0. Since aiᵀv < bi for each i ∈ J, there is a (small enough) ε > 0 such that
aiᵀ(v + ελ) ≤ bi and aiᵀ(v − ελ) ≤ bi for i ∈ J. On the other hand, since aiᵀλ = 0 for each
i ∈ I, it follows that aiᵀ(v + ελ) ≤ bi and aiᵀ(v − ελ) ≤ bi for i ∈ I. Hence, A(v + ελ) ≤ b
and A(v − ελ) ≤ b, and so v + ελ ∈ F and v − ελ ∈ F. Since v is a convex combination
of v + ελ and v − ελ, we have obtained a contradiction with (ii). Therefore, rank(A(v)) = n.
Let Hi(1), . . . , Hi(n) be hyperplanes corresponding to n independent rows of A(v). Since n
independent hyperplanes in an n-dimensional space intersect in exactly one point, and we know
that v ∈ Hi(1), . . . , v ∈ Hi(n), it follows that Hi(1), . . . , Hi(n) determine v. This proves (iii).
(iii) ⟹ (i): Let a1, . . . , an be the linearly independent normals of Hi(1), . . . , Hi(n), respec-
tively. Choose a vector c in the interior of cone{a1, . . . , an} and scalars λ1, . . . , λn > 0 with
λ1 + . . . + λn = 1, and such that c = λ1a1 + . . . + λnan. Define H = {x ∈ Rⁿ | cᵀx = cᵀv}.
We will show that F ∩ H = {v}. Take any v′ ∈ F ∩ H. Then cᵀv′ = cᵀv, and so
0 = cᵀ(v′ − v) = (λ1a1 + . . . + λnan)ᵀ(v′ − v) = λ1(a1ᵀ(v′ − v)) + . . . + λn(anᵀ(v′ − v)).
Since v′ ∈ F, it follows that ajᵀv′ ≤ bj, and hence ajᵀ(v′ − v) ≤ bj − bj = 0 for all j = 1, . . . , n.
Moreover, since λ1, . . . , λn > 0, it follows that ajᵀ(v′ − v) = 0 for all j = 1, . . . , n. Hence,
both v′ and v are in Hi(1), . . . , Hi(n). But this implies that conv(v′, v) ⊂ Hi(1) ∩ . . . ∩ Hi(n).
Since dim(Hi(1) ∩ . . . ∩ Hi(n)) = 0, it follows that v′ = v. This proves the theorem.
Theorem D.3.2.
Let H1, . . . , Hp be hyperplanes in Rⁿ. Let p ≥ n + 1, and let F = H1⁺ ∩ . . . ∩ Hp⁺ (≠ ∅).
For each u, v ∈ F, the following assertions are equivalent:

(i) u and v are adjacent extreme points of F;
(ii) There exist n + 1 hyperplanes Hi(1), . . . , Hi(n+1) among H1, . . . , Hp such that the set
{Hi(1), . . . , Hi(n)} determines u, and the set {Hi(2), . . . , Hi(n+1)} determines v.
Proof of Theorem D.3.2. (i) ⇒ (ii): As in the proof of Theorem D.3.1, let A be the matrix
corresponding to the hyperplanes H1 , . . . , Hp . Let A(u,v) be the submatrix of A consisting
of the rows of A that correspond to the hyperplanes that are binding at both u and v. Similar
to the proof of (ii) ⇒ (iii) of Theorem D.3.1, it follows that rank(A(u,v) ) = n − 1. So, there
are n − 1 independent hyperplanes Hi(2) , . . . , Hi(n) among H1 , . . . , Hp that are binding at
both u and v. Since u and v are different extreme points of F , two more hyperplanes exist:
one to determine u, and one to determine v; say Hi(1) and Hi(n+1) , respectively. I.e., the set
{Hi(1) , . . . , Hi(n) } determines u, and the set {Hi(2) , . . . , Hi(n+1) } determines v.
(ii) ⇒ (i): This proof is similar to the proof of (iii) ⇒ (i) in Theorem D.3.1. The vector c is
chosen as follows. Let a2, . . . , an be the normals of the n − 1 independent hyperplanes that are
binding at both u and v. Choose c in the interior of cone(a2, . . . , an). It can then be shown
that the hyperplane H = {x | cᵀx = cᵀu = cᵀv} satisfies F ∩ H = conv(u, v).
In Section 2.1.3, we stated that a face of a face of the feasible region of an LO-model is a
face of the feasible region itself. The following theorem makes this statement precise.
[Figure D.2: Writing the vertex x0 as a convex combination of vertices of F. Panels (a), (b), and (c) show the feasible region with the hyperplanes H3, H4, and H6, the vertices v1, . . . , v4, and the points x0, . . . , x6.]
Theorem D.3.3.
Let F be a polyhedral set in Rn . Then any face FJ of F is a polyhedral set, and any
face of FJ is also a face of F .
Proof. Let H⁺ = {H1⁺, . . . , Hp⁺} be the set of halfspaces that determine the polyhedral set
F, and let H be the set of corresponding hyperplanes. Let FJ be a face of F. Moreover, let
HJ ⊆ H be the set of hyperplanes that determine FJ in F, meaning that for each H ∈ HJ,
it holds that FJ ⊆ H. Hence, FJ = F ∩ (∩HJ). Let HJ⁺ be the set of halfspaces H⁺
with H ∈ HJ. Since each hyperplane is the intersection of its two associated halfspaces,
FJ is an intersection of finitely many halfspaces, and hence FJ is a polyhedral set.

Now assume that FI is a face of the polyhedral set FJ. Let HI be the set of hyperplanes with
HI ⊆ H that determine FI in FJ. Hence, FI = FJ ∩ (∩HI). Define HI∪J = HI ∪ HJ.
Then FI = F ∩ (∩HI∪J), so the hyperplanes in HI∪J determine FI in F, and hence FI is a
face of F.
In the remainder of this section, we give a proof of Theorem 2.1.2 for the case when the
feasible region F is bounded. This is the ‘if ’ part of Theorem 2.1.6. The following example
illustrates the main idea behind the proof.
Example D.3.1. Consider the feasible region F of Model Dovetail, and suppose that we want
to write the point x0 = [3 3]ᵀ as a convex combination of the vertices of F; see Figure D.2(a). We
may choose any vector z0 ∈ R² \ {0}, and move away from x0 in the direction z0, i.e., we move
along the halfline x0 + αz0 with α ≥ 0. For instance, let z0 = [1 0]ᵀ. We move away from x0
in the direction z0 as far as possible, while staying inside the feasible region F. Doing that, we arrive
at the point x1 = x0 + 2z0 = [5 3]ᵀ, where we 'hit' the hyperplane H4 and we cannot move
any further. Note that x1 is a point of the face F{4}, which is the line segment v1v2. The point
x1 along with the vector 2z0 originating from x0 is shown in Figure D.2(b). Similarly, we can move
away as far as possible from x0 in the direction −z0 = [−1 0]ᵀ, until we hit the hyperplane H2,
arriving at the point x2 = x0 − 3z0 = [0 3]ᵀ. Note that x2 is a point of the face F{2}, which is the
line segment 0v4 .
Now we have two new points x1 and x2, with the property that x0, the original point of interest, lies
on the line segment x1x2. To be precise, since x1 = x0 + 2z0 and x2 = x0 − 3z0, it follows by
eliminating z0 that 3x1 + 2x2 = 5x0 or, equivalently, that x0 = (3/5)x1 + (2/5)x2, which means that
x0 is in fact a convex combination of x1 and x2. The next step is to repeat this process for the two
points x1 and x2.
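The arithmetic of this elimination step can be verified exactly with rational numbers; a quick sketch:

```python
from fractions import Fraction as F

x0 = (F(3), F(3))
z0 = (F(1), F(0))
x1 = tuple(a + 2 * b for a, b in zip(x0, z0))   # x0 + 2*z0 = (5, 3)
x2 = tuple(a - 3 * b for a, b in zip(x0, z0))   # x0 - 3*z0 = (0, 3)
# eliminating z0 from the two equations gives 3*x1 + 2*x2 = 5*x0,
# i.e. x0 = (3/5)*x1 + (2/5)*x2, a convex combination (3/5 + 2/5 = 1)
recon = tuple(F(3, 5) * a + F(2, 5) * b for a, b in zip(x1, x2))
print(recon == x0)   # True
```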
So consider x1 . Again, we choose a vector z1 ∈ R2 \ {0} and move away from x1 in the direction
z1 . Now, however, since x1 is a point of the face F{2} , we will make sure that, as we move in
the direction z1 , we stay inside the face F{2} . Since every point in this face satisfies the equation
T
3x1 + x2 = 18, this means that z1 = z11 z21 should satisfy:
We are now ready to give a proof of Theorem 2.1.2. This proof generalizes the construction
method described in the example above.

Proof of Theorem 2.1.2. (Bounded case) As in Section 2.1.3, we may write the feasible
region of any standard LO-model as:

F = {x ∈ Rⁿ | aiᵀx ≤ bi, i = 1, . . . , n + m}.

The assumption that F is bounded implies that FJ is bounded for all J ⊆ {1, . . . , n + m}. We
will prove by induction on |J| that for any J ⊆ {1, . . . , n + m}:

Any x̂ ∈ FJ can be written as a convex combination of vertices of FJ. (?)
We use induction to prove (?). As the base case we take J = {1, . . . , n + m}. Because F is the
feasible region of a standard LO-model, we have that:

F{1,...,n+m} = {x ∈ Rⁿ | aiᵀx = bi, i = 1, . . . , n + m} = {x ∈ Rⁿ | Ax = b, x = 0}.

Therefore, either F{1,...,n+m} = {0}, or F{1,...,n+m} = ∅. In either case, (?) holds trivially.
For the general case, let J ⊆ {1, . . . , n + m} and assume that (?) holds for all J′ with |J′| > |J|.
Let x̂ ∈ FJ. If FJ is the empty set or a singleton, then (?) holds trivially and we are done. So
we may assume that FJ contains at least two distinct points, u and u′, say. Let d = u − u′. We
have that d ≠ 0 and ajᵀd = ajᵀ(u − u′) = bj − bj = 0 for each j ∈ J. Consider moving away
as far as possible from x̂ in the directions d and −d, while staying inside FJ. Formally, define:

α1 = max{α ∈ R | ajᵀ(x̂ + αd) = bj for j ∈ J, and ajᵀ(x̂ + αd) ≤ bj for j ∈ J̄}, and
α2 = max{α ∈ R | ajᵀ(x̂ − αd) = bj for j ∈ J, and ajᵀ(x̂ − αd) ≤ bj for j ∈ J̄}.

(The vector α1d can informally be described as the furthest we can move away from x̂ in the
direction d without leaving FJ, and similarly for −α2d in the direction −d. Since ajᵀd = 0
for each j ∈ J, we have that ajᵀ(x̂ + αd) = ajᵀx̂ = bj for all j ∈ J. So, all of the equality
constraints defined by ajᵀx = bj are still satisfied as we move in the direction d or −d.)
Because x̂ ∈ FJ and FJ is bounded, we have that α1 and α2 are nonnegative and finite.
By the choice of α1 , there exists an index j ∈ J¯ such that aTj (x̂ + α1 d) = bj . Hence,
x̂ + α1 d ∈ FJ∪{j} , and it follows from the induction hypothesis that x̂ + α1 d can be written
as a convex combination of vertices of FJ∪{j} . Since, by Theorem D.3.3, the vertices of FJ∪{j}
are also vertices of FJ , it follows that x̂ + α1 d can be written as a convex combination of the
vertices v1 , . . . , vk , say, of FJ . Similarly, x̂ − α2 d can be written as a convex combination of
vertices of FJ . Therefore, we have that:
x̂ + α1d = λ1v1 + . . . + λkvk, and x̂ − α2d = λ1′v1 + . . . + λk′vk,

with λ1 + . . . + λk = 1 and λ1′ + . . . + λk′ = 1. Observe that x̂ can be written as a convex
combination of x̂ + α1d and x̂ − α2d, namely:

x̂ = ν(x̂ + α1d) + (1 − ν)(x̂ − α2d), with ν = α2/(α1 + α2) (∈ [0, 1]).

Therefore we may write x̂ as follows:

x̂ = ν(λ1v1 + . . . + λkvk) + (1 − ν)(λ1′v1 + . . . + λk′vk)
  = (νλ1 + (1 − ν)λ1′)v1 + . . . + (νλk + (1 − ν)λk′)vk.
Since (νλ1 + (1 − ν)λ1′) + . . . + (νλk + (1 − ν)λk′) = ν(λ1 + . . . + λk) + (1 − ν)(λ1′ + . . . + λk′) = 1,
this proves that x̂ can be written as a convex combination of vertices of FJ.
D.4 Convex and concave functions

Let C be a convex set in Rⁿ. The function f : C → R is called convex if for all x1, x2 ∈ C
and all λ with 0 < λ < 1 it holds that:

f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2). (∗)

The function f : C → R is called concave if the '≥' sign holds in (∗). The function f is
called a strictly convex function if the strict inequality sign '<' holds in (∗) whenever x1 ≠ x2;
for a strictly concave function, the '>' sign holds in (∗).
Examples of convex functions are f(x) = x² with x ∈ R, and f(x1, x2) = 3x1² + 2x2² with
[x1 x2]ᵀ ∈ R². Note that for any two points [a f(a)]ᵀ and [b f(b)]ᵀ on the graph of a convex
function f, the line segment connecting these two points is 'above' the graph of f, and the region
'above' the graph is convex. This last property can be formulated as follows. Let C be a
convex set in Rⁿ, and f : C → R. The epigraph of f on the set C is denoted and defined
by:

epi(f) = {(x, λ) ∈ C × R | f(x) ≤ λ}.
It can be shown that epi(f ) on C is convex if and only if f is convex. Moreover, it can be
shown that, for any concave function f , it holds that epi(−f ) is convex.
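Convexity of a concrete function can be probed numerically by sampling the defining inequality on random segments. A sketch for f(x1, x2) = 3x1² + 2x2², one of the examples above (sampling only supports convexity, it does not prove it):

```python
import random

def f(x1, x2):
    return 3 * x1**2 + 2 * x2**2   # a convex function

random.seed(0)
for _ in range(1000):
    a = (random.uniform(-5, 5), random.uniform(-5, 5))
    b = (random.uniform(-5, 5), random.uniform(-5, 5))
    lam = random.random()
    mid = (lam * a[0] + (1 - lam) * b[0], lam * a[1] + (1 - lam) * b[1])
    # convexity: f(lam*a + (1-lam)*b) <= lam*f(a) + (1-lam)*f(b)
    assert f(*mid) <= lam * f(*a) + (1 - lam) * f(*b) + 1e-9
print("convexity inequality holds on all sampled segments")
```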
The expression

D1 × . . . × Dp = {[x1 . . . xp]ᵀ | xk ∈ Dk for k = 1, . . . , p}

is called the Cartesian product¹ of the sets D1, . . . , Dp.
Theorem D.4.1.
For each k = 1, . . . , p, let fk : Dk → R be a (strictly) convex (concave) function.
Then the function f1 + . . . + fp : D1 × . . . × Dp → R, defined by

(f1 + . . . + fp)(x1, . . . , xp) = f1(x1) + . . . + fp(xp),

is (strictly) convex (concave) as well.
¹ Named after the French philosopher, mathematician, and writer René Descartes (1596–1650).
Proof of Theorem D.4.1. We give a proof for convex functions and p = 2; all other cases can
be proved similarly, and they are left to the reader. So, let f1(x) and f2(y) be convex functions,
with x ∈ D1 and y ∈ D2. Note that D1 and D2 may have different dimensions. Also note that
(f1 + f2)(x, y) = f1(x) + f2(y). Let 0 ≤ λ ≤ 1, x1, x2 ∈ D1, and y1, y2 ∈ D2. Then,

f1(λx1 + (1 − λ)x2) + f2(λy1 + (1 − λ)y2) ≤ λ(f1(x1) + f2(y1)) + (1 − λ)(f1(x2) + f2(y2)).
The proof of Theorem D.4.2 is based on the following fact, which we state without proof.
Let F be a nonempty convex set in Rn , and let f : F → R be a convex function. Then f
is continuous on the relative interior of F .
Theorem D.4.2.
Let F be a nonempty, closed, bounded, and convex subset of Rⁿ, and let f : F → R be a
strictly convex function on F. Then, the function f has a unique minimizer on F.
Appendix E

Nonlinear optimization

Interior point algorithms (see Chapter 6) rely strongly on optimization methods from non-
linear constrained optimization. For that purpose, in this appendix, we will present some
of the theory concerning nonlinear constrained optimization, and we will show how linear
optimization can be viewed as a special case of nonlinear optimization.
E.1 Basics
Consider the function f : Rⁿ → Rᵐ (n, m ≥ 1). We say that f is continuous at the point
x̂ (∈ Rⁿ) if, for any sequence of points x1, x2, . . . in Rⁿ that converges to x̂, we have that
lim_{k→∞} f(xk) = f(x̂). We write f(x) = f([x1 . . . xn]ᵀ). Whenever this does not
cause any confusion, we write f(x1, . . . , xn) instead of f([x1 . . . xn]ᵀ).
If f is differentiable at x̂, then the partial derivative of f with respect to xi at the point x̂ is
given by:

∂f/∂xi(x̂) = lim_{ε→0} (f(x̂ + εei) − f(x̂))/ε,
[Figure E.1: The functions f(x) = x² (solid line) and h(x) = x̂² + 2x̂(x − x̂) (dashed line). Figure E.2: The functions f(x) = |x| (solid line) and h(x) = |x̂| + α(x − x̂) for different choices of α (dashed lines).]
where ei is the i'th unit vector in Rⁿ, and i = 1, . . . , n. The vector ∇f(x̂) (if it exists) in
the definition above is called the gradient of f at x̂. It satisfies the following equation:

∇f(x̂) = [∂f/∂x1(x̂) . . . ∂f/∂xn(x̂)]ᵀ.
A function f is continuously differentiable at x̂ if f is differentiable at x̂ and the gradient of
f is continuous at x̂. If n = 1, then there is only one partial derivative, and this partial
derivative is then called the derivative of f .
points [ŷ1 ẑ1]ᵀ, [ŷ2 ẑ2]ᵀ, . . . in R² that converge to [ŷ ẑ]ᵀ, we have that:
Example E.1.3. Consider the function f : R → R defined by f(x) = |x|. We will show that
f is not differentiable at x̂ = 0. Suppose for a contradiction that f is differentiable at 0. Then, it
has a gradient, say ∇f(0) = α for some α ∈ R. Define h(x) = f(0) + α(x − 0) = αx; see
also Figure E.2. Because we assumed that f is differentiable at 0, we must have that:

0 = lim_{k→∞} (f(xk) − h(xk))/|x̂ − xk| = lim_{k→∞} (|xk| − αxk)/|xk| = lim_{k→∞} (1 − αxk/|xk|)   (E.1)

for every sequence x1, x2, . . . in R that converges to 0. However, this is not true. For example, choose
the sequence xk = (−1)ᵏ/k, i.e., x1 = −1, x2 = 1/2, x3 = −1/3, x4 = 1/4, . . .. This sequence
converges to 0, but we have that:

1 − αxk/|xk| = 1 − α((−1)ᵏ/k)/|(−1)ᵏ/k| = 1 − α((−1)ᵏ/k)/(1/k) = 1 − α(−1)ᵏ,

which equals 1 + α if k is odd, and 1 − α if k is even. We now distinguish two cases:

▸ α ≠ 0. Then, lim_{k→∞} (1 − αxk/|xk|) does not exist, and hence (E.1) does not hold.
▸ α = 0. Then, lim_{k→∞} (1 − αxk/|xk|) = 1 ≠ 0, and hence (E.1) does not hold either.

In both cases we have a contradiction, and so f is not differentiable at 0.
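The failure of differentiability can also be seen in the one-sided difference quotients of f(x) = |x| at 0: from the right they all equal +1, from the left −1, so no single slope α fits. A quick numerical sketch:

```python
def diff_quotient(f, x_hat, eps):
    """One-sided difference quotient of f at x_hat with step eps."""
    return (f(x_hat + eps) - f(x_hat)) / eps

f = abs
# difference quotients of |x| at 0 from the right and from the left:
right = [diff_quotient(f, 0.0, 10**-k) for k in range(1, 6)]
left = [diff_quotient(f, 0.0, -(10**-k)) for k in range(1, 6)]
print(right)   # all 1.0
print(left)    # all -1.0
```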
The gradient has two important geometric interpretations. First, ∇f (x̂) is a vector that is
perpendicular to the so-called level set {x ∈ Rn | f (x) = f (x̂)}. Second, ∇f (x) points
in the direction in which the function f increases. In fact, ∇f (x) is a direction in which
f increases fastest; it is therefore also called the direction of steepest ascent.
For p ∈ Rⁿ, the directional derivative of f in the direction p (if it exists) is denoted and defined
as:

D(f(x); p) = lim_{ε→0} (f(x + εp) − f(x))/ε.

So the i'th partial derivative is just D(f(x); ei) (i = 1, . . . , n).
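For a (continuously) differentiable function f, the directional derivative satisfies D(f(x); p) = ∇f(x)ᵀp, which can be checked numerically. A sketch for f(x1, x2) = 3x1² + 2x2² (an example function; its gradient is [6x1 4x2]ᵀ):

```python
def f(x1, x2):
    return 3 * x1**2 + 2 * x2**2

def grad_f(x1, x2):
    """Gradient of the example function f."""
    return (6 * x1, 4 * x2)

def directional(f, x, p, eps=1e-6):
    """Numerical directional derivative D(f(x); p)."""
    return (f(x[0] + eps * p[0], x[1] + eps * p[1]) - f(*x)) / eps

x, p = (1.0, 2.0), (0.5, -1.0)
g = grad_f(*x)
exact = g[0] * p[0] + g[1] * p[1]   # D(f(x); p) = grad f(x)^T p
approx = directional(f, x, p)
print(exact, round(approx, 3))
```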
The Hessian¹ ∇²f(x̂) of f at x̂ (if it exists) is the n × n matrix whose (i, j)'th entry is the
second-order partial derivative ∂²f/∂xi∂xj(x̂), for i, j = 1, . . . , n.
max f(x)
s.t. x ∈ F,      (E.2)

where f : F → R is the objective function, F is the feasible region, and x is the vector of decision
variables. We will assume that F ⊆ Rⁿ for some integer n ≥ 1, and therefore x is a vector
of n real numbers.
We assume that the set F can be described by a set of equality constraints gj(x) = 0
(j = 1, . . . , p) and inequality constraints hk(x) ≤ 0 (k = 1, . . . , q). We also assume that
the functions f, gj, and hk (j = 1, . . . , p, k = 1, . . . , q) are continuously differentiable.
The set F is then given by:

F = {x ∈ Rⁿ | gj(x) = 0 for j = 1, . . . , p, and hk(x) ≤ 0 for k = 1, . . . , q}.
Consequently, the models that we deal with in this appendix can be written in the form:

max f(x)
s.t. gj(x) = 0 for j = 1, . . . , p      (E.3)
     hk(x) ≤ 0 for k = 1, . . . , q.
Example E.2.1. In linear optimization, the function f is linear, i.e., it can be written as
f(x) = cᵀx with c ∈ Rⁿ. The feasible region F is the intersection of finitely many halfspaces;
see Section 2.1.1. The requirement that x ∈ F can then be written as hk(x) = akᵀx − bk ≤ 0,
for k = 1, . . . , q.
¹ Named after the German mathematician Ludwig Otto Hesse (1811–1874).
Figure E.3: Global and local minimizers and maximizers of the function f(x) = cos(2πx)/x for 0.15 ≤ x ≤ 1.75. The global and local maximizers and minimizers are marked in the figure.
Any point x ∈ F is called a feasible point (or feasible solution). The goal of optimization
is to find a feasible point x∗ for which the value of f (x∗ ) is as large as possible. A global
maximizer for model (E.2) is a feasible point x∗ ∈ F such that, for all x ∈ F , we have that
f (x∗ ) ≥ f (x).
Although the goal of optimization is to find a global maximizer, in many cases in nonlinear
optimization, this is too much to ask for. In such cases, we are already satisfied with a so-
called local maximizer. Intuitively, a local maximizer is a feasible point x̂ such that no feasible
point ‘close’ to it has a larger objective value than f (x̂). To make this intuition precise,
first define, for x̂ ∈ Rn and for ε > 0, the (restricted) ε-ball (around x̂) as Bε (x̂) =
{x ∈ F | kx − x̂k ≤ ε}; see also Section D.1. We say that the point x̂ ∈ F is a local
maximizer if there exists an ε > 0, such that for all x ∈ Bε (x̂), we have that f (x̂) ≥ f (x).
A global minimizer and a local minimizer are defined similarly. Figure E.3 illustrates these
concepts.
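These definitions can be checked numerically for the function of Figure E.3. The following Python sketch (not part of the book) samples f(x) = cos(2πx)/x on a fine grid over [0.15, 1.75] and locates the global maximizer and minimizer:

```python
import math

def f(x):
    # The function of Figure E.3.
    return math.cos(2 * math.pi * x) / x

# Sample f on a fine grid over the feasible region [0.15, 1.75].
N = 20000
xs = [0.15 + (1.75 - 0.15) * k / N for k in range(N + 1)]
vals = [f(x) for x in xs]

x_max = xs[vals.index(max(vals))]  # global maximizer (left endpoint)
x_min = xs[vals.index(min(vals))]  # global minimizer (near x = 0.45)
```

On this grid the global maximum f(0.15) ≈ 3.9 is attained at the boundary point x = 0.15, and the global minimum ≈ −2.1 lies in the interior, in line with the figure.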
E.3. Lagrange multiplier method

In this section, we consider optimization models that have equality constraints only, i.e., models of the form:

max f(x)
s.t. g_j(x) = 0 for j = 1, . . . , p.    (E.4)
In general, it is very difficult to solve constrained optimization models and many practi-
cal techniques work only in special cases. The Lagrange multiplier method for constrained
optimization models is one such technique. The method was developed by the French
mathematician Joseph Louis Lagrange (1736–1813). We will briefly review the technique
in this section. We start with an example that gives some intuition behind the method.
Figure E.4: Consumer choice model with parameters pA = 1, pB = 2, and b = 4. The solid line is the budget constraint pA xA + pB xB = b. The level sets { [xA xB]^T | U(xA, xB) = ln(xA) + ln(xB) = α } of the objective function, for different values of α (α = ln 2, 1, 1.5, 2), are drawn as dashed curves. The arrows depict the gradients ∇f and ∇g at the optimal point x∗.
Example E.3.1. Consider the following consumer choice model. A consumer can buy two different
products, say products A and B . The prices of the products are pA and pB (in dollars), respectively,
and the consumer has a budget of b dollars. Suppose that the consumer receives a certain amount of
‘happiness’ or ‘utility’ U (xA , xB ) from buying amounts xA and xB of products A and B, respectively.
The consumer can spend all her money on just one product, but she is also allowed to spend some of
her money on product A, and some on product B . The optimization model she faces is:
max U (xA , xB )
(E.5)
s.t. pA xA + pB xB = b.
In this example, we choose U(xA, xB) = ln(xA) + ln(xB). Note that we need xA > 0 and xB > 0 in order for U(xA, xB) to make sense, but we will ignore these constraints in this example. In the notation of model (E.3), we have that f(xA, xB) = ln(xA) + ln(xB), and g1(xA, xB) = pA xA + pB xB − b.
Figure E.4 shows the constraint g1 (xA , xB ) = 0, together with some of the level sets of f . Let
α > 0. Recall that the level set corresponding to α is given by the equation f (xA , xB ) = α, where
α is a constant. It is not hard to work out that the level sets of f satisfy the equation xA xB = eα for
α > 0.
To solve the optimization model, suppose we 'move along' the constraint pA xA + pB xB = b. As can be seen from Figure E.4, different level sets cross the constraint, which means that at different points on the line, the objective function takes on different values. If, at a given point [xA xB]^T on this line, the level set of f crosses the line pA xA + pB xB = b, then we can move along the constraint to increase the objective value. The only situation in which this is not possible is when the level set of f is tangent to the constraint, i.e., the level set of f does not cross the constraint. Thus, if x∗ (= [xA∗ xB∗]^T) is an optimal point, then the level set of the objective function at x∗ is tangent to the
constraint at x∗. Since the gradient of a function is perpendicular to its level set, this is equivalent to requiring that the gradients of f and g1 are parallel at x∗, i.e., ∇f(xA∗, xB∗) = λ∇g1(xA∗, xB∗) for some λ ∈ R. Therefore, any optimal point [xA∗ xB∗]^T satisfies:

[1/xA∗  1/xB∗]^T = λ [pA  pB]^T, and
g1(xA∗, xB∗) = pA xA∗ + pB xB∗ − b = 0.
Note that λ is unrestricted in sign. This is only a necessary condition for x∗ being an optimal point; it is in general not sufficient. We can summarize the system of (three) equations by defining a convenient function, which is called the Lagrange function associated with the optimization model:

L(xA, xB, λ) = ln(xA) + ln(xB) − λ(pA xA + pB xB − b).

The beauty of this is that it suffices to equate the gradient of this Lagrange function to the all-zero vector. Taking derivatives with respect to xA, xB, and λ, and equating them to zero, we obtain the following system of equations:

∇L(xA∗, xB∗, λ∗) = [1/xA∗ − pA λ∗   1/xB∗ − pB λ∗   b − pA xA∗ − pB xB∗]^T = [0  0  0]^T.
Solving the first two equations of this system yields that xA∗ = 1/(pA λ∗) and xB∗ = 1/(pB λ∗). The third equation reads pA xA∗ + pB xB∗ − b = 0. Substituting the values for xA∗ and xB∗ yields:

pA xA∗ + pB xB∗ − b = pA (1/(pA λ∗)) + pB (1/(pB λ∗)) − b = 2/λ∗ − b = 0.

This implies that λ∗ = 2/b, and therefore, the optimal solution satisfies:

xA∗ = b/(2 pA),  xB∗ = b/(2 pB),  λ∗ = 2/b.
The utility value corresponding to this optimal solution is:

ln(b/(2 pA)) + ln(b/(2 pB)).

Notice that the rate of change of the objective value with respect to b is:

∂/∂b [ln(b/(2 pA)) + ln(b/(2 pB))] = (2 pA/b) × (1/(2 pA)) + (2 pB/b) × (1/(2 pB)) = 2/b,
which corresponds exactly to the optimal value of λ∗ . This is no coincidence: λ can in fact be viewed as
a variable of the dual model of (E.5). Duality theory for nonlinear optimization lies beyond the scope of
this book. The interested reader is referred to, e.g., Bazaraa et al. (1993), or Boyd and Vandenberghe
(2004). Duality theory for linear optimization is the subject of Chapter 4.
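The closed-form solution can be verified numerically. The sketch below (not from the book) uses the figure's parameters pA = 1, pB = 2, b = 4 and checks that the candidate point makes all three components of ∇L vanish:

```python
# Parameters of Figure E.4.
pA, pB, b = 1.0, 2.0, 4.0

# Closed-form optimal solution derived above.
xA = b / (2 * pA)   # = 2
xB = b / (2 * pB)   # = 1
lam = 2 / b         # = 1/2

# Components of the gradient of L(xA, xB, lam); all should vanish.
r1 = 1 / xA - pA * lam        # stationarity w.r.t. xA
r2 = 1 / xB - pB * lam        # stationarity w.r.t. xB
r3 = b - pA * xA - pB * xB    # feasibility (budget constraint)
```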
The Lagrange multiplier method can be formulated as follows. Let x = [x1 . . . xn]^T ∈ R^n, and λ = [λ1 . . . λp]^T ∈ R^p. The Lagrange function (also called the Lagrangian) L : R^n × R^p → R is defined by:

L(x, λ) = f(x) − Σ_{j=1}^{p} λ_j g_j(x).
exists a vector λ̂ (∈ R^p) such that [x̂ λ̂]^T is a stationary point of the function L, meaning that [x̂ λ̂]^T is a solution of the following set of equations:

∂f/∂x_i(x) − Σ_{j=1}^{p} λ_j ∂g_j/∂x_i(x) = 0  for i = 1, . . . , n,   (stationarity)
g_j(x) = 0  for j = 1, . . . , p.   (feasibility)
The variables λ1, . . . , λp are called the Lagrange multipliers of (E.4). The Lagrange multiplier method for solving (E.4) can now be described as follows. By a 'point at infinity' we mean the limit of a direction; i.e., if a is any nonzero point in R^n, then lim_{λ→∞} λa is the point at infinity corresponding to a.
▶ Step 2: Find the set X of all constrained stationary points, by solving the following set of n + p equations:

∂L/∂x_i(x, λ) = 0 for i = 1, . . . , n, and g_j(x) = 0 for j = 1, . . . , p.
If this set of constraints is inconsistent, then stop: model (E.4) has no optimal
solution. Otherwise, continue.
▶ Step 3: Calculate the objective value f(x) at each x ∈ X, and – where appropriate – at points at infinity satisfying the constraints.
▶ Step 4: Let z∗ be the largest finite (if it exists) objective value f(x) among all x ∈ X. Let X∗ be the set of points x ∈ X such that f(x) = z∗. If z∗ is at least as large as the largest objective value of the points at infinity, then stop: X∗ is the set of all optimal points of (E.4). Otherwise, model (E.4) has no optimal solution.
In the case of a minimizing model, the same algorithm can be used, except for Step 4,
where smallest values have to be selected. The following example illustrates the Lagrange
multiplier method.
Thus, in the notation of (E.4), we have that f(x) = −x1² − x2² − x3², g1(x) = x1 + x2 + x3, and g2(x) = x1 + 2x2 + 3x3 − 1. The Lagrange function reads:

L(x, λ) = −x1² − x2² − x3² − λ1(x1 + x2 + x3) − λ2(x1 + 2x2 + 3x3 − 1).

Equating the partial derivatives of L to zero, we obtain:

∂L/∂x1(x, λ) = −2x1 − λ1 − λ2 = 0,   ∂L/∂λ1(x, λ) = −x1 − x2 − x3 = 0,
∂L/∂x2(x, λ) = −2x2 − λ1 − 2λ2 = 0,  ∂L/∂λ2(x, λ) = −x1 − 2x2 − 3x3 + 1 = 0,
∂L/∂x3(x, λ) = −2x3 − λ1 − 3λ2 = 0.

It follows that x1 = −(1/2)(λ1 + λ2), x2 = −(1/2)(λ1 + 2λ2), and x3 = −(1/2)(λ1 + 3λ2). Substituting these values into the constraint equations, and solving for λ1 and λ2, yields that λ1 = 2 and λ2 = −1. Substituting these values back into the expressions found for x1, x2, and x3, yields that the unique constrained stationary point is [−1/2  0  1/2]^T. The corresponding objective value is −1/2.
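The stationary point and multipliers found above can be checked by substituting them back into the five equations; a small sketch (not from the book):

```python
# Candidate stationary point and multipliers found above.
x1, x2, x3 = -0.5, 0.0, 0.5
l1, l2 = 2.0, -1.0

residuals = [
    -2 * x1 - l1 - l2,         # dL/dx1 = 0
    -2 * x2 - l1 - 2 * l2,     # dL/dx2 = 0
    -2 * x3 - l1 - 3 * l2,     # dL/dx3 = 0
    x1 + x2 + x3,              # g1(x) = 0
    x1 + 2 * x2 + 3 * x3 - 1,  # g2(x) = 0
]
objective = -(x1 ** 2 + x2 ** 2 + x3 ** 2)  # = -1/2
```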
Consider again the general model (E.6), with equality constraints g_j(x) = 0 (j = 1, . . . , p) and inequality constraints h_k(x) ≤ 0 (k = 1, . . . , q). Its Lagrange function is defined as:

L(x, λ, µ) = f(x) − Σ_{j=1}^{p} λ_j g_j(x) − Σ_{k=1}^{q} µ_k h_k(x),

where λ = [λ1 . . . λp]^T ∈ R^p and µ = [µ1 . . . µq]^T ∈ R^q. The stationarity conditions corresponding to this model are known as the Karush-Kuhn-Tucker conditions² (or KKT conditions). They are the following n + p + 3q equations and inequalities:
Stationarity:

∂f/∂x_i(x) − Σ_{j=1}^{p} λ_j ∂g_j/∂x_i(x) − Σ_{k=1}^{q} µ_k ∂h_k/∂x_i(x) = 0  for i = 1, . . . , n,

Primal feasibility:

g_j(x) = 0  for j = 1, . . . , p,
h_k(x) ≤ 0  for k = 1, . . . , q,

Dual feasibility:

µ_k ≥ 0  for k = 1, . . . , q,

Complementary slackness relations:

µ_k h_k(x) = 0  for k = 1, . . . , q.
Note that the stationarity conditions and the equality constraints g_j(x) = 0 together state that the gradient of L with respect to x and λ equals the all-zero vector.
The following theorem states that, under certain regularity conditions, the KKT conditions
hold at any local maximum. Although these regularity conditions are satisfied in many
cases in practice, they do not hold in all situations. The regularity conditions are usually
referred to as the constraint qualifications. There are several such constraint qualifications,
each of which, if satisfied, guarantees that the KKT conditions hold. Two commonly used
constraint qualifications are:
² Named after the American mathematicians William Karush (1917–1997) and Harold William Kuhn (1925–2014), and the Canadian mathematician Albert William Tucker (1905–1995).
E.4. Karush-Kuhn-Tucker conditions
Figure E.5: The feasible region of the model in Example E.4.1. The level sets { [x y]^T | xy = α } of the objective function, for different values of α, are drawn as dashed curves.
Clearly, the linearity constraint qualification holds in the case when model (E.6) is an LO-
model, so that the KKT conditions apply to linear optimization.
The KKT conditions can, in principle, be used to find an optimal solution to any model
(for which one of the constraint qualifications is satisfied). However, the equations and
inequalities that constitute the KKT conditions are in general very hard to solve, and they
can usually only be solved for very small models. Even the relatively simple model in the
following example requires tedious calculations.
Example E.4.1. Consider the following optimization model with decision variables x and y:

max xy
s.t. x² + y ≤ 12
x, y ≥ 0.
Figure E.5 illustrates the feasible region and the level sets of the objective function of this model. Since the objective function is continuous and the feasible region { [x y]^T | x² + y ≤ 12, x ≥ 0, y ≥ 0 } is a closed and bounded subset of R², Theorem D.1.1 guarantees the existence of an optimal point of this model. Putting the model into the form of (E.6), we have that f(x, y) = xy, p = 0, q = 3, h1(x, y) = x² + y − 12, h2(x, y) = −x, and h3(x, y) = −y. The corresponding Lagrange function is:

L(x, y, µ1, µ2, µ3) = xy − µ1(x² + y − 12) + µ2 x + µ3 y.

The KKT conditions read:

(i) y − 2µ1 x + µ2 = 0,  (ii) x − µ1 + µ3 = 0,   (stationarity)
(iii) x² + y − 12 ≤ 0,  (iv) −x ≤ 0,  (v) −y ≤ 0,   (primal feasibility)
(vi) µ1 ≥ 0,  (vii) µ2 ≥ 0,  (viii) µ3 ≥ 0,   (dual feasibility)
(ix) µ1(x² + y − 12) = 0,  (x) µ2 x = 0,  (xi) µ3 y = 0.   (complementary slackness)
This system of equations and inequalities can be solved by considering a few different cases. First,
suppose that µ1 = 0. Then (i) implies that y = −µ2 , and hence (v) and (vii) together imply that
y = µ2 = 0. Similarly, (ii) implies that x = −µ3 , and hence (iv) and (viii) together imply that
x = µ3 = 0. It is straightforward to verify that the solution x = y = µ1 = µ2 = µ3 = 0
indeed satisfies the conditions (i)–(xi). The corresponding objective value is 0.
Second, suppose that µ1 > 0. By (ix), we have that x² + y = 12, and hence x = √(12 − y). We consider three subcases:
▶ x = 0. Then, we have that y = 12 > 0. Equation (xi) implies that µ3 = 0, and hence (ii) implies that µ1 = x + µ3 = 0, contradicting the fact that µ1 > 0.

▶ y = 0. Then, we have that x = √12 > 0. Equation (x) implies that µ2 = 0, and hence (i) implies that 2µ1 x = y + µ2 = 0, so that µ1 = 0, contradicting the fact that µ1 > 0.
▶ x > 0 and y > 0. Then, equations (x) and (xi) imply that µ2 = µ3 = 0. Hence, using (i) and (ii), we have that x = µ1 and y = 2µ1 x = 2x². So, we have that 12 = x² + y = 3x², which implies that either x = −2 (which we may ignore because we have that x ≥ 0) or x = 2. So, we must have x = 2, and therefore y = 8. Thus, we find the following solution of the conditions (i)–(xi):

x = 2, y = 8, µ1 = 2, µ2 = 0, µ3 = 0.
Figure E.6: Nonlinear optimization model for which the KKT conditions fail. The arrows depict the gradients ∇f(x∗), ∇h1(x∗), and ∇h2(x∗) at the unique feasible point x∗.
We have thus found two solutions of the KKT conditions (i)–(xi). Clearly, the first one (the all-zero solution) is not optimal, because the corresponding objective value is 0, and we know that there exists a point with corresponding objective value 16. Since we also know that an optimal point exists, it must be the second solution, i.e., x∗ = 2, y∗ = 8.
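A brute-force check (not part of the book) confirms both the optimality of this point and the KKT conditions at it: a grid search over the feasible region comes out close to (2, 8) with objective value 16, and the candidate with µ1 = 2, µ2 = µ3 = 0 satisfies the conditions exactly:

```python
# Grid search over the feasible region {(x, y) | x^2 + y <= 12, x, y >= 0}.
best = (0.0, 0.0, 0.0)
N = 400
for i in range(N + 1):
    x = 3.5 * i / N          # x^2 <= 12 forces x <= sqrt(12) < 3.5
    for j in range(N + 1):
        y = 12.0 * j / N
        if x * x + y <= 12.0 and x * y > best[2]:
            best = (x, y, x * y)

# KKT check at the candidate (2, 8) with mu1 = 2, mu2 = mu3 = 0:
x, y, mu1 = 2.0, 8.0, 2.0
stat_x = y - 2 * mu1 * x        # condition (i), with mu2 = 0
stat_y = x - mu1                # condition (ii), with mu3 = 0
comp = mu1 * (x * x + y - 12)   # condition (ix)
```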
Note that the first KKT conditions can be written in vector form as follows:

∇f(x) = Σ_{j=1}^{p} λ_j ∇g_j(x) + Σ_{k=1}^{q} µ_k ∇h_k(x).   (E.8)
This means that, assuming that some constraint qualification condition holds, any optimal
point satisfies the property that the gradient of the objective function at that point can
be written as a linear combination of the gradients of the constraints at that point. The
weights in this linear combination are precisely the corresponding Lagrange multipliers. The
following example shows that there are cases in which the KKT conditions fail, although an
optimal solution exists. The reader may check that the constraint qualifications listed above
Theorem E.4.1 do not hold.
Example E.4.2. Consider the following optimization model with decision variables x and y :
max y
s.t. (x − 1)2 + (y − 1)2 ≤ 1
(x − 3)2 + (y − 1)2 ≤ 1.
The feasible region and the constraints of this problem are drawn in Figure E.6. The first constraint restricts the values of x and y to lie inside a unit circle (i.e., with radius 1) centered at the point [1 1]^T; the second constraint restricts the values to lie inside a unit circle centered at [3 1]^T. From the figure, it should be clear that the only feasible, and hence optimal, point of the model is the point [2 1]^T.
However, consider the KKT conditions. Setting f(x, y) = y, h1(x, y) = (x−1)² + (y−1)² − 1, and h2(x, y) = (x−3)² + (y−1)² − 1, we have that, at the point [2 1]^T:

∇f(x, y) = [0  1]^T,
∇h1(x, y) = [2(x−1)  2(y−1)]^T = [2  0]^T,
∇h2(x, y) = [2(x−3)  2(y−1)]^T = [−2  0]^T.
Hence, at the unique optimal point, the gradient of the objective function cannot be written as a linear combination of the gradients of the constraints at that point, and therefore the KKT conditions have no solution. This can also be seen from the figure: the arrow pointing up is the gradient vector of the objective function at the optimal point, and the arrows pointing left and right are the gradient vectors of the constraints at the optimal point.
Consider the standard LO-model max{c^T x | Ax ≤ b, x ≥ 0}, where A is an (m, n)-matrix. This model can be written in the form of (E.6) by taking p = 0, q = m + n,

f(x) = c^T x, and [h1(x) . . . hm(x)]^T = Ax − b, [h_{m+1}(x) . . . h_{m+n}(x)]^T = −x.

Writing y = [µ1 . . . µm]^T for the multipliers of the constraints Ax − b ≤ 0, and λ = [µ_{m+1} . . . µ_{m+n}]^T for the multipliers of the constraints −x ≤ 0, the KKT conditions read:

c − A^T y + λ = 0   (stationarity)
Ax − b ≤ 0, −x ≤ 0   (primal feasibility)
y ≥ 0, λ ≥ 0   (dual feasibility)
y^T(Ax − b) = 0, λ^T(−x) = 0.   (complementary slackness)
E.5. Karush-Kuhn-Tucker conditions for linear optimization
Substituting λ = A^T y − c, these conditions are equivalent to:

Ax ≤ b, x ≥ 0   (primal feasibility)
y ≥ 0, A^T y ≥ c   (dual feasibility)
y^T(b − Ax) = 0, (A^T y − c)^T x = 0   (complementary slackness)
Recall that Theorem E.4.1 states that any optimal point of a standard LO-model satisfies
this system of equations and inequalities, i.e., they form a necessary condition for optimality.
Compare in this respect Theorem 4.3.1, which states that the KKT conditions are in fact
necessary and sufficient.
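As a small illustration (not from the book), these conditions can be verified for the match-factory LO-model of Listing F.1 in Appendix F; the primal and dual solutions below were computed by hand for this sketch, and complementary slackness certifies their optimality:

```python
# Data of the match-factory LO-model (Listing F.1): max 3x1 + 2x2
# s.t. x1 + x2 <= 9, 3x1 + x2 <= 18, x1 <= 7, x2 <= 6, x >= 0.
A = [[1, 1], [3, 1], [1, 0], [0, 1]]
b = [9, 18, 7, 6]
c = [3, 2]

x = [4.5, 4.5]            # claimed primal optimal solution
y = [1.5, 0.5, 0.0, 0.0]  # claimed dual optimal solution

Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(4)]
ATy = [sum(A[i][j] * y[i] for i in range(4)) for j in range(2)]

primal_feasible = all(Ax[i] <= b[i] + 1e-9 for i in range(4)) and all(xj >= 0 for xj in x)
dual_feasible = all(yi >= 0 for yi in y) and all(ATy[j] >= c[j] - 1e-9 for j in range(2))
cs1 = sum(y[i] * (b[i] - Ax[i]) for i in range(4))   # y^T (b - Ax)
cs2 = sum((ATy[j] - c[j]) * x[j] for j in range(2))  # (A^T y - c)^T x
```

The primal and dual objective values coincide (3·4.5 + 2·4.5 = 22.5 = 1.5·9 + 0.5·18), as duality theory for linear optimization (Chapter 4) predicts.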
Appendix F

Writing LO-models in GNU MathProg (GMPL)
In this appendix, the GNU MathProg modeling language (or GMPL) is described. The GMPL language is a subset of the AMPL language, which is one of several common modeling languages for mathematical optimization, including linear optimization. Similar languages include AIMMS, GAMS, and Xpress-Mosel. These languages provide a tool for describing and solving complex optimization models. One of the advantages of using such a language is the fact that models can be expressed in a notation that is very similar to the mathematical notation that is used for expressing optimization models. This allows for concise and readable definitions of the models, and avoids having to explicitly construct the abstract ingredients of LO-models, namely the vector of objective coefficients, the technology matrix, and the vector of right hand side values.
We chose to describe GMPL because software for it is available on the internet at no cost.
We think that most modeling languages are similar to GMPL, so once one knows how to
write models in this language, the transition to other modeling languages is relatively easy.
GMPL models may be solved using, for instance, the GLPK package, which is available for
download at http://www.gnu.org/software/glpk/. For small models, the reader is
invited to use the online solver on this book’s website: http://www.lio.yoriz.co.uk/.
This online solver provides a user-friendly interface for solving GMPL models. Moreover, the code for the listings in this book is available online.
Listing F.1: GMPL model for the match-factory LO-model.

1 /* Decision variables */
2 var x1 >= 0; # number of boxes (x 100,000) of long matches
3 var x2 >= 0; # number of boxes (x 100,000) of short matches
4
5 /* Objective function */
6 maximize z: 3*x1 + 2*x2;
7
8 /* Constraints */
9 subject to c11: x1 + x2 <= 9; # machine capacity (1.1)
10 subject to c12: 3*x1 + x2 <= 18; # wood (1.2)
11 subject to c13: x1 <= 7; # boxes for long matches (1.3)
12 subject to c14: x2 <= 6; # boxes for short matches (1.4)
13
14 end;
Lines 2–3 define the decision variables x1 and x2. Any variable definition starts with the
keyword ‘var’. They are defined to be nonnegative decision variables by the ‘>= 0’ sign. To
declare the nonpositive decision variable x, we would insert the definition ‘var x <= 0’.
In fact, the 0 can be replaced by any value. For instance, to declare the variable x that
has to have value at least 15, one can insert ‘var x >= 15’. It is also possible to specify
both upper and lower bounds. For instance: var x >= 0, <= 20 defines the nonnegative
decision variable x whose value should be at most 20. Notice that the variable definitions,
like every definition in GMPL, end with a semicolon. Note also that GMPL is case sensitive,
i.e., the expressions ‘y1’ and ‘Y1’ are seen as two different variables.
Line 6 defines the objective. The word ‘maximize’ indicates that we are dealing with a max-
imizing LO-model. We could have used ‘minimize’ to indicate a minimizing LO-model.
The word ‘maximize’ is followed by ‘z’, which is the name of the objective. The following
colon separates the name of the objective from the objective function. The objective func-
tion (3x1 + 2x2 in this case) is written using the customary arithmetic signs +, −, *, and /.
The semicolon marks the end of the objective definition.
Lines 9–12 define the constraints. Each constraint starts with the keywords ‘subject to’.
As with the objective, these keywords are followed by the name of the constraint, a colon,
the constraint expression, and a closing semicolon. The names of the constraints (‘c11’,
‘c12’, ‘c13’, ‘c14’) must be chosen to be unique. The sign ‘<=’ means that we are defining
a ‘≤’ constraint. A ‘≥’ constraint is defined using the sign ‘>=’, and an equality constraint
using ‘=’.
P = the set of end products, i.e., long matches and short matches.
I = the set of input products required for the production of the end products, i.e., machine time, wood, boxes for long matches, and boxes for short matches.
p_j = the profit from selling one unit of end product j (j ∈ P).
a_ij = the number of units of input product i that is required for the production of one unit of end product j (i ∈ I, j ∈ P).
c_i = the amount of input product i available for production (i ∈ I).

The model then reads:

max Σ_{j∈P} p_j x_j
s.t. Σ_{j∈P} a_ij x_j ≤ c_i for i ∈ I,   (F.1)
x_j ≥ 0 for j ∈ P.
This model has one variable for each element of P , and one constraint for each element of
I . In GMPL, this model is written as follows:
1 /* Model parameters */
2 set P; # set of products
3 set I; # input products
4 param p{P}; # profit per unit of product
5 param c{I}; # availability of each input product
6 param a{I, P}; # number of units of input required to produce
7 # one unit of product
8
9 /* Decision variables */
10 var x{P} >= 0; # number of boxes (x 100,000) of each product
11
12 /* Objective function */
13 maximize z:
14 sum{j in P} p[j] * x[j];
15
16 /* Constraints */
17 subject to input {i in I}: # one constraint for each input product
18 sum{j in P} a[i, j] * x[j] <= c[i];
19
20 /* Model data */
21 data;
22
23 set P := long short;
24 set I := machine wood boxlong boxshort;
630 A p p e n d i x F. W r i t i n g L O - m o d e l s i n G N U M at h P r o g ( G M P L )
25
26 param p :=
27 long 3
28 short 2;
29
30 param c :=
31 machine 9
32 wood 18
33 boxlong 7
34 boxshort 6;
35
36 param a : long short :=
37 machine 1 1
38 wood 3 1
39 boxlong 1 0
40 boxshort 0 1;
41
42 end;
This GMPL listing consists of two parts. The first part (lines 1–18) concerns the definition
of model (F.1), and there is no reference to the particular data of the model (i.e., the precise
elements of the sets P and I , the numerical values of parameters aij , ci , and pj ). The values
of the sets and parameters are specified in the second part (lines 20–41). An LO-model can
only be solved if all these numerical values of the sets and parameters have been specified.
This separation has a number of advantages: (1) it is easier to read the listing when model
and data are specified separately, (2) solving the same model with a new data set is a matter of
simply replacing the second part of the listing, and (3) especially when the model has many
sets and parameters, and these sets and parameters appear in different places in the model, it
saves a lot of repetition in the code.
Lines 2–3 define the two sets P and I , without specifying the elements of the sets. Lines 4–6
define the different parameters of the model, again without specifying their numerical values.
The parameters are defined with index sets. For example, line 4 defines the parameter p,
which has index set P. This means that, for every element j in P, a scalar parameter p[j]
is defined. Similarly, lines 5–6 define, for each element i in I and for each element j in P,
the parameters c[i] and a[i,j].
Line 10 defines the decision variables of the model. It defines one nonnegative variable
x[j] for each j in P.
Lines 13–14 define the objective. As in Listing F.1, the objective is to maximize the total
profit. Since it is not known at this stage how many terms will be involved in this expression,
we take the sum over all elements j of P of the total profit due to selling product j, i.e.,
p[j]*x[j].
F. 3 . B u i l t - i n o p e rat o r s a n d f u n c t i o n s 631
Lines 17–18 compactly summarize the constraints of the model. In contrast to the previous
listing, this expression defines multiple constraints at once: the expression ‘{i in I}’ means
that we are defining one constraint for each element i of I. The left hand side of each of
the constraints consists of a summation, and the right hand side contains a parameter.
Line 21 signals to the solver package that the model has been specified, and that the remain-
der of the listing will contain data specifications. Lines 23–24 specify the entries of the sets
P and I. The entries consist of plain words separated by spaces. So, P contains two entries,
and I contains four entries.
Lines 26–28 specify the parameter p. Recall that p is indexed by the set of products P. Thus,
since P has two entries, we need to specify two values, namely p[long] and p[short].
This is exactly what these lines do: they state that p[long] = 3 and p[short] = 2. The
parameter c is specified in lines 30–34 in a similar manner.
Lines 36–40 specify the two-dimensional parameter a in table form. Line 36 announces that the parameter a will be specified; the remainder of that line lists the elements of the second index set, P, which serve as column headers. These elements can follow in any order, as long as the numerical values are consistently listed in the same order. Lines 37–40 each start with the name of an element i of the set I, followed by the values of a[i,j] for each j in P.
If all sets and model parameters have been given numerical values, the solver package has
enough information to construct an LO-model: it derives the number of decision variables
from the number of elements of the set P, the number of constraints from the number of
elements of the set I, the vector of objective coefficients from the definition of the objective
function, and the entries of the technology matrix and the right hand side vector from the
constraint definitions. The solver uses an algorithm (e.g., the simplex algorithm) to solve the resulting LO-model, and then, if an optimal solution exists, translates the optimal solution back into the variable names defined in the model.
x*y          x × y;
x/y          x/y;
x**y, x^y    x^y;
x div y      integer quotient of x/y (i.e., x/y without the fractional part);
x mod y      remainder of x divided by y;
abs(x)       absolute value |x|;
atan(x)      arctan x (in radians);
atan(y, x)   arctan y/x (in radians);
card(S)      cardinality |S| of the set S;
ceil(x)      x rounded up;
Recall that the solver constructs an LO-model (i.e., the technology matrix A, and the
vectors b and c) from the model code. Because the operators and functions listed above
are all nonlinear, they cannot appear in the constructed LO-model. Instead, any of these
operators and functions is evaluated when the solver constructs the LO-model. Hence, they
may only involve parameter values (which are known when the LO-model is constructed),
and they cannot involve any decision variables.
Another special expression in GMPL is the ‘if, then, else’ expression. We will explain this
expression by means of an example. Suppose that we have an LO-model with decision
variables x1 , . . . , xT , and s1 , . . . , sT , where xt represents the account balance (in dollars)
of a bank account at the end of year t, and st denotes the (net) earnings during year t
(t = 1, . . . , T ). Let x0 be the initial account balance. Let r be the annual interest rate. For
simplicity, assume that the interest is paid at the beginning of the year, based on the balance
at that time point. The evolution of the account balance can be expressed as the following
set of constraints:
1 param r;
2 param T;
3 var x{1..T} >= 0;
4 var s{1..T} >= 0;
5 subject to balance{t in 1..T}:
6 x[t] = (1+r) * x[t-1] + s[t];
However, this code is invalid, because x[0] is not defined. The problem lies in the fact that
x1 , . . . , xT are variables of the model, whereas x0 is a parameter of the model. It is not
possible to define x[0] as a parameter. Instead, a new parameter, x0, say, has to be defined.
This means that, in constraint (F.2) with t = 1, x0 should be used instead of x[0] and, for
t 6= 1, x[t] should be used. This can be written in GMPL code as follows:
1 param x0;
2 param r;
3 param T;
4 var x{1..T} >= 0;
5 var s{1..T} >= 0;
6 subject to balance{t in 1..T}:
7 x[t] = (1+r) * (if t = 1 then x0 else x[t-1]) + s[t];

The general form of the expression is 'if condition then value-if-true else value-if-false'.
If condition is true, then the expression is equal to value-if-true; otherwise it is equal to value-
if-false. Note that, as with the operators and functions above, any ‘if, then, else’ expression
is evaluated when the GMPL solver constructs the technology matrix A, and the vectors b
and c. This means that the condition may only depend on the values of the model parameters,
and not on the values of the decision variables. However, value-if-true and value-if-false may
contain decision variables.
It is important to note that the ‘if, then, else’ expression discussed here is different from
the conditional relationships that are discussed in Section 7.3. The latter are conditional
relationships between decision variables. Such relationships generally require the use of
mixed integer optimization algorithms and are computationally harder to deal with; see
Chapter 7. See also Section 18.5 for an application that uses ‘if, then, else’ expressions.
F.4. Generating all subsets of a set

For instance, for n = 3, the set X = {0, 1, . . . , n} = {0, 1, 2, 3} has the following 2^{n+1} = 16 subsets:

∅, {0}, {1}, {0, 1}, {2}, {0, 2}, {1, 2}, {0, 1, 2},
{3}, {0, 3}, {1, 3}, {0, 1, 3}, {2, 3}, {0, 2, 3}, {1, 2, 3}, {0, 1, 2, 3}.
For k = 0, . . . , 2^{n+1}−1 and i = 0, . . . , n, let d_i(k) be the i'th binary digit of the number k. It can be checked that:

d_i(k) = ⌊2^{−i} k⌋ (mod 2) = { 0 if ⌊2^{−i} k⌋ is even; 1 if ⌊2^{−i} k⌋ is odd. }

In this expression, x (mod 2) denotes the remainder after dividing x by 2. So, for instance, 12 (mod 2) = 0, and 13 (mod 2) = 1. For k = 0, . . . , 2^{n+1}−1, the subset S_k of X then satisfies:

S_k = { i ∈ X | d_i(k) = 1 }.
Note that, in Model 7.2.2, there are no subtour elimination constraints for S = ∅ and S = J. So, to implement this model in GMPL, we should define SI to be the numbers 1, . . . , 2^{n+1}−2.
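The digit formula translates directly into code; the following Python sketch (not part of the book; a GMPL model would use the div and mod operators of Section F.3 instead) enumerates the subsets S_0, . . . , S_{2^{n+1}−1} for n = 3:

```python
def d(i, k):
    # i'th binary digit of k: floor(k / 2^i) mod 2.
    return (k // 2 ** i) % 2

n = 3
X = list(range(n + 1))  # X = {0, 1, 2, 3}
subsets = [{i for i in X if d(i, k) == 1} for k in range(2 ** (n + 1))]
# subsets[0] is the empty set and subsets[2**(n+1) - 1] is X itself;
# the subtour elimination constraints only use k = 1, ..., 2**(n+1) - 2.
```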
List of Symbols
NI typical notation for the set of indices of the current nonbasic variables
N (A) null space of the matrix A (see Appendix B)
N set of positive integers
n! n factorial
(n k) binomial coefficient 'n choose k' (= n!/(k!(n−k)!))
O(·) big-O notation (see Chapter 9)
R(A) row space of the matrix A (see Appendix B)
relint relative interior (see Appendix D.1)
R set of all real numbers
Rn set of all n-dimensional real-valued vectors
R^{m×n} set of all real-valued (m, n)-matrices
s.t. subject to
sup supremum (see Appendix D.1)
|S| number of elements in the set S
t(a) tail node of the arc a (see Appendix C.6)
V(G) node set of the graph G (see Appendix C.1)
x typical notation for the vector of (primal) decision variables
xs typical notation for the vector of (primal) slack variables
xi typical notation for a (primal) decision variable
[x1 . . . xn]^T typical notation for a column vector with specified entries
{x | P (x)} set of all elements x satisfying the property P
y typical notation for the vector of dual decision variables
ys typical notation for the vector of dual slack variables
yi typical notation for a dual decision variable
z typical notation for the objective value
z∗ typical notation for the optimal objective value
Z set of integers (i.e., the positive and negative integers, and 0)
= is equal to
≠ is not equal to
≈ is approximately equal to
≡ is equivalent to, up to an appropriate permutation of the rows and/or columns
:= assign to
< less than
≤ less than or equal; if both sides of the inequality are vectors, then the inequality
is understood to be entry-wise
> greater than
≪ is much smaller than
≫ is much greater than
≥ greater than or equal; if both sides of the inequality are vectors, then the inequality
is understood to be entry-wise
∼ is equivalent to
∅ empty set
\ set difference
∈ is an element of
6∈ is not an element of
⊂ is a subset of (but not equal to)
⊆ is a subset of (or equal to)
∪ set union
∩ set intersection
⋂_{i=1}^{n} S_i intersection of the sets S1, . . . , Sn
⋃_{i=1}^{n} S_i union of the sets S1, . . . , Sn
⊥ is perpendicular to
∞ infinity
∨ logical inclusive disjunction (see Section 7.3.2)
⊻ logical exclusive disjunction (see Section 7.3.2)
∧ logical conjunction (see Section 7.3.2)
⇒ logical implication (see Section 7.3.2)
⇔ logical equivalence (see Section 7.3.2)
¬ logical negation (see Section 7.3.2)
Bibliography
Ahuja, R. K., Magnanti, T. L., and Orlin, J. B. (1993), Network Flows: Theory, Algorithms, and Applica-
tions, Prentice Hall, Englewood Cliffs, New Jersey.
Aigner, M., and Ziegler, G. M. (2010), Proofs from The Book, Fourth edition, Springer.
Albers, D. J., Reid, C., and Dantzig, G. B. (1986), ‘An Interview with George B. Dantzig: The Father
of Linear Programming’, The College Mathematics Journal 17(4), 292–314.
Altier, W. J. (1999), The Thinking Manager’s Toolbox: Effective Processes for Problem Solving and Decision
Making, Oxford University Press.
Applegate, D. L., Bixby, R. E., Chvátal, V., and Cook, W. J. (2006), The Traveling Salesman Problem: A
Computational Study, Princeton University Press.
Arora, S., and Barak, B. (2009), Computational Complexity: A Modern Approach, Cambridge University
Press, New York.
Arsham, H., and Oblak, M. (1990), ‘Perturbation Analysis of General LP Models: A Unified Approach
to Sensitivity, Parametric, Tolerance, and More-for-Less Analysis’, Mathematical and Computer Mod-
elling 13(8), 79–102.
Barnett, S. (1990), Matrices. Methods and Applications, Oxford Applied Mathematics and Computing
Science Series, Clarendon Press, Oxford.
Bazaraa, M. S., Jarvis, J. J., and Sherali, H. D. (1990), Linear Programming and Network Flows, Second
edition, John Wiley & Sons, Inc., New York.
Bazaraa, M. S., Sherali, H. D., and Shetty, C. M. (1993), Nonlinear Programming: Theory and Algorithms,
John Wiley & Sons, Inc., New York.
Beasley, J. E. (1996), Advances in Linear and Integer Programming, Oxford University Press.
Bertsimas, D., and Tsitsiklis, J. N. (1997), Introduction to Linear Optimization, Athena Scientific, Bel-
mont, Massachusetts.
Bixby, R. E., Fenelon, M., Gu, Z., Rothberg, E., and Wunderling, R. (2000), ‘MIP: Theory and
Practice – Closing the Gap’, System Modelling and Optimization: Methods, Theory, and Applications
174, 19–49.
Bondy, J. A., and Murty, U. S. R. (1976), Graph Theory with Applications, MacMillan, London.
Boyd, S. P., and Vandenberghe, L. (2004), Convex Optimization, Cambridge University Press.
Bröring, L. (1996), Linear Production Games, Master’s thesis, University of Groningen, The Nether-
lands.
Chartrand, G., Lesniak, L., and Zhang, P. (2010), Graphs & Digraphs, Fifth edition, CRC Press.
Checkland, P. (1999), Systems Thinking, Systems Practice, John Wiley & Sons, Inc., New York.
Chen, D. S., Batson, R. G., and Dang, Y. (2011), Applied Integer Programming: Modeling and Solution,
John Wiley & Sons, Inc., New York.
Chvátal, V. (1983), Linear Programming, W.H. Freeman and Company, New York.
Ciriani, T. A., and Leachman, R. C. (1994), Optimization in Industry 2: Mathematical Programming and
Modeling Techniques in Practice, John Wiley & Sons, Inc., New York.
Cook, W. J. (2012), In Pursuit of the Traveling Salesman: Mathematics at the Limits of Computation, Prince-
ton University Press, Princeton and Oxford.
Cook, W. J., Cunningham, W. H., Pulleyblank, W. R., and Schrijver, A. (1997), Combinatorial Opti-
mization, John Wiley & Sons, Inc., New York.
Cooper, W. W., Seiford, L. M., and Zhu, J., eds (2011), Handbook on Data Envelopment Analysis, Second
edition, Springer.
Coppersmith, D., and Winograd, S. (1990), ‘Matrix Multiplication via Arithmetic Progressions’, Jour-
nal of Symbolic Computation 9(3), 251–280.
Cornuéjols, G. (2008), ‘Valid Inequalities for Mixed Integer Linear Programs’, Mathematical Program-
ming 112(1), 3–44.
Cristianini, N., and Shawe-Taylor, J. (2000), An Introduction to Support Vector Machines and Other Kernel-
Based Learning Methods, Cambridge University Press.
Cunningham, W. H., and Klincewicz, J. G. (1983), ‘On Cycling in the Network Simplex Method’,
Mathematical Programming 26(2), 182–189.
Dantzig, G. B. (1963), Linear Programming and Extensions, Princeton University Press, Princeton, New
Jersey.
Dantzig, G. B. (1982), ‘Reminiscences about the Origins of Linear Programming’, Operations Research
Letters 1(2), 43–48.
Fang, S.-C., and Puthenpura, S. (1993), Linear Optimization and Extensions: Theory and Algorithms,
Prentice Hall, Inc.
Ferris, M. C., Mangasarian, O. L., and Wright, S. J. (2007), Linear Programming with Matlab, MPS-
SIAM Series on Optimization.
Fletcher, R. (2000), Practical Methods of Optimization, John Wiley & Sons, Inc., New York.
Friedman, J. W. (1991), Game Theory with Applications to Economics, Second edition, Oxford University
Press.
Gal, T. (1986), ‘Shadow Prices and Sensitivity Analysis in Linear Programming under Degeneracy’,
OR Spectrum 8(2), 59–71.
Gal, T. (1995), Postoptimal Analyses, Parametric Programming and Related Topics, Walter de Gruyter,
Berlin.
Gill, P. E., Murray, W., and Wright, M. H. (1982), Practical Optimization, Academic Press, Inc., Lon-
don.
Gilmore, P. C., and Gomory, R. E. (1963), ‘A Linear Programming Approach to the Cutting Stock
Problem – Part II’, Operations Research 11(6), 863–888.
Greenberg, H. J. (1986), ‘An Analysis of Degeneracy’, Naval Research Logistics Quarterly 33(4), 635–655.
Griffel, D. H. (1989a), Linear Algebra and its Applications. Vol. 1: A First Course, Halsted Press.
Griffel, D. H. (1989b), Linear Algebra and its Applications. Vol. 2: More Advanced, Halsted Press.
Guéret, C., Prins, C., and Sevaux, M. (2002), Applications of Optimisation with Xpress-MP, Dash Opti-
mization.
Haimovich, M. (1983), The Simplex Algorithm is Very Good! On the Expected Number of Pivot Steps and
Related Properties of Random Linear Programs, Columbia University Press.
Hartsfield, N., and Ringel, G. (2003), Pearls in Graph Theory: A Comprehensive Introduction, Dover
Publications.
den Hertog, D. (1994), Interior Point Approach to Linear, Quadratic and Convex Programming: Algorithms
and Complexity, Kluwer Academic Publishers.
Horn, R. A., and Johnson, C. R. (1990), Matrix Analysis, Cambridge University Press.
Ignizio, J. P. (1991), Introduction to Expert Systems: the Development and Implementation of Rule-Based
Expert Systems, McGraw-Hill, New York.
Jeffrey, P., and Seaton, R. (1995), ‘The Use of Operational Research Tools: A Survey of Operational
Research Practitioners in the UK’, Journal of the Operational Research Society 46, 797–808.
Jenkins, L. (1990), ‘Parametric Methods in Integer Linear Programming’, Annals of Operations Research
27(1), 77–96.
Johnson, E. L., and Nemhauser, G. L. (1992), ‘Recent Developments and Future Directions in Math-
ematical Programming’, IBM Systems Journal 31(1), 79–93.
Johnson, L. W., Riess, R. D., and Arnold, J. T. (2011), Introduction to Linear Algebra, Sixth edition,
Pearson.
Karmarkar, N. (1984), ‘A New Polynomial-Time Algorithm for Linear Programming’, Combinatorica
4, 373–395.
Keys, P. (1995), Understanding the Process of Operational Research: Collected Readings, John Wiley & Sons,
Inc., New York.
Khachiyan, L. G. (1979), ‘A Polynomial Algorithm in Linear Programming’, Doklady Akademii Nauk
SSSR 244(5), 1093–1096.
Klee, V., and Minty, G. J. (1972), How Good is the Simplex Algorithm?, in O. Shisha, ed., ‘Inequalities
III’, Academic Press, New York and London, 159–175.
Lawler, E. L., Lenstra, J. K., Rinnooy Kan, A. H. G., and Shmoys, D. B., eds (1985), The Traveling
Salesman Problem: A Guided Tour of Combinatorial Optimization, John Wiley & Sons, Inc., New
York.
Lay, D. C. (2012), Linear Algebra and its Applications, Fourth edition, Pearson.
Lee, J. (2004), A First Course in Combinatorial Optimization, Cambridge University Press.
Lenstra, A. K., Rinnooy Kan, A. H. G., and Schrijver, A. (1991), History of Mathematical Programming
– A Collection of Personal Reminiscences, North-Holland.
Littlechild, S. C., and Shutler, M. F., eds (1991), Operations Research in Management, Prentice Hall
International, London.
Liu, J. S., Lu, L. Y. Y., Lu, W. M., and Lin, B. J. Y. (2013), ‘Data Envelopment Analysis 1978–2010: A
Citation-Based Literature Survey’, Omega 41(1), 3–15.
Luenberger, D. G., and Ye, Y. (2010), Linear and Nonlinear Programming, Springer.
Martello, S., and Toth, P. (1990), Knapsack Problems: Algorithms and Computer Implementations, John
Wiley & Sons, Inc., New York.
Martin, R. K. (1999), Large Scale Linear and Integer Optimization: A Unified Approach, Springer.
Matoušek, J., and Gärtner, B. (2007), Understanding and Using Linear Programming, Springer.
McMahan, H. B., Holt, G., Sculley, D., Young, M., Ebner, D., Grady, J., Nie, L., Phillips, T., Davydov,
E., Golovin, D. et al. (2013), Ad Click Prediction: A View from the Trenches, in ‘Proceedings of
the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining’,
ACM, 1222–1230.
Michalewicz, Z., and Fogel, D. B. (2004), How to Solve It: Modern Heuristics, Springer.
Mirchandani, P. B., and Francis, R. L., eds (1990), Discrete Location Theory, John Wiley & Sons, Inc.,
New York.
Mitra, G., Lucas, C., Moody, S., and Hadjiconstantinou, E. (1994), ‘Tools for Reformulating Logical
Forms into Zero-One Mixed Integer Programs’, European Journal of Operational Research 72(2), 262–
276.
Müller-Merbach, H. (1981), ‘Heuristics and their Design: A Survey’, European Journal of Operational
Research 8(1), 1–23.
Nemhauser, G. L., and Wolsey, L. A. (1999), Integer and Combinatorial Optimization, John Wiley &
Sons, Inc., New York.
Nering, E. D., and Tucker, A. W. (1993), Linear Programs and Related Problems, Academic Press, Lon-
don.
Papadimitriou, C. H., and Steiglitz, K. (1998), Combinatorial Optimization: Algorithms and Complexity,
Dover Publications.
Picard, J.-C. (1976), ‘Maximal Closure of a Graph and Applications to Combinatorial Problems’,
Management Science 22(11), 1268–1272.
von Plato, J. (2008), The Development of Proof Theory, Technical Report, http://plato.stanford.edu/entries/proof-theory-development/.
Powell, S. G., and Baker, K. R. (2004), The Art of Modeling with Spreadsheets, John Wiley & Sons, Inc.,
New York.
Prager, W. (1956), ‘On the Caterer Problem’, Management Science 3(1), 15–23.
Ribeiro, C. C., and Urrutia, S. (2004), ‘OR on the Ball: Applications in Sports Scheduling and
Management’, OR/MS Today 31, 50–54.
Roos, C., Terlaky, T., and Vial, J.-Ph. (2006), Interior Point Methods for Linear Optimization, Second
edition, Springer.
Schrage, L., and Wolsey, L. (1985), ‘Sensitivity Analysis for Branch and Bound Integer Programming’,
Operations Research 33(5), 1008–1023.
Schrijver, A. (1998), Theory of Linear and Integer Programming, John Wiley & Sons, Inc., New York.
Schweigman, C. (1979), Doing Mathematics in a Developing Country: Linear Programming with Applications
in Tanzania, Tanzania Publishing House.
Sierksma, G., and Ghosh, D. (2010), Networks in Action: Text and Computer Exercises in Network Opti-
mization, Springer.
Sierksma, G., and Tijssen, G. A. (1998), ‘Routing Helicopters for Crew Exchanges on Off-Shore
Locations’, Annals of Operations Research 76, 261–286.
Sierksma, G., and Tijssen, G. A. (2003), ‘Degeneracy Degrees of Constraint Collections’, Mathematical
Methods of Operations Research 57(3), 437–448.
Sierksma, G., and Tijssen, G. A. (2006), ‘Simplex Adjacency Graphs in Linear Optimization’, Algo-
rithmic Operations Research 1(1).
Strassen, V. (1969), ‘Gaussian Elimination is not Optimal’, Numerische Mathematik 13(4), 354–356.
Terlaky, T., and Zhang, S. (1993), ‘Pivot Rules for Linear Programming: A Survey on Recent Theo-
retical Developments’, Annals of Operations Research 46–47(1), 203–233.
Thurston, W. P. (1998), On Proof and Progress in Mathematics, in T. Tymoczko, ed., ‘New Directions
in the Philosophy of Mathematics’, Princeton University Press, 337–55.
Tijs, S. H., and Otten, G. J. (1993), ‘Compromise Values in Cooperative Game Theory’, TOP 1(1), 1–
36.
Tijssen, G. A., and Sierksma, G. (1998), ‘Balinski-Tucker Simplex Tableaus: Dimensions, Degeneracy
Degrees, and Interior Points of Optimal Faces’, Mathematical Programming 81(3), 349–372.
Truemper, K. (1990), ‘A Decomposition Theory for Matroids. V. Testing of Matrix Total Unimodu-
larity’, Journal of Combinatorial Theory, Series B 49(2), 241–281.
Vanderbei, R. J. (2014), Linear Programming: Foundations and Extensions, Fourth edition, Springer.
Ward, J. E., and Wendell, R. E. (1990), ‘Approaches to Sensitivity Analysis in Linear Programming’,
Annals of Operations Research 27(1), 3–38.
West, D. B. (2001), Introduction to Graph Theory, Second edition, Prentice Hall, Upper Saddle River.
Williams, V. V. (2012), Multiplying Matrices Faster than Coppersmith-Winograd, in ‘STOC ’12 Pro-
ceedings of the 44th Annual ACM Symposium on Theory of Computing’, 887–898.
Williamson, D. P., and Shmoys, D. B. (2011), The Design of Approximation Algorithms, Cambridge Uni-
versity Press.
Wolsey, L. A. (1998), Integer Programming, John Wiley & Sons, Inc., New York.
Wright, M. B. (2009), ‘50 Years of OR in Sport’, Journal of the Operational Research Society 60, 161–168.
Zhang, S. (1991), ‘On Anti-Cycling Pivoting Rules for the Simplex Method’, Operations Research
Letters 10(4), 189–192.
Zwols, Y., and Sierksma, G. (2009), ‘OR Practice – Training Optimization for the Decathlon’, Op-
erations Research 57(4), 812–822.
Advances in Applied Mathematics

LINEAR AND INTEGER OPTIMIZATION
Theory and Practice
Third Edition

Presenting a strong and clear relationship between theory and practice, Linear and Integer Optimization: Theory and Practice is divided into two main parts. The first part presents the theory of linear and integer optimization. More advanced topics are also covered, including interior point algorithms, the branch-and-bound algorithm, cutting planes, complexity, standard combinatorial optimization models, the assignment problem, minimum cost flow, and the maximum flow/minimum cut theorem.

The second part applies the theory through real-world case studies. The authors discuss advanced techniques such as column generation, multiobjective optimization, dynamic optimization, machine learning (support vector machines), combinatorial optimization, approximation algorithms, and game theory.

Besides the fresh new layout and completely redesigned figures, this new edition incorporates modern examples and applications of linear optimization. The book now includes computer code in the form of models in the GNU Mathematical Programming Language (GMPL). The models and corresponding data files are available for download and can be readily solved using the provided online solver.

This new edition also contains appendices covering mathematical proofs, linear algebra, graph theory, convexity, and nonlinear optimization. All chapters contain extensive examples and exercises.

This textbook is ideal for courses for advanced undergraduate and graduate students in various fields, including mathematics, computer science, industrial engineering, operations research, and management science.

www.crcpress.com