
Lecture Notes in Computer Science 1767

Edited by G. Goos, J. Hartmanis and J. van Leeuwen


Berlin
Heidelberg
New York
Barcelona
Hong Kong
London
Milan
Paris
Singapore
Tokyo
Giancarlo Bongiovanni Giorgio Gambosi
Rossella Petreschi (Eds.)

Algorithms
and Complexity
4th Italian Conference, CIAC 2000
Rome, Italy, March 1-3, 2000
Proceedings

Series Editors

Gerhard Goos, Karlsruhe University, Germany


Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands

Volume Editors

Giancarlo Bongiovanni
Rossella Petreschi
University of Rome "La Sapienza", Department of Computer Science
Via Salaria 113, 00198 Rome, Italy
E-mail: {bongiovanni/petreschi}@dsi.uniroma1.it
Giorgio Gambosi
University of Rome "Tor Vergata", Department of Mathematics
Via della Ricerca Scientifica 1, 00133 Rome, Italy
E-mail: [email protected]
Cataloging-in-Publication Data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme


Algorithms and complexity : 4th Italian conference ; proceedings / CIAC 2000,
Rome, Italy, March 1-3, 2000. Giancarlo Bongiovanni . . . (ed.). - Berlin ;
Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ;
Paris ; Singapore ; Tokyo : Springer, 2000
(Lecture notes in computer science ; Vol. 1767)
ISBN 3-540-67159-5

CR Subject Classification (1998): F.2, F.1, E.1, I.3.5, G.2

ISSN 0302-9743
ISBN 3-540-67159-5 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are
liable for prosecution under the German Copyright Law.
Springer-Verlag is a company in the specialist publishing group BertelsmannSpringer
© Springer-Verlag Berlin Heidelberg 2000
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Boller Mediendesign
Printed on acid-free paper SPIN: 10719716 06/3142 543210
Preface

The papers in this volume were presented at the Fourth Italian Conference on
Algorithms and Complexity (CIAC 2000). The conference took place on March
1-3, 2000, in Rome (Italy), at the conference center of the University of Rome
"La Sapienza".
This conference was born in 1990 as a national meeting to be held every
three years for Italian researchers in algorithms, data structures, complexity,
and parallel and distributed computing. Due to a significant participation of
foreign researchers, starting from the second conference, CIAC evolved into an
international conference.
In response to the call for papers for CIAC 2000, there were 41 submis-
sions, from which the program committee selected 21 papers for presentation at
the conference. Each paper was evaluated by at least three program committee
members. In addition to the selected papers, the organizing committee invited
Giorgio Ausiello, Narsingh Deo, Walter Ruzzo, and Shmuel Zaks to give plenary
lectures at the conference.
We wish to express our appreciation to all the authors of the submitted
papers, to the program committee members and the referees, to the organizing
committee, and to the plenary lecturers who accepted our invitation.

March 2000 Rossella Petreschi


Giancarlo Bongiovanni
Giorgio Gambosi
Organizing Committee:
G. Bongiovanni (Chair, Rome) I. Finocchi (Rome)
T. Calamoneri (Rome) G. Gambosi (Rome)
A. Clementi (Rome) P. Penna (Rome)
S. De Agostino (Rome) P. Vocca (Rome)

Program Committee:
R. Petreschi (Chair, Rome) Y. Manoussakis (Paris)
P. Crescenzi (Florence) J. Nesetril (Prague)
S. Even (Haifa) S. Olariu (Norfolk)
L. Gargano (Salerno) R. Silvestri (L’Aquila)
G. Italiano (Rome) R. Tamassia (Providence)
F. Luccio (Pisa) P. Widmayer (Zurich)

Additional Referees:
R. Battiti G. De Marco Z. Liptak M. Ruszinko
C. Blundo C. Demetrescu P. McKenzie K. Schlude
D. Boneh M. Di Ianni J. Marion H. Shachnai
G. Bongiovanni S. Eidenbenz A. Massini C. Stamm
M. Bonuccelli E. Feuerstein B. Masucci A. Sterbini
T. Calamoneri I. Finocchi A. Monti M. Sviridenko
Y. Censor R. Friedman M. Napoli U. Vaccaro
M. Cieliebak C. Galdi P. Penna C. Verri
A. Clementi A. Itai G. Pucci P. Vocca
P. D'Arco M. Kaminsky A. Rescigno M. Zapolotsky
S. De Agostino M. Leoncini

Sponsored by:


Banca di Credito Cooperativo di Roma
Dipartimento di Scienze dell'Informazione, Università di Roma "La Sapienza"
European Association for Theoretical Computer Science
Gruppo Nazionale per l'Informatica Matematica
Italsoft Ingegneria dei Sistemi S.r.l.
Università degli Studi di Roma "La Sapienza"
Università degli Studi di Roma "Tor Vergata"
Table of Contents

Invited Presentations
On Salesmen, Repairmen, Spiders, and Other Traveling Agents 1
G. Ausiello, S. Leonardi, A. Marchetti-Spaccamela

Computing a Diameter-Constrained Minimum Spanning Tree in Parallel 17


N. Deo, A. Abdalla

Algorithms for a Simple Point Placement Problem 32


J. Redstone, W.L. Ruzzo

Duality in ATM Layout Problems 44


S. Zaks

Regular Presentations
The Independence Number of Random Interval Graphs 59
W.F. de la Vega

Online Strategies for Backups 63


P. Damaschke

Towards the Notion of Stability of Approximation for Hard Optimization


Tasks and the Traveling Salesman Problem 72
H.-J. Böckenhauer, J. Hromkovic, R. Klasing, S. Seibert, W. Unger

Semantical Counting Circuits 87


F. Noilhan, M. Santha

The Hardness of Placing Street Names in a Manhattan Type Map 102


S. Seibert, W. Unger

Labeling Downtown 113


G. Neyer, F. Wagner

The Online Dial-a-Ride Problem under Reasonable Load 125


D. Hauptmeier, S.O. Krumke, J. Rambau

The Online-TSP against Fair Adversaries 137


M. Blom, S.O. Krumke, W. de Paepe, L. Stougie

QuickHeapsort, an Efficient Mix of Classical Sorting Algorithms 150


D. Cantone, G. Cincotti

Triangulations without Minimum-Weight Drawing 163


C.A. Wang, F.Y. Chin, B. Yang
Faster Exact Solutions for Max Sat 174
J. Gramm, R. Niedermeier
Dynamically Maintaining the Widest k-Dense Corridor 187
S.C. Nandy, T. Harayama, T. Asano
Reconstruction of Discrete Sets from Three or More X-Rays 199
E. Barcucci, S. Brunetti, A. Del Lungo, M. Nivat
Modified Binary Searching for Static Tables 211
D. Merlini, R. Sprugnoli, M.C. Verri
An Efficient Algorithm for the Approximate Median Selection Problem 226
S. Battiato, D. Cantone, D. Catalano, G. Cincotti, M. Hofri
Extending the Implicit Computational Complexity Approach to the
Sub-elementary Time-Space Classes 239
E. Covino, G. Pani, S. Caporaso
Group Updates for Red-Black Trees 253
S. Hanke, E. Soisalon-Soininen
Approximating SVP∞ to within Almost-Polynomial Factors Is NP-Hard 263
I. Dinur
Convergence Analysis of Simulated Annealing-Based Algorithms Solving
Flow Shop Scheduling Problems 277
K. Steinhöfel, A. Albrecht, C.K. Wong
On the Lovász Number of Certain Circulant Graphs 291
V.E. Brimkov, B. Codenotti, V. Crespi, M. Leoncini
Speeding Up Pattern Matching by Text Compression 306
Y. Shibata, T. Kida, S. Fukamachi, M. Takeda, A. Shinohara,
T. Shinohara, S. Arikawa

Author Index 317


On Salesmen, Repairmen, Spiders, and Other
Traveling Agents

Giorgio Ausiello, Stefano Leonardi, and Alberto Marchetti-Spaccamela

Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza",


via Salaria 113, 00198-Roma, Italia.

Abstract. The Traveling Salesman Problem (TSP) is a classical prob-


lem in discrete optimization. Its paradigmatic character makes it one of
the most studied in computer science and operations research and one
for which an impressive amount of algorithms (in particular heuristics
and approximation algorithms) have been proposed. While in the general
case the problem is known not to allow any constant ratio approximation
algorithm and in the metric case no better algorithm than Christofides'
algorithm is known, which guarantees an approximation ratio of 3/2, re-
cently an important breakthrough by Arora has led to the definition of a
new polynomial approximation scheme for the Euclidean case. Grow-
ing attention has also recently been devoted to the approximation of other
paradigmatic routing problems such as the Travelling Repairman Prob-
lem (TRP). The altruistic Travelling Repairman seeks to minimize the
average time incurred by the customers to be served rather than to mini-
mize its working time like the egoistic Travelling Salesman does. The new
approximation scheme for the Travelling Salesman is also at the basis of
a new approximation scheme for the Travelling Repairman problem in
the Euclidean space. New interesting constant approximation algorithms
have recently been presented also for the Travelling Repairman on gen-
eral metric spaces. Interesting applications of this line of research can be
found in the problem of routing agents over the web. In fact the prob-
lem of programming a "spider" for efficiently searching and reporting
information is a clear example of potential applications of algorithms for
the above mentioned problems. These problems are very close in spirit
to the problem of searching an object in a known graph introduced by
Koutsoupias, Papadimitriou and Yannakakis [14]. In this paper, moti-
vated by web searching applications, we summarize the most important
recent results concerning the approximate solution of the TRP and the
TSP and their application and extension to web searching problems.

1 Introduction
In computer applications involving the use of mobile virtual agents (sometimes
called "spiders") that are supposed to perform some task in a computer network,
a fundamental problem is to design routing strategies that allow the agents to
complete their tasks in the most efficient way ([3],[4]). In this context a typical
scenario is the following: an agent generated at node 0 of the network searches

a portion of the web formed by n sites, denoted by 1, ..., n, looking for a piece of
information. With each site i is associated a probability $p_i$ that the required
information is at that site. The distance from site i to site j is given by a
metric function d(i, j). The aim is to find a path $\sigma(1), \ldots, \sigma(n)$ that minimizes
the quantity $\sum_{i=1}^{n} p_{\sigma(i)} \sum_{k=1}^{i} d(\sigma(k-1), \sigma(k))$. In [14] this problem is called the Graph
Searching problem (GSP in the following). The GSP is shown to be strictly
related to the Travelling Repairman Problem (TRP), also called the Minimum
Latency Problem (MLP), in which a repairman is supposed to visit the nodes of
a graph in a way to minimize the overall waiting time of the customers sitting
in the nodes of the graph. More precisely, in the TRP we wish to minimize the
quantity $\sum_{i=1}^{n} \sum_{k=1}^{i} d(\sigma(k-1), \sigma(k))$.
The Minimum Latency problem is known to be MAX-SNP-hard for general
metric spaces as a result of a reduction from the TSP where all the distances are
either 1 or 2, while it is solvable in polynomial time for the case of line networks
[1].
In this paper we present the state of the art of the approximability of the TRP
and present the extension of such results to the Graph Searching problem. The
relationship between GSP and TRP is helpful from two points of view. In some
cases, in fact, an approximation preserving reduction from GSP to TRP can be
established [14] under the assumption that the probabilities associated with the
vertices are polynomially related, by replacing every vertex with a polynomial
number of vertices of equal probability. This allows the approximation
algorithms developed for the TRP to be applied to the GSP. Among them, particularly interesting are
the constant approximation algorithms for general metric spaces given by Blum
et al. [9] and Goemans and Kleinberg [13], later improved in combination with
a result of Garg [11] on the k-MST problem.
More recently a quasi-polynomial ($n^{O(\log n)}$ time) approximation scheme for tree
networks and Euclidean spaces has been proposed by Arora and Karakostas [5].
This uses the same technique as in the quasi-polynomial approximation scheme
of Arora for the TSP [6]. The case of tree networks seems particularly interesting
since one is often willing to run the algorithm on a tree covering a portion of the
network that hopefully contains the required information. In the paper we also
show how to extend approximation schemes for the TRP to the Graph Searching
problem.
In conclusion, the Graph Searching problem, beside being an interesting prob-
lem per se, motivated by the need to design efficient strategies for moving
"spiders" on the web, has several interesting connections with two of the most
intriguing and paradigmatic combinatorial graph problems, the Traveling Re-
pairman problem and the Traveling Salesman problem. Therefore the study of
the former problem naturally leads to the study of the results obtained for the lat-
ter problems, which may be classified among the most interesting breakthroughs
achieved in the recent history of algorithmics.
This paper is organized as follows: in Section 2 we formally define the GSP
and in Section 3 we review approximation algorithms for the TRP. In Sections 4
and 5 we present approximation algorithms and approximation schemes for the GSP.

2 Preliminaries and Notation


The Graph Searching Problem (GSP), introduced by Koutsoupias, Papadim-
itriou and Yannakakis [14], is defined on a set of n vertices V = {1, ..., n}
of a metric space M, plus a distinguished vertex of M at which the travelling
agent is initially located and to which it will return after the end of the tour. The starting
point is also denoted as the root and indicated as vertex 0. A metric distance
d(i, j) defines the distance between any pair of vertices i, j. With every vertex
i is also associated a probability or weight $w_i > 0$ (the vertices with weight 0
are simply ignored). We assume that the object is in exactly one site, so that
$\sum_{i=1}^{n} w_i = 1$. A solution to the GSP is a permutation $\sigma(1), \ldots, \sigma(n)$ indicating
the tour to be followed. The distance of the i-th visited vertex from the root along the tour is
given by $l(i) = \sum_{j=1}^{i} d(\sigma(j-1), \sigma(j))$, where $\sigma(0) = 0$. The objective of the GSP is to minimize
the expected time spent to locate the object in the network, namely $\sum_{i=1}^{n} w_{\sigma(i)}\, l(i)$.
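To make the objective concrete, the following small Python sketch evaluates the expected search time of a given visiting order; the function and variable names (gsp_cost, w, d) are ours and not part of the paper.

```python
def gsp_cost(order, w, d, root=0):
    """Expected time to locate the object for a given GSP tour (a sketch).

    order: the permutation sigma(1), ..., sigma(n) of the n sites to visit;
    w[v]:  probability/weight of site v (assumed to sum to 1 over all sites);
    d(u, v): metric distance between u and v; the tour starts at `root`.
    Returns sum_i w[sigma(i)] * l(i), where l(i) is the distance travelled
    along the tour until the i-th visited site is reached.
    """
    total, latency, prev = 0.0, 0.0, root
    for v in order:
        latency += d(prev, v)   # l(i): distance of the i-th visited site from the root along the tour
        total += w[v] * latency
        prev = v
    return total
```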
We will measure the performance of algorithms by their approximation ratio,
that is the maximum ratio over all input instances between the cost of the
algorithm’s solution and the optimal solution.

3 Approximation Algorithms for the TRP


In this section we will present the approximation algorithms developed in the
literature for the TRP, which was first introduced in [1]. These results are relevant
to the solution of the GSP, which reduces to the TRP when all the vertices have
equal probability or the probabilities are polynomially related.
In the case of line networks the problem is polynomial.
Theorem 1. [1] There exists an $O(n^2)$ optimal algorithm for the TRP on line
networks.
The algorithm numbers the root with 0, the vertices to the right of the root
with positive integers, and the vertices to the left of the root with negative
integers. By dynamic programming the algorithm stores, for every pair
of vertices (−l, r) with l, r > 0: (i) the optimal path that visits the vertices
of [−l, r] and ends at −l, and (ii) the optimal path that visits the vertices of
[−l, r] and ends at r. The information at point (i) is computed in O(1) time
by selecting the best alternative among (a) the path that follows the optimal
path for (−(l − 1), r), ends at −(l − 1) and then moves to −l, and (b) the optimal
path for (−l, r − 1) that ends at r − 1 and then moves first to r and then to −l. The
information at point (ii) is computed analogously in O(1) time.
The TRP is known to be solvable in polynomial time beyond line networks
only for trees with a bounded number of leaves [14]. Whether the TRP is polynomial-
time solvable or NP-hard for general tree networks is still a very interesting
open problem.

The first constant factor approximation for the TRP on general metric spaces
and on tree networks was presented by Blum et al. [9]. The authors introduce
the idea of concatenating a sequence of tours to form the final solution. The
algorithm proposed by the authors computes for every j = 1, 2, 3, ... a tree $T_j$ of
cost at most $2^{j+1}$ spanning the maximum number of vertices. This procedure
is repeated until all the vertices have been included in a tree. The final tour is
obtained by concatenating the tours obtained by depth-first traversal of the trees
$T_j$, j = 1, 2, 3, .... Let $m_j$ be the number of vertices spanned by $T_j$, and let $S_j$ be
the set of vertices of $T_j$. Consider a number i such that $m_j \le i \le m_{j+1}$. We
can state that the i-th vertex visited in the optimal tour has latency at least
$2^j$. On the other hand, the latency of the i-th vertex visited in the algorithm's
tour is at most $8 \cdot 2^j$, because the latency of the i-th vertex in the tour
of the algorithm is at most $2(\sum_{k<j} 2^{k+1} + 2^{j+1}) \le 8 \cdot 2^j$. Assume that a
c-approximation algorithm is available that finds a tree of minimum cost
spanning k vertices of the network; through a binary
search procedure this can easily be turned into a c-approximation algorithm for
the problem of finding a tree of bounded cost that maximizes the number of vertices spanned.
This immediately results in an 8c approximation algorithm for the TRP.
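In Python-like form, the concatenation scheme can be sketched as follows; max_cover_tree and dfs_tour stand for the budgeted-coverage subroutine and the depth-first traversal discussed above and are assumed rather than defined here.

```python
def concatenated_trp_tour(vertices, root, max_cover_tree, dfs_tour):
    """Sketch of the tour-concatenation scheme of Blum et al. [9].

    max_cover_tree(budget, remaining): assumed black box returning a tree of
    cost at most `budget` spanning as many vertices of `remaining` as possible
    (in practice an approximation, e.g. via binary search over a k-MST routine).
    dfs_tour(tree): assumed to return the closed walk obtained by a depth-first
    traversal of the tree, starting and ending at the root.
    """
    remaining = set(vertices) - {root}
    tour, j = [root], 1
    while remaining:
        tree_j = max_cover_tree(2 ** (j + 1), remaining)   # budget 2^(j+1), as in the text
        for v in dfs_tour(tree_j):                         # walk of cost at most twice the budget
            if v in remaining:                             # record first visits only (metric shortcutting)
                remaining.discard(v)
                tour.append(v)
        tour.append(root)                                  # return to the root before the next phase
        j += 1
    return tour
```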
In a tree network the problem of finding a tree of k vertices of minimum cost
is polynomial-time solvable using a dynamic programming algorithm described
in the paper of Blum et al. [9]. The algorithm first transforms the tree into a
binary tree by replacing every vertex v of degree higher than 2 with a binary tree
whose internal edges have cost 0 and whose leaves, each connected to at most 2 children of v, are joined to them by
edges weighted with the cost of the edges from v to the corresponding children.
The procedure computes, for any vertex of the graph, for every integer j between
1 and k, and for every i = 0, ..., j, the minimum cost tree that collects i vertices
in the left subtree and j − i vertices in the right subtree. This procedure can
be clearly implemented in polynomial time. This implies an 8-approximation
algorithm for the TRP on tree networks.
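One possible realization of this dynamic program on the (already binarized) tree is sketched below; it folds the two children of a vertex one after the other, which is equivalent for binary trees. The table layout and helper names are ours, and the tree is assumed to be given by a children map and an edge-cost map rooted at the tour's starting vertex.

```python
import functools

def min_cost_k_subtree(children, cost, root, k):
    """Sketch of the tree DP of Blum et al. [9] for the k-vertex minimum-cost tree.

    children[v]: list of the (at most two) children of v in the binary tree;
    cost[(v, c)]: weight of the edge between v and its child c.
    best(v)[j] is the minimum cost of a subtree that contains v and exactly
    j - 1 further vertices from v's descendants; the answer is best(root)[k].
    """
    INF = float("inf")

    @functools.lru_cache(maxsize=None)
    def best(v):
        table = [INF] * (k + 1)
        table[1] = 0.0                       # the single-vertex tree {v}
        for c in children.get(v, []):
            sub, merged = best(c), list(table)
            for j in range(1, k + 1):        # j vertices already chosen (v plus earlier children)
                if table[j] == INF:
                    continue
                for i in range(1, k + 1 - j):
                    if sub[i] == INF:
                        continue
                    cand = table[j] + cost[(v, c)] + sub[i]
                    if cand < merged[j + i]:
                        merged[j + i] = cand
            table = merged
        return tuple(table)

    return best(root)[k]
```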
When the paper [9] appeared, no constant approximation algorithm for the
k-MST problem on general metric spaces was known. A constant approximation
algorithm for the TRP problem on general metric spaces was then obtained by
applying a so-called $(\alpha, \beta)$ TSP approximator. An $(\alpha, \beta)$ TSP approximator
is an algorithm that, given bounds $\varepsilon$ and L, an n-point metric space M and a
starting point p, finds a tour starting at p of length at most $\beta L$ which visits at
least $(1 - \alpha\varepsilon)n$ vertices whenever there exists a tour of length L which visits $(1 - \varepsilon)n$
vertices. The existence of an $(\alpha, \beta)$ TSP approximator ensures the existence
of an $8\alpha\beta$ approximation algorithm for the TRP. A (3, 6) and a (4, 4) TSP
approximator were proposed in [9]; a (2, 4) and a (2, 3) TSP approximator were
later proposed by Goemans and Kleinberg in the paper [13].
The paper of Goemans and Kleinberg also presents a new technique to select
a sequence of tours of growing length to concatenate to form a solution. The
procedure proposed by Goemans and Kleinberg computes for every number j
from 1 to n the tour Tj of minimum length that visits j vertices. Let dj be the
length of tour $T_j$. The goal is to select values $j_1 < \cdots < j_m = n$ in order to minimize
the latency of the final tour. Let $p_i$ be the number of new vertices visited during
tour i. Since the number of vertices discovered up to the i-th tour is certainly no
smaller than $j_i$, the following claim of [13] holds:
$$\sum_{i=1}^{m} p_i\, d_{j_i} \;\le\; \sum_{i=1}^{m} (j_i - j_{i-1})\, d_{j_i}.$$
Indeed, for a number of vertices equal to $\sum_{k=1}^{i} p_k - j_i$ the left-hand side charges
a contribution of at most $d_{j_i}$ while the right-hand side charges a contribution at
least as large. Moreover, each tour $T_{j_i}$ is traversed
in the direction that minimizes the total latency of the vertices discovered dur-
ing tour $T_{j_i}$. This allows the total latency of the tour obtained by
concatenating $T_{j_1}, \ldots, T_{j_m}$ to be rewritten as:

$$\sum_{i} \Big(n - \sum_{k=1}^{i} p_k\Big) d_{j_i} + \frac{1}{2} \sum_{i} p_i\, d_{j_i}
\;\le\; \sum_{i} (n - j_i)\, d_{j_i} + \frac{1}{2} \sum_{i} (j_i - j_{i-1})\, d_{j_i}
\;=\; \sum_{i} \Big(n - \frac{j_{i-1} + j_i}{2}\Big) d_{j_i}$$

The formula above expresses the total latency of the algorithm only
in terms of the indices $j_i$ and of the lengths $d_{j_i}$, independently of the number
of new vertices discovered during each tour. A complete graph on the nodes 0, 1, ..., n is
then constructed in the following way. Arc (i, j) is turned into a directed edge
from min(i, j) to max(i, j), and has length $(n - \frac{i+j}{2})\, d_j$. The algorithm
computes a shortest path from node 0 to node n. Assume that the path goes
through vertices $0 = j_0 < j_1 < \cdots < j_m = n$. The tour is then obtained by
concatenating $T_{j_1}, \ldots, T_{j_m}$.
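The index-selection step can be sketched directly from the construction above; here d is assumed to be a list of length n + 1 with d[j] the length of the (approximately) shortest tour visiting j vertices (d[0] unused).

```python
def select_tour_indices(d, n):
    """Sketch of the tour-selection step of Goemans and Kleinberg [13].

    Builds the complete DAG on nodes 0, 1, ..., n in which arc (i, j), i < j,
    has length (n - (i + j)/2) * d[j], and returns the nodes of a shortest
    path from 0 to n, i.e. the indices j_1 < ... < j_m = n of the tours
    T_{j_1}, ..., T_{j_m} that are concatenated to form the final solution.
    """
    INF = float("inf")
    dist = [0.0] + [INF] * n
    pred = [-1] * (n + 1)
    for i in range(n + 1):                 # relax arcs in topological order
        if dist[i] == INF:
            continue
        for j in range(i + 1, n + 1):
            w = (n - (i + j) / 2.0) * d[j]
            if dist[i] + w < dist[j]:
                dist[j], pred[j] = dist[i] + w, i
    path, j = [], n
    while j > 0:                           # recover j_1, ..., j_m from the predecessors
        path.append(j)
        j = pred[j]
    return list(reversed(path))
```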
The obtained solution is compared against the lower bound $OPT \ge \sum_{k=1}^{n} d_k/2$.
This lower bound follows from the observation that the k-th vertex
cannot be visited in any optimal tour before time $d_k/2$. The approximation ratio of
the algorithm is determined by bounding the maximum, over all possible sets
of distances $d_1, \ldots, d_n$, of the ratio between the shortest path in $G_n$ and the lower
bound on the optimal solution. This value turns out to be smaller than 3.5912, thus
improving over the ratio of 8 in [9].
Theorem 2. [13] Given a c-approximation algorithm for the problem of finding
a tour of minimum length spanning k vertices on a specific metric space, there
exists a 3.5912c approximation algorithm for the TRP on the same metric
space.
The method described above allows a 3.5912 approximation to be obtained for
tree networks. For general metric spaces, a 3-approximation algorithm for the k-
MST problem and for the problem of devising a tour of minimum cost spanning
k vertices was later proposed by Garg [11]. This yields a 10.7796
approximation algorithm for the TRP on general metric spaces. This bound can
be further improved by applying the more recent 2.5-approximation k-MST
algorithm of Arya and Kumar [7]. We will describe the algorithm of [11] for the
k-MST in the following section, where we study the extension of the algorithm
for the TRP to the GSP.

4 Approximation Algorithms for the GSP

In this section we will study the extension of the algorithms for the TRP to the
GSP.
The algorithm for the TRP on line networks can be extended to provide a
polynomial time algorithm for the GSP on line networks. The dynamic program-
ming algorithm presented in the previous section is simply modified in order to
increase the cost of a solution by the latency of a vertex weighted by its proba-
bility rather than just by the latency of a vertex.
As we mentioned in the introduction, the GSP problem has been introduced
by Koutsoupias, Papadimitriou and Yannakakis [14]. In that paper the authors
show a simple reduction from the GSP to the TRP under some restrictive con-
ditions. They show that the metric GSP can be reduced to the metric TRP
under the assumption that all the weights/probabilities are rational numbers
with small coefficients and a common denominator. This assumption allows every
vertex to be split into a polynomial number of vertices with weight equal to the
common denominator of all the weights of the vertices of the graph. If two ver-
tices in the instance of the TRP derive from the splitting of the same vertex in
the instance of the GSP, their internode distance is 0; if the two vertices derive
from two different vertices in the instance of the GSP, say i and j, their distance
is d(i, j). A solution to the instance of the TRP obtained from an instance of
the GSP can be easily turned into a solution of equal cost to the original GSP
instance, since all the vertices at distance 0 in the TRP can be visited at the
same time.
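A small sketch of the splitting construction follows; it assumes the weights are given as integer multiples of a common unit 1/D, as in the restrictive condition above, and the function names are ours.

```python
def gsp_to_trp(weights, dist, D):
    """Sketch of the GSP -> TRP reduction of [14] for rational weights.

    weights[i] is assumed to equal a_i / D for an integer a_i; vertex i is
    replaced by a_i copies.  Copies of the same vertex are at distance 0 from
    each other, while copies of different vertices i and j inherit d(i, j).
    Returns the list mapping each TRP vertex to its original GSP vertex,
    together with the distance function of the TRP instance.
    """
    copies = []
    for i, w in enumerate(weights):
        copies.extend([i] * round(w * D))      # a_i copies of vertex i

    def trp_dist(a, b):
        return 0 if copies[a] == copies[b] else dist(copies[a], copies[b])

    return copies, trp_dist
```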
Unfortunately, this reduction does not apply to the general case. In this
section we will consider algorithms for the general case of the metric GSP and of
the GSP on tree networks. When trying to extend the general approach for the
TRP to the GSP, we need to solve two kinds of problems: (i) find a sequence of
tours to be concatenated to obtain the final tour; (ii) compute every tour to be
concatenated. We will see that the solution of Goemans and Kleinberg for point
(i) does not seem to be easily extendible to the GSP, and that the computation of
every tour to be concatenated can be strictly more difficult than for the TRP.
Kleinberg and Goemans propose to compute, for every k = 1, ..., n, a tour
of minimum cost that spans k vertices. The application of this approach to the
GSP requires finding a tour of minimum cost that spans at least a given weight,
for every possible amount of weight: extended to the GSP, the approach
requires computing the minimum cost tour that covers an amount of weight i for
every $i = 1, \ldots, W = \sum_j w_j$. This clearly results in a pseudo-polynomial time
algorithm. Alternatively we can think of partitioning the interval [0, W] into a
polynomial number of intervals of larger size x, $[ix, (i+1)x]$, $i = 0, \ldots, W/x - 1$.
The drawback of a similar solution is a weaker lower bound. Let $w = \min_j w_j$
be the minimum weight of a vertex. We can state the lower bound $OPT \ge \sum_{j=1}^{m} (\min_j w_j)\, d_j/2$
on the optimal solution, but we cannot state that the optimal
solution will cover the (j + 1)-th amount of weight x before time $d_j/2$. However,
in this section we will follow the approach of Blum et al. [9], which repeatedly
finds a tree of exponentially increasing length spanning the biggest amount of
weight until the whole weight has been collected. Their result on the relationship
between TRP and k-MST can be easily extended to the GSP problem; we define
the W-MST problem as the problem of finding a tree of minimum cost that
covers a weight of at least W.

Theorem 3. [9] Given a c-approximation algorithm for the W-MST problem, there
exists an 8c approximation algorithm for the GSP.

Let $W_l$ be the total weight collected in the l-th tour, of length at most $2^l$.
Let $\Delta_l = W_l - W_{l-1}$. It is possible to see that any algorithm will pay, for the
weight that is collected between $W_{l-1}$ and $W_l$, a latency of at least $2^l$. We then
obtain an algorithm with approximation ratio 8c if we have a c-approximation
algorithm for finding a tree spanning a maximum amount of weight with cost
bounded by a given value L, or alternatively an algorithm for finding a tree of
minimum cost that covers at least a given weight, say W.
Such algorithms are not known in the literature, either for tree networks or for gen-
eral metric spaces. The problem of finding a tree of minimum cost spanning a
weight of at least W is already NP-hard for tree networks. The reduction is from
Knapsack. Consider n items where the generic item i has cost $c_i$ and benefit
$w_i$. The corresponding instance of the GSP is obtained by constructing a star
network of n leaves where the root is the center of the star, and every leaf i has
weight $w_i$ and is connected to the center with an edge of cost $c_i$. The problem
of finding a tree of maximum weight and bounded cost is clearly NP-hard, as is
the problem of finding a minimum cost tree that spans a weight of
at least W. In the next section we will show how to provide a fully polynomial
time approximation scheme for this problem on tree networks.
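For concreteness, the star instance used in this reduction can be written down directly; the representation chosen here (edge list plus weight map) is ours.

```python
def knapsack_to_star(costs, benefits):
    """Star network witnessing the NP-hardness reduction sketched above.

    Item i of the Knapsack instance becomes leaf i + 1, attached to the center
    (vertex 0, the root) by an edge of cost costs[i] and carrying weight
    benefits[i].  A subtree of weight at least W and minimum edge cost in this
    star corresponds exactly to a minimum-cost selection of items with total
    benefit at least W.
    """
    edges = [(0, i + 1, c) for i, c in enumerate(costs)]       # (root, leaf, edge cost)
    weights = {i + 1: b for i, b in enumerate(benefits)}
    weights[0] = 0                                             # the root carries no weight
    return edges, weights
```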
In the rest of this section we will show how to extend a constant approxima-
tion algorithm for the k-MST problem to the W -MST problem.
Theorem 4. There exists a constant approximation algorithm for the W -MST
problem.
For the sake of exposition, we limit ourselves to showing the extension of the
5-approximation algorithm of Garg for the k-MST problem to the case in which
the goal is to collect a weight of at least W. In [11], a 3-approximation algorithm
is also presented, which improves over the previous constant approximation algo-
rithm by Blum, Ravi and Vempala [10], while an improved approximation based
on the same techniques has been later proposed by Arya and Kumar [7]. These
algorithms are based on the Primal-Dual method developed by Agrawal, Klein
and Ravi [2] and by Goemans and Williamson [12] to design forests of minimum
cost satisfying various constraints.
In the following we will highlight the main variations to the algorithm and
to the analysis of [11] needed to extend the 5-approximation to the W-MST problem.
We consider the problem in the case in which the vertex furthest from the root is part of
the optimal solution. An algorithm for the general problem is then obtained by
trying every possible vertex and selecting the best solution. It is well known
that the primal-dual method uses the dual of a relaxation of the linear programming
formulation of the problem as a guide for the algorithm and the analysis. We
need to introduce the standard notation for the primal-dual method. We denote
by S the generic subset of V and by $\delta(S)$ the set of edges with exactly one
endpoint inside S. Let E be the edge set of the graph, e the generic edge (i, j)
of E, and $c_e = d(i, j)$ its cost. In the linear programming formulation of the
problem a variable $x_e \in \{0, 1\}$, $e \in E$, indicates whether edge e is part of the tree,
and a variable $x_v \in \{0, 1\}$, $v \in V$, indicates whether vertex v is spanned by the tree. The
starting point of the tour is the root r of the tree.
The linear programming formulation of the W-MST problem, after the relax-
ation of the integrality constraints on the variables $x_e$ and $x_v$, is as follows:

minimize $\sum_{e \in E} c_e x_e$
subject to
$\sum_{e \in \delta(S)} x_e \ge x_v \qquad (\forall v,\ \forall S : v \in S \subseteq V - \{r\})$
$\sum_{v \in V} w_v x_v = W$
$x_v \le 1 \qquad (\forall v \in V)$
$x_v \ge 0 \qquad (\forall v \in V)$
$x_e \ge 0 \qquad (\forall e \in E)$

In the dual formulation a variable $y_{v,S}$ is associated with every constraint of
the first set, a variable p with the second constraint, and a variable $p_v$ with every
constraint of the third set. The dual formulation is as follows:

maximize $pW - \sum_{v \in V} p_v$
subject to
$\sum_{S : v \in S} y_{v,S} + p_v \ge p\, w_v \qquad (\forall v \in V)$
$\sum_{S : e \in \delta(S)} \sum_{v \in S} y_{v,S} \le c_e \qquad (\forall e \in E)$
$p_v \ge 0 \qquad (\forall v \in V)$
$y_{v,S} \ge 0 \qquad (\forall v,\ \forall S : v \in S \subseteq V - \{r\})$

Define, in a way similar to [11], the potential $\pi_v$ of vertex v as $\pi_v = \sum_{S: v \in S} y_{v,S}$.
Observe that if $\pi_v \ge p\, w_v$ then $p_v = 0$, else $p_v = p\, w_v - \pi_v$. From the previous
observation it is possible to prove that in an optimal solution p has a value be-
tween the $k_W$-th and the $(k_W + 1)$-th smallest ratio $\pi_v / w_v$, where $k_W$ is the smallest
integer such that $\sum_{i=1}^{k_W} w_i \ge W$. The optimal solution of the dual problem can
be thought of as an assignment to the dual variables such that the sum of the
first $k_W$ potentials is maximized. By the duality theorem it also follows that the
sum of the first $k_W$ potentials is a lower bound on the optimal solution of the
primal problem.
The primal-dual algorithm will construct a solution with cost bounded by
twice the sum of the first $k_W$ potentials. This solution will be completed to cover
a weight of at least W with an extra cost bounded by a constant factor times the
optimal value.
The problem is then reduced to finding a feasible assignment of potentials
$\pi(v)$ such that the sum of the first $k_W$ potentials is maximized. An assignment
of potentials is feasible if there exists an assignment of variables $y_{v,S}$ for which
$\sum_{S, v \in S : e \in \delta(S)} y_{v,S} \le c_e$ and, for any vertex v, $\pi(v) \le \sum_{S: v \in S} y_{v,S}$.
The primal-dual algorithm is run with an initial potential $p\, w_v$ assigned to ev-
ery vertex v apart from the root, to which a potential of 0 is assigned. Every subset
of vertices not containing the root has an associated variable $y_S$. The assignment
of variables $y_S$ satisfies at any time, for every edge e, $\sum_{S: e \in \delta(S)} y_S \le c_e$. If for
an edge e the inequality holds with equality then the edge is said to be tight. At
any step of the algorithm the set of vertices V is partitioned into a set of active
and inactive components. A component is active if it has a positive potential
and it does not contain the root; otherwise it is inactive.
The algorithm simultaneously increases, for every active component S, the
variable $y_S$ and decreases its potential until either the potential is 0 or one of the
constraints on the edges becomes tight. If the constraint for edge e = (i, j) becomes tight,
the active components containing vertices i and j are merged, with potential equal
to the sum of the residual potentials of the two components. The two components
are made inactive, while the new component is active unless it contains the root.
The set of tight edges at any stage forms a forest whose trees define the set
of components at that stage. The procedure halts when all the components are
inactive, that is, when the residual potential of all the components not containing the
root is 0.
The tree spanning the component of the root when the algorithm halts is
denoted by $T_p$. $T_p$ is then pruned to remove every edge that connects to $T_p$ a
subtree that spans an inactive component at some stage of the algorithm. Let $T_p'$
be the tree obtained after the pruning phase. The set of initial potentials does
not necessarily form a feasible assignment of potentials. We can follow [11] in
showing that a sufficient condition for the set of potentials to be feasible is that
the component containing the root has zero potential when the algorithm
halts. An assignment of the potentials that satisfies this requirement can be
obtained by decreasing the potential of any vertex such that all the components
containing that vertex have non-zero potential. We can reduce the potential
of every such vertex until a component containing the vertex has 0 residual
potential. This procedure is repeated until the component containing the root
has 0 potential.
Denote by cost(T) the cost of a tree T, and by $W_p$ the weight spanned by
the vertices of $T_p'$. The primal-dual method ensures that the cost of $T_p'$ is at
most twice the sum of the potentials of the vertices of $T_p'$, namely $cost(T_p') =
\sum_{e \in T_p'} c_e \le 2 \sum_{v \in T_p'} \pi(v)$. The sum of the smallest $k_W$ potentials is also a lower
bound on the cost of the optimal tree spanning a weight of at least W. Since the vertices in $T_p'$
are the only vertices in the graph with ratio $\pi(v)/w_v < p$, the sum of the
potentials of the vertices of the tree $T_p'$ plus $p(W - W_p)$ is also a lower bound
on the optimal solution.
We select the highest value of p such that $\sum_{v \in T_p'} w_v = W_p \le W$. We also
run the algorithm for the value $p + \varepsilon$, thus obtaining a tree $T_{p+\varepsilon}'$ for which
$W < W_{p+\varepsilon}$ holds. A first solution is obtained as follows. Consider the tour
obtained by traversing $T_{p+\varepsilon}'$ twice. We select the minimum cost path on this
tour that collects a weight of at least $W - W_p$. Such a path has cost bounded by
$\frac{W - W_p}{W_{p+\varepsilon} - W_p}\, 2\, cost(T_{p+\varepsilon}')$. The first solution is obtained from the tree $T_p'$, the
path selected out of $T_{p+\varepsilon}'$, and an edge that joins this path to the root.
Denote by OPT the cost of the optimal tour. Recall that OPT is lower
bounded by the maximum distance from a vertex to the root. The cost of the
first solution is bounded by
$$cost(T_p') + \frac{W - W_p}{W_{p+\varepsilon} - W_p}\, 2\, cost(T_{p+\varepsilon}') + OPT.$$
0
The second solution is obtained from $T_{p+\varepsilon}'$. Following the analysis of [11], we
write the following two lower bounds on the optimal solution:
$$OPT \ge \sum_{v \in T_p'} \pi_p(v) + p\,(W - W_p)$$
$$OPT \ge \sum_{v \in T_{p+\varepsilon}'} \pi_{p+\varepsilon}(v) - (p + \varepsilon)(W_{p+\varepsilon} - W_p)$$
We can write:
$$cost(T_p') \le 2\, OPT - 2\, p\,(W - W_p)$$
$$cost(T_{p+\varepsilon}') \le 2\, OPT + 2(p + \varepsilon)(W_{p+\varepsilon} - W_p)$$
from which it follows that the smaller of the two solutions is at most
$5\, OPT$.
By combining the above analysis with Theorem 3 we obtain the following
Corollary.

Corollary 1. There exists a 40-approximation algorithm for the GSP defined
in a general metric space.

5 Approximation Schemes for the GSP


In this section we show under what conditions the GSP allows approximation
schemes. Also in this case we do so by extending to GSP similar results obtained
for TRP. In particular we first present the main ideas of [5], which show how to
construct an approximation scheme for the TRP in the case of tree metrics and
Euclidean metrics whose running time is quasi-polynomial; we will then show an
approximation preserving reduction that allows us to obtain similar results for
the GSP. As in previous papers on the TRP, the algorithm of [5] finds
a low-latency tour by joining paths; in this case the algorithm decides at the
beginning how many nodes are in each path and then uses dynamic programming
to compute this set of paths.
In order to reduce the cost of the dynamic programming, the authors first show
that distances between nodes can be rounded without affecting the approxima-
tion; namely, given an instance of the TRP such that the minimum internode
distance is 1 and the maximum internode distance is $d_{max}$, it is possible to round
internode distances in such a way that the minimum internode distance is 1
and the maximum distance is $cn^2$, where c is a constant. Given a tour T, the
rounding affects the contribution of each node to the latency of T by a value
less than $\varepsilon\, d_{max}/n$; since $d_{max}$ is a lower bound on the optimum, it follows that
the rounding affects the value of T by a factor of $\varepsilon$. The second idea is to break
the optimal tour into $k = O(\log n/\varepsilon)$ segments $T_i$, each one with a predetermined
number of nodes; the number of nodes in segment i is given by
$$n_i = (1 + \varepsilon)^{k-1-i}, \quad i = 1, 2, \ldots, k-1, \qquad n_k = 1.$$
Let $t_i$ be the length of $T_i$; clearly $\sum_{i=1}^{j-1} t_i$ is a lower bound on the latency
of any node in segment j. It follows that a lower bound on the optimum latency
$L^*$ is given by:
$$L^* \ge \sum_{j=1}^{k} \Big( n_j \sum_{i=1}^{j-1} t_i \Big) = \sum_{i=1}^{k} \Big( \sum_{j>i} n_j \Big) t_i$$
Now replace segment $T_i$, $i < k$, with a minimum traveling salesman tour
through the same set of nodes. In this way both the length of the segment and
the latency of nodes in subsequent segments cannot increase; the latency of nodes
in $T_i$ can increase by at most $n_i t_i$. Repeating this replacement for all segments
but the last one, the increase of the latency is at most
$$\sum_{i=1}^{k-1} n_i t_i.$$
Observing that $\sum_{j>i} n_j \ge n_i/\varepsilon$, it follows that the new latency is at most
$(1 + \varepsilon)L^*$. Note that if the above approach is applied using an approximate solution
for the TSP then the latency of the obtained approximate solution has value at
most $(1 + \varepsilon + \delta)L^*$.

Let us now consider the case of a tree metric. In such a case a TSP tour that
visits a given set of nodes can be computed in polynomial time. However, the
above approach requires knowing the set of nodes belonging to each segment
in the optimal solution. These sets can be computed in quasi-polynomial time
in the case of a tree metric by dynamic programming. In fact the break-up into k
segments implies that an edge is visited in the optimal solution at most k times.
Let us consider the case of a binary tree. First identify an edge that is a 1/3
: 2/3 separator; the algorithm guesses the number of times this edge is
visited and, for each such portion, the length of the portion and the number of
nodes visited. Guessing means that using dynamic programming the algorithm
exhaustively searches all possibilities; since there are at most $k = O(\log n/\varepsilon)$
portions and the length of each portion is bounded by $O(n^3)$, it follows that
there is a polynomial number of possibilities. By recurring on each side of the
separator edge it is possible to compute a break-up into segments in $n^{O(\log n/\varepsilon)}$ time.
The above idea can also be applied to nonbinary trees.

Theorem 5. [5] For any $\varepsilon > 0$, there exists a $(1 + \varepsilon)$ approximation algorithm
for the TRP in a tree metric that runs in time $n^{O(\log n/\varepsilon)}$.

In the Euclidean case a similar result can be obtained by making use of
Arora's approximation scheme [6] for the computation of the TSP paths which
correspond to the segments of the TRP.

Theorem 6. [5] For any $\varepsilon > 0$, there exists a $(1 + \varepsilon)$ approximation algorithm
for the TRP in the Euclidean metric that runs in time $n^{O(\log n/\varepsilon^2)}$.

The proof of Theorem 6 will be provided in the next subsection, along with the
proof of existence of a polynomial time approximation scheme for TSP.
Let us now see how we can apply the preceding results in order to design
approximation schemes for the GSP. Recall that, given an instance x of the GSP
with n nodes, if the integer weights associated with the nodes are polynomially
related, then it is easy to see that GSP is polynomially reducible to an instance
y of TRP. On the other hand, it can be proved [15] that if the weights are not poly-
nomially bounded there still exists a polynomial time reduction that preserves
approximation schemes [8].
Given an instance of GSP with n nodes, let $w_{max}$ be the maximum weight
associated with a city and let $\varepsilon$ be any positive real number. The idea of the proof
is to round the weights associated with each city by a factor k, $k = \delta\, w_{max}/c$, where
$\delta = \varepsilon/n^4$ and c is a suitably chosen constant. Namely, given an instance x of
GSP, we define a new instance x', with the same set of nodes and the same metric
distance as x, that is obtained by rounding the weight associated with each city;
namely $w_i$, the weight associated with city i, becomes $\lceil w_i/k \rceil$. Note that by the
above rounding the weights associated with the nodes of x' are now polynomially
related and, therefore, x' is polynomially reducible to an instance of TRP.
Assume now that we are given a tour T that is an optimal solution of x';
we now show that T is a $(1 + \varepsilon)$ approximate solution of x. In fact, following
[5] we can assume that the maximum distance between nodes is $cn^2$, where c
is a constant; it follows that the rounding introduces an absolute error in the
contribution of city i to the objective function that is bounded by
$$k\, c\, n^3 = \delta\, w_{max}\, n^3 = \varepsilon\, w_{max}/n.$$
By summing over all nodes we obtain that the total absolute error is bounded
by $\varepsilon\, w_{max}$; since $w_{max}$ is a lower bound on the optimum value of instance x, it
follows that T is a $(1 + \varepsilon)$ approximate solution of x.
Assume now that we are given a $(1 + \delta)$ approximate solution of x'; analogously
we can show that this solution is a $(1 + \delta)(1 + \varepsilon)$ approximate solution of x. The above
reduction together with the approximation results of Theorems 5 and 6 implies
the following theorem.

Theorem 7. There exists a quasi-polynomial time $(1 + \varepsilon)$ approximation algo-
rithm for the GSP in the case of tree metrics and the Euclidean metric.

5.1 Polynomial Time Approximation Schemes for the Euclidean
TSP and TRP

Let us now insert the last missing stone needed to prove the existence
of an approximation scheme for the GSP in the Euclidean case: the polynomial
time approximation schemes for the Euclidean TSP [6] and TRP [5]. Let us first
see the result for the TSP.
The basic idea on which Arora's result is organized is the following. In or-
der to overcome the computational complexity of the TSP in the Euclidean
case we may reduce the combinatorial explosion of the solution space by impos-
ing that the required approximate solutions satisfy particular structural
properties. Under suitable conditions the number of solutions which satisfy such
properties may be reduced in such a way that we may search for the best ap-
proximate solution by means of a dynamic programming procedure which runs
in polynomial time. Let us consider a Euclidean TSP instance x consisting of
a set of n points in the plane and let L be the size of its bounding box B. Let us
first make the following simplifying assumptions: (1) all nodes have integral co-
ordinates; (2) the minimum internode distance is 8; (3) the maximum internode
distance is O(n); (4) the size of the bounding box L is O(n) and is a power
of 2. It is not difficult to prove that if a PTAS exists for this particular type
of instances, which we call well-rounded instances, then it exists for general TSP
instances. Now, suppose we are given a well-rounded TSP instance. In order to
characterize the approximate solutions that satisfy specific structural properties
we may proceed in the following way. We decompose the bounding box through
a recursive binary partitioning until we have at most one point per square cell
(in practice, and more conveniently, we can organize the instance into a quad-
tree). Note that at stage i of the partitioning process we divide any square in the
quad-tree of size $L/2^{i-1}$ which contains more than one point into 4 squares of
size $L/2^i$. Then we identify $m = O(\log L/\varepsilon)$ points evenly distributed on each
side of any square created during the partition (plus four points in the square's
corners). By slightly bending the edges of a TSP tour we will force it to cross
square boundaries only at those prespecified points (called "portals"). Finally,
we allow the partition to be shifted both horizontally and vertically by integer
quantities a and b, respectively. The structure theorem can then be stated in the
following terms.

Theorem 8. (Structure Theorem, [6]) Let a well-rounded instance x be given,
let L be the size of its bounding box B and let $\varepsilon > 0$ be a constant. Let us
pick a and b, with $0 \le a, b \le L$, randomly, and let us consider the recursive
partitioning of B shifted by the quantities a and b. Then with probability at least 1/2
there is a salesman tour of cost at most $(1 + \varepsilon)OPT$ (where OPT is the cost of
an optimum solution) that crosses each edge of each square in the partition at
most $r = O(1/\varepsilon)$ times, always going through one among $m = O(\log L/\varepsilon)$ portals
(such a tour is called (m, r)-light).

Essentially the proof of the theorem is based on the fact that, given a recursive
partitioning of B shifted by quantities a and b, and given an optimum TSP tour of
length OPT, it is possible to bend its edges slightly so that they cross square
boundaries only $O(1/\varepsilon)$ times and only at portals, and the resulting expected increase in
length of the path is at most $\varepsilon\, OPT/2$. Over all possible shifts a, b, therefore,
with probability 1/2, such increase is bounded by $\varepsilon\, OPT$.
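As a small illustration of the portal machinery (not code from [6]), the following sketch enumerates the 4m + 4 portal positions on the boundary of a dissection square; the function name and coordinate convention are ours.

```python
def portals(x0, y0, size, m):
    """Portal positions on the boundary of a dissection square (illustrative sketch).

    Returns the 4m + 4 points used in the text: the four corners plus m evenly
    spaced points in the interior of each of the four sides of the square whose
    lower-left corner is (x0, y0) and whose side length is `size`.
    """
    pts = {(x0, y0), (x0 + size, y0), (x0, y0 + size), (x0 + size, y0 + size)}
    for i in range(1, m + 1):
        t = i * size / (m + 1)
        pts.update({(x0 + t, y0), (x0 + t, y0 + size),     # bottom and top sides
                    (x0, y0 + t), (x0 + size, y0 + t)})    # left and right sides
    return sorted(pts)
```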
On the basis of the structure theorem we can then define the following polyno-
mial time approximation scheme for the solution of a well-rounded TSP instance.
In what follows we call a TSP path a tour, or a fragment of a tour, which goes
through the points of the TSP instance and whose edges are possibly bent
through the portals. Once we have built a unique TSP path that goes ex-
actly once through all points of the instance it will be immediately possible to
transform it into a TSP tour by taking Euclidean shortcuts whenever possible.
Given the required approximation ratio $(1 + \varepsilon)$ and given a TSP instance x,
the randomized algorithm performs the following steps:
1) Perturbation. Instance x is first transformed into a well-rounded instance
x'. We will then look for a $(1 + \varepsilon')$ approximate solution for instance x', where
$\varepsilon'$ can be easily computed from $\varepsilon$.
2) Construction of the shifted quad-tree. Given a random choice of $1 \le a, b \le L$,
a quad-tree with such shifts is computed. The depth of the quad-tree will be
$O(\log n)$ and the number of squares it will contain is $T = O(n \log n)$.
3) Construction of the path by dynamic programming. The path that satisfies
the structure theorem can be constructed bottom-up by dynamic programming
as follows. In any square there may be p paths, each connecting a pair of portals,
such that for any i a path goes from the first portal to the second portal of pair
$p_i$. Since $2p \le 4r$ and there are $4m + 4$ portals on the border of one square,
there are at most $(4m + 4)^{4r}$ ways to choose the crossing points of the p paths
and there are at most $(4r)!$ pairings among such crossing points. For each of the
choices we compute the optimum solution, which corresponds to one entry in
the lookup table that the dynamic programming procedure has to construct.
Since we have T squares, the total number of entries is $O(T (4m + 4)^{4r} (4r)!)$.
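A back-of-the-envelope computation of this bound, with all hidden constants set to 1 (an assumption of this illustration, not the paper's choice), shows how the table size depends on the accuracy parameter:

```python
import math

def dp_table_entries(n, eps):
    """Plugs sample values into the bound O(T (4m + 4)^(4r) (4r)!) from the text.

    T = n log n squares, m = log L / eps portals per square side with L = n,
    and r = 1 / eps crossings per side; all hidden constants are taken as 1.
    """
    T = n * max(1, math.ceil(math.log2(n)))
    m = math.ceil(math.log2(n) / eps)
    r = math.ceil(1 / eps)
    return T * (4 * m + 4) ** (4 * r) * math.factorial(4 * r)

# For fixed eps the result grows only polynomially in n, since the factor
# (4m + 4)^(4r) = (log n)^(O(1/eps)) is dominated by any fixed power of n
# and (4r)! is a constant.
```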

In order to determine the running time of the procedure let us see how the
entries are constructed; first note that for the leaves of the quad-tree we only
have the condition that one path should go through the node (if any) in the leaf.
Inductively, when we want to construct the entries for a square S at level i − 1
of the quad-tree, given the entries of the squares $S_1$, $S_2$, $S_3$, $S_4$ at level i, we have to
determine the optimum way to connect the paths in the subsquares by choosing
among all possible choices of how to cross the inner borders among the four
squares. Such choices are at most $(4m + 4)^{4r} (4r)^{4r} (4r)!$. All taken into account, we have
a running time of $O(T (4m + 4)^{8r} (4r)^{4r} ((4r)!)^2)$, that is, $O(n (\log n)^{O(1/\varepsilon)})$.
Easily enough, the algorithm can be derandomized by exhaustively trying all
possible shifts a, b, and picking the best path. This simply implies repeating
steps 2) and 3) of the algorithm $O(n^2)$ times. In conclusion we can state the
final result.

Theorem 9. [6] For any $\varepsilon > 0$, there exists a $(1 + \varepsilon)$ approximation algorithm
for the TSP in the Euclidean metric that runs in time $O(n^3 (\log n)^{O(1/\varepsilon)})$.

The result for the TRP goes along the same lines. As we have seen, in or-
der to solve the TRP we compute $O(\log n/\varepsilon)$ segments consisting of as many
salesman paths. Now the same algorithm as before can be run, but this
time we want to compute the $O(\log n/\varepsilon)$ salesman paths simultaneously. As a
consequence, while in the case of the TSP we were looking for paths going at
most $r = O(1/\varepsilon)$ times through one among $m = O(\log n/\varepsilon)$ portals, in the case
of the TRP we construct paths that go $O(\log n/\varepsilon)$ times through the m portals.
The same dynamic programming technique as in the case of the TSP can then be
applied, but now, since we may have $O(\log n/\varepsilon)$ crossings, the algorithm will
require quasi-polynomial time $n^{O(\log n/\varepsilon^2)}$. Theorem 6 hence follows.

References

[1] F. Afrati, S. Cosmadakis, C.H. Papadimitriou, G. Papageorgiou, and N. Papakostantinou. The complexity of the travelling repairman problem. Informatique Théorique et Applications, 20(1):79–87, 1986.
[2] Ajit Agrawal, Philip Klein, and R. Ravi. When trees collide: an approximation algorithm for the generalized Steiner problem on networks. SIAM Journal on Computing, 24(3):440–456, June 1995.
[3] Paola Alimonti and F. Lucidi. On mobile agent planning, 1999. Manuscript.
[4] Paola Alimonti, F. Lucidi, and S. Triglia. How to move mobile agents, 1999. Manuscript.
[5] Arora and Karakostas. Approximation schemes for minimum latency problems. In ACM Symposium on Theory of Computing (STOC), 1999.
[6] Sanjeev Arora. Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. Journal of the ACM, 45(5):753–782, 1998.
[7] S. Arya and H. Kumar. A 2.5-approximation algorithm for the k-MST problem. Information Processing Letters, 65:117–118, 1998.
[8] Giorgio Ausiello, Pierluigi Crescenzi, Giorgio Gambosi, Viggo Kann, Alberto Marchetti-Spaccamela, and Marco Protasi. Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer-Verlag, 1999.
[9] Avrim Blum, Prasad Chalasani, Don Coppersmith, Bill Pulleyblank, Prabhakar Raghavan, and Madhu Sudan. The minimum latency problem. In Proceedings of the Twenty-Sixth Annual ACM Symposium on the Theory of Computing, pages 163–171, Montreal, Quebec, Canada, 23–25 May 1994.
[10] Avrim Blum, R. Ravi, and Santosh Vempala. A constant-factor approximation algorithm for the k-MST problem (extended abstract). In Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, pages 442–448, Philadelphia, Pennsylvania, 22–24 May 1996.
[11] Naveen Garg. A 3-approximation for the minimum tree spanning k vertices. In 37th Annual Symposium on Foundations of Computer Science, pages 302–309, Burlington, Vermont, 14–16 October 1996. IEEE.
[12] M. X. Goemans and D. Williamson. A general approximation technique for constrained forest problems. SIAM Journal on Computing, 24:296–317, 1995.
[13] Michel Goemans and Jon Kleinberg. An improved approximation ratio for the minimum latency problem. In Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 152–158, New York/Philadelphia, January 28–30, 1996. ACM/SIAM.
[14] Elias Koutsoupias, Christos H. Papadimitriou, and Mihalis Yannakakis. Searching a fixed graph. In Friedhelm Meyer auf der Heide and Burkhard Monien, editors, Automata, Languages and Programming, 23rd International Colloquium, volume 1099 of Lecture Notes in Computer Science, pages 280–289, Paderborn, Germany, 8–12 July 1996. Springer-Verlag.
[15] Alberto Marchetti-Spaccamela and Leen Stougie. Private communication, 1999.
Computing a Diameter-Constrained Minimum
Spanning Tree in Parallel

Narsingh Deo and Ayman Abdalla

School of Computer Science


University of Central Florida, Orlando, FL 32816-2362, USA
{deo, abdalla}@cs.ucf.edu

Abstract. A minimum spanning tree (MST) with a small diameter is required in


numerous practical situations. It is needed, for example, in distributed mutual
exclusion algorithms in order to minimize the number of messages
communicated among processors per critical section. The Diameter-
Constrained MST (DCMST) problem can be stated as follows: given an
undirected, edge-weighted graph G with n nodes and a positive integer k, find a
spanning tree with the smallest weight among all spanning trees of G which
contain no path with more than k edges. This problem is known to be NP-
complete for all values of k, 4 ≤ k ≤ (n − 2). Therefore, one has to depend on
heuristics and live with approximate solutions. In this paper, we explore two
heuristics for the DCMST problem: First, we present a one-time-tree-
construction algorithm that constructs a DCMST in a modified greedy fashion,
employing a heuristic for selecting edges to be added to the tree at each stage of
the tree construction. This algorithm is fast and easily parallelizable. It is
particularly suited when the specified values for k are small—independent of n.
The second algorithm starts with an unconstrained MST and iteratively refines
it by replacing edges, one by one, in long paths until there is no path left with
more than k edges. This heuristic was found to be better suited for larger values
of k. We discuss convergence, relative merits, and parallel implementation of
these heuristics on the MasPar MP-1 — a massively parallel SIMD machine
with 8192 processors. Our extensive empirical study shows that the two
heuristics produce good solutions for a wide variety of inputs.

1 Introduction
The Diameter-Constrained Minimum Spanning Tree (DCMST) problem can be stated
as follows: given an undirected, edge-weighted graph G and a positive integer k, find
a spanning tree with the smallest weight among all spanning trees of G which contain
no path with more than k edges. The length of the longest path in the tree is called the
diameter of the tree. Garey and Johnson [7] show that this problem is NP-complete
by transformation from Exact Cover by 3-Sets. Let n denote the number of nodes in
G. The problem can be solved in polynomial time for the following four special
cases: k = 2, k = 3, k = (n – 1), or when all edge weights are identical. All other cases
are NP-complete. In this paper, we consider graph G to be complete and with n
nodes. An incomplete graph can be viewed as a complete graph in which the missing
edges have infinite weights.

The DCMST problem arises in several applications in distributed mutual exclusion


where message passing is used. For example, Raymond's algorithm [5,11] imposes a
logical spanning tree structure on a network of processors. Messages are passed
among processors requesting entrance to a critical section and processors granting the
privilege to enter. The maximum number of messages generated per critical-section
execution is 2d, where d is the diameter of the spanning tree. Therefore, a small
diameter is essential for the efficiency of the algorithm. Minimizing edge weights
reduces the cost of the network.
Satyanarayanan and Muthukrishnan [12] modified Raymond’s original algorithm
to incorporate the “least executed” fairness criterion and to prevent starvation, also
using no more than 2d messages per process. In a subsequent paper [13], they
presented a distributed algorithm for the readers and writers problem, where multiple
nodes need to access a shared, serially reusable resource. In this algorithm, the
number of messages generated by a read operation and a write operation has an upper
bound of 2d and 3d, respectively.
In another paper on distributed mutual exclusion, Wang and Lang [14] presented a
token-based algorithm for solving the p-entry critical-section problem, where a
maximum of p processes are allowed to be in their critical section at the same time. If
a node owns one of the p tokens of the system, it may enter its critical section;
otherwise, it must broadcast a request to all the nodes that own tokens. Each request
passes at most 2pd messages.
The DCMST problem also arises in Linear Lightwave Networks (LLNs), where
multi-cast calls are sent from each source to multiple destinations. It is desirable to
use a short spanning tree for each transmission to minimize interference in the
network. An algorithm by Bala et al [4] decomposed an LLN into edge disjoint trees
with at least one spanning tree. The algorithm builds trees of small diameter by
computing trees whose maximum node-degree was less than a given parameter, rather
than optimizing the diameter directly. Furthermore, the lines of the network were
assumed to be identical. If the LLN has lines of different bandwidths, lines of higher
bandwidth should be included in the spanning trees to be used more often and with
more traffic. Employing an algorithm that solves the DCMST problem can help find
a better tree decomposition for this type of network. The network would be modeled
by an edge-weighted graph, where an edge of weight 1/x is used to represent a line of
bandwidth x.
Three exact-solution algorithms for the DCMST problem, developed by Achuthan et al
[3], used Branch-and-Bound methods to reduce the number of subproblems. The
algorithms were implemented on a SUN SPARC II workstation operating at 28.5
MIPS. The algorithms were tested on complete graphs of different orders (n ≤ 40),
using 50 cases for each order, where edge-weights were randomly generated numbers
between 1 and 1000. The best algorithm for k = 4 produced an exact solution for
n = 20 in less than one second on average, but it took an average of 550 seconds for
n = 40. Clearly, such exact-algorithms, with exponential time complexity, are not
suitable for graphs with thousands of nodes.
For large graphs, Abdalla et al [2] presented a fast approximate algorithm. The
algorithm first computed an unconstrained MST, then iteratively refined it by
increasing the weights of (log n) edges near the center of the tree and recomputing the
MST until the diameter constraint was achieved. The algorithm was not always able
to produce DCMST(k) for k ≤ 0.05n because sometimes it reproduced spanning trees
already considered in earlier iterations, thus entering an infinite cycle.
In this paper, we first present a general method for evaluating the solutions to the
DCMST problem in Section 2. Then, we present approximate algorithms for solving
this problem employing two distinct strategies: One-Time Tree Construction (OTTC)
and Iterative Refinement (IR). The OTTC algorithm, based on Prim’s algorithm, is
presented in Section 4. A special IR algorithm and a general one are presented in
Sections 3 and 5, respectively.

2 Evaluating the Quality of a DCMST

Since the exact DCMST weights cannot be determined in a reasonable amount of time
for large graphs, we use the ratio of the computed weight of the DCMST to that of the
unconstrained MST as a rough measure of the quality of the solution.
To obtain a crude upper bound on the DCMST(k) weight (where k is the diameter
constraint), observe that DCMST(2) and DCMST(3) are feasible (but often grossly
suboptimal) solutions of DCMST(k) for all k > 3. Since there are polynomial-time
exact algorithms for DCMST(2) and DCMST(3), these solutions can be used as upper
bounds for the weight of an approximate DCMST(k). In addition, we develop a
special approximate-heuristic for DCMST(4) and compare its weight to that of
DCMST(3) to verify that it provides a tighter bound and produces a better solution for
k = 4. We use these upper bounds, along with the ratio to the unconstrained MST
weight, to evaluate the quality of DCMST(k) obtained.

3 Special IR Heuristic for DCMST(4)

The special algorithm to compute DCMST(k) starts with an optimal DCMST(3), then
replaces higher-weight edges with smaller-weight edges, allowing the diameter to
increase to 4.

3.1 An Exact DCMST(3) Computation

Clearly, in a DCMST(3) of graph G, every node must be of degree 1 except two
nodes, call them u and v. Edge (u, v) is the central edge of such a spanning tree. To
construct DCMST(3), we select an edge to be the central edge (u, v), then, for every
node x in G, x ∉{u, v}, we include in the spanning tree the smaller of the two edges
(x, u) and (x, v). To get an optimal DCMST(3), we compute all such spanning trees
— with every edge in G as its central edge — and take the one with the smallest
weight. Since we have m edges to choose from, we have to compute m different
spanning trees. Each of these trees requires (n − 2) comparisons to select (x, u) or
(x, v). Therefore, the total number of comparisons required to obtain the optimal
DCMST(3) is (n − 2)m.
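
For concreteness, the following Python sketch (illustrative only; it assumes the weight matrix w of the complete graph is given as a nested list or array) carries out this exhaustive DCMST(3) computation:

import itertools

def dcmst3(w):
    # Exact DCMST(3): try every edge (u, v) as the central edge, attach every
    # other node x to the cheaper of u and v, and keep the lightest tree found.
    n = len(w)
    best_weight, best_edges = float('inf'), None
    for u, v in itertools.combinations(range(n), 2):   # every possible central edge
        weight, edges = w[u][v], [(u, v)]
        for x in range(n):
            if x in (u, v):
                continue
            if w[x][u] <= w[x][v]:                     # (n - 2) comparisons per central edge
                weight += w[x][u]
                edges.append((x, u))
            else:
                weight += w[x][v]
                edges.append((x, v))
        if weight < best_weight:
            best_weight, best_edges = weight, edges
    return best_weight, best_edges

The returned tree has diameter at most 3, and the total work over all m candidate central edges matches the (n − 2)m comparison count given above.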
3.2 An Approximate DCMST(4) Computation

To compute DCMST(4), we start with an optimal DCMST(3). Then, we relax the
diameter constraint while reducing the spanning tree weight using edge replacement
to get a smaller-weight DCMST(4). The refinement process starts by arbitrarily
selecting one end node of the central edge (u, v), say u, to be the center of DCMST(4).
Let W(a, b) denote the weight of an edge (a, b). For every node x adjacent to v, we
attempt to obtain another tree of smaller-weight by replacing edge (v, x) with edge
(x, y), where W(x, y) < W(x, v). Furthermore, the replacement (x, y) is an edge such
that y is adjacent to u and for all nodes z adjacent to u and z ≠ v, W(x, y) ≤ W(x, z). If
no such edge exists, we keep edge (v, x) in the tree. We use the same method to
compute a second DCMST(4), with v as its center. Finally, we accept the DCMST(4)
with the smaller weight as the solution.
Suppose there are p nodes adjacent to u in DCMST(3). Then, there are (n − p − 2)
nodes adjacent to v. Therefore, we make 2p(n − p − 2) comparisons to get
DCMST(4). It can be shown that, employing this procedure for a complete graph, the
expected number of comparisons required to obtain an approximate DCMST(4) from
an exact DCMST(3) is (n² – 8n – 12)/2.

4 One-Time Tree Construction

In the One-Time Tree Construction (OTTC) strategy, a modification of Prim's
algorithm is used to compute an approximate DCMST in one pass. Prim's algorithm
has been experimentally shown to be the fastest for computing an MST for large
dense graphs [8].
The OTTC algorithm grows a spanning tree by connecting the nearest neighbor
that does not violate the diameter constraint. Since such an approach keeps the tree
connected in every iteration, it is easy to keep track of the increase in tree-diameter.
This Modified Prim algorithm is formally described in Figure 1, where we maintain
the following information for each node u:
• near(u) is the node in the tree nearest to the non-tree node u.
• wnear(u) is the weight of edge (u, near(u)).
• dist(u, 1..n) is the distance (unweighted path length) from u to every other
node in the tree if u is in the tree, and is set to -1 if u is not yet in the tree.
• ecc(u) is the eccentricity of node u, (the distance in the tree from u to the
farthest node) if u is in the tree, and is set to -1 if u is not yet in the tree.
To update near(u) and wnear(u), we determine the edges that connect u to
partially-formed tree T without increasing the diameter (as the first criterion) and
among all such edges we want the one with minimum weight. We do this efficiently,
without having to recompute the tree diameter for each edge addition.
In Code Segment 1 of the OTTC algorithm, we set the dist(v) and ecc(v) values for
node v by copying from its parent node near(v). In Code Segment 2, we update the
values of dist and ecc for the parent node in n steps. In Code Segment 3, we update
the values of dist and ecc for other nodes. We make use of the dist and ecc arrays, as
described above, to simplify the OTTC computation.
procedure ModifiedPrim
INPUT:Graph G, Diameter bound k
OUTPUT: Spanning Tree T = (VT,ET)
initialize VT := Φ and ET := Φ
select a root node v0 to be included in VT
initialize near(u) := v0 and wnear(u) := w(u, v0), for every u ∉ VT
compute a next-nearest-node v such that:
wnear(v) = MINu∉VT{wnear(u)}
while (|ET| < (n−1))
select the node v with the smallest value of wnear(v)
set VT := VT ∪ {v} and ET := ET ∪ {(v,near(v))}
{1. set dist(v,u) and ecc(v)}
for u = 1 to n
if dist(near(v),u) > −1 then
dist(v,u) := 1 + dist(near(v),u)
dist(v, v) := 0
ecc(v) := 1 + ecc(near(v))
{2. update dist(near(v),u) and ecc(near(v))}
dist(near(v),v) = 1
if ecc(near(v)) < 1 then
ecc(near(v)) = 1
{3. update other nodes' values of dist and ecc}
for each tree node u other than v or near(v)
dist(u,v) = 1 + dist(u,near(v))
ecc(u) = MAX{ecc(u),dist(u,v)}
{4. update the near and wnear values for other nodes in G}
for each node u not in the tree
if 1 + ecc(near(u)) > k then
examine all nodes in T to determine near(u) and wnear(u)
else
compare wnear(u) to the weight of (u,v).

Fig. 1. OTTC Modified Prim algorithm

Code Segment 4 is the least intuitive. Here, we update the near and wnear values
for every node not yet in the tree by selecting an edge which does not increase the tree
diameter beyond the specified constraint and has the minimum weight among all such
edges. Now, adding v to the tree may or may not increase the diameter. If the tree
diameter increases, and near(u) lies along a longest path in the tree, then adding u to
the tree by connecting it to near(u) may violate the constraint. In this case, we must
reexamine all nodes of the tree to find a new value for near(u) that does not violate
the diameter constraint. This can be achieved by examining ecc(t) for nodes t in the
tree; i.e., we need not recompute the tree diameter. This computation includes adding
a new node to the tree.
On the other hand, if (u, near(u)) is still a feasible edge, then near(u) is the best
choice for u among all nodes in the tree except possibly v, the newly added node. In
this case, we need only determine whether edge (u, v) would increase the tree
diameter beyond the constraint, and if not, whether the weight of (u, v) is less than
wnear(u).
The complexity of Code Segment 4 is O(n²) when the diameter constraint k is
small, since it requires looking at each node in the tree once for every node not in the
tree. This makes the time complexity of this algorithm higher than that of Prim's
algorithm. The while loop requires (n − 1) iterations. Each iteration requires at most
O(n²) steps, which makes the worst case time complexity of the algorithm O(n³).
This algorithm does not always find a DCMST. Furthermore, the algorithm is
sensitive to the node chosen for starting the spanning tree. In both the sequential and
parallel implementations, we compute n such trees, one for each starting node. Then,
we output the spanning tree with the smallest weight.
To reduce the time needed to compute the DCMST further, we develop a heuristic
that selects a small set of starting nodes as follows. Select the q nodes (q is
independent of n) with the smallest sum of weights of the edges emanating from each
node. Since this is the defining criterion for spanning trees with diameter k = 2 in
complete graphs, it is polynomially computable. The algorithm now produces q
spanning trees instead of n, reducing the overall time complexity by a factor O(n)
when we choose a constant value for q.
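
A minimal sketch of this start-node selection, assuming the same complete-graph weight matrix w (the function name is illustrative):

def select_start_nodes(w, q):
    # Centers of the q smallest stars: the q nodes whose incident edges
    # have the smallest total weight.
    n = len(w)
    star_weight = {u: sum(w[u][x] for x in range(n) if x != u) for u in range(n)}
    return sorted(star_weight, key=star_weight.get)[:q]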

5 The General Iterative Refinement Algorithm

This IR algorithm does not recompute the spanning tree in every iteration; rather, a
new spanning tree is computed by modifying the previously computed one. The
modification performed never produces a previously generated spanning tree and,
thus it guarantees the algorithm will terminate. Unlike the algorithm in [2], this
algorithm removes one edge at a time and prevents cycling by moving away from the
center of the tree whenever cycling becomes imminent.
This new algorithm starts by computing the unconstrained MST for the input graph
G = (V, E). Then, in each iteration, it removes one edge that breaks a longest path in
the spanning tree and replaces it by a non-tree edge without increasing the diameter.
The algorithm requires computing eccentricity values for all nodes in the spanning
tree in every iteration.
The initial MST can be computed using Prim’s algorithm. The initial eccentricity
values for all nodes in the MST can be computed using a preorder tree traversal where
each node visit consists of computing the distances from that node to all other nodes
in the spanning tree. This requires a total of O(n²) computations. As the spanning
tree changes, we only recompute the eccentricity values that change. After computing
the MST and the initial eccentricity values, the algorithm identifies one edge to
remove from the tree and replaces it by another edge from G until the diameter
constraint is met or the algorithm fails. When implemented and executed on a variety
of inputs, we found that this process required no more than (n + 20) iterations. Each
iteration consists of two parts. In the first part, described in Subsection 5.1, we find
an edge whose removal can contribute to reducing the diameter, and in the second
part, described in Subsection 5.2, we find a good replacement edge. The IR algorithm
is shown in Figure 2, and its two different edge-replacement subprocedures are shown
in Figures 3 and 4. We use eccT(u) to denote the eccentricity of node u with respect to
spanning tree T, the maximum distance from u to any other node in T. The diameter
of a spanning tree T is given by MAX{eccT(u)} over all nodes u in T.

5.1 Selecting Edges for Removal

To reduce the diameter, the edge removed must break a longest path in the tree and
should be near the center of the tree. The center of spanning tree T can be found by
identifying the nodes u in T with eccT(u) = diameter/2, the node (or two nodes) with
minimum eccentricity.
Since we may have more than one edge candidate for removal, we keep a sorted
list of candidate edges. This list, which we call MID, is implemented as a max-heap
sorted according to edge weights, so that the highest-weight candidate edge is at the
root.
Removing an edge from a tree does not guarantee breaking all longest paths in the
tree. The end nodes of a longest path in T have maximum eccentricity, which is equal
to the diameter of T. Therefore, we must verify that removing an edge splits the tree
T into two subtrees, subtree1 and subtree2, such that each of the two subtrees contains
a node v with eccT(v) equal to the diameter of the tree T. If the highest-weight edge
from list MID does not satisfy this condition, we remove it from MID and consider
the next highest. This process continues until we either find an edge that breaks a
longest path in T or the list MID becomes empty.
If we go through the entire list, MID, without finding an edge to remove, we must
consider edges farther from the center. This is done by identifying the nodes u with
eccT(u) = diameter/2 + bias, where bias is initialized to zero, and incremented by 1
every time we go through MID without finding an edge to remove. Then, we
recompute MID as all the edges incident to this set of nodes. Every time we succeed in
finding an edge to remove, we reset the bias to zero.
This method of examining edges helps prevent cycling since we consider a
different edge every time until an edge that can be removed is found. But to
guarantee the prevention of cycling, we always select a replacement edge that reduces
the length of a path in T. This will guarantee that the refinement process will
terminate, since we will either reduce the diameter below the bound k, or bias will
become so large that we try to remove the edges incident to the end-points of the
longest paths in the tree.

procedure IterativeRefinement
INPUT: Graph G = (V,E), diameter bound k
OUTPUT: Spanning tree T with diameter ≤ k
compute MST and eccT(v) for all v in V
MID := Φ
move := false
repeat
diameter := MAXv∈V{eccT(v)}
if MID = Φ then
if move = true then
move := false
MID := edges (u,v) that are one edge farther from
the center of T than in the previous iteration
else
MID := edges (u,v) at the center of T
repeat
(x,y) := highest weight edge in MID
{This splits T into two trees: subtree1 and subtree2}
until MID = Φ
OR MAXu∈subtree1{eccT(u)} = MAXv∈subtree2{eccT(v)}
if MID = Φ then {no good edge to remove was found}
move := true
else
remove (x,y) from T
get a replacement edge and add it to T
recompute eccT values
until diameter ≤ k OR we are removing the edges farthest from
the center of T
Fig. 2. The general IR algorithm

In the worst case, computing list MID requires examining many edges in T,
requiring O(n) comparisons. In addition, sorting MID will take O(n log n) time. A
replacement edge is found in O(n²) time since we must recompute eccentricity values
for all nodes to find the replacement that helps reduce the diameter. Therefore, the
iterative process, which removes and replaces edges for n iterations, will take O(n³)
time in the worst case. Since list MID has to be sorted every time it is computed, the
execution time can be reduced by a constant factor if we prevent MID from becoming
too large. This is achieved by an edge-replacement method that keeps the tree T fairly
uniform so that it has a small number of edges near the center, as we will show in the
next subsection. Since MID is constructed from edges near the center of T, this will
keep MID small.
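
As an illustration of how MID might be maintained, the sketch below (a hypothetical helper, not the authors' implementation) collects the edges incident to the nodes at eccentricity ⌈diameter/2⌉ + bias and stores them in a max-heap keyed on edge weight; Python's heapq is a min-heap, so weights are negated:

import heapq
import math

def build_mid(tree_adj, w, ecc, diameter, bias):
    # tree_adj: adjacency lists of the current spanning tree; ecc: eccentricities.
    # The radius of a tree is ceil(diameter / 2); bias moves the target outward.
    target = math.ceil(diameter / 2) + bias
    mid, seen = [], set()
    for u in (v for v in tree_adj if ecc[v] == target):
        for v in tree_adj[u]:
            edge = (min(u, v), max(u, v))
            if edge not in seen:
                seen.add(edge)
                heapq.heappush(mid, (-w[edge[0]][edge[1]], edge))  # negate weight: max-heap behaviour
    return mid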

5.2 Selecting a Replacement Edge

When we remove an edge from a tree T, we split T into two subtrees: subtree1 and
subtree2. Then, we select a non-tree edge to connect the two subtrees in a way that
reduces the length of at least one longest path in T without increasing the diameter.
The diameter of T will be reduced when all longest paths have been so broken. We
develop two methods, ERM1 and ERM2, to find such replacement edges.
5.2.1 Edge-Replacement Method ERM1


This method, shown in Figure 3, selects a minimum-weight edge (a, b) in G
connecting a central node a in subtree1 to a central node b in subtree2. Among all
edges that can connect subtree1 to subtree2, no other edge (c, d) will produce a tree
such that the diameter of subtree1 ∪ subtree2 ∪ {(c, d)} is smaller than the diameter
of subtree1 ∪ subtree2 ∪ {(a, b)}. However, such an edge (a, b) is not guaranteed to
exist in incomplete graphs.

procedure ERM1
Recompute eccsubtree1 and eccsubtree2 for each subtree by itself
m1 := MINu∈subtree1{eccsubtree1(u)}
m2 := MINu∈subtree2{eccsubtree2(u)}
(a,b) := minimum-weight edge in G that has:
a ∈ subtree1 AND b ∈ subtree2 AND
eccsubtree1(a) = m1 AND eccsubtree2(b) = m2
Add edge (a,b) to T
if MID = Φ OR bias = 0 then
move := true
MID := Φ
Fig. 3. Edge-replacement method ERM1

Since there can be at most two central nodes in each subtree, there are at most four
edges to select from. The central nodes in the subtrees can be found by computing
eccsubtree1 and eccsubtree2 in each subtree, then taking the nodes v with eccsubtree(v) =
MIN{eccsubtree(u)} over all nodes u in the subtree that contains v. This selection can be
done in O(n²) time.
Finally, we set the boolean variable move to true every time we remove an edge
incident to the center of the tree. This causes the removal of edges farther from the
center of the tree in the next iteration of the algorithm, which prevents removing the
edge (a, b) which has just been added.
This edge-replacement method seems fast at first glance, because it selects one
out of four edges. However, in the early iterations of the algorithm, this method
creates nodes of high degree near the center of the tree, which causes MID to be very
large. This, as we have shown in the previous section, causes the time complexity of
the algorithm to increase by a constant factor. Furthermore, having at most four
edges from which to select a replacement often causes the tree weight to increase
significantly.

5.2.2 Edge-Replacement Method ERM2


This method, shown in Figure 4, computes eccsubtree1 and eccsubtree2 values for each
subtree individually, as in ERM1. Then, the two subtrees are joined as follows. Let
the removed edge (x, y) have x∈ subtree1 and y∈ subtree2. The replacement edge
will be the smallest-weight edge (a, b) which (1) guarantees that the new edge does
not increase the diameter, and (2) guarantees reducing the length of a longest path in
the tree at least by one. We enforce condition (1) by:
eccsubtree1(a) ≤ eccsubtree1(x) AND eccsubtree2(b) ≤ eccsubtree2(y) ,     (1)

and condition (2) by:

eccsubtree1(a) < eccsubtree1(x) OR eccsubtree2(b) < eccsubtree2(y) .     (2)
If no such edge (a, b) is found, we must remove an edge farther from the center of the
tree, instead.

procedure ERM2
recompute eccsubtree1 and eccsubtree2 for each subtree by itself
m1 := eccsubtree1(x)
m2 := eccsubtree2(y)
(a,b) := minimum-weight edge in G that has:
a ∈ subtree1 AND b ∈ subtree2 AND eccsubtree1(a) ≤ m1
AND eccsubtree2(b) ≤ m2 AND
(eccsubtree1(a) < m1 OR eccsubtree2(b) < m2)
if such an edge (a,b) is found then
add edge (a,b) to T
else
add the removed edge (x,y) back to T
move := true
Fig. 4. Edge-replacement method ERM2

Since ERM2 is not restricted to the centers of the two subtrees, it works better than
ERM1 on incomplete graphs. In addition, it can produce DCMSTs with smaller
weights because it selects a replacement from a large set of edges, instead of 4 or
fewer edges as in ERM1. The larger number of edges increases the total time
complexity of the IR algorithm by a constant factor over ERM1. Furthermore, this
method does not create nodes of high degree near the center of the tree as in ERM1.
This helps keep the size of list MID small in the early iterations, reducing the time
complexity of the IR algorithm by a constant factor.
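
A hedged sketch of the ERM2 selection step, assuming the two subtrees are given as node sets sub1 and sub2 with their recomputed eccentricities ecc1 and ecc2, and that (x, y) is the removed edge (all names are illustrative; missing edges of an incomplete graph would simply be skipped):

def erm2_replacement(w, sub1, sub2, ecc1, ecc2, x, y):
    # Smallest-weight edge (a, b), a in sub1 and b in sub2, satisfying
    # conditions (1) and (2): no longer than the removed edge's end nodes
    # allow, and strictly shorter on at least one side.
    best = None
    for a in sub1:
        for b in sub2:
            ok = (ecc1[a] <= ecc1[x] and ecc2[b] <= ecc2[y] and
                  (ecc1[a] < ecc1[x] or ecc2[b] < ecc2[y]))
            if ok and (best is None or w[a][b] < w[best[0]][best[1]]):
                best = (a, b)
    return best   # None means: restore (x, y) and move away from the center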

6 Implementation

In this section, we present empirical results obtained by implementing the OTTC and
IR algorithms on the MasPar MP-1, a massively-parallel SIMD machine of 8192
processors. The processors are arranged in a mesh where each processor is connected
to its eight neighbors.
Complete graphs Kn, represented by their (n × n) weight matrices, were used as
input. Since the MST of a randomly generated graph has a small diameter, O(log n)
[2], such graphs are not suited for studying the performance of DCMST(k) algorithms.
Therefore, we generated graphs in which the minimum spanning trees are forced to
have diameter of (n − 1).
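
The construction is not detailed here; one simple way to force the unconstrained MST to have diameter (n − 1) is to hide a random Hamiltonian path of strictly smaller edge weights inside the complete graph, as in this hedged sketch (the weight ranges are arbitrary and purely illustrative, not necessarily the construction used in the experiments):

import random

def path_forcing_graph(n, low=1.0, high=1000.0):
    # Every non-path edge gets a weight in [low, high]; the edges of a random
    # Hamiltonian path get weights strictly below low, so any MST algorithm
    # must pick exactly the path, giving an MST of diameter n - 1.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = random.uniform(low, high)
    order = list(range(n))
    random.shuffle(order)
    for a, b in zip(order, order[1:]):
        w[a][b] = w[b][a] = random.uniform(0.0, 0.9 * low)
    return w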
6.1 One-Time Tree Construction

We parallelized the OTTC algorithm and implemented it on the MasPar MP-1 for
graphs of up to 1000 nodes. The DCMST generated from one start node for a graph
of 1000 nodes took roughly 71 seconds, which means it would take about 20 hours
to run with n start nodes. We address this issue by running the algorithm for a
carefully selected small set of start nodes.
We used two different methods to choose the set of start nodes. SNM1 selects the
center nodes of the q smallest stars in G as start nodes. SNM2 selects q nodes from G
at random. As seen in Figure 5, the quality of DCMST obtained from these two
heuristics, where we chose q = 5, is similar. The execution times of these two
heuristics were also almost identical.
The results from running the OTTC algorithm with n start nodes were obtained for
graphs of 50 nodes and compared with the results obtained with 5 start nodes for the
same graphs; for k = 4, 5, and 10. The results compare the average value of the
smallest weight found from SNM1 and SNM2 to the average weight found from the
OTTC algorithm that runs for n iterations. The quotient of these values is reported.
For k = 4, the DCMST obtained using SNM1 had weight of 1.077 times the weight
from the n-iteration OTTC algorithm. The cost of SNM2-tree was 1.2 times that of
the n-iteration tree. For k = 5, SNM1 weight-ratio was 1.081 while SNM2 weight-
ratio was 1.15. For k = 10, SNM1 weight-ratio was 1.053 while SNM2 weight-ratio
was 1.085. In these cases, SNM1 outperforms SNM2 in terms of the quality of
solutions, in some cases by as much as 12%. The results obtained confirm the
theoretical analysis that predicted an improvement of O(n) in execution time, as
described in Section 4. The execution time for both SNM1 and SNM2 is
approximately the same. This time is significantly less than the time taken by the
n-iteration algorithm as expected. Therefore, SNM1 is a viable alternative to the
n-iteration algorithm.

[Figure: weight ratio (DCMST weight / MST weight, 0 to 20) versus number of nodes n = 50 to 200, with curves for SNM1, SNM2, DCMST(3), and DCMST(4)]

Fig. 5. Weight of DCMST(5), obtained using two different node-search heuristics, as a multiple
of MST weight. Initial diameter = n − 1

6.2 The Iterative Refinement Algorithms

The heuristic for DCMST(4) was also parallelized and implemented on the MasPar
MP-1. It produced DCMST(4) with weight approximately half that of DCMST(3), as
we see in Figures 5, 6, and 8. The refinement to DCMST(4) took about 1% of the
time needed to calculate DCMST(3).
We also parallelized and implemented the general IR algorithm on the MasPar
MP-1. As expected, the algorithm did not enter an infinite loop, and it always
terminated within (n + 20) iterations. The algorithm was unable to find a DCMST
with diameter less than 12 in some cases for graphs with more than 300 nodes. In
graphs of 400, 500, 1000, and 1500 nodes, our empirical results show a failure rate of
less than 20%. The algorithm was 100% successful in finding a DCMST with k = 15
for graphs of up to 1500 nodes. This shows that the failure rate of the algorithm does
not depend on what fraction of n the value of k is. Rather, it depends on how small
the constant k is.
To see this, we must take a close look at the way we move away from the center of
the tree when we select edges for removal. Note that the algorithm will fail only
when we try to remove edges incident to the end-points of the longest paths in the
spanning tree. Also note that we move away from the center of the tree every time we
go through the entire set MID without finding a good replacement edge, and we return
to the center of the spanning tree every time we succeed.

[Figure: weight ratio DCMST/MST (0 to 25) versus number of nodes n = 100 to 500, with curves for DCMST(3), ERM1, ERM2, and DCMST(4)]

Fig. 6. Quality of DCMST(10) obtained using two different edge-replacement methods. Initial
diameter = n – 1

Thus, the only way the algorithm fails is when it is unable to find a good
replacement edge in diameter/2 consecutive attempts, each of which includes going
through a different set of MID. Empirical results show that it is unlikely that the
algorithm will fail for 8 consecutive times, which makes it suitable for finding
DCMST where the value of k is a constant greater than or equal to 15. The algorithm
still performs fairly well with k = 10, and we did use that data in our analysis, where
we excluded the few cases in which the algorithm did not achieve diameter 10. This
exclusion should not affect the analysis, since the excluded cases all achieved
diameter less than 15 with approximately the same speed as the successful attempts.
The quality of DCMST(10) obtained by the IR technique using the two different
edge replacement methods, ERM1 and ERM2, is shown in Figure 6. The diagram
shows the weight of the computed DCMST(10) as a multiple of the weight of the
unconstrained MST. The time taken by the algorithm using ERM1 and ERM2 to
obtain DCMST(10) is shown in Figure 7. As expected, ERM2 outperforms ERM1 in
time and quality. In addition, ERM1 uses more memory than ERM2, because the size
of MID when we use ERM1 is significantly larger than its size when ERM2 is used.
This is caused by the creation of high-degree nodes by ERM1, as explained in
Subsection 5.2.1.
We tested the general IR algorithm, using ERM2, on random graphs. The quality
of the DCMSTs obtained is charted in Figure 8. Comparing this figure with those
obtained for the randomly-generated graphs forced to have an unconstrained MST with
diameter (n − 1), it can be seen that the quality of DCMST(10) in the graphs starting
with MSTs of (n − 1) diameter is better than that in unrestricted random graphs. This
is because the IR algorithm keeps removing edges close to the center of the
constrained spanning tree, and in unrestricted random graphs that region contains more
low-weight edges, coming from the unconstrained MST. But when the unconstrained
MST has diameter (n − 1), there are more heavy-weight edges near the center that
were added in some earlier iterations of the algorithm. Therefore, the DCMST for
this type of graph loses fewer low-weight edges than in unrestricted random
graphs.

[Figure: time in seconds (0 to 120) versus number of nodes n = 100 to 500, with curves for ERM1 and ERM2]

Fig. 7. Time to reduce diameter from n−1 to 10 using two different edge-replacement methods
[Figure: weight ratio DCMST/MST (0 to 10) versus number of nodes n = 100 to 1000, with curves for DCMST(3), DCMST(10), and DCMST(4)]

Fig. 8. Quality of DCMST(4) and DCMST(10) for unrestricted random graphs

Furthermore, the weight of DCMST(4) was lower than that of DCMST(10) in
unrestricted random graphs. Note that the DCMST(4) heuristic approaches the
diameter optimization from above, rather than from below. When the diameter
constraint is small, it becomes more difficult for the general IR algorithm to find a
solution, and the algorithm allows large increases in tree-weight in order to achieve the required
diameter. The approach from the upper bound, however, guarantees the tree weight
will not increase during the refinement process. The performance of the DCMST(4)
algorithm did not change much in unrestricted random graphs. Rather, the quality of
DCMST(10) deteriorated, exceeding the upper bound. Clearly, the DCMST(4) algorithm
provides a better solution for this type of graph.

7 Conclusions

We have presented three algorithms that produce approximate solutions to the
DCMST problem, even when the diameter constraint is a small constant. One is a
modification of Prim’s algorithm, combined with a heuristic that reduces the
execution time by a factor of n (by selecting a small constant number of nodes as the
start nodes in the OTTC algorithm) at a cost of a small increase in the weight of the
DCMST.
The second is an IR algorithm to find an approximate DCMST. This algorithm is
guaranteed to terminate, and it succeeds in finding a reasonable solution when the
diameter constraint is a constant, about 15. The third is a special IR algorithm to
compute DCMST(4). This algorithm was found to be especially effective for random
graphs with uniformly distributed edge weights, as it outperformed the other two in
speed and quality of solution. This algorithm provides a tighter upper bound on
DCMST quality than the one provided by the DCMST(3) solution. We implemented
these algorithms on an 8192-processor machine, the MasPar MP-1, for various types of graphs.
The empirical results from this implementation support the theoretical conclusions
obtained.

References

1. Abdalla, A., Deo, N., Franceschini, R.: Parallel heuristics for the diameter-constrained MST
problem. Congressus Numerantium, (1999) (to appear)
2. Abdalla, A., Deo, N., Kumar, N., Terry, T.: Parallel computation of a diameter-constrained
MST and related problems. Congressus Numerantium, Vol. 126. (1997) 131−155
3. Achuthan, N.R., Caccetta, L., Caccetta, P., Geelen, J.F.: Algorithms for the minimum
weight spanning tree with bounded diameter problem. Optimization: Techniques and
Applications, (1992) 297−304
4. Bala, K., Petropoulos, K., Stern,T.E.: Multicasting in a Linear Lightwave Network. IEEE
INFOCOM ’93, Vol. 3. (1993) 1350−1358
5. Chow, R., Johnson, T.: Distributed Operating Systems and Algorithms. Addison-Wesley,
Reading, MA (1997)
6. Deo, N., Kumar, N.: Constrained Spanning Tree Problems: Approximate Methods and
Parallel Computation. DIMACS Series in Discrete Mathematics and Theoretical Computer
Science, Vol. 40. (1998) 191−217
7. Garey, M.R., Johnson D.S.: Computers and Intractability: A Guide to the Theory of NP-
Completeness. W.H. Freeman, San Francisco (1979)
8. Moret, B.M.E., Shapiro, H.D.: An empirical analysis of algorithms for constructing a
minimum spanning tree. DIMACS Series in Discrete Mathematics and Theoretical
Computer Science, Vol. 15. (1994) 99−117
9. Paddock, P.W.: Bounded Diameter Minimum Spanning Tree Problem. M.S. Thesis,
George Mason University, Fairfax, VA (1984)
10. Palmer, E.M.: Graph Evolution: An Introduction to the Theory of Random Graphs. John
Wiley & Sons, Inc., New York (1985)
11. Raymond, K.: A tree-based algorithm for distributed mutual exclusion. ACM Transactions
on Computer Systems, Vol. 7. No. 1. (1989) 61−77
12. Satyanarayanan, R., Muthukrishnan, D.R.: A note on Raymond’s tree-based algorithm for
distributed mutual exclusion. Information Processing Letters, Vol. 43. (1992) 249−255
13. Satyanarayanan, R., Muthukrishnan, D.R.: A static-tree-based algorithm for the distributed
readers and writers problem. Computer Science and Informatics, Vol. 24. No.2. (1994)
21−32
14. Wang, S., Lang, S.D.: A tree-based distributed algorithm for the k-entry critical section
problem. In: Proceedings of the 1994 International Conference on Parallel and Distributed
Systems, (1994) 592−597
Algorithms for a Simple Point Placement Problem

Joshua Redstone and Walter L. Ruzzo

Department of Computer Science & Engineering


University of Washington
Box 352350
Seattle, WA 98195-2350
{redstone,ruzzo}@cs.washington.edu

Abstract. We consider algorithms for a simple one-dimensional point placement
problem: given N points on a line, and noisy measurements of the distances be-
tween many pairs of them, estimate the relative positions of the points. Problems
of this flavor arise in a variety of contexts. The particular motivating example that
inspired this work comes from molecular biology; the points are markers on a
chromosome and the goal is to map their positions. The problem is NP-hard under
reasonable assumptions. We present two algorithms for computing least squares
estimates of the ordering and positions of the markers: a branch and bound al-
gorithm and a highly effective heuristic search algorithm. The branch and bound
algorithm is able to solve to optimality problems of 18 markers in about an hour,
visiting about 10⁶ nodes out of a search space of 10¹⁶ nodes. The local search
algorithm usually was able to find the global minimum of problems of similar
size in about one second, and should comfortably handle much larger problem
instances.

1 Introduction

The problem of mapping genetic information has been the subject of extensive research
since experimenters started breeding fruit flies for physical characteristics. Due to the
small scale of chromosomes, it has been difficult to obtain accurate information on
their structure. Many techniques relying on statistical inference of indirect data have
been applied to deduce this information; see [1] for some examples.
More recently, researchers have developed many techniques for estimating the rela-
tive positions of various genetic features by more direct physical means. We are interested
in one called fluorescent in situ hybridization (FISH). In this technique, pairs of fluo-
rescently labeled probes are hybridized (attached) to specific sites on a chromosome.
The 2-d projection of the distance between the probes is measured under a microscope.
Despite the highly folded state of DNA in vivo and the resulting high variance of in-
dividual measurements, [10] shows that the genomic distance can be estimated if the
experiment is repeated in many cells.
Not surprisingly, if more pairs of probes are measured, and the measurement be-
tween each pair is repeated many times, the accuracy of the answer increases. Unfortu-
nately, so does the cost. Hence, the resulting computational problem is the following:


Problem: Given N probes on a line, and an incomplete set of noisy pairwise measure-
ments between probes, determine the best estimate of the ordering and position of
the probes.

If the measurements were complete and accurate, the problem would be easy—the
farthest pair obviously are the extreme ends, and the intervening points can be placed by
sorting their distances to the extremes. However, with partial, noisy data, the problem
is known to be NP-hard. (See [6, 5] for a particularly simple proof.)

1.1 Previous Work

Brian Pinkerton previously investigated solving this problem using the seriation algo-
rithm of [3], and a branch and bound algorithm (personal communication, 6/96). The
seriation algorithm, which is a local search algorithm, was only moderately effective.
The branch and bound algorithm, using a simple bounding function, was able to solve
problems involving up to about 16 probes.
There has been extensive work on other algorithms to solve DNA mapping prob-
lems, but they are based on distance estimates from techniques other than FISH, and are
tailored to the particular statistical properties of the distance measurements. Two among
many examples are the distance geometry algorithm of [7], based on recombination fre-
quency data, and [2], which investigated branch and bound, simulated annealing, and
maximum likelihood algorithms based on data from radiation hybrid mapping.

1.2 Outline

We present two algorithms for finding least-squares solutions to the probe placement
problem. One is a branch and bound algorithm that can find provably optimal solutions
to problems of moderate, but practically useful, size. The second is a heuristic search
algorithm, fundamentally a “hill-climbing” or greedy algorithm, that is orders of mag-
nitude faster than the branch and bound algorithm, and although it is incapable of giving
certifiably optimal solutions, it appears to be highly effective on this data.
In the next section we sketch some of the more difficult aspects of the problem.
Section 3 develops a cost function to evaluate solutions. Section 4 describes the heuristic
search algorithm. Section 5 outlines the branch and bound algorithm. We then present
the results of simulations of the two algorithms in Section 6.

2 Introduction to the Solution Space

Before explaining the development of the algorithms, it is helpful to gain some intuition
about the solution space. Given that the data is both noisy and incomplete, the problem
can be under-constrained and/or over-constrained. In this domain, a “constraint” refers
to a measurement between two probes (since it constrains the placement of the probes).
An under-constrained problem instance is one in which a probe might not have
enough measurements to other probes to uniquely determine its position. In the example
of four probes in Figure 1, probe B has only one measurement to probe A, and so a
[Figure: two placements, A B C D and B A C D]

Fig. 1. An example of an under-constrained ordering (Probe B can be placed on either
side of probe A). A line between two probes indicates a measurement between the
probes.

location on either side of probe A is consistent with the data. It is also important to note
that in all solutions, left/right orientation is arbitrary as is the absolute probe position.
In a more extreme example, a set of probes could have no measurements to an-
other set. In Figure 2, probes A and B have no measurements to probes C and D, and
placement anywhere relative to probes C and D is consistent with the data.

[Figure: two placements, A B C D or A C B D]

Fig. 2. Another example of an under-constrained ordering

In the examples of Figures 1 and 2, not only are the positions not uniquely deter-
mined, but different orderings are possible. When developing search algorithms, we
have to be careful to recognize and treat such cases correctly. It appears that in real
data such as from [Trask, personal communication, 1996], there are no degrees of free-
dom in the relative positioning of probes due to the careful choice of pairs of probes to
measure. However, under-constrained instances do arise in the branch and bound algo-
rithm described in Section 5 and in any algorithm that solves the problem by examining
instances with a reduced set of constraints.
Due to the noise in the data, parts of a problem instance will be over-constrained.
For example, as shown in Figure 3, if we examine three probes with pairwise mea-
surements between them and there isn’t an ordering such that the sum of two pairwise
measurements equals the third pairwise measurement, there will be no way to place
the three probes on a line. In this case, the distances between the probes in any linear
placement will unavoidably be different from the measured distances.

[Figure: probes A, B, C with measured distances A–B = 5, B–C = 5, and A–C = 12]

Fig. 3. There is no way to linearly place these probes on a line and respect all the mea-
surements.
Given the existence of over- and under-constrained problems, it is necessary to de-
velop a method of evaluating how well a solution conforms to the data. This is covered
in Section 3. Once we define how to evaluate a solution, we will develop algorithms to
search for the best solution.

3 How to Evaluate a Probe Placement

We construct a cost function to evaluate the “goodness” of a solution, then solve the
problem by finding the answer that has the least cost. Let N be the number of probes,
xi be the assigned position of probe i, dij be the measured distance between probe i
and probe j (dij = dji ), and let wij = wji be a nonnegative weight associated with the
measured distance between probes i and j. We define the cost of this placement to be
the weighted sum of squares of the differences between the measured distance between
two probes and the distance between the probes in the given linear placement of the
probes:

\mathrm{Cost}(x_1, \ldots, x_N) \;=\; \sum_{\substack{i<j \\ d_{ij}\ \text{measured}}} w_{ij}\,\bigl(|x_i - x_j| - d_{ij}\bigr)^2 \qquad (1)

Many subsequent formulae will be simplified by assuming wij = 0 if i = j or if the
distance dij has not been measured. For example, we could have omitted the qualifier
“dij measured” from Equation 1 under this assumption.
Intuitively, the weight wij reflects the relative confidence we have in measurement
dij. For example, if the measurement errors were independent normal random variables,
then we should choose the weight wij to be proportional to $1/\sigma^2_{ij}$, where $\sigma^2_{ij}$ is the
variance of dij. Least squares solutions under these assumptions have several desirable
properties, like being unbiased maximum likelihood estimators. Even though the error
distribution in our motivating problem violated these assumptions, choosing weights
inversely proportional to the variances substantially improved the solution quality (and
speed) of our algorithms; see [9].
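
A direct Python transcription of Equation 1, assuming the data is held as a dictionary mapping each measured pair (i, j), i < j, to its distance and weight (the data layout is illustrative):

def cost(x, measurements):
    # x: list of assigned positions; measurements: {(i, j): (d_ij, w_ij)}.
    return sum(w_ij * (abs(x[i] - x[j]) - d_ij) ** 2
               for (i, j), (d_ij, w_ij) in measurements.items())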

3.1 Finding Least Squares Solutions for a Fixed Ordering

One standard approach to solving a least squares problem is to take the partial deriva-
tives of the cost with respect to each of the xi ’s, set them equal to 0, and solve. Unfortu-
nately, our cost function is not differentiable due to the absolute value terms. However,
for a given fixed ordering of the probes we can bypass this difficulty, allowing us to find
the placement which minimizes cost for the given ordering. Without loss of generality,
assume

x_1 < x_2 < \cdots < x_N \qquad (2)


Then for a given probe k:

\frac{\partial}{\partial x_k} \sum_{i<j} w_{ij}\,\bigl(|x_i - x_j| - d_{ij}\bigr)^2
  \;=\; \sum_{1 \le i \le k-1} 2\,w_{ik}\,(x_k - x_i - d_{ik})
  \;-\; \sum_{k+1 \le i \le N} 2\,w_{ki}\,(x_i - x_k - d_{ki}) \qquad (3)

Separating the terms and setting equal to 0, we get for x_k:

x_k \sum_{1 \le i \le N} w_{ik} \;+\; \sum_{1 \le i \le N} (-w_{ik}\,x_i)
  \;=\; \sum_{1 \le i \le k-1} w_{ik}\,d_{ik} \;-\; \sum_{k+1 \le i \le N} w_{ki}\,d_{ki} \qquad (4)

Equation 4 is of the form

Mx = r (5)

where x is the vector of xi ’s, M is the matrix defined as:

M_{ij} \;=\; \begin{cases} -w_{ij} & i \ne j \\ \sum_{1 \le p \le N} w_{ip} & i = j \end{cases} \qquad (6)

and r is the vector whose k th component rk is given by the right hand side of Equation 4.
Thus, in matrix form, Equation 4 can be written as:

\begin{pmatrix} -w_{k1} & \cdots & M_{kk} & \cdots & -w_{kN} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_k \\ \vdots \\ x_N \end{pmatrix}
  \;=\; \sum_{1 \le i \le k-1} w_{ik}\,d_{ik} \;-\; \sum_{k+1 \le i \le N} w_{ki}\,d_{ki}

where Mkk , the summation term in Equation 6, is the sum of the weights of the mea-
surements from probe k to other probes.
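
A sketch of Equations 4 to 6 in Python/NumPy, for an ordering given as a left-to-right list of probe indices; W and D are symmetric weight and distance matrices with zeros for unmeasured pairs, and the one translational degree of freedom is removed here by pinning the leftmost probe at position 0 (names and the pinning convention are illustrative, and a connected measurement graph is assumed):

import numpy as np

def solve_for_ordering(W, D, order):
    # Build M (Equation 6) and r (right-hand side of Equation 4) and solve M x = r.
    n = len(order)
    rank = {p: idx for idx, p in enumerate(order)}        # position of each probe in the ordering
    M = (np.diag(W.sum(axis=1)) - W).astype(float)        # M_kk = sum_p w_kp, M_kj = -w_kj
    r = np.zeros(n)
    for k in range(n):
        for i in range(n):
            if i == k or W[k][i] == 0:
                continue
            if rank[i] < rank[k]:
                r[k] += W[k][i] * D[k][i]                 # probes assumed to the left of k
            else:
                r[k] -= W[k][i] * D[k][i]                 # probes assumed to the right of k
    first = order[0]
    M[first, :] = 0.0
    M[first, first] = 1.0
    r[first] = 0.0                                        # pin x_first = 0
    return np.linalg.solve(M, r)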
A critical point is that there is no guarantee that the ordering of the probes in the
solution of Mx = r will respect the ordering (2) used to construct this linear system.
However, the solution to this linear system provides useful information in either case.

– If the solution does respect the ordering, then it provides the optimal (in the least-
squares sense) positioning of the probes with respect to the given ordering, and is a
local minimum of the cost function.
– If the solution does not respect the ordering, then it gives a lower bound on the cost
of the best placement with that ordering. This is true since the solution to Mx = r gives
the minimum of $\sum_{i<j} w_{ij}(x_j - x_i - d_{ij})^2$ over all x, which is certainly no greater
than the minimum over the region $\{x : x_1 < x_2 < \cdots < x_N\}$. Furthermore, in this
case the given ordering is not the optimal one, since the solution to Mx = r gives
an ordering having a lower cost. This holds since for each pair i < j for which
$x_i > x_j$, we have

$(|x_i - x_j| - d_{ij})^2 = (x_i - x_j - d_{ij})^2 < (x_j - x_i - d_{ij})^2 .$

In other words, at the point x solving Mx = r, each term in the true cost function
is less than or equal to the corresponding term in the restricted cost function built
assuming the ordering $x_1 < x_2 < \cdots < x_N$, and so that ordering cannot be
optimal.

These are the key observations on which our algorithms are built. The problem has
been reduced from a continuous optimization problem to a discrete one—that of com-
puting the matrix solution over all probe orderings and choosing the ordering with the
lowest cost. Our branch and bound algorithm searches over all possible probe orderings,
using an extension of the method above to bound the cost of large sets of possible order-
ings, provably finding the one(s) of minimum cost. The branch and bound algorithm is
described more fully in Section 5. Our heuristic search algorithm is even simpler. Start-
ing from many random orderings, it merely iterates the process described above until
it reaches a local minimum. Empirically, this is highly effective at finding the global
minimum quickly. This is described more fully in the next section.

4 Heuristic Search

As outlined in the previous section, the solution to the linear system constructed for any
fixed order π of the probes either gives the optimal placement for probes in that order,
which is a local minimum of the cost function, or gives a placement with another order-
ing π′ at which the cost function is lower than it is at any placement respecting π. Our
heuristic search algorithm is simply “iterated linear solve”:

1. choose a random ordering π;
2. set up the linear system corresponding to that ordering;
3. solve it;
4. if the resulting order π′ is equal to π, record that as a potential minimum;
5. if π′ ≠ π, replace π by π′ and return to step 2.

Finally, we repeat this entire process for many random initial orderings, and report the
lowest cost solution found. In different tests, we either did a fixed number of random
starts, usually 300, or repeated until the known optimal solution was found.
One nice feature of the matrix formulation is that M is independent of the ordering
of the probes. When solving this system by LU decomposition (as in [8]), this means
that once we perform an initial O(N³) operation on M, we can find a solution in O(N²)
time per ordering, the time required to generate the (order-dependent) vector r and
backsolve.
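
A sketch of the whole loop, reusing the solve_for_ordering helper from the earlier sketch (the cost computation and the number of random starts are illustrative):

import random

def heuristic_search(W, D, restarts=300):
    # Iterated linear solve: re-solve until the induced ordering is a fixed point,
    # then restart from a new random ordering; keep the best local minimum seen.
    n = W.shape[0]
    best_cost, best_x = float('inf'), None
    for _ in range(restarts):
        order = list(range(n))
        random.shuffle(order)
        while True:
            x = solve_for_ordering(W, D, order)
            new_order = sorted(range(n), key=lambda p: x[p])
            if new_order == order:                        # fixed point: local minimum reached
                break
            order = new_order                             # strictly lower cost, so no cycling
        c = sum(W[i][j] * (abs(x[i] - x[j]) - D[i][j]) ** 2
                for i in range(n) for j in range(i + 1, n) if W[i][j] > 0)
        if c < best_cost:
            best_cost, best_x = c, x
    return best_cost, best_x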
5 Branch and Bound


Our branch and bound algorithm constructs a search tree over probe orderings. The
leaves will be complete orderings and the interior nodes will be partially specified or-
derings. There are two basic approaches to structuring the search tree. In the first ap-
proach, shown in Figure 4, the children of a node P in the tree will be the ordering
of probes at node P augmented with a new probe in all possible positions among the
probes ordered at P .

[Figure: two three-level search trees over orderings of probes A, B, C, one for each approach]

Fig. 4. At a node, the children are orderings in which an additional probe is placed in all
possible positions with respect to the ordered probes.

Fig. 5. At a node, the children are orderings in which each of the unordered probes has been
placed to the right of the rightmost ordered probe.

For the second approach, in Figure 5, the ordering of a child of an interior node P
will be the ordering of P augmented by a probe placed adjacent to the rightmost ordered
probe in P .
In either approach, as is typical in branch and bound algorithms, little of the ordering
is specified at higher levels of the search tree, hence the bounds computed there will be
weak and pruning will be rare. Given this, the first approach has the advantage that
the branching factor is much lower near the root of the tree compared to the second
approach, e.g. 3 versus N − 3 on the third level. On the other hand, the second approach
has the advantage that more information is known about the partially specified ordering
at an interior node P , namely that all unordered probes lie to the right of the rightmost
specified probe in every node of the subtree rooted at P . We can exploit this to give
a strengthened bound at internal nodes compared to approach one. In our experiments
[11], approach two outperformed approach one by nearly a factor of two both in run
time and in number of tree nodes visited. Throughout the remainder of this paper, we
will only consider approach two.
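
In approach two, generating the children of a search-tree node amounts to appending each still-unordered probe to the right end of the partial ordering, as in this small sketch (names illustrative):

def children(partial, all_probes):
    # Each child places one unordered probe immediately to the right of the
    # rightmost ordered probe of the parent node.
    ordered = set(partial)
    return [partial + [p] for p in all_probes if p not in ordered]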
Our branch and bound algorithm searches through nodes in a tree, pruning a node
if its cost is greater than the lowest cost found in a leaf node so far. At a leaf node in
the tree, we compute the cost of the ordering as described in Section 3.1. At an interior
node, the cost function must be a lower bound on the cost of all nodes in the subtree
to allow us to possibly prune the subtree. In this section, we describe a simple cost
function based on least squares.
Consider an interior node such as that in Figure 6. In this picture of an interior node,
the circles represent probes, and the edges represent the existence of a measurement
between two probes. Probes A, B, C, and D have been ordered (in that order). Probes
[Figure: ordered probes A, B, C, D and unordered probes E and F, with measurement edges between various pairs]

Fig. 6. An Interior Node

E and F are unordered with respect to each other, but both will appear to the right of probe
D. One way of computing a lower bound on the cost for this node is to consider only the
measurements between the ordered probes. In this case, we compute the cost function
by computing the matrix solution (as described in Section 3.1) using a matrix built from
only measurements between the ordered probes. This is done by simply pretending the
other measurements do not exist, i.e., the terms in M of Equation 5 for measurements
that we are not considering are 0, and there is no contribution from them in the r vector.
We note that the cost function described here is ineffective at high levels in the tree
(where nodes will reflect probe orderings with few constraints). In particular, the cost
function described evaluates to zero for the first and second level in the tree (when only
one or two probes are ordered). However, consider the measurements in Figure 6 be-
tween ordered probes C D, and unordered probe F . Even though the position of F is
undetermined with respect to , we know that F will be to the right of D. This al-
lows us to remove the absolute value sign in the sum of squares terms of Equation 1
for the measurements between F and C D and include these terms in the cost function
computation. Thus, for an interior node, as well as considering all edges between or-
dered nodes, we can consider edges between ordered nodes and unordered nodes when
constructing the cost function for the node. This improvement potentially allows us
to compute a non-zero cost function for nodes as high as the second level in the tree
(when only two probes are ordered). With this improvement, the only constraints we
are not considering at a node are those between unordered probes. The bound function
described here is the one we use in the simulations reported in Section 6, Results.
The cost of an interior node P computed in this way will be a lower bound on the
cost of all nodes in the subtree rooted at P , since nodes in the subtree impose additional
constraints on the ordering, never remove constraints, and each additional constraint
adds additional non-negative terms to the cost function.
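
A hedged sketch of the simpler variant of this bound (only measurements between two ordered probes; the ordered-to-unordered strengthening described above is omitted), reusing solve_for_ordering and assuming the induced measurement graph on the ordered probes is connected, an assumption the paper relaxes by handling disconnected components separately:

import numpy as np

def interior_node_bound(W, D, partial):
    # Optimal least-squares cost over the ordered probes only, using only the
    # measurements whose endpoints are both already ordered; a valid lower
    # bound on the cost of every leaf below this node.
    idx = list(partial)                                   # ordered probes, left to right
    W_sub, D_sub = W[np.ix_(idx, idx)], D[np.ix_(idx, idx)]
    m = len(idx)
    x = solve_for_ordering(W_sub, D_sub, list(range(m)))
    return sum(W_sub[i][j] * ((x[j] - x[i]) - D_sub[i][j]) ** 2
               for i in range(m) for j in range(i + 1, m) if W_sub[i][j] > 0)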
An additional issue which has a strong effect on the performance of our branch and
bound algorithm is initialization of the bound. Starting the algorithm with a conservative
default value for the bound (like +∞) results in very poor pruning until a reasonably
good solution is encountered. Instead we first run the local search algorithm from a
few random starting orderings. Empirically, this will quickly locate a good solution,
facilitating good pruning from the beginning. In our experiments, branch and bound
removes 100-1000 times as many nodes as a result [9].
There is one remaining detail to be specified—we need to modify the construc-
tion of the M of Equation 5. As it stands, the linear system Mx = r of Section 3.1 is
under-constrained (the dimension of the null-space of M is non-zero). Because the system is
constructed from relative orderings between the probes, there is one degree of freedom:
the absolute position of the probes. This is remedied by modifying the system to arbi-
trarily place probe 1 at location x1 = 0. There may be additional degrees of freedom
in the solutions. In particular, at high levels in the tree the small set of ordered probes
may be partitioned into several disconnected components whose relative positions are
unconstrained. These situations are handled similarly; see [9] for details.
Finally, we remark that the cost computed by the techniques outlined above is a
lower bound, but not necessarily an attainable bound, on the cost of any ordering con-
sistent with that specified at a search tree node. In particular, in the case where the
solution to the linear system Mx = r exhibits a different ordering than the one from
which the system was constructed, we know that the bound is not attainable by the de-
sired ordering. It is still valid to use this bound to prune the search tree, since we know
the bound is attainable by some (other) ordering. However, pruning could be improved
if a higher lower bound could be computed in these cases. One possible approach to
doing so would be to use quadratic programming—minimization of the quadratic ob-
jective function in Equation 1 subject to the linear constraints in Equation 2 is a convex
quadratic optimization problem, for which polynomial time algorithms are known; see,
for example, [4]. However, it is not clear whether the increased pruning efficiency would
offset the extra computational cost of using the more elaborate quadratic programming
algorithm. Preliminary experiments have been inconclusive [11].
We now present the results of experiments performed on the heuristic search and
branch and bound algorithms.

6 Results

We ran multiple simulations to assess the performance of the two algorithms and also to
gauge the sensitivity of the algorithms to different parameters. We summarize the main
results here; see [9, 11] for further details.
The experiments described below were all run on synthetic data generated in ac-
cord with the motivating problem presented in Section 1. Probes were placed uniformly
at random, except that adjacent probes were separated by a minimum distance of ap-
proximately 3% of the average spacing. Approximately 50% of the probe pairs were
“measured,” where a measurement consisted of drawing a random sample from a certain
distribution whose mean was the actual distance between the probes. Data sets having
more than one connected component or certain other anomalies were filtered out. The
results do not seem to be overly sensitive to any of these parameters.
As a measure of the quality of the solution found by the algorithms, we used RMS
error—the square root of the mean squared difference between the true and calculated
positions of the probes. While this quantity varied from run to run, the median value was
10%–15% of the average interprobe distance, which is reasonably good considering the
variance of the “measurements.”
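For concreteness, the RMS measure can be computed as in the small numpy sketch below; aligning both coordinate sets on probe 0 is our own assumption about how true and estimated positions are put on a common origin (the paper may also account for the reflection ambiguity).

```python
import numpy as np

def rms_error(true_pos, est_pos):
    # positions are only determined up to translation, so align both on probe 0
    t = np.asarray(true_pos, dtype=float) - true_pos[0]
    e = np.asarray(est_pos, dtype=float) - est_pos[0]
    return float(np.sqrt(np.mean((t - e) ** 2)))
```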
We present the total time for the branch and bound algorithm using weighted least-
squares in Figure 7, and the total time for heuristic search in Figure 8. Each point in the
graph is a problem instance.
[Figure: median time per problem instance (seconds, log scale, roughly 0.01 to 10000) versus problem size, 2 to 18 probes.]
Fig. 7. Time for Branch and Bound (Seconds).

[Figure: time taken for 300 random starts (milliseconds, log scale) versus problem size, 2 to 18 probes.]
Fig. 8. Time for Heuristic Search. Trials showing identical times spread horizontally for
clarity. (300 random starts; Milliseconds).
We can see that the running time of the branch and bound algorithm is exponential,
as is expected, with time increasing roughly as $2.8^N$. Note that at the far right of the
graph, the most time taken to solve a problem of 18 probes was about 70 minutes. Since
the number of nodes in a search tree of a problem that size is around $10^{16}$, we can see
that the pruning heuristic is quite effective; in fact it visited on the order of $10^6$ nodes.
The performance of heuristic search is in some ways more difficult to assess. For a
problem size of 18, the 300 random starts of heuristic search took about 1 second. The
surprisingly stable growth rate also appears to be exponential, but grows much more
slowly, roughly as $1.2^N$. At this rate, problems of size 30 would be solvable in a few
minutes and problems of size 50 in under an hour. However, note that 300 random starts
is a very arbitrary choice. In most trials (> 90%), the method finds the globally optimal
solution within 10 random starts. In a few “hard core” cases, however, it can take several
thousand starts to find the global optimum. Unfortunately, of course, using heuristic search alone,
one cannot tell when the globally optimal solution has been reached. (We compared to
the provably optimal results from branch and bound.) Nevertheless, the method seems
to be a powerful one and worth further study.
Timing experiments were performed on a 100 MHz DEC AlphaStation 200 4/100
with 96MB of memory. The C code was not optimized beyond the optimizations de-
scribed here (and in [9, 11]). In particular, the LU decomposition routine was copied
without modification from [8]. Since the process size for these algorithms was around 3
MB, and since the simulation code is CPU intensive, the time due to non-CPU activities
(such as paging) does not significantly affect the results shown.

7 Conclusions
We have presented two search algorithms, a branch and bound algorithm and a heuristic
local search algorithm, both of which attempt to minimize a weighted least-squares cost
function to solve a one dimensional point placement problem.
Due to the exponential nature of the branch and bound algorithm, it is unlikely that
it will scale to larger problem sizes. However, it does provide good performance on
problems of 18-20 probes, large enough to be of practical use. Since it finds the global
minimum, it is also useful as a benchmark against which to compare other algorithms.
The local search algorithm performed surprisingly well, finding optimal solutions
in seconds, and it appears capable of handling much larger problem instances.

8 Acknowledgments
We would like to thank Brian Pinkerton, Barb Trask, Ger van den Engh, Harry Ye-
ung, and the Statistics Consulting Group for their thoughtful comments and generous
assistance.

References
[1] Timothy Bishop. Linkage analysis: Progress and problems. Phil. Trans. R. Soc. Lond.,
344:337–343, 1994.
[2] Michael Boehnke, Kenneth Lange, and David Cox. Statistical methods for multipoint ra-
diation hybrid mapping. Am. J. Hum. Genet., 49:1174–1188, 1991.
[3] Kenneth H. Buetow and Aravinda Chakravarti. Multipoint gene mapping using seriation.
I. General methods. Am. J. Hum. Genet., 41:180–188, 1987.
[4] Donald Goldfarb and Shucheng Liu. An O(n3 L) primal-dual potential reduction algorithm
for solving convex quadratic programs. Mathematical Programming, 61:161–170, 1993.
[5] Brendan Marshall Mumey. A fast heuristic algorithm for a probe mapping problem. In
Proceedings of the Fifth International Conference on Intelligent Systems for Molecular
Biology, pages 191–197, 1997.
[6] Brendan Marshall Mumey. Some Computational Problems from Genomic Mapping. PhD
thesis, Department of Computer Science and Engineering, University of Washington, 1997.
[7] William R. Newell, Richard Mott, S. Beck, and Hans Lehrach. Construction of genetic
maps using distance geometry. Genomics, 30:59–70, 1995.
[8] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian Flannery. Numerical
Recipes in C. Cambridge University Press, 1992.
[9] Joshua Redstone and Walter L. Ruzzo. Algorithms for ordering DNA probes on chro-
mosomes. Technical Report UW-CSE-98-12-04, Department of Computer Science and
Engineering, University of Washington, December 1998.
[10] Ger van den Engh, Ranier Sachs, and Barbara J. Trask. Estimating genomic distance from
DNA sequence location in cell nuclei by a random walk model. Science, 257:1410–1412,
4 September 1992.
[11] Harry Yeung and Walter L. Ruzzo. Algorithms for determining DNA sequence on chromo-
somes. Unpublished, March 1997.
Duality in ATM Layout Problems

Shmuel Zaks

Department of Computer Science, Technion, Haifa, Israel.


[email protected], http://www.cs.technion.ac.il/~zaks

Abstract. We present certain dualities occurring in problems of design


of virtual path layouts in ATM networks. We concentrate on the one-
to-many problem on a chain network, in which one constructs a set of
paths, that enable connecting one vertex with all others in the network.
We consider the parameters of load (the maximum number of paths that
go through any single edge) and hop count (the maximum number of
paths traversed by any single message). Optimal results are known for
the cases where the routes are shortest paths and for the general case
of unrestricted paths. These solutions are symmetric with respect to the
two parameters of load and hop count, and thus suggest duality between
these two.
We discuss these dualities from various points of view. The trivial one
follows from corresponding recurrence relations. We then present var-
ious one-to-one correspondences. In the case of shortest paths layouts
we use binary trees and lattice paths (that use horizontal and vertical
steps). In the general case we use ternary trees, lattice paths (that use
horizontal, vertical and diagonal steps), and high dimensional spheres.
These correspondences shed light on the structure of the optimal solu-
tions, and simplify some of the proofs, especially for the optimal average
case designs.

1 Introduction
In this paper we study path layouts in ATM networks, in which pairs of nodes
exchange messages along pre-defined paths in the network, termed virtual paths.
Given a physical network, the problem is to design these paths optimally. Each
such design forms a layout of paths in the network, and each connection between
two nodes must consist of a concatenation of such virtual paths. The smallest
number of these paths between two nodes is termed the hop count for these
nodes, and the load (or congestion) of a layout is the maximum number of vir-
tual paths that go through any (physical) communication line. The two principal
parameters that determine the optimality of the layout are the maximum con-
gestion of any communication line and the maximum hop count between any two
nodes. The hop count corresponds to the time to set up a connection between
the two nodes, and the congestion measures the load of the routing tables at the
nodes.
Two problems that have been recently studied are the one-to-all (or broad-
cast) problem (e.g., [CGZ94, GWZ95, G95, DFZ97]), and the all-to-all problem

(see, e.g., [CGZ94, G95, KKP95, SV96, ABCRS99, DFZ97]), in which one wishes
to measure the hop count from one specified node (or all nodes) in the network
to all other nodes. In what follows we always consider the one-to-all problem.
Given bounds on the load L and the hop count H between a given node
termed root and all the other nodes in a layout, we look for the maximum number
of nodes for which such a solution exists, satisfying these bounds. Considering
a chain network, where the leftmost vertex has to be the root, and where each
path traversed by a message must be a shortest path, a family of ordered trees
Tshort(L, H) was presented in [GWZ95], within which an optimal solution can
be found, for a chain of length N, with N bounded by $\binom{L+H}{L}$. This number,
which is symmetric in H and L, is also equal to the number of lattice paths from
(0,0) to (L, H) that use horizontal and vertical steps. Optimal bounds for this
shortest path case were also derived for the average case, which also turned out
to be symmetric in H and L.
Considering the same problem but without the shortest path restriction,
termed the general path case, a family of tree layouts T(L, H) was introduced
in [DFZ97], for a chain of length N, not assuming that the root is located at its
leftmost vertex, and with N bounded by $\sum_{i=0}^{\min\{L,H\}} 2^i \binom{L}{i}\binom{H}{i}$ [GW70]. This
number, which is also symmetric in H and L, is equal to the number of lattice
points within an L-dimensional l1-sphere of radius H, and is also equal to the
number of lattice paths from (0,0) to (L, H) that use horizontal, vertical or
(up-)diagonal steps.
As a consequence, the trees T(L, H) and T(H, L) have the same number
of nodes, and so do the trees Tshort(L, H) and Tshort(H, L). In this paper
we use one-to-one correspondences, using binary and ternary trees, in order to
combinatorially explain the duality between these two measures of hop count
and load, as reflected by the above symmetries. These correspondences shed
more light on the structure of these two families of trees, allowing us to find, for
any optimal layout with N nodes, load L and minimal (or minimal average) hop
count H, its dual layout, having N nodes, maximal hop count L and minimal
(or minimal average) load H, and vice versa. Moreover, they give one proof for
both measures, whereas in the above-mentioned papers these symmetries were
only derived as a consequence of the final result; we note that the average-case
results were derived by seemingly different formulas, whereas the worst-case
results were derived by symmetric arguments. In addition, these correspondences
also provide a simple proof of a new result concerning the duality of these two
parameters in the worst-case and the average-case analysis for the general path
case layouts. Finally, it is shown that an optimal worst-case solution, for both the
shortest path and the general case, is also an optimal average-case solution in both
cases, allowing a simpler characterization of these optimal layouts.
This paper surveys results from various papers. In Section 2 the ATM model
is presented, following [CGZ94]. In Section 3 we discuss the optimal solutions;
the optimal design for the shortest path case follows the discussion in [GWZ95],
and the optimal design for the general case follows the discussion in [DFZ97,
F98]. We encounter the duality of the parameters of load and hop count, which
follows via recurrence relations. In Section 4 we describe the use of binary
and ternary trees to shed more direct light on these duality results; this follows
[DFZ97, F98]. Lattice paths and spheres are then used in Section 5 to supply
additional points of view for these dualities, following ( [DFZ97, F98]). We close
with a discussion in Section 6.

2 The Model
We model the underlying communication network as an undirected graph G =
(V E), where the set V of vertices corresponds to the set of switches, and the
set E of edges corresponds to the physical links between them.

Definition 1. A rooted virtual path layout (layout for short) Ψ is a collection
of simple paths in G, termed virtual paths (VPs for short), and a vertex of V
termed the root of the layout (denoted root(Ψ)).

Definition 2. The load L(e) of an edge e ∈ E in a layout Ψ is the number of
VPs in Ψ that include e.

Definition 3. The load Lmax(Ψ) of a layout Ψ is $\max_{e \in E} L(e)$.

Definition 4. The hop count H(v) of a vertex v ∈ V in a layout Ψ is the
minimum number of VPs whose concatenation forms a path in G from v to
root(Ψ). If no such VPs exist, define H(v) = ∞.

Definition 5. The maximal hop count of Ψ is Hmax(Ψ) = $\max_{v \in V} H(v)$.

In the rest of this paper we assume that the underlying network is a chain.
We consider two cases: the one in which only shortest paths are allowed, and the
second one in which general paths are considered.
To minimize the load, one can use a layout Ψ which has a VP on each
physical link, i.e., Lmax (Ψ ) = 1, however such a layout has a hop count of
N − 1. The other extreme is connecting a direct VP from the root to each
other vertex, yielding Hmax = 1, but then Lmax = N − 1. For the intermediate
cases we need the following definitions.

Definition 6. Hopt(N, L) denotes the optimal hop count of any layout Ψ
on a chain of N vertices such that Lmax(Ψ) ≤ L, i.e., Hopt(N, L) ≡
$\min_\Psi \{H_{max}(\Psi) : L_{max}(\Psi) \le L\}$.

Definition 7. Lopt(N, H) denotes the optimal load of any layout Ψ on a chain
of N vertices such that Hmax(Ψ) ≤ H, i.e., Lopt(N, H) ≡
$\min_\Psi \{L_{max}(\Psi) : H_{max}(\Psi) \le H\}$.
Definition 8. Two VPs constitute a crossing if their endpoints $l_1, r_1$ and $l_2, r_2$
satisfy $l_1 < l_2 < r_1 < r_2$. A layout is called crossing-free if no pair of VPs
constitutes a crossing.

It is known ([GWZ95, ABCRS99]) that for each performance measure (Lmax,
Hmax, Lavg, Havg) there exists an optimal layout which is crossing-free. In the
rest of the paper we restrict ourselves to layouts viewed as a planar (that is,
crossing-free) embedding of a tree on the chain, also termed tree layouts. There-
fore, when no confusion occurs, we refer to each VP in a given layout Ψ as an edge
of Ψ.
Nshort (L H) denotes the length of a longest chain in which one node can
broadcast to all others, with at most H hops and a load bounded by L , for the
case of shortest paths. The similar measure for the general case is denoted by
N (L H) .

3 Optimal Solutions and Their Duality

In this section we present the optimal solutions for layouts, when messages have
to travel either along shortest paths or general paths. We’ll show the symmetric
role played by the load and hop count, and explain it via the corresponding
recurrence relations.

3.1 Optimal Virtual Path for the Shortest Path Case

Assuming that the leftmost node in the chain has to broadcast to each node to
its right, it is clear that, for given H and L , the largest possible chain for which
such a design exists is like the one shown in Fig. 1.

[Figure: a chain whose root is the leftmost node of a Tshort(L−1, H) tree layout and also of a Tshort(L, H−1) tree layout.]
Fig. 1. The tree layout Tshort(L, H)

Recall that Mshort(L, H) is the length of the longest chain in which a design
exists, for a broadcast from the leftmost node to all others, for given parameters
H and L. Mshort(L, H) clearly satisfies the following recurrence relation:
$$M_{short}(0, H) = M_{short}(L, 0) = 1, \qquad H, L \ge 0 \qquad (1)$$
$$M_{short}(L, H) = M_{short}(L, H-1) + M_{short}(L-1, H), \qquad H, L > 0$$
48 Shmuel Zaks

It easily follows that
$$M_{short}(L, H) = \binom{L+H}{H} \qquad (2)$$
This design is clearly symmetric in H and L, which establishes the first result
in which the load and hop count play symmetric roles.
Note that it is clear that the maximal number of nodes in a chain,
Nshort(L, H), to which one node can broadcast using shortest paths, satisfies
$$N_{short}(L, H) = 2\binom{L+H}{H} - 1 \qquad (3)$$
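The recurrence and its closed form are easy to check mechanically; the short Python sketch below is our own illustration (not part of the paper) and computes Mshort(L, H) from recurrence (1), comparing it with the binomial coefficient of equation (2) and exhibiting the symmetry in L and H.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def m_short(load, hops):
    # recurrence (1)
    if load == 0 or hops == 0:
        return 1
    return m_short(load, hops - 1) + m_short(load - 1, hops)

# equation (2): closed form, and the symmetry in load and hops
for L in range(6):
    for H in range(6):
        assert m_short(L, H) == comb(L + H, H) == m_short(H, L)

print(m_short(3, 2), comb(5, 2))   # 10 10
```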

The above discussion, and Fig. 1, clearly give rise to the trees Tshort(L, H)
defined as follows.
Definition 9. The tree layout Tshort(L, H) is defined recursively as follows.
Tshort(L, 0) and Tshort(0, H) are tree layouts with a unique node. Otherwise,
the root of a tree layout Tshort(L, H) is the leftmost node of a Tshort(L−1, H)
tree layout, and it is also the leftmost node of a tree layout Tshort(L, H−1).
Using these trees, it is easy to show that Lmax(Tshort(L, H)) = L and
Hmax(Tshort(L, H)) = H. The following two theorems follow:
Theorem 1. Consider a chain of N vertices and a maximal load requirement
L. Let H be such that
$$\binom{L+H-1}{L} < N \le \binom{L+H}{L}.$$
Then Hopt(N, L) = H.

Theorem 2. Consider a chain of N vertices and a maximal hop requirement
H. Let L be such that
$$\binom{L+H-1}{H} < N \le \binom{L+H}{H}.$$
Then Lopt(N, H) = L.

Optimal bounds were also derived in [GWZ95, GWZ97] for the average case,
using dynamic programming; the results use different recursive constructions,
but end up in structures that are symmetric in H and L. These results are
stated as follows:

Theorem 3. Let N and H be given. Let L be the largest integer such that
$N \ge \binom{L+H}{L}$, and let $r = N - \binom{L+H}{L}$. Then
$$L_{tot}(N, H) = H\binom{L+H}{L-1} + (L+1)\,r.$$
Theorem 4. Let N and L be given. Let H be the maximal integer such that
$N \ge \binom{L+H}{H}$, and let $r = N - \binom{L+H}{H}$. Then
$$H_{tot}(N, L) = L\binom{L+H}{H-1} + (H+1)\,r.$$

3.2 Optimal Virtual Path for the General Case


In the case where not only shortest paths are traversed, a new family of optimal
tree layouts T(L, H) is now presented.
Definition 10. The tree layout T(L, H) is defined recursively as follows.
Tright(L, 0), Tright(0, H), Tleft(L, 0) and Tleft(0, H) are tree layouts with a
unique node. Otherwise, the root is also the rightmost node of a tree layout
Tright(L, H) and the leftmost node of a tree layout Tleft(L, H), where the tree
layouts Tleft(L, H) and Tright(L, H) are also defined recursively as follows. The
root of a tree layout Tleft(L, H) is the leftmost node of a Tleft(L−1, H) tree
layout, and it is also connected to a node which is the root of a tree layout
Tright(L−1, H−1) and a tree layout Tleft(L, H−1) (see Fig. 2). Note that
the root of Tleft(L, H) is its leftmost node. The tree layout Tright(L, H) is de-
fined as the mirror image of Tleft(L, H).

[Figure: Tleft(L, H) decomposed into Tleft(L−1, H), Tright(L−1, H−1) and Tleft(L, H−1).]
Fig. 2. Tleft(L, H) recursive definition

Denote by N(L, H) the longest chain in which it is possible to connect one
node to all others, with at most H hops and the load bounded by L. From the
above, it is clear that this chain is constructed from two chains as above, glued
at their root. N(L, H) clearly satisfies the following recurrence relation:
$$N(0, H) = N(L, 0) = 1, \qquad H, L \ge 0 \qquad (4)$$
$$N(L, H) = N(L, H-1) + N(L-1, H) + N(L-1, H-1), \qquad H, L > 0$$

Again, the symmetric role of the hop count and the load is clear both from
the definition of the corresponding trees and from the recurrence relations that
compute their sizes.
It is known ( [GW70]) that the solution to the recurrence relation (4) is given
by
$$N(L, H) = \sum_{i=0}^{\min\{L, H\}} 2^i \binom{L}{i}\binom{H}{i}. \qquad (5)$$
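Analogously, the small Python check below (again our own illustration) evaluates recurrence (4) and the closed-form sum (5), confirming that both agree and are symmetric in L and H; in particular N(3, 2) = 25, as used in Section 5.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def n_general(load, hops):
    # recurrence (4), the general (non-shortest-path) case
    if load == 0 or hops == 0:
        return 1
    return (n_general(load, hops - 1) + n_general(load - 1, hops)
            + n_general(load - 1, hops - 1))

def sphere_count(load, hops):
    # closed form (5): lattice points of the L-dimensional l1-sphere of radius H
    return sum(2 ** i * comb(load, i) * comb(hops, i)
               for i in range(min(load, hops) + 1))

for L in range(6):
    for H in range(6):
        assert n_general(L, H) == sphere_count(L, H) == n_general(H, L)

print(n_general(3, 2))   # 25, the number of nodes of T(3, 2)
```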

4 Duality: Binary Trees and Ternary Trees

We saw in Section 3 that the layouts Tshort(L, H) and Tshort(H, L), and also
T(L, H) and T(H, L), have the same number of vertices. We now turn to show
that each pair within these virtual path layouts is, actually, quite strongly
related. In Section 4.1 we deal with layouts that use shortest-length paths, and
show their close relations to a certain class of binary trees, and in Section 4.2
we deal with the general layouts and show their close relations to a certain class
of ternary trees.

4.1 Tshort (L H ) and Binary Trees

In this section we show how to transform any layout Ψ, with hop count bounded
by H and load bounded by L, for layouts using only shortest paths, into a
layout (its dual) with hop count bounded by L and load bounded by H. In
particular, this mapping will transform Tshort(L, H) into Tshort(H, L).
To show this, we use a transformation between any layout with x edges (VPs)
and binary trees with x nodes (in a binary tree, each internal node has a
left child and/or a right child). We'll derive our main correspondence between
Tshort(H, L) and Tshort(L, H) for $x = N - 1$, where $N = \binom{L+H}{L}$. Our corre-
spondence is done in three steps, as follows.
Step 1: Given a planar layout Ψ we transform it into a binary tree T = b(Ψ),
under which each edge e is mapped to a node b(e), as follows. Let e = (r, v) be
the edge outgoing from the root r to the rightmost vertex v to which there is a VP
(we call this a 1-level edge). This edge e is mapped to the root b(e) of T. Remove
e from Ψ. As a consequence, two layouts remain: Ψ1 with root r and Ψ2 with
root v, where their roots are located at the leftmost vertices of both layouts.
Recursively, the left child of node b(e) will be b(Ψ1) and its right child will be
b(Ψ2). If any of the layouts Ψi is empty, so is its image b(Ψi) (in other words, we
can stop when a layout that consists of a single edge is mapped to a binary tree that
consists of a single vertex).
Step 2: Build the binary tree that is the reflection of T (that is, we exchange
the left child and the right child of each vertex).
Step 3: We transform the reflected tree back into the (unique) layout whose image
under b is the reflected tree; this layout is the dual of Ψ.

Example 1. In Fig. 3 the layouts for L = 2, H = 3 and L = 3, H = 2 are shown,
together with the corresponding trees Tshort(2, 3) and Tshort(3, 2), and the corre-
sponding binary trees constructed as explained above. The edge e in the layout
Tshort(3, 2) is assigned the vertex b(e) in the corresponding tree b(Tshort(3, 2)).
[Figure: the layouts Tshort(3, 2) and Tshort(2, 3), annotated with the load of each physical link and the hop count of each vertex, together with the corresponding binary trees b(Tshort(3, 2)) and b(Tshort(2, 3)).]
Fig. 3. An example of the transformation using binary trees

Given a non-crossing layout Ψ, we define the level of an edge e in Ψ, denoted
levelΨ(e) (or level(e) for short), to be one plus the number of edges above e in
Ψ. In addition, to each edge e of the layout Ψ we assign its farthest end-point
from the root, v(e).

Example 2. In Fig. 3 the edge e in the layout Tshort(3, 2) is assigned the vertex
v(e) in this layout, and its level level(e) is 2.

One of our key observations is the following theorem:

Theorem 5. For every H and L, the trees b(Tshort(L, H)) and b(Tshort(H, L))
are reflections of each other.

This clearly establishes a one-to-one mapping between these trees, and thus
establishes the required duality.
To further investigate the structure of these trees, we now turn to explore the
properties of the binary trees that we have defined above. We prove the following
theorem:

Theorem 6. Given a layout Ψ, let T = b(Ψ) be the binary tree assigned to it by
the transformation above. Let $d^L_T(v)$ ($d^R_T(v)$) be equal to one plus the number of
left (right) steps in the path from the root to v, for every node v in T. Then,
for every edge e in the layout Ψ:
1. $H_\Psi(v(e)) = d^R_T(b(e))$, and
2. $level(e) = d^L_T(b(e))$.
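The bookkeeping of Theorem 6 is easy to mimic on a toy representation. In the hedged sketch below, a binary tree is a nested pair (left, right) with None for a missing child; this encoding and the function names are our own. Reflecting the tree (Step 2 above) swaps the two depth statistics of Theorem 6, which is exactly the exchange of the hop-count and load multisets.

```python
# a binary tree is None (absent) or a pair (left, right); a leaf is (None, None)
def reflect(t):
    """Exchange left and right children everywhere (Step 2 of the construction)."""
    if t is None:
        return None
    left, right = t
    return (reflect(right), reflect(left))

def depth_pairs(t, d_left=1, d_right=1, out=None):
    """Per Theorem 6: collect (1 + #left steps, 1 + #right steps) for every node.
    The second components form the hop-count multiset of the layout; the first
    components form the load multiset of its physical links."""
    if out is None:
        out = []
    if t is None:
        return out
    out.append((d_left, d_right))
    left, right = t
    depth_pairs(left, d_left + 1, d_right, out)
    depth_pairs(right, d_left, d_right + 1, out)
    return out

# tiny example: reflection swaps the two multisets, i.e. loads and hop counts
t = ((None, (None, None)), None)
pairs, pairs_dual = depth_pairs(t), depth_pairs(reflect(t))
assert sorted(b for a, b in pairs) == sorted(a for a, b in pairs_dual)
assert sorted(a for a, b in pairs) == sorted(b for a, b in pairs_dual)
```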

Given a non-crossing layout Ψ, to each physical link e′ we assign an edge
of Ψ that includes it and is of highest level (such an edge exists due to the
connectivity and planarity of the layout; see the physical link e′ and its assigned
edge in Fig. 3).

Lemma 1. Given a non-crossing tree layout Ψ, the mapping of a physical link
e′ to the edge of Ψ assigned to it above is one-to-one.

Proposition 1. Given a non-crossing tree layout Ψ over a physical network, let
T = b(Ψ) be the binary tree assigned to it. Then L(e′) equals the level of the edge
of Ψ assigned to e′, for every edge e′ in the physical network.

Given a layout Ψ over a chain network, if we consider the multiset $\{d^R_T(v) : v \in b(\Psi)\}$
we get exactly the multiset of hop counts of the vertices of this network
(by Theorem 6), and if we consider the multiset $\{d^L_T(v) : v \in b(\Psi)\}$ we get exactly
the multiset of loads of the physical links of this network (by Theorem 6 and
Proposition 1). Using this, and finding the dual layout whose binary tree is the
reflection of T, we observe that the multiset of hop counts of Ψ is exactly
the multiset of loads of its dual, and the multiset of loads of Ψ is exactly the multiset
of hop counts of its dual, thus deriving a complete combinatorial explanation for the
symmetric results of Section 3.1, for either the worst-case trees or the average-case
trees:

Theorem 7. Given an optimal layout Ψ with N nodes, load bounded by L and
optimal hop count Hopt(N, L), its dual layout has N nodes, hop count bounded
by L and optimal load Hopt(N, L).

Theorem 8. Given an optimal layout Ψ with N nodes, hop count bounded by
H and optimal load Lopt(N, H), its dual layout has N nodes, load bounded by
H and optimal hop count Lopt(N, H).

Theorem 9. Given an optimal layout Ψ with N nodes, load bounded by L and
optimal average hop count, its dual layout has N nodes, hop count bounded by
L and optimal average load.

Theorem 10. Given an optimal layout Ψ with N nodes, hop count bounded by
H and optimal average load, its dual layout has N nodes, load bounded by
H and optimal average hop count.
4.2 T(L, H) and Ternary Trees

We now extend the technique developed in Section 4.1 to general path case
layouts; we show how to transform any layout Ψ with hop count bounded by H
and load bounded by L into a layout (its dual) with hop count bounded by
L and load bounded by H. In particular, this mapping will transform T(L, H)
into T(H, L).
To show this, we use a transformation between any layout with x edges (VPs)
and ternary trees with x nodes (in a ternary tree, each internal node has a left
child and/or a middle child and/or a right child). Our correspondence is done
in three steps, as follows.
Step 1: Given a planar layout Ψ we transform it into a ternary tree T = t(Ψ),
under which each edge e is mapped to a node t(e), as follows. Let e = (r, v) be
the edge outgoing from the root r to the rightmost vertex v to which there is a VP
(we call this a 1-level edge). This edge e is mapped to the root t(e) of T. Remove e
from Ψ. As a consequence, three layouts remain: Ψ1 with root r and Ψ3 with
root v (their roots are located at the leftmost vertices of these layouts), and
Ψ2 with root v (where v is its rightmost vertex). Recursively, the left child
of node t(e) will be t(Ψ1), its middle child will be t(Ψ2) and its right child will
be t(Ψ3). If any of the layouts Ψi is empty, so is its image t(Ψi) (in other words,
we can stop when a layout that consists of a single edge is mapped to a ternary tree
that consists of a single vertex).
Step 2: Build the ternary tree that is the reflection of T (that is, we exchange
the left child and the right child of each vertex; the middle child does not change).
Step 3: We transform the reflected tree back into the (unique) layout whose image
under t is the reflected tree; this layout is the dual of Ψ.
See Fig. 4 for an example of this transformation.

[Figure: the layouts Tleft(3, 2) and Tleft(2, 3), annotated with the load of each physical link and the hop count of each vertex, together with the corresponding ternary trees t(Tleft(3, 2)) and t(Tleft(2, 3)).]
Fig. 4. An example of the transformation using ternary trees


One of our key observations is the following theorem:

Theorem 11. For every H and L, the trees t(T(L, H)) and t(T(H, L)) are
reflections of each other.

This clearly establishes a one-to-one mapping between these trees, and thus
establishes the required duality.
To further investigate the structure of these trees, we now turn to explore
the properties of the ternary trees that we have defined above. We prove the
following theorem. Note that the definitions of the level (of an edge) and of the
assigned edge (of a physical link) remain exactly the same as in Section 4.1.

Theorem 12. Given a layout Ψ, let T = t(Ψ) be the ternary tree assigned to
it by the transformation above. Let $d^{LM}_T(v)$ ($d^{RM}_T(v)$) be equal to one plus the
number of left and middle (right and middle) steps in the path from the root
to v, for every node v in T. Then, for every edge e in the layout Ψ:
1. $H_\Psi(v(e)) = d^{RM}_T(t(e))$, and
2. $level(e) = d^{LM}_T(t(e))$.

Proposition 2. Given a non-crossing tree layout Ψ over a physical network, let
T = t(Ψ) be the ternary tree assigned to it. Then L(e′) equals the level of the edge
of Ψ assigned to e′, for every edge e′ in the physical network.

Given a layout Ψ over a chain network, if we consider the multiset
$\{d^{RM}_T(v) : v \in t(\Psi)\}$ we get exactly the multiset of hop counts of the vertices of
this network (by Theorem 12), and if we consider the multiset $\{d^{LM}_T(v) : v \in t(\Psi)\}$
we get exactly the multiset of loads of the physical links of this network (by The-
orem 12 and Proposition 2). Using this, and finding the dual layout whose ternary
tree is the reflection of T, we observe that the multiset of hop counts of
Ψ is exactly the multiset of loads of its dual, and the multiset of loads of Ψ is exactly the
multiset of hop counts of its dual, thus deriving a complete combinatorial explanation
for the symmetric results of either the worst-case trees or average-case trees in
the general path case.
Following the above discussion, we obtain exact analogues of the four theorems
(Theorems 7, 8, 9 and 10) for the general path case layouts.

5 Duality: Lattice Paths and High-Dimensional Spheres


5.1 Lattice Paths
The recurrence relation (1) clearly corresponds to the number of lattice paths
from the point (0,0) to the point (L, H) that use only horizontal (right) and
vertical (up) steps.
In Fig. 5 each lattice point is labeled with the number of lattice paths from
(0,0) to it; the calculation is done by recurrence relation (1).
[Figure: lattice points from (0,0) to (3,2), with the horizontal axis labeled load and the vertical axis labeled hops; each point carries the number of paths reaching it.]
Fig. 5. Lattice paths with regular steps


For the case L = 3 and H = 2 one gets $\binom{3+2}{2} = 10$; this corresponds to the number of nodes in
the tree Tshort(3, 2) (see Fig. 3), and to the number of paths that go from (0,0)
to (3,2).
The recurrence relation (4) clearly corresponds to the number of lattice paths
from the point (0,0) to the point (L, H) that use horizontal (right), vertical (up),
and diagonal (up-right) steps. In Fig. 6 each lattice point is labeled with the
number of lattice paths from (0,0) to it. For the case L = 3 and H = 2 one gets
25 such paths. This corresponds to the number of nodes in the tree T(3, 2)

hops
(3, )
5 13 5
1

1 3 5 7

1 load
(0,0) 1 1 1

Fig. 6. Lattice paths with regular and diagonal steps

(see Fig. 7), which is constructed from two trees glued at their roots: the one depicted
in Fig. 3 (containing 13 vertices) and its corresponding reverse tree.
We also refer to these lattice paths in Section 5.2.
Fig. 7. The tree T (3 2)

5.2 Spheres
Consider the set of lattice points (that is, points with integral coordinates)
of an L-dimensional l1-sphere of radius H. The points in this sphere are L-
dimensional vectors $v = (v_1, v_2, \ldots, v_L)$, where $|v_1| + |v_2| + \cdots + |v_L| \le H$. Let
Sp(L, H) be the number of lattice points in this sphere. Let Rad(N, L) be
the radius of the smallest L-dimensional l1-sphere containing at least N internal
lattice points.
It can be shown that
Theorem 13. The tree T(L, H) contains Sp(L, H) vertices.
The exact number of points in this sphere is given by equation (5). (This was
studied, in connection with codewords, in [GW70].)
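Theorem 13 and equation (5) can also be checked directly by brute force; the short Python sketch below is our own illustration and counts the lattice points of the L-dimensional l1-sphere of radius H, comparing the count with the sum in (5).

```python
from itertools import product
from math import comb

def sp_bruteforce(dim, radius):
    """Count lattice points v in Z^dim with |v_1| + ... + |v_dim| <= radius."""
    return sum(1 for v in product(range(-radius, radius + 1), repeat=dim)
               if sum(abs(c) for c in v) <= radius)

def sp_formula(dim, radius):
    # equation (5)
    return sum(2 ** i * comb(dim, i) * comb(radius, i)
               for i in range(min(dim, radius) + 1))

for L in range(1, 5):
    for H in range(0, 5):
        assert sp_bruteforce(L, H) == sp_formula(L, H)

print(sp_bruteforce(3, 2))   # 25 = |T(3, 2)|
```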
Moreover, we can show that
Theorem 14. Consider a chain of N vertices and a maximal load requirement
L . Then Hopt (N L ) = Rad(N L ) .
This theorem is proved by showing a one-to-one mapping between the nodes
of any layout with hop count bounded by H and load bounded by L into the
L -dimensional spheres of radius H . This mapping turns out to be very useful
in the analysis of this and related analytical results (see also Section 6).
Using the above correspondences and discussion, it is possible to show that,
for either the shortest paths case or the general case, any optimal layout Ψ with
N nodes, load bounded by L and optimal hop count, has also optimal average
hop count regarding layouts with load bounded by L , and that any optimal
layout Ψ with N nodes, hop count bounded by H and optimal load, has also
optimal average load regarding layouts with hop count bounded by H .
We now sketch a one-to-one mapping between the set of lattice points of
the L-dimensional sphere of radius H and the set of lattice paths from (0,0)
to (L, H) that use horizontal, vertical or (up-)diagonal steps. We first describe
a function which maps every vector $v = (v_1, \ldots, v_L)$ in Sp(L, H) into such a
lattice path. Starting from (0,0) make $|v_1|$ vertical steps and one horizontal step,
make $|v_2|$ vertical steps and one horizontal step, ..., make $|v_L|$ vertical steps and
one more horizontal step, ending with $H - \sum_{i=1}^{L} |v_i|$ vertical steps. After that,
for every negative component $v_i$ of v, we replace the $|v_i|$-th vertical step and
the subsequent horizontal step done during the translation of this component
by an (up-)diagonal step. A close look at the properties of these paths enables
us to further explore the properties of these trees. Returning to the discussion
of the layouts Tshort(L, H) that use only shortest paths, it is possible to find a
similar correspondence between the vertices of these trees and lattice paths from
(0,0) to (L, H) that use only vertical and horizontal steps, and to view some
properties of these trees using these lattice paths.

6 Discussion
We presented the dual role played by the parameters of load and hop count in
optimal designs of virtual path layouts in ATM chain networks, for the cases
of shortest path routes and the general case. We discussed these dualities with
the aid of recurrence relations, one-to-one correspondences with binary trees (for
the shortest paths case) and ternary trees (for the general case), lattice paths
(that use horizontal and vertical steps for the shortest path case, and that also
use diagonal steps for the general case), and high dimensional spheres (in the
general case). These dualities shed light on the structure of the optimal solutions,
and simplify some of the proofs.
It might be of interest to further explore such duality relations between these
and corresponding parameters (such as load measured at vertices, e.g., [FNP97])
also for other topologies, such as trees ([CGZ94, G95]), meshes or planar graphs
([BG97, BBGG97, G95]); one might also consult the survey in [Z97] for a general
discussion of these extensions.
Of special interest is the use of geometry presented in Section 5.2. This
approach provides a clear view of the structure of the solution, and enabled
improving the results for the all-to-all problem on a ring network (see [DFZ97] for
details).
Acknowledgments: I would like to thank my coauthors (Marcelo Feighelstein,
Ori Gerstel, Avishai Wool, Israel Cidon and Yefim Dinitz), and Renzo Sprug-
noli, Donatella Merlini, Cecilia Verri, Ron Aharoni and Noga Alon for helpful
discussions.

References
[ABCRS99] W. Aiello, S. Bhatt, F. Chung, A. Rosenberg, and R. Sitaraman, Augmented
ring networks, Proceedings of the 6th International Colloquium on Structural
Information and Communication Complexity (SIROCCO), Lacanau-Ocean, France,
1999, pp. 1-16.
[BG97] L. Becchetti and C. Gaibisso, Lower bounds for the virtual path layout problem
in ATM networks, Proceedings of the 24th Seminar on Theory and Practice of
Informatics (SOFSEM), Milovy, The Czech Republic, November 1997, pp. 375-382.
[BBGG97] L. Becchetti, P. Bertolazzi, C. Gaibisso and G. Gambosi, On the design of
efficient ATM routing schemes, submitted, 1997.
[CGZ94] I. Cidon, O. Gerstel and S. Zaks, A scalable approach to routing in ATM
networks, 8th International Workshop on Distributed Algorithms (WDAG),
Lecture Notes in Computer Science 857, Springer-Verlag, Berlin, 1994, pp. 209-222.
[DFZ97] Ye. Dinitz, M. Feighelstein and S. Zaks, On optimal graphs embedded into
paths and rings, with analysis using l1-spheres, 23rd International Workshop
on Graph-Theoretic Concepts in Computer Science (WG), Berlin, Germany,
June 1997.
[F98] M. Feighelstein, Virtual path layouts for ATM networks with unbounded
stretch factor, M.Sc. Dissertation, Department of Computer Science, Tech-
nion, Haifa, Israel, May 1998.
[FNP97] M. Flammini, E. Nardelli and G. Proietti, ATM layouts with bounded hop
count and congestion, Proceedings of the 11th International Workshop on Dis-
tributed Algorithms (WDAG), Saarbrücken, Germany, September 1997, pp. 24-26.
[FZ98] M. Feighelstein and S. Zaks, Duality in chain ATM virtual path layouts, Pro-
ceedings of the 4th International Colloquium on Structural Information and
Communication Complexity (SIROCCO), Monte Verita, Ascona, Switzerland,
July 24-26, 1997, pp. 228-239.
[G95] O. Gerstel, Virtual Path Design in ATM Networks, Ph.D. thesis, Department
of Computer Science, Technion, Haifa, Israel, December 1995.
[GW70] S. W. Golomb and L. R. Welch, Perfect Codes in the Lee Metric and the
Packing of Polyominoes, SIAM Journal on Applied Math., vol. 18, no. 2, January
1970, pp. 302-317.
[GWZ95] O. Gerstel, A. Wool and S. Zaks, Optimal layouts on a chain ATM network,
Discrete Applied Mathematics, special issue on Network Communications, 83,
1998, pp. 157-178.
[GWZ97] O. Gerstel, A. Wool and S. Zaks, Optimal Average-Case Layouts on Chain
Networks, Proceedings of the Workshop on Algorithmic Aspects of Commu-
nication, Bologna, Italy, July 11-12, 1997.
[KKP95] E. Kranakis, D. Krizanc and A. Pelc, Hop-congestion tradeoffs for ATM
networks, 7th IEEE Symp. on Parallel and Distributed Processing, pp. 662-668.
[SV96] L. Stacho and I. Vrt'o, Virtual Path Layouts for Some Bounded Degree Net-
works, 3rd International Colloquium on Structural Information and Commu-
nication Complexity (SIROCCO), Siena, Italy, June 1996.
[Z97] S. Zaks, Path Layout in ATM Networks, The DIMACS Workshop on Networks
in Distributed Computing, DIMACS Center, Rutgers University, October 26-
29, 1997, pp. 145-160.
The Independence Number of Random Interval
Graphs

W. Fernandez de la Vega

Laboratoire de Recherche en Informatique


Universite de Paris-Sud, bat 490
91405 Orsay Cedex
lalo@lri.fr

Abstract. It is proved, sharpening previous results of Scheinerman and
by analysing an algorithm, that the independence number of the random
interval graph, defined as the intersection graph of n intervals whose end
points are chosen at random on [0,1], concentrates around $2\sqrt{n/\pi}$.

Key Words: Random Graphs, Analysis of Algorithms, Probabilistic Methods

1 Introduction
Scheinerman defined [1] a random interval graph on n vertices as the intersection
graph of n intervals $[X_1, Y_1], \ldots, [X_i, Y_i], \ldots, [X_n, Y_n]$ whose end points are chosen
at random on [0,1]. Hence here we start with 2n independent random variables
$Z_1, Z_2, \ldots, Z_{2n}$ independently and uniformly distributed on [0,1], and we put, for
$1 \le i \le n$, $X_i = \min\{Z_{2i-1}, Z_{2i}\}$ and $Y_i = \max\{Z_{2i-1}, Z_{2i}\}$. Scheinerman
derived many interesting properties of these graphs. Here we answer one of the
questions that Scheinerman left open, namely we derive an asymptotic equivalent
for the independence number of these graphs. The main ingredient in our proof
is the analysis of an algorithm. Recall that the independence number of a graph
is the maximum cardinality of a subset of vertices which span no edge.
Theorem. Let $G_n$ denote the random graph defined as the intersection graph
of n intervals whose end points are chosen at random on [0,1]. The independence
number $\alpha(G_n)$ of this graph satisfies
$$\frac{\alpha(G_n)}{2\sqrt{n/\pi}} \to 1$$
in probability as $n \to \infty$.

2 Proof of the Theorem


The proof uses a greedy algorithm. Let again $[X_1, Y_1], \ldots, [X_i, Y_i], \ldots, [X_n, Y_n]$
denote our random intervals. We call Xi (resp. Yi ) the left (resp. the right) end of

G. Bongiovanni, G. Gambosi, R. Petreschi ( ds.): CIAC 2000, LNCS 1767, pp. 59 62, 2000.

c Springer-Verlag Berlin Heidelberg 2000
[Xi, Yi]. We put Ii = [Xi, Yi]. We also denote by Ii the vertex of Gn corresponding
to the interval Ii. We say that [Xj, Yj] lies at the right of [Xi, Yi] if we have
Xj > Yi. It is clear that we can assume that the leftmost interval representing
a vertex of an independent set of maximum cardinality is the interval Ii with
leftmost right end. This implies by induction that the independent set J1, J2, ...
of Gn given by the following algorithm has maximum cardinality (here each Jk is
equal to some Il).
Algorithm Al n
1. Define J1 as the interval Ii with leftmost right extremity and set k = 1.
2. If there is no interval Ii lying at the right of Jk, put α(Gn) = k and stop.
Else define Jk+1 as the interval Ii lying at the right of Jk which has the leftmost
right extremity, set k = k + 1 and go to 2.
This concludes the description of our algorithm.
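For concreteness, here is a short Python rendering of this greedy algorithm; scanning intervals by increasing right endpoint is an equivalent O(n log n) implementation of "pick the leftmost right extremity". The function names, the random-instance generator, and the final comparison against 2√(n/π) (the constant as reconstructed in the Theorem above) are our own additions.

```python
import random
from math import pi, sqrt

def independence_number(intervals):
    """Greedy algorithm from above: repeatedly pick, among the intervals lying
    strictly to the right of the last chosen one, the one with leftmost right end."""
    chosen_right = float("-inf")
    count = 0
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > chosen_right:          # interval lies at the right of the last pick
            count += 1
            chosen_right = right
    return count

def random_interval_graph_alpha(n):
    intervals = []
    for _ in range(n):
        a, b = random.random(), random.random()
        intervals.append((min(a, b), max(a, b)))
    return independence_number(intervals)

# quick sanity check of the asymptotics (2*sqrt(n/pi) under the reconstruction above)
n = 100_000
print(random_interval_graph_alpha(n), 2 * sqrt(n / pi))
```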

2.1 Some Preliminary Results

We begin by restating for ease of reference two well known inequalities concerning
the tail of the Binomial distribution.
Fact 1 (Hoeffding-Chernoff bounds). Let $S_{n,p}$ denote the sum of n 0-1 val-
ued independent random variables $X_1, \ldots, X_n$ with $P[X_i = 1] = p$, $1 \le i \le n$.
We have then, for $0 \le \epsilon \le 1$,
$$P[S_{n,p} \le (1-\epsilon)np] \le e^{-\epsilon^2 np/3} \qquad (1)$$
and
$$P[S_{n,p} \ge (1+\epsilon)np] \le e^{-\epsilon^2 np/2} \qquad (2)$$
(2)
We will need the following easy results concerning the distribution of the Ij ’s.
Fact 2. Suppose that $I_1, \ldots, I_m$ are m random intervals contained in the seg-
ment $J = [0, l]$, and let x denote the distance between the left end of J and the
leftmost of the right ends of the $I_i$'s. We have
$$E(x) \le \frac{l}{2}\sqrt{\frac{\pi}{m}}$$
and
$$E(x^2) \le \frac{l^2}{m}.$$
Proof. The probability that the right end of $I_1$ lies at a distance greater than x
from the left end of J is equal to $1 - (x/l)^2$. The probability that this is true for
every interval $I_i$ is thus equal to $[1 - (x/l)^2]^m$. Hence we have
$$E(x) = -\int_0^l t\, d[1 - (t/l)^2]^m = \int_0^l [1 - (t/l)^2]^m dt \le \int_0^l \exp\left(-\frac{mt^2}{l^2}\right) dt \le \frac{l}{2}\sqrt{\frac{\pi}{m}}$$
and
$$E(x^2) = -\int_0^l t^2\, d[1 - (t/l)^2]^m = 2\int_0^l [1 - (t/l)^2]^m t\, dt \le 2\int_0^l e^{-mt^2/l^2} t\, dt \le \frac{l^2}{m}.$$

Fact 3. Let m(t) denote the number of intervals $I_i$ which lie completely in
the interval $[1-t, 1]$. We have, with probability 1 − o(1),
$$nt^2 - 5t\sqrt{n\log n} \le m(t) \le nt^2 + 5\sqrt{n\log n}, \qquad n^{-1/2} \le t \le 1.$$
Proof. Use the fact that m(t) is, for fixed t, a binomial random variable with
parameters n and $p = t^2$.

2.2 Analysis of the Algorithm Al n

We set $x_0 = 1$ and for each $i \ge 1$ we denote by $x_i$ the distance between the
rightmost extremity of the interval $J_i$ and the right extremity of [0, 1]. We denote
by $n_i$ the number of intervals $I_j$ which lie at the right of $x_i$. (Here $x_i$ and $n_i$ are
random variables which are defined for each value of i which does not exceed
the independence number.) Let us first observe that, since the restriction of the
uniform distribution on [0, 1] to any subinterval is again uniform, it follows that,
conditionally on $x_i$ and $n_i$, the $n_i$ intervals $I_j$ which are contained in $[x_i, 1]$ are
independently and uniformly distributed on this interval.
Let us denote by $B_i$ the σ-field generated by the random variables $x_0, n_0,$
$x_1, n_1, \ldots, x_i, n_i$. Let us define
$$i_o = \max\{j : m(x_i) \ge nx_i^2 - x_i\sqrt{7n\log n},\ 1 \le i \le j\}$$
and $j_o$ as the last value (if any) of the index i for which the inequality
$x_i \ge n^{-1/4}\sqrt{7\log n}$ is satisfied. If there is no such value we set $j_o = n$. Let
$k_o = \min\{i_o, j_o\}$. We define a new process $(y_i, n_i)$ by putting $y_i = x_i$ if $i \le k_o$ and
$y_i = x_{k_o}$ ($= y_{k_o}$) if not. Obviously this new process is also measurable relatively
to the family of σ-fields $(B_i)$. Since the conditional expectation of the difference
$\Delta_{i+1} = y_i - y_{i+1}$ is, for a given $y_i$, a decreasing function of $n_i$, we have, for
$i \le i_o - 1$, using Fact 2 with $l = y_i$, $m = ny_i^2 - y_i\sqrt{7n\log n}$,
$$E_{B_i}(\Delta_{i+1}) \le \frac{y_i}{2}\sqrt{\frac{\pi}{ny_i^2 - y_i\sqrt{7n\log n}}} \le \frac{1}{2}\sqrt{\frac{\pi}{n(1 - n^{-1/4})}}$$
and this inequality is obviously also true for $i \ge k_o$ since then $\Delta_{i+1}$ vanishes. It
implies that the sequence $(z_i)$ defined by
$$z_i = y_i + \frac{i}{2}\sqrt{\frac{\pi}{n(1 - n^{-1/4})}} \qquad (3)$$
is a supermartingale relatively to the family of σ-fields $(B_i)$. Let us put
$$l = 2\left(1 - 4n^{-1/4}\log^{1/2} n\right)\left(\frac{n(1 - n^{-1/4})}{\pi}\right)^{1/2}.$$
We have
$$\mathrm{Var}\, z_l \le \sum_{i=1}^{l} \mathrm{Var}(y_{i-1} - y_i) \le Cn^{-1/2}$$
since, by Facts 2 and 3, each of the terms in this sum is bounded above by
$Cn^{-1}$, where C is an absolute constant. Observing that $z_i \le 1$ and using
Kolmogorov's inequality for martingales, we get
$$P\left[z_i \le 1 - n^{-1/4}\log^{1/2} n,\ 1 \le i \le l\right] \ge 1 - \frac{C^2}{\log n}.$$
Replacing i by l in (3) we get the inequality $y_l \le z_l - 1 + 4n^{-1/4}\log^{1/2} n$, which
gives, with the preceding inequality,
$$P\left[y_l \le 3n^{-1/4}\log^{1/2} n\right] = 1 - o(1),$$

that is, with probability 1 − o(1), we have $l \le j_o$. Since we have also $l \le i_o$
with probability 1 − o(1), it follows that, again with probability 1 − o(1), the
process $(y_i)$ coincides with the process $(x_i)$ up to time l. This means that the
independence number of our interval graph is at least $l = 2(n/\pi)^{1/2}(1 - o(1))$, and this
concludes the first part of the proof. For the second part, that is, in order to
prove that the independence number is bounded above by $2(n/\pi)^{1/2}(1 + o(1))$, it suffices
to repeat essentially the same arguments, using inequalities reverse to those we
have used. The details are omitted.

3 Conclusion
By analysing an algorithm, we have obtained an asymptotic equivalent to the
independence number of a random interval graph in the sense of Scheinerman.
An open problem is to find an asymptotic equivalent to the independence number
of a genuine random interval graph, in the model where every possible interval
graph on n vertices is equally likely.

References
1. Scheinerman, E.R., Random Interval Graphs, Combinatorica 8 (1988) 357-371.
Online Strategies for Backups

Peter Damaschke

FernUniversität, Theoretische Informatik II


58084 Hagen, Germany
Peter.Damaschke@fernuni-hagen.de

Abstract. We consider strategies for full backups from the viewpoint
of competitive analysis of online problems. We concentrate upon the re-
alistic case that faults are rare, i.e. the cost of work between two faults is
typically large compared to the cost of one backup. Instead of the (worst-
case) competitive ratio we use a refined and more expressive quality
measure, in terms of the average fault frequency. This is not standard in
the online algorithm literature. The interesting matter is, roughly speak-
ing, to adapt the backup frequency to the fault frequency, while future
faults are unpredictable. We give an asymptotically optimal determin-
istic strategy and propose a randomized strategy whose expected cost
beats the deterministic bound.

1 Introducing the Backup Problem


The main method to protect data from loss (due to power failure, physical de-
struction of storage media, deletion by accident etc.) is to save the current status
of a file or project from time to time. Such a full backup incurs some cost, but
loss of data is also very costly and annoying, and faults are unpredictable. So it
is a natural question what competitve analysis of online problems can say about
backup (or autosave) strategies.
We consider the following basic model. Some file (or file system, project etc.)
is being edited, while faults can appear. The cost of work per time is assumed
to be constant. Every backup incurs a fixed cost as well. We may w.l.o.g. choose
the time unit and cost unit in such a way that every unit of working time and
every backup incurs cost 1. In case of a fault, all work done after the most
recent backup is lost and must be repeated. Before this, we have to recover the
last consistent status from the backup, which incurs cost R (a constant ratio of
recovery and backup cost). The goal is to minimize the total cost of a piece of
work, which is the sum of costs for working time (including repetitions), backups
and recoveries.
This seems to be the simplest reasonable model and a good starting point
for studying online backup strategies. Perhaps the main criticism is concerned
with the constant backup cost. Usually they depend somehow on the amount of
changings (such as incremental backups). However, the constant cost assumption
is also suitable in some cases, e.g. if the system always saves the entire file though
the changings are minor updates, or if the save operation has large constant setup
cost whereas the amount of data makes no difference.

Extended models may, of course, take more aspects into account: faults which
have more fatal implications than just repeated work (such as loss of hardly
retrievable data etc.), backups which may be faulty in turn, or repeated work which
is done faster than the first time, to mention a few.
It is usual in competitive analysis to compare the costs of an online strategy
to the costs of a clairvoyant (offline) strategy which has prior knowledge of the
problem instance (here: the times when faults appear). Their worst-case ratio is
called the competitive ratio. We remark that, for our problem, an optimal offline
strategy is easy to establish but not useful in developing online strategies, so we
omit this subject.
Only a few online problems have been considered where a problem instance
merely consists of a sequence of points in time: rent-to-buy (or spin-block),
acknowledgement delay, and some special online scheduling problems fall into
this category, cf. the references.

2 Rare Faults and a Reformulation

Consider a piece of work that requires p time units and is to be carried out
nonstop. A time interval I of length p is earmarked for this job. Let n be the
number of faults that will appear during I. The ratio $f = n/p$ is referred to
as the average fault frequency, with respect to I. In most realistic scenarios f
is quite small compared to 1, i.e. the time equivalent of the cost of one backup
is much smaller than the average distance between faults, thus we will focus
attention on this case of rare faults.
If the online player knew f in advance (but not the times the faults
appear at), it would not be a bad idea to make a backup every $1/\sqrt{f}$ time units.
Namely, the backup cost per time unit is $\sqrt{f}$, and every fault destroys work to
the value of at most $1/\sqrt{f}$, hence the average cost of work to be repeated is
bounded by $\sqrt{f}$ per time unit. The $\sqrt{f}$ fraction of work which got lost must
be fetched later, immediately after I. New faults can occur in this extra time
interval, but this adds lower-order terms to the costs. Hence the cost per time
is at most $1 + 2\sqrt{f} + Rf$, and the stretch (i.e. ratio of completion time and
productive time) is $1 + \sqrt{f}$, subject to O(f) terms.
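The accounting in this paragraph can be mimicked numerically. The sketch below is our own simplified check: it ignores the recovery cost R and the schedule shift caused by redoing lost work (which the text argues are lower-order terms), totals work, backups, and lost work for a fixed backup period of 1/√f, and compares the cost per time unit with 1 + 2√f.

```python
import random
from math import sqrt

def cost_per_time(fault_times, total_time, period):
    # work + backups + work lost at each fault (time since the most recent backup);
    # recovery cost and the extra time needed to redo lost work are ignored here
    backups = int(total_time // period)
    lost = sum(t % period for t in fault_times)
    return (total_time + backups + lost) / total_time

f, total_time = 1e-4, 100_000.0
faults = sorted(random.uniform(0, total_time) for _ in range(int(f * total_time)))
print(cost_per_time(faults, total_time, 1 / sqrt(f)), 1 + 2 * sqrt(f))
```

The simulated value is typically somewhat below the bound, since 1 + 2√f accounts for the worst case in which each fault destroys a full backup interval.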
An offline player can trivially succeed with $1 + (1+R)f = 1 + O(f)$ average
cost per time, making a backup immediately before each fault. (This is not
necessarily optimal.) We shall see below that any online strategy incurs cost at
least $1 + \sqrt{f}$ per time unit, in the worst case. Hence the competitive ratio still
behaves as $1 + \Omega(\sqrt{f})$, for small f. This suggests to simply consider the cost per
time incurred by an online strategy, rather than the slightly smaller competitive
ratio. In particular, the constant R we have preliminarily introduced appears in
O(f) only, thus we will suppress it henceforth. We also omit the summand 1 for
normal work and merely consider the additional cost for backups and lost (i.e.
repeated) work per time. Throughout the paper, this is called the excess rate.
The next simple observation shows that the average fault frequency is intrin-
sic in the excess rate, as announced. (Similarly one can realize that no competi-
tive online strategy exists if faults are unrestricted.)

Proposition 1. For any fault frequency f , even if the online player knows f
beforehand, no deterministic backup strategy can guarantee an excess rate below
$\sqrt{f}$.

Proof. An adversary partitions the time axis into phases of length $1/f$ and places
one fault in each phase by the following rule: If there occurs a gap of $1/\sqrt{f}$ time
units between backups then the adversary injects a fault there, hence the online
player loses $1/\sqrt{f}$ time, so the ratio of lost time is $\sqrt{f}$. Otherwise, if the distance
between consecutive backups were always smaller than $1/\sqrt{f}$ then more than
$1/\sqrt{f}$ backups have been made in the phase, so the backup cost per time is at least $\sqrt{f}$. In
this case the adversary injects a fault at the end of the phase, to keep the fault
frequency f. □

Thus the excess rate is some $c\sqrt{f}$, and the main interesting matter is to adapt
the backup frequency so as to achieve the best coefficient c, under the realistic
assumption that the online player has no previous knowledge of the faults at all.
Some remarks suggest that other objectives would be less interesting: (1)
One might also study the excess rate in terms of the smallest fault distance d,
rather than the fault frequency f. However this is not very natural, since a pair
of faults occurring close together may be an exceptional event, and d can only
decrease in time, so using d as a parameter would yield too cautious strategies.
Moreover note that, trivially, any online strategy with excess rate $c\sqrt{f}$ has the
upper bound $c/\sqrt{d}$, too. (2) For a given strategy one may easily compute the f
maximizing $(1 + c\sqrt{f})/(1 + (1+R)f)$, thus estimating the worst-case competitive
ratio, but this number is less meaningful than c itself.
Clearly a $c\sqrt{f}$ upper bound can only hold in case f > 0, in other words,
if at least one fault occurs. If f = 0 then already the first backup yields an
infinite coefficient, but if the online player makes no backups at all, speculating
on absence of faults, the adversary can foil him by a late fault, which also yields
a coefficient not bounded by any constant.
An elegant viewpoint avoiding this f = 0 singularity is the following refor-
mulation of the problem. Consider a stream of work whose length is not a priori
bounded. Given n, what c can be achieved for the piece of work up to the n-th
fault? (If the n-th fault appears after p time units, f is understood to be n/p.)
We refer to the corresponding strategies as n-fault strategies. This version of our
problem is also supported by

Proposition 2. If we partition, in retrospect, the work time interval arbitrarily
into phases each containing at least one fault, such that we have always achieved
an excess rate $c\sqrt{f_i}$, where $f_i$ is the fault frequency in the i-th phase, then the
overall excess rate is bounded by $c\sqrt{f}$.
Proof. Let the i-th phase have length $p_i$ and contain $n_i > 0$ faults. The cost
of the i-th phase is by assumption at most $p_i \cdot c\sqrt{n_i/p_i} = c\sqrt{n_i p_i}$. Exploiting an
elementary inequality, the total cost is bounded by
$$c\sum_i \sqrt{n_i p_i} \le c\sqrt{\sum_i n_i}\sqrt{\sum_i p_i} = c\sqrt{np} = pc\sqrt{f}.$$

Therefore, once we have an n-fault strategy with excess rate $c\sqrt{f}$, we may
apply it repeatedly to phases of n faults each, thus keeping an overall excess rate
$c\sqrt{f}$.
Let us summarize this section: Instead of the competitive ratio we use a
nonstandard measure for online backup strategies in terms of an input parameter
(fault frequency f), called the excess rate. It behaves as $c\sqrt{f}$ and is much more
expressive than a single number for the worst case, as the extra costs heavily
depend on the fault frequency. Moreover, we consider an arbitrarily long piece
of work until the n-th fault appears, and we try to minimize c for given n.
In the following sections we develop concrete n-fault strategies. In the proofs,
we first consider a sort of continuous analogue of the discrete problem. Doing so
we can first ignore tedious technicalities and conveniently obtain a heuristic so-
lution which is then discretized. The bound for the discrete strategy is rigorously
verified afterwards.

3 Deterministic n-Fault Strategies


We first settle the case n = 1.
Theorem 1. There exists a deterministic 1-fault strategy with $c = \sqrt{8}$, and this
is the optimal constant coefficient.
Proof. Work begins w.l.o.g. at time 0. A backup strategy is specified by the
integer-valued function y(x) describing the number of backups made before time
$x^2$. (This quadratic scale will prove convenient.) In order to get a heuristic so-
lution, we admit differentiable real functions y instead of integer-valued ones.
That is, we provisionally fix the asymptotic growth of backup numbers in time
only, but not the particular backup times.
We have to assign suitable costs to such functions. Assume that the fault
occurs between time $(x-1)^2$ and $x^2$. Then the cost of backups and lost work
incurred so far is bounded by $y(x) + 2x/y'(x)$. Namely, at most y(x) backups
have been made, and, in the worst case, the fault appears immediately before
a planned backup, hence the lost time may equal the distance of consecutive
backups. This distance can be roughly estimated as $2x/y'(x)$, since $y'(x)$ is the
backup density on the quadratic scale, and $x^2 - (x-1)^2 < 2x$.
Remember that the excess rate $c\sqrt{f}$ is the cost per time. Since $f = 1/x^2$, the
coefficient c is the cost divided by x. So our y and c must satisfy $y(x) + 2x/y'(x) \le$

cx for all x. We can assume equality, since every backup may be deferred until
coefficient c is reached. The resulting differential equation y + 2x/y′ = cx with
y(0) = 0 has the solution y = ax with a suitable constant a. Substitution yields
a + 2/a = c. The optimal c = √8 is achieved with a = √2.
Translating this back, let us make the x-th backup at time x²/2. It is not
hard to verify accurately that this strategy has excess rate bounded by √8 · √f:
Let the fault appear at time u², with x²/2 < u² ≤ (x+1)²/2. The coefficient of
√f at this moment is obviously

c = (x + u² − x²/2) / u.

This term is monotone increasing in u within this interval, so we may consider
u² = (x+1)²/2, implying

c = √2 · (2x + 1/2) / (x + 1) < 2√2. □
Note that optimality holds only in an asymptotic sense, i.e. for f → 0. The
coefficient is √8 minus some term vanishing with f. It might be interesting to
analyze this lower-order term, too.
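The discretized strategy is easy to check numerically. The following small script (not part of the paper, only a minimal sketch of the verification, with the backup times k²/2 taken from the proof) computes the coefficient of √f for a single fault at an arbitrary time and confirms that it stays below √8.

```python
import math

def coefficient(fault_time):
    """Excess-rate coefficient c for one fault at `fault_time`, when the
    k-th backup is made at time k**2 / 2 and each backup costs one time unit."""
    k = 0
    while (k + 1) ** 2 / 2 < fault_time:   # backups completed strictly before the fault
        k += 1
    last_backup = k ** 2 / 2               # 0 if no backup was made yet
    lost_work = fault_time - last_backup   # work that has to be redone
    cost = k + lost_work                   # excess cost: backups + lost work
    # f = 1/fault_time, so sqrt(f) = 1/sqrt(fault_time) and c = cost / sqrt(fault_time)
    return cost / math.sqrt(fault_time)

worst = max(coefficient(t / 100) for t in range(1, 200000))
assert worst < math.sqrt(8)                # c approaches sqrt(8) = 2*sqrt(2) from below
print(worst)
```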
Next we extend the idea to n faults. Here the coefficient improves upon the
1-fault optimum, if we combine n single-fault phases appropriately: Note that
the inequality used in Proposition 2 is tight for equal-length phases only, so it
should be possible to beat √8 by adapting the backup frequency. A more intuitive
explanation of this effect is that the online player learns, with each fault, more
about the parameter f which describes an average behaviour in time.

Lemma 1. Any deterministic n-fault strategy with excess rate c_n√f yields a
deterministic (n+1)-fault strategy with excess rate c_{n+1}√f, where

c_{n+1} = (c_n² n + 2) / (c_n √(n² + n)).
Proof. Apply the given n-fault strategy up to the n-th fault which occurs, say,
at time z². With c := c_n, the cost of backups and lost time until the n-th
fault is at most c√f · z² = c√n · z. Let y(x) denote the number of further backups until
time (z+x)². Assuming that the (n+1)-th fault appears at time (z+x)² and
allowing for differentiable real functions y, the total cost up to this moment is
bounded by

c√n · z + y(x) + 2(z+x)/y′(x).

(The arguments are the same as in Theorem 1.) On the other hand, with C :=
c_{n+1}, the cost up to (z+x)² is

C √(n+1) · (z+x).

Together this yields the differential equation

c√n · z + y(x) + 2(z+x)/y′(x) = C √(n+1) · (z+x)

with y(0) = 0. One solution is given by y(x) = c√n · x and C as claimed.
Once we have derived this solution heuristically, we can verify it exactly:
Using the backup function y(x) = c√n · x means to make the k-th backup
after z² at time (z + k/(c√n))². Let the next fault appear at time (z+u)², with

(z + k/(c√n))² < (z+u)² ≤ (z + (k+1)/(c√n))².

Then we have

C = [c√n · z + k + (z+u)² − (z + k/(c√n))²] / [√(n+1) · (z+u)].

Considering the derivative dC/du we find that C(u) can attain its maximum only
at the endpoints of the interval. In case u = k/(c√n) we get

C = c √(n/(n+1)) < (c² n + 2) / (c √(n² + n)).

So it suffices to consider u = (k+1)/(c√n). Obvious algebraic manipulation yields, in a
few steps,

C = [c² n + 2 + k c√n / z + (2k+1)/(c√n · z)] / [c √(n² + n) + (k+1) √(n+1) / z].

In the numerator, replace k with k+1 and 2k+1 with 2k+2. Then we see

C < (c² n + 2) / (c √(n² + n))

also in this case. □
Note that the excess rate at any time after the n-th fault is smaller than
((n+1)/n) · c_n √f. For n → ∞ we get:

Theorem 2. There exists a deterministic backup strategy with excess rate c_n√f
after n faults, such that lim c_n = 2.
Proof. Consider the sequence c_n given by Lemma 1. With γ_n := c_n² we get

γ_{n+1} = (γ_n n + 2)² / (γ_n (n² + n)).

Further let γ_n = 4 + r_n/n. By easy manipulation we obtain

r_{n+1} = r_n + 4/(4n + r_n).

Thus r_n = O(ln n), lim γ_n = 4, and lim c_n = 2, independently of the start value. □
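For a quick sanity check of Theorem 2 (not in the paper), one can iterate the recurrence of Lemma 1 numerically, starting from the 1-fault value c_1 = √8; the coefficients indeed decrease towards 2, if slowly, as suggested by the r_n = O(ln n) term.

```python
import math

c = math.sqrt(8)                 # 1-fault coefficient from Theorem 1
for n in range(1, 1_000_001):
    # recurrence of Lemma 1: c_{n+1} = (c_n^2 * n + 2) / (c_n * sqrt(n^2 + n))
    c = (c * c * n + 2) / (c * math.sqrt(n * n + n))
    if n in (1, 10, 100, 10_000, 1_000_000):
        print(n + 1, c)          # approaches 2 from above
```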

4 A Randomized Backup Strategy

As e.g. in the rent-to-buy problem [4], suitable randomization significantly improves
the expected cost against an oblivious adversary (who has no insight into
the online player's random decisions).

Theorem 3. There exists a randomized 1-fault strategy with expected excess rate
c√f such that lim_{f→0} c = 2.
Proof (Sketch). We modify the deterministic strategy S of Theorem 1 which had
coefficient c = 2√2. The x-th backup is made at time (x + r)²/2, where r is
a fixed number, randomly chosen from the interval [0, 1]. This randomized strategy
makes the same number of backups as our deterministic strategy did, but it is
quite clear that the expected loss of working time is about half the worst-case
loss incurred by S (subject to some error vanishing with f, i.e. with growing
backup number). Furthermore remember that, in Theorem 1, both the backups
and the worst-case loss of time contributed the same amount a = √2 to c. We
conclude that our randomized strategy is only 3/4 times as expensive as S, which
gives lim c = 3/√2.
We achieve the slightly better factor 2 if we make the x-th backup at time
(x + r)² instead! Namely, compared with S, this reduces the backup cost and the expected loss by
the same factor √2. □

For this type of randomized backup strategy (a fixed backup pattern ran-
domly shifted on the quadratic scale), the above result is optimal, by a similar
argument as in Theorem 1. It remains open whether it is optimal at all. We hope
that a suitable application of Yao’s minimax principle will provide an answer.
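As an illustration (not from the paper), the expected coefficient of the shifted strategy from the proof of Theorem 3, with the x-th backup at time (x + r)², can be estimated by simulation; for late faults the estimate approaches 2, in line with the theorem. The sampling scheme below is only a sketch of that calculation.

```python
import math, random

def expected_coefficient(fault_time, trials=50_000):
    """Monte Carlo estimate of E[cost] / sqrt(fault_time) for one fault at
    `fault_time`, when the x-th backup is made at time (x + r)^2, r ~ U[0,1]."""
    total = 0.0
    for _ in range(trials):
        r = random.random()
        k = 0
        while (k + 1 + r) ** 2 < fault_time:   # backups at (1+r)^2, (2+r)^2, ...
            k += 1
        last = (k + r) ** 2 if k > 0 else 0.0
        total += k + (fault_time - last)       # backups made + lost work
    return total / trials / math.sqrt(fault_time)

print(expected_coefficient(10_000.0))          # roughly 2 for late faults
```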
For n faults we have:
Theorem 4. There exists a randomized backup strategy with expected excess
rate c_n√f after n faults, such that lim_{f→0} lim_{n→∞} c_n = √2.
Proof (Sketch). The method of Lemma 1 of extending an n-fault strategy to an
(n+1)-fault strategy is also applicable in the randomized case, i.e. if c is the
expected coefficient: If we apply a scheme as in the weaker 3/√2 version of
Theorem 3, the number of backups y(x) is deterministic (subject to ±1 deviations),
and 2(z+x)/y′(x) is replaced with the expected loss, i.e. multiplied by 1/2. We
therefore use the modified equation

c√n · z + y(x) + (z+x)/y′(x) = C √(n+1) · (z+x)

to obtain C which is the expected c_{n+1}. One solution is given by y(x) = c√n · x,
for

C = (c² n + 1) / (c √(n² + n)).

Let γ_n = c_n². We get lim γ_n = 2 in a similar way as in Theorem 2. The
straightforward calculations are omitted. □
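Again as a numerical check (not from the paper, and using the extension step as reconstructed above), iterating the randomized recurrence shows the expected coefficients dropping towards √2.

```python
import math

c = 2.0                          # starting value; the limit does not depend on it
for n in range(1, 1_000_001):
    # randomized extension step: C = (c^2 * n + 1) / (c * sqrt(n^2 + n))
    c = (c * c * n + 1) / (c * math.sqrt(n * n + n))
print(c, math.sqrt(2))           # c is close to sqrt(2) ~ 1.414
```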

5 Some Lower Bounds

The quite trivial Proposition 1 remains true for randomized strategies and an
adaptive adversary. A stronger lower bound can be shown if the adversary has
no obligation to meet some prescribed f.

Proposition 3. No backup strategy can guarantee an excess rate below 2√f − f
against an adaptive adversary.
Proof. The adversary partitions the time axis into phases of some fixed length
t > 2 and behaves as follows. If the online player did not make any backup in
a phase then the adversary injects a fault at the end of this phase. Let x be
the fraction of phases without backup, thus ending with a fault. Here the online
player pays 1 per time for repeated work. In the remaining 1 − x fraction of
phases he pays 1/t or more per time for backups. Hence the average cost per
time is at least x + (1 − x)/t. Furthermore note that f = x/t. Thus the coefficient
of √f is

c = (x + 1/t − x/t) √(t/x) = √(tx) + 1/√(tx) − √(x/t).

The online player can minimize c choosing any strategy with x = 1/(t − 1) which
yields

c = √(t/(t−1)) + √((t−1)/t) − 1/√(t(t−1))

and also means f = 1/(t(t−1)). Now the assertion follows easily.
Note that the coefficient can be made arbitrarily close to 2 with large enough
t. □

This lower bound does not contradict Theorem 4 which refers to an oblivious
adversary who must fix the fault times beforehand, whereas in Proposition 3, the
adversary can permanently decide whether to inject a fault or not, depending on
the online player's behaviour. Thus he can also gain some information about the
coin tosses in a randomized online strategy. (Of course, the oblivious adversary
better reflects the real-world situation.)
In the deterministic case the adversaries all have the same power, hence it
follows:
Corollary 1. No deterministic backup strategy can guarantee an excess rate
better than 2√f − f. □
In view of Theorem 2 this is a matching asymptotic lower bound.
For deterministic strategies, a stronger lower bound than in Proposition 1
can be proven also for prescribed f . We state one such result:
Proposition 4. For any fault frequency f, even if the online player knows f
beforehand, no deterministic backup strategy can guarantee an excess rate below
√(2f).

Proof. An adversary partitions the time axis into phases of length 1/f and places
one fault in each phase by the following rule: Since the online player's strategy
is deterministic, the adversary knows the sequence of backups made in the next
phase until a fault. W.l.o.g. the phase starts at time 0, and the backup times
are t_1 < ... < t_k. Let t_0 = 0, and in case t_k < 1/f further define t_{k+1} = 1/f.
The adversary injects a fault immediately before t_{i+1} such that i + t_{i+1} − t_i is
maximized.
The best an online player can do against this adversary's strategy is to choose
his t_i so as to minimize max_i (i + t_{i+1} − t_i). Obviously all these terms should
be equal, thus t_i = i t_1 − i(i−1)/2. In particular this yields 1/f ≤ k t_1 − k²/2.
Since the adversary may place his fault at the end of the phase, both t_1 and k
are lower bounds for the additional cost the online player incurs in this phase.
By elementary calculation, max{t_1, k} is minimized if t_1 = k = √(2/f), so the
excess rate is at least f · √(2/f) = √(2f). □

References
1. A. Borodin, R. El-Yaniv: Online Computation and Competitive Analysis, Cambridge Univ. Press 1998
2. P. Damaschke: Multiple spin-block decisions, 10th ISAAC'99
3. D.R. Dooly, S.A. Goldman, S.D. Scott: TCP dynamic acknowledgment delay: theory and practice, 30th STOC'98, 389-398
4. A.R. Karlin, M.S. Manasse, L.A. McGeoch, S. Owicki: Competitive randomized algorithms for non-uniform problems, Algorithmica 11 (1994), 542-571
5. P. Krishnan, P.M. Long, J.S. Vitter: Adaptive disk spindown via optimal rent-to-buy in probabilistic environments, Algorithmica 23 (1999), 31-56
6. R. Motwani, S. Phillips, E. Torng: Nonclairvoyant scheduling, Theor. Comp. Sc. 130 (1994), 17-47; extended abstract in: 4th SODA'93, 422-431
Towards the Notion of Stability of
Approximation for Hard Optimization Tasks and
the Traveling Salesman Problem
(Extended Abstract)*

Hans-Joachim Böckenhauer, Juraj Hromkovic, Ralf Klasing, Sebastian Seibert,


and Walter Unger

Dept. of Computer Science I (Algorithms and Complexity), RWTH Aachen,


Ahornstraße 55, 52056 Aachen, Germany
{hjb,jh,rak,seibert,quax}@i1.informatik.rwth-aachen.de

Abstract. The investigation of the possibility to efficiently compute
approximations of hard optimization problems is one of the central and
most fruitful areas of current algorithm and complexity theory. The aim
of this paper is twofold. First, we introduce the notion of stability of
approximation algorithms. This notion is shown to be of practical as well
as of theoretical importance, especially for the real understanding of the
applicability of approximation algorithms and for the determination of
the border between easy instances and hard instances of optimization
problems that do not admit any polynomial-time approximation.
Secondly, we apply our concept to the study of the traveling salesman
problem. We show how to modify the Christofides algorithm for Δ-TSP
to obtain efficient approximation algorithms with constant approximation
ratio for every instance of TSP that violates the triangle inequality
by a multiplicative constant factor. This improves the result of Andreae
and Bandelt [AB95].

Keywords: Stability of approximation, Traveling Salesman Problem

1 Introduction

Immediately after introducing NP-hardness (completeness) [Co71] as a concept


for proving intractability of computing problems, the following question has been
posed: If an optimization problem does not admit an e ciently computable op-
timal solution, is there a possibility to e ciently compute at least an approxima-
tion of the optimal solution? Several researchers [Jo74], [Lo75], [Chr76], [IK75]
provided already in the middle of the seventies a positive answer for some op-
timization problems. It is a fascinating effect if one can jump from exponential

* Supported by DFG Project HR 14/5-1 "Zur Klassifizierung der Klasse praktisch
lösbarer algorithmischer Aufgaben"


complexity (a huge inevitable amount of physical work) to polynomial complexity
(tractable amount of physical work) due to a small change in the requirements:
instead of an exact optimal solution one demands a solution whose cost differs
from the cost of an optimal solution by at most ε% of the cost of an optimal
solution for some ε > 0. This effect is very strong, especially, if one considers
problems for which this approximation concept works for any relative difference
(see the concept of approximation schemes in [IK75], [MPS98], [Pa94],
[BC93]). This is also the reason why currently optimization problems are considered
to be tractable if there exist randomized polynomial-time approximation
algorithms that solve them with a reasonable approximation ratio. In what follows
a δ-approximation algorithm for a minimization [maximization] problem
is any algorithm that provides feasible solutions whose cost divided by the cost
of optimal solutions is at most δ [is at least 1/δ].
There is also another possibility to jump from NP to P. Namely, to consider
the subset of inputs with a special, nice property instead of the whole set of inputs
for which the problem is well-defined. A nice example is the Traveling Salesman
Problem (TSP). TSP is not only NP-hard, but also the search for an approximate
solution for TSP is NP-hard for every constant approximation ratio.1 But if one
considers TSP for inputs satisfying the triangle inequality (so called Δ-TSP),
one can even design a polynomial-time 3/2-approximation algorithm [Chr76].2
The situation is even more interesting if one considers the Euclidean TSP, where
the distances between the nodes correspond to the distances in the Euclidean
metrics. The Euclidean TSP is NP-hard [Pa77], but for every δ > 1 one can
design a polynomial-time δ-approximation algorithm [Ar98], [Mi96]. Moreover,
if one allows randomization the resulting approximation algorithm works in
n·(log₂ n)^{O(1)} time [Ar97].3 This is the reason why we propose again to revise the
notion of tractability, especially because of the standard definition of complexity
as the worst-case complexity: Our aim is to try to separate the easy instances
from the hard instances of every computing problem considered to be intractable.
In fact, by our concept, we want to attack the definition of complexity as the
worst-case complexity. The approximation ratio of an algorithm is also defined
in a worst-case manner. Our idea is to split the set of input instances of the
given problem into possibly infinitely many subclasses according to the hardness
of their approximability, and to have an efficient algorithm for deciding the
membership of any problem instance to one of the subclasses considered. To
achieve this goal we introduce the concept of approximation stability.
Informally, one can describe the idea of our concept by the following scenario.
One has an optimization problem for two sets of inputs L1 and L2, L1 ⊊ L2. For

1 Even no f(n)-approximation algorithm exists for f polynomial in the input size n.
2 Note that Δ-TSP is APX-hard and we know even explicit lower bounds on its inapproximability [En99, BHKSU00].
3 Obviously, there are many similar examples where with restricting the set of inputs one crosses the border between decidability and undecidability (Post Correspondence Problem) or the border between P and NP (SAT and 2-SAT, or vertex cover problem).

L1 there exists a polynomial-time δ-approximation algorithm A for some δ > 1,
but for L2 there is no polynomial-time γ-approximation algorithm for any γ > 1
(if NP is not equal to P). We pose the following question: Is the use of algorithm
A really restricted to inputs from L1? Let us consider a distance measure d in
L2 determining the distance d(x) between L1 and any given input x ∈ L2 − L1.
Now, one can consider an input x ∈ L2 − L1 with d(x) ≤ k for some positive real
k. One can look for how good the algorithm A is for the input x ∈ L2 − L1. If
for every k > 0 and every x with d(x) ≤ k, A computes a γ_k-approximation of
an optimal solution for x (γ_k is considered to be a constant depending on k and
δ only), then one can say that A is (approximation) stable according to the
distance measure d. Obviously, such a concept enables to show positive results
extending the applicability of known approximation algorithms. On the other
hand it can help to show the boundaries of the use of approximation algorithms
and possibly even a new kind of hardness of optimization problems.
Observe that the idea of the concept of approximation stability is similar to
that of stability of numerical algorithms. Instead of observing the size of the
change of the output value according to a small change of the input value, one
looks for the size of the change of the approximation ratio according to a small
change in the specification of the set of consistent input instances.
To demonstrate the applicability of our new approach we consider TSP, Δ-TSP,
and, for every real β > 1, Δβ-TSP containing all input instances with
cost(u,v) ≤ β(cost(u,x) + cost(x,v)) for all vertices u, v, x. If an input is
consistent for Δβ-TSP we say that its distance to Δ-TSP is at most β − 1.
We will show that known approximation algorithms for Δ-TSP are unstable
according to this distance measure. But we will find a way how to modify the
Christofides algorithm in order to obtain approximation algorithms for Δβ-TSP
that are stable according to this distance measure. So, this effort results in a
(3/2 β²)-approximation algorithm for Δβ-TSP.4 This improves the result of Andreae
and Bandelt [AB95] who presented a (3/2 β² + 1/2 β)-approximation algorithm for
Δβ-TSP. Our approach essentially differs from that of [AB95], because in order to
design our (3/2 β²)-approximation algorithm we modify the Christofides algorithm
while Andreae and Bandelt obtain their approximation ratio by modifying the
original 2-approximation algorithm for Δ-TSP.
Note that, after this paper was written, we got the information about the
independent, unpublished result of Bender and Chekuri, accepted for WADS'99
[BC99]. They designed a 4β-approximation algorithm which can be seen as a
modification of the 2-approximation algorithm for Δ-TSP. Despite this nice
result, there are three reasons to consider our algorithm. First, our algorithm
provides a better approximation ratio for β < 8/3. Secondly, in the previous work
[AB95], the authors claim that the Christofides algorithm cannot be modified
4 Note that in this way we obtain an approximate solution to every problem instance
of TSP, where the approximation ratio depends on the distance of this problem
instance to Δ-TSP. Following the discussion in [Ar98] about typical properties of
real problem instances of TSP our approximation algorithm working in O(n³) time
is of practical relevance.

in order to get a stable (in our terminology) algorithm for TSP, and our result
disproves this conjecture. This is especially of practical importance, since for in-
stances where the triangle inequality is violated only by a few edge costs, one can
expect that the approximation ratio will be as in the underlying algorithm with
a high probability. Finally, our algorithm is a practical O(n³)-algorithm. This
cannot be said about the 4β-approximation algorithm from [BC99]. The first
part of the latter algorithm is a 2-approximation algorithm for finding minimal
two-connected subgraphs with time complexity O(n⁴). For the second part, con-
structing a Hamiltonian tour in S² (if S was the two-connected subgraph), there
exist only proofs saying that it can be implemented in polynomial time, but no
low-degree polynomial upper bound on the time complexity of these procedures
has been established.
This paper is organized as follows: In Section 2 we introduce our concept of
approximation stability. In Section 3 we show how to apply our concept in the
study of the TSP, and in Section 4 we discuss the potential applicability and
usefulness of our concept.

2 Definition of the Stability of Approximation Algorithms

We assume that the reader is familiar with the basic concepts and notions of algo-
rithmics and complexity theory as presented in standard textbooks like [BC93],
[CLR90], [GJ79], [Ho96], [Pa94]. Next, we give a new definition of the notion of
an optimization problem. The reason to do this is to obtain the possibility to
study the influence of the input sets on the hardness of the problem considered.
Let IN = {0, 1, 2, ...} be the set of nonnegative integers, let IR+ be the set of
positive reals, and let IR≥a be the set of all reals greater than or equal to a for
some a ∈ IR.

Definition 1. An optimization problem U is a 7-tuple U = (ΣI, ΣO, L, LI, M, cost, goal), where

1. ΣI is an alphabet called input alphabet,
2. ΣO is an alphabet called output alphabet,
3. L ⊆ ΣI* is a language over ΣI called the language of consistent inputs,
4. LI ⊆ L is a language over ΣI called the language of actual inputs,
5. M is a function from L to 2^{ΣO*}, where, for every x ∈ L, M(x) is called the set of feasible solutions for the input x,
6. cost is a function, called cost function, from ∪_{x∈L} M(x) × LI to IR≥0,
7. goal ∈ {minimum, maximum}.

For every x ∈ L, we define

OutputU(x) = {y ∈ M(x) | cost(y, x) = goal{cost(z, x) | z ∈ M(x)}}

and

OptU(x) = cost(y, x) for some y ∈ OutputU(x).

Clearly, the meaning of ΣI, ΣO, M, cost and goal is the usual one. L
may be considered as the set of consistent inputs, i.e., the inputs for which the
optimization problem is consistently defined. LI is the set of inputs considered
and only these inputs are taken into account when one determines the complexity
of the optimization problem U. This kind of definition is useful for considering the
complexity of optimization problems parameterized according to their languages
of actual inputs. In what follows, Language(U) denotes the language LI of
actual inputs of U. If the input x is fixed, we usually use cost(y) instead of
cost(y, x) in what follows.

Definition 2. Let U = (ΣI, ΣO, L, LI, M, cost, goal) be an optimization
problem. We say that an algorithm A is a consistent algorithm for U if, for
every input x ∈ LI, A computes an output A(x) ∈ M(x). We say that A solves
U if, for every x ∈ LI, A computes an output A(x) from OutputU(x). The time
complexity of A is defined as the function

TimeA(n) = max{TimeA(x) | x ∈ LI ∩ ΣI^n}

from IN to IN, where TimeA(x) is the length of the computation of A on x.

Next, we give the definitions of standard notions in the area of approximation
algorithms (see e.g. [CK98], [Ho96]).

Definition 3. Let U = (ΣI, ΣO, L, LI, M, cost, goal) be an optimization problem,
and let A be a consistent algorithm for U. For every x ∈ LI, the approximation
ratio RA(x) is defined as

RA(x) = max{ cost(A(x))/OptU(x), OptU(x)/cost(A(x)) }.

For any n ∈ IN, we define the approximation ratio of A as

RA(n) = max{ RA(x) | x ∈ LI ∩ ΣI^n }.

For any positive real δ > 1, we say that A is a δ-approximation algorithm
for U if RA(x) ≤ δ for every x ∈ LI.
For every function f : IN → IR≥1, we say that A is an f(n)-approximation
algorithm for U if RA(n) ≤ f(n) for every n ∈ IN.

In what follows, we consider the standard definitions of the classes NPO,
PO, APX (see e.g. [Ho96], [MPS98]). In order to define the notion of stability of
approximation algorithms we need to consider something like a distance between
a language L and a word outside L.

Definition 4. Let U = (ΣI, ΣO, L, LI, M, cost, goal) and Ū = (ΣI, ΣO,
L, L, M, cost, goal) be two optimization problems with LI ⊊ L. A distance
function for Ū according to LI is any function hL : L → IR≥0 satisfying
the properties

1. hL(x) = 0 for every x ∈ LI, and
2. hL can be computed in polynomial time.

Let h be a distance function for Ū according to LI. We define, for any r ∈ IR+,

Ballr,h(LI) = {w ∈ L | h(w) ≤ r}.

Let A be a consistent algorithm for Ū, and let A be a δ-approximation algorithm
for U for some δ ∈ IR≥1. Let p be a positive real. We say that A is p-stable
according to h if, for every real 0 < r ≤ p, there exists a δr ∈ IR≥1 such that
A is a δr-approximation algorithm for Ur = (ΣI, ΣO, L, Ballr,h(LI), M, cost,
goal).5
A is stable according to h if A is p-stable according to h for every p ∈ IR+.
We say that A is unstable according to h if A is not p-stable for any p ∈ IR+.
For every positive integer r, and every function fr : IN → IR≥1 we say that
A is (r, fr(n))-quasistable according to h if A is an fr(n)-approximation
algorithm for Ur = (ΣI, ΣO, L, Ballr,h(LI), M, cost, goal).

A discussion about the potential usefulness of our concept is given in the last
section. In the next section we show a transparent application of our concept for
TSP.

3 Stability of Approximation Algorithms and TSP

We consider the well-known TSP problem (see e.g. [LLRS85]) that is in its general
form very hard for approximation. But if one considers complete graphs
in which the triangle inequality holds, then we have a 3/2-approximation algorithm
due to Christofides [Chr76]. So, this is a suitable starting point for the
application of our approach based on approximation stability. First, we define
two natural distance measures and show that the Christofides algorithm is stable
according to one of them, but not according to the second one. This leads
to the development of a new algorithm, PMCA, for Δβ-TSP. This algorithm
is achieved by modifying Christofides algorithm in such a way that the resulting
algorithm is stable according to the second distance measure, too. In this
way, we obtain a (3/2 (1 + r)²)-approximation algorithm for every input instance
of TSP with the distance at most r from Language(Δ-TSP), i.e. with
cost(u,v) ≤ (1 + r)(cost(u,w) + cost(w,v)) for every three nodes u, v, w. This
improves the result of Andreae and Bandelt [AB95] who achieved approximation
ratio 3/2 (1 + r)² + 1/2 (1 + r).
To start our investigation, we concisely review two well-known algorithms
for Δ-TSP: the 2-approximative algorithm 2APPR and the 3/2-approximative
Christofides algorithm [Chr76], [Ho96].
5 Note that δr is a constant depending on r and δ only.

Algorithm 2APPR

Input: A complete graph G = (V, E) with a cost function cost : E → IR≥0
satisfying the triangle inequality (for every u, v, q ∈ V, cost(u,v) ≤ cost(u,q) + cost(q,v)).
Step 1a: Construct a minimal spanning tree T of G. (The cost of T is surely
smaller than the cost of the optimal Hamiltonian tour.)
Step 1b: Construct an Eulerian tour D on T going twice via every edge of T. (The
cost of D is exactly twice the cost of T.)
Step 2: Construct a Hamiltonian tour H from D by avoiding the repetition of
nodes in the Eulerian tour. (In fact, H is the permutation of nodes of G, where
the order of a node v is given by the first occurrence of v in D.)
Output: H.

Christofides Algorithm

Input: A complete graph G = (V, E) with a cost function cost : E → IR≥0
satisfying the triangle inequality.
Step 1a: Construct a minimal spanning tree T of G and find a matching M with
minimal cost (at most 1/2 of the cost of the optimal Hamiltonian tour) on the
nodes of T with odd degree.
Step 1b: Construct an Eulerian tour D on G′ = T ∪ M.
Step 2: Construct a Hamiltonian tour H from D by avoiding the repetition of
nodes in the Eulerian tour.
Output: H.
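For readers who want to experiment, the following sketch (not the authors' code) implements the 2APPR scheme with networkx; the edge attribute name "cost" and the use of a doubled spanning tree to obtain the Eulerian tour are our own choices. The Christofides algorithm differs only in Step 1a, where a minimum-cost matching on the odd-degree tree vertices replaces the doubling of the tree edges.

```python
import networkx as nx

def two_appr_tour(G, weight="cost"):
    """2APPR: minimal spanning tree -> double its edges -> Eulerian tour ->
    shortcut repeated vertices to obtain a Hamiltonian tour H."""
    T = nx.minimum_spanning_tree(G, weight=weight)
    D = nx.MultiGraph(T)
    D.add_edges_from(T.edges(data=True))        # Step 1b: traverse every tree edge twice
    tour, seen = [], set()
    for u, _ in nx.eulerian_circuit(D, source=next(iter(G))):
        if u not in seen:                        # Step 2: keep only first occurrences
            seen.add(u)
            tour.append(u)
    tour.append(tour[0])                         # close the Hamiltonian tour
    return tour

# tiny usage example on a 4-vertex instance
K = nx.complete_graph(4)
for u, v in K.edges:
    K[u][v]["cost"] = 1 + abs(u - v)
print(two_appr_tour(K))
```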

Since the triangle inequality holds and Step 2 in both algorithms is realized by
repeatedly shortening a path x, u1, ..., um, y by the edge (x, y) (because u1, ..., um
have already occurred before in the prefix of D) the cost of H is at most the
cost of D. Thus, the crucial point for the success of 2APPR and Christofides
algorithm is the triangle inequality. A reasonable possibility to search for an extension
of the application of these algorithms is to look for inputs that almost
satisfy the triangle inequality. In what follows we do this in two different ways.
Let Δ-TSP = (ΣI, ΣO, L, LI, M, cost, minimum) be a representation of the
TSP with triangle inequality. We may assume ΣI = ΣO = {0, 1, #}, L contains
codes of all cost functions for edges of complete graphs, and LI contains codes
of cost functions that satisfy the triangle inequality. Let, for every x ∈ L, Gx =
(Vx, Ex, costx) be the complete weighted graph coded by x. Obviously, the above
algorithms are consistent for (ΣI, ΣO, L, L, M, cost, minimum).
Let Δd,1+r-TSP = (ΣI, ΣO, L, Ballr,d(LI), M, cost, minimum) for any r ∈
IR+ and for any distance function d for Δ-TSP. We define for every x ∈ L
  
dist(x) = max{ 0, max{ cost(u,v)/(cost(u,p) + cost(p,v)) − 1 | u, v, p ∈ Vx } }

and

distance(x) = max{ 0, max{ cost(u,v)/Σ_{i=1}^{m} cost(p_i, p_{i+1}) − 1 | u, v ∈ Vx
and u = p_1, p_2, ..., p_{m+1} = v is a simple path between u and v in Gx } }.
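The measure dist(x) is straightforward to evaluate on a concrete instance. The following snippet (ours, not from the paper; the instance is assumed to be given as a symmetric cost matrix) returns 0 for metric inputs and the violation parameter, i.e. β − 1, otherwise.

```python
import itertools

def dist_measure(cost):
    """dist(x) as defined above: the largest relative violation of the
    triangle inequality over all triples, and 0 if none is violated."""
    n = len(cost)
    worst = 0.0
    for u, v, p in itertools.permutations(range(n), 3):
        worst = max(worst, cost[u][v] / (cost[u][p] + cost[p][v]) - 1.0)
    return max(0.0, worst)

# a small non-metric example: the edge (0, 2) violates the triangle inequality
costs = [[0, 1, 3],
         [1, 0, 1],
         [3, 1, 0]]
print(dist_measure(costs))   # 3/(1+1) - 1 = 0.5, so the input lies in Ball_{0.5,dist}
```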

Since the distance measure dist is the most important for us we will use the
notation Δβ-TSP instead of Δdist,β-TSP. For simplicity we consider the size of
x as the number of nodes of Gx instead of |x|. We observe that for every β > 1 the
inputs from Δdist,β-TSP have the property cost(u,v) ≤ β(cost(u,x) + cost(x,v))
for all u, v, x (β = 1 + r). It is a simple exercise to prove the following lemma.

Lemma 1. The 2APPR and Christofides algorithm are stable according to distance.

Now, one can ask for the approximation stability according to the distance
measure dist that is the most interesting distance measure for us. Unfortunately,
as shown in the next lemmas, the answer is not as positive as for distance.

Lemma 2. For every r ∈ IR+, Christofides algorithm is (r, 3/2 (1 + r)^{2⌈log₂ n⌉})-
quasistable for dist, and 2APPR is (r, 2 (1 + r)^{⌈log₂ n⌉})-quasistable for dist.

That the result of Lemma 2 cannot be essentially improved, is shown by
presenting an input for which the Christofides algorithm as well as 2APPR
provide a very poor approximation.

Lemma 3. For every r ∈ IR+, if the Christofides algorithm (or 2APPR) is
(r, f(n))-quasistable for dist, then f(n) ≥ n^{log₂(1+r)}/(2(1 + r)).

Proof. We construct a weighted complete graph from Ballr,dist(LI) as follows.
We start with the path p_0, p_1, ..., p_n for n = 2^k, k ∈ IN, where every edge
(p_i, p_{i+1}) has the cost 1. For all other edges we take maximal possible costs in
such a way that the constructed input is in Ballr,dist(LI). As a consequence,
for every m ∈ {1, ..., log₂ n}, we have cost(p_i, p_{i+2^m}) = 2^m (1 + r)^m for
i = 0, ..., n − 2^m (see Figure 1).
Let us have a look at the work of Christofides algorithm on this input. (Similar
considerations can be made for 2APPR.) There is only one minimal spanning
tree that corresponds to the path containing all edges of the cost 1. Since every
path contains exactly two nodes of odd degree, the Eulerian graph constructed
in Step 1 is the cycle D = p_0, p_1, p_2, ..., p_n, p_0 with the n edges of cost 1 and the
edge of the maximal cost n (1 + r)^{log₂ n} = n^{1+log₂(1+r)}. Since the Eulerian path
is a Hamiltonian tour, the output of the Christofides algorithm is unambiguously
[Figure 1, not reproduced: the path p_0, p_1, ..., p_n with unit-cost edges; vertices at distance 2^m are joined by edges of cost 2^m(1+r)^m (costs 2(1+r), 4(1+r)², ...), and the edge (p_0, p_n) has the maximal cost n(1+r)^{log₂ n} = n^{1+log₂(1+r)}.]

Fig. 1. A hard Δdist,1+r-TSP instance

the cycle p_0, p_1, ..., p_n, p_0 with the cost n + n (1 + r)^{log₂ n}. But the optimal tour
is

T = p_0, p_2, p_4, ..., p_{2i}, p_{2(i+1)}, ..., p_n, p_{n−1}, p_{n−3}, ..., p_{2i+1}, p_{2i−1}, ..., p_3, p_1, p_0.

This tour contains two edges (p_0, p_1) and (p_{n−1}, p_n) of the cost 1 and all n − 2
edges of the cost 2(1 + r). Thus, Opt = cost(T) = 2 + 2(1 + r)(n − 2), and

cost(D)/cost(T) = (n + n^{1+log₂(1+r)}) / (2 + 2(1 + r)(n − 2)) > n^{1+log₂(1+r)} / (2n(1 + r)) = n^{log₂(1+r)} / (2(1 + r)).
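As a quick numerical illustration (ours, not part of the proof), plugging in, say, n = 2^10 and r = 0.5 confirms that the ratio of the two tour costs indeed exceeds the claimed lower bound:

```python
import math

r, k = 0.5, 10
n = 2 ** k
cost_D = n + n ** (1 + math.log2(1 + r))     # Christofides / 2APPR output on the instance
cost_T = 2 + 2 * (1 + r) * (n - 2)           # the cheap tour T described above
print(cost_D / cost_T, n ** math.log2(1 + r) / (2 * (1 + r)))   # ratio exceeds the bound
```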

Corollary 1. 2APPR and the Christofides algorithm are unstable for dist.

The results above show that 2APPR and Christofides algorithm can be useful
for a much larger set of inputs than the original input set. But the stability
according to dist would provide approximation algorithms for a substantially
larger class of input instances. So the key question is whether one can modify
the above algorithms to get algorithms that are stable according to dist. In what
follows, we give a positive answer to this question.

Theorem 1. For every β ∈ IR≥1, there is a (3/2 β²)-approximation algorithm
PMCA for Δdist,β-TSP working in time O(n³).

Proof sketch. In the following, we will give a sketch of the proof of Theorem 1
by stating algorithm PMCA. The central ideas of PMCA are the following. First,
we replace the minimum matching generated in the Christofides Algorithm by
a minimum path matching. That means to find a pairing of the given vertices
s.t. the vertices in a pair are connected by a path rather than a single edge, and
the goal is to minimize the sum of the path costs. In this way, we obtain an
Eulerian tour on the multi-graph consisting of spanning tree and path matching

(in general not Hamiltonian). This Eulerian tour has a cost of at most 1.5 times
the cost of an optimal TSP tour.
The second new concept concerns the substitution of sequences of edges by
single ones, when transforming the above mentioned tour to a Hamiltonian one.
Here, we can guarantee that at most four consecutive edges will be eventually
substituted by a single one. This may increase the cost of the tour by a factor
of at most β² for inputs from Δβ-TSP (remember that we deal in what follows
only with the distance function dist, and therefore drop the corresponding subscript
from Δdist,β-TSP).
Before stating the algorithm in detail, we have to introduce its main tools
first. Let G = (V, E) be a graph. A path matching for a set of vertices U ⊆ V
of even size is a collection of |U|/2 edge-disjoint paths having U as the set of
endpoints.
Assume that p = (u_0, u_1), (u_1, u_2), ..., (u_{k−1}, u_k) is a path in (V, E), not
necessarily simple. A bypass for p is an edge (u, v) from E, replacing a sub-path
(u_i, u_{i+1}), (u_{i+1}, u_{i+2}), ..., (u_{j−1}, u_j) of p from u = u_i to u_j = v (0 ≤ i < j ≤ k).
Its size is the number of replaced edges, i.e. j − i.6 Also, we say that the vertices
u_{i+1}, u_{i+2}, ..., u_{j−1} are bypassed. Given some set of simple paths P, a conflict
according to P is a vertex which occurs at least two times in the given set of
paths.
Algorithm PMCA
Input: a complete graph (V, E) with cost function cost : E → IR≥0
(a Δβ-TSP instance for β ≥ 1).
1. Construct a minimal spanning tree T of (V, E).
2. Let U be the set of vertices of odd degree in T;
construct a minimal (edge-disjoint) path matching P for U.
3. Resolve conflicts according to P, in order to
obtain a vertex-disjoint path matching P′ with cost(P′) ≤ cost(P)
(using bypasses of size 2 only).
4. Construct an Eulerian tour D on T and P′.
(D can be considered as a sequence of paths p1, p2, p3, ...
such that p1, p3, ... are paths in T, and p2, p4, ... ∈ P′.)
5. Resolve conflicts inside the paths p1, p3, ... from T, such that T is divided into
a forest Tf of trees of degree at most 3, using bypasses of size 2 only.
(Call the resulting paths p′1, p′3, ..., and the modified tour D′ is p′1, p2, p′3, p4, ....)
6. Resolve every double occurrence of nodes in D′ such that the overall size
of the bypasses is at most 4 (where "overall" means that a bypass constructed
in Step 3 or 5 counts for two edges). Obtain tour D″.
Output: Tour D″.
In the following, we have to explain how to efficiently obtain a minimal path
matching, and how to realize the conflict resolution in Steps 3, 5, and 6. The
latter not only have to be efficient but must also result in substituting at most
four edges by a single one after all.
6 Obviously, we are not interested in bypasses of size 1.

How to construct an Eulerian cycle in Step 4 is a well-studied task. We only
observe that since each vertex can be endpoint of at most one path from P′ by
definition, the same holds for T: the endpoints of p1, p3, ... are the same as those
of p2, p4, ....
We give in the following detailed descriptions of Steps 2, 3, 5, and 6, respectively.

Claim 1
One can construct in time O(|V|³) a minimum path matching P for U that has
the following properties:

Every two paths in P are edge-disjoint. (1)
P forms a forest. (2)

Proof sketch. First, we will show how to construct a path matching within the
given time. To construct the path matching, we first compute all-pairs cheapest
paths.7 Then, we define G′ = (V, E′) where cost′(v, v′) is the cost of a cheapest
path between v and v′ in G. Next, we compute a minimum matching on G′ (in
the usual sense), and finally, we substitute the edges of G′ in the matching by
the corresponding cheapest paths in G. Clearly, this can be done in time O(n³)
and results in a minimum path matching.
The claimed properties (1) and (2) are a consequence of the minimality. The
technical details will be given in the full version of this paper.
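A compact way to realize this construction (our sketch, not the authors' implementation; it relies on networkx and assumes the instance is a weighted graph with a "cost" edge attribute) is to build the auxiliary graph G′ of cheapest-path costs, take a minimum-weight matching on it, and expand the matched pairs back into paths:

```python
import networkx as nx

def minimum_path_matching(G, U, weight="cost"):
    """Step 2 of PMCA (sketch): return a list of cheapest paths pairing up
    the vertices of U (|U| must be even)."""
    lengths = dict(nx.all_pairs_dijkstra_path_length(G, weight=weight))
    paths = dict(nx.all_pairs_dijkstra_path(G, weight=weight))
    H = nx.Graph()                                   # auxiliary graph G' on U
    U = list(U)
    for i, u in enumerate(U):
        for v in U[i + 1:]:
            H.add_edge(u, v, weight=lengths[u][v])   # cost' = cheapest-path cost
    matching = nx.min_weight_matching(H, weight="weight")
    return [paths[u][v] for u, v in matching]        # substitute edges by paths

# usage: pair up the odd-degree vertices of a spanning tree T of the instance G, e.g.
# P = minimum_path_matching(G, [v for v in T if T.degree(v) % 2 == 1])
```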

The essential property of a minimal path matching for our purposes is that
it costs at most half of the cost of a minimal Hamiltonian tour. Now we show
how Step 3 of the algorithm is performed.

Claim 2
Every path matching having properties (1) and (2) can be modified into a vertex-
disjoint one by using bypasses of size 2 only. Moreover, on each of the new paths,
there will be at most one bypass.

Proof sketch. By Claim 1, every vertex used by two paths in a path matching P
belongs to some tree. We will show how to resolve a tree of P by using bypasses
of size 2 in such a way that only vertices of the tree are affected. Then we are
done by solving all trees independently.
Let T be a subset of P, forming a tree. For simplicity, we address T itself
as a tree. Every vertex of T being a conflict has at least three edges incident
to it, since it cannot be endpoint of two paths in P, and it is part of at least
two edge-disjoint paths by definition of a conflict.
We reduce the size of the problem at hand in that we eliminate paths from
the tree by resolving conflicts.

7 Since we associate a cost instead of a length to the edges, we speak about cheapest
instead of shortest paths.

Procedure 1
Input: A minimal path matching P for some vertex set U on (V, E).
For all trees T of P
  While there are conflicts in T (i.e. there is more than one path in T)
    pick an arbitrary path p ∈ T;
    if p has only one conflict v, and v is an endpoint of p,
      pick another path using v as new p instead;
    let v1, v2, ..., vk be (in this order) the conflicts in p;
    while k > 1
      consider two paths p1, pk ∈ T which use v1 respectively vk, commonly with p;
      pick as new p one of p1, pk which was formerly not picked;
      let v1, v2, ..., vk be (in this order) the conflicts in p;
    let v be the only vertex of the finally chosen path p which is a conflict;
    if v has two incident edges in p,
      replace those with a bypass,
    else (v is an endpoint of p)
      replace the single edge incident to v in p together
      with one of the previously picked paths with a bypass.
Output: the modified conflict-free path matching P′.

The proof of the correctness of Procedure 1 is moved to the full version of this
paper.

Now we describe the implementation of Step 5 of Algorithm PMCA. It divides


the minimal spanning tree by resolving conflicts into several trees, whose crucial
property is that they have vertices of degree at most 3.
Procedure 2 below is based on the following idea. First, a root of T is picked.
Then, we consider a path pi in T which, under the orientation w.r.t. this root,
will go up and down. The two edges immediately before and after the turning
point are bypassed. One possible view on this procedure is that the minimal
spanning tree is divided into several trees, since each bypass building divides a
tree into two trees.

Procedure 2
Input: T and the paths p1, p3, p5, ... computed in Step 4 of Algorithm PMCA.
Choose a node r as a root in T.
For each path pi = (v1, v2), (v2, v3), ..., (v_{ni−1}, v_{ni}) in T do
  Let vj be the node in pi of minimal distance to r in T.
  If 1 < j < ni then
    bypass the node vj and call this new path p′i,
  else p′i = pi.
Output: The paths p′1, p′3, p′5, ..., building a forest Tf.
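In code, the step performed for a single path is tiny. The sketch below (ours, with the path given as its vertex sequence and depth[v] standing for the distance of v to the chosen root r in T) returns the path with its turning point bypassed whenever that point is an inner vertex.

```python
def bypass_turning_point(path, depth):
    """Procedure 2, one path: bypass the vertex closest to the root if it is
    an inner vertex of the path; otherwise return the path unchanged."""
    j = min(range(len(path)), key=lambda i: depth[path[i]])
    if 0 < j < len(path) - 1:
        return path[:j] + path[j + 1:]    # bypass of size 2: skip path[j]
    return path

# example: a path that goes up to the root (depth 0) and down again
print(bypass_turning_point(["a", "b", "r", "c"], {"a": 2, "b": 1, "r": 0, "c": 1}))
```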

Now the following properties hold. Their proofs are given in the full version
of this paper.

1. If a node v occurs in two different paths p′i and p′j of Tf, then v is an inner
node in one path and a terminal node in the other path. I.e. the node degree
of the forest spanned by p′1, p′3, p′5, ... is at most three.
2. In Tf, every path has at most one bypass, and every bypass is of size two.
3. Vertices which are leaves in Tf are not conflicts in D′.
4. In the cycle p′1, p2, p′3, p4, p′5, p6, ..., between each two bypasses there is at
least one vertex not being a conflict.
Below, we present Procedure 3 which consecutively resolves the remaining
conflicts. Note that s, t, u, v, and their primed versions, denote occurrences of
vertices on a path, rather than the vertices itself. In one step, Procedure 3 has
to make a choice.
Procedure 3
Input: a cycle D′ on (V, E) where every vertex of V occurs once or twice.
Take an arbitrary conflict, i.e. a vertex occurring twice as u and u′ in D′;
bypass one occurrence, say u (with a bypass of size 2);
while there are conflicts remaining
  if occurrence u has at least one unresolved conflict as neighbor
    let v be one of them, chosen by the following rule:
      If between u and another bypassed vertex occurrence t on D′, there
      are only unresolved conflicts, choose v to be the neighbor of u towards t.
    ((v, u) or (u, v) is an edge of D′ and there is another occurrence v′ of
    the same vertex as v)
    resolve that conflict by bypassing v′
  else
    resolve an arbitrary conflict;
  let u be the bypassed vertex.
Output: the modified cycle D″.
The proofs of correctness of Procedure 3 and of the approximation ratio of
PMCA are given in the full version of this paper.
Theorem 1 improves the approximation ratio achieved in [AB95]. Note that
this cannot be done by modifying the approach of Andreae and Bandelt. The
crucial point of our improvement is based on the presented modification of
Christofides algorithm while Andreae and Bandelt conjectured in [AB95] that
Christofides algorithm cannot be modified in order to get an approximation
algorithm for Δdist,β-TSP.
Note that Theorem 1 can also be formulated in a general form by substituting
the parameter β by a function β(n), where n is the number of nodes of the graph
considered.

4 Conclusion and Discussion


In the previous sections we have introduced the concept of stability of approxima-
tions and we have applied it to TSP. Here we discuss the potential applicability

and usefulness of this concept. Applying it, one can establish positive results of
the following types:

1. An approximation algorithm or a PTAS can be successfully used for a larger


set of inputs than the set usually considered (see Lemma 1).
2. We are not able to successfully apply a given approximation algorithm A (a
PTAS) for additional inputs, but one can modify A to get a new approxi-
mation algorithm (a new PTAS) working for a larger set of inputs than the
set of inputs of A (see Theorem 1 and [AB95, BC99]).
3. To learn that an approximation algorithm is unstable for a distance measure
could lead to the development of completely new approximation algorithms
that would be stable according to the considered distance measure.

The following types of negative results may be achieved:

4. The fact that an approximation algorithm is unstable according to all rea-


sonable distance measures and so that its use is really restricted to the
original input set.
5. Let U = (ΣI, ΣO, L, LI, M, cost, goal) ∈ NPO be well approximable. If,
for a distance measure d and a constant r, one proves the nonexistence of
any approximation algorithm for Ur,d = (ΣI, ΣO, L, Ballr,d(LI), M, cost,
goal) under the assumption P ≠ NP, then this means that the problem
is unstable according to d.

Thus, using the notion of stability one can search for a spectrum of the
hardness of a problem according to the set of input instances, which is the main
aim of our concept. This has been achieved for TSP now. Collecting results of
Theorem 1 and of [BC99], we have min{3/2 β², 4β}-approximation algorithms for
Δdist,β-TSP, and following [BC99], Δdist,β-TSP is not approximable within a
factor 1 + εβ for some ε < 1. While TSP does not seem to be tractable from
the previous point of view of approximation algorithms, using the concept of
approximation stability, it may look tractable for many specific applications.

References
[Ar97] S. Arora: Nearly linear time approximation schemes for Euclidean TSP and other geometric problems. In: Proc. 38th IEEE FOCS, 1997, pp. 554-563.
[Ar98] S. Arora: Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. In: Journal of the ACM 45, No. 5 (Sep. 1998), pp. 753-782.
[AB95] T. Andreae, H.-J. Bandelt: Performance guarantees for approximation algorithms depending on parametrized triangle inequalities. SIAM J. Discr. Math. 8 (1995), pp. 1-16.
[BC93] D. P. Bovet, P. Crescenzi: Introduction to the Theory of Complexity, Prentice-Hall 1993.
[BC99] M. A. Bender, C. Chekuri: Performance guarantees for the TSP with a parameterized triangle inequality. In: Proc. WADS'99, LNCS, to appear.
[BHKSU00] H.-J. Böckenhauer, J. Hromkovic, R. Klasing, S. Seibert, W. Unger: An Improved Lower Bound on the Approximability of Metric TSP and Approximation Algorithms for the TSP with Sharpened Triangle Inequality (Extended Abstract). In: Proc. STACS'00, LNCS, to appear.
[Chr76] N. Christofides: Worst-case analysis of a new heuristic for the traveling salesman problem. Technical Report 388, Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh, 1976.
[Co71] S. A. Cook: The complexity of theorem proving procedures. In: Proc. 3rd ACM STOC, ACM 1971, pp. 151-158.
[CK98] P. Crescenzi, V. Kann: A compendium of NP optimization problems. http://www.nada.kth.se/theory/compendium/
[CLR90] T. H. Cormen, C. E. Leiserson, R. L. Rivest: Introduction to Algorithms. MIT Press, 1990.
[En99] L. Engebretsen: An explicit lower bound for TSP with distances one and two. Extended abstract in: Proc. STACS'99, LNCS 1563, Springer 1999, pp. 373-382. Full version in: Electronic Colloquium on Computational Complexity, Report TR98-046 (1999).
[GJ79] M. R. Garey, D. S. Johnson: Computers and Intractability. A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.
[Ha97] J. Hastad: Some optimal inapproximability results. Extended abstract in: Proc. 29th ACM STOC, ACM 1997, pp. 1-10. Full version in: Electronic Colloquium on Computational Complexity, Report TR97-037, (1999).
[Ho96] D. S. Hochbaum (Ed.): Approximation Algorithms for NP-hard Problems. PWS Publishing Company 1996.
[Hr98] J. Hromkovic: Stability of approximation algorithms and the knapsack problem. Unpublished manuscript, RWTH Aachen, 1998.
[IK75] O. H. Ibarra, C. E. Kim: Fast approximation algorithms for the knapsack and sum of subsets problem. J. of the ACM 22 (1975), pp. 463-468.
[Jo74] D. S. Johnson: Approximation algorithms for combinatorial problems. JCSS 9 (1974), pp. 256-278.
[Lo75] L. Lovasz: On the ratio of the optimal integral and functional covers. Discrete Mathematics 13 (1975), pp. 383-390.
[LLRS85] E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, D. B. Shmoys (Eds.): The Traveling Salesman Problem. John Wiley & Sons, 1985.
[Mi96] J. S. B. Mitchell: Guillotine subdivisions approximate polygonal subdivisions: Part II: a simple polynomial-time approximation scheme for geometric k-MST, TSP and related problems. Technical Report, Dept. of Applied Mathematics and Statistics, Stony Brook 1996.
[MPS98] E. W. Mayr, H. J. Prömel, A. Steger (Eds.): Lectures on Proof Verification and Approximation Algorithms. LNCS 1367, Springer 1998.
[Pa77] Ch. Papadimitriou: The Euclidean traveling salesman problem is NP-complete. Theoretical Computer Science 4 (1977), pp. 237-244.
[Pa94] Ch. Papadimitriou: Computational Complexity, Addison-Wesley 1994.
[PY93] Ch. Papadimitriou, M. Yannakakis: The traveling salesman problem with distances one and two. Mathematics of Operations Research 18 (1993), 1-11.
Semantical Counting Circuits*

Fabrice Noilhan1 and Miklos Santha2


1 Universite Paris-Sud, LRI, Bat. 490, 91405 Orsay, France
Fabrice.Noil [email protected]
2 CNRS, URA 410, Universite Paris-Sud, LRI, Bat. 490, 91405 Orsay, France
Miklos.Sant [email protected]

Abstract. Counting functions can be defined syntactically or semantically
depending on whether they count the number of witnesses in a
non-deterministic or in a deterministic computation on the input. In
the Turing machine based model, these two ways of defining counting
were proven to be equivalent for many important complexity classes. In
the circuit based model, it was done for #P and #L, but for low-level
complexity classes such as #AC0 and #NC1 only the syntactical definitions
were considered. We give appropriate semantical definitions for
these two classes and prove them to be equivalent to the syntactical ones.
This enables us to show that #AC0 is included in the family of counting
functions computed by polynomial size and constant width counting
branching programs, therefore completing a result of Caussinus et al.
[CMTV98]. We also consider semantically defined probabilistic complexity
classes corresponding to AC0 and NC1 and prove that in the case of
unbounded error, they are identical to their syntactical counterparts.

1 Introduction
Counting is one of the basic questions considered in complexity theory. It is a
natural generalization of non-determinism: computing the number of solutions
for a problem is certainly not easier than just deciding if there is a solution at
all. Counting has been extensively investigated both in the machine based and
in the circuit based models of computation.
Historically, the first counting classes were defined in Turing machine based
complexity theory. Let us call a non-deterministic Turing machine an NP-machine
if it works in polynomial time, and an NL-machine if it works in logarithmic
space. In the case of a non-deterministic machine, an accepting path in its com-
putation tree on a string x certifies that x is accepted. We will call such a path
a witness for x. The very first, and still the most famous, counting class called
#P was introduced by Valiant [Val79] as the set of counting functions that map
a string x to the number of witnesses for x of some NP-machine. An analogous
definition was later made by Alvarez and Jenner [AJ93] for the class #L: it con-
tains the set of counting functions that map x to the number of witnesses for x of
* This research was supported by the ESPRIT Working Group RAND2 No. 21726 and
by the French-Hungarian bilateral project Balaton, Grant 99013


some NL-machine. These classes contain several natural complete problems: for
example computing the permanent of a matrix is complete in #P, whereas com-
puting the number of paths in a directed graph between two specified vertices
is complete in #L.
The so-called Gap classes were defined subsequently to include functions
taking also negative values into the above model. GapP was introduced by Fenner,
Fortnow and Kurtz [FFK94] as the difference of two functions in #P. The
analogous definition for GapL was made independently by Vinay [Vin91], Toda
[Tod91], Damm [Dam91] and Valiant [Val92]. This latter class has received con-
siderable attention, mostly because it characterizes the complexity of computing
the determinant of a matrix [AO96, ST98, MV97].
Still in the Turing machine model, there is an alternative way of defining
the classes #P and #L, based on the computation of deterministic machines.
In the following discussion let us consider deterministic Turing machines acting
on pairs of strings (x, y) where for some polynomial p(n), the length of y is
p(|x|). We will say that the string y is a witness for x when the machine accepts
(x, y), otherwise y is a non-witness. We will call a deterministic Turing machine
a P-machine if it works in polynomial time, and an L-machine if it works in
logarithmic space and it has only one-way access to y. Then #P (respectively
#L) can be defined as the set of functions f for which there exists a P-machine
(respectively L-machine) such that f(x) is the number of witnesses for x. The
equivalence between these definitions can be established if we interpret the above
deterministic Turing machines as a normal form, with simple witness structure,
for the corresponding non-deterministic machines, where the string y describes
the sequence of choices made during the computation on x. Nonetheless this
latter way of looking at counting has at least two advantages over the previous
one.
The first advantage is that this definition is more robust in the following
sense. Two non-deterministic machines, even if they compute the same relation
R(x), might define different counting functions depending on their syntactical
properties. On the other hand, if the definition is based on deterministic machines,
only the relation they compute is playing a role. Indeed, two deterministic
machines computing the same relation R(x, y) will necessarily define the same
counting function independently from the syntactical properties of their computation.
Therefore, from now on, we will refer to the non-deterministic machine
based definition of counting as syntactical, and to the deterministic machine
based definition as semantical.
The second advantage of the semantical definition of counting is that probabilistic
complexity classes can be defined more naturally in that setting. For
example PP (respectively PL) is just the set of languages for which there exists
a P-machine (respectively L-machine) such that a string x is in the language
exactly when there are more witnesses for x than non-witnesses. In the case of the
syntactical definition of counting the corresponding probabilistic classes usually
are defined via the Gap classes.
The above duality in the definition of counting exists of course in other models where determinism and non-determinism are meaningful concepts. This is the case of the circuit based model of computation. Still, in this model syntactical counting has received considerably more attention than semantical counting. Before we discuss the reason for that, let us make clear what we mean here by these notions.
The syntactical notion of a witness for a string x in a circuit family was defined by Venkateswaran [Ven92] as an accepting subtree of the corresponding circuit on x, which is a smallest sub-circuit certifying that the circuit's output is 1 on x. It is easy to show that the number of such witnesses is equal to the value of the arithmetized version of the circuit on x. Let us stress again that this number, and therefore the counting function defined by a circuit, depends heavily on the specific structure of the circuit and not only on the function computed by it. For example, if we consider the circuit C1 which is just the variable x, and the circuit C2 which consists of an OR gate both of whose inputs are the same variable x, then clearly these two circuits compute the same function. On the other hand, on input x = 1, the counting function defined by C1 will take the value 1, whereas the counting function defined by the circuit C2 will take the value 2.
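As a minimal illustration of this (a sketch of ours, not part of the paper): arithmetizing a circuit by replacing OR with addition and AND with multiplication computes exactly the number of accepting subtrees, so the two equivalent circuits C1 and C2 get different counts.

    # Arithmetization: OR -> +, AND -> *, a literal -> its 0/1 value.
    def arith_or(*args):
        return sum(args)

    def arith_and(*args):
        result = 1
        for a in args:
            result *= a
        return result

    x = 1
    c1 = x                # circuit C1: just the variable x           -> 1 accepting subtree
    c2 = arith_or(x, x)   # circuit C2: an OR gate with both inputs x -> 2 accepting subtrees
    print(c1, c2)         # prints "1 2"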
For the semantical notion of a witness we consider again circuit families whose inputs are pairs of strings of polynomially related lengths. As in the case of Turing machines, y is a witness for x if the corresponding circuit outputs 1 on (x, y).
Venkateswaran was able to give a characterization of #P and #L in the circuit model based on the syntactical definition of counting. His results rely on a circuit based characterization of NP and NL. He has shown that #P is equal to the set of counting functions computed by uniform semi-unbounded circuits of exponential size and of polynomial algebraic degree, and #L is equal to the set of counting functions computed by uniform skew-symmetric circuits of polynomial size. Semantically, #P can be characterized as the set of counting functions computed by uniform polynomial size circuits.
In recent years several low level counting classes were defined in the circuit based model, all in the syntactical setting. Caussinus et al. [CMTV98] have defined #NC1, and Agrawal et al. [AAD97] have defined #AC0 as the set of functions counting the number of accepting subtrees in the respective circuit families. In subsequent works, many important properties of these classes were established [ABL98, AAB+99]. Although some attempts were made [Yam96], no satisfactory characterization of these classes was obtained in the semantical setting. The main reason for that is that by simply adding counting bits to AC0 or NC1 circuits, we fall to the all too powerful counting class #P [SST95, VW96], and it is not at all clear what type of restrictions should be made in order to obtain #AC0 and #NC1.
The main result of this paper is such a semantical characterization of these two counting classes. Indeed, we will define semantically the classes #AC0CO and #NC1CO by putting some relatively simple restrictions on the structure of the AC0 and NC1 circuits involved in the definition, and on the way they might contain
counting variables. Our main result is that this definition is equivalent to the syntactical definition, that is, we have

Theorem 1. #AC0 = #AC0CO and #NC1 = #NC1CO.

Put another way, if standard AC0 and NC1 circuits are seen as non-deterministic circuit families in the syntactical definition of the corresponding counting classes, we are able to characterize their deterministic counterparts which define the same counting classes semantically.
We also examine the relationship between #BR, the family of counting functions computed by polynomial size and constant width counting branching programs, and counting circuits. While Caussinus et al. [CMTV98] proved that #BR ⊆ #NC1, we will show

Theorem 2. #AC0 ⊆ #BR.

Semantically defined counting classes give rise naturally to the corresponding probabilistic classes in the three usually considered cases: in the unbounded, in the bounded, and in the one-sided error model. Indeed, we will define the probabilistic classes PAC0CO, PNC1CO, BPAC0CO, BPNC1CO, RAC0CO and RNC1CO. PAC0 and PNC1 were already defined syntactically via #AC0 and #NC1, and we will prove for this model that our definitions coincide with the previous ones:

Theorem 3. PAC0CO = PAC0 and PNC1CO = PNC1.

In the other two error models, previous definitions were also semantical, but without any restrictions on the way the corresponding circuits could use the counting variables. We could not determine whether they coincide with ours, and we think that this question is worthy of further investigation. Nonetheless we argue that, because of their close relationship with counting branching programs, the counting circuit based definition might be the right one.
The paper is organized as follows: Section 2 contains the definitions for semantical circuit based counting. Section 3 exhibits the mutual simulations of syntactical and semantical counting for the circuit classes AC0 and NC1. Theorem 1 is a direct consequence of Theorems 4 and 5 proven there. In Section 4 we deal with counting branching programs, and Theorem 2 will follow from Theorem 6. Finally, in Section 5 we discuss the gap and random classes which are derived from semantical counting circuits. Theorem 8, relating gap classes and counting circuits, will imply Theorem 3.

2 Definitions
In this section we define the counting circuit families which will be used for the semantical definition of a counting function. Counting circuits have two types of input variables: standard and counting ones. They are in fact restricted boolean circuits, where the restriction is put on the way the gates and the counting variables can be used in the circuits. First we will define the usual boolean circuit families and the way they are used to define (syntactically) counting functions, and then we do the same for counting circuit families. The names "circuit" versus "counting circuit" will be used systematically this way in the rest of the paper.
A bounded fan-in circuit with n input variables is a directed acyclic graph with vertices of in-degree 0 or 2. The vertices of in-degree 0 are called inputs, and they are labeled with an element of the set {0, 1, x1, ¬x1, ..., xn, ¬xn}. The vertices of in-degree 2 are labeled with the bounded AND or OR gate. There is a distinguished vertex of out-degree 0, this is the output of the circuit. An unbounded fan-in circuit is defined similarly, with the only difference that non-input vertices can have arbitrary in-degree, and they are labeled with unbounded AND or OR gates. A circuit family is a sequence (Cn)n≥1 of circuits where Cn has n input variables. It is uniform if its direct connection language is computed in DLOGTIME. An AC0 circuit family is a uniform, unbounded fan-in circuit family of polynomial size and constant depth. An NC1 circuit family is a uniform, bounded fan-in circuit family of polynomial size and logarithmic depth.
A circuit C is a tree circuit if all its vertices have out-degree 1. A proof tree in C on input x is a connected subtree which contains its output, has one edge into each OR gate, has all the edges into the AND gates, and which evaluates to 1 on x. The number of proof trees in C on x will be denoted by #PT_C(x). A boolean tree circuit family (Cn)n≥1 computes a function f : {0,1}* → N if for every x, we have f(x) = #PT_{C|x|}(x). We denote by #AC0 (respectively by #NC1) the class of functions computed by a uniform AC0 (respectively NC1) tree circuit family.
In order to introduce counting variables into counting circuits and to carry out the syntactical restrictions, we use two new gates, the SELECT and PADAND gates. These are actually small circuits which will be built in some specific way from AND and OR gates. The SELECT gates, which use a counting variable to choose a branch of the circuit, will actually replace OR gates, which will be prohibited in their general form. The PADAND gates will function as AND gates, but they will again allow the introduction of counting variables. They will actually fix the value of these counting variables to the constant 1.
We now define these gates formally. In the following we will denote single boolean variables with a subscript, such as v0. Boolean vector variables will be denoted without a subscript, such as v. We will also identify an integer i, 0 ≤ i ≤ 2^k − 1, with its binary representation (i_0, ..., i_{k−1}).
The bounded fan-in SELECT gate has 3 arguments. It is defined by SELECT1(x0, x1, u0) = x_{u0}, and represented by OR(AND(x0, ¬u0), AND(x1, u0)). For every k, the unbounded fan-in SELECTk gate has 2^k + k arguments and is defined by SELECTk(x0, ..., x_{2^k−1}, u0, ..., u_{k−1}) = x_u. This gate is represented by the circuit OR_{i=0}^{2^k−1}(AND(x_i, u = i)), where "u = i" stands for the circuit AND_{j=0}^{k−1}(OR(AND(u_j, i_j), AND(¬u_j, ¬i_j))). The last gate can easily be extended to m + k arguments for m < 2^k as SELECTk(x0, ..., x_{m−1}, u0, ..., u_{k−1}) = SELECTk(x0, ..., x_{m−1}, 0, ..., 0, u0, ..., u_{k−1}). Clearly, SELECTk can be simulated by a circuit of depth O(log k) containing only SELECT1 gates. The unbounded fan-in PADAND gate has at least two arguments and is defined by PADAND(x0, u0, ..., u_l) = AND(x0, u0, ..., u_l). Its bounded fan-in equivalent PADANDb can also have an arbitrary number of arguments, and in the case of m arguments is represented by a circuit of depth ⌈log m⌉ consisting of a balanced binary tree of bounded AND gates. It will always be clear from the context whether we are dealing with the bounded or the unbounded PADAND gate.
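A minimal sketch of the input/output behaviour of these two gates (our own illustration; the selector bit order, least-significant bit first, is an assumption):

    def select(xs, u_bits):
        # SELECT_k: the selector bits (u_0, ..., u_{k-1}) encode the integer u;
        # output the input x_u; inputs missing because m < 2^k are treated as 0.
        u = sum(bit << j for j, bit in enumerate(u_bits))
        return xs[u] if u < len(xs) else 0

    def padand(x0, *padding):
        # PADAND: an AND of x0 with the padding variables u_0, ..., u_l;
        # in any witness it forces every padding variable to 1.
        return int(bool(x0) and all(padding))

    # SELECT_2 picks input number 2 when (u_0, u_1) = (0, 1)
    print(select([0, 1, 1, 0], (0, 1)))   # prints 1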
We will now define unbounded fan-in counting circuits recursively. There will be two types of input variables: standard and counting ones.
Definition 1 (Counting circuit).
- If C is a boolean tree circuit, then C is a counting circuit. All its variables are standard.
- If C0, ..., C_{2^k−1} are counting circuits and u0, ..., u_{k−1} are input variables which do not appear in them, then SELECT(C0, ..., C_{2^k−1}, u0, ..., u_{k−1}) is a counting circuit. The variables u0, ..., u_{k−1} are counting variables.
- If C0, ..., Ck are counting circuits and they do not have any common counting variables, then AND(C0, ..., Ck) is a counting circuit.
- If C is a counting circuit and u0, ..., u_l are input variables, then PADAND(C, u0, ..., u_l) is a counting circuit. The variables u0, ..., u_l are counting variables.
Moreover, we require that no input variable can be counting and standard at the same time.
Bounded counting circuits are defined analogously, with k = 1 in all the construction steps.
The set of all standard (respectively counting) variables of a circuit C will be denoted SV(C) (respectively CV(C)). Let C be a counting circuit with n standard variables. The counting function #CO_C : {0,1}^n → N associated with C is defined as:

    #CO_C(x) = C(x)                       if CV(C) = ∅,
    #CO_C(x) = |{u : C(x, u) = 1}|        if CV(C) ≠ ∅.

A sequence (Cn)n≥1 of counting circuits is a counting family if there exists a polynomial p such that for all n, Cn has n standard variables and at most p(n) counting variables. A family is uniform if its direct connection language is computed in DLOGTIME. The counting function computed by a counting family is defined as #CO_{C|x|}(x). Finally, the semantical counting classes are defined as follows: #AC0CO (respectively #NC1CO) is the set of functions computed by a uniform AC0 (respectively NC1) family of counting circuits.
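As a specification-level sketch (ours, not part of the construction), the counting function of a counting circuit can be evaluated by brute force over the counting variables; this is of course exponential and is only meant to make the definition concrete.

    from itertools import product

    def count_CO(circuit, x, num_counting_vars):
        # #CO_C(x): the circuit value itself if there are no counting variables,
        # otherwise the number of assignments u to the counting variables with C(x, u) = 1.
        # `circuit` may be any 0/1-valued function of (x, u).
        if num_counting_vars == 0:
            return circuit(x, ())
        return sum(circuit(x, u) for u in product((0, 1), repeat=num_counting_vars))

    # example: SELECT_1(x_0, x_1, u) on x = (1, 1) has two witnesses, u = 0 and u = 1
    print(count_CO(lambda x, u: x[u[0]], (1, 1), 1))   # prints 2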

3 Circuits and Counting Circuits

3.1 Simulating Circuits by Counting Circuits
We will use a step-by-step simulation. We will define a function σ which maps circuits into counting circuits, by structural recursion on the output gate G of the circuit. The definition will be given for the unbounded case, from which the bounded case can be obtained by replacing the parameter k with 1 in all the construction steps, and unbounded gates by bounded ones.

Definition 2 (the function σ). If G is a literal, then σ(G) = G and the corresponding variable is standard. If G is an AND gate whose entries are the circuits C0, ..., Ck, then let the circuits C'_i be obtained from σ(C_i) by renaming counting variables so that for i ≠ j, CV(C'_i) ∩ CV(C'_j) = CV(C'_i) ∩ SV(C'_j) = ∅. Then σ(C) = AND(C'_0, ..., C'_k). If G is an OR gate whose entries are the circuits C0, ..., C_{2^k−1}, then the circuits C'_i are obtained from σ(C_i) by renaming counting variables so that for i ≠ j, CV(C'_i) ∩ CV(C'_j) = CV(C'_i) ∩ SV(C'_j) = ∅. Let V = CV(C'_0) ∪ ... ∪ CV(C'_{2^k−1}) and V_i = V − CV(C'_i). Let C''_i be defined as PADAND(C'_i, V_i), and let u0, ..., u_{k−1} be counting variables such that {u0, ..., u_{k−1}} ∩ V = ∅. Then σ(C) = SELECT(C''_0, ..., C''_{2^k−1}, u0, ..., u_{k−1}).

The next two lemmas will prove that the definition of σ is correct and that the functions computed by the corresponding circuit families are equal.

Lemma 1. If (Cn) is a uniform AC0 (respectively NC1) family of circuits, then (σ(Cn)) is a uniform AC0 (respectively NC1) family of counting circuits.

Proof. Throughout the construction, we assured that the entry circuits of an


AND gate do not have common counting variables. Clearly, no input variable
can be counting and standard at the same time.
Since there are a polynomial number of gates and for each gate, we introduced
a polynomial number of counting variables, the number of counting variables is
bounded by a polynomial. The uniformity of ( (Cn )) follows from the uniformity
of (Cn ). To nish the proof, we should consider the depth of the counting circuits.
In the unbounded case, ( (Cn )) is of constant depth since the SELECT gates
which replace the OR gates of the original circuit are of constant depth.
In the bounded case, let k be such that there are at most nk variables in Cn .
The depth of Cn is O(log n). Let us de ne d = max depth( (D)) where D is
a subcircuit of Cn of depth i. Then we have

d +1 3 + max(d k log n)

since the depth increases only when the output gate is an OR. Therefore, ( (Cn ))
is of logarithmic depth.

Lemma 2. For every circuit C, #PT_C(x) = #CO_{σ(C)}(x).

Proof. We will prove this by structural recursion on the output gate G of C. If G is a literal, then by definition, circuits and counting circuits define the same counting function. If G is an AND gate, then since for i = 0, ..., k the variables in the CV(C'_i) are distinct, #CO_{σ(C)}(x) = ∏_i #CO_{C'_i}(x), which is the same as ∏_i #CO_{σ(C_i)}(x) because C'_i was obtained from σ(C_i) by renaming the variables. By the inductive hypothesis and the definition of the proof tree model, this is equal to #PT_C(x). If G is an OR gate, then since the counting variables u0, ..., u_{k−1} are distinct from the counting variables of the subcircuits, #CO_{σ(C)}(x) = ∑_i #CO_{C''_i}(x). For every i, #CO_{C''_i}(x) = #CO_{C'_i}(x), since the PADAND gate fixes all the counting variables in V_i, i.e. those outside CV(C'_i). This is the same value as #CO_{σ(C_i)}(x), since C'_i was obtained from σ(C_i) by renaming the variables. The statement follows from the inductive hypothesis.

The two lemmas imply

Theorem 4. #AC0 ⊆ #AC0CO and #NC1 ⊆ #NC1CO.

3.2 Simulating Counting Circuits by Circuits

Let us first remark that any counting circuit C can easily be transformed into another counting circuit C' computing the same counting function such that if PADAND(D', u0, ..., u_l) is a subcircuit of C', then {u0, ..., u_l} ∩ CV(D') = ∅. This is indeed true since if PADAND(D, u0, ..., u_l) is a subcircuit of C and if, for example, u0 ∈ CV(D), then we can rewrite D with respect to u0 = 1 by modifying the SELECT and PADAND gates accordingly. For the rest of the paper, we will suppose that counting circuits have been transformed this way.
We will use in the construction circuits computing fixed integers which are powers of 2. For l ≥ 0, the circuit A_{2^l} computing the integer 2^l is defined as follows. A_1 is the constant 1 and A_2 is OR(1, 1). For l ≥ 2, in the unbounded case, A_{2^l} has a topmost unbounded AND gate with l subcircuits A_2. In the bounded case, we replace the unbounded AND gate by its standard bounded simulation consisting of a balanced binary tree of bounded AND gates. Clearly, the depth of A_{2^l} in the bounded case is ⌈log l⌉ + 1.
We now define a function τ which maps counting circuits into circuits, by structural recursion on the output gate G of the counting circuit. Again, the definition will be given for the unbounded case, from which the bounded case can be obtained by replacing the parameter k with 1.

Definition 3 (the function τ). If G is a literal, then τ(G) = G. If G is an AND gate whose entries are C0, ..., Ck, then τ(C) = AND(τ(C0), ..., τ(Ck)). If G is a PADAND gate whose entries are C0 and u0, ..., u_l, then τ(C) = τ(C0). If G is a SELECT gate whose entries are C0, ..., C_{2^k−1}, u0, ..., u_{k−1}, then set V = CV(C0) ∪ ... ∪ CV(C_{2^k−1}) and V_i = V − CV(C_i). We let C'_i = AND(τ(C_i), A_{2^{|V_i|}}) and τ(C) = OR(C'_0, ..., C'_{2^k−1}).

Again, we proceed with two lemmas to prove the correctness of the simulation.

Lemma 3. If (Cn) is a uniform AC0 (respectively NC1) family of counting circuits, then (τ(Cn)) is a uniform AC0 (respectively NC1) family of circuits.

Proof. In the construction, we get rid of PADAND and SELECT gates and of
the counting variables. We do not modify the number of standard variables. The
only step where we increase the size or the depth of the circuit is when SELECT
Semantical Counting Circuits 95

gates are replaced. Each replacement introduces at most a polynomial number


of gates. Therefore the size remains polynomial. Uniformity of ( (Cn )) follows
from the uniformity of (Cn ).
In the unbounded case, the depth remains constant since the circuits A2l
have constant depth. In the bounded case, we claim that every replacement of a
SELECT gate increases the depth by a constant. This follows from the fact that
depth(A2 V ) depth(C1− ) + 1 for i = 0 1 since a bounded counting circuit
with m counting variables has depth at least log(m + 1) . Therefore the whole
circuit remains of logarithmic depth.

Lemma 4. For every counting circuit C, #PT_{τ(C)}(x) = #CO_C(x).

Proof. We will prove this by structural recursion on the output gate G of the counting circuit. In the proof, we will use the notation of Definition 3. If G is a literal, then by definition, C and τ(C) define the same counting function. If G is an AND gate, then by definition #PT_{τ(C)}(x) = ∏_i #PT_{τ(C_i)}(x). Since for i = 0, ..., k the subcircuits C_i do not share common counting variables, using the inductive hypothesis this is equal to #CO_C(x). If G is a PADAND gate, then from the definition and the inductive hypothesis, we have #PT_{τ(C)}(x) = #CO_{C0}(x). Since for all i, u_i ∉ CV(C0), we have #CO_{C0}(x) = #CO_C(x). If G is a SELECT gate, then #PT_{τ(C)}(x) = ∑_i 2^{|V_i|} #PT_{τ(C_i)}(x). Also, #CO_C(x) = ∑_i 2^{|V_i|} #CO_{C_i}(x), since the values of the variables in V_i do not influence the value of the circuit C_i. The result follows from the inductive hypothesis.

Theorem 5. #AC0CO ⊆ #AC0 and #NC1CO ⊆ #NC1.

4 Branching Programs and Counting Circuits

Branching programs constitute another model for defining low level counting classes, and this possibility was first explored by Caussinus et al. [CMTV98]. Let us recall that a counting branching program is an ordinary branching program [Bar89] with two types of (standard and counting) variables, with the restriction that every counting variable appears at most once on each computation path. Given a branching program B(x, u) whose standard variables are x, the counting function computed by it is

    #B(x) = B(x)                      if B does not have counting variables,
    #B(x) = |{u : B(x, u) = 1}|       otherwise.

Finally, #BR is the family of counting functions computed by DLOGTIME-uniform polynomial size and constant width counting branching programs. Caussinus et al. [CMTV98] could prove that #BR ⊆ #NC1, but they left open the question whether the inclusion is proper. Our results about counting circuits enable us to show that #AC0 ⊆ #BR. Our proof will proceed in two steps. First we show that #AC0CO is included in #BR, and then we use Theorem 5 to conclude.
Theorem 6. #AC0CO ⊆ #BR.

Proof. We define a function β which maps counting circuits into synchronous counting branching programs such that for every counting circuit C, #CO_C = #β(C). The definition is given by structural recursion on the output gate G of the counting circuit, and the proof of this statement can also be done without difficulty by recursion.
If C is a boolean tree circuit, then β(C) is a width-5 polynomial size branching program, provided by Barrington's result, which recognizes the language accepted by C. If C0 and C1 are counting circuits and u is a new counting variable, then β(SELECT(C0, C1, u)) is the branching program whose source is labelled by u, and whose two successors are the source nodes of β(C0) and β(C1), respectively. If C0 and C1 are counting circuits without common variables, then by increasing the width of β(C0) by 1, we ensure that it has a unique 1-sink. Then β(AND(C0, C1)) is the branching program whose source is the source of β(C0) and whose 1-sink is identified with the source of β(C1). If C is a counting circuit and u0, ..., uk are counting variables, then β(PADAND(C, u0, ..., uk)) = β(C'), where C' is obtained from C by setting u0, ..., uk to 1.
If (Cn)n≥1 is an AC0 family of counting circuits, then (β(Cn)) has constant width since we double the width only a constant number of times for the SELECT gates. Also, the uniformity of the family follows from the uniformity of (Cn).
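To make the object being constructed concrete, here is a brute-force evaluator for counting branching programs (a sketch of ours illustrating the semantics only, not the width-5 construction; the node and sink encoding is our own choice):

    from itertools import product

    def bp_count(nodes, source, x, counting_vars):
        # Number of assignments to the counting variables under which the
        # branching program accepts x.  `nodes` maps a node id to
        # (variable, successor_if_0, successor_if_1); "accept"/"reject" are sinks.
        # A variable is looked up in u if it is a counting variable, else in x.
        def accepts(u):
            node = source
            while node not in ("accept", "reject"):
                var, succ0, succ1 = nodes[node]
                bit = u[var] if var in u else x[var]
                node = succ1 if bit else succ0
            return node == "accept"
        return sum(accepts(dict(zip(counting_vars, bits)))
                   for bits in product((0, 1), repeat=len(counting_vars)))

    # beta(SELECT(C0, C1, u)): the source reads u and branches into the programs for C0 and C1
    nodes = {"s":  ("u",  "b0", "b1"),
             "b0": ("x0", "reject", "accept"),   # stands in for a program accepting iff x0 = 1
             "b1": ("x1", "reject", "accept")}   # stands in for a program accepting iff x1 = 1
    print(bp_count(nodes, "s", {"x0": 1, "x1": 1}, ["u"]))   # prints 2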

5 Gap and Random Classes via Semantical Counting

In this section, we point out another similarity between semantical counting circuits and deterministic Turing machine based counting: we will define probabilistic classes by counting the fraction of assignments for the counting variables which make the circuit accept. We will prove that in the unbounded error case, our definitions coincide with the syntactical definition via Gap classes. For the proof, we will need the notion of an extended counting circuit, which may contain OR and PADOR gates without changing the family of counting functions that can be computed. In the bounded error and one-sided error models we could not determine whether our definitions and the syntactical ones are identical.

5.1 Extended Counting Circuits

By definition, the unbounded fan-in PADOR(x0, u0, ..., u_l) gate has at least two arguments and is defined as OR(x0, u0, ..., u_l). Its bounded fan-in equivalent is represented by a circuit of depth ⌈log(l + 2)⌉, consisting of the usual bounded expansion of the above circuit. Similarly to the case of the PADAND gate, from now on we will suppose without loss of generality that if PADOR(D, u0, ..., u_l) is a subcircuit of an extended counting circuit, then CV(D) ∩ {u0, ..., u_l} = ∅.
An unbounded fan-in extended counting circuit is a counting circuit with the following construction steps added to Definition 1.
- If C0, ..., Ck are extended counting circuits and they do not have any common counting variable, then OR(C0, ..., Ck) is an extended counting circuit.
- If C is an extended counting circuit and u0, ..., u_l are input variables, then PADOR(C, u0, ..., u_l) is an extended counting circuit. The variables u0, ..., u_l are counting variables.
- If C is a counting circuit, then C is an extended counting circuit.
We obtain the definition for the bounded fan-in case by taking again k = 1. We will denote by #AC0eCO (respectively #NC1eCO) the set of functions computed by a uniform AC0 (respectively NC1) family of extended counting circuits. The following theorem shows that extended counting circuits are no more powerful than regular ones.

Theorem 7. #AC0eCO = #AC0CO and #NC1eCO = #NC1CO.
Proof. Let C be an extended counting circuit. We will show that there exists a counting circuit computing #CO_C whose size and depth are of the same order of magnitude. First observe that negation gates in C can be pushed down to the literals. For this, besides the standard de Morgan laws, one can use the following equalities, whose verification is straightforward:

    ¬SELECT(C0, ..., C_{2^k−1}, u0, ..., u_{k−1}) = SELECT(¬C0, ..., ¬C_{2^k−1}, u0, ..., u_{k−1})
    ¬PADAND(C0, (u0, ..., u_l)) = PADOR(¬C0, (¬u0, ..., ¬u_l))
    ¬PADOR(C0, (u0, ..., u_l)) = PADAND(¬C0, (¬u0, ..., ¬u_l))

Then one can get rid of the OR gates by recursively replacing OR(C0, ..., C_{2^k−1}) with SELECT(C0, ..., C_{2^k−1}, u0, ..., u_{k−1}), where {u0, ..., u_{k−1}} ∩ (CV(C0) ∪ ... ∪ CV(C_{2^k−1})) = ∅. Since C0, ..., C_{2^k−1} do not have any common counting variables, this does not change the counting function computed by the counting circuit.
Finally we show how to extend the function σ of Definition 2 to PADOR gates while keeping the same counting function as in Lemma 2. We define

    σ(PADOR(C0, (u0, ..., u_l))) = OR(A_{2^{|CV(σ(C0))|} · (2^{l+1} − 1)}, σ(C0)).

Then #PT_{σ(C)}(x) = #PT_{σ(C0)}(x) + 2^{|CV(σ(C0))|} · (2^{l+1} − 1), which by the inductive hypothesis is #CO_{C0}(x) + 2^{|CV(σ(C0))|} · (2^{l+1} − 1). This in turn equals #CO_C(x) by the definition of the PADOR gate and since u0, ..., u_l are not counting variables of C0.
Since all these transformations may increase the size or the depth only by a constant factor, the statement follows.

5.2 Gap Classes

As usual, for any counting class #C, we define the associated Gap class GapC. A function f is in GapC iff there are two functions f1 and f2 in #C such that f = f1 − f2. The following theorem will be useful for discussing probabilistic complexity classes.
Theorem 8. Let f be a function in GapAC0 (respectively GapNC1). Then there is a uniform AC0 (respectively NC1) family of counting circuits (Cn) such that 2·f(x) = #CO_{C|x|}(x) − #CO_{¬C|x|}(x), where ¬C denotes the circuit C with its output negated.

Proof. In fact, we will construct a family of extended counting circuits with the required property. Then the result will follow from Theorem 7. Let #C be one of the classes #AC0 or #NC1, and let f be a function in GapC. Then there exist two functions f1 and f2 in #C such that f = f1 − f2. Fix an entry x of length n. Let us take two uniform #C families of counting circuits which compute f1 and f2, and let D1 and D2, respectively, be the counting circuits in these families which have n input variables.
Let V1 = CV(D1), V2 = CV(D2) and m = |V1| + |V2|. We will suppose without loss of generality that V1 ∩ V2 = ∅ (by renaming the variables if necessary). We define D'1 = PADAND(D1, V2) and D'2 = PADAND(D2, V1). Let t be a counting variable such that t ∉ V1 ∪ V2, and define Cn = SELECT(D'1, ¬D'2, t). The counting circuit family (Cn) is uniform and is in C since its depth and size are changed only up to a constant with respect to the families computing f1 and f2.
We first claim that the counting function associated with Cn, on entry x, computes f1(x) + (2^m − f2(x)). First, let us observe that #CO_{¬D'2}(x) = 2^m − #CO_{D'2}(x). Therefore, since t does not appear in D'1 and D'2, we have #CO_{Cn}(x) = #CO_{D'1}(x) + (2^m − #CO_{D'2}(x)). By the definition of the PADAND gate, the claim follows.
The number of counting variables in Cn is m + 1. Therefore, #CO_{Cn}(x) − #CO_{¬Cn}(x) = 2·#CO_{Cn}(x) − 2^{m+1}. By the previous claim, this is 2f(x).

5.3 Random Classes via Semantical Counting

Another advantage of semantical counting circuits is that probabilistic complexity classes can easily be defined in this model. The definition is analogous to the definition of probabilistic classes based on Turing machines' computation: for a given input, we count the fraction of all settings for the counting variables which make the circuit accept. We will now define the usual types of probabilistic counting classes in our model and compare them to the existing definitions.
tic counting classes in our model and compare them to the existing de nitions.
For a counting circuit C we de ne PrCO (C(x)) by:

C(x) if CV(C) =
PrCO (C(x)) =
# v C(x v) = 1 2jCV(C)j if CV(C) =

Let now C be one of the classes AC0 or NC1 . Then PCCO is the family of
languages for which there exists a uniform C family of counting circuits (Cn ) such
that x L i PrCO (Cjxj (x)) > 1 2. Similarly BPCCO is the family of languages
for which there exists a uniform C family of counting circuits (Cn ) such that
x L i PrCO (Cjxj (x)) > 1 2 + for some constant > 0. Finally, RCCO is
the family of languages for which there exists a uniform C family of counting
Semantical Counting Circuits 99

circuits (Cn ) such that if x L then PrCO (Cjxj (x)) 1 2 and if x L then
PrCO (Cjxj (x)) = 0.
Let us recall that PC was de ned [AAD97, CMTV98] as the family of lan-
guages L for which there exists a function f in GapC such that x L i f (x) > 0.
Theorem 8 implies that these de nitions coincide with ours.
Bounded error and one-sided error circuit based probabilistic complexity classes were defined in the literature for the classes in the AC and NC hierarchies [Weg87, Joh90, Coo85]. These are semantical definitions in our terminology, but unlike in our case, no special restriction is put on the way counting variables are introduced. To be more precise, let a probabilistic circuit family (Cn) be a uniform family of circuits where the circuits have standard and probabilistic input variables and the number of probabilistic input variables is polynomially related to the number of input variables. For any input x, the probability that such a family accepts x is the fraction of assignments for the probabilistic variables which make the circuit C|x| accept x. Then the usual definition of BPC and RC is similar to that of BPCCO and RCCO, except that probabilistic circuit families and not counting circuit families are used in the definition.
The robustness of our definitions is underlined by the fact that the bounded error (respectively one-sided error) probabilistic class defined via constant width and polynomial size branching programs lies between the classes BPAC0CO and BPNC1CO (respectively RAC0CO and RNC1CO). This follows from the inclusions #AC0CO ⊆ #BR ⊆ #NC1CO, and from the fact that counting branching programs are also defined semantically.
As we mentioned already, it is known [SST95, VW96] that if PAC0 is defined via probabilistic and not counting circuit families, then it is equal to PP. Therefore, it is natural to ask what happens in the other two error models: is BPCCO = BPC and is RCCO = RC? If not, then we think that, since branching programs form a natural model for defining low level probabilistic complexity classes, the above result indicates that counting circuits might constitute the basis of the right definition.

6 Conclusion

Circuit based counting functions and probabilistic classes can be defined semantically via counting circuit families. These circuits contain additional counting variables whose appearances are restricted. When these definitions are equivalent to the syntactical ones, we can rightly consider the classes robust. In the opposite case, we think that this is a serious reason for reconsidering the definitions.
Let us say a word about the restrictions we put on the counting variables. They are essentially twofold: firstly, they cannot appear in several subcircuits of an AND gate, and secondly, they can be introduced only via the SELECT and PADAND gates. Of these restrictions, the second one is purely technical. Indeed, any appearance of a counting variable u can be replaced by a subcircuit SELECT(0, 1, u) without changing the counting function computed by the circuit. On the other hand, the first one is essential: without it, #AC0 would be equal to #P.

References

[AAB+99] E. Allender, A. Ambainis, D. M. Barrington, S. Datta, and H. LeThanh. Bounded depth arithmetic circuits: Counting and closure. In Proceedings of the 26th International Colloquium on Automata, Languages and Programming, pages 149-158, 1999.
[AAD97] M. Agrawal, E. Allender, and S. Datta. On TC0, AC0, and arithmetic circuits. In Proceedings of the 12th Annual IEEE Conference on Computational Complexity, pages 134-148, 1997.
[ABL98] A. Ambainis, D. M. Barrington, and H. LeThanh. On counting AC0 circuits with negated constants. In Proceedings of the 23rd Symposium on Mathematical Foundations of Computer Science, pages 409-417, 1998.
[AJ93] C. Àlvarez and B. Jenner. A very hard log-space counting class. Theoretical Computer Science, 107:3-30, 1993.
[AO96] E. Allender and M. Ogihara. Relationships among PL, #L, and the determinant. RAIRO, 30:1-21, 1996.
[Bar89] D. A. Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC1. Journal of Computer and System Sciences, 38:150-164, 1989.
[CMTV98] H. Caussinus, P. McKenzie, D. Thérien, and H. Vollmer. Nondeterministic NC1 computation. Journal of Computer and System Sciences, 57:200-212, 1998.
[Coo85] S. A. Cook. A taxonomy of problems with fast parallel algorithms. Information and Control, 64:2-22, 1985.
[Dam91] C. Damm. DET = L#L? Informatik-Preprint 8, Fachbereich Informatik der Humboldt-Universität zu Berlin, 1991.
[FFK94] S. A. Fenner, L. J. Fortnow, and S. A. Kurtz. Gap-definable counting classes. Journal of Computer and System Sciences, 48(1):116-148, 1994.
[Joh90] D. S. Johnson. A catalog of complexity classes. Handbook of Theoretical Computer Science, A:67-161, 1990.
[MV97] M. Mahajan and V. Vinay. Determinant: Combinatorics, algorithms, and complexity. Chicago Journal of Theoretical Computer Science, December 1997.
[SST95] S. Saluja, K. V. Subrahmanyam, and M. N. Thakur. Descriptive complexity of #P functions. Journal of Computer and System Sciences, 50(3):493-505, 1995.
[ST98] M. Santha and S. Tan. Verifying the determinant in parallel. Computational Complexity, 7:128-151, 1998.
[Tod91] S. Toda. Counting problems computationally equivalent to computing the determinant. Technical Report CSIM 91-07, 1991.
[Val79] L. G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8:189-201, 1979.
[Val92] L. G. Valiant. Why is boolean complexity theory difficult? London Mathematical Society Lecture Notes Series, 169, 1992.
[Ven92] H. Venkateswaran. Circuit definitions of nondeterministic complexity classes. SIAM Journal on Computing, 21:655-670, 1992.
[Vin91] V. Vinay. Counting auxiliary pushdown automata and semi-unbounded arithmetic circuits. In Proceedings of the 6th Annual IEEE Conference on Structure in Complexity Theory, pages 270-284, 1991.
[VW96] H. Vollmer and K. W. Wagner. Recursion theoretic characterizations of complexity classes of counting functions. Theoretical Computer Science, 163(1-2):245-258, 1996.
[Weg87] I. Wegener. The Complexity of Boolean Functions. Wiley-Teubner Series in Computer Science, 1987.
[Yam96] T. Yamakami. Uniform AC0 counting circuits. Manuscript, 1996.
The Hardness of Placing Street Names in a Manhattan Type Map*

Sebastian Seibert and Walter Unger

Dept. of Computer Science I (Algorithms and Complexity), RWTH Aachen,
Ahornstraße 55, 52056 Aachen, Germany
{seibert,quax}@i1.informatik.rwth-aachen.de

Abstract. Map labeling is a classical key problem. The interest in this problem has grown over the last years, because of the need to churn out different types of maps from a growing and altering set of data. We will show that the problem of placing street names without conflicts in a rectangular grid of streets is NP-complete and APX-hard. This is the first result of this type in this area. Further importance of this result arises from the fact that the considered problem is a simple one. Each row and column of the rectangular grid of streets contains just one street, and the corresponding name may be placed anywhere in that line.

1 Introduction and Definitions

The street layouts of some modern cities, planned on a drawing table, are often right-angled. In a drawing of such a map the street names should be placed without conflict. Thus each name should be drawn within the rectangular area without splitting or conflicting with other names. See Figure 1 for an example. The names are indicated by simple lines.
A map consists of an N_h × N_v grid with N_h columns and N_v rows. A horizontal line, i.e. a horizontal street name, is given by the pair (i, l) where i (1 ≤ i ≤ N_v) indicates the row of the street and l (1 ≤ l ≤ N_h) the length of that name. The vertical lines are given in a similar way, by indicating their column and height.
A placement in a map assigns to every line (street name) a position to place the first character in. If the first character of a horizontal line (i, l) is placed in column s (1 ≤ s ≤ N_h − l + 1), then the name will occupy the following space in the grid: {(j, i) : s ≤ j ≤ s + l − 1}. Vertical lines are placed analogously. A conflict in a placement is a position occupied by both a vertical and a horizontal line.
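A small sketch of these definitions (our own helper with hypothetical parameter names): it marks the grid cells occupied by each placed name and reports whether the placement is conflict-free.

    def conflict_free(horizontal, vertical):
        # horizontal: list of (row i, length l, start column s) for horizontal names
        # vertical:   list of (column j, height h, start row s) for vertical names
        # returns True iff no position is occupied by names of both orientations
        h_cells = {(s + d, i) for (i, l, s) in horizontal for d in range(l)}
        v_cells = {(j, s + d) for (j, h, s) in vertical for d in range(h)}
        return not (h_cells & v_cells)

    # one horizontal name of length 3 in row 2, one vertical name of height 3 in column 2
    print(conflict_free([(2, 3, 1)], [(2, 3, 1)]))   # prints False: they cross at (2, 2)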
Given a map and a set of lines to be placed in the map, StrP denotes the problem of deciding whether the given lines can be placed conflict-free. We will show that this problem is NP-complete. Max-StrP is the problem of finding a placement that maximizes the number of lines placed without conflict, which will

* Supported by DFG Project HR 14/5-1 "Zur Klassifizierung der Klasse praktisch lösbarer algorithmischer Aufgaben"
Fig. 1. A map with street names (an N_h × N_v grid; the names are indicated by simple lines)

be shown to be APX-hard below. An overview on approximation problems and hardness of approximation proofs can be found in [Ho96, MPS98].
Map labeling problems have been investigated intensively in the last twenty years. Most results are in the field of heuristics and practical implementation [WS99]. The known complexity results are about restricted versions of the map labeling problem. In [KI88, FW91, MS91, KR92, IL97], NP-completeness results are presented for the case that each label has to be attached to a fixed point. The type of label and alignment varies in these papers. For other models of the map labeling problem see [WS99].
Note that this type of map labeling was motivated by practical applications and introduced in [NW99]. There are algorithms given for solving StrP which run in polynomial time for some special cases and perform reasonably in practical applications. In this respect, the APX-hardness is even more surprising. For the harder problem of placing lines on a cylindrical map, there is also a proof of NP-completeness in [NW99], which relies essentially on the cylindrical structure.
Note that our notation differs from [NW99], where for instance a vector of street lengths for all rows and columns is given as input. But for showing APX-hardness by a reduction from SAT, we need to describe efficiently a map of order n^2 × n having only O(n) lines to be placed, if n is the size of the formula.
Our reduction is based on the following result. Please note that in the formulae constructed in the proof of that theorem, every variable is used at least twice.^1

^1 This is not surprising at all: variables used at most once in a formula can be assigned a truth value trivially, hence their inclusion would rather make the problem more tractable, not harder.
Theorem 1 (J. Håstad [Ha97]). For every small ε > 0, the problem of distinguishing SAT instances that can be satisfied from those where at most a fraction of 7/8 + ε of the clauses can be satisfied at the same time, (1, 7/8 + ε)-SAT for short, is NP-hard.

The construction and proof of NP-completeness of StrP will be presented in Section 2. Section 3 contains the proof of the APX-completeness of Max-StrP, and Section 4 gives the conclusions.

2 Map Construction and NP-Hardness

In this section, we give a reduction from SAT to StrP which we use for showing NP-hardness as well as APX-hardness. The proof of NP-hardness comes immediately with the construction. For the proof of APX-hardness of StrP only a small construction detail has to be added. Thus we give in this section the full construction, and we prove the APX-hardness of StrP in the next section.
Assume we have a SAT formula φ consisting of clauses c1, ..., cl over variables x1, ..., xm. Each clause ci is of the form z_{i,1} ∨ z_{i,2} ∨ z_{i,3}, the z_{i,j} being from {x1, ..., xm, ¬x1, ..., ¬xm}. We assume w.l.o.g. that each variable occurs at least twice.

Fig. 2. The outline of the map: three groups of vertical stripes, from left to right the variable stripes, the stripes mirroring negative occurrences, and the clause stripes

First we give an informal description of the proof idea. We want to construct a map M out of φ. Let n = 3l + m. M will have height N_v = 14n and width N_h ≤ 36n^2. It will be split into several vertical stripes, which means that there are several vertical lines of full height N_v. For clear reference, we use names for the stripes. The general picture of M is shown in Figure 2.
It consists of three groups of vertical stripes. To the left, we have a pair of vertical stripes for each variable. Here, the placement of lines corresponds to assigning a certain value to the respective variable. The rightmost part contains for each clause a triple of vertical stripes such that a non-conflicting placement of all lines in these stripes can be made only if there is a clause-satisfying setting of at least one of its variables represented in the leftmost part. The middle
part, called "mirror negative occurrences", is necessary to connect the other two properly. It will be explained below.
To ease the description we will just define the width of these stripes and the further lines added within the stripes. Thus the position of the separating vertical lines will be implicitly defined. The width of these stripes varies between 6n and 4n − l > 3n. The position of vertical lines put within such a stripe will be given relative to the separating vertical lines. Let stp be a stripe of width w surrounded by vertical lines at positions c and c + w + 1. If we say a vertical line is put at position i (1 ≤ i ≤ w) in stripe stp, this line will be put in column c + i of the whole map.
To move the information around in M, our main tool are horizontal lines which fit into the above mentioned stripes. These horizontal lines will be ordered (from the top and bottom rows towards the middle) according to their length. A line of width 6n − j will be in row n + j or N_v − n − j, where 0 ≤ j < 3n. The reason for leaving the top and bottom n rows unused will become clear in Section 3. We denote by H(j) a horizontal line of width 6n − j put in row n + j, and by H̄(j) a horizontal line of width 6n − j put in row N_v − n − j.
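In other words (a small sketch of our own, with N_v = 14n as above), each width parameter j determines the one row in the top half and the one row in the bottom half where a line of that width may go:

    def line_rows(n, j):
        # width of H(j) and H-bar(j), row of H(j) (top half), row of H-bar(j) (bottom half)
        N_v = 14 * n
        width = 6 * n - j
        return width, n + j, N_v - n - j

    print(line_rows(2, 3))   # prints (9, 5, 23) for n = 2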

Fig. 3. A vertical stripe

In the middle of a typical vertical stripe of width 6n − j, we will put a vertical line of height N_v − 1 − (n + j). Consequently, either H(j) or H̄(j) can be placed in this stripe, by placing the middle vertical line either at the lowest position, opening row n + j for H(j), or at the highest position, opening row N_v − n − j + 1 for H̄(j). Furthermore, no other horizontal line H(j') or H̄(j') (0 ≤ j' < 3n, j' ≠ j) can be placed in that stripe. Either it is wider than the whole stripe, or, if it is smaller, it will be in a row which is already blocked by the middle vertical line, see Figure 3.
There will be some exceptions where we may place several horizontal lines in the same stripe. But as we will see, the overall principle will remain unaffected.
The details of the construction are shown in Figure 4. Here we have solid lines depicting lines drawn in their definite position or in one of their only two possibly non-conflicting positions. These positions are always limited to the respective vertical stripe. Dashed lines are used for the horizontal lines which are the essential tool to connect the different parts of the map. They always have two possibly non-conflicting positions in two different parts of the map. For some of them, you can see both positions in Figure 4. Finally, there is the dotted line shown in all of its three possibly non-conflicting positions in the rightmost part.

Fig. 4. Components of the map: (a) a pair of variable stripes, (b) a stripe mirroring a negative occurrence, (c) the clause stripes

The Variable Part

More precisely, we start constructing M by building a pair of vertical stripes stp^var_{i,a}, stp^var_{i,b} as in Figure 4 (a) for each variable x_i (1 ≤ i ≤ m). Assume x_i occurs s_i times in φ. Furthermore, let t_i = i + Σ_{j=1}^{i} s_j, and t_0 = 0. Stripe stp^var_{i,a} is set to width w_{i,a} = 6n − t_{i−1}, and the width of stripe stp^var_{i,b} is w_{i,b} = 6n − t_i + 1 = 6n − t_{i−1} − s_i. You will see that i = 1 and s_i = 5 gives a picture
as in Figure 4 (a). Note that in Figure 4, numbers above and below the map give the amount of free space between vertical lines, whereas numbers to the left mark the row a horizontal line is put in.
Now we put vertical lines v^var_{i,a} and v^var_{i,b} in the middle of the stripes stp^var_{i,a} and stp^var_{i,b} to prevent unwanted placement of horizontal lines. Since all we need is that at most 3n columns are left free to either side (all horizontal lines will be wider), it has only to be roughly the middle. Thus we put vertical lines of height N_v − n − t_i in both stripes at the respective position 3n + 1.
These are complemented by horizontal lines h^var_{i,t} = H(t_i − 1) and h^var_{i,b} = H̄(t_i − 1). Together, these lines construct a switch as depicted in Figure 5.

Fig. 5. Variable switch (the two possible placements, (a) and (b))

Our general principle of stripe construction explained above assures that the horizontal lines h^var_{i,t}, h^var_{i,b} must not be placed elsewhere without conflict. Thus, Figure 5 shows the essentially only two possible non-conflicting placements of the lines built so far. Here, it is unimportant which placement exactly a horizontal line H(t_i − 1) or H̄(t_i − 1) gets in the left stripe stp^var_{i,a}.
Next, we add a horizontal line for each occurrence of the variable x_i. If z_{j,j'} contains the k-th occurrence of x_i in φ, then h^occ_{j,j'} represents that occurrence. If z_{j,j'} = x_i then h^occ_{j,j'} = H(t_{i−1} + k); on the other hand, if z_{j,j'} = ¬x_i, then h^occ_{j,j'} = H̄(t_{i−1} + k). These are the dashed lines of Figure 4 (a).
If the switch is placed as in Figure 5 (a), all lines corresponding to positive occurrences of x_i can be placed without conflict in stripe stp^var_{i,a}. The lines corresponding to negative occurrences cannot be placed without conflict in the switch, since stripe stp^var_{i,a} is blocked and the stripe stp^var_{i,b} is narrower than all of these lines. This placement will correspond to setting x_i = 1. Similarly, the other possible placement of the switch corresponds to setting x_i = 0.
We will end up using just n = 3l + m different widths of horizontal lines, from 6n to 5n + 1. Thus, the next step can be started by using width 5n.
The Middle Part

The aim of the middle part is to mirror the horizontal lines representing the negative occurrences of variables into the upper half of the map. This is done by using a stripe stp^mir_{i,j} as depicted in Figure 4 (b). Assume the occurrence under consideration is z_{i,j} = ¬x_u.
The new stripe stp^mir_{i,j} has the width of the horizontal line to be mirrored. Let that line be h^occ_{i,j} = H̄(k), 0 ≤ k < n. The width of stripe stp^mir_{i,j} is 6n − k. Now we add inside the stripe a vertical line v^mir_{i,j} at position (5n − k) + 1 of height N_v − 1 − (n + j). This reduces the width of the stripe for all inner rows from 6n − k to 5n − k. Inside this reduced stripe, we put a new line h̄^occ_{i,j} = H(n + k) and a vertical line v̄^mir_{i,j} at position 3n + 1 of height N_v − 1 − (2n + j).
Let us look at the idea of this construction. If x_u is set to 0 (i.e. the lines h^var_{u,t}, h^var_{u,b}, v^var_{u,a}, v^var_{u,b} are placed accordingly), h^occ_{i,j} can be placed in stp^var_{u,a}. Then v̄^mir_{i,j} can be placed in its lowest position. This again allows to place h̄^occ_{i,j} above it. If x_u is set to 1, h^occ_{i,j} must be placed in stripe stp^mir_{i,j}. Then both v^mir_{i,j} and v̄^mir_{i,j} are pushed upwards, which makes it impossible to place h̄^occ_{i,j} in stp^mir_{i,j} without conflict.
The whole middle part of the map is made up of stripes like that. Note that the new horizontal lines are all of different width since so were the original ones. We just reduce the width by n.

Stripes for Clauses

Finally, we go on to constructing a triple of stripes stp^c_{i,1}, stp^c_{i,2}, stp^c_{i,3} for each clause c_i. Let c_i = z_{i,1} ∨ z_{i,2} ∨ z_{i,3}, and let 4n − k > 3n be a new width, not used before. Consider the horizontal lines h_{i,1}, h_{i,2}, h_{i,3} representing those occurrences in the upper half of rows. That means h_{i,j} = h^occ_{i,j} if z_{i,j} is a positive occurrence of a variable, and h_{i,j} = h̄^occ_{i,j} if it is a negative occurrence (for j ∈ {1, 2, 3}). For speaking about the width of the new stripes, let h_{i,j} = H(w_j) for j ∈ {1, 2, 3}.
For each of the three occurrences, we construct a stripe stp^c_{i,j}, j ∈ {1, 2, 3}, analogously to those described for the middle part, except that we use as new line width for all three stripes the same value 4n − k. Consequently, only one new horizontal line h^c_i = H(2n + k) is created. The width of stripe stp^c_{i,j} is 6n − w_j for j ∈ {1, 2, 3}.
Each of the three stripes will have new vertical lines v^c_{i,j} of height N_v − 1 − (n + w_j) at position 4n − k + 1 (reducing the width to 4n − k) and v̄^c_{i,j} of height N_v − 1 − (3n + k) at position 3n (the middle line), see Figure 4 (c).
Now the overall effect of this is as follows. Assume one of the literals z_{i,j} in a clause c_i is z_{i,j} = x_u, and x_u is set to 1, represented by placing the lines of the variable switch for x_u as described above. Then the corresponding horizontal line h_{i,j} = h^occ_{i,j} can be placed in stp^var_{u,a}. This leaves free the place of that line in the corresponding stripe stp^c_{i,j} of the clause. The vertical lines v^c_{i,j}, v̄^c_{i,j} of that stripe can be placed in highest position, and finally, the unique horizontal line
h^c_i can be placed in that stripe. Similarly, assume z_{i,j} = ¬x_u, and x_u is set to 0, represented by placing the lines of the variable switch for x_u as described above. Then h^occ_{i,j} can be placed in stp^var_{u,a}, and h_{i,j} = h̄^occ_{i,j} can be placed in stp^mir_{i,j} after placing v^mir_{i,j} and v̄^mir_{i,j} in lowest position. Again, this leaves free the place of line h_{i,j} in the corresponding stripe stp^c_{i,j} of the clause, and finally, h^c_i can be placed in that stripe.
If on the other hand none of the three variables is set to fulfill the clause, this causes all three horizontal lines h_{i,1}, h_{i,2}, h_{i,3} to be placed in their respective stripes stp^c_{i,1}, stp^c_{i,2}, stp^c_{i,3}. But this pushes down the vertical lines v^c_{i,1}, v̄^c_{i,1}, v^c_{i,2}, v̄^c_{i,2}, v^c_{i,3}, v̄^c_{i,3}, which makes it impossible to place the clause-representing line h^c_i conflict-free.
Overall, each non-satisfied clause corresponds to an unavoidable conflict. So far, we have shown our first result.

Theorem 2. StrP is NP-hard.

3 A Bound on Approximability

In this section we will show that the previous construction can be used to obtain thresholds on approximating StrP.

Theorem 3. For every small ε > 0, the problem of distinguishing StrP instances that can be placed conflict-free from those where at most a fraction of 223/224 + ε of the lines can be placed without conflict, (1, 223/224 + ε)-StrP for short, is NP-hard.

Corollary 1. For every small ε > 0, there is no polynomial time (224/223 − ε)-approximation algorithm for Max-StrP unless P = NP, i.e. Max-StrP is APX-hard.

Proof (of Theorem 3). In view of Theorem 1, it is sufficient to show that the above construction satisfies the following claims.

1. From a formula φ containing l clauses, the above construction yields, in polynomial time, a map M containing at most 28l lines (if every variable is used at least twice).
2. There exists a polynomial time procedure P_a that works on a map M as follows. Given a placement p where m horizontal lines have conflicts, P_a outputs a placement p' where at most m horizontal lines have conflicts, such that each has only one conflict. Moreover, all horizontal lines with conflicts are of the type h^c_i constructed in the last part of the above construction.
3. There exists a polynomial time procedure P_b that works on a map M as follows. Given a placement p' as generated by P_a with m conflicts, P_b generates an assignment to the variables of φ such that at most m clauses of φ are not satisfied.
Then, assuming we would have an algorithm A deciding (1, 223/224 + ε)-StrP in polynomial time, we could get one deciding (1, 7/8 + 28ε)-SAT in polynomial time as follows. (Note that we do not need the fact that P_a and P_b are efficient.)
Given a (1, 7/8 + 28ε)-SAT instance φ with l clauses, construct M and apply the assumed decision algorithm.
If φ is satisfiable, there is a conflict-free placement in M, as we have already seen in Section 2.
If on the other hand at most a fraction 7/8 + 28ε of the clauses of φ were satisfiable, we look at M. Assume p to be a placement where a maximal number of lines in M are placed without conflict. Let m be the number of horizontal lines having a conflict under placement p. That is, the fraction of lines which could be placed without conflict in M is at most (28l − m)/(28l).
Now we apply the procedures P_a and P_b, getting an assignment satisfying at least l − m out of l clauses in φ. By the assumption about φ, we have

    (l − m)/l ≤ 7/8 + 28ε,    that is,    m ≥ (1/8 − 28ε)·l,

and hence

    (28l − m)/(28l) ≤ (28l − (1/8 − 28ε)·l)/(28l) = 223/224 + ε.

Thus the result of algorithm A would decide (1, 7/8 + 28ε)-SAT, in contradiction to Theorem 1.
It remains to prove the above claims.

1. As mentioned in Section 2, there are at most (3/2)·l variables if we assume every variable to be used at least twice. Furthermore, we have exactly 3l occurrences of variables and l clauses.
Constructing the leftmost part of M, we have introduced 6 lines per variable and 1 per occurrence, which gives at most 12l lines. In the middle part, we need 4 new lines per negative occurrence, that is at most 6l lines (remember that at most half of the occurrences are negative). Finally, in the rightmost part of M, we use 9 new vertical and one new horizontal line per clause, being 10l new lines.
Overall, there are at most 28l lines to be placed in M.
2. We use the names of the lines and stripes as given in Section 2.
The basic idea of P_a is that each horizontal line is best placed in one of the vertical stripes "made for it". More precisely, P_a works by modifying p as follows.
(a) For each variable x_u, place the lines v^var_{u,a}, v^var_{u,b}, h^var_{u,b}, h^var_{u,t} in one of the two possible ways they may be placed without conflict among each other within the two stripes stp^var_{u,a}, stp^var_{u,b} for x_u. Which one of the two possibilities is taken is decided in such a way that the minimal number of conflicts occurs between v^var_{u,a} and those lines h^occ_{i,j} (placed as in p) where z_{i,j} is an occurrence of x_u and h^occ_{i,j} is placed completely within stripe stp^var_{u,a}.
(b) Place every line occ c


i j which still has a conflict in stripe stpi j if zi j is a
mir
positive occurrence, and in stpi j if zi j is a negative occurrence.
(c) For each negative occurrence zi j , place vimir mir
j and vi j in their highest
occ mir
position if i j is placed in stpi j , and in lowest position otherwise.
(d) If vimir
j vimir occ c
j are placed in highest position, place i j in stpi j . Otherwise,
occ mir
place i j in stpi j .
(e) If i j is placed in stripe stpci j , place vic j and vic j in their lowest position,
otherwise in their highest position.
(f) It ci still has a conflict, place it in any of the stripes stpmir mir
i 1 , stpi 2 , or
mir
stpi 3 .
   The crucial point is that none of these steps increases the number of hori-
   zontal lines having conflicts. We check this step by step.
   (a) First, $\ell^{var}_{u,b}$, $\ell^{var}_{u,t}$ are placed without conflict here. Secondly, due to the
       general principle of the construction, all horizontal lines other than those
       $\ell^{occ}_{i,j}$ representing occurrences of $x_u$ cannot be placed without conflict
       within the stripes for $x_u$. And those representing occurrences of $x_u$ may
       be placed without conflict only in the left stripe. All that remains to show
       is that the minimal number of conflicts between $v^{var}_{u,a}$ and those lines is
       attained in one of the extremal positions of $v^{var}_{u,a}$. But now the fact that
       the first and last $n$ rows of $M$ are not used for horizontal lines takes
       effect. The height of $v^{var}_{u,a}$ is always at least $N_v - 2n$. Thus it is impossible
       to have horizontal lines placed both below and above $v^{var}_{u,a}$ at the same
       time. Consequently, from any placement of $v^{var}_{u,a}$, changing it into at least
       one of its extremal positions is safe.
   (b) Only horizontal lines already having conflicts are affected.
   (c) Only $\ell^{occ}_{i,j}$ and $\bar{\ell}^{occ}_{i,j}$ may be placed in $stp^{mir}_{i,j}$ without conflict. If $\bar{\ell}^{occ}_{i,j}$ is
       present, this step may move a conflict from $\ell^{occ}_{i,j}$ to $\bar{\ell}^{occ}_{i,j}$. Otherwise, only
       a possible conflict of $\ell^{occ}_{i,j}$ is resolved.
   (d) Again, only horizontal lines having conflicts are affected.
   (e) Symmetrically to step (d), a conflict may only be moved from $\bar{\ell}^{occ}_{i,j}$ to $\ell^{c}_{i}$.
   (f) Again, only horizontal lines already having conflicts are affected.
   This guarantees that the number of horizontal lines having conflicts is not
   increased.
   Moreover, steps (b) and (c) for negative occurrences, resp. (b) and (e) for
   positive occurrences, assure that the lines $\ell^{occ}_{i,j}$ are placed without conflict. Re-
   member that $\ell_{i,j} = \ell^{occ}_{i,j}$ for positive occurrences, and $\ell_{i,j} = \bar{\ell}^{occ}_{i,j}$ for negative
   ones. Steps (d) and (e) guarantee the same for $\bar{\ell}^{occ}_{i,j}$.
   Finally, the lines $\ell^{c}_{i}$ are placed in a stripe where they may have a conflict
   with at most some $v^{c}_{i,j}$, i.e., they have at most one conflict.
3. In Section 2, we have convinced ourselves that the variable switches in
   the leftmost part of $M$ can be placed without conflict only in two ways,
   interpretable as setting the corresponding variable to 0 or 1. Since all con-
   flicts are in the rightmost part of $M$, we can take that interpretation as
   an assignment to the variables. Also in Section 2, we have seen that this
   assignment satisfies a clause $c_i$ if the line $\ell^{c}_{i}$ can be placed without conflict.
   Thus, if there were initially $m$ conflicts present in the placement, at most $m$
   clauses remain unsatisfied.

4 Conclusion
We have presented the first proofs of NP-completeness and APX-hardness for a
simple map labeling problem. An algorithm with an approximation factor of two
is easy: just label either all horizontal or all vertical streets. It remains open to
close the gap between these two factors. Our results extend easily to the case where
some regions of the map may not be used for labels. It is also interesting to look
for further extensions.

References
[CMS92] Jon Christensen, Joe Marks, Stuart Shieber: Labeling Point Features on
   Maps and Diagrams, Harvard CS, 1992, TR-25-92.
[FW91] Michael Formann, Frank Wagner: A Packing Problem with Applications to
   Lettering of Maps, Proc. 7th Annu. ACM Sympos. Comput. Geom., 1991,
   pp. 281-288.
[Ha97] J. Håstad: Some Optimal Inapproximability Results, in Proceedings of the 29th
   Annual ACM Symposium on Theory of Computing, 1997, pp. 1-10.
[Ho96] D. S. Hochbaum (Ed.): Approximation Algorithms for NP-hard Problems.
   PWS Publishing Company, 1996.
[IL97] Claudia Iturriaga, Anna Lubiw: NP-Hardness of Some Map Labeling Prob-
   lems, University of Waterloo, Canada, 1997, CS-97-18.
[KI88] T. Kato, H. Imai: The NP-Completeness of the Character Placement Prob-
   lem of 2 or 3 Degrees of Freedom, Record of Joint Conference of Electrical
   and Electronic Engineers in Kyushu, 1988, p. 1138.
[KR92] Donald E. Knuth, Arvind Raghunathan: The Problem of Compatible Rep-
   resentatives, SIAM J. Discr. Math., 1992, Vol. 5, Nr. 3, pp. 422-427.
[MPS98] E. W. Mayr, H. J. Prömel, A. Steger (Eds.): Lectures on Proof Verification
   and Approximation Algorithms. LNCS 1367, Springer, 1998.
[MS91] Joe Marks, Stuart Shieber: The Computational Complexity of Cartographic
   Label Placement, Harvard CS, 1991, TR-05-91.
[NW99] Gabriele Neyer, Frank Wagner: Labeling Downtown, Department of Com-
   puter Science, ETH Zurich, Switzerland, TR 324, May 1999.
[SW99] Tycho Strijk, Alexander Wolff: Labeling Points with Circles, Institut für In-
   formatik, Freie Universität Berlin, 1999, B 99-08.
[WS99] Tycho Strijk, Alexander Wolff: The Map Labeling Bibliography,
   http://www.inf.fu-berlin.de/map-labeling/bibliography.
Labeling Downtown

Gabriele Neyer¹ and Frank Wagner²

¹ Institut für Theoretische Informatik,
  ETH Zürich, CH-8092 Zürich
  [email protected]
² Transport-, Informatik- und Logistik-Consulting (TLC),
  Weilburgerstraße 28, D-60326 Frankfurt am Main and
  Institut für Informatik, Freie Universität Berlin, Takustraße 9, D-14195 Berlin
  [email protected]

Abstract. American cities, especially their central regions, usually have a very
regular street pattern: we are given a rectangular grid of streets, and each street has
to be labeled with a name running along it, such that no two labels overlap.
For this restricted but still realistic case an efficient algorithmic solution of the
generally hard labeling problem gets within reach.
The main contribution of this paper is an algorithm that is guaranteed to solve every
solvable instance. In our experiments the running time was polynomial without a
single exception. On the other hand, the problem was recently shown to be NP-hard.
Finally, we present efficient algorithms for three special cases, including the case
where no label is longer than half the length of its street.

1 Introduction

The general city map labeling problem is too hard to be automated yet [NH00].
In this paper we focus on the downtown labeling problem, a special case that
was recently shown to be NP-hard [US00].
The clearest way to model it is to abstract a grid-shaped downtown street
pattern into a chess board of adequate size. The names to be placed along their
streets we abstract to be tiles that, w.l.o.g., span an integer number of fields.
A feasible labeling then is a conflict free tiling of the board placing all the
labels along their streets.

Fig. 1. American downtown street map.
This work was partially supported by grants from the Swiss Federal Office for Education and
Science (Projects ESPRIT IV LTR No. 21957 CGAL and No. 28155 GALIA), and by the
Swiss National Science Foundation (grant “Combinatorics and Geometry”).
Frank Wagner is a Heisenberg–Stipendiat of the Deutsche Forschungsgemeinschaft, grant
Wa1066/1

G. Bongiovanni, G. Gambosi, R. Petreschi (Eds.): CIAC 2000, LNCS 1767, pp. 113–124, 2000.

© Springer-Verlag Berlin Heidelberg 2000
Our main algorithm is a kind of adaptive backtracking algorithm that is guaranteed
to find a solution if there is one. Surprisingly, it has an empirically strictly bounded
depth of backtracking, namely one, which makes it an empirically efficient algorithm.
This experience makes the theoretical analysis of our algorithm all the more interesting.
Using results from a well studied family of related problems from Discrete Tomog-
raphy [Woe96, GG95], we provide an NP-hardness result for a slightly different labeling
problem, taking place on a cylinder instead of a rectangle.
We round off the paper by giving efficient solutions to special cases: There is a
polynomial algorithm, if
– no label is longer than half of its street length,
– all vertical labels are of equal length, or
– the map is quadratic and each label has one of two label lengths.
One general remark that helps to suppress a lot of formal overhead: Often, we only
discuss the case of horizontal labels or row labels and omit the symmetric cases of
vertical labels and columns, or vice versa.

2 Problem Definition
Let G be a grid consisting of n rows and m columns. Let $R = \{R_1, \ldots, R_n\}$ and
$C = \{C_1, \ldots, C_m\}$ be two sets of labels. The problem is to label the $i$-th row of G with
$R_i$ and the $j$-th column of G with $C_j$ such that no two labels overlap. We will represent
the grid G by a matrix.
Definition 1 (Label problem $(G, R, C, n, m)$).
Instance: Let $G_{n \times m}$ be a two-dimensional array of size $n \times m$, $G_{i,j} \in \{r, c, \emptyset\}$. Let $R_i$
be the label of the $i$-th row and let $r_i$ be the length of label $R_i$, $1 \le i \le n$. Let $C_i$ be
the label of the $i$-th column and let $c_i$ be the length of label $C_i$, $1 \le i \le m$.
Problem: For each row i set $r_i$ consecutive fields of $G_{i,\cdot}$ to r and for each column j
set $c_j$ consecutive fields of $G_{\cdot,j}$ to c.
Of course no label can be longer than the length of the row or column, respectively.
Initially, we set $G_{i,j} = \emptyset$, which denotes that the field is not yet set. Let $[a, b[$ be
an interval such that $G_{i,x} \in \{r, \emptyset\}$ for $x \in [a, b[$. We say that $G_{i,[a,b[}$ is free for row
labeling. Furthermore, this interval has length $b - a$. We also say that $G_{i,\cdot}$ contains
two disjoint intervals of length at least $\frac{b-a}{2}$ that are free for row labeling, namely
$[a, a + \frac{b-a}{2}[$ and $[a + \frac{b-a}{2}, b[$.
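To make the interval notion concrete, the maximal free intervals of a row can be found
by a single linear scan. The following sketch is illustrative only (it uses our own field
convention, None for an unset field and 'r'/'c' for set fields); it is not code from the paper.

def maximal_free_intervals(row):
    """Maximal intervals [a, b) of a row whose fields are free for row labeling,
    i.e. not occupied by a column label ('c')."""
    intervals, start = [], None
    for j, field in enumerate(list(row) + ['c']):   # sentinel closes the last run
        if field != 'c' and start is None:
            start = j
        elif field == 'c' and start is not None:
            intervals.append((start, j))
            start = None
    return intervals

# Fields None/'r' are free for row labeling, 'c' is not.
print(maximal_free_intervals([None, 'c', None, None, 'r', None, 'c', None]))
# [(0, 1), (2, 6), (7, 8)]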

3 General Rules
Assume we have a label whose length is longer than half of its street length. No matter how
we position the label on its street, there are some central fields of the street that are
always occupied by this label. We can therefore simply mark these fields as occupied. It
is easy to see that these occupied fields can produce further occupied fields. The follow-
ing rules check whether there is sufficiently large space for each label and determine
the occupied fields.
Rule 3.1 (Conflict) Let $I = [a, b[$ be the longest interval of row i that is free for row
labeling. If $r_i > b - a$, then row i cannot be labeled, since it does not contain enough
free space for row labeling. In this case we say that a conflict occurred, and it follows
that the instance is not solvable.

Rule 3.2 (Large labels) Let $I = [a, b[$ be the only interval in $G_{i,\cdot}$ that is free for fea-
sible row labeling. Observe that the fields that are occupied simultaneously when $R_i$ is
positioned leftmost and rightmost in I have to be occupied by $R_i$ no matter where it is
placed. These fields we set to r and call them preoccupied.
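To illustrate Rule 3.2: the fields forced by a label of length $r_i$ inside its only free interval
$[a, b[$ are exactly the overlap of the leftmost and the rightmost placement. A small sketch
(our own helper function, not the authors' code):

def preoccupied_fields(a, b, r_i):
    """Fields of the free interval [a, b) that every placement of a label of
    length r_i inside [a, b) must cover (Rule 3.2).

    The leftmost placement covers [a, a + r_i), the rightmost [b - r_i, b);
    their intersection [b - r_i, a + r_i) is nonempty iff r_i > (b - a) / 2."""
    lo, hi = b - r_i, a + r_i
    return list(range(lo, hi)) if lo < hi else []

# Street interval [0, 10) and a label of length 7: fields 3..6 are forced.
print(preoccupied_fields(0, 10, 7))   # [3, 4, 5, 6]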

Procedure 1 PREPROCESSING (G, R, C, n, m)
1) repeat
2)   G' := G;
3)   run Rule 3.2 on (G, R, C, n, m) and on (G^T, C, R, m, n);
4)   if Rule 3.1 yields a conflict on (G, R, C, n, m) or on (G^T, C, R, m, n) then
5)     return "conflict";
6) until (G = G');
7) return true;

Our PREPROCESSING Procedure 1 iteratively executes the Rules 3.1 and 3.2 until
none of them yields a further change to the label problem or a conflict occurs. In the
latter case the instance is not solvable. We will spell out special cases
where a successful preprocessing implies solvability. Furthermore, the preprocessing
underlies the following considerations.
For each unfixed label that is limited to just one interval of at most twice its length or
to two intervals of exactly its length we can check whether these labels can be simulta-
neously positioned without conflicts. This can be done since all possible label positions
of these rows and columns can be encoded in a set of 2SAT clauses, the satisfaction
of which enforces the existence of a conflict free label positioning of these labels. On
the other hand a conflict free label positioning of these labels implies a satisfying truth
assignment to the set of clauses. Even, Itai and Shamir [EIS76] proposed a polynomial
time algorithm that solves the 2SAT problem in time linear in the number of clauses
and variables.
We therefore represent each matrix field $G_{i,j}$ by two boolean variables. We have
the boolean variable $(G_{i,j} = r)$ and its negation $\neg(G_{i,j} = r)$, which mean $G_{i,j} = r$ or
$G_{i,j} \in \{c, \emptyset\}$, respectively. As the second variable we have $(G_{i,j} = c)$ and its negation
$\neg(G_{i,j} = c)$, which mean $G_{i,j} = c$ or $G_{i,j} \in \{r, \emptyset\}$, respectively. Of course these two
variables are coupled by the relation $(G_{i,j} = r) \Rightarrow \neg(G_{i,j} = c)$.
Those rows and columns where the possible label positions are limited to just one
interval of at most twice the label's length or to two intervals of exactly its length we call dense.
We now encode all possible label positions of the dense rows and columns in a set of
2SAT clauses the satisfaction of which yields a valid labeling of these rows and columns,
and vice versa.
Property 1 (Density Property I). Let $G_{i,\cdot}$ be a row that contains exactly two maximal
intervals, each of length $r_i$, that are disjoint and free for feasible row labeling. Let these
intervals be $[a, b[$ and $[c, d[$, $1 \le a < b < c < d \le n + 1$. Then, a valid labeling exists
if and only if the conditions

1. $(G_{i,a} = r) \Leftrightarrow (G_{i,a+1} = r) \Leftrightarrow (G_{i,a+2} = r) \Leftrightarrow \dots \Leftrightarrow (G_{i,b-1} = r)$,
2. $(G_{i,c} = r) \Leftrightarrow (G_{i,c+1} = r) \Leftrightarrow (G_{i,c+2} = r) \Leftrightarrow \dots \Leftrightarrow (G_{i,d-1} = r)$,
3. $(G_{i,a} = r) = \neg(G_{i,c} = r)$

are fulfilled.

$(G_{i,a} = r) \Leftrightarrow (G_{i,a+1} = r)$ can be written as the 2SAT clauses $(\neg(G_{i,a} = r) \vee
(G_{i,a+1} = r))$, $((G_{i,a} = r) \vee \neg(G_{i,a+1} = r))$, and since the condition $(G_{i,a} = r) =
\neg(G_{i,c} = r)$ can be written as $((G_{i,a} = r) \vee (G_{i,c} = r))$, $(\neg(G_{i,a} = r) \vee \neg(G_{i,c} = r))$,
it is easy to see that the complete Density Property I can be written as a set of 2SAT clauses.
The feasible label placements are $(G_{i,a} = r, \ldots, G_{i,b-1} = r)$ and $(G_{i,c} = r, \ldots, G_{i,d-1} = r)$.
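As an illustration of how Density Property I turns into 2SAT clauses, the sketch below
generates the clause set mechanically. Literals are integers (a negative sign denotes
negation), and the variable numbering var(i, j) is a hypothetical helper, not part of the paper.

def density_one_clauses(var, i, a, b, c, d):
    """2SAT clauses for Density Property I of row i with free intervals
    [a, b) and [c, d), both of length exactly r_i.
    var(i, j) -> integer variable for the statement "G[i][j] = r"."""
    clauses = []
    for lo, hi in ((a, b), (c, d)):          # conditions 1 and 2: chained equivalences
        for j in range(lo, hi - 1):
            x, y = var(i, j), var(i, j + 1)
            clauses += [(-x, y), (x, -y)]    # x <=> y
    x, y = var(i, a), var(i, c)              # condition 3: exactly one interval is used
    clauses += [(x, y), (-x, -y)]            # x <=> not y
    return clauses

# Row 0 with free intervals [0, 3) and [5, 8) (label length 3).
print(density_one_clauses(lambda i, j: 1 + 100 * i + j, 0, 0, 3, 5, 8))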

Property 2 (Density Property II). Let $G_{i,\cdot}$ be a row that contains only one maximal
interval $[a, b[$ that is free for feasible row labeling, $r_i < b - a \le 2r_i$. Then, a valid
labeling for the row exists if and only if the conditions
1. $(G_{i,a} = r) \Rightarrow (G_{i,a+1} = r) \Rightarrow (G_{i,a+2} = r) \Rightarrow \dots \Rightarrow (G_{i,b-r_i-1} = r)$,
2. $G_{i,b-r_i} = r, \ldots, G_{i,a+r_i-1} = r$, and
3. $(G_{i,a} = r) = \neg(G_{i,a+r_i} = r)$, $(G_{i,a+1} = r) = \neg(G_{i,a+r_i+1} = r)$, \ldots,
   $(G_{i,b-r_i-1} = r) = \neg(G_{i,b-1} = r)$

are fulfilled.

Analogously to the first Density Property, the conditions of the second Density
Property can be formulated as a set of 2SAT clauses. All feasible label placements
are $(G_{i,a} = r, G_{i,a+1} = r, \ldots, G_{i,a+r_i-1} = r)$, $(G_{i,a+1} = r, G_{i,a+2} = r, \ldots, G_{i,a+r_i} = r)$,
$(G_{i,a+2} = r, G_{i,a+3} = r, \ldots, G_{i,a+r_i+1} = r)$, \ldots, $(G_{i,b-r_i} = r, G_{i,b-r_i+1} = r, \ldots,
G_{i,b-1} = r)$. Note that the properties work analogously for columns.

Theorem 1. The 2SAT formula of all dense rows and columns can be created in O(nm)
time. The 2SAT instance can be solved in O(nm) time.

Proof: The number of variables is bounded by $2nm$. For each dense row we have at
most $\frac{3}{2}n$ clauses. Analogously, for each dense column we have at most $\frac{3}{2}m$ clauses.
Altogether we have $O(nm)$ clauses. Thus, the satisfiability of the 2SAT instance can be
checked in $O(nm)$ time [EIS76]. □
Procedure 2 calls Procedure 1, our preprocessing. In case of success, all dense rows
and columns are encoded as a set of 2SAT clauses with the aid of Density Properties 1
and 2. Then, their solvability is checked, e.g., by invoking the 2SAT algorithm of Even,
Itai and Shamir [EIS76].
Lemma 1. The PREPROCESSING Procedure 1 and the DRAW CONCLUSIONS Proce-
dure 2 can be implemented to use at most O(nm(n + m)) time.
Procedure 2 DRAW CONCLUSIONS (G, R, C, n, m)

1) if PREPROCESSING(G, R, C, n, m) then
2)   Φ := the set of 2SAT clauses of the dense rows and columns;
3)   if Φ is satisfiable then return true;
4)   else return false;

Proof: The rules only need to be applied to those rows and columns in which an entry
was previously set to r or c. The setting of a field $G_{i,j}$ can only cause new settings in row
i or column j, which by themselves can again cause new settings. The application of
the rules to a row and a column takes time O(n + m). Since at most 2nm fields can
be set, we obtain that the preprocessing can be implemented such that its running time
is O(nm(n + m)). In Theorem 1 we proved that the 2SAT clauses can be generated
and checked for solvability in O(nm) time. Thus, we have a worst case time bound of
O(nm(n + m)). □
Thus, we can solve dense problems:
Theorem 2. In case that each row and each column of a preprocessed labeling instance
(G, R, C, n, m) fulfills either Density Property 1 or 2, Procedure 2 DRAW CON-
CLUSIONS decides if the instance is solvable. In case of solvability we can generate
a valid labeling from a truth assignment. The overall running time is bounded by
O(nm(n + m)).

4 A General Algorithm
In this section we describe an algorithmic approach with a backtracking component that
solves any label problem. Empirically, it uses its backtracking ability in such a strictly
limited way that its practical runtime stays in the polynomial range. After performing the
PREPROCESSING and the satisfiability test for dense rows and columns (see Procedure 2
DRAW CONCLUSIONS), we adaptively generate a tree that encodes all possible label
settings of the label problem. Each node in the first level of the search tree corresponds
to a possible label setting for the first row label. In the $i$-th level the nodes correspond
to the possible label settings for the $i$-th row, depending on the label settings of all
predecessor rows. Thus, we have at most m possible positions for a row label and at
most n levels. Our algorithm searches for a valid label setting in this tree by traversing
the tree depth-first, generating the children of a node when necessary.
In the algorithm, we preprocess matrix G and check the solvability of the dense
rows and columns by invoking Procedure 2 DRAW CONCLUSIONS. We further mark
all these settings as permanent. When we branch on a possible label setting for a row, we
increase the global timestamp, draw all conclusions this setting has for the other labels
by invoking Procedure 2 DRAW CONCLUSIONS, and timestamp each new setting. These
consequences can be a limitation of the possible positions of a label or even the impos-
sibility of positioning a label without conflicts. After that, we select one of the newly
generated children, increase the timestamp and again timestamp all implications. When
a conflict occurs, the process resumes from the deepest of all nodes left behind, namely,
from the nearest decision point with unexplored alternatives. We mark all timestamps
invalid that correspond to nodes that lie on a deeper level than the decision point. This
brings the matrix G back into its previous state without storing each state separately.
Suppose the algorithm returns a valid label setting for all rows. Since Procedure 1
ensures that each column i contains an interval of length at least $c_i$ that is free for
column labeling, we can simply label each column and obtain a valid label setting. The
algorithm is given in Algorithm 1 and in the Procedures 1, 2, and 3.

Algorithm 1 LABEL (G, R, C, n, m)
1) timestamp := 1;
2) if DRAW CONCLUSIONS(G, R, C, n, m) yields a conflict
3)   return "not solvable";
4) timestamp each setting;
5) let w be the first row that is not yet labeled;
6) if POSITION AND BACKTRACK(w, G, R, C, n, m, timestamp)
7)   label all columns that are not yet completely labeled;
8)   return G;
9) else
10)  return "not solvable";

Procedure 3 POSITION AND BACKTRACK (w, G, R, C, n, m, timestamp)

1) while there are untested possible positions for label $R_w$ in row w
2)   local timestamp := timestamp := timestamp + 1;
3)   label row w with $R_w$ in one of these positions;
4)   timestamp each new setting;
5)   if DRAW CONCLUSIONS(G, R, C, n, m) then
6)     timestamp each new setting;
7)     if there is a row w' that is not yet labeled
8)       if POSITION AND BACKTRACK(w', G, R, C, n, m, timestamp) then
9)         return true;
10)
11)      else return true;
12)
13)  timestamp each new setting;
14)  mark local timestamp invalid;
15)
16) return false;

We implemented the backtracking algorithm and tested it on over 10000 randomly
generated labeling instances with n and m at most 100. After at most one backtracking
step per branch the solvability of any instance was decided. The algorithm is construc-
tive; for each solvable instance a labeling was produced. This makes it reasonable to
study the worst case run time behavior of the algorithm with backtracking depth one.
The algorithm behaves worst when each label is positioned first in all places
that cause a conflict, before it is positioned in a conflict free place. A row label can be
positioned in at most m different places. Each time a label is positioned, the Pro-
cedure 2 DRAW CONCLUSIONS is called, which needs at most $O(nm(n + m))$ time.
Thus, the time for positioning a row label is bounded by $O(nm^2(n + m))$. Since n
rows have to be labeled, the backtracking approach with backtracking depth one needs
at most $O(n^2 m^2 (n + m))$ time. If the assumption of limited backtracking behavior does
not hold, the runtime is exponential.

5 Complexity Status
Although Unger and Seibert recently proved the NP-completeness of the label
problem [US00], we now show that a slight generalization, namely the labeling of a
cylinder shaped downtown, is NP-hard. The reason is that our reduction could be
helpful in understanding the complexity of the original problem. In addition it is quite
intuitive and much shorter than that of Unger and Seibert. Instead of labeling an array
we now label a cylinder consisting of n cyclic rows and m columns. Figure 2 shows an
example of a cylinder instance. We show that this problem is NP-complete by reducing
a version of the Three Partition problem to it. Our proof is similar to an NP-
completeness proof of Woeginger [Woe96] about the reconstruction of polyominoes
from their orthogonal projections. Woeginger showed that the reconstruction of a two-
dimensional pattern from its two orthogonal projections H and V is NP-complete
when the pattern has to be horizontally and vertically convex. This and other related
problems, also discussed in [Woe96], show up in the area of discrete tomography.

Fig. 2. Cylinder Label problem.

Definition 2 (Cylinder Label problem $(Z, R, C, n, m)$).

Instance: Let $Z_{n \times m}$ be a cylinder consisting of n cyclic rows and m columns. Let $R_i$
be the label of the $i$-th row and let $r_i$ be the length of label $R_i$, $1 \le i \le n$. Let $C_i$ be
the label of the $i$-th column and let $c_i$ be the length of label $C_i$, $1 \le i \le m$.
Problem: For each row i set $r_i$ consecutive fields of $Z_{i,\cdot}$ to r, for each column j set $c_j$
consecutive fields of $Z_{\cdot,j}$ to c.

Our reduction is done from the following version of the NP-complete Three Parti-
tion problem [GJ79].

Problem 1 (Three Partition).

Instance: Positive integers $a_1, \ldots, a_{3k}$ that are encoded in unary and that fulfill the two
conditions (i) $\sum_{i=1}^{3k} a_i = k(2B + 1)$ for some integer B, and (ii) $(2B + 1)/4 <
a_i < (2B + 1)/2$ for $1 \le i \le 3k$.
Problem: Does there exist a partition of $a_1, \ldots, a_{3k}$ into k triples such that the ele-
ments of every triple add up to exactly $2B + 1$?

Theorem 3. The Cylinder Label problem is NP-complete.

Proof: Cylinder Label problem $\in$ NP: The Cylinder Label problem is in NP since
it is easy to check whether a given solution solves the problem or not.
Transformation: Now let an instance of Three Partition be given. From this instance we
construct a Cylinder Label problem consisting of $n = k(2B + 2)$ rows and $m = 3k$
columns. The vector r defining the row label lengths is of the form

$$(m, \underbrace{m - 1, \ldots, m - 1}_{(2B+1)\text{-times}}, m, \underbrace{m - 1, \ldots, m - 1}_{(2B+1)\text{-times}}, \ldots).$$

Since a row label of length m occupies the whole row, those rows with label length
m have no free space for column labeling. Therefore the rows with label length m
subdivide the rows into k blocks, each containing $2B + 1$ rows, each of which has one
entry that is free for column labeling when the row is labeled. The vector defining the
column label lengths is of the form

$$(a_1, a_2, \ldots, a_{3k}).$$

The transformation clearly is polynomial.
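The construction above is easy to mechanize. The following sketch builds the row and
column label length vectors of the Cylinder Label instance from a Three Partition
instance; it is an illustration of the reduction, not code from the paper.

def cylinder_instance(a, B):
    """Cylinder Label instance of Theorem 3 for a Three Partition instance
    a_1, ..., a_{3k} with target sum 2B + 1 per triple.
    Returns (n, m, row_label_lengths, column_label_lengths)."""
    k = len(a) // 3
    n, m = k * (2 * B + 2), 3 * k
    # k blocks: one full-width label (length m) followed by 2B + 1 labels of
    # length m - 1, which leave exactly one free field per cyclic row.
    r = ([m] + [m - 1] * (2 * B + 1)) * k
    c = list(a)                # the column label lengths are the input numbers
    return n, m, r, c

# Tiny example: k = 1, B = 1, i.e. the triple must sum to 3.
print(cylinder_instance([1, 1, 1], 1))   # (4, 3, [3, 2, 2, 2], [1, 1, 1])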


The Three Partition instance has a solution $\Longleftrightarrow$ the Cylinder Label instance has a
solution:
"$\Rightarrow$": Let $(x_1, y_1, z_1), \ldots, (x_k, y_k, z_k)$ be a partition of $a_1, \ldots, a_{3k}$ into k triples such
that $x_i + y_i + z_i = 2B + 1$, $1 \le i \le k$. For each i, $(x_i, y_i, z_i) = (a_f, a_g, a_\ell)$ for some
indices $f$, $g$, and $\ell$, $1 \le f, g, \ell \le 3k$. We now label the columns $f$, $g$ and $\ell$ among
themselves in the $i$-th block of rows. More precisely, in column $f$ we label the fields
$Z_{f,(i-1)(2B+2)+2} = c, \ldots, Z_{f,(i-1)(2B+2)+1+a_f} = c$. In column $g$ we label the fields
$Z_{g,(i-1)(2B+2)+2+a_f} = c, \ldots, Z_{g,(i-1)(2B+2)+1+a_f+a_g} = c$. In column $\ell$ we label
the fields $Z_{\ell,(i-1)(2B+2)+2+a_f+a_g} = c, \ldots, Z_{\ell,(i-1)(2B+2)+1+a_f+a_g+a_\ell} = c$.
It then follows that the rows $j(2B + 2) + 1$ are free for row labeling, for $0 \le j \le k - 1$.
Thus, we can label them with their labels of length $3k = m$. All other rows have exactly
one entry occupied by a column label. Since the rows are cyclic we can label each of
these rows with a label of length $3k - 1$.
"$\Leftarrow$": Let Z be a solution of the Cylinder Label instance. Each row contains at most one
entry that is occupied by a column label. Each column label $a_i$ has length $(2B + 1)/4 <
a_i < (2B + 1)/2$, $1 \le i \le 3k$. Therefore, exactly three columns are labeled in the rows
$j(2B + 2) + 2, \ldots, (j + 1)(2B + 2)$, for $0 \le j \le k - 1$. Furthermore, the label lengths
of each triple sum up to $2B + 1$, and thus Z partitions $a_1, \ldots, a_{3k}$ into k triples. Thus Z
solves the Three Partition instance. □
6 Solvable Special Cases

In the following section we derive an O(nm) time algorithm for the special case where
no label is longer than half of its street length. We think that this case applies especially
to large downtown maps, where the label lengths are short with respect to the street lengths.
In Section 6.2 we solve the label problem when each vertical label is of equal length. In
many American cities the streets in one orientation (e.g. north-south) are simply called
1st Avenue, 2nd Avenue, etc. These names all have the same label length and thus the
label problem can be solved with the algorithm in Section 6.2. Another solvable special
case is the following: We are given a quadratic map, where each label has one of two
label lengths. Such an instance has a solution if and only if no conflict occurs in the
PREPROCESSING Procedure 1. Due to space limitations, the length of the algorithm,
and its proof, we refer to the technical report of this paper for this special case [NW99].
The algorithms of the last two cases have runtime O(nm(n + m)).

6.1 Half Size

Let (G, R, C, n, m) be a label problem. In this section we study the case where each
row label has length at most $\frac{m}{2}$ and each column label has length at most $\frac{n}{2}$. We
show that the label problem is always solvable in this case.

Algorithm 2 HALF SOLUTION (G, R, C, n, m)

1) label the rows $1, \ldots, \lceil n/2 \rceil$ leftmost;
2) label the rows $\lceil n/2 \rceil + 1, \ldots, n$ rightmost;
3) label the columns $1, \ldots, \lceil m/2 \rceil$ bottommost;
4) label the columns $\lceil m/2 \rceil + 1, \ldots, m$ topmost;

Theorem 4. Let (G, R, C, n, m) be a label problem with $r_i \le \frac{m}{2}$ for all rows and
$c_i \le \frac{n}{2}$ for all columns. Then Algorithm 2 computes a solution to the label problem
in O(nm) time.

Proof: Take a look at Figure 3.
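A direct transcription of Algorithm 2 (with 0-based indices and our own orientation
convention, row 0 at the top; not the authors' implementation) might look as follows.

def half_solution(n, m, row_len, col_len):
    """HALF SOLUTION sketch: assumes row_len[i] <= m / 2 and col_len[j] <= n / 2.
    Returns the start column of every row label and the start row of every
    column label."""
    half_rows, half_cols = (n + 1) // 2, (m + 1) // 2
    row_start = [0 if i < half_rows else m - row_len[i] for i in range(n)]
    #            upper half leftmost     lower half rightmost
    col_start = [n - col_len[j] if j < half_cols else 0 for j in range(m)]
    #            left half bottommost    right half topmost
    return row_start, col_start

# A 4 x 4 grid where every label has length 2.
print(half_solution(4, 4, [2] * 4, [2] * 4))   # ([0, 0, 2, 2], [2, 2, 0, 0])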

6.2 Constant Vertical Street Length

In this section we consider the special case of the label problem (G, R, C, n, m) where
all column labels have length l. This problem we denote by (G, R, C, n, m, l). We
show that we can decide whether the label problem (G, R, C, n, m, l) is solvable or
not. We further give a simple algorithm that labels a solvable instance correctly. All
results of this section carry over to the constant row length case.

Theorem 5 (Constant Column Length). Let (G, R, C, n, m, l) be a label problem
with $c_i = l$, $1 \le i \le m$. The instance is solvable if and only if no conflict occurs in
the PREPROCESSING Procedure 1.
Fig. 3. Solution of a typical half size label problem according to Algorithm 2.

Fig. 4. Typical downtown map where the vertical street names have constant length.

We assume that $l \le \frac{n}{2}$. Otherwise, row $\lceil \frac{n}{2} \rceil$ contains no fields that are free for
row labeling. The next lemma states that the preoccupied fields are symmetric to the
vertical central axis of G.

Lemma 2. Let (G, R, C, n, m, l) be a successfully preprocessed label problem. After
the preprocessing each row has the form [aba], where

$G_{i,1} = \emptyset, \ldots, G_{i,a} = \emptyset$, $G_{i,a+1} = x, \ldots, G_{i,a+b} = x$, $G_{i,a+b+1} = \emptyset, \ldots, G_{i,m} = \emptyset$

for $x = r$ or $x = c$, $m \ge b \ge 0$ and $2a + b = m$.

Proof: Initially we have $G_{i,j} = \emptyset$ for $1 \le i \le n$, $1 \le j \le m$. Executing Rule 3.2 on
each row i with label length $r_i > \frac{m}{2}$ yields $G_{i,m-r_i+1} = r, \ldots, G_{i,r_i} = r$. Thus, all r-entries
of G are symmetric to the vertical mid axis of G. Remember that each column has label
length l. Therefore, executing Rule 3.2 on the columns yields that for each entry $G_{i,j}$
that is set to c also the fields $G_{i,j+1} = c, \ldots, G_{i,m-j+1} = c$ are set, if $1 \le j \le \frac{m}{2}$;
and $G_{i,m-j+1} = c, \ldots, G_{i,j-1} = c$, if $\frac{m}{2} \le j \le m$. Therefore, all c-entries of G are
symmetric to the central vertical axis of G. Thus, until now each row i has the form [aba],
where $G_{i,1} = \emptyset, \ldots, G_{i,a} = \emptyset$, $G_{i,a+1} = x, \ldots, G_{i,a+b} = x$, $G_{i,a+b+1} = \emptyset, \ldots,
G_{i,m} = \emptyset$ for $x \in \{r, c\}$, $b \ge 0$, $2a + b = m$ and $1 \le i \le n$. Assume that the repeated
execution of Rule 3.2 on a row i of form [aba] with $x = c$ changes an entry of $G_{i,\cdot}$. In
this case $a < r_i$ and it follows that the instance is not solvable. Therefore, the repeated
execution of Rule 3.2 on a column cannot change the instance and the lemma follows. □

Lemma 3. Let (G, R, C, n, m, l) be a successfully preprocessed label problem. Then
Algorithm 3 computes a feasible solution to (G, R, C, n, m, l) in O(nm(n + m)) time.

Proof: Since (G, R, C, n, m, l) is preprocessed successfully, it follows that each column
contains an interval of length at least l that is free for column labeling. Assume that after
processing steps 2–3 there exists a row i not containing an interval of length $r_i$ that is
free for row labeling. We make a case distinction according to the length of $R_i$:
Algorithm 3 CONSTANT COLUMN LENGTH (G, R, C, n, m, l)

1) if PREPROCESSING(G, R, C, n, m)
2)   label the columns $1, \ldots, \lceil m/2 \rceil - 1$ bottommost;
3)   label the columns $\lceil m/2 \rceil, \ldots, m$ topmost;
4)   label the rows $1, \ldots, n$ in the free space;
5)

Case $r_i > \frac{m}{2}$: We know that the fields $G_{i,m-r_i+1} = r, \ldots, G_{i,r_i} = r$ were set in the
preprocessing. Furthermore, Lemma 2 yields that no other entry of $G_{i,\cdot}$ was set to c
in the preprocessing. Therefore, each column $G_{\cdot,j}$ with $1 \le j < m - r_i + 1$ or $r_i + 1 \le
j \le m$ contains either one interval of length at least $2l$ that is free for column labeling
or two intervals, each of length at least $l$, that are free for column labeling.
From the symmetry of the label problem, and since the columns $j$ with $1 \le j < m - r_i + 1$
are labeled bottommost and the columns $j$ with $r_i + 1 \le j \le m$ are labeled topmost, it
follows that either the fields $G_{i,1}, \ldots, G_{i,m-r_i}$ are free for row labeling or the fields
$G_{i,r_i+1}, \ldots, G_{i,m}$ are. Thus, $G_{i,\cdot}$ contains an interval of length $r_i$ that is free for row
labeling. Contradiction.
Case $r_i \le \frac{m}{2}$: Lemma 2 yields that this row has the form [aba] with $G_{i,1} = \emptyset, \ldots,
G_{i,a} = \emptyset$, $G_{i,a+1} = c, \ldots, G_{i,a+b} = c$, $G_{i,a+b+1} = \emptyset, \ldots, G_{i,m} = \emptyset$, $b \ge 0$ and
$2a + b = m$. Since the instance is solvable it follows that $a \ge r_i$. With the same
arguments as above it follows that either the fields $G_{i,1}, \ldots, G_{i,r_i}$ are free for row
labeling or the fields $G_{i,m-r_i+1}, \ldots, G_{i,m}$ are free for row labeling. Contradiction.
The running time is dominated by the preprocessing and is thus $O(nm(n + m))$. □
See Figures 5 and 6 for an example. Figure 4 shows a typical downtown city map in
which all vertical street names have the same length.

Fig. 5. Matrix of a constant column length problem after the successful preprocessing.
Entries that are set in the preprocessing are colored black and gray.

Fig. 6. Solution of the label problem of the left figure.
Acknowledgements

The authors would like to thank Peter Widmayer and Wolfram Schlickenrieder for in-
teresting and fruitful discussions.

References
[EIS76] Even S., Itai A., Shamir A.: On the Complexity of Timetable and Multicommodity Flow
  Problems. SIAM Journal on Computing, 5(4):691-703, (1976)
[GG95] Gardner R.J., Gritzmann P.: Discrete Tomography: Determination of Finite Sets by X-
  Rays. Technical Report TR-95-13, Institute of Computer Science, University of Trier, Germany,
  (1995)
[GJ79] Garey M.R., Johnson D.S.: Computers and Intractability, A Guide to the Theory of NP-
  Completeness. W. H. Freeman and Company, New York, (1979)
[NH00] Neyer G., Hennes H.: Map Labeling with Application to Graph Labeling. GI
  Forschungsseminar: Zeichnen von Graphen, Teubner Verlag, to appear, (2000)
[NW99] Neyer G., Wagner F.: Labeling Downtown. Technical Report TR 324,
  ftp://ftp.inf.ethz.ch/pub/publications/tech-reports/3xx/324.ps.gz, Institute of Computer
  Science, ETH Zurich, Switzerland, (1999)
[US00] Unger W., Seibert S.: The Hardness of Placing Street Names in a Manhattan Type Map.
  Proceedings of the 4th Italian Conference on Algorithms and Complexity (CIAC 2000), Lec-
  ture Notes in Computer Science, Springer-Verlag, this volume, (2000)
[Woe96] Woeginger G.J.: The Reconstruction of Polyominoes from their Orthogonal Projec-
  tions. Technical Report SFB-Report 65, TU Graz, Institut für Mathematik, A-8010 Graz, Aus-
  tria, (1996)
The Online Dial-a-Ride Problem under
Reasonable Load*

Dietrich Hauptmeier, Sven O. Krumke, and Jörg Rambau

Konrad-Zuse-Zentrum für Informationstechnik Berlin,
Takustr. 7, 14195 Berlin, Germany
{hauptmeier,krumke,rambau}@zib.de

Abstract. In this paper, we analyze algorithms for the online dial-a-
ride problem with request sets that fulfill a certain worst-case restriction:
roughly speaking, a set of requests for the online dial-a-ride problem is
reasonable if the requests that come up in a sufficiently large time period
can be served in a time period of at most the same length. This new
notion is a stability criterion implying that the system is not overloaded.
The new concept is used to analyze the online dial-a-ride problem for
the minimization of the maximal resp. average flow time. Under reason-
able load it is possible to distinguish the performance of two particular
algorithms for this problem, which seems to be impossible by means of
classical competitive analysis.

1 Introduction
It is a standard assumption in mathematics, computer science, and operations
research that problem data are given. However, many aspects of life are online.
Decisions have to be made without knowing future events relevant for the current
choice. Online problems, such as vehicle routing and control, management of call
centers, paging and caching in computer systems, foreign exchange and stock
trading, had been around for a long time, but no theoretical framework existed
for the analysis of online problems and algorithms.
Meanwhile, competitive analysis has become the standard tool to analyze
online algorithms [4,6]. Often the online algorithm is supposed to serve the re-
quests one at a time, where the next request becomes known when the current
request has been served. However, in cases where the requests arrive at certain
points in time this model is not sufficient. In [3,5] each request in the request
sequence has a release time. The sequence is assumed to be in non-decreasing
order of release times. This model is sometimes referred to as the real time
model. A similar approach was used in [1] to investigate the online dial-a-ride
problem, OlDarp for short, which is the example for the new concept in this
paper.
Since in the real time model the release of a new request is triggered by a
point in time rather than a decision of the online algorithm, we essentially do not
* Research supported by the German Science Foundation (grant 883/5-2)

G. Bongiovanni, G. Gambosi, R. Petreschi (Eds.): CIAC 2000, LNCS 1767, pp. 125–136, 2000.
© Springer-Verlag Berlin Heidelberg 2000
need a total order on the set of requests. Therefore, for the sake of convenience,
we will speak of request sets rather than request sequences.
In the problem OlDarp, objects are to be transported between points in a
given metric space X with the property that for every pair of points $(x, y) \in X$
there is a path $p: [0, 1] \to X$ in X with $p(0) = x$ and $p(1) = y$ of length $d(x, y)$.
An important special case occurs when X is induced by a weighted graph.
A request consists of the objects to be transported and the corresponding
source and target vertex of the transportation request. The requests arrive online
and must be handled by a server which starts and ends its work at a designated
origin. The server picks up and drops objects at their starts and destinations. It
is assumed that neither the release time of the last request nor the number of
requests is known in advance.
A feasible solution to an instance of the OlDarp is a schedule of moves (i.e.,
a sequence of consecutive moves in X together with their starting times) in X
so that every request is served and no request is picked up before its release
time. The goal of OlDarp is to find a feasible solution with minimal cost,
where the notion of cost depends on the objective function used. The focus of
this paper is the investigation of the notoriously difficult task to minimize the
maximal or average flow time online.
Recall that an online algorithm A is called c-competitive if there exists a
constant c such that for any request set R (or request sequence, if we are
concerned with the classical online model) the inequality $A(R) \le c \cdot \mathrm{OPT}(R)$ holds.
Here, $X(R)$ denotes the objective function value of the solution produced by
algorithm X on input R, and OPT denotes an optimal offline algorithm. Sometimes
we are dealing with various objectives at the same time. We then indicate the
objective obj in the superscript, as in $X^{obj}(R)$.
Competitive analysis of OlDarp provided the following (see [1]):
– There are competitive algorithms (IGNORE and REPLAN, definitions see
  below) for the goal of minimizing the total completion time of the schedule.
– For the task of minimizing the maximal (average) waiting time or the maxi-
  mal (average) flow time there can be no algorithm with constant competitive
  ratio. In particular, the algorithms IGNORE and REPLAN have an unbounded
  competitive ratio.
We do not claim originality for the actual online algorithms IGNORE and RE-
PLAN; instead we show a new method for their analysis. As the reader will see in
the definitions, both REPLAN and IGNORE are straightforward online strategies
based on the ability to solve the offline version of the problem to optimality or a
constant-factor approximation thereof (with respect to the minimization of the
total completion time).
The first occurrence of the strategy IGNORE (to the best of our knowledge)
can be found in the paper by Shmoys, Wein, and Williamson [13]: They show a
fairly general result about obtaining competitive algorithms for minimizing the
total completion time in machine scheduling problems when the jobs arrive over
time: If there is a $\rho$-approximation algorithm for the offline version, then this
implies the existence of a $2\rho$-competitive algorithm for the online version, which
is essentially the IGNORE strategy. The results from [13] show that IGNORE-
type strategies are 2-competitive for a number of online scheduling problems.
The strategy REPLAN is probably folklore; it can also be found under different
names like REOPT or OPTIMAL.
It should be noted that the corresponding offline problems with release times
(where all requests are known at the start of the algorithm) are NP-hard to solve;
for the objective functions of minimizing the average or maximal flow time it is
even NP-hard to find a solution within a constant factor of the optimum [11].
The offline problem without release times of minimizing the total completion
time is polynomially solvable on special graph classes but NP-hard in general
[8,2,7,10].
If we are considering a continuously operating system with continuously ar-
riving requests (i.e., the request set may be infinite) then the total completion
time is meaningless. Bottom line: in this case, the existing positive results cannot
be applied, and the negative results tell us that we cannot hope for performance
guarantees that may be relevant in practice, such as bounds for the maximal
or average flow time. In particular, the two algorithms IGNORE and REPLAN
cannot be distinguished by classical competitive analysis, because it is easy to see
that no online algorithm can have a constant competitive ratio with respect
to minimizing the maximal or average flow time.
The point here is that we do not know any notion from the literature to
describe what a particular set of requests should look like in order to allow for a
continuously operating system. In queuing theory this is usually modelled by a
stability assumption: the rate of incoming requests is at most the rate of requests
served. To the best of our knowledge, so far there has been nothing similar in the
existing theory of discrete online algorithms. Since in many instances we have no
exploitable information about the distributions of requests, we want to develop a
worst-case model rather than a stochastic model for the stability of a continuously
operating system.
Our idea is to introduce the notion of $\Delta$-reasonable request sets. A set of
requests is $\Delta$-reasonable if, roughly speaking, requests released during a period
of time $\Delta$ can be served in time at most $\Delta$. A set of requests R is reasonable
if there exists a $\Delta < \infty$ such that R is $\Delta$-reasonable. That means, for non-
reasonable request sets we find arbitrarily large periods of time where requests
are released faster than they can be served, even if the server uses an optimal
offline schedule. When a system has only to cope with reasonable request sets, we
call this situation reasonable load. Section 3 is devoted to the exact mathematical
setting of this idea.
We now present our main result on the OlDarp under reasonable load, which
we prove in Sects. 4 and 5.
Theorem 1. For the OlDarp under $\Delta$-reasonable load, IGNORE yields a max-
imal and an average flow time of at most $2\Delta$, whereas the maximal and the
average flow time of REPLAN are unbounded.
The algorithms IGNORE and REPLAN have to solve a number of offline in-
stances of OlDarp, minimizing the total completion time, which is in general
NP-hard, as we already remarked. We will show how we can derive results for
IGNORE when using an approximation algorithm for solving offline instances of
OlDarp (for approximation algorithms for offline instances of OlDarp, refer
to [8,2,7,10]). For this we refine the notion of reasonable request sets again,
introducing a second parameter that tells us how fault tolerant the request
set is. In other words, the second parameter tells us how good the algorithm
has to be to show stable behavior. Again, roughly speaking, a set of requests is
$(\Delta, \rho)$-reasonable if requests released during a period of time $\Delta$ can be served
in time at most $\Delta/\rho$. If $\rho = 1$, we get the notion of $\Delta$-reasonable as described
above. For $\rho > 1$, the algorithm is allowed to work sloppily (e.g., employ ap-
proximation algorithms) or have break-downs to an extent measured by $\rho$ and
still show a stable behavior.
Note that our performance guarantee is with respect to the reasonableness
of the input set, not with respect to an optimal offline solution. One might
ask whether IGNORE is competitive with respect to minimizing the maximal or
average flow time. This follows trivially from our main result if the length of a
single request is bounded from below; we leave it as an exercise for the reader to
show that without this assumption there can be no competitive online algorithm
for these objective functions, even under reasonable load.
The algorithms under investigation compute offline locally optimal schedules
with respect to the minimization of the total completion time in order to glob-
ally minimize the maximal or average flow times. This is of practical relevance
because minimizing the total completion time offline is easier than minimizing
the maximal or average flow time offline (see [11]). It is an open problem whether
locally optimal schedules minimizing the maximal or average flow time yield bet-
ter results. However, then the locally optimal schedules would be much harder
to compute. Thus, such algorithms would not be feasible in practice.

2 Preliminaries

Let us first sketch the problem under consideration. We are given a metric space
$(X, d)$. Moreover, there is a special vertex $o \in X$ (the origin). Requests are
triples $r = (t, a, b)$, where a is the start point of a transportation task, b its
end point, and t its release time, which is in this context the time where r
becomes known. A transportation move is a quadruple $m = (t, a, b, R)$, where a
is the starting point and b the end point, and t the starting time, while R is the
set (possibly empty) of requests carried by the move. The arrival time of a move
is the sum of its starting time and $d(a, b)$. A (closed) transportation schedule is
a sequence $(m_1, m_2, \ldots)$ of transportation moves such that

– the first move starts in the origin o;
– the starting point of $m_i$ is the end point of $m_{i-1}$;
– the starting time of a move $m_i$ carrying R is no earlier than the maximum of the
  arrival time of $m_{i-1}$ and the release times of all requests in R;
– the last move ends in the origin o.
An online algorithm for OlDarp has to move a server in X so as to fulfill all
released transportation tasks without preemption (i.e., once an object has been
picked up it is not allowed to be dropped at any other place than its destination),
while it does not know about requests that come up in the future. In order to plan
the work of the server, the online algorithm may maintain a preliminary (closed)
transportation schedule for all known requests, according to which it moves the
server. A posteriori, the moves of the server induce a complete transportation
schedule that may be compared to an offline transportation schedule that is
optimal with respect to some objective function (competitive analysis).
We are concerned with the following objective functions:
– The total completion time (or makespan) $A^{comp}(R)$ of the solution pro-
  duced by algorithm A on a request set R is the time needed by algorithm A to
  serve all requests in R.
– The maximal resp. average flow time $A^{maxflow}(R)$ resp. $A^{flow}(R)$ is the max-
  imum resp. average of the differences between the completion times produced
  by A and the release times of the requests in R.

We start with some useful notation.

Definition 1. The release time of a request r is denoted by $t(r)$.

The offline version of $r = (t, a, b)$ is the request

$$r^{\mathrm{offline}} := (0, a, b).$$

The offline version of a request set R is the request set

$$R^{\mathrm{offline}} := \{ r^{\mathrm{offline}} : r \in R \}.$$

An important characteristic of a request set with respect to system load
considerations is the time period in which it is released.

Definition 2. Let R be a finite request set for OlDarp. The release span $\sigma(R)$
of R is defined as
$$\sigma(R) := \max_{r \in R} t(r) - \min_{r \in R} t(r).$$
Provably good algorithms exist for the total completion time and the weighted
sum of completion times. How can we make use of these algorithms in order to
get performance guarantees for minimizing the maximal (average) waiting (flow)
times? We suggest a way of characterizing request sets which we want to consider
"reasonable".

3 Reasonable Load
In a continuously operating system we wish to guarantee that work can be
accomplished at least as fast as it is presented. In the following we propose a
mathematical set-up which models this idea in a worst-case fashion. Since we are
always working on finite subsets of the whole request set, the request set itself
may be infinite, modeling a continuously operating system.
We start by relating the release spans of finite subsets of a request set to the
time we need to fulfill the requests.
Definition 3. Let R be a request set for the OlDarp. A weakly monotone
function
$$f: \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}, \quad \delta \mapsto f(\delta),$$
is a load bound on R if for any $\delta \in \mathbb{R}_{\ge 0}$ and any finite subset S of R with $\sigma(S) \le \delta$
the completion time $\mathrm{OPT}^{comp}(S^{\mathrm{offline}})$ of the optimum schedule for the offline
version $S^{\mathrm{offline}}$ of S is at most $f(\delta)$. In formula:
$$\mathrm{OPT}^{comp}(S^{\mathrm{offline}}) \le f(\delta).$$

Remark 1. If the whole request set R is finite then there is always the trivial
load bound given by the total completion time of R. For every load bound f we
may set $f(0)$ to be the maximum completion time we need for a single request,
and nothing better can be achieved.
A stable situation would be characterized by a load bound equal to the
identity on $\mathbb{R}_{\ge 0}$. In that case we would never get more work to do than we can
accomplish. If R has a load bound equal to a function $\frac{1}{\rho}\,\mathrm{id}$, where id is the
identity and where $\rho \ge 1$, then $\rho$ measures the tolerance of the request set: an
algorithm that is by a factor of $\rho$ worse than optimal will still accomplish all the
work that it gets. However, we cannot expect that the identity (or any linear
function) is a load bound for OlDarp because of the following observation: a
request set consisting of one single request has a release span of 0, whereas in
general it takes non-zero time to serve this request. In the following definition
we introduce a parameter describing how far a request set is from being load-
bounded by the identity.
Definition 4. A load bound f is $(\Delta, \rho)$-reasonable for some $\Delta \in \mathbb{R}_{\ge 0}$ if
$$\rho \, f(\delta) \le \delta \quad \text{for all } \delta \ge \Delta.$$
A request set R is $(\Delta, \rho)$-reasonable if it has a $(\Delta, \rho)$-reasonable load bound. For
$\rho = 1$, we say that the request set is $\Delta$-reasonable.
In other words, a load bound is $(\Delta, \rho)$-reasonable if it is bounded from above
by $\frac{1}{\rho}\,\mathrm{id}(x)$ for all $x \ge \Delta$ and by the constant function with value $\frac{1}{\rho}\Delta$
otherwise.
Remark 2. If $\Delta$ is sufficiently small so that all request sets consisting of two or
more requests have a release span larger than $\Delta$, then the first-come first-serve
strategy is good enough to ensure that there are never more than two unserved
requests in the system. Hence, the request set does not require scheduling the
requests in order to provide for a stable system. (By stable we mean that the
number of unserved requests in the system does not become arbitrarily large.)
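Definition 4 can be checked directly on a finite request set: taking the pointwise smallest
load bound shows that R is $(\Delta, \rho)$-reasonable exactly if $\rho\,\mathrm{OPT}^{comp}(S^{\mathrm{offline}}) \le
\max\{\sigma(S), \Delta\}$ for every finite subset S (this reformulation is ours, not from the paper).
The brute-force sketch below assumes an offline solver is supplied by the caller and is
exponential, so it is meant only to make the definition concrete.

from itertools import combinations

def is_reasonable(requests, delta, rho, opt_makespan):
    """Brute-force (delta, rho)-reasonableness check for a finite request set.
    requests: list of (release_time, source, target);
    opt_makespan(S): optimal offline completion time of S (release times dropped)."""
    for k in range(1, len(requests) + 1):
        for subset in combinations(requests, k):
            span = max(t for t, _, _ in subset) - min(t for t, _, _ in subset)
            if rho * opt_makespan(list(subset)) > max(span, delta):
                return False
    return True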
In a sense, $\Delta$ is a measure for the combinatorial difficulty of the request set R.
Thus, it is natural to ask for performance guarantees for algorithms in terms of
this parameter. This is done for the algorithm IGNORE in the next section.

4 Bounds for the Flow Times of IGNORE

We are now in a position to prove the bounds for the maximal resp. average flow
time in the OlDarp for algorithm IGNORE stated in Theorem 1. We start by
recalling the algorithm IGNORE from [1].

Definition 5 (Algorithm IGNORE). Algorithm IGNORE works with an inter-
nal buffer. It may assume the following states (initially it is IDLE):

IDLE Wait for the next point in time when requests become available. Goto
  PLAN.
BUSY While the current schedule is in work, store the upcoming requests in a
  buffer ("ignore them"). Goto IDLE if the buffer is empty, else goto PLAN.
PLAN Produce a preliminary transportation schedule for all currently avail-
  able requests R (taken from the buffer) with (approximately) minimal total
  completion time $\mathrm{OPT}^{comp}(R^{\mathrm{offline}})$ for $R^{\mathrm{offline}}$. (Note: This yields a feasi-
  ble transportation schedule for R because all requests in R are immediately
  available.) Goto BUSY.
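The state machine of Definition 5 can be simulated in a few lines; the sketch below only
tracks when each batch is planned and served (the intervals used in the analysis that
follows). The time model and names are ours, and offline_makespan stands for the
(approximate) offline solver, so this is an illustration rather than the authors' code.

def ignore_intervals(requests, offline_makespan):
    """Returns the list of (start, end) time intervals in which IGNORE serves
    its successive batches.
    requests: list of (release_time, source, target);
    offline_makespan(batch): duration of the schedule computed for the batch."""
    pending = sorted(requests)
    intervals, buffer, t, i = [], [], 0.0, 0
    while i < len(pending) or buffer:
        while i < len(pending) and pending[i][0] <= t:   # BUSY: buffer new requests
            buffer.append(pending[i])
            i += 1
        if not buffer:                                   # IDLE: wait for next release
            t = pending[i][0]
            continue
        batch, buffer = buffer, []                       # PLAN: schedule whole buffer
        duration = offline_makespan(batch)
        intervals.append((t, t + duration))              # BUSY until schedule is done
        t += duration
    return intervals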

We assume that IGNORE solves offline instances of OlDarp employing a
$\rho$-approximation algorithm. Recall that a $\rho$-approximation algorithm is a poly-
nomial algorithm that always finds a solution whose objective value is at most $\rho$
times the optimum.
Let us consider the intervals in which IGNORE organizes its work in more
detail. The algorithm IGNORE induces a dissection of the time axis $\mathbb{R}_{\ge 0}$ in the
following way: We can assume, w.l.o.g., that the first set of requests arrives at
time 0. Let $\tau_0 = 0$, i.e., the point in time where the first set of requests is
released that is processed by IGNORE in its first schedule. For $i > 0$ let $\tau_i$ be
the duration of the time period the server is working on the requests that have
been ignored during the last $\tau_{i-1}$ time units. Then the time axis is split into the
intervals
$$[\tau_0 = 0, \tau_0], \ (\tau_0, \tau_1], \ (\tau_1, \tau_1 + \tau_2], \ (\tau_1 + \tau_2, \tau_1 + \tau_2 + \tau_3], \ \ldots$$
Let us denote these intervals by $I_0, I_1, I_2, \ldots$. Moreover, let $R_i$ be the set of those
requests that come up in $I_i$. Clearly, the complete set of requests R is the disjoint
union of all the $R_i$.
At the end of each interval $I_i$ we solve an offline problem: all requests to
be scheduled are already available. The work on the computed schedule starts
immediately (at the end of interval $I_i$) and is done $\tau_{i+1}$ time units later (at the
end of interval $I_{i+1}$). On the other hand, the time we need to serve the schedule
is not more than $\rho$ times the optimal completion time of $R_i^{\mathrm{offline}}$. In other words:
Lemma 1. For all $i \ge 0$ we have
$$\tau_{i+1} \le \rho \cdot \mathrm{OPT}^{comp}(R_i^{\mathrm{offline}}).$$

Let us now prove the first statement of Theorem 1 in a slightly more general
version.

Theorem 2. Let $\Delta > 0$ and $\rho \ge 1$. For all instances of OlDarp with $(\Delta, \rho)$-
reasonable request sets, IGNORE employing a $\rho$-approximate algorithm for solving
offline instances of OlDarp yields a maximal flow time of no more than $2\Delta$.

Proof. Let r be an arbitrary request in $R_i$ for some $i \ge 0$, i.e., r is released in $I_i$.
By construction, the schedule containing r is finished at the end of interval $I_{i+1}$,
i.e., at most $\tau_i + \tau_{i+1}$ time units later than r was released. Thus, for all $i > 0$ we
get that
$$\mathrm{IGNORE}^{maxflow}(R_i) \le \tau_i + \tau_{i+1}.$$
If we can show that $\tau_i \le \Delta$ for all $i > 0$ then we are done. To this end, let
$f: \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ be a $(\Delta, \rho)$-reasonable load bound for R. Then
$\mathrm{OPT}^{comp}(R_i^{\mathrm{offline}}) \le f(\tau_i)$ because $\sigma(R_i) \le \tau_i$.
By Lemma 1, we get for all $i > 0$
$$\tau_{i+1} \le \rho \cdot \mathrm{OPT}^{comp}(R_i^{\mathrm{offline}}) \le \rho \, f(\tau_i) \le \max\{\tau_i, \Delta\}.$$
Using $\tau_0 = 0$ the claim now follows by induction on i.

The statement o Theorem 1 concerning the average flow time o IGNORE


ollows rom the act that the average is never larger then the maximum.

Corollary 1. Let > 0. For all -reasonable request sets algorithm IGNORE
yields a average flow time no more than 2 .

5 A Disastrous xample for REPLAN

We rst recall the strategy o algorithm REPLAN or the OlD rp. Whenever
a new request becomes available, REPLAN computes a preliminary transporta-
tion schedule or the set R o all available requests by solving the problem o
minimizing the total completion time o Ro ine .
Then it moves the server according to that schedule until a new request
arrives or the schedule is done. In the sequel, we provide an instance o OlD rp
and a -reasonable request set R such that the maximal and the average flow
time REPLANmaxflow (R) is unbounded, thereby proving the remaining assertions
o Theorem 1.

Theorem 3. There is an instance of OlD rp under reasonable load such that


the maximal and the average flow time of REPLAN is unbounded.
The Online Dial-a-Ride Problem under Reasonable Load 133

d
c

b
a
t
3 2 − −2 −2
0
Fig. 1. A sketch o a (2 23 )-reasonable instance o OlD rp ( = 18 ).

Proof. In Fig. 1 there is a sketch o an instance or the OlD rp. The metric
space is a path on our nodes a b c d; the length o the path is , the distances
are d(a b) = d(c d) = , and hence d(b c) = − 2 . At time 0 a request rom a
to d is issued; at time 3 2 − , the remaining requests periodically come in pairs
rom b to a and rom c to d, resp. The time distance between them is − 2 .
We show that or = 18 the request set R indicated in the picture is 2 23 -
reasonable. Indeed: it is easy to see that the rst request rom a to d does not
influence reasonableness. Consider an arbitrary set Rk o k adjacent pairs o
requests rom b to a resp. rom c to d. Then the release span (Rk ) o Rk is
(Rk ) = (k − 1)( − 2 )
The o ine version Rk o ine
o Rk can be served in time
comp
OPT (Rk o ine
) = 2 + (k − 1) 4
In order to nd the smallest paramter or which the request set Rk is
-reasonable we solve or the integer k − 1 and get
 
2
k−1= =3
−6
Hence, we can set to
:= OPTcomp (R4 o ine
) = 2 32
Now we de ne 8
< IR IR

: or <
:
otherwise
134 Dietrich Hauptmeier, Sven O. Krumke, and Jörg Rambau

t
3 2 − −2 −2
0
Fig. 2. The track o the REPLAN-server. Because a new pair o requests is issued
exactly when the server is still closer to the requests at the top all the requests
at the bottom will be postponed in an optimal preliminary schedule. Thus, the
server always returns to the top when a new pair o requests arrives.

By construction, is a load bound or R4 . Because the time gap a ter which


a new pair o requests occurs is certainly larger than the additional time we need
to serve it (o ine), is also a load bound or R. Thus, R is -reasonable, as
desired.
Now: how does REPLAN per orm on this instance? In Fig. 2 we see the track
o the server ollowing the preliminary schedules produced by REPLAN on the
request set R.
The maximal flow time o REPLAN on this instance is realized by the flow
time o the request (3 2 − b a), which is unbounded.
Moreover, since all requests rom b to a are postponed a ter serving all the
requests rom c to d we get that REPLAN produces an unbounded average flow
time as well.

In Fig. 3 we show the track o the server under the control o the IGNORE-
algorithm. A ter an initial ine cient phase the server ends up in a stable oper-
ating mode. This example also shows that the analysis o IGNORE in Sect. 4 is
sharp.

6 Reasonable Load as a General Framework

We introduced the new concept o reasonable request sets, using as example the
problem OlD rp. However, the concept can be applied to any combinatorial
The Online Dial-a-Ride Problem under Reasonable Load 135

Fig. 3. The track o the IGNORE-server.

online problem with (possibly in nte) sets o time stamped requests, such as on-
line scheduling, e.g., as described by Sgall [12], or the Online Traveling Salesman
Problem, studied by Ausiello et al. [3].
The algorithms IGNORE and REPLAN represent general online paradigms
which can be used or any online problem with time-stamped requests. We notice
that the proo o the result that the average and maximal flow and waiting times
o IGNORE are bounded by 2 has not explicitly drawn on any speci c property
o OlD rp this result holds or all combinatorial online problems with requests
given by their release times.
The proo that the maximal flow and waiting time o a -reasonable request
set is unbounded or REPLAN is equally applicable to the Online Traveling Sales-
man Problem by Ausiello et.al. [3]. We expect that the same is true or any su -
ciently di cult online problem with release times or very simple problems,
such as OlD rp on a zero dimensional space, the result trivially does not hold.

7 Conclusion
We have introduced the mathematical notion -reasonable describing the com-
binatorial di culty o a possibly in nite request set or OlD rp. For reason-
able request sets we have given bounds on the maximal resp. average flow time
o algorithm IGNORE or OlD rp; in contrast to this, there are instances o
OlD rp where algorithm REPLAN yields an unbounded maximal and average
flow time. One key property o our results is that they can be applied in contin-
uously working systems. Computer simulations have meanwhile supported the
theoretical results in the sense that algorithm IGNORE does not delay individual
requests or an arbitraryly long period o time, whereas REPLAN has a tendency
to do so [9].
136 Dietrich Hauptmeier, Sven O. Krumke, and Jörg Rambau

While the notion o -reasonable is applicable to minimizing maximal flow


time, it would be o interest to investigate an average analogue in order to prove
non-trivial bounds or the average flow times.

References
1. N. Ascheuer, S. O. Krumke, and J. Rambau. Online dial-a-ride problems: Mini-
mi ing the completion time. In Proceedings of t e 17t International Symposium
on T eoretical Aspects of Computer Science, Lecture Notes in Computer Science,
2000. To appear.
2. Mikhail J. Atallah and S. Rao Kosaraju. E cient solutions to some transportation
problems with applications to minimi ing robot arm travel. SIAM Journal on
Computing, 17:849 869, 1988.
3. Georgio Ausiello, Esteban Feuerstein, Stefano Leonardi, Leen Stougie, and Mau-
ri io Talamo. Algorithms for the on-line traveling salesman. Algorithmica, to
appear.
4. Allan Borodin and Ran El-Yaniv. Online Computation and Competitive Analysis.
Cambridge University Press, 1998.
5. Esteban Feuerstein and Leen Stougie. On-line single server dial-a-ride problems.
Theoretical Computer Science, special issue on on-line algorithms, to appear.
6. Amos Fiat and Gerhard J. Woeginger, editors. Online Algorit ms: T e State of
t e Art, volume 1442 of Lecture Notes in Computer Science. Springer, 1998.
7. Greg N. Frederickson and D. J. Guan. Nonpreemptive ensemble motion planning
on a tree. Journal of Algorit ms, 15:29 60, 1993.
8. Greg N. Frederickson, Matthew S. Hecht, and Chul Kim. Approximation algo-
rithms for some routing problems. SIAM Journal on Computing, 7:178 193, 1978.
9. Martin Grötschel, Dietrich Hauptmeier, Sven O. Krumke, and Jörg Rambau. Sim-
ulation studies for the online dial-a-ride-problem. Preprint SC 99-09, Konrad-Zuse-
Zentrum für Informationstechnik Berlin, 1999.
10. Dietrich Hauptmeier, Sven O. Krumke, Jörg Rambau, and Hans C. Wirth. Euler
is standing in line dial-a-ride problems with fo-precedence-contraints. In Pe-
ter Widmeyer, Gabriele Neyer, and Stephan Eidenben , editors, Grap T eoretic
Concepts in Computer Science, volume 1665 of Lecture Notes in Computer Science,
pages 42 54. Springer, Berlin Heidelberg New York, 1999.
11. Hans Kellerer, Thomas Tautenhahn, and Gerhard Woeginger. Approximability and
nonapproximabiblity results for minimi ing total flow time on a single machine. In
Proceedings of t e Symposium on t e T eory of Computing, 1996.
12. Jir Sgall. Online scheduling. In Amos Fiat and Gerhard J. Woeginger, editors,
Online Algorit ms: T e State of t e Art, volume 1442 of Lecture Notes in Computer
Science. Springer, 1998.
13. David B. Shmoys, Joel Wein, and David P. Williamson. Scheduling parallel ma-
chines on-line. SIAM Journal on Computing, 24(6):1313 1331, December 1995.
The Online-TSP against Fair Adversaries

Michiel Blom1 , Sven O. Krumke2 , Willem de Paepe3 , and Leen Stougie4


1
KPN-Research, P. O. Box 421, 2260AK Leidschendam, The Netherlands,
2
Konrad-Zuse-Zentrum für Informationstechnik Berlin, Department Optimization,
Takustr. 7, D-14195 Berlin-Dahlem, Germany,
[email protected]
3
Faculty of Technology Management, Technische Universiteit indhoven,
P. O. Box 513, 5600MB indhoven, The Netherlands,
[email protected]
4
Faculty of Mathematics and Computer Science, Technische Universiteit indhoven,
P. O. Box 513, 5600MB indhoven, The Netherlands,
L.Stou [email protected]

Ab tract. In the online traveling salesman problem requests for visits


to cities (points in a metric space) arrive online while the salesman is
traveling. The salesman moves at no more than unit speed and starts and
ends his work at a designated origin. The objective is to nd a routing
for the salesman which nishes as early as possible.
We consider the online traveling salesman problem when restricted to the
non-negative part of the real line. We show that a very natural strategy
is 3 2-competitive which matches our lower bound. The main contribu-
tion of the paper is the presentation of a fair adversary , as an alterna-
tive to the omnipotent adversary used in competitive analysis for online
routing problems. The fair adversary is required to remain inside the con-
vex hull of the requests released so far. We show that on IR+0 algorithms
can achieve a strictly better competitive ratio against a fair adversary
than against a conventional adversary. Speci cally, we present an algo-
rithm against a fair adversary with competitive ratio (1 + 17) 4 1 28
and provide a matching lower bound. We also show competitiveness re-
sults for a special class of algorithms (called diligent algorithms) that
do not allow waiting time for the server as long as there are requests
unserved.

1 Introduction

The traveling salesman problem is a well studied problem in combinatorial op-


timization. In the classical setting, one assumes that the complete input o an
instance is available or an algorithm to compute a solution. In many cases this
?
Supported by the TMR Network DON T of the uropean Community RB TMRX-
CT98-0202
??
Research supported by the German Science Foundation (DFG, grant Gr 883/5-2)
???
Supported by the TMR Network DON T of the uropean Community RB TMRX-
CT98-0202

G. Bongiovanni, G. Gambosi, R. Petreschi ( ds.): CIAC 2000, LNCS 1767, pp. 137 149, 2000.

c Springer-Verlag Berlin Heidelberg 2000
138 Michiel Blom et al.

o ine optimization model does not reflect the real-world situation appropri-
ately. In many applications not all requests or points to be visited are known
in advance. Decisions have to be made online without knowledge about uture
requests.
Online algorithms are tailored to cope with such situations. Whereas o ine
algorithms work on the complete input sequence, online algorithms only see the
requests released so ar and thus, in planning a route, have to account or uture
requests that may or may not arise at a later time. A common way to evaluate
the quality o online algorithms is competitive analysis [3,5].
In this paper we consider the ollowing online variant o the traveling sales-
man problem (called Olt p in the sequel) which was introduced in [2]. Cities
(requests) arrive online over time while the salesman is traveling. The requests
are to be handled by a salesman-server that starts and ends his work at a des-
ignated origin. The objective is to nd a routing or the server which nishes as
early as possible (in scheduling theory this goal is usually re erred to as mini-
mizing the makespan). In this model it is easible or the server to wait at the
cost o time that elapses. Decisions are revocable, as long as they have not been
executed. Only history is irrevocable.

Previous Work. Ausiello et al. [2] present a 2-competitive algorithm or Olt p


which works in general metric spaces. The authors also show that or general
metric spaces no deterministic algorithm can be c-competitive with c < 2. For
the special case that the metric space is IR, the real line, they give a lower bound
o (9 + 17) 8 1 64 and a 7 4-competitive algorithm. Just very recently Lip-
mann [7] devised an algorithm that is best possible or this case with competitive
ratio equal to the be ore mentioned lower bound.

Our Contribution. In this paper the e ect o restricting the class o algorithms
allowed and restricting the power o the adversary in the competitive analysis
is studied. We introduce and analyze a new class o online algorithms which we
call diligent algorithms. Roughly speaking, a diligent algorithm never sits idle
while there is work to do. A precise de nition is presented in Section 3 where we
also show that in general diligent algorithms are strictly weaker than algorithms
that allow waiting time. In particular we show that no diligent algorithm can
achieve a competitive ratio lower than 7 4 or the Olt p on the real line. The
7 4-competitive algorithm in [2] is in act a diligent algorithm and there ore best
possible within this restricted class o algorithms.
We then concentrate on the special case o Olt p when the underlying met-
ric space is IR+0 , the non-negative part o the real line. In Section 4 we show that
an extremely simple and natural diligent strategy is 3 2-competitive and that
this result is best possible or (diligent and non-diligent) deterministic algorithms
on IR+0 . The main contribution is contained in Section 5. Here we deal with an
objection requently encountered against competitive analysis concerning the un-
realistic power o the adversary against which per ormance is measured. Indeed,
in the Olt p on the real line the be ore mentioned 7 4-competitive algorithm
The Online-TSP against Fair Adversaries 139

reaches its competitive ratio against an adversary that moves away rom the
previously released requests without giving any in ormation to the online al-
gorithm. We introduce a fair adversary that is in a natural way restricted in
the context o the online traveling salesman problem studied here. It should be
seen as a more reasonable adversary model. A air adversary always keeps its
server within the convex hull o the requests released so ar. We show that this
adversary model indeed allows or lower competitive ratios. For instance, the
above mentioned 3 2-competitive diligent strategy against the conventional ad-
versary is 4 3-competitive against the air adversary. This result is best possible
or diligent algorithms against a air adversary.

Diligent Algorithms General Algorithms


General Adversary LB = UB = 3 2 LB = UB = 3 2
Fair Adversary LB = UB = 4 3 LB = UB = (1 + 17) 4

Table 1. Overview o the lower bound (LB) and upper bound (UB) results or
the competitive ratio o deterministic algorithms or Olt p on IR+
0 in this paper.

We also present a non-diligent algorithm with competitive ratio (1+ 17) 4


1 28 < 4 3 competing against the air adversary. Our result is the rst one that
shows that waiting is actually advantageous in the Olt p. The be ore mentioned
algorithm in [7] also uses waiting, but became known a ter the one presented in
this paper and has not been o cially published yet. Such results are known al-
ready or online scheduling problems (see e.g. [6,4,8]) and, again very recently,
also or an online dial-a-ride problem [1]. Our competitiveness result is com-
plemented by a matching lower bound on the competitive ratio o algorithms
against the air adversary. Table 1 summarizes our results or Olt p on IR+ 0.

Preliminaries

An instance o the Online Traveling Salesman Problem (Olt p) consists o a


metric space M = (X d) with a distinguished origin o M and a sequence =
1 m o requests. In this paper we are mainly concerned with the special
case that M is IR+ 0 , the non-negative part o the real line endowed with the
Euclidean metric, i.e., X = IR+ 0 = x IR : x 0 and d(x y) = x − y . A
server is located at the origin o at time 0 and can move at most at unit speed.
Each request is a pair i = (ti xi ), where ti IR is the time at which
request i is released (becomes known), and xi X is the point in the metric
space requested to be visited. We assume that the sequence = 1 m o
requests is given in order o non-decreasing release times. For a real number t
we denote by t and <t the subsequence o requests in released up to time t
and strictly be ore time t, respectively.
140 Michiel Blom et al.

It is assumed that the online algorithm does neither have in ormation about
the time when the last request is released nor about the total number o requests.
An online algorithm or Olt p must determine the behavior o the server
at a certain moment t o time as a unction o all the requests in t (and
the current time t). In contrast, the o ine algorithm has in ormation about all
requests in the whole sequence already at time 0. A easible online/o ine
solution is a route or the server which serves all requested points, where each
request is served not earlier than the time it is released, and which starts and
ends in the origin o.
The objective in the Olt p is to minimize the total completion time (also
called the makespan in scheduling) o the server, that is, the time when the
server has served all requests and returned to the origin.
Let alg( ) denote the completion time o the server moved by algorithm alg
on the sequence o requests. We use opt to denote the optimal o ine algo-
rithm. An online algorithm alg or Olt p is c-competitive, i there exists a con-
stant c such that or every request sequence the inequality alg( ) c opt( )
holds.

3 Diligent Algorithms
In this section we introduce a particular class o algorithms or Olt p which we
call diligent algorithms. Intuitively, a diligent algorithm should never sit and wait
when it could serve yet unserved requests. A diligent server should also move
towards work that has to be done directly without any detours. To translate this
intuition into a rigorous de nition some care has to be taken.

De nition 1 (Diligent Algorithm). An algorithm alg for Olt p is called


diligent, if it satis es the following conditions:
1. If there are still unserved requests, then the direction of the server operated
by alg changes only if a new request becomes known, or the server is either
in the origin or at a request that has just been served.
2. At any time when there are yet unserved requests, the server operated by alg
either moves towards an unserved request or the origin at maximum (i.e.
unit) speed. The latter case is only allowed if the server operated by alg is
not yet in the origin.

We emphasize that a diligent algorithm is allowed to move its server towards


an unserved request and change his direction towards another unserved request
or to the origin at the moment a new request becomes known.

Lemma 1. No diligent online algorithm for Olt p on the real line IR has com-
petitive ratio of less than 7 4.

Proof. Suppose that alg is a diligent algorithm or Olt p on the real line.
Consider the ollowing adversarial input sequence. At time 0 two requests 1 =
(0 1 2) and 2 = (1 2 0) are released. There will be no urther requests be ore
The Online-TSP against Fair Adversaries 141

time 1. Thus, by the diligence o the algorithm the server will be at the origin
at time 1.
At time 1 two new requests at points 1 and −1, respectively, are released.
Since the algorithm is diligent, starting at time 1 it must move its server to
one o these requests at maximum, i.e., unit, speed. Without loss o generality
assume that this is the request at 1. alg’s server will reach this point at time 2.
Starting at time 2, alg will have to move its server either directly towards the
unserved request at −1 or towards the origin, which essentially gives the same
movement and implies that the server is at the origin at time 3. At that time,
the adversary issues another request at 1. Thus, alg’s server will still need at
least 4 time units to serve −1 and 1 and return at the origin. There ore, he will
not be able to complete its work be ore time 7.
The adversary handles the sequence by rst serving the request at −1, then
the two requests at 1 and nally returns to the origin at time 4, yielding the
desired result.
This lower bound shows that the 7 4-competitive algorithm presented in [2],
which is in act a diligent algorithm is best possible within the class o diligent
algorithms or the Olt p on the real line.

4 The Oltsp on the Non-negative Part of the Real Line


We rst consider Olt p on IR+
0 when the o ine adversary is the conventional
(omnipotent) opponent.
Theorem 1. No deterministic algorithm for Olt p on IR+
0 has a competitive
ratio of less than 3 2.
Proof. At time 0 the request 1 = (0 1) is released. Let T be the time that the
server operated by alg has served the request 1 and returned to the origin 0.
I T 3, then no urther request is released and alg is no better than 3 2-
competitive since opt( 1 ) = 2. Thus, assume that T < 3. In this case the
adversary releases a new request 2 = (T T ). Clearly, opt( 1 2 ) = 2T . On the
other hand alg( 1 2 ) 3T , yielding a competitive ratio o 3 2.
The ollowing extremely simple strategy achieves a competitive ratio that
matches this lower bound (as we will show below):
Strategy mrin( Move-Right-If-Necessary ) I a new request is released and
the request is to the right o the current position o the server operated
by mrin, then the mrin-server starts to move right at ull speed. The server
continues to move right as long as there are yet unserved requests to the
right o the server. I there are no more unserved requests to the right, then
the server moves towards the origin 0 at ull speed.
It is easy to veri y that Algorithm mrin is in act a diligent algorithm. The
ollowing theorem shows that the strategy has a best possible competitive ratio
or Olt p on IR+ 0.
142 Michiel Blom et al.

Theorem . Strategy mrin is a diligent 3 2-competitive algorithm for the


Olt p on the non-negative part IR+
0 of the real line.

Proof. We show the theorem by induction on the number o requests in the


sequence . It clearly holds i contains at most one request. The induction
hypothesis is that it holds or any sequence o m − 1 requests.
Suppose that request m = (t x) is the last request o = 1 m−1 m.
I t = 0, then mrin is obviously 3 2-competitive, so we will assume that t > 0.
Let f be the position o the request unserved by the mrin-server at time t
(excluding m ), which is urthest away rom the origin.
In case x f , mrin’s cost or serving is equal to the cost or serving the
sequence consisting o the rst m − 1 requests o . Since new requests can never
decrease the optimal o ine cost, the induction hypothesis implies the theorem.
Now assume that f < x. Thus, at time t the request in x is the urthest
unserved request. mrin will complete its work no later than time t + 2x. The
optimal o ine cost opt( ) is bounded rom below by max t + x 2x . There ore,

mrin( ) t+x x t+x x 3


+ + =
opt( ) opt( ) opt( ) t + x 2x 2

The result established above can be used to obtain competitiveness results


or the situation o the Olt p on the real line when there are more than one
server, and the goal is to minimize the time when the last o its servers returns
to the origin 0 a ter all requests have been served.

Lemma . There is an optimal o ine strategy for Olt p on the real line with
k 2 servers such that no server ever crosses the origin.

Proof. Omitted in this abstract.

Corollary 1. There is a 3 2-competitive algorithm for the Olt p with k 2


servers on the real line.

5 Fair Adversaries

The adversaries used in the bounds o the previous section are abusing their
power in the sense that they can move to points where they know a request
will pop up without revealing the request to the online server be ore reaching
the point. As an alternative we propose the ollowing more reasonable adversary
that we baptized fair adversary. We show that we can obtain better competitive
ratios or the Olt p on IR+ 0 under this model. We will also see that under
this adversary model there does exist a distinction in competitiveness between
diligent and non-diligent algorithms. Recall that <t is the subsequence o
consisting o those requests with release time strictly smaller than t.
The Online-TSP against Fair Adversaries 143

De nition (Fair Adversary). An o ine adversary for the Olt p in the


uclidean space (IRn ) is air, if at any moment t, the position of the server
operated by the adversary is within the convex hull of the origin o and the re-
quested points from <t .

In the special case o IR+


0 a air adversary must always keep its server in the
interval [0 F ], where F is the position o the request with the largest distance
to the origin 0 among all requests released so ar. The ollowing lower bound
result shows that the Olt p on the real line against air adversaries is still a
non-trivial problem.

Theorem 3. No deterministic algorithm for Olt p on IR has competitive ratio


less than (5 + 57) 8 1 57 against a fair adversary.

Proof. Suppose that there exists a c-competitive online algorithm. The adver-
sarial sequence starts with two requests at time 0, 1 = (0 1) and 2 = (0 −1).
Without loss o generality, we suppose that the rst request that is served is 1 .
At time 2 the online server can’t have served both requests. We distinguish two
main cases divided in some sub-cases.
Case 1: None of the requests has been served at time 2.

I at time 3 request 1 is still unserved, let t0 be the rst time the server
crosses the origin a ter serving the request. Clearly, t0 4. At time t0 the
0
online server still has to visit the request in −1. I t > 4c − 2 the server can
not be c-competitive because the air adversary can nish the sequence at
time 4.
Thus, suppose that 4 t0 4c − 2. At time t0 a new request 3 = (t0 1)
is released. The online server can not nish the complete sequence be ore
t0 +4
t0 + 4, whereas the adversary needs at t0 + 1. There ore, c t0 +1 and or
4 t0 4c − 2 we have that

(4c − 2) + 4 4c + 2
c =
(4c − 2) + 1 4c − 1

implying c (5 + 57) 8.
I at time 3 the request 1 has already been served, the online server can not
be to the le t o the origin at time 3 (given the act that at time 2 no request
had been served). The adversary now gives a new request 3 = (3 1). There
are two possibilities: either 2 , the request in −1, is served be ore 3 or the
other way round.
I the server decides to serve 2 be ore 3 then it can not complete be ore
time 7. Since the adversary completes the sequence in time 4, the competitive
ratio is at least 7 4.
I the online server serves 3 rst, then again, let t0 be the time that the server
crosses the origin a ter serving 3 . As be ore, we must have 4 t0 4c − 2.
At time t0 the ourth request 4 = (t0 1) is released. The same arguments as
above apply to show that the algorithm is at least (5 + 57) 8-competitive.
144 Michiel Blom et al.

Case : One of the requests has been served at time 2 by the online server.
We assume without loss o generality that 1 has been served. At time 2 the
third request 3 = (2 1) is released. In act, we are back in the situation in
which at time 2 none o the two requests are served. In case the movements o
the online server are such that no urther request is released by the adversary,
the latter will complete at time 4. In the other cases the last released requests
are released a ter time 4 and the adversary can still reach them in time.

For comparison, the lower bound on the competitive ratio or the Olt p in IR
against an adversary that is not restricted to be air is (9 + 17) 8 [2]. As men-
tioned be ore, only recently Lipmann [7] presented a (9 + 17) 8-competitive
algorithm against a non- air adversary. He conjectures that a similar type o
algorithm will also be best possible against a air adversary. In contrast, the
picture or the problem on the non-negative part o the real line is already com-
plete (see Theorems 5 and 6 or diligent algorithms and Theorems 4 and 7 or
non-diligent algorithms below).

Theorem 4. No deterministic algorithm for Olt p on IR0 has competitive


ratio of less than (1 + 17) 4 1 28 against a fair adversary.

Proof. Suppose that alg is c-competitive. At time 0 the adversary releases re-
quest 1 = (0 1). Let T denote the time that the server operated by alg has
served this request and is back at the origin. For alg to be c-competitive, we
must have that T c opt( 1 ) = 2c, otherwise no urther requests will be
released. At time T the adversary releases a second request 2 = (T 1). The
completion time o alg becomes then at least T + 2.
On the other hand, starting at time 0 the air adversary moves its server
to 1, lets it wait there until time T and then goes back to the origin 0 yielding
a completion time o T + 1. There ore,

alg( ) T +2 2c + 2 1
=1+
opt( ) T +1 2c + 1 2c + 1

given the act that T 2c. Since by assumption alg is c-competitive, we have
that 1 + 1 (2c + 1) c, implying that c (1 + 17) 4.

For diligent algorithms we can show a higher lower bound against a air
adversary.

Theorem 5. No deterministic diligent algorithm for Olt p on IR+


0 has com-
petitive ratio of less than 4 3 against a fair adversary.

Proof. Consider the adversarial sequence 1 = (0 1), 2 = (1 0), and 3 = (2 1).


By its diligence the online algorithm will start to travel to 1 at time 0, back to 0
at time 1, arriving there at time 2. Then its server has to visit 1 again, so that
he will nish no earlier than time 4. Obviously, the optimal o ine solution is to
leave 1 not be ore time 2, and nishing at time 3.
The Online-TSP against Fair Adversaries 145

We show now that the algorithm mrin presented be ore has a better com-
petitive ratio against the air adversary than the ratio o 3 2 achieved against a
conventional adversary. In act we show that the ratio matches the lower bound
or diligent algorithms proved in the previous theorem.

Theorem 6. Strategy mrin is a 4 3-competitive algorithm for the Olt p on IR+


0
against a fair adversary.

Proof. Omitted in this abstract.

Thus, Algorithm mrin attains a best possible competitive ratio against the
air adversary among all diligent algorithms. Given the lower bound or general
non-diligent algorithms in Theorem 4 we aim now at designing an online algo-
rithm that obtains better competitive ratios against a air adversary. In view
o Theorem 5 such an algorithm will have to be non-diligent, i.e., incorporate
waiting times.
The problem with Algorithm mrin is that shortly a ter it starts to return
towards the origin rom the urthest previously unserved request, a new request
to the right o its server arrives. In this case the mrin-server has to return to
a position it just le t. Algorithm w presented below attempts success ully to
avoid this pit all.

Strategy ws( Wait Smartly ) The w -server moves right i there are yet
unserved requests to the right o his present position. Otherwise, it takes
the ollowing actions. Suppose it arrives at his present position, which is a
currently rightmost unserved request, s(t) at time t.
1. Compute the the optimal o ine solution value opt( t ) or all requests
released up to time t.
2. Determine a waiting time W := opt( t ) − s(t) − t, with =
(1 + 17) 4.
3. Wait at point s(t) until time t + W and then start to move back to the
origin 0.

We notice that when the server is moving back to the origin and no new
requests are released until time t + W + s(t), then the w -server reaches the
origin 0 at time t + W + s(t) = opt( t ) having served all requests released
so ar. I a new request is released at time t0 W + t + s(t) and the request is
to the right o s(t0 ), then the w -server starts to move to the right immediately
until it reaches the urthest unserved request.

Theorem 7. Algorithm w is -competitive with = (1 + 17) 4 1 28 for


the Olt p on IR+
0 against a fair adversary.

Proof. By the de nition o the waiting time it is su cient to prove that at any
point where a waiting time is computed this waiting time is non-negative. In
that case the server will always return at o be ore time opt( ). This is clearly
true i the sequence contains at most one request. We make the induction
hypothesis that it is also true or any sequence o at most m − 1 requests.
146 Michiel Blom et al.

Let = 1 m be any sequence o requests and let m =: (t x) be the


request released last. I t = 0, then there is nothing le t to show, so we will
assume or the remainder o the proo that t > 0.
We denote by s(t) and s (t) the positions o the w - and the air adver-
sary’s server at time t, respectively. We also let f = (tf f ) be the urthest
(i.e. most remote rom the origin) yet unserved request by w at time t exclud-
ing the request m . Finally, let = (t F ) be the urthest released request
in 1 m−1 . Obviously f F . Again, we distinguish three di erent cases
depending on the position o x relative to f and F .
Case 1: x f
Since the w -server has to travel to f anyway and by the induction hypothesis
there was a non-negative waiting time in f or s(t) (depending on whether s(t) >
f or s(t) f ) be ore request m was released, the waiting time in f or s(t) can
not decrease since the optimal o ine completion time can not decrease by an
additional request).
Case : f x < F
I s(t) x, then again by the induction hypothesis and the act that the route
length o w ’s server does not increase, the possible waiting time at s(t) is non-
negative.
Thus we can assume that s(t) < x. The w -server will now travel to point x,
arrive there at time t + d(s(t) x), and possibly wait there some time W be ore
returning to the origin, with
W = opt( ) − (t + d(s(t) x)) − x
Inserting the obvious lower bound opt( ) t + x yields
W ( − 1)opt( ) − d(s(t) x) (1)
0
To bound opt( ) in terms o d(s(t) x) consider the time t when the w -
server had served the request at F and started to move le t. Clearly t0 < t
since otherwise s(t) could not be smaller than x as assumed. Thus, the sub-
sequence t0 o does not contain (t x). By the induction hypothesis, w is
-competitive or the sequence t0 . At time t0 when he le t F he would have
arrived in the origin at time opt( t0 ), i.e.,
t0 + F = opt( t0 ) (2)
Notice that t t0 + d(F s(t)). Since opt( t0 ) 2F we obtain rom (2) that
t 2F − F + d(F s(t)) = (2 − 1)F + d(s(t) F ) (3)
Since by assumption we have s(t) < x < F we get that d(s(t) x) d(s(t) F )
and d(s(t) x) F , which inserted in (3) yields
t (2 − 1)d(s(t) x) + d(s(t) x) = 2 d(s(t) x) (4)
We combine this with the previously mentioned lower bound opt( ) t + x to
obtain:
opt( ) 2 d(s(t) x) + x (2 + 1)d(s(t) x) (5)
The Online-TSP against Fair Adversaries 147

Using inequality (5) in (1) gives


W ( − 1)(2 + 1)d(s(t) x) − d(s(t) x)
= (2 2
− − 2)d(s(t) x)
9+ 17 1+ 17
= − − 2 d(s(t) x)
4 4
=0
This completes the proo or the second case.
Case 3: f F x
Starting at time t the w -server moves to the right until he reaches x, and a ter
waiting there an amount W returns to 0, with
W = opt( ) − (t + d(s(t) x)) − x (6)
We will show that also in this case W 0. At time t the adversary’s server still
has to travel at least d(s (t) x) + x units. This results in
opt( ) t + d(s (t) x) + x
Since the o ine adversary is air, its position s (t) at time t can not be strictly
to the right o F .
opt( ) t + d(F x) + x (7)
Insertion into (6) yields
W ( − 1)opt( ) − d(s(t) F ) (8)
since F > s(t) by de nition o the algorithm.
The rest o the arguments are similar to those used in the previous case.
Again that w ’s server started to move to the le t rom F at some time t0 < t,
and we have
t0 + F = opt( t0 ) (9)
Since t t0 + d(s(t) F ) and opt( t0 ) 2F we obtain rom (9) that
t 2F − F + d(s(t) F ) = (2 − 1)F + d(s(t) F ) 2 d(s(t) F )
We combine this with (7) and the act that x d(s(t) F ) to achieve
opt( ) 2 d(s(t) F ) + d(F x) + x (2 + 1)d(s(t) F )
Using this inequality in (8) gives
W ( − 1)(2 + 1)d(s(t) F ) − d(s(t) F )
= (2 2
− − 2)d(s(t) F )
9+ 17 1+ 17
= − − 2 d(s(t) F )
4 4
=0
This completes the proo .
148 Michiel Blom et al.

6 Conclusions

We introduced an alternative more air per ormance measure or online algo-


rithms or the traveling salesman problem. The rst results are encouraging.
On the non-negative part o the real line the air model allows a strictly lower
competitive ratio than the conventional model with an omnipotent adversary.
Next to that we considered a restricted class o algorithms or the online
traveling salesman problems, suggestively called diligent algorithms. We showed
that in general diligent algorithms have strictly higher competitive ratios than
algorithms that sometimes leaves the server idle, to wait or possible additional
in ormation. In online routing companies, like courier services or transporta-
tion companies waiting instead o immediately starting as soon as requests are
presented is common practice. Our results support this strategy.
It is still open to nd a best possible non-diligent algorithm or the problem on
the real line against a air adversary. However, it is very likely that an algorithm
similar to the best possible algorithm presented in [7] against a non- air adversary
will appear to be best possible or this case.
We notice here that or general metric spaces the lower bound o 2 on the
competitive ratio o algorithms in [2] is established with a air adversary as
opponent. Moreover, a diligent algorithm is presented which has a competitive
ratio that meets the lower bound.
We hope to have encouraged research into ways to restrict the power o
adversaries in online competitive analysis.

Acknowledgement: Thanks to Maarten Lipmann or providing the lower


bound in Theorem 3.

References

1. N. Ascheuer, S. O. Krumke, and J. Rambau. Online dial-a-ride problems: Minimiz-


ing the completion time. In Proceedings of t e 17t International Symposium on
T eoretical Aspects of Computer Science, Lecture Notes in Computer Science, 2000.
To appear.
2. G. Ausiello, . Feuerstein, S. Leonardi, L. Stougie, and M. Talamo. Algorithms for
the on-line traveling salesman. Algorit mica, 1999. To appear.
3. A. Borodin and R. l-Yaniv. Online Computation and Competitive Analysis. Cam-
bridge University Press, 1998.
4. B. Chen, A. P. A. Vestjens, and G. J. Woeginger. On-line scheduling of two-machine
open shops where jobs arrive over time. Journal of Combinatorial Optimization,
1:355 365, 1997.
5. A. Fiat and G. J. Woeginger, editors. Online Algorit ms: T e State of t e Art,
volume 1442 of Lecture Notes in Computer Science. Springer, 1998.
6. J. A. Hoogeveen and A. P. A. Vestjens. Optimal on-line algorithms for single-
machine scheduling. In Proceedings of t e 5t Mat ematical Programming Soci-
ety Conference on Integer Programming and Combinatorial Optimization, Lecture
Notes in Computer Science, pages 404 414, 1996.
The Online-TSP against Fair Adversaries 149

7. M. Lipmann. The online traveling salesman problem on the line. Master’s thesis,
Department of Operations Research, University of Amsterdam, The Netherlands,
1999.
8. C. Philips, C. Stein, and J. Wein. Minimizing average completion time in the
presence of release dates. In Proceedings of t e 4t Works op on Algorit ms and
Data Structures, volume 955 of Lecture Notes in Computer Science, pages 86 97,
1995.
uickHeapsort,
an E cient Mix of Classical Sorting Algorithms

Domenico Cantone and Gianluca Cincotti

Dipartimento di Matematica e Informatica, Universita di Catania


Viale A. Doria 6, I 95125 Catania, Italy
cantone,cincotti @c .unict.it

Abstract. We present a practically e cient algorithm for the internal


sorting problem. Our algorithm works in-place and, on the average, has a
running-time of O(n log n) in the length n of the input. More speci cally,
the algorithm performs n log n + 3n comparisons and n log n + 2 65n
element moves on the average.
An experimental comparison of our proposed algorithm with the most
e cient variants of uicksort and Heapsort is carried out and its results
are discussed.
Keywords: In-place sorting, heapsort, quicksort, analysis of algorithms.

1. Introduction
The problem of sorting an initially unordered collection of keys is one of the most
classical and investigated problems in computer science. Many di erent sorting
algorithms exist in literature. Among the comparison based sorting methods,
uicksort [7, 17, 18] and Heapsort [4, 21] turn out, in most cases, to be the most
e cient general-purpose sorting algorithms.
A good measure of the running-time of a sorting algorithm is given by the
total number of key comparisons and the total number of element moves per-
formed by it. In our presentation, we mainly focus our attention on the number of
comparisons, since this often represents the dominant cost in any reasonable im-
plementation. Accordingly, to sort n elements the classical uicksort algorithm
performs 1 386n log n − 2 846n + 1 386 log n key comparisons on the average
and O(n2 ) key comparisons in the worst-case, whereas the classical Heapsort
algorithm, due to Floyd [4], performs 2n log n + (n) key comparisons in the
worst-case.
Several variants of Heapsort are reported in literature. One of the most
e cient is the Bottom-Up-Heapsort algorithm discussed by Wegener in [19],
which performs n log n + f (n)n key comparisons on the average, where f (n)
[0 34 0 39], and no more than 1 5n log n+O(n) key comparisons in the worst-
case. In [9, 10], Katajainen uses a median- nding procedure to reduce the num-
ber of comparisons required by Bottom-Up-Heapsort, completely eliminating
the sift-up phase. This idea has been further re ned by Rosaz in [16]. It is to
be noted, though, that the algorithms described in [9, 10, 16] are mostly of

G. Bongiovanni, G. Gambosi, R. Petreschi ( ds.): CIAC 2000, LNCS 1767, pp. 150 162, 2000.

c Springer-Verlag Berlin Heidelberg 2000
uickHeapsort, an E cient Mix of Classical Sorting Algorithms 151

theoretical interest only, due to the overhead introduced by the median- nding
procedure.
This paper tries to build a bridge between theory and practice. More speci -
cally, our goal is to produce a practical sorting algorithm which couples some of
the theoretical ideas introduced in the algorithms cited above with the e cient
strategy used by uicksort.
Compared to uicksort, our proposed algorithm, called uickHeapsort, works
in-place , i.e. no stack is needed for recursion. Moreover, its average number
of comparisons is shown to be less than n log n + 3n. Its behavior is also ana-
lyzed from an experimental point of view by comparing it to that of Heapsort,
Bottom-Up-Heapsort, and some variants of uicksort. The results show that
uickHeapsort has a good practical behavior especially when key comparison
operations are computationally expensive.
The paper is organized as follows. In Section 2 we introduce a variant of the
Heapsort algorithm which does not work in-place, just to present the main idea
upon which uickHeapsort is based. The uickHeapsort algorithm is fully de-
scribed and analyzed in Section 3. An experimental session with some empirical
results aiming at evaluating and comparing its e ciency in practice is discussed
in Section 4. Section 5 concludes the paper with some nal remarks.

2. A Not In-Place Variant of the Heapsort Algorithm

In this section we illustrate a variant of the Heapsort algorithm, External-


Heapsort, which uses an external array to store the output. For this reason,
External-Heapsort is mainly of theoretical interest and we present it just to intro-
duce the main idea upon which the uickHeapsort algorithm, to be described in
the next section, is based. External-Heapsort sorts n elements in (n log n) time
by performing at most n log n + 2n key comparisons, and at most n log n + 4n
element moves in the worst-case.
We begin by recalling some basic concepts about the classical binary-heap
data structure. A max-heap is a binary tree with the following properties:

1. it is heap-shaped : every level is complete, with the possible exception of the


last one; moreover the leaves in the last level occupy the leftmost positions;
2. it is max-ordered : the key value associated with each non-root node is not
larger than that of its parent.

A min-heap can be de ned by substituting the max-ordering property with the


dual min-ordering one. The root of a max-heap (resp. min-heap) always contains
the largest (resp. smallest) element of the heap. We refer to the number of
elements in a heap as its size; the height of a heap is the height of the associated
binary tree.
A heap data structure of size n can be implicitly stored in an array A[1 n]
with n elements without using any additional pointer as follows. The root of the
heap is the element A[1]. Left and right sons (if they exist) of the node stored
152 Domenico Cantone and Gianluca Cincotti

into A[i] are, respectively, A[2i] and A[2i + 1], and the parent of the node stored
into A[i] (with i > 1) is A[ 2i ].
In all Heapsort algorithms, the input array is sorted in ascending order, by
rst building a max-heap and then by performing n extractions of the root
element. After each extraction, the element in the last leaf is rstly moved into
the root and subsequently moved down along a suitable path until the max-
ordering property is restored.
Bottom-Up-Heapsort works much like the classical Heapsort algorithm. The
only di erence lies is the rearrangement strategy used after each max-extraction:
starting from the root and iteratively moving down to the child containing the
largest key, when a leaf is reached it climbs up until it nds a node x with a key
larger than the root key. Subsequently, all elements in the path from x to the
root are shifted one position up and the old root is moved into x.
The algorithm External-Heapsort, whose pseudo-code is shown in Fig. 1,
takes the elements of the input array A[1 n] and returns them in ascending
sorted order into the output array Ext[1 n].
External-Heapsort starts by constructing a heap and successively performs n
extractions of the largest element. Extracted elements are moved into the output
array in reverse order. After each extraction, the heap property is restored by
a procedure similar to the bottom-phase of Bottom-Up-Heapsort. Speci cally,
starting at the root of the heap, the son with the largest key is chosen and it
is moved one level up. The same step is iteratively repeated until a leaf, called
special leaf, is reached. At this point the value − is stored into the special leaf
key eld.1 The path from the root to the special leaf is called a special path.
Notice that no sift-up phase is executed and that the length of special paths
does not decrease during the execution of the algorithm.
External-Heapsort makes use of the procedure Build-Heap and the function
Special-Leaf. Build-Heap rearranges the input array A[1 n] into a classical max-
heap, e.g., by using the standard heap-construction algorithm by Floyd [4]. The
function Special-Leaf, whose pseudo-code is also shown in Fig. 1, assumes that
the value contained in the root of the heap has already been removed.
Correctness of the algorithm follows by observing that if a node x contains
the key − than the whole sub-tree rooted at x contains − ’s. It is easy to
check that the max-ordering property is ful lled at each extraction.
We proceed now to the analysis of the number of key comparisons and el-
ements moves performed by External-Heapsort both in the worst and in the
average case.2
Many variants for building heaps have been proposed in the literature [1, 6,
13, 19, 20], requiring quite involved implementations. Since our goal is to give
a practical and e cient general-purpose sorting algorithm, we simply use the

1
We assume that a key value − , smaller than all keys occurring in A[1 n], is avail-
able.
2
In the average-case analysis we make use of the assumption that all permutations
are equally likely.
uickHeapsort, an E cient Mix of Classical Sorting Algorithms 153

xternal-Heapsort

PROCEDURE External-Heapsort (Input Array A, Integer n; Output Array Ext)


Var Integer j l;
Begin
Build-Heap (A n);
For j := n downto 1 do
Ext[j] := A[1];
l := Special-Leaf (A n);
Key(A[l]) := − ;
End for;
End;

FUNCTION Special-Leaf (Input Array A, Integer n) : Integer


Var Integer i;
Begin
i := 2;
While i n do
If Key(A[i]) Key(A[i + 1]) then i := i + 1; End if;
A[ 2i ] := A[i];
i := 2i;
End while;
If i = n then
A[ 2i ] := A[n];
i := 2i;
End if;
Return 2i ;
End;

Fig. 1. Pseudo-code of External-Heapsort algorithm

classical heap-construction procedure due to Floyd [4, 11]. In such a case we


need the following partial results [2, 11, 15]:

Lemma 1. In the worst-case, the classical heap-construction algorithm builds a


heap with n elements by performing at most 2n key comparisons and 2n element
moves. 2

Lemma . On the average, constructing a full-heap, i.e. a heap of size n =


2l − 1 l > 0, with the classical algorithm requires 1 88n key comparisons and
1 53n element moves. 2
The number of key comparisons and element moves performed by the Special-
Leaf function obviously depends only on the size of the input array, so that worst-
and average-case values coincides for it.
154 Domenico Cantone and Gianluca Cincotti

Lemma 3. Given a heap of size n, the Special-Leaf function performs exactly


log n or log n − 1 key comparisons and the same number of element moves.
2
The preceding lemmas yield immediately the following result.
Theorem 1. External-Heapsort sorts n elements in (n log n) worst-case time
by performing fewer than n log n + 2n key comparisons and n log n + 4n ele-
[c] [m]
ment moves.3 Moreover, on the average, Hav (n) key comparisons and Hav (n)
element moves are performed, where
[c]
n log n + 0 88n Hav (n) n log n + 1 88n
[m]
n log n + 0 53n Hav (n) n log n + 3 53n

2
In the following, we will also use a min-heap variant of the External-Heapsort
algorithm. In particular, special paths in min-heaps are obtained by following
the children with smallest key and the value − is replaced by + . Obviously,
the same complexity analysis can be carried out for the min-heap variant of
External-Heapsort.

3. uickHeapsort

In this section, a practical and e cient in-place sorting algorithm, called uick-
Heapsort, is presented. It is obtained by a mix of two classical algorithms:
uicksort and Heapsort. More speci cally, uickHeapsort combines the uick-
sort partition step with two adapted min-heap and max-heap variants of the
External-Heapsort algorithm presented in the previous section, where in place
of the in nity keys , only occurrences of keys in the input array are used.
As we will see, uickHeapsort works in place and is completely iterative, so
that additional space is not required at all.
The computational complexity analysis of the proposed algorithm reveals
that the number of key comparisons performed is less than n log n + 3n on the
average, with n the size of the input, whereas the worst-case analysis remains
the same of classical uicksort. From an implementation point of view, uick-
Heapsort preserves uicksort e ciency, and it has in many cases better running
times than uicksort, as the experimental Section 4 illustrates.
Analogously to uicksort, the rst step of uickHeapsort consists in choosing
a pivot, which is used to partition the array. We refer to the sub-array with the
smallest size as heap area, whereas the largest size sub-array is referred to as
work area. Depending on which of the two sub-arrays is taken as heap-area, the
3
As will be clear in the next section, it is convenient to count the assignment of −
to a node as an element move.
uickHeapsort, an E cient Mix of Classical Sorting Algorithms 155

adapted max-heap or min-heap variant of External-Heapsort is applied and the


work area is used as an external array. At the end of this stage, the elements
moved in the work area are in correct sorted order and the remaining unsorted
part of the array can be processed iteratively in the same way.
A detailed description of the algorithm follows.

1. Let A[1 n] be the input array of n elements to be sorted in ascending order.


A pivot M , of index m, is chosen in the set A[1] A[2] A[n] . As in
uicksort, the choice of the pivot can be done in a deterministic way (with
or without sampling) or randomly. The computational complexity analysis
of the algorithm is influenced by the choice adopted.
2. The array A[1 n] is partitioned into two sub-arrays, A[1 Pivot − 1] and
A[Pivot + 1 n], such that A[Pivot ] = M , the keys in A[1 Pivot − 1] are
larger than or equal to M , and the keys in A[Pivot + 1 n] are smaller than
or equal to M .4 The sub-array with the smallest size is assumed to be the
heap area, whereas the other one is treated as the work area (if the two
sub-arrays have the same size, a choice can be made non-deterministically).
3. Depending on which sub-array is taken as heap area, the adapted max-heap
or min-heap variant of External-Heapsort is applied using the work area as
external array. Moreover occurrences of keys contained in the work area are
used in place of the in nity values .
More precisely, if A[1 Pivot − 1] is the heap area, then the max-heap version
of External-Heapsort is applied to it using the right-most region of the work
area as external array. In this case, at the end of the stage, the right-most
region of the work area will contain the elements formerly in A[1 Pivot − 1]
in ascending sorted order.
Similarly, if A[Pivot + 1 n] is the heap area, then the min-heap version of
External-Heapsort is applied to it using the left-most region of the work area
as external array. In this case, at the end of the stage, the left-most region
of the work area will contain the elements formerly in A[Pivot + 1 n] in
ascending sorted order.
4. The element A[Pivot ] is moved in the correct place and the remaining part
of A[1 n], i.e. the heap area together with the unused part of the work area,
is iteratively sorted.

Correctness of uickHeapsort follows from that of the max-heap and min-


heap variants of External-Heapsort, by observing that assigning to a special leaf
a key value taken in the work area is completely equivalent to assigning the key
value − , in the case of the max-heap variant, or the key value + , in the case
of the min-heap variant.
Complexity results in the average case are summarized in the theorem below.
For simplicity, the results have been obtained only in the case in which pivots
are chosen deterministically and without sampling, e.g. always the rst element
of the array is chosen.
4
Observe that uicksort partitions the array in the reverse way.
156 Domenico Cantone and Gianluca Cincotti

Lemma 4. Let H(n) = n log n + n and f1 (n) f2 (n) be functions of type n +


o(n) for all n N, with R. The solution to the following recurrence
equations, with initial conditions C(1) = 0 and C(2) = 1:
1
C(2n) = [(2n + 1) C(2n − 1) − C(n − 1) + H(n − 1) + f1 (n)] (1)
2n
1
C(2n + 1) = [2(n + 1) C(2n) − C(n) + H(n) + f2 (n)] (2)
2n + 1
for all n N, is:
C(n) = n log n + ( + − 2 8854)n + o(n)
Proof. Among the reasonable solutions, we posit the trial solution:
C(n) = an log n + bn + c log n with a b c R (3)
In several of the calculations we need to manipulate expressions of the form
log(m + t) with m N m > 1 and t = 1. The expansion of the natural
logarithm for small x R to second order (we do not need any further here),
2
t2
ln(1+x) = x− x2 , gives ln(m+t) = ln[m (1+ m t t
)] = ln m+ m − 2m 2 . Multiplying

by = 1 (ln 2) 1 4427, we get:


t
log(m + t) = log m + −
m 2m2
Using such expansion in the de nition of H(n) and in (3), we get:

H(n − 1) = n log n + n − log n − ( + ) + (4)


2n
a − 2c
C(n − 1) = an log n + bn + (c − a) log n − (a + b) + (5)
2n
C(2n − 1) = 2an log n + 2(a + b)n + (c − a) log n +
a − 2c
+ (c − a − b − a ) + (6)
4n
where the lowest order terms are not considered.
Let S = (c − 12 a + 1); substituting (4), (5) and (6) into equation (1) and
simplifying, we nd the following:
S
(1 − a)n log n + ( + − b − 2a )n − log n + (c − a − − S) + + o(n) = 0
2n
Examining the leading coe cients in such equality, we get the asymptotic con-
sistency requirements:
a=1 b= + −2
Such constraints are similarly obtained expanding equation (2).
Clearly, the missing requirement about c, simply means the posited solution
does not have the exact functional form. The two leading terms of the solution
are surprisingly precise, indeed by numerical computation, solution (3) tracks
the behavior of C(n) fairly well for an extended range of n.
uickHeapsort, an E cient Mix of Classical Sorting Algorithms 157

Theorem . uickHeapsort sorts n elements in-place in O(n log n) average-


case time. Moreover, on the average, it performs no more than n log n + 3n key
comparisons and n log n + 2 65n element moves.
Proof. First, we estimate the average number of key comparisons.
0
Let Hav (n) (resp. Hav (n)) be the average number of key comparisons to
sort n elements with the adapted max-heap (resp. min-heap) version of External-
0
Heapsort (see the beginning of Section 3). Plainly, we have Hav (i) = Hav (i) =
0 with i = 0 1.
Let C(n) be the average number of key comparisons to sort n elements with
uickHeapsort. We have C(1) = 0 and, for all n N:
2
n
1 4X
C(2n) = p(2n) + [Hav (j − 1) + C(2n − j)]+
2n j=1
3
X2n
+ 0
[Hav (2n − j) + C(j − 1)]5
j=n+1
2
Xn
1 4 [Hav (j − 1) + C(2n + 1 − j)]+
C(2n + 1) = p(2n + 1) +
2n + 1 j=1
3
X
2n+1
+ [Hav (n) + C(n)] + 0
[Hav (2n + 1 − j) + C(j − 1)]5
j=n+2

To compute the total average number of key comparisons we add the number of
comparisons p(m) = m + 1 needed to partition the array of size m to the average
number of comparisons needed to sort the two sub-arrays obtained. The index j
denotes all the possible choices (uniformly distributed) for the pivot. Obviously
0
Hav (n) = Hav (n), so by simple indices manipulation, we obtain the following
recurrence equations:
n
X
(2n) C(2n) = (2n) p(2n) + 2 [Hav (j − 1) + C(2n − j)] (7)
j=1
(2n + 1) C(2n + 1) = (2n + 1) p(2n + 1) + [Hav (n) + C(n)] +
Xn
+2 [Hav (j − 1) + C(2n + 1 − j)] (8)
j=1

They depend on the previous history but can be reduced to semi- rst order
recurrences. Let (80 ) be the equation obtained from (8) by substituting the index
n with n − 1. Subtracting equation (7) from (80 ) and from (8) we obtain the
following equations:

(2n) C(2n) = (2n + 1) C(2n − 1) − C(n − 1) + Hav (n − 1) + f1 (n)


(2n + 1) C(2n + 1) = (2n + 2) C(2n) − C(n) + Hav (n) + f2 (n)
158 Domenico Cantone and Gianluca Cincotti

where f1 (n) = (2n) p(2n)− (2n− 1) p(2n− 1) = 4n and f2 (n) = (2n+ 1) p(2n+
1) − (2n) p(2n) = 4n + 2. By using Lemma 4 and the upper bound in Theorem
1, it can be shown that the recurrence equation satis es C(n) n log n + 3n.
Analogous recurrence equations can be written to get the average number of
0
element moves. In such a case, the function Hav (n) (resp. Hav (n)) denotes the
average number of element moves to sort n elements with the adapted max-heap
(resp. min-heap) version of External-Heapsort; whereas p(m) is three times the
average number q(m) of exchanges used during the partitioning stage of a size
m array.
If the chosen pivot A[1] is the k-th smallest element in the array of size
m, q(m) is the number of keys among A[2] A[k] which are smaller than the
 k−1  m−1
pivot. There are exactly t such keys with probability pk t = m−k
(m)
t k−1−t k−1 .
Averaging on t and k, we get:

1 XXh i
m k−1
(m)
q(m) = t pk t =
m
k=1 t=0
m
" k−1   #
1 X m−k X m−k−1 k−1 1
= m−1
 = (m − 2)
m k−1 t=0
t−1 k−1−t 6
k=1

where the last equality is obtained by two applications of Vandermonde’s con-


volution.
From p(m) = 12 (m − 2), we get f1 (n) = 2n − 32 and f2 (n) = 2n − 12 . Thus,
Lemma 4 and the upper bound in Theorem 1 yield immediately that the average
number of element moves is no more than n log n + 2 65n.

4. Experimental Results

In this section we present some empirical results concerning the performance


of our proposed algorithm. Speci cally, we will compare the number of basic
operations and the timing results of both uickHeapsort ( H) and its variant
clever- uickHeapsort (c- H) (which implements the median of three elements
strategy) with those of the following comparison-based sorting algorithms:

the classical Heapsort algorithm (H), implemented with a trick which saves
some element moves;
the Bottom-Up-Heapsort algorithm (BU), implemented with bit shift oper-
ations, as suggested in [19];
the iterative version of uicksort (i- ), implemented as described in [3];
the uicksort algorithm ( ), implemented with bounded stack usage, as
suggested in [5];
the very e cient LEDA [12] version of clever- uicksort (c- ), where the
median of three elements is used as pivot.
uickHeapsort, an E cient Mix of Classical Sorting Algorithms 159

Our implementations have been developed in standard C (GNU C compiler


ver. 2.7) and all experiments have been carried out on a PC Pentium (133 MHz)
32MB RAM with the Linux 2.0.36 operating system.
The choice to use C, rather than C++ extended with the LEDA library is
motivated by precise technical reasons. In order to get running-times indepen-
dent of the implementation of the data type <array> provided by the LEDA
library, we preferred to implement all algorithms by simply using C arrays, and
accordingly by suitably rewriting the source code supplied by LEDA for uick-
sort.
Observe that all implementation tricks, as well as the various policies to
choose the pivot, used for uicksort can be applied to uickHeapsort too.
For each size n = 10i i = 1 6, a xed sample of 100 input arrays has been
given to each sorting algorithm; each array in such a sample is a randomly gen-
erated permutation of the keys 1 n. For each algorithm, the average number
of key comparisons executed, E[Cn ], is reported together with its relative stan-
dard deviation, [Cn ] n, normalized with respect to n. Analogously, E[An ] and
[An ] n refers to the number of element moves. Experimental results are shown
in Table 1.
They con rm pretty well the theoretical results hinted at in the previous
section. Notice that most of the numbers quoted in Table 1 about Heapsort
and uicksort are in perfect agreement with the detailed experimental study of
Moret and Shapiro [14].
We are mainly interested in the number of key comparisons since these rep-
resent the dominant cost, in terms of running-times, in any reasonable imple-
mentation. Observe that, in agreement with intuition, the improvement of c-
relative to (in terms of number of key comparisons) is more sensible than
that of c- H relative to H. With the exception of BU, when n is large enough
c- H executes the smallest number of key comparisons, on the average; more-
over, according to theoretical results, H always beat both and i- . It is also
interesting to note that H and BU are very stable, in the sense that they present
a small variance of the number of key comparisons.
In Table 2, we report the average running times required by each algorithm
to sort a xed sample of 10 randomly chosen arrays of size n = 10i , with i = 4 6.
Such results depend on the data type of the keys to be ordered (integer
or double) and the type of comparison operation used (either built-in or via a
user-de ned function cmp). In particular, six di erent cases are considered. In
the rst two cases, the comparison operation used is the built-in one. In the
third and fourth case, a simple comparison function cmp 1 is used. Finally, in the
last two cases, two computationally more expensive comparison functions cmp2
and cmp3 are used (but only with keys of type integer), to simulate situations in
which the cost of a comparison operation is much higher than that of an element
move.5 The function cmp2 (resp. cmp3 ) has been obtained from a simple function
5
For instance, such situations arise when the array to sort contains pointers to the
actual records. A move is then just a pointer assignment, but a comparison involves
at least one level of indirection, so that comparisons become the dominant factor.
160 Domenico Cantone and Gianluca Cincotti

n = 10 n = 102
E[Cn ] [Cn ] n E[An ] [An ] n E[Cn ] [Cn ] n E[An ] [An ] n
H 39 (.21) 73 (.17) 1030 (.08) 1078 (.07)
BU 35 (.21) 73 (.17) 709 (.08) 1078 (.07)
i- 63 (.96) 43 (.64) 990 (.61) 685 (.26)
41 (.63) 27 (.32) 868 (.64) 500 (.19)
c- 28 (.19) 37 (.53) 638 (.29) 617 (.20)
H 39 (.48) 54 (.39) 806 (.58) 847 (.23)
c- H 29 (.20) 60 (.51) 714 (.22) 870 (.22)

n = 103 n = 104
E[Cn ] [Cn ] n E[An ] [An ] n E[Cn ] [Cn ] n E[An ] [An ] n
H 16848 (.031) 14074 (.024) 235370 (.010) 174198 (.007)
BU 10422 (.021) 14074 (.024) 137724 (.006) 174198 (.007)
i- 14471 (.605) 9146 (.106) 194279 (.878) 114419 (.092)
13297 (.609) 7285 (.095) 179948 (.654) 95807 (.072)
c- 10299 (.355) 8543 (.102) 142443 (.401) 109141 (.065)
H 11881 (.630) 11838 (.202) 152789 (.664) 152155 (.201)
c- H 11135 (.333) 11959 (.182) 146643 (.323) 152909 (.121)

n = 105 n = 106
E[Cn ] [Cn ] n E[An ] [An ] n E[Cn ] [Cn ] n E[An ] [An ] n
H 3019638 (.0031) 2074976 (.0025) 36793760 (.0010) 24048296 (.0008)
BU 1710259 (.0024) 2074976 (.0025) 20401466 (.0007) 24048296 (.0008)
i- 2421867 (.7037) 1374534 (.0689) 28840152 (.6192) 16068733 (.0649)
2249273 (.6828) 1189502 (.0726) 27003832 (.5389) 14212076 (.0635)
c- 1816706 (.3367) 1328265 (.0546) 22113966 (.2962) 15649667 (.0497)
H 1869769 (.6497) 1854265 (.2003) 21891874 (.6473) 21901092 (.1853)
c- H 1799240 (.3254) 1866359 (.1675) 21355988 (.3282) 21951600 (.1678)

Table 1. Average number of key comparisons and element moves (sample size
= 100).

cmp 1 by adding one call (resp. two calls) to the function log of the C standard
mathematical library.
For each case considered, an approximation of the average times required by
a single key comparison tc and by a single element move tm is also reported.
Table 2 con rms the good behaviour of all uicksort variants i- , , c-
; moreover, we can see that BU su ers from higher overhead due to internal
bookkeeping. In most cases the running-times of H and c- H are between those
of the variants of Heapsort, H and BU, and those of the variants of uicksort,
, i- , and c- .
For each trial, the best running time (represented as boxed value in Table
2) is always achieved by a clever algorithm, namely either c- or c- H. In
particular, when each key comparison operation is computationally expensive,
c- H turns out to be the best algorithm, on the average, in terms of running
times.
uickHeapsort, an E cient Mix of Classical Sorting Algorithms 161

Integer Double cmp1 (Double)


tc = 0 05 sec , tm = 0 06 sec tc = 0 05 sec , tm = 0 07 sec tc = 0 3 sec , tm = 0 07 sec
n= 104 105 106 104 105 106 104 105 106
H 0.05 0.75 12.37 0.10 1.40 20.35 0.14 1.96 27.28
BU 0.09 1.25 18.80 0.13 1.83 25.88 0.15 2.00 27.62
i- 0.04 0.40 4.81 0.07 0.80 9.74 0.10 1.25 15.10
0.03 0.37 4.54 0.06 0.74 8.92 0.09 1.12 13.49
c- 0.03 0.33 4.08 0.05 0.69 8.34 0.08 0.99 11.99
H 0.05 0.64 10.43 0.08 1.10 16.85 0.10 1.42 19.96
c- H 0.04 0.65 10.48 0.08 1.11 16.92 0.10 1.38 20.05

cmp1 (Integer) cmp2 (Integer) cmp3 (Integer)


tc = 0 19 sec , tm = 0 06 sec tc = 2 9 sec , tm = 0 06 sec tc = 4 7 sec , tm = 0 06 sec
n= 104 105 106 104 105 106 104 105 106
H 0.10 1.35 19.16 0.54 7.04 88.29 1.00 12.92 159.66
BU 0.11 1.52 21.23 0.37 4.72 58.94 0.64 8.09 98.42
i- 0.07 0.85 10.12 0.42 5.44 64.96 0.79 10.24 120.69
0.07 0.80 9.60 0.40 4.93 59.53 0.74 9.20 110.29
c- 0.05 0.68 8.30 0.35 4.42 53.82 0.64 8.22 99.24
H 0.08 0.99 13.99 0.37 4.57 54.57 0.66 8.17 95.11
c- H 0.07 0.98 13.98 0.34 4.22 52.15 0.61 7.63 91.58

Table . Average running times in seconds (sample size = 10).

Cache performance has considerably less influence on the behaviour of sorting


algorithms than does paging performance (cf. [14], Chap. 8); for such reason, we
believe that we can ignore completely possible negative e ects due to caching.
Concerning virtual memory problems, i.e. demand paging, all uicksort al-
gorithms show good locality of reference, whereas Heapsort algorithms, and also
uickHeapsort algorithms, tend to use pages that contain the top of the heap
heavily, and to use in a random manner pages that contain the bottom of the
heap (cf. [14]). Such observation allows us to conclude that an execution of c-
cannot be more penalized than an execution of c- H by delays due to paging
problems. Hence, we can reasonably conclude that the success of c- H is not
due to paging performance.

5. Conclusions

We presented uickHeapsort, a new practical in-place sorting algorithm ob-


tained by merging some characteristics of Bottom-Up-Heapsort and uicksort.
Both theoretical analysis and experimental tests con rm the merits of uick-
Heapsort.
The experimental results obtained show that it is convenient to use clever-
uickHeapsort when the input size n is large enough and each key comparison
operation is computationally expensive.
162 Domenico Cantone and Gianluca Cincotti

Acknowledgments
The authors wish to thank Micha Hofri for helping them in solving the recurrence
equation reported in Lemma 4.

References
[1] S. Carlsson, A variant of heapsort with almost optimal number of comparisons,
Information Processing Letters, Vol. 24, pp. 247-250, 1987.
[2] E.E. Doberkat, An average analysis of Floyd’s algorithm to construct heaps, In-
formation and Control, Vol.61, pp.114-131, 1984.
[3] B. Durian, uicksort without a stack, Lect. Notes Comp. Sci. Vol. 233, pp.283-289,
Proc. of MFCS 1986.
[4] R.W. Floyd, Treesort 3 (alg. 245), Comm. of ACM, Vol. 7, p. 701,1964.
[5] G. Gonnet, R. Baeza-Yates, Handbook of algorithms and data structures, Addison-
Wesley, Reading, MA, 1991.
[6] G. Gonnet, J. Munro, Heaps on Heaps, Lect. Notes Comp. Sci. Vol. 140, Proc. of
ICALP’82, 1982.
[7] C.A.R. Hoare, Algorithm 63(partition) and algorithm 65( nd), Comm. of ACM,
Vol. 4(7), pp. 321-322, 1961.
[8] M. Hofri, Analysis of Algorithms: Computational Methods & Mathematical Tools,
Oxford University Press, New York, 1995.
[9] J. Katajainen, The Ultimate Heapsort, DIKU Report 96/42, Department of Com-
puter Science, Univ. of Copenhagen, 1996.
[10] J. Katajainen, T. Pasanen, J. Tehuola, Top-down not-up heapsort, Proc. of The
Algorithm Day in Copenhagen, Dept. of Comp. Sci., University of Copenhagen,
pp. 7-9, 1997.
[11] D.E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Search-
ing, Addison-Wesley, 1973.
[12] LEDA, Library of E cient Data structures and Algorithms,
http://www.mpi-sb.mpg.de/LEDA/leda.html.
[13] C.J. McDiarmid, B.A. Reed, Building Heaps Fast, Journal of algorithms, Vol. 10,
pp. 352-365, 1989.
[14] B.M.E. Moret, H.D. Shapiro, Algorithms from P to NP, Volume 1: Design and
E ciency, The Benjamin Cummings Publishing Company, 1990.
[15] T. Pasanen, Elementary average case analysis of Floyd’s algorithms to construct
heaps, TUCS Technical Report N. 64, 1996.
[16] L. Rosaz, Improving Katajainen’s Ultimate Heapsort, Technical Report N.1115,
Laboratoire de Recherche en Informatique, Universite de Paris Sud, Orsay, 1997.
[17] R. Sedgewick, uicksort, Garland Publishing, New York, 1980.
[18] R. Sedgewick, Implementing quicksort programs, Comm. of ACM 21(10) pp.847-
857, 1978.
[19] I. Wegener, Bottom-Up-Heapsort, a new variant of Heapsort beating, on an aver-
age, uicksort (if n is not very small), Theorical Comp. Sci., Vol. 118, pp. 81-98,
1993.
[20] I. Wegener, The worst case complexity of McDiarmid and Reed’s variant of
Bottom-Up heap sort is less than nlogn+1.1n, Information and Computation, Vol.
97, pp. 86-96, 1992.
[21] J.W. Williams, Heapsort (alg.232), Comm. of ACM, Vol. 7, pp. 347-348, 1964.
Triangulations without Minimum-Weight Drawing1

2
Cao An Wang2, Francis Y. Chin3, and Boting Yang

2
Department of Computer Science, Memorial University of Newfoundland, St. John's,
Newfoundland, Canada A1B 3X5
[email protected]
3
Department of Computer Science and Information Systems, The University of Hong Kong,
Pokfulam Road, Hong Kong
[email protected]

Abstract. It is known that some triangulation graphs admit straight-line


drawings realizing certain characteristics, e.g., greedy triangulation, minimum-
weight triangulation, Delaunay triangulation, etc.. Lenhart and Liotta [12] in
their pioneering paper on “drawable” minimum-weight triangulations raised an
open problem: ‘Does every triangulation graph whose skeleton is a forest admit
a minimum-weight drawing?’ In this paper, we answer this problem by
disproving it in the general case and even when the skeleton is restricted to a
tree or, in particular, a star.

Keywords: Graph drawing, Minimum-weight triangulation.

1 Introduction

Drawing of a graph on the plane is a pictorial representation commonly used in many


applications. A “good” graph drawing has some basic characteristics [4], e.g.,
planarity, straight-line edges, etc. One of the problems facing graph drawing is where
to place the graph vertices on the plane, so as to realize these characteristics. For
example, the problem of Euclidean minimum spanning tree (MST) realization is to
locate the tree vertices such that the minimum spanning tree of these vertices is
isomorphic to the given tree. However, not all trees have a MST realization, it can be
shown easily that there is no MST realization of any tree with a vertex of degree 7 or
more. In fact, the MST realization of a tree with maximum vertex degree 6 is NP-
complete [6].
Recently, researchers have paid a great deal of attention to the graph drawing of
certain triangulations. A planar graph G=(E,V) is a triangulation, if all faces of G are
bounded by exactly three edges, except for one which may bounded by more than

1
This work is supported by NSERC grant OPG0041629 and RGC grant HKU 541/96E.

G. Bongiovanni, G. Gambosi, R. Petreschi (Eds.): CIAC 2000, LNCS 1767, pp. 163-173, 2000.
 Springer-Verlag Berlin Heidelberg 2000
164 Cao An Wang, Francis Y. Chin, and Boting Yang

three edges, and this face is called the outerface. A minimum-weight triangulation
realization of G=(E,V) is to place V in the plane so that the minimum weight
triangulation of V, (MWT(V)), is isomorphic to G. An excellent survey on drawability
and realization for general graphs can be found in [2] and a summary of results related
to our work can be found in the following table.

GRAPH REALIZATION RESULT


1 Planar Graph Straight-line drawing Always possible [8]
2 Tree Minimum spanning tree Maximum vertex degree
≤ 5 polynomial time
= 6 NP-complete [6]
> 6 non-drawable [15]
3 Triangulation Delaunay triangulation Drawable and non-drawable
conditions [5]
4 Maximal Minimum-weight Linear time algorithm and
Outerplanar Graph triangulation non-drawable condition [11]
5 Triangulation Minimum-weight Non-drawable condition [12]
triangulation
6 Maximal Maximum-weight Non-drawable condition [17]
Outerplaner Graph triangulation
7 Caterpillar Graph Inner edges of Maximum- Linear time [17]
weight triangulation of a
convex point set

Table 1

(1) Every planar graph has a straight-line drawing realization [8].


(2) Monma and Suri [15] showed that a tree with maximum vertex degree of more
than six does not admit a straight-line drawing of minimum spanning tree. Eades
and Whitesides [6] proved that the realization of Euclidean minimum spanning
trees of maximum vertex degree six is NP-hard.
(3) Dillencourt [5] presented a necessary condition for a triangulation admitting a
straight-line drawing of Delaunay triangulation and also a condition for non-
drawability.
(4) Lenhart and Liotta [11] studied the minimum-weight drawing for a maximal
outerplanar graph, and discovered a characteristic of the minimum-weight
triangulation of a regular polygon using the combinatorial properties of its dual
trees. With this characteristic, they devised a linear-time algorithm for the
drawing.
(5) Lenhart and Liotta [12] further demonstrated some examples of ‘non-drawable’
graphs for minimum-weight realizations, and also proved that if any graph
contains such non-drawable subgraph, then it is not minimum-weight drawable.
(6) Wang, et. al. studied the maximum-weight triangulation and graph drawing, a
simple condition for non-drawability of a maximal outerplanar graph is given in
[17].
Triangulations without Minimum-Weight Drawing 165

(7) A caterpillar is a tree such that all internal nodes connect to at most 2 non-leaf
nodes. Wang, et. al. [17] showed that caterpillars are always linear-time realizable
by the inner edges of maximum-weight triangulation of a convex point set.

In this paper, we investigate the open problem raised by Lenhart and Liotta [12] to
determine whether or not every triangulation whose ‘skeleton’ is a forest admits a
minimum-weight drawing. The skeleton of a triangulation graph is the remaining
graph after removing all the boundary vertices and their incident edges on the
outerface. Intuitively, the answer to this open problem seems to be affirmative by
adapting the same idea in the drawing of wheel graphs or k-spined graphs [12]. That
is, one can stretch the vertices of a tree in the forest-skeleton arbitrarily far apart from
each other as well as from other trees. In this manner, all the vertices in the forest-
skeleton would be “isolated” from each other. The edges of the trees would be
minimum-weight and the edges connecting the removed vertices would also be
minimum-weight in hoping that the “long distance” will make such a localization.
However, this intuition turns out to be false as the removed part of the graph plays an
indispensable role in the MWT. As matter of a fact, there exist some minimum-weight
non-drawable triangulations whose skeleton is a forest or a tree. It is worth noting
that the proof of some graphs being ‘non-drawable’ is similar to the proof of a lower
bound of a problem, which requires some non-trivial observation. In Section 3, we
derive a combinatorial non-drawability sufficient condition for any minimum weight
triangulation. Then we apply this condition to show that some triangulations with
forest skeletons are not minimum-weight drawable. In Section 4, we further disprove
the conjecture by showing the existence of a tree-skeleton triangulation, in particular,
a star-skeleton triangulation which is not minimum-weight drawable. In Section 5, we
conclude our work.

2 Preliminaries

Definition 1: Let S be a set of points in the plane. A triangulation of S, denoted by


T(S), is a maximal set of non-crossing line segments with their endpoints in S.
The weight of a triangulation T(S) is given by ω(T(S)) = ∑ d (s s i j ) , where d(sisj)
si s j ∈T ( S )

is the Euclidean distance between si and sj of S. A minimum-weight triangulation of


S, denoted by MWT(S), is defined as, for all possible T(S), ω(MWT(S)) = min
{ω(T(S))}. !

Property 1: (Implication property)

A triangulation T(S) is called k-gon local minimal or simply k-minimal, denoted by


Tk(S), if any k-gon extracted from T(S) is a minimum-weight triangulation for this k-
gon. Let ‘a ! b’ denote ‘a implies b’ and a contains b. Then following implication
property holds:

MWT(S) ! Tn-1(S) ! " !T4(S) ! T(S). !


166 Cao An Wang, Francis Y. Chin, and Boting Yang

Figure 1(a) illustrates an example which is 4-minimal but not 5-minimal. Note in the
figure that every quadrilateral has a minimum-weight triangulation but not the
pentagon abdef. So Figure 1(a) is a T4 but not T5 nor MWT. On the other hand, Figure
1(b) gives the MWT of the same vertex set.

(a): T4 (b): MWT

Figure 1: 4-minimal but not minimum

Definition 2: Let e be an internal edge of any triangulation. Then, e is a diagonal of


a quadrilateral inside the triangulation, say abcd with e = (a, c). Angles ∠abc and
∠cda are called facing angles w.r.t. e. Note that each internal edge of a triangulation
has exactly two facing angles (Figure 2). !

Property 2: Let ∆abc be a triangle in the plane and d be an internal vertex in ∆abc.
Then, at most one of ∠adb, ∠bdc, and ∠cda can be acute. !

Lemma 1: Let abcd denote a quadrilateral with diagonal (a, c) and with two obtuse
facing angles, ∠abc and ∠cda. If such a quadrilateral always exists in any drawing
of a given triangulation, then this triangulation is minimum-weight non-drawable, in
particular, 4-minimal non-drawable.

c
Figure 2: For the proof of Lemma 1
Triangulations without Minimum-Weight Drawing 167

Proof: Since both facing angles ∠abc and ∠cda of edge (a, c) are greater than 90°,
the quadrilateral abcd must be convex (refer to Figure 2). Then, edge (b, d) lies inside
abcd and (b, d) < (a, c), quadrilateral abcd is not 4-minimal. Since such a
quadrilateral always exists in any drawing of the triangulation by the premise, the
triangulation is not a 4- minimal, nor minimum-weight drawable. !

3 Forest-Skeleton Triangulations

In this section, we shall give a combinatorial sufficient condition for a triangulation to


be minimum-weight non-drawable. With the condition, we can prove that there exists
a forest-skeleton which is minimum-weight non-drawable, thus, disprove the
conjecture by Lenhart and Liotta [12].

3.1 Non-drawable Condition for Minimum-Weight Triangulations

In the following, we shall provide a combinatorial sufficient condition for a


triangulation to be 4-minimal non-drawable.

N4o-Condition: Let G be a triangulation such that

(1) G contains a simple circuit C with non-empty set V of internal vertices.

(2) Inside C, let V’ denote the subset of V such that each element in V’ is of
degree three; each element in V” = V − V’ is of degree more than three; and
let f be the number of faces after the removal of vertices in V’ and their
incident edges. Then, G satisfies the following conditions:

(i) V”> 1, and

(ii) f < V’ + (V”− 1)/2. !

It is easy to see that no two vertices in V’ are adjacent to each other and thus V’≤ f.
Figure 3 gives a subgraph which satisfies the N4o-Condition, V’=11, V”=4, and
f = 12 < V’ + (V”− 1)/2 = 12.5.

Lemma 2: Let G be a triangulation. If G satisfies the N4o-Condition, then G is 4-


minimal non-drawable.

Proof: Let Gc denote the portion of G enclosed by C. Let G’c denote the remaining
graph of Gc after the removal of V’ (the vertices of degree three) as well as their
incident edges. In G’c, let f, e, and n denote the number of faces, the number of
edges, and the number of vertices, respectively; let f’ denote the number of faces
originally not containing any vertex of V’; let e’ denote the number of edges not lying
168 Cao An Wang, Francis Y. Chin, and Boting Yang

on C; and let nc denote the number of vertices on C. Since all faces in Gc are triangles,
2(e − nc ) + nc
we have f = . Together with Euler formula on Gc, f − e + n = 1, we
3
have that
e = 3n − 3 − nc , f = 2n − 2 − nc ………. (1).

By part (ii) of N4o-Condition, f < V’ + (V”− 1)/2. As f = f’ + V’, f’ + V’<


V’+ (V”− 1)/2. Then, f’ < (V”− 1)/2, or 2f’ < V” − 1. Note that V” = n
− nc, we have
2f’ < n − nc −1 ……. (2).

As e − nc = e’ , we have by (1) and (2) that


f + 2 f’ < e’ ………. (3).

Note that an edge of G’c not on C can be regarded as the diagonal of a quadrilateral in
Gc. As every diagonal has two facing angles and G’c contains e’ internal edges, there
are exactly 2e’ facing angles. Moreover, G’c contains f faces, f − f’ of them have a
white node inside and thus each of these faces contributes at most one acute facing
angle (Property 2). On the other hand, each of the f’ faces contributes at most three
acute facing angles. Thus, the total number of acute facing angles for these e’ interior
edges in Gc is at most f − f’ + 3f’ (= f’ + 2f’). By (3), the number of internal edges is
greater than the number of acute facing angles in Gc. Thus, at least one of the e’
internal edges (diagonals) is not associated with an acute facing angle and must have
two obtuse facing angles. By Lemma 1, Gc cannot admit a 4-minimal drawing. !

By Property 1 and Lemma 2, we have

Theorem 1: Let G be a triangulation. If G satisfies the N4o-Condition, then G is


minimum-weight non-drawable. !

Note that Theorem 1 is applicable to any triangulation GT by treating the hull of GT as


the circuit C stated in the N4o-Condition. Refer to Figure 3.

3.2 A Minimum-Weight Non-drawable Example for a Forest-Skeleton


Triangulation

We shall construct a triangulation whose skeleton is a forest and which satisfies the
N4o-Condition. Then, the non-drawable claim follows from Theorem 1, which
answers the open problem that not all triangulations with a forest-skeleton are
minimum-weight drawable.
Triangulations without Minimum-Weight Drawing 169

Figure 3: An example of a 4-optimal non-drawable triangulation. The darken vertices are of


degree more than three and the white vertices are of degree three. The sizes of V’ and V” are
11 and 4 respectively, n = 10, e = 21, f = 12, nc = 6, f’ = 1, e’ = 15.

Theorem 2: There exists a triangulation with a forest-skeleton which is not


minimum-weight drawable.

Proof: The triangulation shown in Figure 4 has a forest-skeleton (the darken edges).
It contains a simple cycle CG = (v1, v2, v3, v4). Inside CG, V”= 2, i.e., {v11, v12};
V’= 6, i.e., {v5, v6, v7, v8, v9, v10}; f = 6 (the number of faces after the removal of
V’). Thus, part (1) of N4o-Condition: V” > 1 is satisfied and part (2) of N4o-
Condition: f < V’+ (V”− 1)/2 is also satisfied. Then, G is not minimum-weight
drawable by Theorem 1. !

v v
v v v v v v
v v

Figure 4: A non-drawable forest-skeleton triangulation


170 Cao An Wang, Francis Y. Chin, and Boting Yang

4 Tree-Skeleton Triangulation

In this section, we shall show that there exist tree-skeleton triangulations which are
minimum-weight non-drawable further disproving the claim in [12]. Let us consider a
triangulation in the plane with two adjacent triangles, ∆abc and ∆bcd, each of which
has an internal vertex with degree 3, as shown in Figure 5. As agreed previously,
each of the internal degree-3 vertices can contribute at most one acute angle. In the
following, we shall prove that if the only acute angle in ∆abc, ∠aeb, and that in ∆cbd,
∠bfd, are facing edge (a,b) and edge (b,d) respectively, then by Lemma 1, the
triangulation with these two adjacent triangles is not minimum-weight drawable. This
is because e and f will be on the quadrilateral bfce and edge (e,f) crosses edge (b,c)
and is shorter than edge (b,c).

Figure 5: An non-drawable case

Let us consider a convex polygon P with n ≥ 13 vertices. We shall show that P has at
least 3 consecutive inner angles with degree > 90°.

Lemma 3: Any convex polygon cannot have more than 4 acute inner angles.

Proof: If the convex n-gon has 5 or more acute angles, then the sum of angles is no
more than 180° (n-5) + 5 × 90° = 180° n – 900° + 450° = 180° n – 450° = 180° (n-2)
–90° < 180°(n-2). This contradicts the fact that the sum of inner angles of a convex
n-gon must be 180°(n-2). !

Lemma 4: If P is a convex polygon with n ≥ 13 vertices, then P has at least 3


consecutive obtuse inner angles.

Proof: The proof is by contradiction. Assume P has at most 2 consecutive obtuse


inner angles, then there must exist at least 5 acute inner angles to separate the other
obtuse inner angles in P for n ≥ 13. This contradicts Lemma 3. !
Triangulations without Minimum-Weight Drawing 171

b

Figure 6: A non-drawable tree-/star-skeleton triangulation

Theorem 3: There exists a tree-skeleton (star-skeleton) triangulation which does not


admit a minimum-weight drawing.

Proof: Refer to Figure 6. By Lemma 4, P contains at least three consecutive obtuse


inner angles. Without loss of generality, let ∠a’, ∠b’, and ∠a’bb’ be three
consecutive obtuse inner angles of P. In order for P to be drawable, the region
caa’bb’d must also be drawable. It follows that edges (a,b), (b,d), and (b,c) must be
drawable. Note that ∠aeb and ∠bfd must be acute since ∠a’ and ∠b’ are already
obtuse. Then, the angle ∠bec in triangle ∆abc and the angle ∠bfc in triangle ∆bcd
must be obtuse by Property 2. Then, edge (b,c) cannot be an edge in any minimum-
weight drawing. As the graph is a star-skeleton triangulation and star is a subclass of
tree, the theorem also applies to the tree-skeleton triangulations. !

5 Conclusion

In this paper, we investigated the minimum-weight drawability of triangulations. We


show that triangulations with forest-skeletons or even with tree-/star-skeleton are not
minimum-weight drawable, disproving the conjecture by [12]. Furthermore, we
found that in addition to wheel graph and k-spined graph, a subclass of star-skeleton
graph, regular star-graph is minimum-weight drawable. It will be sketched out in the
following appendix.

Appendix: Drawable Triangulation with Minimum-Weight

In this section, we shall show that some special triangulation, namely, regular star
skeleton graph, admits a minimum-weight drawing.
172 Cao An Wang, Francis Y. Chin, and Boting Yang

Definition 3: There exist only three types of edges in a triangulation whose skeleton
is a forest , namely, (1) skin-edge (simply, s-edge): both vertices of the edge are on
the hull, (2) tree-edge (simply, t-edge): both vertices of the edge are not on the hull,
and (3) bridge-edge (simply, b-edge): one vertex of the edge is on the hull and the
other is not. A base skin-edge is the most `inner' layer of s-edges. A graph G is a
regular star skeleton graph, denoted by RSSG, if G has a star skeleton, G contains
only base skin, and all the b-edges of a branch on the same side are connected to the
vertex of its neighboring b-edge.

By the definition, an RSSG can always be decomposed into a wheel and several k-
spined triangles, where k can be different for different triangles. Each k-spined
triangle consists of two fans, and the apex of a fan is a vertex of b-edge in the wheel
and the boundary of the fan is a branch of the star-skeleton. We shall give a high-level
description of the algorithm.

Algorithm: The algorithm first identifies if G is an RSSG. If it is, then label the b-
edges and t-edges for the wheel and the fans of this RSSG. For a given resolution of
the drawing, we can determine the size of a fan that is a function of the number of
radial edges, k, and the distance δ between the apex and its closest boundary vertex.
Now, the algorithm will draw a wheel. During the arrangement of its radial edges, we
take the size of the attached fans into a count. There are two types of drawings for
fans: FAN1 and FAN2.
FAN1: Let v be the apex of a fan and (v1, v2, ..., vk) be the sequence of vertices on its
boundary (the interior vertices of the (k-1)-spined triangle. The drawing is similar to
that in [12] (refer to Lemma 7 of [12]).
FAN2: The apex v' lies on the opposite side of the apex v along (v1, v2, ..., vk). Since all
the edges in FAN2 are stable edges, they belong to MWT(S) [15].

Theorem 4. Graph RSSG is minimum-weight drawable.

Proof Sketch: We shall prove that each edge of the drawing belongs to the MWT of
this point set. There are three types of edges in the drawings: the base s-edges, the b-
edges, and the t-edges. The s-edges obviously belong to the MWT of this point set
since all these edges are on the convex hull of this point set. Let us consider b-edges
and t-edges. By our algorithm and by Lemma 7 in [12], all individual fans belong to
their own MWTs respectively. We can show that they also belong to the final MWT by
proving all the b-edges separating them belong to MWT(S) (using the local replacing
argument [16]).

References

[1] Bose P., Di Battista G., Lenhart W., and Liotta G., Constraints and representable trees,
Proceedings of GD94, LNCS 894, pp. 340-351.
[2] Di Battista G., Lenhart W., and Liotta G., Proximity drawability: a survey, Proceedings of
GD94, LNCS 894, pp. 328-339.
[3] Di Battista G., Eades P., Tamassia R., and Tollis I.G., Algorithms for automatic graph
drawing: an annotated bibliography, Computational Geometry: Theory and Applications, 4,
1994, pp. 235-282.
Triangulations without Minimum-Weight Drawing 173

[4] Di Battista G., Eades P., Tamassia R., and Tollis I.G., Graph Drawing: Algorithms for the
Visualization of Graphs, Prentice-Hall, 1999 (ISBN: 0-13-301615-3).
[5] Dillencourt M., Toughness and Delaunay triangulations, Discrete and Computational
Geometry, 5, 1990, pp. 575-601.
[6] Eades P. and Whitesides S., The realization problem for Euclidean minimum spanning tree
th
is NP-hard, Proceedings of 10 ACM Symposium on Computational Geometry, Stony Brook,
NY, 1994, pp. 49-56.
[7] ElGindy H., Liotta G., Lubiw A., Mejier H., and Whitesides S., Recognizing rectangle of
influence drawable graphs, Proceedings of GD94, LNCS 894, pp. 252-263.
[8] Fary I., On straight lines representations of planar graphs, Acta Sci. Math., Szeged, 11,
1948, pp. 229-233.
[9]Gimikowski R, Properties of some Euclidean proximity graphs, Pattern Recognition letters,
13, 1992, pp. 417-423.
[10] Keil M., Computing a subgraph of the minimum-weight triangulation, Computational
Geometry: Theory and Applications, 4, 1994, pp. 13-26.
[11] Lenhart W. and Liotta G., Drawing outerplanar minimum-weight triangulations,
Information Processing Letters, 57, 1996, pp. 253-260.
[12] Lenhart W. and Liotta G., Drawable and forbidden minimum-weight triangulations,
Proceedings of GD97, LNCS 894, pp. 1-12.
[13] Matula D. and Sokal R., Properties of Gabriel graphs relevant to geographic variation
research and the clustering of points in the plane, Geographical Analysis, 12(3), 1980, pp.
205-222.
[14] Monma C. and Suri S., Transitions in geometric minimum spanning trees, Proceedings of
th
7 ACM Symposium on Computational Geometry, North Conway, NH, 1991, pp. 239-249.
[15] Preparata F. and Shamos M., Computational Geometry, 1985, Springer-Verlag.
[16] Wang C.A., Chin F., and Xu Y., A new subgraph of minimum-weight triangulations,
Journal of Combinatorial Optimization, 1997, pp. 115-127.
[17] Wang C.A., Chin F., and Yang B.T., Maximum Weight Triangulation and Graph Drawing,
Information Processing Letters, 70(1), 1999, pp. 17-22.
Faster xact Solutions for Max Sat

Jens Gramm and Rolf Niedermeier

Wilhelm-Schickard-Institut für Informatik, Universität Tübingen,


Sand 13, D-72076 Tübingen, Fed. Rep. of Germany
ramm,[email protected] en.de

Ab tract. Given a boolean 2CNF formula F , the Max Sat problem


is that of nding the maximum number of clauses satis able simultane-
ously. In the corresponding decision version, we are given an additional
parameter k and the question is whether we can simultaneously satisfy
at least k clauses. This problem is NP -complete. We improve on known
upper bounds on the worst case running time of Max Sat, implying
also new upper bounds for Maximum Cut. In particular, we give exper-
imental results, indicating the practical relevance of our algorithms.
Keyword : NP -complete problems, exact algorithms, parameterized
complexity, Max Sat, Maximum Cut.

1 Introduction
The (unweighted) Maximum Satis ability problem (MaxSat) is to assign val-
ues to boolean variables in order to maximize the number of satis ed clauses
in a CNF formula. Restricting the clause size to two, we obtain Max Sat.
When turned into yes no problems by adding a goal k representing the num-
ber of clauses to be satis ed, MaxSat and Max Sat are NP -complete [7].
E cient algorithms for MaxSat, as well as Max Sat, have received consid-
erable interest over the years [2]. Furthermore, there are several papers which
deal with Max Sat in detail, e.g., [3,4,6]. These papers present approximation
and heuristic algorithms for Max Sat. In this paper, by way of contrast, we
introduce algorithms that give optimal solutions within provable bounds on the
running time. The arising solutions for Max Sat are both fast and exact and
show themselves to be interesting not only from a theoretical point of view, but
also from a practical point of view due to the promising experimental results we
have found.
The following complexity bounds are known for Max Sat: There is a de-
terministic, polynomial time approximation algorithm with approximation fac-
tor 0 931 [6]. On the other hand, unless P = NP, the approximation factor
cannot be better than 0 955 [9]. With regard to exact algorithms, research so
far has concentrated on the general MaxSat problem [1,12]. As a rule, the al-
gorithms which are presented there (as well as our own) are based on elaborate
case distinctions. Taking the case distinctions in [12] further, Bansal and Ra-
man [1] have recently presented the following results: Let be the length of
the given input formula and K be the number of clauses in . Then MaxSat

G. Bongiovanni, G. Gambosi, R. Petreschi ( ds.): CIAC 2000, LNCS 1767, pp. 174 186, 2000.

c Springer-Verlag Berlin Heidelberg 2000
Faster xact Solutions for Max Sat 175

can be solved in times O(1 3413K ) and O(1 1058jF j). The latter result implies
that, using = 2K, Max Sat can be solved in time O(1 2227K ), this being
the best known result for Max Sat so far. Moreover, Bansal and Raman have
shown that, given the number k of clauses which are to be satis ed in advance,
MaxSat can be solved in O(1 3803k k 2 + ) time.
Our main results are as follows: Max Sat can be solved in times
O(1 0970jF j), O(1 2035K ), and O(1 2886k k + ), respectively. In addition, we
show that if each variable in the formula appears at most three times, then
Max Sat, still NP -complete, can be solved in time O(1 2107k ). In reference
to modi cations of our algorithms done in [8], we nd that Maximum Cut in
a graph with n vertices and m edges can be solved in time O(1 3197m). If re-
stricted to graphs with vertex degree at most three, it can be solved in time
O(1 5160n ), and, if restricted to graphs with vertex degree at most four, in time
O(1 7417n ). In addition, the same algorithm computes a Maximum Cut of size
at least k in time O(m + n + 1 7445k k), improving on the previous time bounds
of O(m + n + 4k k) [11] and O(m + n + 2 6196k k) [12].
Aside from the theoretical improvements gained by the new algorithms we
have developed, an important contribution of our work is also to show the prac-
tical signi cance of the results obtained. Although our algorithms are based on
elaborate case distinctions which show themselves to be complicated upon anal-
ysis, they are relatively easy to apply when dealing with the number of cases
the actual algorithm has to distinguish. Unlike what is known for the general
MaxSat problem [1,12], we thereby have for Max Sat a comparatively small
number of easy to check cases, making our implementation practical. Moreover,
analyzing the frequency of how often di erent rules are applied, our experiments
also indicate which rules might be the most valuable ones. Our algorithms can
compete well with heuristic ones, such as the one described by Borchers and
Furman [3].
Independent from our work, Hirsch [10] has simultaneously developed up-
per bounds for the Max Sat problem. He presents an algorithm with bounds
of O(1 0905jF j) with respect to the formula length and O(1 1893K ) with
respect to the number of clauses K, which are better than the bounds shown
for our algorithms. Moreover, he points out that his algorithm also works for
weighted versions of Max Sat. On the other hand, however, he does not give
any bound with respect to k, the number of satis able clauses. His analysis is
simpler than ours, as he makes use of a result by Yannakakis [15]. The algo-
rithm itself, however, seems much more complex and is not yet accompanied by
an implementation. The reduction step of Hirsch’s algorithm has a polynomial
complexity, as a maximum flow computation has to be done, and it would be
interesting to see whether this will turn out to be e cient in practice.
Due to the lack of space, we omitted several details and refer to [8] for more
material.
176 Jens Gramm and Rolf Niedermeier

2 Preliminaries and Transformation Rules


We use primarily the same notation as in [1,12]. We study boolean formulas in
2CNF, represented as multisets of sets (clauses). A subformula, i.e., a subset
of clauses, is denoted closed if it is a minimal subset of clauses allowing no
variable within the subset to occur outside of this subset as well. A clause that
contains the same variable positively and negatively, e.g., x x , is satis ed by
every assignment. We will not allow for these clauses here, and assume that
such clauses are always replaced by a special clause , denoting a clause that
is always satis ed. We call a clause containing r literals simply an r-clause.
Its length is therefore r. A formula in CNF is one consisting of 1- and 2-
clauses. We assume that 0-clauses do not appear in our formula, since they are
clearly unsatis able. The length of a clause is its cardinality, and the length of
a formula is the sum of the lengths of its clauses. Let l be a literal occurring
in a formula . We call it an (i j)-literal if the variable corresponding to l
occurs exactly i times as l and exactly j times as l. Analogously, we obtain
(i+ j)-, (i j + )-, and (i+ j + )-literals by replacing exactly with at least at the
appropriate positions, and get (i− j)-, (i j − )- and (i− j − )-literals by replacing
exactly with at most . Following Bansal and Raman [1], we call an (i j)-
literal an (i j)[p1 pi ][n1 nj ]-literal if the clauses containing l are of
length p1 pi and those containing l are of length n1 nj . For a
literal l and a formula , let [l] be the formula originating from by replacing
all clauses containing l with and removing l from all clauses where it occurs.
We say x occurs in a clause C if x C or x C. We write #x for the number
of occurrences of x in the formula. Should variable x and variable y occur in the
same clause, we call this instance a common occurrence and write #xy for the
number of their common occurrences in the formula. In the same way, we write
#xy for the number of common occurrences of literals x and y.
As with earlier exact algorithms for MaxSat [1,12], our algorithms are re-
cursive. They go through a number of transformations and branching rules,
where the given formula is simpli ed by assigning boolean values to some care-
fully selected variables. The fundamental di erence between transformation and
branching rules is that when the former has been given a formula, it is replaced
by one simpler formula, whereas in the latter a formula is replaced by at least
two simpler formulas. The asymptotic complexity of the algorithm is governed
by the branching rules. We will use recurrences to describe the size of the cor-
responding branching trees created by our algorithms. Therefore, we will apply
one of the transformation rules whenever possible, as they avoid a branching of
recursion.
In the rest of this section, we turn our attention to the transformation rules.
Our work here follows that of [12] closely, as the rst 4 rules have also been
used there. Their correctness is easy to check.

1. Pure Literal Rule: Replace with [l] if l is a (1+ 0)-literal.


2. Dominating 1-Clause Rule: If l occurs in i clauses and l occurs in at least i
1-clauses of , then replace with [l].
Faster xact Solutions for Max Sat 177

3. Complementary 1-Clause Rule: If = x x G, then replace with G,


increasing the number of satis ed clauses by one.
4. Resolution Rule: If = x l1 x l2 G and G does not contain x, then
replace with l1 l2 G, increasing the number of satis ed clauses by
one.
5. Almost Common Clauses Rule: If = x y x y G, then replace
with x G, increasing the number of satis ed clauses by one.
6. Three Occurrence Rules: We consider two subcases:
(a) If x is a (2 1)-literal, = x y x y x y G, and G does not
contain x, then replace with G, increasing the number of satis ed
clauses by three.
(b) If x is (2 1)-literal, and either = x y x y x l1 G or =
x y x l1 x y G, then replace with y l1 G or y l1
G, respectively, increasing the number of satis ed clauses by two.

The Almost Common Clauses Rule was introduced by Bansal and Raman [1].
In the rest of this paper, we will call a formula reduced if none of the above trans-
formation rules can be applied to it. The correctness of many of the branching
rules that we will present relies heavily on the fact that we are dealing with
reduced formulas.

3 A Bound in the Number of Satis able Clauses

Theorem 1 For a CNF formula , it can be computed in time O( +


1 2886k k) whether or not at least k clauses are simultaneously satis able.

Theorem 1 is of special interest in so-called parameterized complexity theory [5].


The corresponding bound for formulas in CNF is O( + 1 3803k k 2 ) [1]. In
k
this expression 1 3803 gives an estimation of the branching tree size. The time
spent in each node of the tree is O( ), which for CNF formulas is shown to
be bounded by k 2 [11]. For 2CNF formulas, however, we can improve this factor
K
for every node of the tree from k 2 to k: Note that the case where k 2 with
K as the number of clauses is trivial, since for a random assignment, either this
assignment or its inverse satisfy K K
2 clauses. For k > 2 , however, Max Sat
formulas have = O(k).
Before sketching the remaining proof of Theorem 1, we give a corollary. Con-
sider a 2CNF input formula in which every variable occurs at most three times.
This problem is also NP -complete [13], but we can improve our upper bounds by
excluding some of the cases necessary for general 2CNF formulas, thus obtaining
a better branching than in Theorem 1. We omit details.

Corollary 2 For a CNF formula where every variable occurs at most three
times, it can be computed in time O( + 1 2107k k) whether or not at least k
clauses are simultaneously satis able.
178 Jens Gramm and Rolf Niedermeier

We now sketch the proof of Theorem 1. We present algorithm A with the given
running time. As an invariant of our algorithm, observe that the subsequently
described branching rules are only applied if the formula is reduced, that is, there
is no transformation rule to apply. The idea of branching is based on dividing the
search space, i.e. the set of all possible assignments, into several parts, nding an
optimal assignment within each part, and then taking the best of them. Carefully
selected branchings enable us to simplify the formula in some of the branches.
Observe that the subsequent order of the steps is important. In each step, the
algorithm always executes the applicable branching rule with the lowest possible
number:
RUL 1: If there is a (9+ 1)-, (6+ 2)-, or (4+ 3+ )-literal x, then we branch
into [x] and [x]. The correctness of this rule is clear. In the worst case, a
(4 3)-literal, by branching into [x], we may satisfy 4 clauses and by branching
into [x], we may satisfy 3 clauses. We describe this situation by saying that
we have a branching vector (4 3), which expresses the corresponding recurrence
for the search tree size, solvable by standard methods (cf. [1,12]). Solving the
corresponding recurrence for the branching tree size, we obtain here the branch-
ing number 1 2208. This means that were we always to branch according to a
(4 3)-literal, the branching tree size would be bounded by 1 2208k . It is easy to
check that branching vectors (9 1) and (6 2) yield better (i.e., smaller) branching
numbers.
RUL 2: If there is a (2 1)-literal x, such that = x y x z G and
y occurs at least as often in as z, then branch as follows: If both y and z are
(2 1)-literals, branch into [x] and [x]. We can show a worst case branching
vector of (4 5) in these situations. Otherwise, i.e., if one of y and z is not (2 1),
then branch into [y] and [y]. The correctness is again obvious. However, the
complexity analysis (i.e., analysis of the branching vectors) is signi cantly harder
in this case. Keep in mind that the formula is reduced, meaning that we may
exclude all cases where a transformation rule would apply.
First, we distinguish according to the number of common occurrences of
x and y: Assuming that there are three common occurrences we either have
clauses x y , x y , clauses x y , x y , or clauses x y , x y , x y . In
the rst two cases, the Almost Common Clause Rule applies (cf. Section 2), and
in the latter case, the rst of the Three Occurrence Rules applies. Analogously,
assuming two common occurrences, either the Almost Common Clause Rule or
the second of the Three Occurrence Rules applies. Hence, because the formula
is reduced, we can neglect these cases.
It remains to consider only one common occurrence of x and y. We make the
following observation: By satisfying y, we reduce literal x to occurrence two and
the Resolution Rule applies, eliminating x and satisfying one additional clause.
On the other hand, satisfying y leaves a unit occurrence of x and the Dominating
1-Clause Rule applies, eliminating x from the formula and satisfying the two x-
clauses. Now we consider each possible occurrence pattern for literal y. If y occurs
at least four times, it is a (3+ 1)-, (1 3+ )-, or a (2+ 2+ )-literal, and using the
given observation, in the worst cases we obtain branching vectors (4 3), (2 5),
Faster xact Solutions for Max Sat 179

or (3 4). If y occurs only three times, it is a (2 1)- or (1 2)-literal. We then


take a literal z into consideration as well. We know from the way in which y was
chosen that the literal z is also of occurrence three. We consider all combinations
of y and z, which are either (2 1)- or (1 2), and also cover a possible common
occurrence of y and z in one clause. Branching as speci ed, we in the worst case
obtain a branching vector (2 6), namely when both y and z are (1 2) and there
is no common clause of y and z. We omit the details here.
Summarizing, for RULE 2, the worst observed branching vector is (2 5),
which corresponds to the branching number 1 2365.
RUL 3: If there is a (3+ 3+ )- or (4+ 2)-literal x, then branch into [x]
and [x]. Trivially we get the branching vectors (3 3) and (4 2), implying the
branching numbers 1 2600 and 1 2721.
RUL 4: If there is a (c 1)-literal x with c 3 4 5 6 7 8 , then choose a
literal y occurring in a clause x y and branch into [y] and [y]. Again, this
is clearly correct.
With regard to the complexity analysis, we observe that by satisfying y, a
unit occurrence of x arises and the Dominating 1-Clause Rule applies, satisfying
all x-clauses. Having reached RULE 4, we know that all literals in the formula
occur at least four times, as the 3-occurrences are eliminated by RULE 2. We
consider di erent possible cases for y, namely y being a (3+ 1)-, (1 3+ )-, or
(2+ 2+ )-literal, and we consider all possible numbers of common occurrences of
x and y. Using the given observation, we can show a branching vector of (1 6)
in the worst case, namely for a (3 1)-literal x, a (1 3)-literal y, and #xy = 1.
This corresponds to the branching number 1 2852. Again, we omit the details.
RUL 5: By this stage, there remain only (2 2)-, (3 2)-, or (2 3)-literals
in the formula. RULE 5 deals with the case that there is a (2 2)-literal x. Our
branching rule now is more involved. We choose a literal y occurring in a clause
x y and a literal z occurring in a clause x z . For x having at least two
common occurrences with y or z, we branch into [x] and [x]. If this is not
the case but y and z have at least two common occurrences, we branch into [y]
and [y]. It remains that #xy = 1, #xz = 1, and #yz 1. If y and z have a
common occurrence in a clause y z , we branch into [y], [yz], and [y z]. If
not, i.e. there is no clause y z , we branch into [yz], [yz], [y z], and [y z].
It is easy to verify that we have covered all possible cases.
Regarding the complexity analysis, we rst make use of the following: When-
ever two literals being (2 2) or (3 2) have at least two common occurrences, we
can take one of them and branch setting it true and false. In the worst case, this
results in the branching vector (2 5) with branching number 1 2366.
Thus, we are only left with situations in which #xy = 1, #xz = 1, and
#yz 1. For these cases, we consider all arrangements of x, y and z possible,
with x being (2 2), y being (2+ 2+ ) and z being (2+ 2+ ). We obtain good
branching numbers of 1 2886 for vectors as, such as (5 6 5 6) in most cases
by branching into [yz], [yz], [y z], and [y z]. Only for a possible common
occurrence of y and z in a clause y z would the branching number be worse.
We avoid this by branching into [y], [yz], and [y z] instead. Here, we study
180 Jens Gramm and Rolf Niedermeier

in more detail what happens in the single subcases: Setting y true in the rst
subcase of the branching, we satisfy two y-clauses. By setting y false and z true in
the second subcase, we directly eliminate two y- and two z-clauses. Consequently,
the Dominating Unit Clause Rule now applies for x and satis es two additional
clauses. In total, we satisfy six clauses in the second subcase. Setting y and z
false in the third subcase, we satisfy two y- and two z-clauses. In addition, there
arise unit clauses for x and x such that the Complementary 1-Clause Rule and
then the Resolution Rule apply, satisfying two additional clauses. Summarizing
these considerations, the resulting branching vector is (2 6 6) with branching
number 1 3022.
For our purpose, this vector is still not good enough. However, we observe
that in the rst branch x, is reduced to occurrence three, meaning that in this
branch the next rule that will be applied will undoubtly be RULE 2. We recall
that RULE 2 yields the branching vector (2 5), and possibly even a better one.
Combining these two steps, we obtain the branching vector (4 7 6 6) and the
branching number 1 2812.
Note that in RULE 5, we have the real worst case of the algorithm, namely
for the situation of #xy = #xz = #yz = 1 and y and z having their common
occurrence in a clause y z . For this situation, we can nd no branching rule
improving the branching number 1 2886.
RUL 6: When this rule applies, all literals in the formula are either (3 2)
or (2 3). We choose a (3 2)-literal x. The branching instruction is now primarily
the same as in RULE 5 above. However, it is now possible that there is no literal
z occurring in a clause x z , as the two x-occurrences may be in unit clauses.
In this case, i.e. for two x-unit clauses, we branch into [y] and [y]. Having two
or more common occurrences for a pair of x, y, and z, we branch as in RULE 5.
For the remaining cases, i.e. #xy = 1, #xz = 1, and #yz 1, we branch into
[y], [yz], and [yz].
The complexity analysis works analogously to RULE 5. For #xy = 1, #xz =
1, and #yz 1 we test all possible arrangements of x, y, and z with x being
(3 2) and y and z being either (3 2) or (2 3). The worst case branching vector
in these situations, when branching into [y], [yz], and [y z], is (2 9 5) and
yields the branching number 1 2835. Again, we omit the details.

4 A Bound in the Formula Length


Compare Theorem 3 with the O(1 1058jF j) time bound for MaxSat [1]. Observe
that when the exponential bases are close to 1, even small improvements in the
exponential base can mean signi cant progress.

Theorem 3 Max Sat can be solved in time O(1 0970jF j).


We sketch the proof of Theorem 3, presenting Algorithm B with the given run-
ning time. For the most part, it is equal to Algorithm A, sharing the branching
instructions of RULEs 1 to 4. Taking up ideas given in [1], we replace RULE 5
and 6 with new branching instructions RULE 50 , 60 , 70 , and 80 .
Faster xact Solutions for Max Sat 181

For the rules known from Algorithm A, it remains to examine their branching
vectors with respect to formula length. As the analysis is in essence the same
as that of the proof for Theorem 1, we omit the details once again, while only
stating that the worst case branching vector with respect to formula length for
RULEs 1 to 4 is (7 8) (branching number 1 0970), and continue with the new
instructions:
RUL 50 : Upon reaching this rule, all literals in the formula are of type
(2 2), (3 2), or (2 3). RULE 50 deals with the case that there is a (3 2)-literal
x, which is not (3 2)[2 2 2][1 2].
If x is a (3 2)[2 2 2][2 2]-literal, we branch into [x] and [x]. Counting the
literals eliminated in either branch, we easily obtain a branching vector of (8 7).
If x is (3 2)[2 2 2][1 1]-literal with clauses x y1 , x y2 , and x y3 in
which some of y1 , y2 , and y3 may be equal, we branch into [x] and [xy1 y2 y3 ].
This is correct, as should we want to satisfy more clauses by setting x to true
than by setting x to false, all y1 , y2 and y3 must be falsi ed. We easily check
that if all y1 , y2 , and y3 are equal, we obtain a branching vector of (10 10).
For at least two literals of y1 , y2 , and y3 being distinct, we eliminate in the
rst subcase eight literals, namely the literals in the satis ed x-clauses and the
falsi ed x-literals. In the second subcase, we eliminate x, having ve occurrences
and two variables having at least four occurrences. This gives a branching vector
of (8 13), corresponding to the branching number 1 0866.
If x is ultimately a (3 2)[1 2 2][2 2]-literal with clauses x z1 , x z2 in
which z1 and z2 may be equal, we branch into [x] and [xz1 z2 ]. The correctness
is shown as in the previous case. In the rst branch, we directly eliminate eight
literals. In the second branch, we eliminate literal x having ve occurrences and
at least one literal having four or ve occurrences. This gives a branching vector
of (7 9), corresponding to the branching number 1 0910.
By using these branching instructions we obtain for RULE 50 the worst case
branching vector (8 7) in terms of formula length, namely for a (3 2)[2 2 2][2 2]-
literal x. This corresponds to the branching number 1 0970 and will turn out to
be the overall worst case in our analysis of the algorithm.
RUL 60 : Upon reaching this rule, all remaining literals in the formula are
either (2 2), (3 2)[2 2 2][1 2], or (2 3)[1 2][2 2 2]. RULE 60 deals with the case
that there is a (2 2)[2 2][1 2]-literal x, i.e. a (2 2)-literal having a unit occurrence
of x. As this rule is similar to RULE 50 , we omit the details here and claim a
worst case branching vector of (5 12) corresponding to the number 1 0908.
RUL 70 eliminates all remaining (3 2)-literals, namely those of type
(3 2)[2 2 2][1 2]. We select literals y1 , y2 , y3 , and z from clauses x y1 , x y2 ,
x y3 , and x z . If there is a variable y which equals at least two of the
variables y1 , y2 , y3 , and z, we branch into subcases [y] and [y]. Otherwise,
i.e. all variables y1 , y2 , y3 , and z are distinct, we branch into subcases [y1 x],
[y1 xy2 y3 z], and [y1 ]. The analysis of this rule is omitted here, as it is in large
extent analogous to the nal RULE 80 , which we will study in more detail.
RUL 80 applies to the (2 2)[2 2][2 2]-literals, which are the only literals
remaining in the formula. Consider clauses x y1 , x y2 , x z1 , and x z2 .
182 Jens Gramm and Rolf Niedermeier

In the case where there is a variable y which equals two of the variables y1 ,
y2 , z1 , or z2 , i.e. y has two or more common occurrences with x, we branch
into [y] and [y]. We can easily see how to obtain a branching vector of (8 8)
and the branching number 1 0906, as setting a value for y implies a value for x.
Therefore, we proceed to the case of distinct variables y1 , y2 , z1 , and z2 .
First, we discuss the correctness of the subcases. The correctness of the sub-
cases [y1 x], [y1 x], [y1 x] and [y1 x] is obvious. Now assume in the second
branch that a partner of x, e.g. z1 , would be falsi ed. Then, in comparison to
the rst branch, we would lose the now falsi ed clause x z1 , but could, in the
best case, gain one additional x-clause. On the other hand, assume that y2 would
be satis ed. Then in the second branch, as compared with the rst one, we can
not gain any additional x-clause, but could lose some x-clauses. This shows that
in the second branch, we can neglect the considered assignments, as they do
not improve the result obtained in the rst branch. Analogously, we obtain the
additional assignments in the fourth branch and, therefore, branch into subcases
[y1 x], [y1 xy2 z1 z2 ], [y1 x], and [y1 xy2 z1 z2 ]. Knowing that all literals in the
formula are (2 2)[2 2][2 2], we obtain the vector (11 20 11 20).
As this vector does not satisfy our purpose, we further observe that in branch
[y1 x] and in branch [y1 x], there are undoubtly literals reduced to an occur-
rence of three or two. These literals are either eliminated due further reduction,
or give rise to a RULE 2 branching in the next step. We check that the worst
case branching vector in RULE 2 is (7 10). Combining these steps, we are now
able to give a worst case branching vector for RULE 80 of (18 21 20 18 21 20),
corresponding to the branching number 1 0958.
This completes our algorithm and its analysis in terms of formula length.
Omitting some details, we have shown a worst case branching number of 1 0970
in all branching subcases, which justi es the claimed time bound.
For MaxSat in terms of the number of clauses, the upper time bound
O(1 3413K ) is known [1]. Setting = 2K in Theorem 3, we obtain:

Corollary 4 Max Sat can be solved in time O(1 2035K ).

Using this algorithm we can also solve the Maximum Cut problem, as we
can translate instances of the Maximum Cut problem into 2CNF formulas [11].
In fact, these formulas exhibit a special structure and we can modify and even
simplify the shown algorithm, in order to obtain better bounds on formulas hav-
ing this special structure. As shown in [8] on 2CNF formulas generated from
Maximum Cut instances, Max Sat can be solved in time O(1 0718jF j) and
O( + 1 2038k ), where k is the maximum number of satis able clauses in the
formula. This implies the bounds for Maximum Cut shown in Theorem 5. Ob-
serve for part (2) that Maximum Cut, when restricted to graphs of vertex degree
at most three, is NP-complete [7].

Theorem 5 1. For a graph with n vertices and m edges the Maximum Cut
problem is solvable in O(1 3197m ) time.
Faster xact Solutions for Max Sat 183

. If the graph has vertex degree at most three, then Maximum Cut can be
solved in time O(1 5160n ). If the graph has vertex degree at most four, then
Maximum Cut can be solved in time in O(1 7417n ).
3. We can compute in time O(m+n+k 1 7445k k) whether there is a maximum
cut of size k.

5 xperimental Results

Here we indicate the performance of our algorithms A (Section 3) and B (Sec-


tion 4), and compare them to the two-phase heuristic algorithm for MaxSat
presented by Borchers and Furman [3]. The tests were run on a Linux PC with an
AMD K6 processor (233 MHz) and 32 MByte of main memory. All experiments
are performed on random 2CNF-formulas generated using the MWFF package
from Bart Selman [14]. We take di erent numbers of variables and clauses into
consideration and, for each such pair, generate a set of 50 formulas. As results,
we give the average for these sets of formulas. If at least one of the formulas in a
set takes longer than 48 hours, we do not process the set and indicate this in the
table by not run . Our algorithms are implemented in JAVA. This gives credit
to the growing importance of JAVA as a convenient and powerful programming
language. Furthermore, our aim is to show how the algorithms limit the expo-
nential growth in running time, being e ective independent of the programming
language. The algorithm of Borchers and Furman is coded in C. Coding a sim-
ple program for Fibonacci recursion in C and JAVA and running it in the given
environment, we found the C program to be faster by a factor of about nine.
Due to the di erent performance of the programming languages, it is di cult
to only compare running times. As a fair measure of performance we, therefore,
also provide the size of the scanned branching tree, as it is responsible for the
exponential growth of the running time. More precisely, for the branching tree
size we count all inner nodes, where we branch towards at least two subcases.
There is almost no di erence in the performance between algorithms A and B;
therefore they are not listed separately. This is plausible, as in the processing of
random formulas, the bad case situations in whose handling our algorithms
di er, are rare. On problems of small size, the 2-phase-EPDL (Extended Davis-
Putnam-Loveland) algorithm of Borchers and Furman [3] has smaller running
times despite its larger branching trees. One reason may also be the di erence
in performance of JAVA and C. Nevertheless, with a growing problem size our
algorithm does a better job in keeping the exponential growth of the branching
tree small, which also results in signi cantly better running times, see Table 1.
In order to gain insight into the performance of our rules, we collected some
statistics on the application of the transformation and branching rules. For al-
gorithm B, we examine which rules apply how often during a run on random
formulas. First, we consider the transformation rules. Note that at one point,
several transformation rules could be applicable to a formula. Therefore, for judg-
ing the results, it is important to know the sequence in which the application
of transformation rules is tested. We show the results in Table 2, with the rules
184 Jens Gramm and Rolf Niedermeier

Algorithm B 2-Phase- DPL


n m Tree Time Tree Time
25 100 16 0.77 961 0.27
200 108 1.78 37 092 1.93
400 385 5.41 514 231 43.72
800 752 12.59 2 498 559 9:16.51
50 100 6 0.70 69 0.48
200 320 4.48 611 258 27.11
400 18 411 3:45.80 not run
100 200 36 1.14 10 872 2.14
400 91 039 23:50.09 not run
200 400 1 269 21.87 not run

Table 1. Comparison of average branching tree sizes (Tree) and average running
times (Time), given in minutes:seconds, of our Algorithm B and the 2-phase-
EDPL by Borchers and Furman. Tests are performed on 2CNF formulas with
di erent numbers of variables (n) and clauses (m).

Variables 25 50
Clauses 100 200 400 800 100 200 400
Search Tree Size 16 108 385 752 6 320 18 411
Almost Common Cl. 10 38 102 235 4 76 1 421
Pure Literals 26 82 111 99 34 658 11 476
Dominating 1-Clause 123 704 1 844 2 839 42 3387 134 425
Complementary 1-Clause 40 571 2 831 6 775 4 1173 128 030
Resolution 22 57 54 30 25 726 10 821
Three Occurrences 1 0 1 4 7 0 1 35
Three Occurrences 2 6 25 51 47 2 86 2 810

Table 2. Statistics about the application of transformation rules in algorithm


B on random formulas.

being in the order in which they are applied. Considering the shown and addi-
tional data, we nd application pro les being characteristic for variable/clause
ratios. We observe for formulas having a higher ratio, i.e. with fewer clauses
for a xed number of variables, that the Dominating 1-Clause Rule is the rule
which is applied most often. With lower ratio, i.e. when we have more clauses
for the same number of variables, the Complementary 1-Clause Rule gains in
importance.
Besides the transformation rules, we also study the frequency in which the
single branching rules are applied. Recall that algorithm B has a list of eight
di erent cases with corresponding branching rules. We show the results col-
lected during runs on random formulas in Table 3. We observe that the most
branching steps occur with RULE 1 or RULE 2. The other rules are used in
less than one percent of the branchings. It is reasonable that in formulas with
Faster xact Solutions for Max Sat 185

Variables 25 50
Clauses 100 200 400 800 100 200 400
Tree Size 15.26 107.4 384.4 751.18 5.26 319.36 18 410.38
RUL 1 10.62 102.32 381.76 749.42 0.88 217.36 18 300.02
RUL 2 4.02 3.54 1.18 0.32 4.34 100.12 97.54
RUL 3 0.5 0.9 0.94 0.64 0 1.44 11.14
RUL 4 0.08 0.44 0.26 0.36 0.04 0.34 1.22
RUL 50 0.04 0.12 0.16 0.26 0 0.08 0.3
RUL 60 0 0.06 0.1 0.16 0 0.02 0.16
RUL 70 0 0.02 0 0 0 0 0
RUL 80 0 0 0 0.02 0 0 0

Table 3. Statistics on the application of branching rules in algorithm B on


random formulas having n variables and m clauses. Recall that each result is the
average on 50 formulas to understand that we give non-integer values. Thereby
we even see the application of very rare rules.

a high variable/clause ratio, i.e. fewer clauses, we have more variables with an
occurrence of three. Therefore, the rule applied most while processing these for-
mulas is RULE 2. As the variable/clause ratio shifts down, i.e. when we have
more clauses for the same number of variables, there necessarily are more vari-
ables with a large number of occurrence in the formula. Consequently, RULE 1
becomes dominating.
Considering our statistics, we can roughly conclude: Some of the transforma-
tion rules are, in great part, responsible for the good practical performance of
our algorithms, as they help to decrease the search tree size. The less frequent
transformation rules and the rather complex set of branching rules, on the other
hand, are mainly important for guaranteeing good theoretical upper bounds.

6 Open Questions
There remains the option of investigating exact algorithms for other versions
of Max Sat, for example, Max3Sat. Furthermore, n being the number of
variables, can Max Sat be solved in less than 2n steps? Regarding Hirsch’s
recent theoretical results [10], it seems a promising idea to combine our algorithm
with his, in order to improve the upper bounds for Max Sat even further.

References
1. N. Bansal and V. Raman. Upper bounds for MaxSat: Further improved. In Pro-
ceedings of the 10th International Symposium on Algorithms and Computation,
Lecture Notes in Computer Science, Chennai, India, Dec. 1999. Springer-Verlag.
2. R. Battiti and M. Protasi. Approximate algorithms and heuristics for MAX-SAT.
In D.-Z. Du and P. M. Pardalos, editors, Handbook of Combinatorial Optimization,
volume 1, pages 77 148. Kluwer Academic Publishers, 1998.
186 Jens Gramm and Rolf Niedermeier

3. B. Borchers and J. Furman. A two-phase exact algorithm for MAX-SAT and


weighted MAX-SAT problems. Journal of Combinatorial Optimization, 2(4):299
306, 1999.
4. J. Cheriyan, W. H. Cunningham, L. Tuncel, and Y. Wang. A linear programming
and rounding approach to Max 2-Sat. DIMACS Series in Discrete Mathematics
and Theoretical Computer Science, 26:395 414, 1996.
5. R. G. Downey and M. R. Fellows. Parameterized Complexity. Springer-Verlag,
1999.
6. U. Feige and M. X. Goemans. Approximating the value of two prover proof systems,
with applications to MAX 2SAT and MAX DICUT. In 3d I Israel Symposium
on the Theory of Computing and Systems, pages 182 189, 1995.
7. M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory
of NP-completeness. Freeman, San Francisco, 1979.
8. J. Gramm. xact algorithms for Max2Sat: and their applications. Diplomar-
beit, Universität Tübingen, 1999. Available through http://www-fs.informatik.uni-
tuebingen.de/∼niedermr/publications/index.html.
9. J. Hastad. Some optimal inapproximability results. In Proceedings of the 29th
ACM Symposium on Theory of Computing, pages 1 10, 1997.
10. . A. Hirsch. A new algorithm for MAX-2-SAT. Technical Report TR99-036,
CCC Trier, 1999. To appear at STACS 2000 .
11. M. Mahajan and V. Raman. Parameterizing above guaranteed values: MaxSat and
MaxCut. Journal of Algorithms, 31:335 354, 1999.
12. R. Niedermeier and P. Rossmanith. New upper bounds for MaxSat. In Proceedings
of the 26th International Conference on Automata, Languages, and Programming,
number 1644 in Lecture Notes in Computer Science, pages 575 584. Springer-
Verlag, July 1999. Long version to appear in Journal of Algorithms.
13. V. Raman, B. Ravikumar, and S. S. Rao. A simpli ed NP-complete MAXSAT
problem. Information Processing Letters, 65:1 6, 1998.
14. B. Selman. MWFF: Program for generating random Maxk-Sat instances. Available
from DIMACS, 1992.
15. M. Yannakakis. On the approximation of maximum satis ability. Journal of Al-
gorithms, 17:475 502, 1994.
Dynamically Maintaining the Widest k-Dense
Corridor

Subhas C. Nandy1 , Tomohiro Harayama2, and Tetsuo Asano2


1
Indian Statistical Institute, Calcutta - 700 035, India
2
School of Information Science, Japan Advanced Institute of Science and
Technology, Ishikawa 923-1292, Japan

Ab tract. In this paper, we propose an improved algorithm for dynam-


ically maintaining the widest k-dense corridor as proposed in [6]. Our
algorithm maintains a data structure of size O(n2 ), where n is the num-
ber of points present on the floor at the current instant of time. For each
insertion/deletion of points, the data structure can be updated in O(n)
time, and the widest k-dense corridor in the updated environment can
be reported in O(kn + nlogn) time.

1 Introduction

Given a set S of n points in the Euclidean plane a corridor C is de ned as an


open region bounded by parallel straight lines 0 and 00 such that it intersects the
convex hull of S [3]. The width of the corridor C is the perpendicular distance
between the bounding lines 0 and 00 . The corridor is said to be k-dense if C
contains k points in its interior. The widest k-dense corridor through S is a
k-dense corridor of maximum width [1]. See Figure 1 for illustration.
The widest empty corridor problem was rst proposed by [3] in the context of
robot motion planning where the objective was to nd an widest straight route
avoiding obstacles. They also proposed an algorithm for this problem with time
and space complexities O(n2 ) and O(n) respectively. The widest k-dense corri-
dor problem was introduced in [1] along with an algorithm of time and space
complexities O(n2 logn) and (n2 ) respectively. Here the underlying assumption
is that the robot can pass through (or in other words, can tolerate collission
with) a speci ed number (k) of obstacles. In [4], the space complexity of the
widest k-dense corridor problem was improved to O(n). In the same paper, they
have suggested an O(nlogn) time and O(n2 ) space algorithm for maintaining the
widest empty corridor where the set of obstacles is dynamically changing. How-
ever, the dynamic problem for general k ( 0), was posed as an open problem.
In [6], both the static and dynamic versions for the k-dense corridor problem are
studied. The time and space complexities of their algorithm for the static version
?
This work was done when the author was visiting School of Information Science,
Japan Advanced Institute of Science and Technology, Ishikawa 923-1292, Japan.

G. Bongiovanni, G. Gambosi, R. Petreschi ( ds.): CIAC 2000, LNCS 1767, pp. 187 198, 2000.

c Springer-Verlag Berlin Heidelberg 2000
188 Subhas C. Nandy, Tomohiro Harayama, and Tetsuo Asano

Fig. 1. Two types of corridors

of the problem are O(n2 ) and O(n2 ) respectively. For the dynamic version, their
algorithm is the pioneering work. Maintaining an O(n2 ) size data structure they
proposed an algorithm which reports the widest k-dense corridor after the inser-
tion and deletion of a point. The time complexity of their algorithm is O(Klogn),
where O(K) is the combinatorial complexity of ( k)-level of an arrangement of
n half-lines, each of them belongs to and touching the same side of a given line.
They proved that the value of K is O(kn) in the worse case.
In this paper, we improve the time complexity of the dynamic version of the
widest k-dense corridor problem. Given O(n2 ) space for maintaining the data
structure, our algorithm can update the data structure and can report the widest
k-dense corridor in O(K+nlogn) time. As it is an online algorithm, this reduction
in the time complexity is de nitely important.

Geometric Preliminaries

Throughout the paper, we assume that the points in S are in general position,
i.e., no three points in S are collinear, and the lines passing through each pair
of points have distinct slope. Theorem 1, stated below, characterizes a widest
corridor among the points S.

Theorem 1. [1,3,4,6] Let C  be the widest corridor with bounding lines 0


and
00
. Then C  must satisfy the following conditions :

0
(A) touches two distinct points pi and pj of S and 00 touches a single point
pm of S, or
(B) 0 and 00 contain points pi and pj respectively, such that 0 and 00 are per-
pendicular to the line through pi and pj .

From now onwards, a k-dense corridor satisfying conditions (A) and (B) will be
referred to as type-A and type-B corridors respectively (see Figure 1).
Dynamically Maintaining the Widest k-Dense Corridor 189

2.1 Relevant Properties of Geometric Duality

We follow the same tradition [1,3,4,6] of using geometric duality for solving this
problem. It maps (i) a point p = (a b) to the line D(p) : y = ax − b in the dual
plane, and (ii) a non-vertical line : y = mx − c to the point D( ) = (m c) in
the dual plane. Needless to say, a point p is below (resp., on, above) a line in
the primal plane if and only if D(p) is above (resp., on, below) D( ) in the dual
plane. A line passing through two points p and q in the primal plane, corresponds
to the point of intersection of the lines D(p) and D(q) in the dual plane, and
vice versa.
For the k-dense vertical corridors, we can not apply geometric duality theory, and
so we apply the vertical line sweep technique. We maintain a balanced binary leaf
search tree, say BB( ) tree, with the existing set of points in the primal plane.
Here each point in S appears at the leaf level, and is attached with the width of
the widest k-dense vertical corridor with its left boundary passing through that
point. At each non-leaf node, we attach the width of the widest vertical k-dense
corridor in the subtree rooted at that node. It can be easily shown that for each
insertion/deletion of a point, the necessary updates in this data structure and
the reporting of widest k-dense vertical corridor can be done in O(k +logn) time.
Below, we concentrate on studying the properties of the non-vertical corridors.
Consider the two bounding lines 0 and 00 of a corridor C in the primal plane,
which are mutually parallel. The corresponding two points, D( 0 ) and D( 00 )
in the dual plane, will have the same x-coordinate. Thus a corridor C will be
represented by a vertical line segment joining D( 0 ) and D( 00 ) in the dual plane,
and will be denoted by D(C). The width of C is jy(D( ))−y(D( 2))j , and will be
1+(x(D( )))
referred as the dual distance between the points D( 0 ) and D( 00 ). Here x(p) and
y(p) denote the x- and y-coordinates of the point p respectively.
Let H = hi = D(pi ) pi S be the set of lines in the dual plane corresponding
to the n points of S in the primal plane. Let p be a point inside the corridor C. In
the dual plane, the points D( 0 ) and D( 00 ) will lie in the opposite sides of the line
D(p). Now, we have the following observation, which is the direct implication of
Theorem 1. The dual of a k-dense corridor is characterized in Observation 2.

0 00
Observation 1 Let C be a corridor bounded by a pair of parallel lines , .

Now, if C is a type-A corridor, 0 passes through pi and pj , and 00 passes


through pm . This implies that D( 0 ) corresponds to a vertex of A(H), which
is the point of intersection of hi and hj (denoted by hi hj ), and D( 00 )
corresponds to a point on hm satisfying x(D( 0 )) = x(hi hj ).
If C is a type-B corridor, 0 and 00 pass through the two points pi and pj
respectively. This implies, D( 0 ) and D( 00 ) will correspond to the two points
on hi and hj respectively, satisfying x(D( 0 )) = x(D( 00 )) = −(1 x(hi hj )).
190 Subhas C. Nandy, Tomohiro Harayama, and Tetsuo Asano

Thus, A non-vertical type-A corridor may uniquely correspond to a vertex of


A(H), and a non-vertical type-B corridor may also uniquely correspond to an
edge of A(H), on which its upper end point lies.

Observation 2 A corridor C is said to be k-dense if and only if there are exactly


k lines of H that intersect the vertical line segment D(C), representing the dual
of the corridor C, and will be commonly referred to as a k-stick.

Thus, recognizing a widest k-dense non-vertical corridor in the primal plane is


equivalent to nding a k-stick in the dual plane having maximum dual length.

3 Widest Non-vertical k-Dense Corridor

We now explain an appropriate scheme for maintaining the widest non-vertical


k-dense corridor dynamically. Let A(H) denote the arrangement of the set of
lines H [2]. The number of vertices, edges and faces in A(H) are all O(n2 ). In
the dynamic scenario, we need to suggest an appropriate data structure which
can be updated for insertion/deletion of points, and the widest k-dense corridor
can be reported e ciently in the changed scenario. As the deletion is symmetric
to insertion, we shall explain our method for insertion of a new point in S only.

3.1 Data Structures

We dynamically maintain the following data structure which stores the arrange-
ment of the lines in H. It is de ned using the concept of levels as stated below.

Fig. 2. Demonstration of levels in an arrangement of lines

De nition 1. [2] A point in the dual plane is at level (0 n) if there


are exactly lines in H that lie strictly below . The -level of A(H) is the
closure of a set of points on the lines of H whose levels are exactly in A(H),
and is denoted as L (H). See Figure 2 for an illustration.
Dynamically Maintaining the Widest k-Dense Corridor 191

Clearly, L (H) is a polychain from x = − to x = , and is monotone in-


creasing with respect to x-axis. The vertices on L (H) is precisely the union of
vertices of A(H) at levels − 1 and . The edges of L (H) are the edges of A(H)
at level . In Figure 2, a demonstration of levels in the arrangement A(H) is
shown. Here the thick chain represents L1 (H). Among the vertices of L1 (H),
those marked with empty (black) circles are appearing in level 0 (2) also. Each
vertex of the arrangement A(H) appears in two consecutive levels, and each edge
of A(H) appears in exactly one level. We shall store L (H), 0 n in a data
structure as described below.
level-structure
It is an array of size n, called primary structure, whose -th element is com-
posed of the following elds :

level-id : an integer containing the level-id .


left-prt : pointing to the left most node of the secondary structure T .
root-ptr : pointing the root node of the secondary structure T .
list-ptr : pointing to a linear link list, called T MP-list, whose each element is
a tuple ( ) of pointers. The T MP-list data structure will be explained
after de ning the secondary structure.

The secondary structure at a particular level , denoted as T , is organized


as a height balanced binary tree (AVL-tree). The nodes of this tree correspond
to the vertices and edges at level in left to right order. In addition, each node
is attached with the following information.

Two integer elds, called L N and MAX, are attached with each node. The
L N- eld contains the dual length of the k-stick attached to it. Here we ex-
plicitly mention that, if a node corresponds to a vertex of the arrangement,
it de nes at most one k-stick, but if it corresponds to an edge, more than
one k-sticks may be de ned by that edge. In that case, the L N- eld will
contain the length of the one having maximum length among them. A node
(corresponding to an edge) de ning no k-stick will contain a value 0 in its
L N- eld. The MAX- eld contains the maximum value of the L N elds
among all the nodes in the subtree rooted at that node. This actually indi-
cates the widest one among all the k-dense corridors stored in the subtree
rooted at that node.
Apart from its two child pointers, each node of the tree has three more pointers.
parent pointer : It helps in traversing the tree from a node towards its root.
The parent pointer of the root node points to the corresponding element
of the primary structure.
neighbor-pointer : It helps the constant time access of the in-order successor
of a node.
self-indicator : As an element representing a vertex appears in the secondary
structure (T ) of two consecutive levels, each of them is connected with
the other using this pointer.
192 Subhas C. Nandy, Tomohiro Harayama, and Tetsuo Asano

By Observation 1 and succeeding discussions, a type-A k-dense corridor corre-


sponds to a vertex of the arrangement. A vertex v A(H) appearing in levels,
say and + 1, may correspond to at most two k-sticks (corresponding to two
di erent type-A k-dense corridors), whose one end point is positioned at vertex
v, and their other end points lie on some edge at levels − k − 1 (if − k − 1 > 0)
and + k + 2 (if + k + 2 < n) respectively, and are attached to the vertex v
appearing in the corresponding levels. An edge e appearing at level stores at
most one k-stick which is de ned by it and another edge in ( − k − 1)-th level
and appears vertically below it.
T MP-list : After the addition of a new point p in S, its dual line h = D(p)
is inserted in A(H) to get an updated arrangement A(H 0 ), where H 0 = H h.
This may cause rede ning the k-sticks of some vertices and edges of A(H 0 ).
In order to store this information, we use a linear link list at each level of the
primary structure. Each element of this list is a tuple ( ). Here and points
to two elements (vertex/edge) at level , and the tuple ( ) represents a set of
consecutive elements (vertices/edges) in T such that the k-sticks de ned by all
the vertices and edges in that set has been rede ned due to the appearance of
the new line h in the dual plane. Note that,

The list attached to a particular level, say , of the arrangement may contain
more than one tuple after computing the k-sticks at all the vertices and
edges of A(H) a ected by the inclusion of h. In that case, the set of elements
represented by two distinct tuples, say ( 1 1 ) and ( 2 2 ) in that list must
be disjoint, i.e., 1 < 2 . Moreover, the elements represented by 1 and 2
must not be consecutive in T .

L-list : We shall use another temporary linear link list (L) during the processing
of an insertion/deletion of a line h in the arrangement A(H). This contains the
intersection points of h with the lines in H in a left to right order.

3.2 Updating the Primary Structure

We rst compute the leftmost intersection point of the newly inserted line h with
the existing lines in H by comparing all the lines. Let the intersection point be
and the corresponding line be hi . In order to nd the edge e A(H) on which
lies, we traverse along the line hi from its left unbounded edge towards right
using the neighbor-pointers and self-indicators.
Next, we use parent pointers from the edge e upto the root of T and nally, the
parent pointer of the root points to the primary structure record corresponding
to the level  . The level of the left unbounded edge e on the newly inserted
line h in the updated structure A(H 0 ) will be (=  or (  + 1)) depending on
whether h intersects e from below or from above.
We replace the old primary structure by a new array of size n + 1, and insert
a new level corresponding to the left unbounded edge e in appropriate place.
Dynamically Maintaining the Widest k-Dense Corridor 193

Moreover, the list ptr for all the levels are initialized to NULL. It is easy to see
that the updating of the primary structure requires O(n) time.
Our updating of the secondary structure will be guided by two pointers, P 1 and
P 2, which will initially contain the edges at the T −1 and T +1 which are just
below and above respectively.

3.3 Updating the Secondary Structure

Let the level of e (the unbounded portion of h to the left of ) be in A(H 0 ),


and the edge e ( A(H)), which is intersected by h, be at level  (= − 1 or
+ 1). In this subsection, we describe the creation of the new edges and vertices
generated due to the inclusion of h in A(H).
The portions of h to the left and right of are denoted as e and e0 respectively,
and the portions of e to the left and right of the point by e lef t and e ight
respectively. Note that, the vertex appears in both the levels and  . Next,
we do the following changes in the secondary structure for the inclusion of the
new vertex and its adjacent edges in the label-structure. Refer to Figure 3.

θ-th level e ^
e*right v
e*left α e’
θ*-th level
(θ* = θ - 1)

Fig. 3. Processing a new vertex of A(H 0 )

e and the vertex are added in T .


e lef t remains in its previous level  , so e is replaced by e lef t in T .
e ight goes to level . So, rst of all e ight is added to T .
Let v be the vertex at the right end of e (recently modi ed to e lef t ) in T .
The tree T is split into two height balanced trees, say TR and TL , where
TR contains all the elements (vertices and edges) in that level to the right of
v including itself, and TL contains all the elements in the same level to the
left of e lef t and including itself. This requires O(logn) time [5].
Next we concatenate TR to the right of e ight in T . The neighbor-pointer of
e ight is immediately set to point v, and the parent-pointers of the a ected
nodes are appropriately adjusted. This can be done in O(logn) time [5].
Finally, the vertex is added in TL as the right most element, and TL is
renamed as T . Note that a portion of e0 will be the right neighbor of the
194 Subhas C. Nandy, Tomohiro Harayama, and Tetsuo Asano

vertex in T . Now, if we have already considered all the n newly generated


vertices, the right end of e0 will be unbounded. In that case, e0 is added in
T as the rightmost element, and its neighbor pointer and parent pointer are
appropriately set. Otherwise, the right end of e is yet to be de ned, and its
addition in T is deferred until the detection of the next intersection.

In the former case, the updating of the secondary structure is complete. But in
the later case, we proceed with e0 , the portion of h to the right of . First of all,
we set e to e0 . Now, (i) if  = − 1, then P 2 is set to e ight and P 1 needs to
be set to an appropriate edge in level − 2, and (ii) if  = + 1, then P 1 is set
to e ight and P 2 needs to be set to an appropriate edge in level + 2. Finally,
the current level is set to  , and proceed to detect next intersection.
Now two important things need to be mentioned.

For all newly created edges/vertices, we set the width of the k-dense corridor
to 0. They will be computed afresh after the update of the level-structure.
During this traversal, we create L with all the newly created edges and
vertices on h in a left-to-right order. The edges are attached with their
corresponding levels. As the newly created vertices appear in two consecutive
levels, they show their lower levels in the L list.

Lemma 1. The time required for constructing A(H 0 ) from the existing A(H) is
O(nlogn) in the worst case.

3.4 Computing the New k-Dense Corridors

We now describe the method of computing all the k-sticks which intersect the
newly inserted line h. The L list contains the pieces of the line h separated by
the vertices in (A(H 0 ) − A(H)), which will guide our search process. We process
edges in the L list one by one from left to right. For each edge e L, we locate
all the vertices and edges of A(H 0 ) whose corresponding k-sticks intersect e.
We proceed with an array of pointers P of size 2k + 3, indexed by −(k +
1) 0 (k + 1). Initially, P (0) points to the leftmost edge in the L list. If
its level in the level-structure is then P (−1) P (−k − 1) will point to the
leftmost edges at levels ( − k − 1) ( − 1), and P (1) P (k + 1) will point to
the leftmost edges at levels ( + 1) ( + k + 1) in the level-structure.
While processing an edge e L (not the left-most edge), P (0) points to the edge
e; P (−1) P (−k − 1) point to k + 1 edges below the left end vertex of e and
P (1) P (k + 1) point to k + 1 edges above the left end vertex of e. At the end
of the execution of edge e, if e is not right unbounded, we set P (0) to the next
edge of e in L and proceed. Otherwise, our search stops.
In order to evaluate all the k-sticks intersecting e and having its bottom end at
level i (i = − k − 1 ), we need to consider the pair of levels (i i + k + 1).
Dynamically Maintaining the Widest k-Dense Corridor 195

Consider a x-monotone polygon bounded below (resp. above) by the x-monotone


chain of edges and vertices at level i (resp. i + k + 1) and by two vertical lines at
the end points of e (see Figure 4). This can easily be detected using the pointers
P (i − ) and P (i + k + 1 − ). The L N- elds of all the edges and vertices in the
above two x-monotone chains are initialized to zero. We draw vertical lines at
each vertex of the upper and lower chains, which split the polygon into a number
of vertical trapezoids.

l i+k+1
v1
v2 ri + k + 1
level i + k + 1

e
The dark line represents
level i h the edge e in the figure
w1
li w2
ri

Fig. 4. Computation of type-A and type-B corridors while processing an edge


e L

Each of the vertical lines drawn from the convex vertices de nes a type-A k-
stick. Its dual distance is put in the corresponding node of Ti or Ti+k+1 .
In order to compute the type-B k-sticks, consider each of the vertical trapezoids
from left to right. Let = v1 v2 w2 w1 be such a trapezoid whose v1 w1 and
v2 w2 are two vertical sides, I denote the x-range of . Let v1 v2 be a portion
of an edge e , which in turn lies on a line h H 0 , and w1 w2 lies on h H 0 .
We compute −1 x(h h ) and check whether it lies in I. If so, the vertical
line at x = −1 x(h h ), bounded by v1 v2 and w1 w2 , indicates a type-
B k-stick corresponding to the edge e . We compute its dual distance; this
newly computed k-stick replaces the current one attached with e provided
the dual length of the newly computed k-stick is greater than the L N- eld
attached with e in the data structure Ti .

Let i and i (resp. i+k+1 and i+k+1 ) denote the edges at level i (resp. i+k+1),
which are intersected by the vertical lines at the left and right end points of the
edge e( L) respectively. Note that, the de nition of k-sticks for the edges and
vertices of the i-th level between i and i may have changed due to the presence
of e A(H 0 ). So, we need to store the tuple ( i i ) in the T MP-list attached
to level i of the primary structure. But, before storing it, we need to check the
last element stored in that list, say (   ). If the neighbor pointer of  points
to i in Ti (the secondary structure at level i), then (  i ) is a continuous set of
elements in level i which are a ected due to the insertion of h. So, the element
(   ) is updated to (  i ); otherwise, (   ) is added in the T MP-list. We
store ( i+k+1 i+k+1 ) in the T MP-list attached to level i + k + 1 of the primary
structure in a similar way.
196 Subhas C. Nandy, Tomohiro Harayama, and Tetsuo Asano

Next, we may proceed by setting P (0) to the next edge e0 of L list. Each of the
pointers P (i) i = −k − 1 k + 1, excepting P (0), need to point to an edge
either at level (i + − 1) or at level (i + + 1) which lies just below or above the
current edge pointed by P (i) in the level-structure, depending on whether e0 lies
at level − 1 or + 1 in the level structure. From the adjacencies of vertices and
edges in the level-structure, this can be done in constant time for each P (i).

Theorem 2. The time required for computing the k-sticks intersecting the line
h in A(H 0 ) is O(nk).

Proof : Follows from the above discussions, and the fact that the complexity of
the k-levels of n half-lines lying above (below) the newly inserted line h in the
arrangement A(H 0 ) is O(nk). [6]

3.5 Location of Widest k-Dense Corridor

The T MP-list for a level, say , created in the earlier subsection, is used to
update the MAX- eld of the nodes of the tree T , by considering its elements
in a sequential manner from left to right. Let the tuple ( ) be an entry of the
T MP-list at the level . Let q be the common predecessor of the set of nodes
represented by the tuple ( ). Let PIN be a path from the root of T to the
node q, and PL and PR be two paths from q to and q to respectively. In T ,
the MAX- elds of all the nodes in the interval ( ), and the set of nodes in PIN ,
PL and PR may be changed. So, they need to be inspected in order to update
the MAX- elds of the nodes in T . Now we have the following lemma.

Lemma 2. For each entry ( ) of the T MP-list of a level, say , the number
of nodes of T which need to be visited to update the MAX- elds is O(logn + ),
where is the number of consecutive vertices and edges of the arrangement at
level represented by ( ).

In order to count the total number of elements attached to the T MP-list at


all the levels let us consider a n n square grid M whose rows represent the n
levels of the arrangement and its each column represents an edge on h in left to
right order (See Figure 5). Consider the shaded portion of the grid; observe that
its i-th column spans from row − k − 1 to + k + 1, where is the level of ei
in A(H 0 ). This corresponds to the levels which are a ected by ei . The shaded
region is bounded by two x-monotone chains. Now, let us de ne a horizontal
strip as a set of consecutive cells on a row which belong to the shaded portion
of the grid. A horizontal strip which spans from the rst to the last column of
the grid, is referred to as long strip. The strips which are not long, are called
short strips. Note that, each strip attached to a row represents an element of
the T MP-list attached to the corresponding level. It is easy to observe that the
number of such short strips is O(n), and the number of such long strips may be
at most 2k − 3 in the worst case. Now we have the following lemma.
Dynamically Maintaining the Widest k-Dense Corridor 197

12
1212
123 12
123
12
123
1212 121212
123
123
123
12 1212
12 1212
123
123 1212
123 1212
θ+k+1 123
123
123
1212 12
121212121212
12 12123 12 123
123
123 121212121212
123
12123 12 123
123
123 121212121212
123
12
12123
1234
12123
123123
123
123 12 1212121212123
123
12 1234
12123
123123
123
123 1212
123 121212121212123 1234
123
12
12123
1234
123123
123
123123
levels 123
123
123
123
123
123
123
12
123
123
12
12
12
121212
12
12
12
12
12
12
12
12
12
12
12
12
12
123
1234
123
12
1234
123
12
12
1234
123
123
123
123
123123
123
123123
123
123
123 121212 1212 12123
1234
12123
123123
θ-k-1 123
123
123
123
1212
12
12
12 12123 1234
123 12
1234
123
12
1234
12
123
123
123123
123
123
123
123
123
1234
1234 123 123
123 123

ei
edges on h

Fig. 5. Grid M estimating the number of elements in the T MP list

Lemma 3. In order to update the MAX- eld of the nodes of the secondary struc-
ture, the tree traversal time for an entry of any of the T MP-lists is O( + logn)
if the corresponding strip is short, and is O( ) if the strip is long, where , the
length of the strip, is the number of nodes represented by the corresponding entry
of the T MP-list.

Proof : For the short strips, the result follows from Lemma 2. For the long strips,
all the nodes of the corresponding tree is a ected. So, subsumes O(logn).
Finally, the roots of the trees at all levels need to be inspected to determine the
widest k-dense corridor.

3.6 Complexity

Theorem 3. Addition of a new point requires

(a) O(nlogn) time for updating the data structure.


(b) O(nk) time to compute the k-dense corridors containing that point.
(c) O(nk + nlogn) time to traverse the trees attached to di erent levels of the
secondary structure for reporting the widest k-dense corridor.

As we are preserving all the vertices and edges of the arrangement of the dual
lines in our proposed level-structure, the space complexity is O(n2 ), where n is
the number of points on the floor at the current instant of time.

References
1. S. Chattopadhyay and P. P. Das, T e k-dense corridor problems, Pattern Recog-
nition Letters, vol. 11, pp. 463-469, 1990.
198 Subhas C. Nandy, Tomohiro Harayama, and Tetsuo Asano

2. H. Edelsbrunner, Algorit ms in Combinatorial Geometry, Springer, Berlin, 1987.


3. M. Houle and A. Maciel, Finding t e widest empty corridor t roug a set of points,
Report SOCS-88.11, McGill University, Montreal, uebec, 1988.
4. R. Janardan and F. P. Preparata, Widest-corridor problems, Nordic J. Comput.,
vol. 1, pp. 231-245, 1994.
5. E. M. Reingold, J. Nievergelt and N. Deo, Combinatorial Algorit ms : T eory and
Practice, Prentice-Hall, N.J., 1977.
6. C. -S. Shin, S. Y. Shin and K. -Y. Chwa, T e widest k-dense corridor problems,
Information Processing Letters, vol. 68, pp. 25-31, 1998.
Reconstruction of Discrete Sets from Three
or More X-Rays

Elena Ba cucci1 , Sa a B unetti1 , Albe to Del Lungo2 , and Mau ice Nivat3
1
Dipartimento di Sistemi e Informatica, Universita di Firenze,
Via Lombroso 6/17, 50134, Firenze, Italy,
[barcucci,brunetti]@d i.unifi.it
2
Dipartimento di Matematica, Universita di Siena,
Via del Capitano 15, 53100, Siena, Italy,
dellungo@uni i.it
3
LIAFA, Institut Blaise Pascal, Universite Denis Diderot ,
2 Place Jussieu 75251, Paris Cedex 05, France,
[email protected] ieu.fr

Abstract. The problem of reconstructing a discrete set from its X-rays


in a nite number of prescribed directions is NP-complete when the
number of prescribed directions is greater than two. In this paper, we
consider an interesting subclass of discrete sets having some connectivity
and convexity properties and we provide a polynomial-time algorithm
for reconstructing a discrete set of this class from its X-rays in directions
(1, 0), (0, 1) and (1, 1). This algorithm can be easily extended to contexts
having more than three X-rays.
keywords: algorithms, combinatorial problems, discrete tomography,
discrete sets, X-rays.

1 Introduction
A discrete set is a nite subset of the intege lattice ZZ2 and can be ep esented
by a bina y mat ix o a set of unita y squa es. A direction is a vecto of the Eu-
clidean plane. If u is a di ection, we denote the line th ough the o igin pa allel
to u by lu . Let F be a disc ete set; the X-ray of F in direction u is the function
Xu F , de ned as: Xu F (x) = F (x + lu ) fo x u? , whe e u? is the o thogonal
complement of u (see Fig. 1). The function Xu F is the p ojection of F on u?
counted with multiplicity. The inve se p oblem of econst ucting a disc ete set
f om its X- ays is of fundamental impo tance in elds such as: image p ocessing
[17], statistical data secu ity [14], biplane angiog aphy [16], g aph theo y [1] and
econst ucting c ystalline st uctu es f om X- ays taken by an elect on mic o-
scope [15]. An ove view on the p oblems in disc ete tomog aphy and a study
of the complexity can be found in [12] and [7]. Many autho s have studied the
p oblem of dete mining a disc ete set f om its X- ays in both ho izontal and
ve tical di ections. Some polynomial algo ithms that econst uct some special
sets having some convexity and/o connectivity p ope ties, such as ho izontally
and ve tically convex polyominoes [3,4], have been dete mined.

G. Bongiovanni, G. Gambosi, R. Petreschi ( ds.): CIAC 2000, LNCS 1767, pp. 199 210, 2000.

c Springer-Verlag Berlin Heidelberg 2000
200 Elena Barcucci et al.

X(1, 1) F
1 1 1 2 1 1
1
1
3
2
3
X(1, 0) F 3
3
0
2
0
1
2 2 3 2 3 1
X(0, 1) F

Fig. 1. X- ays in the di ections u1 = (1 0) u2 = (0 1) and u3 = (1 1).

In this pape , we study the econst uction p oblem with espect to th ee


di ections: (1 0) (0 1) and (1 1). We denote the X- ays in these di ections
by H V and D, espectively. The basic question is to dete mining if, given
H INm V INn and D INn+m−1 , a disc ete set F whose X- ays a e (H V D)
exists. Let u1 uk be a nite set of p esc ibed di ections. The gene al p ob-
lem can be fo mulated as follows:
Consistency(u1 uk )
Instance: k vecto s X1 Xk
uestion: is the e F such that Xui F = Xi fo i = 1 k?
Ga dne , G itzmann and P angenbe g [8] p oved that Consistency((1 0) (0 1)
(1 1)) is NP-complete in the st ong sense. Then, by means of this esult and two
lemmas, they p oved that the p oblem Consistency(u1 uk ) is NP-complete
in the st ong sense, fo k 3.

In this pape , we dete mine a class of disc ete sets fo which the p oblem
is solvable in polynomial time. These sets a e hex-connected and convex in the
ho izontal, ve tical and diagonal di ections. They can be ep esented by a set
of hexagonal cells. By exploiting the new geomet ic p ope ties of these st uc-
tu es, we can p ovide an algo ithm which nds a solution in O(n5 ), whe e we
assume n = m. The algo ithm can be easily extended to contexts having mo e
than th ee X- ays and can econst uct some disc ete sets that a e convex in the
di ections of the X- ays. We wish to point out that the question of dete mining
when plana convex bodies can be econst ucted f om thei X- ays was aised by
Hamme [6,11] in 1963. The disc ete analogue of this question aised by G itz-
mann [10] in 1997 is an open p oblem. We believe that ou algo ithm can be
conside ed to be an initial app oach to this p oblem in so fa as it econst ucts
a disc ete set which is convex in the di ections of the X- ays.

De nitions and Preliminaries


We now wish to examine an inte esting class of disc ete sets fo which the p ob-
lem can be solved in polynomial time. We int oduce some de nitions which allow
Reconstruction of Discrete Sets from Three or More X-Rays 201

us to cha acte ize this class. Let us take the t iangula lattice made up of di ec-
tions: (1 0) (0 1) and (1 1) into conside ation. A point = (i j) of this lattice
has 6 neighbou s and can be ep esented by a hexagonal cell (see Fig. 2).

Q
i

Fig. 2. The 6-neighbou s of = (i j).

De nition 1. If F is a discrete set, a 6-path f om P to in F is a sequence


P1 P of points in F , w ere P = P1 = P and Pt is a 6-neig bour of
Pt−1 , for t = 2 s.

It can be noted that the sequence with s = 1 is also a 6-path.

De nition 2. F is hex-connected if, for eac pair of F points, t ere is a 6-pat


in F t at connects t em.

Finally,

De nition 3. A ex-connected set is ho izontally, ve tically and diagonally con-


vex if all its rows, columns and diagonals are ex-connected.

We denote the class of hex-connected and ho izontally, ve tically and diagonally


convex sets by F . An element of this class co esponds to a convex polyomino
with hexagonal cells. Fig. 3 shows a hex-connected disc ete set and a disc ete set
of F with its co esponding hexagonal convex polyomino. This class of hexagonal
polyominoes was studied by some autho s in enume ative combinato ics [5,18].

3 The Reconstruction Algorithm

Let us now conside the econst uction p oblem applied to class F . Given H =
(h1 hm ) V = (v1 vn ) and D = (d1 dn+m−1 ), we want to establish
the existence of a set F F such that X(1 0) F = H X(0 1) F = V X(1 1) F = D.
A necessa y condition fo the existence of this set is:
m n n+m−1
hi = vj = dk (3.1)
i=1 j=1 k=1
202 Elena Barcucci et al.

1 2 l1 k
1 2 j n
1
Q 2

i
Q l2 Q

P 6-path
n +m -1
A
m
(a) (b) base (c)

Fig. 3. a) A hex-connected set. b) A set of F . c) Its co esponding hexagonal


convex polyomino.

Without loss of gene ality, we can assume that hi = 0 fo i = 1 m, and


vj = 0 fo j = 1 n. F om this assumption and the de nition of F , it follows
that the e a e two intege s l1 and l2 such that:

1 l1 l2 n+m−1
dk = 0 fo k = l1 l2 (3.2)
dk = 0 fo k = 1 l1 − 1 l 2 + 1 n + m − 1;

this in tu n means that F is contained in the disc ete hexagon:

A = (i j) IN2 : 1 i m 1 j n l1 i+j−1 l2

A is the smallest disc ete hexagon containing F . We wish to point out that
the X- ay vecto components a e numbe ed sta ting f om the left-uppe co ne
(see Fig. 3b)). We call bases of F the points belonging to the bounda y of A
(see Fig. 3c)). Ou aim is to dete mine which points of A belong to F . Let
= (i j) A; this point de nes the following six zones (see Fig. 4):

1 l1 j i + j -1
1
n

Z2(Q)
F
Z1(Q)
Z3(Q)
i l2
Z6(Q) Q
Z4(Q)
m
Z5(Q) A

Fig. 4. The six zones dete mined by = (i j).


Reconstruction of Discrete Sets from Three or More X-Rays 203

Z1 ( ) = (r c) A: r i and c j
Z2 ( ) = (r c) A: j c and r + c i+j
Z3 ( ) = (r c) A: r i and i + j r+c
Z4 ( ) = (r c) A: i r and j c
Z5 ( ) = (r c) A: c j and i + j r+c
Z6 ( ) = (r c) A: i r and r + c i+j
Each zone contains . Mo eove , f om the de nition of the class F , it follows
that, if does not belong to F , the e a e two consecutive zones which do not
contain any point of F . Fo example, the point in Fig. 4 does not belong to F ,
and the inte section between F and Z5 ( ) Z6 ( ) is the empty set. By setting
Zk ( ) = Zk ( ) Zk+1 ( ), with k = 1 5, and Z6 ( ) = Z6 ( ) Z1 ( ), we
obtain:
Property 1. Let be a point of the smallest disc ete hexagon A containing a
disc ete set F of F . The point F if and only if Zk ( ) F = , fo each
1 k 6.
We can dete mine some F points just by efe ing to the geomet y of A. Let
I1 = (l2 − n + 1 n) I2 = (l1 1) J1 = (1 l1 ) J2 = (m l2 − m + 1) K1 =
(m 1) K2 = (1 n). These points a e the ve tices of A as shown in Fig. 5. Let

J1 j2
K2

k1
Q1
I2 i2

Q
i1 I1
Q2
k2

K1 j1 J2

Fig. 5. The points 1 and 2 belonging to eve y disc ete set of F contained in
hexagon A.

it jt and kt be the ow, column and diagonal index containing It Jt and Kt ,


espectively, with t = 1 2. Note that, i2 = j1 = l1 , i1 = l2 −n+1, j2 = l2 −m+1,
k1 = m and k2 = n. The hexagon illust ated in Fig. 5 is such that: i1 > i2 j1 <
j2 and k1 > k2 . Since A is the smallest disc ete hexagon containing F , then the
sides I2 J1 and J1 K2 contain at least a base of F . So, if = (i j) A is such
that i i2 o k k2 , with k = i + j − 1, then Z1 ( ) F = (see Fig. 5). We
p oceed in the same way fo the othe ve pai s of consecutive sides:
204 Elena Barcucci et al.

i i1 o j j1 Z2 ( ) F =
j j2 o k k2 Z3 ( ) F =
i i1 o k k1 Z4 ( ) F =
i i2 o j j2 Z5 ( ) F =
j j1 o k k1 Z6 ( ) F =

whe e k = i + j − 1. The points 1 = (i2 j1 ) and 2 = (i1 j2 ) ve ify these six


inequalities and so, by P ope ty 1, 1 and 2 belong to F . Fig. 6 illust ates

J1 K2
J1 K2
J1 K2
Q2 Q2
Q1 C2 I1
I2
C1 I1
I2 Q1 Q1
Q2 I1 I2

K1 J2 K1 J2 K1 J2
J1 K2 J1
J1 K2
K2
Q1 I2 Q2
I2 I1 I1
Q2
I1 Q1 I2
Q2 Q1
K1 J2 K1 K1 J2
J2

(a) (b) (c)

Fig. 6. The six con gu ations of hexagon A.

the six allowed con gu ations of A and the points 1 and 2 belonging to each
disc ete set of F contained in A. We can divide these con gu ations into th ee
g oups:
a. if i2 i1 j1 j2 , then 1 = (i2 j1 ) and 2 = (i1 j2 ) (see Fig. 6a));
b. if j2 j1 k2 k1 , then 1 = (k1 − j2 + 1 j2 ) and 2 = (k2 − j1 + 1 j1 )
(see Fig. 6b));
c. if i1 i2 k1 k2 , then 1 = (i2 k1 − i2 + 1) and 2 = (i1 k2 − i1 + 1) (see
Fig. 6c)).
We efe to these con gu ations as case a, case b and case c. We notice that, if
i1 = i2 , j1 = j2 and k1 = k2 , we nd one point ( 1 = 2 ) of F , and if i1 = i2
o j1 = j2 o k1 = k2 , we nd mo e than two F points.
Let us now dete mine a 6-path f om 1 to 2 made up of F points. In case a,
1 = (i2 j1 ) and the two points P1 = (i2 + 1 j1 ) and P2 = (i2 j1 + 1) adjacent
to 1 a e such that:
Zk (P1 ) F = , fo k = 1 2 3 4 6, and Zk (P2 ) F = , fo k = 1 3 4 5 6,
(see Fig. 6a)). F om P ope ty 1, we can deduce that:
Reconstruction of Discrete Sets from Three or More X-Rays 205

if Z5 (P1 ) F = , then P1 F, and if Z2 (P2 ) F = , then P2 F.

We p ove that P1 F o P2 F . Let us onside the following cumulated sums


of the ow, column and diagonal X- ays:
k
H0 = 0 Hk = i=1 hi k=1 m
k
V0 = 0 Vk = i=1 vi k=1 n
k
Dl1 −1 = 0 Dk = i=1 di k = l1 l2

and denote the common total sums of the ow, column and diagonal X- ays by
S. We have that:

Lemma 1. Let = (i j) be a point of exagon A containing a discrete set F


of F .

If Hi S − Di+j−1 , t en Z1 ( ) F =
If Hi Vj−1 , t en Z2 ( ) F =
If S − Di+j−2 Vj−1 , t en Z3 ( ) F =
If S − Di+j−2 Hi−1 , t en Z4 ( ) F =
If Vj Hi−1 , t en Z5 ( ) F =
If Vj S − Di+j−1 , t en Z6 ( ) F =

Proof. We denote the st j columns and the st i − 1 ows of the set F by


Cj and Ri−1 , espectively. If Z5 ( ) F = , then Cj Ri−1 and so Vj < Hi−1
(see Fig 7). The efo e, if Vj Hi−1 , then Z5 ( ) F = We obtain the othe

Cj
1 j n
1

Ri -1

i -1
Q
i
Z5(Q)

m A

Fig. 7. A point F and Z5 ( ) F = .

statements in the same way.

Theorem 1. T ere exists a 6-pat from 1 to 2 made up of F points.


206 Elena Barcucci et al.

Proof. By lemma 1 and the p evious discussion:

if Vj1 Hi2 then P1 = (i2 + 1 j1 ) F and if Hi2 Vj1 then P2 = (i2 j1 + 1) F

Since Vj1 Hi2 o Hi2 Vj1 , we have P1 F o P2 F . We can epeat the same
ope ation on the two points adjacent to the point Pk F (with k = 1 o k = 2)
dete mined in the p evious step. We wish to point out that if Vj1 = Hi2 , then
P1 F and P2 F . In this case, Zk (i2 + 1 j1 + 1) F = , fo each 1 k 6.
So (i2 + 1 j1 + 1) F and we epeat the ope ation on its two adjacent points.
We pe fo m this p ocedu e until it dete mines a point P F which belongs to
the ow o the column containing 2 = (i1 j2 ) (i.e., P = (i1 j) o P = (i j2 )).
Eve y point between P and 2 = (i1 j2 ) is such that Zk ( ) F = , fo each
1 k 6 and so F . By means of this p ocedu e, we a e able to nd a
6-path f om 1 to 2 made up of F points.

Fo example, if H = (1 4 5 7 9 7 5 3 2) and V = (1 3 3 4 6 4 6 5 4 3 3 1),


we obtain the 6-path f om 1 to 2 shown in Fig. 8. We t eat the othe two cases
in the same way. F om Lemma 1, it follows that we have to use the cumulated
sums Vj and S − Di+j−1 in case b, and Hi and S − Di+j−1 in case c.

Vj
D
1 4 7 11 17 21 27 32 36 39 42 43 0 0 1 3 4 3 2 3 2 3 2 2
1 1
B1 B1 3
5 4
10
Q1 P2 5 α=β=F 4
5
17 R1 P1 6-path
7
5
Hi 26 H 9
P Q2 1
33 7
0
38 5
B2 B2 0
41 R2 3
43 2
1 3 3 4 6 4 6 5 4 3 3 1
V
(a) (b)

Fig. 8. a) The 6-path f om 1 to 2 made up of F points. b) The disc ete set


= =F F.

Let us now take the bases into conside ation. Each side of hexagon A contains
at least one base of F . Let B1 and B2 be a base of the sides I2 J1 and a base of
I1 J2 , espectively. In case a, Bi and i , with i = 1 o i = 2, de ne a disc ete
ectangle Ri such that: if Ri , then Zk ( ) F = , fo each 1 k 6 and
so F (see Fig. 8). Notice that Ri can degene ate into a disc ete segment
when Bi and i belong to the same ow o column. The efo e, we obtain a
hex-connected set made up of F points and connecting two opposite sides of
A. Unfo tunately, we do not usually know the positions of the bases and so ou
algo ithm chooses a pai of base-points (B1 B2 ) belonging to two opposite sides
Reconstruction of Discrete Sets from Three or More X-Rays 207

of A, and then attempts to const uct a disc ete set F of F whose X- ays a e
equal to H V and D. If the econst uction attempt fails, the algo ithm chooses
a di e ent position of the base-points (B1 B2 ) and epeats the econst uction
ope ations. Mo e p ecisely, the algo ithm dete mines 1 , 2 and the 6-path f om
1 to 2 ; afte that it chooses:
- a point B1 I2 J1 and a point B2 I1 J2 in case a, o
- a point B1 K1 J2 and a point B2 K2 J1 in case b, o
- a point B1 I2 K1 and a point B2 I1 K2 in case c,
and then econst ucts the ectangle Ri de ned by i and Bi , with i = 1 2. I
then uses the same econst uction p ocedu e desc ibed in the algo ithm de ned
in [3], that is, it pe fo ms the lling operations in the di ections (1 0) (0 1)
(1 1) and, if necessa y, links ou p oblem to the 2-Satis ability p oblem which
can be solved in linea time [2].
We now desc ibe the main steps of this econst uction p ocedu e. We call any
set such that F , a kernel, and we call any set , such that F A,
a s ell. Assuming that is the econst ucted hex-connected set f om B1 to B2
and is hexagon A, we pe fo m the lling ope ations that expand and educe
. These ope ations take advantage of both the convexity const aint and vecto s
H V D, and they a e used ite atively until o and a e inva iant with
espect to the lling ope ations.
If we obtain , the e is no disc ete set of F containing B1 and B2 and having
X- ays H V D. The efo e, the algo ithm chooses anothe pai of base-points and
pe fo ms the lling ope ations again.
If = , then = F and so the e is at least one solution to the p oblem
(the algo ithm econst ucts one of them). Fo example, by pe fo ming the lling
ope ations on the hex-connected set f om B1 to B2 in Fig 8, we obtain = ,
and is a disc ete set having X- ays H V D.
Finally, if we obtain , then − is a set of indete minate points and
we a e not yet able to say that a set F having X- ays H V D exists. The efo e,
we have to pe fo m anothe ope ation to establish the existence of F . At st,
is a hex-connected set f om B1 to B2 , whe e B1 to B2 belong to two opposite
sides of A; the efo e by pe fo ming the lling ope ations, we obtain:
- has at least one point in each diagonal of A in case a;
- has at least one point in each ow of A in case b;
- has at least one point in each column of A in case c.
Assume that we have case b: the e is at least one point of in each ow of A,
and so by the p ope ties of the lling ope ations (see [3]), the length of the i-th
ow of is smalle than 2hi fo each 1 i m. If we a e able to p ove that:
I) the length of the j-th column of is equal to, o less than, 2vj fo each
1 j n;
II) the length of the k-th diagonal of is equal to, o less than, 2dk fo each
1 k n + m − 1,
the e is a polynomial t ansfo mation of ou econst uction p oblem to the 2-
Satis ability p oblem. We st p ove (I) and (II), and we then outline the main
208 Elena Barcucci et al.

1 j n
1

(i,j) (i,j+hi )
i α

A
m
β−α α

Fig. 9. The ke nel, the shell and the set of the indete minate points.

idea of the eduction to 2-Satis ability p oblem de ned in [3]. By the p ope -
ties of lling ope ations, the indete minate points follow each othe into two
sequences, one on the left side of and the othe on its ight side. If (i j) is
an indete minate point of the left sequence, then (i j + hi ) belongs to the ight
sequence (see Fig. 9) and these ponts a e elated to each othe ; let us assume
that the e is a disc ete set F of F having X- ays equal to H V D, then:
if (i j) F , then (i j + hi ) F,
if (i j) F , then (i j + hi ) F.
As a esult, the numbe of indete minate points belonging to F is equal to the
numbe of indete minate points not belonging to F . This means that, in o de to
the conditions given by ho izontal X- ay be satis ed, half of the indete minate
points have to be in F . If the e is at least a j such that j-th column of is la ge
than 2vj , the numbe of its indete minate points belonging to F has to be less
than the numbe of its indete minate points not belonging to F . The efo e, less
than half of the indete minate points a e in F . We got a cont adiction and so
satis es (I). By p oceeding in the same way, we p ove that satis es (II).
Consequently, we can educe ou p oblem to a 2-Satis ability p oblem. We p ove
the same esult fo the cases a and c in the same way. The efo e, the algo ithm
solves the p oblem fo all H V D instances.
We can summa ize the econst uction algo ithm as follows.
Input: Th ee vecto s H INm , V INn and D INn+m−1 ;
Output: a disc ete set of F such that X(1 0) F = H X(0 1) F = V X(1 1) F = D
or a message that the e is no a set like this;
1. check if H V and D satisfy conditions (3.1)and (3.3);
2. compute the points 1 and 2 ;
3. compute the cumulated sums Hi Vj Dk , fo i = 1 m j=1 n and
k=1 n + m − 1;
4. compute the 6-path f om 1 to 2 ;
5. repeat
Reconstruction of Discrete Sets from Three or More X-Rays 209

5.1. choose a pai of base-points (B1 B2 ) belonging to two


opposite sides of A;
5.2. compute the ectangles R1 and R2 ;
5.3. := R1 R2 the 6-path f om 1 to 2 ;
5.4. := A;
5.5. repeat
5.5.1. perform the lling ope ations;
until or a e inva iants;
5.7. if = then F = is a solution;
5.8. if then educe ou p oblem to 2SAT;
until the e is a disc ete set of F having X- ays equal to H V D or all the
base-point pai s have been examined.
We now examine the complexity of the algo ithm desc ibed. Dete mining the
hex-connected set f om B1 to B2 (i.e., 1 , 2 , the 6-path f om 1 to 2 , and
the ectangles R1 and R2 ) involves a computational cost of O(nm). In [13], the
autho p oposes a simple p ocedu e fo pe fo ming the lling ope ations whose
computational cost is O(nm(n + m)). This p ocedu e gives a ke nel and a shell
inva iant with espect to the lling ope ations. If we obtain , the algo ithm
t ansfo ms ou p oblem into a 2-Satis ability p oblem and solves it in O(nm)
time. In case of failu e, that is, when the e is no disc ete set of F containing
B1 and B2 and having X- ays H V D, the algo ithm chooses anothe pai of
base-points and pe fo ms the lling ope ations again. At most, it has to check
all the possible base-point pai s, that is O((n + m)2 ) pai s. Consequently, the
algo ithm decides if the e is a disc ete set of F having X- ays H V D; if so, the
algo ithm econst ucts one of them in O(nm(n + m)3 ) time.
Theorem 2. Consistency((1 0) (0 1) (1 1)) on F can be solved in
O(nm(n + m)3 ) time.

Remark 1. The algo ithm can be easily extended to contexts having mo e than
th ee X- ays and can econst uct disc ete sets convex in the di ections of the X-
ays. This means that Consistency((1 0) (0 1) (1 1) u4 uk ) on the class of
connected sets, which a e convex in all the di ections (1 0) (0 1) (1 1) u4 uk ,
is solvable in polynomial time.

References
1. R. P. Anstee, Invariant sets of arcs in network flow problems, Disc. Appl. Math.
13, 1-7 (1986).
2. B. Aspvall, M.F. Plass and R.E. Tarjan, A linear-time algorithm for testing the
truth of certain quanti ed Boolean formulas , Inf. Proc. Lett., 8, 3 (1979).
3. E. Barcucci, A. Del Lungo, M. Nivat and R. Pinzani, Reconstructing convex
polyominoes from their horizontal and vertical projections , Theor. Comp. Sci.,
155 (1996) 321-347.
4. M. Chrobak, C. Dürr, Reconstructing hv-Convex Polyominoes from Orthogonal
Projections , Inf. Proc. Lett. (69) 6, (1999) 283-289.
210 Elena Barcucci et al.

5. A. Denise, C. Dürr and F.Ibn-majdoub-Hassani, Enumeration et generation


aleatoire de polyominos convexes en reseau hexagonal , Proc. of the 9th FPSAC,
Vienna, 222-235 (1997).
6. R. J. Gardner, Geometric Tomography, Cambridge University Press, Cambridge,
UK, 1995, p.51.
7. R. J. Gardner and P. Gritzmann, Uniqueness and Complexity in Discrete Tomog-
raphy , in discrete tomography: foundations, algorithms and applications, editors
G.T. Herman and A. Kuba, Birkhauser, Boston, MA, USA, (1999) 85-111.
8. R. J. Gardner, P. Gritzmann and D. Prangenberg, On the computational com-
plexity of reconstructing lattice sets from their X-rays, Disc. Math. 0 , (1999)
45-71.
9. M. R. Garey and D.S. Johnson, Computers and intractability: a guide to the theory
of NP-completeness, Freeman, New York, (1979) 224.
10. P. Gritzmann, Open problems, in Discrete Tomography: Algorithms and Com-
plexity, Dagstuhl Seminar report 165 (1997) 18.
11. P. C. Hammer, Problem 2, in Proc. Simp. Pure Math. vol VII: Convexity, Amer.
Math. Soc., Providence, RI, (1963) 498-499.
12. G.T. Herman and A. Kuba, Discrete Tomography: A Historical Overview , in dis-
crete tomography: foundations, algorithms and applications, editors G.T. Herman
and A. Kuba, Birkhauser, Boston, MA, USA, (1999) 3-29.
13. M. Gebala :The reconstruction of convex polyominoes from horizontal and vertical
projections, private comunication.
14. R. W. Irving and M. R. Jerrum, Three-dimensional statistical data security prob-
lems, SIAM Journal of Computing 3, 170-184 (1994).
15. C. Kisielowski, P. Schwander, F. H. Baumann, M. Seibt, Y. Kim and A. Ourmazd,
An approach to quantitative high-resolution transmission electron microscopy of
crystalline materials, Ultramicroscopy 58, 131-155 (1995).
16. G. P. M. Prause and D. G. W. Onnasch, Binary reconstruction of the heart cham-
bers from biplane angiographic image sequence, I Trans. Medical Imaging 15,
532-559 (1996).
17. A. R. Shliferstein and Y. T. Chien, Switching components and the ambiguity
problem in the reconstruction of pictures from their projections, Pattern Recog-
nition 10, 327-340 (1978).
18. X. G. Viennot, A Survey of polyomino enumeration, Proc. Series formelles et
combinatoire algebrique, eds. P. Leroux et C. Reutenauer, Publications du LACIM
11, Universite du uebec a Montreal (1992).
Modi ed Binary Searching for Static Tables

Donatella Merlini, Renzo Sprugnoli, and M. Cecilia Verri

Dipartimento di Sistemi e Informatica - Universita di Firenze


Via C. Lombroso 6/17 - 50134 Firenze - Italy
merlini, sprugnoli, verri @dsi.unifi.it

Abstract. We consider a new method to retrieve keys in a static table.


The keys of the table are stored in such a way that a binary search can
be performed more e cently. An analysis of the method is performed
and empirical evidence is given that it actually works.

Keywords: binary searching, static dictionary problem.

1 Introduction
The present paper was motivated by some obvious observations concerning bi-
nary searching when applied to static sets of strings; by following the directions
indicated by these observations, we obtained some improvements that reduce
execution time for binary searching of 30 − 50%, and also more for large tables.
Let us consider the set of the zodiac names

S = capricorn acquariu pi ce arie tauru gemini cancer leo

virgo libra corpio agittariu


and consider the problem of nding out if identi er x belongs or does not belong
to S To do this, we can build a lexicographically ordered table T with the names
in S and then use binary searching to establish whether x S or not (see Table
1(a)). What is ne in binary searching is that, if the program is properly realized,
the number of character comparisons is taken to a minimum. For instance, the
case of arie is as follows: 1 comparison (a against l) is used to compare arie
and the median element leo; 1 comparison (a against c) is used to distinguish
arie from cancer; 2 comparisons (ar against ac) are used for arie against
acquariu . After that, the searched string is compared to the whole item arie .
The problem here is that we do not know in advance how many characters
have to be used in each string-to-string comparison; therefore, every string com-
parison requires a (comparatively) high number of housekeeping instructions,
which override the small number of character comparisons and make the match-
ing procedure relatively time consuming.
The aim of this paper is to show that the housekeeping instructions can be al-
most eliminated by arranging the elements in a suitable way. First of all we have
to determine the minimal number of characters able to distinguish the elements

G. Bongiovanni, G. Gambosi, R. Petreschi ( ds.): CIAC 2000, LNCS 1767, pp. 211 225, 2000.

c Springer-Verlag Berlin Heidelberg 2000
212 Donatella Merlini, Renzo Sprugnoli, and M. Cecilia Verri

Table 1. Two table’s structures for zodiac names


(a (b : s=1
1 acquarius 1 a cquarius
2 aries 1 c ancer
3 cancer 2 leo
4 capricorn 1 l ibra
5 gemini 1 p isces
6 leo 5 capr i corn
7 libra 1 s agittarius
8 pisces 1 t aurus
9 sagittarius 2 s c orpio
10 scorpio 1 a ries
11 taurus 1 g emini
12 virgo 1 v irgo

of the set: in the previous example the rst three characters are su cient to
univocally distinguish strings in S and this is not casual. In fact, if we have a set
S containing n strings on the alphabet with = what we are expecting
is that log n characters actually distinguish the elements in S The problem
is that in general these characters can be in di erent positions and we have to
determine, string by string, where they are. What we wish to show here is that
for any given set S, it is relatively easy to nd out whether the elements in S can
be distinguished using a small amount of characters. In that case, we are then
able to organize the set S in a table T such that a modi ed binary searching can
be performed with a minimum quantity of housekeeping instructions, thus im-
proving the traditional searching procedure. Presently, we are also investigating
how modi ed binary search compares to the method proposed by Bentley and
Sedgewick [1]. To be more precise, in Section 2 we show that a pre-processing
algorithm exists which determines (if any) the optimal organization of the ta-
ble for the given set S; the time for this pre-processing phase is in the order
of n(log n)2 Then, in Section 3, we nd the probability that the pre-processing
algorithm gives a positive result, i.e., it nds the optimal table organization
for set S Finally, in Section 4, an actual implementation of our method is dis-
cussed and empirical results are presented showing the improvement achieved
over traditional binary searching.

2 The Pre-processing for Modi ed Binary Search

Let us consider a set S of n elements or keys, each u characters long (some


characters may be blank). We wish to store these elements in a table T in which
binary searching can be performed with a minimum quantity of housekeeping
instructions. To this purpose we need a vector V [1 n] containing, for every
i=1 2 n, the position from which the comparison, relative to the element
Modi ed Binary Searching for Static Tables 213

T [i], has to begin. A possible Pascal-like implementation of this Modi ed Binary


Searching (MBS) method can be the following:

function MBS(st :string): integer;


var a b k p: integer; f ound:boolean;
begin a := 1; b := n; f ound:=false;
while a b do
k := (a + b) 2 ;
p:=Compare(st , T [k] V [k] s);
if p = 0
then f ound:=true; a := b + 1
else if p < 0
then b := k − 1
else a := k + 1 od;
if f ound and T [k] = st then MBS:=k else MBS:=0
end;

Here, the function Compare(A,B,i,s) compares the two strings A and B ac-
cording to s characters beginning from position i; the result is −1, 0 or 1 accord-
ing to the fact that the s characters in A are lexicographically smaller, equal or
greater than the corresponding characters in B.
The procedure, we are going to describe to build table T and vector V , will
be called PerfectDivision(S,n,s), where S and n are as speci ed above and s is
the number of consecutive characters used to distinguish the various elements
(in most cases 1 s 4 is su cient). It returns a table T of dimension n the
optimal table, containing S’s elements and a vector V of dimension n such that,
for all i from 1 to n V [i] indicates the position from which the comparison has to
begin. More precisely, the elements in table T are arranged in such a way that,
if we binary search for the element x = x1 xu and compare it with element
T [i] = T [i]1 T [i]u we have only to compare characters xV [i] xV [i]+s−1 with
characters T [i]V [i] T [i]V [i]+s−1
The procedure consists of two main steps, the rst of which tries to nd out
the element in S which determines the subdivision of the keys into two subtables.
The second step recursively calls PerfectDivision to actually construct the two
subtables.
We begin by giving the de nition of a selector:
[s]
De nition 1 Let S be a set containing n u-length strings. The pair (i =
a 1 a2 as ) i [1 u − s + 1] is an s[i] -selector for S i :

1) !w S : wi wi+1 wi+s−1 = a1 a2 as ;
2) (n − 1) 2 keys y S : yi yi+1 yi+s−1 < a1 a2 as ;
3) (n − 1) 2 keys y S : yi yi+1 yi+s−1 > a1 a2 as

Theorem 1 Let S be a set containing n u-length strings. If n = 1 or n = 2


then S always admits a selector.
214 Donatella Merlini, Renzo Sprugnoli, and M. Cecilia Verri

Proof: If S = x then it admits the 1[1] -selector (1 x1 ) If S = x y , it admits


a 1[p] -selector (p zp ) with p = min i [1 u] : xi = yi and zp = min xp yp
(such a p exists since x and y are distinct elements). We observe that if S admits
a 1[p] -selector then it also admits an s[q] -selector p + 1 − s q p

More in general, for n > 2 we can give an integer function FindSelector(var


T, a, b, s) to determine whether or not the elements from position a to position
b in a table T containing b − a + 1 u-length strings, admit an s-selector; after a
call to this function, if the selector has been found and k = (a + b) 2 is the
index of the median element, then the elements in T are arranged in such a way
that:
1. the median element T [k] contains [s] starting at position i;
2. the elements T [a] T [k − 1] are less than T [k] when we compare the
characters from position i to position i + s − 1 with [s] ;
3. the elements T [k + 1] T [b] are greater than T [k] when we compare the
characters from position i to position i + s − 1 with [s] .
The function returns the position i = sel at which the selector begins, if any, 0
otherwise. This value is stored in vector V from procedure PerfectDivision after
FindSelector is called.
FindSelector calls, in turn, a procedure, Select(var T, a, b, sel, s), and a
boolean function, qual(x, y, sel, s), which operate in the following way:

procedure Select arranges the elements in table T from position a to position


b by comparing, for each element, the s characters beginning at position
sel; the arrangement is such that the three conditions 1., 2. and 3. above are
satis ed. As is well known, the selection problem to nd the median element
can be solved in linear time; however, in our algorithm we decided to use a
heapsort procedure to completely sort the elements in T . This is slower, but
safer1 than existing selection algorithms, and the ordering could be used at
later stages;
function qual compares the s characters of the strings x y beginning at
position sel and returns true if the compared characters are equal, false
otherwise. It corresponds to Compare(x,y,sel,s)= 0 above.

Function FindSelector(var T,a,b,s): integer;


var sel i: integer;
begin
sel := 1;
if b − a 2 then
FindSelector:=0;
i := (a + b) 2 ;
1
The algorithm of Rivest and Tarjan [3, vol. 3, pag. 216], can only be applied to large
sets; the algorithm of selection by tail recursion [2, pag. 230] is linear on the average
but is quadratic in the worst case.
Modi ed Binary Searching for Static Tables 215

while sel (u − s + 1) do
Select(T, a, b, sel, s);
if Equal(T [i] T [i + 1] sel s) or Equal(T [i] T [i − 1] sel s) then
sel := sel + 1
else
FindSelector:=sel;
sel := u + 1 to force exit from the loop

od
else if b − a = 1 then
while sel (u − s + 1) do
Select(T,a,b,sel,s);
if Equal(T [a] T [b] sel s) then
sel := sel + 1
else
FindSelector:=1;
sel := u + 1 to force exit from the loop

od
FindSelector:=sel
else
FindSelector:=sel

end FindSelector ;

Procedure PerfectDivision(var T,a,b,s);


var k i: integer;
begin
k :=FindSelector(T, a, b, s);
if k = 0 then
fail
else
i := (a + b) 2 ;
V [i] := k;
if a < i then PerfectDivision(T, a, i-1, s) ;
if i < b then PerfectDivision(T, i+1, b, s)

end PerfectDivision ;

If we want to apply PerfectDivision to a set S of n u-length strings, we rst


store S’s elements in a table T , hence call PerfectDivision(T,1,n,s) by choosing
an appropriate value for s (in practice, we can start with s = 1 and then increase
its value if the procedure fails).
This procedure applied to the set of zodiac names by using s = 1 returns the
table T and the vector V as in Table 1(b). Let us take into consideration again
the case of arie : we start with the median element and since V [6] = 5 we have
216 Donatella Merlini, Renzo Sprugnoli, and M. Cecilia Verri

to compare the fth character of arie , , with the fth character of capricorn,
i; >i and we proceed with the new median element corpio. Since V [9] = 2
we compare r with c; in the next step the median element is gemini and we
compare the rst character of arie with the rst one of gemini. After that, the
searched string is compared to the whole string arie and the procedure ends
with success.
For the set of the rst twenty numbers names’ the procedure fails by using
s = 1 and returns the situation depicted in Table 2 by choosing s = 2

Table . Table T and vector V when s = 2 for the rst twenty numbers’ names

V T
1 1 te n
2 3 th re e
3 5 seve n
4 5 seve nt een
5 2 n in e
6 1 ve
7 1 on e
8 1 si x
9 1 tw o
10 4 nin et een
11 1 th irteen
12 2 f if teen
13 5 eigh t
14 5 eigh te en
15 2 s ix teen
16 3 tw el ve
17 3 tw en ty
18 3 el ev en
19 4 fou r
20 4 fou rt een

The complexity of PerfectDivision is given by the following Theorem:

Theorem If S is a set of words having length n and we set d = u − s + 1


then the complexity of the procedure PerfectDivision is at most ln 2dn log22 n if
we use Heapsort as a selection algorithm, and O(dn log2 n) on the average if we
use a linear selection algorithm.

Proof: The most expensive part of PerfectDivision is the selection phase, while
all the rest is performed in constant time. If we use HeapSort as an in-place
sorting procedure, the time is An log2 n where A 2 ln 2 This procedure is
Modi ed Binary Searching for Static Tables 217

executed at most d times. Let Cn be the total time for nding a complete division
for our set of n elements; clearly we have:

Cn Adn log2 n + 2Cn 2

There are general methods, see [4], to nd a valuation for Cn ; however, let us
suppose n = 2k , for some k and set ck = C2k ; then we have ck = Ad2k k + 2ck−1
By unfolding this recurrence, we nd:

ck = Ad2k k + 2Ad2k−1 (k − 1) + 4Ad2k−2 (k − 2) + =

= Ad2k (k + (k − 1) + + 1) = Ad2k−1 k(k + 1)


Returning to Cn :
n
Cn Ad log2 n(log2 n + 1) = O(dn(log2 n)2 )
2
Obviously, if we use a linear selection algorithm the time is reduced by a factor
O(log2 n). Finally, we observe that d log2 n and this justi es our statement in
the Introduction.

The next section is devoted to the problem of evaluating the probability that
a perfect subdivision for the table T exists.

3 The Analysis
We now perform an analysis of the pre-processing phase of modi ed binary
searching. Let = 1 2 be a nite alphabet and let its elements be
ordered in some way, for instance 1 < 2 < < . In real situations, can
be the Latin alphabet, = 26 or = 27 (if we also consider the space) and its
ordering is the usual lexicographical order; so, on a computer, is a subset of
ASCII codes and if we have s = 3 then we should consider triples of letters. We
can now abstract and de ne A = s , for the speci c value of s, with the order
induced by the order in . Then we obscure s and set A = a1 a2 a ,
where = s . Finally, we observe that if S is any set of strings, S u
, for a
given starting index the subwords of the words in S beginning at that position
and composed by s consecutive characters are a multiset S over A What we are
looking for, is a suitable division of this multiset.
Therefore, let us consider the multiset S with S = n over our abstract
alphabet A Our problem is to nd the probability that an element am A exists
such that: i) am S but has no duplicate in S; ii) there are exactly (n+1) 2 −1
elements ai S such that ai < am (i.e., these elements constitute a multiset over
a 1 a2 am−1 ); iii) there are exactly (n + 1) 2 elements aj in S such that
aj > am (i.e., these elements constitute a multiset over am+1 a ). In a
more general way we can de ne the following three (m p)-separation conditions
for a multiset S as above, relative to an element am A (see also De nition 1):
218 Donatella Merlini, Renzo Sprugnoli, and M. Cecilia Verri

i) am S and has no duplicate in S;


ii) there are (p − 1) elements in S preceding am ;
iii) there are (n − p) elements in S following am

Our rst important result is the following:

Theorem 3 Let (n p m) be the probability that a multiset S on A (S and


A as above) contains a speci c element am for which the (m p)-separation con-
ditions are satis ed; then
 
p n
(n p m) = n (m − 1)p−1 ( − m)n−p
p

Proof: Let us count the total number P (n p m) of the multisets satisfying


the three (m p)-separation conditions for a given element am A (1 m ).
If we imagine that the elements in S are sorted according to their order sequence
in A, the rst p − 1 elements must belong to the subset a1 a2 am−1 , and
therefore (m − 1)p−1 such submultisets exist. In the same way, the last n − p
elements in S must belong to am+1 a and therefore ( − m)n−p such
submultisets exist. Since every rst part can be combined with each second
part, and the pth character must equal am by hypothesis, there exists a total
of (m − 1)p−1 ( − m)n−p ordered multisets of the type described. The original
multisets, however, are not sorted, and the elements may assume any position
in S. The di erent combinations are counted by a simple trinomial coe cient,
which reduces to a binomial coe cient:
   
n n! n! n
= =p =p
p−1 1 n−p (p − 1)! 1! (n − p)! p! (n − p)! p

We conclude that the total number of multisets we are looking for is:
 
n
P (n p m) = p (m − 1)p−1 ( − m)n−p
p
n
Finally, the corresponding probability is found by dividing this expression by
the total number of multisets with n elements.

An immediate corollary of this result gives us the probability that, given S


an element am exists that satis es the (m p)-separation conditions:

Theorem 4 Given a multiset S as above, the probability (n p ) that an ele-


ment am A exists for which the (m p)-separation conditions hold true is:
 X
n
p
(n p ) = n (m − 1)p−1 ( − m)n−p
p m=1
Modi ed Binary Searching for Static Tables 219

Proof: Obviously, any am A could satisfy the separation conditions, each


one in an independent way of any other (a single multiset can contain several
elements satisfying the conditions). Therefore, the probability is the sum of the
single probabilities.

The formula of this theorem is not very appealing and does not give an
intuitive idea of how the probability (n p ) varies with p; in particular, what
it is when p = (n + 1) 2 the case we are interested in. In order to arrive
at a more intelligible formula, we are going to approximate it by obtaining its
asymptotic value. To do that, we must suppose n < and this hypothesis
will be understood in the sequel. In real situations, where = A = 26s or
= A = 27s we simply have to choose a selector of s character with:

s=1 for tables up to n = 22 elements;


s=2 for tables up to n = 570 elements;
s=3 for tables up to n = 15 000 elements;
s=4 for tables up to n = 400 000 elements.

These numbers correspond to n 0 85 if A = 26s and to n 0 8 if


s
A = 27 . Obviously, a selector with s > 4 can result in a worsening with
respect to traditional binary searching. On the other hand, very large tables
should reside in secondary storage.
A curious fact is that the probabilities (n p ) are almost independent of
p; as we are now going to show, the dependence on p can only be appreciated
when p is very near to 1 or very near to n

Theorem 5 The probability (n p ) of the previous theorem has the following


asymptotic approximations (n < ):
 n   3 
1 n
(n p ) = 1 − 1+O 3
when 2 < p < n − 2; (3.1)

 n   3 
1 n(n − 1) n
(n 2 ) = (n n − 1 ) = 1− 1− + O ; (3.2)
12( − 1) 2 3

(n 1 ) = (n n ) =
 n   3 
1 n n(n − 1) n
= 1− 1− − +O (3.3)
2( − 1) 12( − 1) 2 3

Proof: PLet us apply the uler-McLaurin summation formula to approximate


the sum m=1 (m − 1)p−1 ( − m)n−p By writing m as the continuous variable
x, we have:

X −1
X
(m − 1)p−1 ( − m)n−p = (m − 1)p−1 ( − m)n−p =
m=1 m=1
220 Donatella Merlini, Renzo Sprugnoli, and M. Cecilia Verri
Z
B2 0
= (x − 1)p−1 ( − x)n−p dx + B1 [f (x)]1 + [f (x)]1 +
1 2!
Here, f (x) = (x−1)p−1 ( −x)n−p , and we immediately observe that for p > 1 we
have [f (x)]1 = 0. In the same way, the rst (p − 1) derivatives are 0 and, in order
to nd suitable approximations for (n p ), we are reduced to the three cases
p = 1 p = 2 and p > 2. By symmetry, we also have (n p ) = (n n − p + 1 ).
Since f (x) is a simple polynomial in x, the integral can be evaluated as
follows:
Z Z −1
(x − 1) ( − x)
p−1 n−p
dx = y p−1 ( − 1 − y)n−p dy =
1 0
!
Z −1 X
n−p
n−p

= y p−1
( − 1) (−y)
k n−p−k
dy =
0 k
k=0

X
n−p
n−p
 Z −1
= ( − 1)k (−1)n−p−k y n−k−1 dy =
k 0
k=0

X
n−p
n−p

( − 1)n−k X n − p (−1)k
n−p
= ( − 1)k (−1)n−p−k = ( − 1)n =
k n−k k p+k
k=0 k=0
 −1
1 ( − 1)n n
= ( − 1)n
=
p n−p+p
n−p
p p

The last sum in this derivation is well-known and represents the inverse of a
binomial coe cient (see Knuth [3, vol. 1, pag. 71]). Therefore we obtain formula
(3.1) for 2 < p < n−2 For p = 2 we have a contribution from the rst derivative:
 
[f 0 (x)]1 = ( − x)n−2 − (x − 1)(n − 2)( − x)n−3 1 = −( − 1)n−2

and therefore we have formula (3.2). Finally, for p = 1 we also have a contribu-
tion from [f (x)]1 and nd formula (3.3).

Table 3 illustrates the probabilities (n p ) for n = 12 and = 20. We


used a computer program for simulating the problem, and in the table we show
the probabilities found by simulation, the exact probabilities given by Theorem
4, and the approximate probabilities as computed by formulas (3.1), (3.2) and
(3.3).
Formula (3.1), in the short version (n p ) = (1 − 1 )n allows us to obtain
an estimate of the probability of nding a division with p = (n+1) 2 for a given
set S of identi ers or words over some language. In general, several positions d
of the identi ers are available for a division (function FindSelector nds the rst
one), and this increases the probability of success. Usually, if s is the length of
the selector and u the identi ers’ length, we have d = u − s + 1 and in general:
Modi ed Binary Searching for Static Tables 221

Table 3. Simulation, exact and approximate probabilities

SIMULATION FOR MODIFI D BINARY S ARCHING

Number of alphabet elements (r): 20


Number of multiset elements (n): 12
Number of simulation trials: 30000

p simulat. exact approx.

1 0.72176667 0.72739722 0.72746538


2 0.51906667 0.52409881 0.52389482
3 0.54113333 0.54015736 0.54036009
4 0.53856667 0.54042599 0.54036009
5 0.53563333 0.54036133 0.54036009
6 0.54176667 0.54035984 0.54036009
7 0.54300000 0.54035984 0.54036009
8 0.54173333 0.54036133 0.54036009
9 0.53923333 0.54042599 0.54036009
10 0.53963333 0.54015736 0.54036009
11 0.52596667 0.52409881 0.52389482
12 0.72970000 0.72739722 0.72746538

Theorem 6 Let S be a multiset of identi ers over A with S = n and A =


If d positions in the elements of S are available for division, then the probability
that an alement am S exists satisfying the (m p)-separation conditions with
p = (n + 1) 2 is:
  n d
1
(n d) = 1 − (1 − (n p )) d
1− 1− 1− (3.4)

This quantity can be approximated by:


 n d  
d(n − 1)
(n d) 1 − exp − (3.5)
2
Proof: We can consider the d positions as independent of each other, and
therefore equation (3.4) follows immediately. To prove (3.5) we need some com-
putations. First of all we have:
 n      
1 1 1 1 1
1− = exp n ln 1 − = exp n − − 2 − 3 − =
2 3
 n n n  2 3
= exp − − 2 − 3 − = e−n e−n 2 e−n 3
2 3
By expanding the exponentials, we nd:
 n
1 2 3
1− 1− 1 − e−n e−n 2 e−n 3 =
222 Donatella Merlini, Renzo Sprugnoli, and M. Cecilia Verri
   
n n2 n3 n n
=1− 1− + 2 − 3 + 1− 2 + 1− 3 + =
2 6 2 3
  3 
n n − 1 n2 − 3n + 2 n
= 1− + + O
2 6 2 3

This formula can be written as:


 n    2 
1 n n−1 n
1− 1− = exp − 1+O 2
2
and therefore:
  n d  n d  
1 d(n − 1)
1− 1− exp −
2

which immediately gives equation (3.5).

Having found a division for S, we are not yet nished, because the procedure has
to be recursively applied to the subsets obtained from S (see procedure Perfect-
Division). Since n 2 is the approximate size of these two subsets, the probability
of nding a division for each of them is:
n   n d  
d(n − 2)
d 1− exp − ;
2 2 4
in fact, except that in very particular cases, the number of possible positions,
at which division can take place, is not diminished. The joint probability that
both subsets can be divided is (n 2 d)2 and the probability of obtaining a
complete division of S so that modi ed binary searching is applicable, is:

(n d) = (n d) (n 2 d)2 (n 4 d)4
k
the product extended up to (n 2k d)2 such that n 2k 3 In fact, as ob-
served in Theorem 1, a division for tables with 1 or 2 elements is always possible
(i.e., = 1).
The probability (n d) is the quantity we are mainly interested in. Obvi-
ously, (n d) < (n d) but, fortunately, we can show that these probabilities
are almost equal, at least in most important cases.
Theorem 7 If (n d) is su ciently near to 1 then (n d) (n d)
Proof: First of all, let us develop (n d) :

(n d)
  n d     n d  
d(n − 1) d(n − 2)
1− exp − 1−2 exp −
2 2 4
 n d      
d(n − 1) n d d(n − 2)
1− exp − −2 exp − −
2 2 4
Modi ed Binary Searching for Static Tables 223

 n d  
d(n − 4)
−4 exp − −
4 8
What we now wish to show is that the rst term (after 1) dominates all the
others, and therefore (n d) (n d) Let us consider the ratio between
the rst and the second term after the 1 :
 n d    d  
d(n − 1) 1 2 d(n − 2)
exp − exp =
2 2 n 4
   
dn − 2d − 2dn + 2d dn
= 2d−1 exp = 2d−1 exp −
4 4
This quantity shows that the rst term is much larger than the second, as
claimed.

4 xperimental Results
In order to verify the e ectiveness of our modi ed binary searching method,
we devised some experiments comparing traditional and modi ed binary search
programs. We used Pascal as a programming language, well aware that this
might not be the best choice. However, our aim was also to show how better
is our method when realised without any particular trick in a high level lan-
guage. When the table is large, we need two or more characters for the selector
s; our implementation obviously performs one-character-at-a-time comparisons;
since the characters to be compared against s are consecutive, a machine lan-
guage realization would instead perform a multi-byte loading and comparing,
further reducing execution time. Presently, we are developing a C version of our
programs to compare them to the most sophisticated realisations of traditional
binary searching and to other, more recent approaches to string retrieval, such
as the method of Bentley and Sedgewick [1].
We used the following program for traditional binary searching:

function BS (str : string) : integer;


var a, b, k : integer; found : boolean;
begin
a := 1; b := n; found := false;
while a<=b do begin
k:=(a+b) div 2;
if str = T[k]
then begin a := b+1; found := true end
else if str < T[k] then b:=k-1 else a:=k+1
end;
if found then BS := k else BS := 0
end;
224 Donatella Merlini, Renzo Sprugnoli, and M. Cecilia Verri

For the modi ed binary searching, the program is:

function MBS1 (str : string): integer;


var a, b, j, k : integer; found : boolean;
begin
a := 1; b := n; found := false;
while a <= b do begin
k := (a+b) div 2; j := V[k];
if str[j] = T[k,j]
then begin a := b+1; found := true end
else if str[j] < T[k,j]
then b := k - 1
else a := k + 1 end;
if found and (str = T[k])
then MBS1 := k else MBS1 := 0
end;

This program is to be used when the selector length is 1; for a selector length
of 2 we simply perform two cascade if’s, and three for a selector length of 3.
We performed 10 blocks of 20,000 searches of all the strings in a table, ran-
domly chosen in a dictionary of 1,524 nglish words. For small tables (selector
length equal to 1) we obtained the average times in the rst part of Table 4 (the
time unit is inessential; only relative times are of importance). For larger tables
with selector length equal to 2 or 3 we obtained the times in parts two and three
of Table 4.

Table 4. Times for selector of length 1, 2 and 3

n MBS BS gain (%) n MBS BS gain (%) n MBS BS gain (%)


10 29.7 56.2 47.2 10 33.0 54.8 39.8 50 100.0 228.4 56.2
11 33.9 63.6 46.7 20 69.9 141.1 50.5 100 222.8 541.8 58.9
12 36.3 70.9 48.8 30 108.3 234.4 53.8 150 346.0 887.6 61.0
13 38.9 80.4 51.6 40 150.5 345.4 56.4 200 472.4 1256.6 62.4
14 44.5 86.3 48.4 60 235.1 568.5 58.6 250 607.6 1625.4 62.6
15 47.1 93.8 49.8 80 326.8 825.0 60.4 300 751.2 2035.4 63.1
16 52.0 103.1 49.6 100 421.8 1084.2 61.1 350 883.4 2452.8 64.0
17 54.4 113.7 52.2 120 519.6 1340.6 61.2 400 1032.8 2864.8 63.9
18 58.3 123.1 52.6 140 619.5 1625.3 61.9 450 1151.4 3283.4 64.9
20 65.9 139.6 52.8 160 713.4 1919.7 62.8 500 1310.4 3702.0 64.6
22 73.3 159.5 54.0 180 806.5 2214.3 63.6 550 1447.8 4147.0 65.1
24 81.3 178.8 54.5 200 913.3 2510.2 63.6 600 1592.8 4610.2 65.5
26 87.3 197.8 55.9 225 1032.6 2877.0 64.1 700 1874.0 5534.2 66.1
28 94.4 215.9 56.3 250 1146.3 3248.7 64.7 800 2202.6 6461.4 65.9
30 101.6 232.8 56.4 275 1263.1 3648.8 65.4 900 2491.4 7377.8 66.2
32 110.0 254.8 56.8 300 1408.8 4060.6 65.3 1000 2760.6 8303.8 66.8
Modi ed Binary Searching for Static Tables 225

5 Conclusions

We have considered a variant of binary searching, which avoids most of the


housekeeping instructions related to string comparisons. This requires a suitable
pre-processing phase for the table to be searched, and therefore only applies to
the static case. What we have shown is:
1. The variant is considerably faster than traditional binary searching; we give
empirical evidence of this fact, by comparing actual programs performing
both kinds of binary searching.
2. The pre-processing phase is fast, because it runs in time O(n(log2 n)2 ), and
produces with a very high probability the optimal arrangement of the table
elements. In any case, an almost optimal arrangement can always be found.
In our opinion, an important aspect of our method is that it can be e ciently
realised in a high level language; as is well-known, this is not always possible
for other kinds of fast retrieval methods for static tables, such as perfect hash-
ing. This makes our modi ed binary searching procedure attractive for actual
implemenattion in real systems.

References
1. J.L. Bentley, R. Sedgewick: Fast Algorithms for Sorting and Searching Strings. Pro-
ceedings of 8th Annual ACM-SIAM Symp. On Discrete Algorithms. (1997) 360 369
2. G. H. Gonnet, R. Baeza-Yates: Algorithms and Data Structures, 2nd edition. Ad-
dison Wesley (1991)
3. D. . Knuth: The Art of Computer Programming. Vol. 1-3. Addison-Wesley (1973)
4. R. Sedgewick, P. Flajolet: An Introduction to the Analysis of Algorithms. Addison-
Wesley (1996)
An Efficient Algorithm for the Approximate Median
Selection Problem

Sebastiano Battiato1 , Domenico Cantone1 , Dario Catalano1 , Gianluca Cincotti1 , and


Micha Hofri2
1
Dipartimento di Matematica e Informatica, Università di Catania
Viale A. Doria 6, I–95125 Catania, Italy
battiato,cantone,catalano,cincotti @cs.unict.it
2
Department of Computer Science, WPI
100 Institute Road, Worcester MA 01609-2280, USA
[email protected]

Abstract. We present an efficient algorithm for the approximate median selec-


tion problem. The algorithm works in-place; it is fast and easy to implement.
For a large array it returns, with high probability, a very close estimate of the true
median. The running time is linear in the length n of the input. The algorithm per-
forms fewer than 43 n comparisons and 13 n exchanges on the average. We present
analytical results of the performance of the algorithm, as well as experimental
illustrations of its precision.
Keywords: Approximation algorithms, in-place algorithms, median selection,
analysis of algorithms.

1. Introduction

In this paper we present an efficient algorithm for the in-place approximate median
selection problem. There are several works in the literature treating the exact median
selection problem (cf. [BFP*73], [DZ99], [FJ80], [FR75], [Hoa61], [HPM97]). Various
in-place median finding algorithms have been proposed. Traditionally, the “comparison
cost model” is adopted, where the only factor considered in the algorithm cost is the
number of key-comparisons. The best upper bound on this cost found so far is nearly
3n comparisons in the worst case (cf. [DZ99]). However, this bound and the nearly-as-
efficient ones share the unfortunate feature that their nice asymptotic behaviour is “paid
for” by extremely involved implementations.
The algorithm described here approximates the median with high precision and
lends itself to an immediate implementation. Moreover, it is quite fast: we show that
it needs fewer than 43 n comparisons and 13 n exchanges on the average and fewer than
3 1
2 n comparisons and 2 n exchanges in the worst-case. In addition to its sequential effi-
ciency, it is very easily parallelizable due to the low level of data contention it creates.
The usefulness of such an algorithm is evident for all applications where it is suffi-
cient to find an approximate median, for example in some heapsort variants (cf. [Ros97],
[Kat96]), or for median-filtering in image representation. In addition, the analysis of its
precision is of independent interest.

G. Bongiovanni, G. Gambosi, R. Petreschi (Eds.): CIAC 2000, LNCS 1767, pp. 226–238, 2000.

c Springer-Verlag Berlin Heidelberg 2000
An Efficient Algorithm for the Approximate Median Selection Problem 227

We note that the procedure pseudomed in [BB96, 7.5] is similar to performing just
one iteration of the algorithm we present (using quintets instead of triplets), as an aid in
deriving a (precise) selection procedure.
In a companion paper we show how to extend our method to approximate general
k-selection.
All the works mentioned above—as well as ours—assume the selection is from
values stored in an array in main memory. The algorithm has an additional property
which, as we found recently, has led to its being discovered before, albeit for solving a
rather different problem. As is apparent on reading the algorithms presented in Section
2, it is possible to perform the selection in this way “on the fly,” without keeping all
the values in storage. At the extreme case, if the values are read in one-by-one, the
algorithm only uses 4 log3 n positions (including log3 n loop variables). This way
of performing the algorithm is described in [RB90], in the context of estimating the
median of an unknown distribution. The authors show there that the value thus selected
is a consistent estimator of the desired parameter. They need pay no attention (and
indeed do not) to the relation between the value the algorithm selects and the actual
sample median. The last relation is the center point of interest for us. Curiously, Weide
notes in [Wei78] that this approach provides an approximation of the sample median,
though no analysis of the bias is provided. See [HM95] for further discussion of the
method of Rousseeuw and Bassett, and numerous other treatments of the statistical
problem of low-storage quantile (and in particular median) estimation.
In Section 2 we present the algorithm. Section 3 provides analysis of its run-time. In
Section 4, to show the soundness of the method, we present a probabilistic analysis of
the precision of its median selection. Since it is hard to glean the shape of the distribu-
tion function from the analytical results, we provide computational evidence to support
the conjecture that the distribution is asymptotically normal. In Section 5 we illustrate
the algorithm with a few experimental results, which also demonstrate its robustness.
Section 6 concludes the paper with suggested directions for additional research.
An extended version of this paper is available by anonymous ftp from
ftp://ftp.cs.wpi.edu/pub/techreports/99-26.ps.gz.

2. The Algorithm
It is convenient to distinguish two cases:

2.1 The Size of the Input Is a Power of 3: n = 3r


Let n = 3r be the size of the input array, with an integer r. The algorithm proceeds in r
stages. At each stage it divides the input into subsets of three elements, and calculates
the median of each such triplet. The ”local medians” survive to the next stage. The algo-
rithm continues recursively, using the local results to compute the approximate median
of the initial set. To incur the fewest number of exchanges we do not move the chosen
elements from their original triplets. This adds some index manipulation operations, but
is typically advantageous. (While the order of the elements is disturbed, the contents of
the array is unchanged).
228 Sebastiano Battiato et al.

Approximate Median Algorithm (1)

Triplet Adjust(A, i, Step)


Let j = i+Step and k= i+2 Step; this procedure moves the median of a triplet of
terms at locations i j k to the middle position.
if (A[i] < A[j])
then
if (A[k] < A[i]) then Swap(A[i] A[j]);
else if (A[k] < A[j]) then Swap(A[j] A[k]);
else
if (A[i] < A[k]) then Swap(A[i] A[j]);
else if (A[k] > A[j]) then Swap(A[j] A[k]);
Approximate Median(A, r)
This procedure returns the approximate median of the array A[ 0 3r − 1].
r
Step=1; Size=3 ;
repeat r times
i=(Step−1)/2;
while i < Size do
Triplet Adjust(A, i, Step);
i=i+(3 Step);
end while;
Step = 3 Step;
end repeat;
return A[(Size − 1) 2];

Fig. 1. Pseudo-code for the approximate median algorithm, n = 3r r N.

In Fig. 1 we show pseudo-code for the algorithm. The procedure Triplet Adjust finds
the median of triplets with elements that are indexed by two parameters: one, i, denotes
the position of the leftmost element of triplet in the array. The second parameter, Step, is
the relative distance between the triplet elements. This approach requires that when the
procedure returns, the median of the triplet is in the middle position, possibly following
an exchange. The Approximate Median algorithm simply consists of successive calls to
the procedure.

2.2 The Extension of the Algorithm to Arbitrary-Size Input

The method described in the previous subsection can be generalized to array sizes which
are not powers of 3. The basic idea is similar. Let n be the input size at the current stage,
where
n=3 t+k k 0 1 2
We divide the input into (t − 1) triplets and a (3 + k)-tuple. The (t − 1) triplets are pro-
cessed by the same Triplet Adjust procedure described above. The last tuple is sorted
An Efficient Algorithm for the Approximate Median Selection Problem 229

(using an adaptation of selection-sort) and the median is extracted. The algorithm con-
tinues iteratively using the results of each stage as input for a new one. This is done
until the number of local medians falls below a small fixed threshold. We then sort the
remaining elements and obtain the median. To symmetrize the algorithm, the array is
scanned from left to right during the first iteration, then from right to left on the second
one, and so on, changing the scanning sense at each iteration. This should reduce the
perturbation due to the different way in which the medians from the (3 + k)-tuples are
selected and improve the precision of the algorithm. Note that we chose to select the
second element out of four as the median (2 out of 1..4). We show pseudo-code for the
general case algorithm in Fig. 2.

Approximate Median Algorithm (2)

Selection Sort (A, Left, Size, Step)


This procedure sorts Size elements of the array A located at positions Left, Left + Step,
Left + 2 Step Left + (Size − 1) Step.
for (i = Left ; i < Left + (Size − 1) Step; i = i + Step)
M in = i;
for (j = i + Step; j < Left + Size Step; j = j + Step)
if (A[j] < A[min]) then min = j;
end for;
Swap(A[i] A[min]);
end for;
Approximate Median AnyN (A, Size)
This procedure returns the approximate median of the array A[0 Size − 1].
LeftToRight = F alse; Left = 0; Step = 1;
while (Size > Threshold) do
LeftToRight = Not (LeftToRight);
Rem = (Size mod 3);
if (LeftToRight) then i = Left;
else i = Left + (3 + Rem) Step;
repeat (Size 3 − 1) times
Triplet Adjust (A, i, Step);
i = i + 3 Step;
end repeat;
if (LeftToRight) then Left = Left + Step;
else i = Left;
Left = Left + (1 + Rem) Step;
Selection Sort (A, i, 3 + Rem, Step);
if (Rem = 2) then
if (LeftToRight) then Swap(A[i + Step], A[i + 2 Step])
else Swap(A[i + 2 Step], A[i + 3 Step]);
Step = 3 Step; Size = Size 3;
end while;
Selection Sort (A, Left, Size, Step);
return A[Left + Step (Size − 1) 2 ];

Fig. 2. Pseudo-code for the approximate median algorithm, any n N.


230 Sebastiano Battiato et al.

Note: The reason we use a terminating tuple of size 4 or 5, rather than 1 or 2, is to keep
the equal spacing of elements surviving one stage to the next.
The procedure Selection Sort takes as input four parameters: the array A, its size
and two integers, Left and Step. At each iteration Left points to the leftmost element
of the array which is in the current input, and Step is the distance between any two
successive elements in this input.
There are several alternatives to this approach for arbitrary-sized input. An attractive
one is described in [RB90], but it requires additional storage of approximately 4 log3 n
memory locations.

3. Run-Time Analysis: Counting Moves and Comparisons

Most of the work of the algorithm is spent in Triplet Adjust, comparing values and ex-
changing elements within triplets to locate their medians. We compute now the number
of comparisons and exchanges performed by the algorithm Approximate Median.
Like all reasonable median-searching algorithms, ours has running-time which is
linear in the array size. It is distinguished by the simplicity of its code, and hence it is
extremely efficient. We consider first the algorithm described in Fig. 1.
Let n = 3r , r N , be the size of a randomly-ordered input array. We have the
following elementary results:

Theorem 1. Given an input of size n, the algorithm Approximate Median performs


fewer than 43 n comparisons and 13 n exchanges on the average.

Proof: Consider first the Triplet Adjust subroutine. In the following table we show the
number of comparisons and exchanges, C3 and 3 , for each permutation of three distinct
elements:

A[i] A[i+Step] A[i+2*Step] Comparisons Exchanges


1 2 3 3 0
1 3 2 3 1
2 1 3 2 1
2 3 1 2 1
3 1 2 3 1
3 2 1 3 0

Clearly, assuming all orders equally likely, we find Pr(C3 = 2) = 1 − Pr(C3 =


3) = 1 3, and similarly Pr( 3 = 0) = 1 − Pr( 3 = 1) = 1 3, with expected values
[ 3 ] = 2 3 and [C3 ] = 8 3.
To find the work of the entire algorithm with an input of size n, we multiply the
above by T (n), the number of times the subroutine Triplet Adjust is executed. This
number is deterministic. We have T (1) = 0 and T (n) = n3 + T ( n3 ), for n > 1; for n
which is a power of 3 the solution is immediate: T (n) = 12 (n − 1).
Let n be the number of possible inputs of size n and let n be the total number
of comparisons performed by the algorithm on all inputs of size n.
An Efficient Algorithm for the Approximate Median Selection Problem 231

The average number of comparisons for all inputs of size n is:

n 16 T (n) n! 3! 4
C(n) = = = (n − 1)
n n! 3

To get n we count all the triplets considered for all the n inputs, i.e. n! T (n); for
each triplet we consider the cost over its 3! permutations (the factor 16 is the cost for
the 3! permutations of each triplet1 ).
The average number of exchanges can be shown analogously, since two out of three
permutations require an exchange.

By picking the “worst” rows in the table given in the proof of Theorem 1, it is straight-
forward to verify also the following:

Theorem 2. Given an input of size n = 3r , the algorithm Approximate Median per-


forms fewer than 32 n comparisons and 12 n exchanges in the worst-case.

For an input size which is a power of 3, the algorithm of Fig. 2 performs nearly
the same operations as the simpler algorithm – in particular, it makes the same key-
comparisons, and selects the same elements. For log3 n N , their performance only
differs on one tuple per iteration, hence the leading term (and its coefficient) in the
asymptotic expression for the costs is the same as in the simpler case.
The non-local algorithm described in [RB90] performs exactly the same number of
comparisons as above but always moves the selected median. The overall run-time cost
is very similar to our procedure.

4. Probabilistic Performance Analysis

4.1 Range of Selection

It is obvious that not all the input array elements can be selected by the algorithm —
e.g., the smallest one is discarded in the first stage. Let us consider first the algorithm of
Fig. 1 (i.e. when n is a power of 3). Let v(n) be the number of elements from the lower
end (alternatively – upper end, since the Approximate Median algorithm has bilateral
symmetry) of the input which cannot be selected out of an array of n elements. It is easy
to verify (by observing the tree built with the algorithm) that v(n) obeys the following
recursive inequality:

v(3) = 1 v(n) 2v(n 3) + 1 (1)

Moreover, when n = 3r , the equality holds. The solution of the recursive equation,

v(3) = 1; v(n) = 2v(n 3) + 1


¹ Alternatively, we can use the following recurrence: C(1) = 0 and C(n) = (n/3)·c + C(n/3), for
n > 1, where c = 16/6 is the average number of comparisons of Triplet Adjust (because all the
3! permutations are equally likely).

is the following function

    v(n) = n^{\log_3 2} − 1 = 2^{\log_3 n} − 1

Let x be the output of the algorithm over an input of n elements. From the definition of
v(n) it follows that

    v(n) < rank(x) < n − v(n) + 1                                        (2)

The second algorithm behaves very similarly (the two perform the same operations
when n = 3^r) and its range function v(n) obeys the same recurrence.
Unfortunately not many entries get thus excluded. The range of possible selection,
as a fraction of the entire set of numbers, increases promptly with n. This is simplest to
illustrate with n that is a power of 3. Since v(n) can be written as 2^{\log_3 n} − 1, the ratio
v(n)/n is approximately (2/3)^{\log_3 n}. Thus, for n = 3^3 = 27, where the smallest (and
largest) 7 numbers cannot be selected, 52% of the range is excluded; the comparable
restriction is 17.3% for n = 3^6 = 729 and only 1.73% for n = 3^12 = 531441.
The true state of affairs, as we now proceed to show, is much better: while the pos-
sible range of choice is wide, the algorithm zeroes in, with overwhelming probability,
on a very small neighborhood of the true median.

4.2 Probabilities of Selection

The most telling characteristic of the algorithms is their precision, which can be ex-
pressed via the probability function

    P(z) = Pr[ zn < rank(x) < (1 − z)n + 1 ]                             (3)

for 0 ≤ z ≤ 1/2, which describes the closeness of the selected value to the true median.
The purpose of the following analysis is to show the behavior of this distribution.
We consider n which is a power of 3.

Definition 1. Let q^{(r)}_{a,b} be the number of permutations, out of the n! = 3^r! possible ones,
in which the entry which is the a-th smallest in the set is: (1) selected, and (2) becomes
the b-th smallest in the next set, which has n/3 = 3^{r−1} entries.

It turns out that this quite narrow look at the selection process is all we need to
characterize it completely.
It can be shown that

    q^{(r)}_{a,b} = 2n\,(a-1)!\,(n-a)!\; 3^{a-b-1} \binom{n/3-1}{b-1} \sum_i \binom{b-1}{i} \binom{n/3-b}{a-2b-i} \frac{1}{9^i}        (4)

(for details, see [BCC*99]).
It can also be seen that q^{(r)}_{a,b} is nonzero only for 0 ≤ a − 2b ≤ n/3 − 1. The sum is
expressible as a Jacobi polynomial, (8/9)^{a−2b} P^{(u,v)}_{a−2b}(5/4), where u = 3b − a − 1, v =
n/3 + b − a, and a simpler closed form is unlikely.

Let p^{(r)}_{a,b} be the probability that item a gets to be the b-th smallest among those se-
lected for the next stage. Since the n! = 3^r! permutations are assumed to be equally
likely, we have p^{(r)}_{a,b} = q^{(r)}_{a,b} / n!:

    p^{(r)}_{a,b} = \frac{2\; 3^{a-b-1} \binom{n/3-1}{b-1}}{\binom{n-1}{a-1}} \sum_i \binom{b-1}{i} \binom{n/3-b}{a-2b-i} \frac{1}{9^i}
                 = \frac{2\; 3^{a-b-1} \binom{n/3-1}{b-1}}{\binom{n-1}{a-1}} \, [z^{a-2b}] \Big(1 + \frac{z}{9}\Big)^{b-1} (1+z)^{n/3-b}        (5)

This allows us to calculate the center of our interest: the probability P^{(r)}_a of starting
with an array of the first n = 3^r natural numbers and having the element a ultimately
chosen as approximate median. It is given by

    P^{(r)}_a = \sum_{b_r} p^{(r)}_{a,b_r} P^{(r-1)}_{b_r} = \sum_{b_r, b_{r-1}, \dots, b_3} p^{(r)}_{a,b_r}\, p^{(r-1)}_{b_r,b_{r-1}} \cdots p^{(2)}_{b_3,2}        (6)

where 2^{j−1} ≤ b_j ≤ 3^{j−1} − 2^{j−1} + 1, for j = 3, 4, …, r.

Some telescopic cancellation occurs when the explicit expression for p^{(r)}_{a,b} is used
here, and we get

    P^{(r)}_a = \Big(\frac{2}{3}\Big)^{r} \frac{3^{a-1}}{\binom{n-1}{a-1}} \sum_{b_r, b_{r-1}, \dots, b_3} \; \prod_{j=2}^{r} \; \sum_{i_j \ge 0} \binom{b_j-1}{i_j} \binom{3^{j-1}-b_j}{b_{j+1}-2b_j-i_j} \frac{1}{9^{i_j}}        (7)

As above, each b_j takes values in the range [2^{j−1}, 3^{j−1} − 2^{j−1} + 1], b_2 = 2 and
b_{r+1} ≡ a (we could let all b_j take all positive values, and the binomial coefficients
would produce nonzero values for the required range only). The probability P^{(r)}_a is
nonzero for v(n) < a < n − v(n) + 1 only.
This distribution has so far resisted our attempts to provide an analytic characteriza-
tion of its behavior. In particular, while the examples below suggest very strongly that
as the input array grows, it approaches the normal distribution, this is not easy to show
analytically. (See Section 4.4 of [BCC*99] for an approach to gain further information
about the large-sample distribution.)
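Although a closed form seems out of reach, the distribution is easy to tabulate numerically.
The following short C program is ours (not from the paper); it assumes the transition
probabilities (5) and the recursion (6) as reconstructed above, evaluates the binomial
coefficients through lgamma, and iterates the one-stage transition. For r = 2 it should
reproduce the n = 9 statistics of Table 1 below, and for r = 3 the shape of Fig. 3.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* log of the binomial coefficient C(n, k); -inf when out of range */
static double logbinom(long n, long k) {
    if (k < 0 || n < 0 || k > n) return -INFINITY;
    return lgamma(n + 1.0) - lgamma(k + 1.0) - lgamma(n - k + 1.0);
}

/* p^{(r)}_{a,b}: probability that rank a becomes rank b after one stage on n keys */
static double stage_prob(long n, long a, long b) {
    double sum = 0.0;
    for (long i = 0; i <= b - 1; i++) {
        double t = logbinom(b - 1, i) + logbinom(n / 3 - b, a - 2 * b - i)
                 - i * log(9.0);
        if (isfinite(t)) sum += exp(t);
    }
    double logpre = log(2.0) + (a - b - 1) * log(3.0)
                  + logbinom(n / 3 - 1, b - 1) - logbinom(n - 1, a - 1);
    return exp(logpre) * sum;
}

int main(void) {
    int r = 3;                               /* n = 27 */
    long n = 1; for (int i = 0; i < r; i++) n *= 3;

    double *P = calloc(n + 2, sizeof(double));   /* P[b] = P^{(j)}_b */
    double *Q = calloc(n + 2, sizeof(double));
    long m = 3;
    P[2] = 1.0;                              /* base case: the median of 3 wins */
    for (int j = 2; j <= r; j++) {
        long nj = m * 3;                     /* array size at stage j */
        for (long a = 1; a <= nj; a++) {
            Q[a] = 0.0;
            for (long b = 1; b <= m; b++)
                if (P[b] > 0.0) Q[a] += stage_prob(nj, a, b) * P[b];
        }
        for (long a = 1; a <= nj; a++) P[a] = Q[a];
        m = nj;
    }
    for (long a = 1; a <= n; a++)
        if (P[a] > 1e-12) printf("P(%ld) = %.6f\n", a, P[a]);
    free(P); free(Q);
    return 0;
}
```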

4.3 Examples

We computed P^{(r)}_a for several values of r. Results for a small array (r = 3, n = 27)
are shown in Fig. 3. By comparison, with a larger array (r = 5, n = 243, Fig. 4) we
notice the relative concentration of the likely range of selection around the true median.
In terms of these probabilities the relation (3) is:

    P(z) = \sum_{\lfloor zn \rfloor < a < \lceil (1-z)n \rceil + 1} P^{(r)}_a        (8)

where 0 ≤ z < 1/2.

Fig. 3. Plot of the median probability distribution for n = 27.

We chose to present the effectiveness of the algorithm by computing directly from
equation (7) the statistics of the absolute value of the bias of the returned approximate
median, D_n ≡ |X_n − Md(n)| (where Md(n) is the true median, (n + 1)/2). We
compute its mean (Avg.) and standard deviation, denoted by σ_d.
A measure of the improvement of the selection effectiveness with increasing (initial)
array size n is seen from the variance ratio σ_d/Md(n). This ratio may be viewed as a
measure of the expected relative error of the approximate median selection algorithm.
Numerical computations produced the numbers in Table 1; note the trend in the
rightmost column. (This trend is the basis for the approach examined in Section 4.4 of
[BCC*99].)

    n    r = log3 n    Avg.         σ_d          σ_d/√n
    9        2         0.428571     0.494872     0.164957
   27        3         1.475971     1.184262     0.227911
   81        4         3.617240     2.782263     0.309140
  243        5         8.096189     6.194667     0.397388
  729        6        17.377167    13.282273     0.491958
 2187        7        36.427027    27.826992     0.595034

Table 1. Statistics of the median selection as a function of array size.



Fig. 4. Plot of the median probability distribution for n = 243.

5. Experimental Results

In this section we present empirical results demonstrating the effectiveness of the algo-
rithms – also for the cases which our analysis does not handle directly. Our implemen-
tation is in standard C (GNU C compiler v2.7). All the experiments were carried out on
a PC Pentium II 350 MHz running the Linux (Red Hat distribution) operating system. The
lists were permuted using the pseudo-random number generator suggested by Park and
Miller in 1988 and updated in 1993 [PM88]. The algorithm was run on random arrays
of sizes that were powers of 3, n = 3^r, with r ∈ {3, 4, …, 11}, and others. The entry
keys were always the integers 1, …, n.
The following tables present results of such runs. They report the statistics of D_n,
the absolute value of the bias of the approximate median. For each returned result we
compute its distance from the correct median (n + 1)/2. The units we use in the tables are
“normalized” values of D_n, denoted by d_%; these are percentiles of the possible range
of error of the algorithm: d_% = 100 · D_n / ((n − 1)/2). The extremes are d_% = 0 – when the
true median is returned – and it would have been 100 if it were possible to return the
smallest (or largest) element. (Relation (2) shows that d_% can get arbitrarily close
to 100 as n increases, but never quite reach it.) Moreover, the probability distributions
of the last section suggest, as illustrated in Figure 4, that such deviations are extremely
unlikely. Table 2 shows selected results, using a threshold value of 8. All experiments
used a sample size of 5000 throughout.

    n       Avg.     σ      Avg.+2σ   Rng(95%)   (Min–Max)

    50     10.27    7.89     26.05     24.49     0.00–44.90
   100      8.45    6.63     21.70     20.20     0.00–48.48
   3^5      6.74    5.13     17.00     16.53     0.00–38.02
   500      5.80    4.41     14.63     14.03     0.00–29.06
   3^6      4.83    3.71     12.26     12.09     0.00–24.73
  1000      4.70    3.66     12.02     11.61     0.00–23.62
   3^7      3.32    2.54      8.41      8.05     0.00–16.83
  5000      2.71    2.10      6.91      6.72     0.00–17.20
   3^8      2.31    1.75      5.81      5.67     0.00–11.95
 10000      2.53    1.86      6.24      6.04     0.00–11.38
   3^9      1.58    1.18      3.94      3.86     0.00–6.78

Table 2. Simulation results for d_%, fractional bias of the approximate median. Sample size
= 5000.

The columns are: n – array size; Avg. – the average of d_% over the sample; σ – the
sample standard error of d_%; Rng(95%) – the size of an interval symmetric around
Avg. that contains 95% of the returned values; the last column gives the extremes of d_%
that were observed. In the rows that correspond to those of Table 1, the agreement of the
Avg. and σ columns is excellent (the relative differences are under 0.5%). Columns 4
and 5 suggest the closeness of the median distribution to the Gaussian, as shown above.
All the entries show the critical dependence of the quality of the selection on the
size of the initial array. In the following table we report the data for different values of
n with sample size of 5000, varying the threshold.

              t = 8                  t = 26                 t = 80
    n     Avg.    σ   Rng(95%)   Avg.    σ   Rng(95%)   Avg.    σ   Rng(95%)
   100    8.63  6.71   22.22     6.80  5.31   16.16     4.64  3.61   12.12
   500    5.80  4.41   14.43     4.40  3.30   10.82     3.22  2.40    7.62
  1000    4.79  3.67   12.01     3.80  2.88    9.41     2.98  2.27    7.41
 10000    2.54  1.87    6.05     1.67  1.28    4.14     1.40  1.06    3.44

Table 3. Quality of selection as a function of threshold value.

As expected, increasing the threshold—the maximal size of an array which is sorted,


to produce the exact median of the remaining terms—provides better selection, at the
cost of rather larger processing time. For large n, threshold values beyond 30 provide
marginal additional benefit. Settling on a correct trade-off here is a critical step in tuning
the algorithm for any specific application.
Finally we tested for the relative merit of using quintets rather than triplets when
selecting for the median. In this case n = 1000, Threshold=8, and sample size=5000.

            Avg.     σ     Avg.+2σ   Rng(95%)   (Min–Max)

Triplets    4.70    3.66    12.02     11.61     0.00–23.62
Quintets    3.60    2.74     9.08      9.01     0.00–16.42

Table 4. Comparing selection via triplets and quintets.

6. Conclusion

We have presented an approximate median finding algorithm, and an analysis of its
characteristics. Both can be extended. In particular, the algorithm can be adapted to
select an approximate k-th element, for any k ∈ [1, n]. The analysis of Section 4 can be
extended to show how to compute with the exact probabilities, as given in equation (7).
Also, for the limiting distribution of the bias D_n with respect to the true median – while we
know it is extremely close to a Gaussian distribution, we have no efficient representation
for it yet.

Acknowledgments

The derivations of Section 4.4 in [BCC*99] are largely due to Svante Janson. The refer-
ences [HM95], [RB90], and [Wei78] were provided by Reza Modarres. We are grateful
for their contributions.
The authors also wish to thank an anonymous referee for some useful comments.

References
[AHU74] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The Design and Analysis of Computer
algorithms. Addison Wesley, Reading, MA 1974.
[BB96] G. Brassard and P. Bratley. Fundamentals of Algorithmics. Prentice-Hall, Engle-
wood Cliffs, NJ 1996.
[BCC*99] S. Battiato, D. Cantone, D. Catalano, G. Cincotti, and M. Hofri. An Efficient Al-
gorithm for the Approximate Median Selection Problem. Technical Report WPI-
CS-TR-99-26, Worcester Polytechnic Institute, October 1999. Available from
ftp://ftp.cs.wpi.edu/pub/techreports/99-26.ps.gz.
[BFP*73] M. Blum, R.W. Floyd, V. Pratt, R.L. Rivest, and R. Tarjan. Time bounds for selec-
tion. Journal of Computer and Systems Sciences, 7(4):448–461, 1973.
[CS87] S. Carlsson, M. Sundstrom. Linear-time In-place Selection in Less than 3n Com-
parisons - Division of Computer Science, Lulea University of Technology, S-971 87
LULEA, Sweden.
[CLR90] T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. McGraw-
Hill, 1990.
[CM89] W. Cunto and J.I. Munro. Average case selection. Journal of the ACM, 36(2):270–
279, 1989.
[Dra67] Alvin W. Drake. Fundamentals of Applied Probability Theory. McGraw-Hill, 1967.
[DZ99] D. Dor, and U. Zwick. Selecting the Median. SIAM Jour. Comp., 28(5):1722–1758,
1999.

[FJ80] G.N. Frederickson, D.B. Johnson. Generalized Selection and Ranking. Proceedings
STOC-SIGACT, Los Angeles CA, 12:420–428, 1980.
[Fre90] G.N. Frederickson. The Information Theory Bound is Tight for selection in a heap.
Proceedings STOC-SIGACT, Baltimore MD, 22:26–33, 1990.
[FR75] R.W. Floyd, R.L. Rivest. Expected time bounds for selection. Communications of
the ACM, 18(3):165–172, 1975.
[Hoa61] C.A.R. Hoare. Algorithm 63(partition) and algorithm 65(find). Communications of
the ACM, 4(7):321–322, 1961.
[Hof95] M. Hofri, Analysis of Algorithms: Computational Methods & Mathematical Tools,
Oxford University Press, New York (1995).
[HM95] C. Hurley and Reza Modarres: Low-Storage quantile estimation, Computational
Statistics, 10:311–325, 1995.
[HPM97] P. Kirschenhofer, C. Martinez, and H. Prodinger. Analysis of Hoare’s Find algo-
rithm with median-of-three partition. Random Structures and Algorithms, 10:143–
156, 1997.
[Kat96] J. Katajainen. The Ultimate Heapsort, DIKU Report 96/42, Department of Com-
puter Science, Univ. of Copenhagen, 1996.
[Knu98] D.E. Knuth. The Art of Computer Programming, volume 3: Sorting and Searching.
Addison-Wesley, 2nd Ed. 1999.
[LW88] T.W. Lai, and D. Wood. Implicit Selection. Scandinavian Workshop on Algorithm
Theory (SWAT88):18–23, LNCS 38 Springer-Verlag, 1988.
[Meh84] K. Mehlhorn. Sorting and Searching, Data Structures and Algorithms, volume 1.
Springer-Verlag, 1984.
[Noz73] A. Nozaky. Two Entropies of a Generalized Sorting Problems. Journal of Computer
and Systems Sciences, 7(5):615–621, 1973.
[PM88] S. K. Park and K. W. Miller. Random number generators: good ones are hard to
find. Communications of the ACM, 31(10):1192–1201, 1988. Updated in Communi-
cations of the ACM, 36(7):108–110, 1993.
[Ros97] L. Rosaz. Improving Katajainen’s Ultimate Heapsort, Technical Report N.1115,
Laboratoire de Recherche en Informatique, Université de Paris Sud, Orsay, 1997.
[RB90] P.J. Rousseeuw and G.W. Bassett: The remedian: A robust averaging method for
large data sets. Jour. Amer. Statist. Assoc, 409:97–104, 1990.
[SJ99] Svante Janson. Private communication, June 1999.
[SPP76] A. Schonhage, M. Paterson, and N. Pippenger. Finding the median. Journal of Com-
puter and Systems Sciences, 13:184–199, 1976.
[Wei78] B. W. Weide. Space efficient on-line selection algorithm. Proceedings of the 11th
symposium of Computer Science and Statistics, on the interface, 308–311. (1978).
Extending the Implicit Computational
Complexity Approach to the Sub-elementary
Time-Space Classes

Emanuele Covino, Giovanni Pani, and Salvatore Caporaso

Dipartimento di Informatica dell’Università di Bari
(covino|pani|caporaso)@di.uniba.it

Abstract. A resource-free characterization of some complexity classes
is given by means of the predicative recursion and constructive diagonal-
ization schemes, and of restrictions to substitution. Among other classes,
we predicatively harmonize in the same hierarchy ptimef, the class E of
the elementary functions, and the classes dtimespacef(n^p, n^q).
Keywords: time-space classes, implicit computational complexity, ele-
mentary functions.

1 Introduction
Position of the problem. The standard definition of a complexity class involves
the definition of a bound imposed on time and/or space resources used by a
Turing machine during its computation; a different approach characterizes com-
plexity classes by means of limited recursive operators. The first characterization
of this type of a small complexity class was given by Cobham [8], who showed
that the polytime functions are exactly those functions generated by bounded re-
cursion on notation; however, a smash initial function 2^{|x|·|y|} was used to provide
enough space.
Leivant [12] and Bellantoni & Cook [2] gave characterizations of ptimef;
several other complexity classes have been characterized by means of unlimited
operators (see [13,5] for pspacef, [4] for ptime, pspace (languages), PH and
its elements, [1,3] for NP, [14] for pspacef and the class of the elementary
functions, [7] for the definition of a time-space hierarchy between ptimef and
pspacef). All these approaches have been dubbed Implicit Computational Com-
plexity: they share the idea that no explicitly bounded schemes are needed to
characterize a great number of classes of functions and that, in order to do this, it
suffices to distinguish between safe and unsafe variables (or, following Simmons
[17], between dormant and normal ones) in the recursion schemes. This dis-
tinction yields many forms of predicative recurrence, in which the function being
defined cannot be used as a counter in the defining one.

Statement of the result. We define a safe recursion scheme srec on a ternary
word algebra, such that f(x, y, za) = h(f(x, y, z), y, za), where x, y, z are, re-
spectively, the auxiliary variable, the parameter and the recursion variable; no


othe type of va iables can be used, and the identi cation of z with x is not
allowed. We also de ne a constructive diagonalization scheme cdiag, such that
f (n) = e(n) (n), whe e m is the standa d Klenee’s notation fo the function
coded by m and e is a given enume ato .
Sta ting f om a cha acte ization T1 of lintimef, and f om an assignment of
fundamental sequences n fo o dinals < 0 (see [16] fo fu the details on
o dinals), we de ne the hie a chy T < 0 as follows:

1. at each successo o dinal, T +1 is the class of all functions obtained by one


application of safe ecu sion to functions in T ;
2. at each limit o dinal , T is the class of all functions obtained by one
application of const uctive diagonalization in an enume ato e T 1 , such
that e(n) T n.
Given an o dinal in Canto no mal fo m, B (n) is the max(2 n)clps( n)
, whe e
clps( n) is the esult of eplacing by n in . We have that:
1. fo all nite k, Tk =dtimef(nk );
2. fo all < < 0 , dtimef(B (n)) T dtimef(B (n + O(1))).
Thus, < T =ptimef and < 0 T = , the elementa y functions.
In analogy with T < 0 we de ne a hie a chy S < 0 of not space-increasing
functions and, by means of a est icted fo m of substitution, we de ne a time-
space hie a chy T S qp qp< , such that
3. T S qp =dtimespacef(np nq ).

2 Constants, Basic Functions, and Definition Schemes

2.1 Recursion-Free Functions
T is the ternary alphabet {0, 1, 2}. p, q, s are the word 0 or words over
T not beginning with 0; ε is the empty word. B is the binary alphabet {1, 2}.
U, V, Y are words over B. a, b, a_1, … are letters of T or B.
The i-th component (s)_i of a word s of the form Y_n 0 Y_{n−1} 0 … 0 Y_2 0 Y_1 is Y_i. The
rationale of this definition is that ternary words are actually handled as tuples
of binary words, with the zeroes playing the role of commas. |s| is the length of
the word s.
We denote with x, y, z variables used as, respectively, auxiliary, parameter and
principal in the construction of the current function. f(s, t, r) is the result of
assigning words s, t, r to x, y, z. By a notation like f(x, y, z) we always allow
some among the indicated variables to be absent.

Definition 1. Given i ≥ 1, a = 1, 2 and u = x, y, z, the initial functions are the
following unary functions:
1. the identity i(u), which returns the value s assigned to u; sometimes we write
   s instead of i(s);

2. the constructor c^a_i(u), which, when s is assigned to u, adds the digit a at
   the right of the last digit of (s)_i; it leaves s unchanged if (s)_i is a single
   letter;
3. the destructor d_i(s), which, when s is assigned to u: (a) erases the rightmost
   digit of (s)_i, if (s)_i is not a single digit; (b) returns 0, otherwise. Constructors
   and destructors leave s unchanged if it has less than i components.

Example 1. c^1_1(12022) = 120221; d_2(1010) = 100; d_2(100) = 100.

Definition 2. Given i ≥ 1 and b = 1, 2, we have the following simple schemes:

1. f = idt_x(g) is the result of the identification of x as y in g;
2. f = idt_z(g) is the result of the identification of z as y in g;
3. f = asg_u(s, g) is the result of the assignment of s to the variable u in g;
4. f = branch^b_i(g, h) is defined by branching in g and h if for all s, t, r we have
   f(s, t, r) = if the rightmost digit of (s)_i is b then g(s, t, r) else h(s, t, r).

Example 2. f = idt_x(g) implies f(t, r) = g(t, t, r). Similarly, f = idt_z(g) implies
f(s, t) = g(s, t, t). Let s be the word 110212, and f = branch^1_2(g, h); we have
f(s, t, r) = g(s, t, r), since the rightmost digit of (s)_2 is 1.

Definition 3. A modifier is the sequence composition of n constructors and m
destructors.

Definition 4. Class T_0 is the closure of modifiers under branch^b_i and compo-
sition.

Definition 5. 1. The rate of growth rog(g) of a modifier g is n − m, where n
   and m are respectively the number of constructors and destructors occurring
   in g.
2. For all f ∈ T_0 built up by means of some branchings from modifiers g_1, …, g_k,
   we have rog(f) := max_{i≤k} rog(g_i).

Definition 6. Class S_0 is the class of functions in T_0 with non-positive rate of
growth, that is S_0 = {f ∈ T_0 : rog(f) ≤ 0}.
Notice that all functions in T_0 are unary, and they modify their inputs according
to the result of some tests performed over a fixed number of digits. Functions in
S_0, or their iteration, cannot return values longer than their input.

2.2 Safe Recursion and Diagonalization

Definition 7. f = srec(g, h) is defined by safe recursion in the basis function
g(x, y) and in the step function h(x, y, z) if for all s, t, r we have
    f(s, t, a) = g(s, t)
    f(s, t, ra) = h(f(s, t, r), t, ra)

In particular, f = iter(h) is defined by iteration of the function h(x) if for all s, r
we have
    f(s, a) = s
    f(s, ra) = h(f(s, r))
We write h^{|r|}(s) for iter(h)(s, r) (i.e. the |r|-th iteration of h on s).

Definition 8. f = cdiag(e) is defined by constructive diagonalization in the
enumerator e if for all s, t, r we have

    f(s, t, r) = {e(r)}(s, t, r)

where {m} denotes the function coded by m.

Definition 9. f = cmp(h, g) is defined by composition of h and g if f(u) =
g(h(u)), with h or g in T_0.

Definition 10. Class T_1 (resp. S_1) is the closure under simple schemes and cmp
of functions obtained by one application of iter to T_0 (resp. S_0).

Note that since identification of z as x is not allowed (see Definition 2), the step
function cannot assign the previous value of the function being defined by srec
to the recursion variable. Thus, we obtain that z is a dormant variable, according
to Simmons’ approach (see [17]), or a safe one, following Bellantoni & Cook:
we always know in advance the number of recursive calls of the step function,
and this number will never be affected by the previous values of f.
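To make the operational content of this remark concrete, here is a schematic C
rendering of srec (illustrative only; the string representation of words and the
function-pointer names are ours): the recursion variable only fixes the number of
loop iterations and is never overwritten by the accumulated value.

```c
#include <string.h>

typedef char *word;   /* words are modelled here simply as C strings */

/* Schematic evaluation of f = srec(g, h):
 *   f(x, y, a)  = g(x, y)
 *   f(x, y, za) = h(f(x, y, z), y, za)
 * The accumulator acc is never written back into z, so the number of
 * iterations equals |z| and is fixed before the loop starts (z is dormant). */
word srec_eval(word (*g)(word, word),
               word (*h)(word, word, word),
               word x, word y, word z)
{
    size_t len = strlen(z);
    word acc = g(x, y);                  /* base case: z is a single letter */
    for (size_t k = 2; k <= len; k++) {  /* one step per further letter of z */
        char saved = z[k];
        z[k] = '\0';                     /* expose the length-k prefix "za" */
        acc = h(acc, y, z);
        z[k] = saved;
    }
    return acc;
}
```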

Example 3. By a sequence of srec’s we now define a sequence of functions g_n
which, at m, compute m^n in unary, that is, such that |g_n(a, t)| ≈ |t|^n. We then
use the fact that the generation of the g_n’s is uniform in n to define by cdiag a
function f which computes m^m in unary. That is:
    g_1 := cmp(iter(c^1_1), d_1);   f_{n+1} := srec(g_n, g_n);   g_{n+1} := idt_z(f_{n+1}).
We have

    g_1(s, a) = d_1(s)                      f_{n+1}(s, t, a) = g_n(s, t)
    g_1(s, ra) = c^1_1(g_1(s, r))           f_{n+1}(s, t, ra) = g_n(f_{n+1}(s, t, r), t)

By induction on n and r one sees that we have |f_{n+1}(s, t, r)| ≈ |s| + |t|^n·|r| and,
therefore, |g_n(a, t)| ≈ |t|^n. Assume now defined a function e ∈ T_1 such that
e(r) = ⌈g_{|r|}⌉. If we define f := cdiag(e), we have f(a, t, t) = g_{|t|}(a, t), of length |t|^{|t|}.

2.3 Standard Computability

As model of computation we use the push-down Turing machine, and we give
the definition of inclusion between classes of TM’s and classes of functions.

Definition 11. A binary push-down Turing machine M is defined as follows:

– M has k push-down tapes over the alphabet B and m + 1 states (0 denotes
  the final state, 1 the initial state);
– the description of M consists of m rows of the type
      R_i = (i, j(i), i_1, j_1, I_1, i_2, j_2, I_2, i_3)
  (one for each non-final state) where:
  (i) i, i_1, i_2, i_3 are states, with i ≠ 0;
  (ii) j(i), j_1, j_2 are tapes, with j a given function;
  (iii) I_1, I_2 are defined on the set of instructions {pop, push 1, push 2};
  each row of M should be intended as
      if the current state is i then
          if top(j(i)) = 1 then enter i_1, apply I_1 to j_1;
          if top(j(i)) = 2 then enter i_2, apply I_2 to j_2;
          if (j(i)) is empty then enter i_3.

Given a push-down TM M with k tapes, T_i = X means that the content of tape
T_i is the binary word X.
Note that the previous model and the ordinary Turing machine model are equiv-
alent, with respect to the order of time needed to compute a given function. In
fact, let M be an ordinary TM, with n tapes unlimited to the left, alphabet B
and a fixed set of states. M can be simulated by a push-down TM N with 2n
tapes and the same number of states, which stores the content of the i-th tape
of M at the left of the observed symbol in its tape 2i−1, and the part at the right
in its tape 2i. If M moves left on tape i, N pops a symbol from tape 2i−1 and
pushes it into tape 2i; similarly, if M moves right on tape i, N pops a symbol
from tape 2i and pushes it into tape 2i−1.
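This tape-to-stacks translation amounts to the classical "two stacks simulate one
tape" trick; a minimal C sketch (names are ours) of the two head movements is:

```c
/* One ordinary TM tape is represented by two push-down stores:
 * 'left' holds the symbols to the left of the head (top = symbol just left
 * of the head), 'right' holds the observed symbol and everything to its
 * right (top = observed symbol).  Blank-symbol handling is omitted. */
typedef struct { char *buf; int top; } stack;        /* top == -1 when empty */

static void push(stack *s, char c) { s->buf[++s->top] = c; }
static char pop (stack *s)         { return s->buf[s->top--]; }
static int  is_empty(const stack *s) { return s->top < 0; }

/* Moving the head left: the symbol left of the head becomes observed. */
static void move_left(stack *left, stack *right) {
    if (!is_empty(left)) push(right, pop(left));
}

/* Moving the head right: the observed symbol is buried on the left. */
static void move_right(stack *left, stack *right) {
    if (!is_empty(right)) push(left, pop(right));
}
```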
Definition 12. 1. A push-down Turing machine M, on input s = X_1 0 … 0 X_n,
standard computes q = Y_1 0 … 0 Y_m (M(s) =sc q) if it starts with T_i = X_i
(1 ≤ i ≤ n) and stops with T_j = Y_j (1 ≤ j ≤ m).
2. M standard computes the function f (M =sc f) if f(s) = q implies that
M(s) =sc q.
It is natural to observe that the number of tapes of the Turing machine which
standard computes a function must be independent from the number of compo-
nents of its input; following the previous definition, a new Turing machine should
be defined for each possible number of components of the input of f. However,
when we are defining a TM that standard computes a function f, we need a
number of tapes that depends only on the maximum number of components of
the input that f can manipulate or check with a constructor, a destructor or a
branching.
De nition 13. 1. Given f T1 , we de ne the number of components of f
(denoted with #(f )) as max i di o a cai o a branc bi occu s in f .

2. Given a function f , we de ne the length of f (denoted with lh(f )) as the


numbe of dest ucto s, dest ucto s and de ning schemes occu ing in its
const uction.

De nition 14. Given the class of time-bounded push-down T M ’s dtime(p(n))


and given C a class of functions, we say that
1. dtime(p(n)) C if fo all M dtime(p(n)) the e exists a function f C
such that M (s) =sc f (s);
2. C dtime(p(n)) if fo all f C the e exists a push-down T M M with #(f )
tapes such that, fo all s t r, M etu ns f (s t r) in time p( s + t + r ).

Let U be the nite alphabet that we use to w ite ou functions; the code of a
function f is obtained by concatenation of the codes fo the lette s of U which
compose f ; the arity associated with each lette ensu es unique pa sing.

De nition 15. The code L of the i-th lette L of U is 2i+1 1. If the a ity of
L U is n then n 1 L codes the exp ession L 1 n . We w ite
1 n fo n 1 .

In the same way we can de ne the code of a push-down TM.

De nition 16. Let M be a bina y push-down T M with k tapes and m + 1


states.
1. The code of the ow Ri = (i j(i) i1 j1 I1 i2 j2 I2 i3 ) (1 i m) is the
wo d
i j(i) i1 j1 I1 i2 j2 I2 i3
2. The code of M is the wo d R1 Rm .
3. An instantaneous desc iption of M is coded by the wo d X1 Xk state,
whe e each Xi encodes the i-th tape of M , and state encodes the cu ent
state.

Lemma 1. T1 =dtimef(n).

Proof. We st show (by induction on the const uction of f ) that each function
f T1 can be computed by a push-down tm in time lh(f )n. Base. f T0 . The
esult follows f om the de nition of initial functions and cmp (see de nition 4).
Step. Case 1. f =iter(g), with g T0 . We have f (s r) = g jrj (s). A push-
down tm Mf with #(f ) + 1 tapes can be de ned as follows: with (s)i on tape
i (1 i #(f )) it computes g (in time lh(g)) and, afte r epetitions, it
stops etu ning the nal esult. Thus, Mf standa d computes f (s r), within
time r lh(g).
Case 2. Let f be de ned by b anching o cmp. The esult follows by di ect
simulation of the schemes.
In o de to p ove the second inclusion, we show that the behaviou of an m-tape
tm M (with linea time bound cn) can be simulated by a function in T1 . Indeed

a function nxtM can be de ned in T0 , which uses two components fo each tape
of M , one fo the pa t at the left of the obse ved symbol, one fo the pa t at
the ight ( ead in eve se o de ); the inte nal state is sto ed in the (2m + 1)-
th component. nxtM (s) has the fo m if state[i](s) and top[b](j(s)) then ib ,
whe e: (a) state[i](s) is a test which is t ue i the state of M is coded by i; (b)
top[b](j(s)) is a test which is t ue i the obse ved cha acte on tape j(s) is b;
(c) ib is a sequence of modi e s which update the code of the state and pa t of
the tape, acco ding to the de nition of M . By means of c − 1 cmp’s we de ne
in T0 the function nxtcM , which applies c times the functions nxtM to the wo d
that encodes an istantaneous desc iption of M . De ne now in T1

linsimM (x a) = x
linsimM (x za) = nxtcM (linsimM (x z))

We have that linsimM (s t) ite ates nxtM (s) fo c t times, etu ning the code
of the ID of M which contains the nal esult.

3 The Hierarchies

3.1 Ordinals
In this section we define a hierarchy of functions T_α (α < ε_0), starting from T_1, by
means of closures under safe recursions and diagonalizations, such that T_{α+1} is
defined by one application of safe recursion to T_α, and T_λ, for the limit ordinal
λ, is obtained by diagonalization on the classes T_{λ_n}.
In the rest of this paper greek small letters are ordinal numbers, with λ, μ limit
ordinals; λ_n is the n-th element of the fundamental sequence assigned to λ.
Recalling the definition of the standard assignment of fundamental sequences
for all λ < ε_0 (cf. [16], page 78), we introduce a slightly modified assignment:

    λ_n = n                 if λ < ω·2
    λ_n = ω^β · n           if the Cantor normal form for λ is ω^{β+1}
    λ_n = ω^{β_n}           if the Cantor normal form for λ is ω^β (β a limit)
    λ_n = γ + (ω^β)_n       if the Cantor normal form for λ is γ + ω^β

We now define a hierarchy of functions B_α(n) := max(2, n)^{clps(α,n)}, where
clps(α, n) is the result of replacing ω by n in the Cantor normal form of α. By
simple computation one can see that B_m(n) = n^m, B_{ω·c}(n) = n^{cn}, B_{ω^c}(n) = n^{n^c},
and that higher ordinals yield iterated exponentials n^{n^{···^n}} (c times).

Definition 17. Given 1 ≤ α < ε_0 and λ a limit ordinal,

1. T_{α+1} is the closure under the simple schemes and cmp of the class of functions
   obtained by one application of srec to T_α.
2. T_λ is the closure under the simple schemes and cmp of the class of functions
   obtained by cdiag(e), where e ∈ T_{λ_1}, e(r) ∈ T_{λ_{|r|}}.

Example 4. In Example 3 we have defined a sequence of functions g_n, with
n ≥ 1; according to Definition 17 one can easily verify that g_n ∈ T_n, and that
f ∈ T_ω.

Definition 18. Given 1 ≤ α < ε_0 and λ a limit ordinal,
1. S_{α+1} is the closure under the simple schemes of the class of functions ob-
   tained by one application of srec to S_α.
2. S_λ is the closure under the simple schemes of the class of functions obtained
   by cdiag(e), where e ∈ S_{λ_1}, e(r) ∈ S_{λ_{|r|}}.

Definition 19. f = wsbst(h, g) is defined by weak substitution of h in g if
f(x, y, z) = g(h(x, y, z), y, z).

Definition 20. For all positive p, q, TS^q_p is the class of functions defined by
weak substitution of h in g, with h ∈ T_q, g ∈ S_p and q < p.

Theorem 1. 1. For all finite k, T_k = dtimef(n^k).
2. For all ω ≤ α < ε_0, dtimef(B_α(n)) ⊆ T_α ⊆ dtimef(B_α(n + 4)).
3. TS^q_p = dtimespacef(n^p, n^q).

Proof. 1. Let f ∈ T_k. In Lemma 2 a TM which interprets the function f is defined,
and its runtime is proved to be in dtimef(n^k). Let M be a TM in dtimef(n^k).
By Lemma 3 we have that the iteration n^k times of nxt_M can be defined in T_k;
this function, starting from the code of the initial configuration of M, returns
the code of its final configuration.
2. The first inclusion follows by Lemma 3, and the second by Lemma 2.
3. The two inclusions follow by Lemma 6 and Lemma 5.

The following results immediately follow from Theorem 1.

Corollary 1. 1. dtimef(n^n) ⊆ T_ω ⊆ dtimef((n + 4)^{(n+4)}).
2. ∪_{α<ω} T_α = ptimef.
3. ∪_{α<ε_0} T_α = E.

4 Proofs

4.1 Simulation of Functions by TM’s

Lemma 2. For all α ≥ ω we have T_α ⊆ dtimef(B_α(n + 4)).
For all finite m, we have T_m ⊆ dtimef(B_m(n)).

Proof. In what follows we will de ne a T M IN T , which inte p ets the input


f s t r, etu ning the value of f applied to its a guments. Given a function
f T , we need to know, while designing IN T , the numbe of tapes it uses. In
o de to do this, de ne d such that (a) s + t + r d o (b) no cdiag occu s
in f and #(g) d, fo each g T1 occu ing in the de nition of f . This means
that the pa ts of the a guments which can be modi ed (and thus the numbe
of tapes of IN T ) depend on d. At the end of this p oof we educe IN T to a
two-tape T M with a loga ithmic inc ement in the time bound.
The inte p ete IN T uses the following stacks:
(a) T x T y T z , to sto e the values of x y z du ing the computation; each of them
consists of d tapes, one fo each of the modi able pa ts of the value assigned to
x; thei initial values a e, espectively, s t r;
(b) T u , to sto e the value of the p incipal va iable of the cu ent ecu sion;
(c) Tf , to sto e (the codes of) some sub-functions of f ; its initial value is the
code of f ;
INT repeats, while Tf is not empty, the following cycle:
– it pops a function k from the top of Tf, and un-nests the outermost sub-function
  j of k;
– according to the form of j, it carries out different actions on the stacks;
– if the form of j is iter(g) with g ∈ T_0, it calls an interpreter ITER for T_1
  which simulates g on T^x for |t| times, where t is on the top of T^y;
– in all other cases, it pushes into Tf an information of the form j MARK k,
  where MARK informs about the outermost scheme used to define j.
Thus we define

INT(⌈f⌉, s, t, r) :=
  Tf := ⌈f⌉; T^x := s; T^y := t; T^z := r;
  while Tf not empty do A := last record(s) of Tf;
    case
      A = CMP(g, h)         then push ⌈g⌉, ⌈h⌉ into Tf
      A = ASGX(p, g)        then push ⌈g⌉ into Tf; copy p into T^x
      A = IDTX(h)           then push ⌈h⌉ into Tf; copy last record of T^x into T^y
      A = BRANCH^b_i(g, h)  then pop Tf;
                                 if top((T^x)_i) = b then push ⌈g⌉ into Tf
                                 else push ⌈h⌉ into Tf
      A = DIAG(h)           then push DG, ⌈h⌉ into Tf; copy last record of T^x into T^u
      A = DG                then pop Tf; pop last record of T^x and push it into Tf;
                                 pop last record of T^u and push it into T^x
      A = SREC(g, h)        then push A, REC, ⌈g⌉ into Tf; copy last record of T^z into T^u;
                                 push last digit of T^u into T^z
      A = SREC(g, h), REC   then if T^u = T^z then pop Tf; pop T^u; pop T^z
                                 else push ⌈h⌉ into Tf; push last digit of T^u into T^z
      A = ITER(g)           then call ITER.
    end case;
  end while.

We now show that fo all f s t r especting the imposed condition ove d we


have

f T IN T ( f s t r) = f (s t r) within time s + f B ( t + r + 1 )

whe e 1 = 0 if < and 1 = 1 othe wise. The esult follows, since eve y
function f T is then computed in dtimef(B (n + 1 )) by the composition
of the constant-time tm w iting the code of f with IN T .
De ne m := s n := t + r c := f . We show that, fo all f T , IN T
moves within m + cB (n + 1 ) steps f om an istantaneous desc iption of the
fo m
Tf = Z f ; T x = s0 s; T y = t0 t; T z = r0 r; T u = q
to a new istantaneous desc iption of the fo m

Tf = Z; T x = s0 f (s t r); T y = t0 t; T z = r0 r; T u = q

Induction on the const uction of f . Basis. Let f T1 . We have 1 = 0. The


complexity of IT R is obviously bounded by m + cn.
Step. Case 1. f =srec(g h). We have = + 1; let r be the wo d ajrj a1 .
By the inductive hypothesis, IN T needs time m + g B (n + 1 ) to p oduce
the istantaneous desc iption

Tf = Z f RC; T x = s0 g(s t a1 ); T y = t0 t; T z = r0 ra1 ; T u = qr

If r > 1 then IN T puts Tf := Z f RC h and T z := r0 ra2 a1 , and calls


itself in o de to compute h and the next value of f . By the inductive hypothesis
we have that IN T needs time g(s t a1 ) + h B (n + 1 ) to p oduce an
istantaneous desc iption of the fo m

Tf = Z f RC; T u = qr;

T x = s0 (h(g(s t a1 ) t a2 a1 )); T y = t0 t; T z = r0 ra2 a1


Afte r simulations of h we obtain the p omised istantaneous desc iption within
an amount of time

m + r max( g h )B (n + 1 ) m + r cB (n + 1 ) m + cB (n + 1 )

whe e, since 2, in these evaluations we may compensate the quad atic


amount of time needed to copy r and its digits with the di e ence between c
and max( g h ).
Case 2. f =cdiag(h) T . We have h T 1 and ( ecall that 1 = 1 when
< 2)

B 1 (n + 1) + B n (n + 1) B n+1 (n + 1) = B (n + 1) (1)

INT computes h(r), unde stands f om the ma k DG that the esult is the code
fo the function to be computed, and, acco dingly, pushes it into Tf .

To compute h(r) and h(r) (s t r) the inte p ete INT needs, by the inductive
hypothesis, time m + h B 1 ( r + 1) + h(r) B |r| (n + 1) (by (1))m +
h B (n + 1) m + cB (n + 1)
Reduction to two tapes. The inte p ete we have just de ned uses a numbe
of tapes d that depends on the const uction of the simulated function. We use
now the gene al p ocedu e showed in [9] to educe a k-tape T M bounded by T (n)
to a 2-tape T M bounded by kT (n) log T (n); thus, we obtain a 2-tape inte p ete
bounded by
T (n) = kB (n + 1) log B (n + 1) kB (n + 2) log(n + 1) B (n + 4)

4.2 Simulation of TM’s by Functions

Lemma 3. For all 1 ≤ α < ε_0, dtimef(B_α(n)) ⊆ T_α.

Proof. Let M be a T M in dtimef(B (n)). The e exists a function nxtM T0


such that, fo input the code of an istantaneous desc iption of M , nxtM etu ns
the code of the next desc iption. We de ne the following function:

s nxtM if =0
(s )= (s ) (s ) srec if = +1
(s 1 ) (s jrj ) srec cdiag if is a limit o dinal

We p ove by induction on that the function whose code is gene ated by is


in T .
Base. = 0. We have that nxtM T0 , by hypothesis.
Step. Case 1. = + 1. We have that (s ) =srec( (s ) (s )).
The function is in T , by induction on and de nition 17.
Case 2. = . We have that (s ) = cdiag(srec( (s 1 ) (s jsj ))).
This function is in T , since (s 1 ) T 1 and, fo all s, (s jsj )
T | |.
The function w ites the code of a function which ite ates nxtM fo B (n)
times; by input the code of the initial con gu ation of M , this function etu ns
the codes of the nal con gu ation.
Note that, given the code of a limit o dinal , we need at most a quad atic
amount of time to etu n the code of n .

4.3 Time-Space Classes

Lemma 4. For all f in S_p, we have |f(s, t, r)| ≤ max(|s|, |t|, |r|).

Proof. By induction on p. Base. f S1 .


Case 1. f is de ned by ite ation of a function g in S0 ; we have, by induction on
r, f (s a) = s , and f (s ra) = g(f (s r)) f (s r) max( s r )
Case 2. f is de ned by simple scheme o cmp. The esult follows by the inductive
hypothesis.

Step. Given f Sp+1 , de ned by srec in functions g and h in Sp , we have

f (s t a) = g(s t) by de nition of f
max( s t ) by inductive hypothesis.

and
f (s t ra) = h(f (s t r) t ra) by de nition of f
max( f (s t r) t ra ) by inductive hypothesis on h
max(max( s t r ) t ra ) by induction on r
max( s t ra )

Lemma 5. TS^q_p ⊆ dtimespacef(n^p, n^q).
Proof. Let f be a function in T S qp . By de nition 20, f is de ned by weak
substitution of a function h Tq into a function g Sp , that is, f (s t r) =
g(h(s t r) t r). The theo em 1 states that the e exists an inte p ete IN T com-
puting the values of h within time nq , and computing the values of g within time
np . The lemma 4 holds fo g, since g belongs to Sp ; thus, the space needed by
IN T to compute g is at most n.
De ne now a T M M that, by input f s t, and r pe fo ms the following steps
( ecall lemma 2 fo the de nition of IN T ):
(1) it pushes g h into the tape T f of IN T , which contains the codes of the
functions that the inte p ete will compute;
(2) it calls IN T on input g h s t r.
The time complexity of (1) is linea in the length of f ; in (2), IN T needs time
p
equal to nq to compute h, and needs only np (and not nq ) to compute g. This
happens because h(s t r) is computed in the safe position, and this implies that
its length does not a ect the numbe of steps pe fo med by the second call to
IN T . In fact, IN T neve moves the content of a safe position into the tapes
whose values play the ole of ecu sive counte s; they depend only on n, the
length of the o iginal input. Thus, the ove all time bound is nq + np , which can
be educed to np , being q < p.
IN T equi es space nq to compute the value of h on input s t r; as we noted
above, the space needed fo the computation of g is linea in the length of the
input, and thus the ove all space needed by M is still nq .

Lemma 6. dtimespacef(n^p, n^q) ⊆ TS^q_p.
Proof. Let M be a T M in dtimespacef(np nq ). This means that the compu-
tation of M is time-bounded by nq and, simultaneously, it is space-bounded by
np . M can be simulated by the composition of two T M ’s, Mg and Mh , with
Mh dtimef(nq ) and Mg dtimespacef(np n): the fo me const ucts (within
polynomial time) the space that the latte will successively use in o de to sim-
ulate M .
By theo em 1 the e exists a function h Tq which simulates the behaviou of
Mh , and the e exists a function g Sq which simulates the behaviou of Mg ; in

pa ticula , the e exists the function nxtg T0 . Note that nxtg belongs to S0 ,
since it neve adds a digit to the desc iption of Mg without e asing anothe one.
We de ne the function 0 :=iter(nxtg ), and the sequence γn+1 :=srec( n n ),
with n+1 :=idtz (γn+1 ).
We have that
γ1 (s t a) = iter(nxtg )(s t)
γ1 (s t ra) = iter(nxtg )(γ1 (s t r) t)
and
γn+1 (s t a) = n (s t)
γn+1 (s t ra) = n (γn+1 (s t r) t ra)
We can easily see that 0 S1 , by de nition of this class and, again by de nition
18, that n Sn+1 .
Given the code s of the initial id of M , we can de ne simM (s) = p−1 (h(s) s),
which simulates the behaviou of M . This function is de ned in T S qp .

References
1. S.J. Bellantoni, Predicative recursion and the polytime hierarchy, in P. Clote and
   J. Remmel (eds), Feasible Mathematics II (Birkhäuser, 1994), 320-343.
2. S. Bellantoni and S. Cook, A new recursion-theoretic characterization of the poly-
   time functions, Computational Complexity 2 (1992) 97-110.
3. S. Caporaso, Safe Turing machines, Grzegorczyk classes and polytime, Intern. J.
   Found. Comp. Sc., 7.3 (1996) 241-252.
4. S. Caporaso, N. Galesi, M. Zito, A predicative and decidable characterization of the
   polytime classes of languages, to appear in Theoretical Comp. Sc.
5. S. Caporaso, M. Zito, N. Galesi, E. Covino, Syntactic characterization in Lisp of the
   polynomial complexity classes and hierarchies, in G. Bongiovanni, D.P. Bovet, G. Di
   Battista (eds), Algorithms and Complexity, LNCS 1203 (1997) 61-73.
6. S. Caporaso, G. Pani, E. Covino, Predicative recursion, constructive diagonaliza-
   tion and the elementary functions, Workshop on Implicit Computational Complexity
   (ICC99), affiliated with LICS99, Trento, 1999.
7. P. Clote, A time-space hierarchy between polynomial time and polynomial space,
   Math. Sys. Theory 25 (1992) 77-92.
8. A. Cobham, The intrinsic computational difficulty of functions, in Y. Bar-Hillel (ed),
   Proceedings of the International Conference on Logic, Methodology, and Philosophy
   of Science, pages 24-30, North-Holland, Amsterdam, 1962.
9. F.C. Hennie and R.E. Stearns, Two-tape simulation of multi-tape TM's, Journal of
   the ACM 13.4 (1966) 533-546.
10. D. Leivant, A foundational delineation of computational feasibility, Proc. of the 6th
    Annual IEEE Symposium on Logic in Computer Science (IEEE Computer Society
    Press, 1991), 2-18.
11. D. Leivant, Stratified functional programs and computational complexity, in Con-
    ference Records of the 20th Annual ACM Symposium on Principles of Programming
    Languages, New York, 1993, ACM.
12. D. Leivant, Ramified recurrence and computational complexity I: word recurrence
    and polytime, in P. Clote and J. Remmel (eds), Feasible Mathematics II (Birkhäuser,
    1994), 320-343.
13. D. Leivant and J.-Y. Marion, Ramified recurrence and computational complexity
    II: substitution and polyspace, in J. Tiuryn and L. Pacholski (eds), Computer Science
    Logic, LNCS 933 (1995) 486-500.
14. I. Oitavem, New recursive characterization of the elementary functions and the
    functions computable in polynomial space, Revista Matematica de la Universidad
    Complutense de Madrid, 10.1 (1997) 109-125.
15. R.W. Ritchie, Classes of predictably computable functions, Transactions of the
    A.M.S., 106, 1963.
16. H.E. Rose, Subrecursion: Functions and Hierarchies, Oxford University Press, Ox-
    ford, 1984.
17. H. Simmons, The realm of primitive recursion, Arch. Math. Logic, 27 (1988) 177-188.
Group Updates for Red-Black Trees

Sabine Hanke and Eljas Soisalon-Soininen


1
Institut für Informatik, Universität Freiburg,
Am Flughafen 17, D-79110 Freiburg, Germany,
[email protected]
2
Department of Computer Science and Engineering, Helsinki University of
Technology, P.O. Box 5400, FIN-02015 HUT, Finland,
[email protected]

Abstract. If several keys are inserted into a search structure (or deleted
from it) at the same time, it is advantageous to sort the keys and per-
form a group update that brings the keys into the structure as a single
transaction. A typical application of group updates is full-text index-
ing for document databases. Then the words of an inserted or deleted
document, together with occurrence information, form the group to be
inserted into or deleted from the full-text index. In the present paper
a new group update algorithm is presented for red-black search trees.
The algorithm is designed in such a way that the concurrent use of the
structure is possible.

1 Introduction
If a large number of keys are to be inserted into a database index at one time,
then it is important for efficiency that the keys are sorted and inserted into the
index tree as a group. In this way, it is not necessary to traverse the whole path
from the root to the leaf when performing each insertion.
Typical applications where group operations are needed are databases in
which large collections of data are stored at one time, such as document databases
or WWW search engines [13, 14]. In such systems, full-text indexing is applied
for term search. In the inverted-index technique, for each index term an occur-
rence list is created that includes all documents the term appears in. The terms
themselves are organized as a search structure, typically as a tree. When insert-
ing a new document, an update of the inverted index is required for each term in
the inserted document [4, 5]. This can be a long transaction, and without con-
currency intolerable delays can occur if searches in the database are frequent. To
overcome this problem concurrent group update algorithms have been designed
[13].
Other applications arise, e.g., when updates occur in bursts and are collected
into groups that are merged at certain intervals with the main index [7], and
when large indexes are constructed on-line, i.e., during the construction of the
index, new records may be inserted into the file to be indexed [11, 16]. To finish
the indexing, a group update is created from the updates which occurred during
the main construction.


In this paper we present an efficient group update algorithm for red-black
search trees [6]. The algorithm has two steps: First, the operations in the under-
lying group are performed without any balancing, except for subgroups between
two consecutive keys in the original tree. In this way the updates are made avail-
able as soon as possible without sacrificing the logarithmic search time. In the
second step, the tree will be balanced, i.e., transformed into a tree satisfying the
(local) balance criteria of red-black trees. Balancing is designed as a background
process allowing the concurrent use of the structure. The balancing time is com-
parable with earlier results in cases when balancing is strictly connected with
individual updates.
Recently, a group insertion algorithm for AVL trees [1] was presented in [10].
The motivation for the new algorithm is two-fold. First, red-black trees are more
efficient than AVL trees as regards the number of rebalancing operations needed
for updates, see [8], for example. Thus, it is important to develop group update
algorithms for red-black trees, not only for AVL trees. Moreover, our new imple-
mentation of group updates for red-black trees has an important advantage over
the implementation given in [10]. The algorithm given in [10] is based on height-
valued trees defined in [9]. This implies that even when a subgroup consists of
a single key and no rebalancing is needed, all nodes in the search path must
be checked for imbalance and the balance information must be stored. Our new
algorithm is based on a generalization of red-black trees, called chromatic trees
[12] or relaxed red-black trees [8], and no unnecessary checking for imbalance
and restoring of balance information is needed.

2 Chromatic Trees

We shall consider leaf-oriented binary search trees, which are full binary trees
(each node has either two or no children) with the keys stored in the leaves. The
internal nodes contain routers, which guide the search through the tree. The
router stored in a node v must be greater than or equal to any key stored
in the leaves of v’s left subtree and smaller than any key in the leaves of v’s right
subtree.
We define a chromatic tree as a relaxed version of a red-black tree. Instead
of using the two colors, red and black, we use weights. Each node in the tree
has a non-negative integer weight. We refer to nodes with weight zero as red
and nodes with weight one as black. If a node has a larger weight, we call it
overweighted, and its amount of overweight is its weight subtracted by 1. The
only requirements in a chromatic tree are that all paths from the root to a leaf
have the same weight and that leaves have a weight of at least one.
In a chromatic tree two consecutive red nodes are referred to as a red-red
conflict (which is assigned to the lower of the two nodes), and an overweighted node
as an overweight conflict. A red-black tree can be defined as a chromatic tree
without any conflicts. The purpose of the rebalancing operations is to transform
a chromatic tree into a red-black tree by removing all conflicts.

Operations for the updates, insertion, and deletion, as well as for rebalancing,
can be found in [2, 12]. Note that all operations preserve the tree as a chromatic
tree, i.e., leaves are not red and all paths have the same weight.
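As a concrete illustration (a sketch with our own field and function names, not
code from [2, 12]), a chromatic-tree node can simply carry an integer weight, and
the two structural requirements can be checked recursively:

```c
#include <stdbool.h>
#include <stddef.h>

/* A node of a leaf-oriented chromatic tree.  Internal nodes hold a router,
 * leaves hold a key; weight 0 = red, 1 = black, > 1 = overweighted. */
typedef struct node {
    int key_or_router;
    int weight;                   /* non-negative */
    struct node *left, *right;    /* both NULL at a leaf */
} node;

/* Weighted length of the root-to-leaf paths below t, or -1 if the paths
 * disagree or some leaf is red, i.e. if the chromatic-tree conditions fail. */
static int path_weight(const node *t) {
    if (t->left == NULL && t->right == NULL)            /* leaf */
        return t->weight >= 1 ? t->weight : -1;
    int l = path_weight(t->left), r = path_weight(t->right);
    if (l < 0 || r < 0 || l != r) return -1;
    return l + t->weight;
}

static bool is_chromatic(const node *t) { return t != NULL && path_weight(t) >= 0; }
```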

3 Group Insertion

The overall structure of an efficient group update for red-black trees is the same
as for AVL trees [10]. For insertions, this essentially amounts to inserting an
ordered set of keys, starting at the root of the tree, and propagating subsets
down the tree to the appropriate insertion locations. For a leaf search tree, we
arrive at a set of leaves, with a set of keys to be inserted at each of these leaves.
Thereafter, each set of keys to be inserted is turned into a (usually small)
red-black tree. With a certain number of rebalancing operations that arise from
applying the rebalancing scheme of chromatic trees, the balance condition is
restored.
As a result of a group insertion, a node of the tree may also get negative
units of overweight. Nodes that have negative weights are called underweighted
nodes. An underweighted node p corresponds to a red node, the color of which
additionally stores information about how many black nodes are too many on
each search path in the subtree rooted at p compared with the rest of the tree.
The motivation for using underweighted nodes is to expedite the rebalancing,
since the nodes of a new subtree Ti that is inserted by the group insertion can
be colored red-black during the creation of Ti without much effort. On the other
hand, if all internal nodes of Ti are colored red analogously, as by a sequence of
single insertions, the red-black coloring of Ti must be part of the rebalancing.
Because the color of at least each second node on a path in Ti must then be
changed, Ω(m) transformations are needed.
Let T be a chromatic tree of size n that fulfills the balance conditions of
red-black trees and K a group of m sorted keys. K is split into b subgroups
Ki (i = 1, …, b) of mi keys so that the search for each of the keys of Ki ends at
the same leaf li of T. Let li store key ki.
Operation group insertion:
Step 1 For all i = 1, …, b construct a balanced red-black tree Ti that stores
  the mi keys of the i-th subgroup and the key ki. The root of Ti gets the
  underweight 1 − wi, where wi is the number of black nodes on each path
  from the children of the root to the leaves.
Step 2 For all i = 1, …, b replace leaf li of T by the tree Ti. Denote the resulting
  tree by T′.
Step 3 Rebalance T′ by small local transformations.

In order to construct Ti in O(mi) time, a leaf-oriented AVL tree [1] is gen-
erated from the set of mi + 1 keys by applying a simple divide-and-conquer
algorithm. During the construction the nodes are colored red and black using
the criterion of Guibas and Sedgewick [6].
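One way to realize Step 1 in O(mi) time is sketched below in C (our illustration;
the paper builds an AVL tree and applies the Guibas–Sedgewick coloring criterion,
whereas this sketch halves the sorted key range and colors by depth, which likewise
gives every root-to-leaf path the same black weight). The caller would afterwards
set the root's weight to the underweight 1 − wi required by Step 1.

```c
#include <stdlib.h>

/* Same node layout as in the earlier sketch. */
typedef struct node { int key_or_router, weight; struct node *left, *right; } node;

/* Build a leaf-oriented red-black subtree over the sorted keys[lo..hi-1].
 * All leaves end up at depth `bottom' or `bottom - 1'; internal nodes on
 * level bottom-1 are colored red (weight 0), everything else black
 * (weight 1), so every root-to-leaf path carries the same weight. */
static node *build(const int keys[], int lo, int hi, int depth, int bottom) {
    node *v = malloc(sizeof *v);
    if (hi - lo == 1) {                       /* leaf storing a single key */
        v->key_or_router = keys[lo];
        v->weight = 1;
        v->left = v->right = NULL;
        return v;
    }
    int mid = lo + (hi - lo + 1) / 2;         /* halve the key range */
    v->left  = build(keys, lo, mid, depth + 1, bottom);
    v->right = build(keys, mid, hi, depth + 1, bottom);
    v->key_or_router = keys[mid - 1];         /* router: largest key on the left */
    v->weight = (depth == bottom - 1) ? 0 : 1;
    return v;
}

static node *build_group_subtree(const int keys[], int m) {
    int bottom = 0;                           /* bottom = ceil(log2 m) */
    for (int size = 1; size < m; size *= 2) bottom++;
    return build(keys, 0, m, 0, bottom);      /* the root may come out red,
                                                 which is fine in a chromatic tree */
}
```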

In order to rebalance T′ basically the set of rebalancing transformations of the
chromatic tree is used [2]. The underweights only serve for storing information
about accumulated insertions. During the rebalancing they are transformed into
red-red conflicts and overweight conflicts again.
An underweighted node p is handled as follows. If p is the root of the tree,
then the underweight conflict is resolved by coloring p black (cf. Figure 1, u-
root). Otherwise, assume that p has weight w(p) = −δ, and the sibling q of p
has weight greater than or equal to −δ. Then p is colored red, the weight of p’s
parent is decreased by δ, and the weight of q is increased by δ (cf. Figure 1,
reversal).

Fig. 1. Handling of an underweight conflict: the (u-root) transformation recolors an
underweighted root black, and the (reversal) transformation recolors an underweighted
node red, decreasing the weight of its parent and increasing the weight of its sibling
accordingly. Symmetric cases are omitted. (A label beside a node denotes the weight of
the node. In order to represent the colors of the nodes more clearly, black or overweighted
nodes are filled and red or underweighted nodes are unfilled. Half-filled nodes have an
arbitrary weight.)

If the parent of p is black or overweighted before applying the reversal trans-
formation, then the reversal transformation resolves at least one unit of under-
weight from the tree. Otherwise, the underweight is only shifted from p to its
parent. By a reversal transformation to handle the underweight of p, the sibling
q of p may become overweighted, w(q) ≥ 2. Furthermore, a constant number of
red-red conflicts may be generated by a reversal transformation (at the nodes
which are colored red and at the red children of these nodes).
In the following we analyze the costs of rebalancing T′. Let ri (i = 1, …, b)
be the underweighted nodes of T′ with weights w(ri) = −δi ≤ 0. The ri are the
roots of the new subtrees Ti created by the group insertion.
The i-th branching node (i = 1, …, k − 1) of a group search path from the
root to nodes q1, …, qk is defined as the nearest common ancestor of qi and
qi+1. A stopping node is a branching node, both subtrees of which contain red-
red conflicts.

Lemma 1. At most 2 Σ_{i=1}^{b} δi reversal transformations are needed to resolve
the Σ_{i=1}^{b} δi units of underweight from T′.

Proof. The ebalancing of T 0 is done along the g oup sea ch path to the nodes
r1 rb . We sta t with t ansfo ming the unde weights into ed- ed and ove -
weight conflicts by using eve sal t ansfo mations. This t ansfo mation is always
done in bottom-up di ection. That means a eve sal t ansfo mation is applied at
an unde weighted node p only if the subt ees ooted at p and its sibling q do not
contain any unde weighted nodes except fo p and q. The question whethe those
two subt ees contain unde weighted nodes can be answe ed locally by ma king
the b anching nodes du ing the sea ch phase of the g oup inse tion and fo each
Ti sto ing the info mation as to whethe the unde weight still exists.
Since T ful lled the balance conditions of ed-black t ees befo e the g oup
inse tion, at least eve y second node on the path f om the oot to a node ri is
black in T 0 . The efo e, at most 2 i eve sal t ansfo mations a e needed to esolve
the unde weight of ri .
If the unde weights of two nodes ri and ri+1 meet at child en of a b anch-
ing node, both unde weights a e handled by the same eve sal t ansfo mation.
The eby one of the unde weights is always esolved (cf. Figu e 1). 2
Let us now conside the situation afte pe fo ming the eve sal t ansfo ma-
tions. Since each eve sal t ansfo mation that handles an unde weight which was
o iginally c eated at ri may gene ate only a constant numbe of ed- ed conflicts
and i units of ove weight, it follows:

Lemma 2. The number of red-red conflicts generated by the reversal transfor-
mations is O(Σ_{i=1}^{b} δi), and the sum of units of overweight is O(Σ_{i=1}^{b} δi²).

In contrast to underweights and overweights, red-red conflicts must always be handled in a top-down manner. Thus, the rebalancing of the red-red conflicts starts with the top-most red-red conflict on the group search path from the root to the nodes r_1, …, r_b. We demand that during the rebalancing of the red-red conflicts the following condition must be guaranteed:

Condition: Whenever the parent or the grandparent of a node has a red-red conflict which is handled as a stopping node, then both children of this stopping node must be red.

The Condition is motivated by the following idea: a rebalancing transformation should be carried out at a branching node only if either red-red conflicts of both subtrees have been bubbled up to this node, so that they can be handled together, or if one of the subtrees contains no red-red conflicts any more.

Lemma 3. The number of transformations needed to resolve all red-red conflicts is O(Σ_{i=1}^{b} α_i + L), where L is the number of different nodes on the group search path from the root to r_1, …, r_b.

Proof (sketch). Let k be the number of red-red conflicts. l denotes the number of pairs of red siblings where at least one of the siblings lies on the group search path from the root to r_1, …, r_b. Let v be the number of nodes p with red-red conflicts that have a distance greater than one to the nearest stopping node that is an ancestor of p. We define the potential Φ := 5k + 2l + v ≥ 0.
By a simple case analysis it can be verified that, by performing a rebalancing transformation to handle a red-red conflict, Φ is always decreased if, during the handling of the red-red conflicts, the Condition is guaranteed. Therefore, at most 5k + 2l + v rebalancing transformations are needed to resolve all red-red conflicts. Since k and v are in O(Σ_{i=1}^{b} α_i), cf. Lemma 2, and l ≤ L, the claim follows. □

Theorem 1. O(Σ_{i=1}^{b} log² m_i) rotations and O(Σ_{i=1}^{b} log² m_i + L) color changes are needed to rebalance T′, where L is the number of different nodes on the group search path from the root to r_1, …, r_b.
Proof. After inserting the subtrees T_1, …, T_b into T, the tree contains b underweighted nodes r_1, …, r_b with weights w(r_i) = −α_i (i = 1, …, b), where α_i is in O(log m_i), since the subtree T_i rooted by r_i contains O(log m_i) black nodes on each search path.
First, the Σ_{i=1}^{b} α_i units of underweight are transformed into O(Σ_{i=1}^{b} α_i) red-red conflicts and O(Σ_{i=1}^{b} α_i²) units of overweight (Lemma 2) at at most 2 Σ_{i=1}^{b} α_i overweighted nodes by using O(Σ_{i=1}^{b} α_i) = O(Σ_{i=1}^{b} log m_i) reversal transformations (Lemma 1). Then the red-red conflicts are resolved from the tree by using O(Σ_{i=1}^{b} α_i) = O(Σ_{i=1}^{b} log m_i) rotations and O(Σ_{i=1}^{b} log m_i + L) color changes (Lemma 3). Finally, the overweight conflicts are handled analogously as in Step 2 of a group deletion (cf. the following section). For this, O(Σ_{i=1}^{b} α_i²) = O(Σ_{i=1}^{b} log² m_i) rotations and O(Σ_{i=1}^{b} α_i² + L′) color changes are necessary, where L′ is the number of different nodes on the group search path from the root to the overweighted nodes.
Before handling the red-red conflicts, L′ ≤ L + 2 Σ_{i=1}^{b} α_i, because all of the at most 2 Σ_{i=1}^{b} α_i overweighted nodes are siblings of nodes on the group search path from the root to r_1, …, r_b. Since each of the O(Σ_{i=1}^{b} α_i) rotations needed to handle the red-red conflicts increases L′ by at most one, L′ is in O(L + Σ_{i=1}^{b} α_i). So, the number of color changes needed to resolve all overweights from the tree is bounded by O(Σ_{i=1}^{b} log² m_i + L). Therefore, after performing Step 1 and Step 2 of a group insertion, the tree can be rebalanced by using O(Σ_{i=1}^{b} log² m_i) rotations and O(Σ_{i=1}^{b} log² m_i + L) color changes. □
Assuming that each path in T from the root to a leaf contains the same number of black nodes, it can analogously be shown as for 2-3-trees [3] that the number L of different nodes on the group search path is O(log n + Σ_{i=1}^{b−1} log d_i), where d_i denotes the number of leaves between l_i and l_{i+1}.

4 Group Deletion
Let T be a chromatic tree of size n that fulfills the balance conditions of red-black trees and K a group of m sorted keys. K is split into b subgroups K_i (i = 1, …, b) of m_i keys so that for each K_i, the tree T contains a subtree T_i that stores all keys of K_i plus an additional key k_i at its leaves.
Operation group deletion:
Step 1. For all i = 1, …, b replace the subtree T_i by a leaf l_i that stores the key k_i. l_i gets weight w_i, where w_i is the sum of weights on each path from the root of T_i to a leaf. Denote the resulting tree by T′.
Step 2. Rebalance T′ by small local transformations.
In order to estimate the costs of rebalancing T′, first we consider the following situation: Let p be an overweighted node, of which the sibling q is the root of a balanced subtree T_q that contains no overweights. (After performing a group deletion, this situation occurs at p = l_i, for example, if the sibling q is not equal to l_{i−1} or l_{i+1}, respectively.) Denote the parent of p and q by u. Let w(p) = α + 1 ≥ 2 be the weight of p.
be the weight of p.
Lemma 4. At most 2α push transformations plus α further rebalancing transformations are needed in order to transform T_u into a balanced red-black tree T̃_u. Afterwards the root u′ of T̃_u has weight w(u′) ≤ w(u) + 1.

Proof (sketch). In order to decrease w(p) from α + 1 to 1, rebalancing transformations are carried out. Thereby the subtree T_u rooted at u is replaced by a subtree T̃_u rooted by u′ (cf. Figure 2).

Fig. 2. Rebalancing T_u. The subtree T_u rooted at u, with an overweighted node p of weight w(p) = α + 1 ≥ 2 and a balanced sibling subtree T_q, is transformed in two phases into a balanced subtree T̃_u rooted at u′ with w(u′) ≤ w(u) + 1; in the intermediate stage the nodes on the path from u to p carry weights at most 2 and p ends with w(p) = 1.

By observing the weight-balancing transformations, it can be shown that, thereby, at most half of the units of overweight are resolved. The remaining k overweight conflicts are spread out over the path from u to p, i.e., except for u, all nodes on the path from u to p (of length 2(α − k) + 2) have a weight less than or equal to two (see Figure 2).
Then the overweights on the path from u to p are handled in bottom-up direction by using at most 2α − k push transformations and k further rebalancing transformations. Thereby, T_u is replaced by a subtree T̃_u rooted by u′. Only one of the k units of overweight may remain at u′.

Since the set of rebalancing transformations performed in the first phase contains at most k push transformations, in total at most 2α push transformations plus α further rebalancing transformations are needed in order to rebalance T_u. Afterwards w(u′) ≤ w(u) + 1. □
Theorem 2. O(Σ_{i=1}^{b} log m_i) rotations and O(Σ_{i=1}^{b} log m_i + L) color changes are needed to rebalance T′, where L is the number of different nodes on the group search path from the root to the overweighted leaves l_1, …, l_b.

Proof. For i = 1, …, b let w(l_i) = α_i + 1.
In order to avoid unnecessary work during the rebalancing, we slightly modify the w7 transformation [2], which handles overweight conflicts at sibling nodes. Instead of resolving only one unit of overweight as w7 does, the modified w7 resolves the maximum number of overweight conflicts at a time (cf. Figure 3).

Fig. 3. Modified (w7)-transformation: the overweight conflicts at two overweighted sibling nodes are handled simultaneously, resolving the maximum number of units of overweight at a time.

The rebalancing of T′ is always done in bottom-up direction. That means an overweight conflict at a node p is handled only if the subtrees rooted at p and its sibling q both are balanced red-black trees that do not contain overweighted nodes except for p and q. Such a node p always exists in T′, if the group deletion has not removed all leaves of the tree. At the beginning of the rebalancing, p is one of the leaves l_i and q may be l_{i−1} or l_{i+1}, respectively.
The question as to whether two subtrees rooted at sibling nodes are balanced can be answered locally by marking the branching nodes during the search phase of the group deletion and by storing for each l_i the information whether overweight conflicts generated at l_i still exist in the tree.
Let p be an overweighted node so that the subtree rooted at p and the subtree rooted at p's sibling q are both balanced. Without loss of generality w(p) ≥ w(q). Let u be the parent of p and q. T_u denotes the subtree rooted at u.
Case 1: [w(q) ≤ 1 and w(p) = 2] One single transformation is sufficient either to resolve the overweight conflict at p or to shift it to p's parent u.

Case 2: [w(q) ≤ 1 and w(p) := α_p + 1 > 2] In this case, the subtree T_u is rebalanced analogously as described in the proof of Lemma 4 by using at most 3α_p transformations. Thereby, at least α_p − 1 units of overweight are removed from the tree. T_u is replaced by the subtree T̃_u and the weight of u′ is less than or equal to w(u) + 1.
Case 3: [w(q) := α_q + 1 ≥ 2] By performing a w7 transformation, the α_q units of overweight are removed from q, and α_q units of overweight are shifted from p to p's parent u.
The rebalancing of T′ is now done as follows. We start at a leaf l_i and apply Case 1 and Case 2, as long as all overweight conflicts are resolved locally or, respectively, the child of a branching node is reached. Then the overweight conflicts at another l_j are handled analogously. If both children of a branching node become overweighted, Case 3 applies. If only one subtree of a branching node contains overweights, these overweight conflicts are handled by applying Case 1 and Case 2. Then at most one unit of overweight may remain at the branching node. In both cases, afterwards an overweight conflict at the branching node is handled analogously as at one of the leaves l_i.
Because T′ contains Σ_{i=1}^{b} α_i units of overweight, the number of rotations needed to rebalance T′ is O(Σ_{i=1}^{b} α_i) = O(Σ_{i=1}^{b} log m_i).
Each time Case 2 or Case 3 applies, the number of overweight conflicts is reduced. Thereby, O(Σ_{i=1}^{b} log m_i) transformations are performed. In Case 1 either the overweight conflict is resolved or it is shifted along the search path towards the root. Thus, Case 1 applies O(L) times. Therefore, the number of color changes needed to rebalance T′ is O(Σ_{i=1}^{b} log m_i + L). □

5 Conclusions

There are applications in which a large number of updates for a search structure is created in a very short time, for example by measuring equipment, or when a document is inserted into a document database. It is often important that such a group is brought into the structure as fast as possible, and that during this group update the concurrent use of the structure is allowed.
Our work in the present paper is along the lines of [10], where a group insertion algorithm for AVL-trees was presented and analyzed. The novelty of the present paper is that we consider red-black binary trees, and that we obtain a group update algorithm that is more efficient than the one given in [10] in the following sense: In the algorithm of [10], all nodes in the group search path must be marked as unbalanced nodes and the balance in these nodes must be restored during the rebalancing phase of the algorithm. Our new algorithm is able to restrict the balance restoring to those nodes that have gotten out of balance because of the group update.
One important aspect of group updates not considered in the present paper is recovery, i.e., the question of how a valid search structure can be efficiently restored after a possible failure during a group update. These questions have been considered in [15] for AVL-trees, and we believe that similar methods can be applied to red-black trees.

References
[1] G.M. Adel'son-Vel'skii and E.M. Landis. An algorithm for the organisation of information. Soviet Math. Dokl., 3:1259–1262, 1962.
[2] J. Boyar, R. Fagerberg, and K. Larsen. Amortization results for chromatic search trees, with an application to priority queues. Journal of Computer and System Sciences, 55(3):504–521, 1997.
[3] M. Brown and R. Tarjan. Design and analysis of a data structure for representing sorted lists. SIAM Journal of Computing, 9(3):594–614, 1980.
[4] C. Faloutsos and S. Christodoulakis. Design of a signature file method that accounts for non-uniform occurrence and query frequencies. In Proc. Intern. Conf. on Very Large Data Bases, pages 165–180, 1985.
[5] C. Faloutsos and H.V. Jagadish. Hybrid text organizations for text databases. In Proc. Advances in Database Technology, volume 580 of LNCS, pages 310–327, 1992.
[6] L.J. Guibas and R. Sedgewick. A dichromatic framework for balanced trees. In Proc. 19th IEEE Symposium on Foundations of Computer Science, pages 8–21, 1978.
[7] S.D. Lang, J.R. Driscoll, and J.H. Jou. Batch insertion for tree-structured file organizations - improving differential database representation. Information Systems, 11(2):167–175, 1992.
[8] K. Larsen. Amortized constant relaxed rebalancing using standard rotations. Acta Informatica, 35(10):859–874, 1998.
[9] K. Larsen, E. Soisalon-Soininen, and P. Widmayer. Relaxed balance through standard rotations. In Proc. 5th Workshop on Algorithms and Data Structures, volume 1272 of LNCS, pages 450–461, August 1997.
[10] L. Malmi and E. Soisalon-Soininen. Group updates for relaxed height-balanced trees. In Proc. 18th ACM Symposium on the Principles of Database Systems, pages 358–367, 1999.
[11] C. Mohan and I. Narang. Algorithms for creating indexes for very large tables without quiescing updates. In Proc. 19th ACM SIGMOD Conf. on the Management of Data, pages 361–370, 1992.
[12] O. Nurmi and E. Soisalon-Soininen. Chromatic binary search trees: A structure for concurrent rebalancing. Acta Informatica, 33:547–557, 1996.
[13] K. Pollari-Malmi, E. Soisalon-Soininen, and T. Ylönen. Concurrency control in B-trees with batch updates. IEEE Transactions on Knowledge and Data Engineering, 8(6):975–983, 1996.
[14] M. Rossi. Concurrent full text database. Master's thesis, Helsinki University of Technology, Department of Computer Science and Engineering, Finland, 1997.
[15] E. Soisalon-Soininen and P. Widmayer. Concurrency and recovery in full-text indexing. In Proc. Symposium on String Processing and Information Retrieval (SPIRE '99), pages 192–198. IEEE Computer Society, 1999.
[16] V. Srinivasan and M.J. Carey. Performance of on-line index construction algorithms. In Proc. Advances in Database Technology, volume 580 of LNCS, pages 293–309, 1992.
Approximating SVP∞ to within
Almost-Polynomial Factors Is NP-Hard

Irit Dinur

Tel-Aviv University
dinur@math.tau.ac.il

Abstract. We show SVP∞ and CVP∞ to be NP-hard to approximate to within n^{c/log log n} for some constant c > 0. We show a direct reduction from SAT to these problems, that combines ideas from [ABSS93] and from [DKRS99], along with some modifications. Our result is obtained without relying on the PCP characterization of NP, although some of our techniques are derived from the proof of the PCP characterization itself [DFK+99].

1 Introduction
Background
A lattice L = L(v_1, …, v_n), for linearly independent vectors v_1, …, v_n ∈ R^k, is the additive group generated by the basis vectors, i.e. the set L = {Σ a_i v_i | a_i ∈ Z}. Given L, the Shortest Vector Problem (SVP_p) is to find the shortest non-zero vector in L. The length is measured in the l_p norm (1 ≤ p ≤ ∞). The Closest Vector Problem (CVP_p) is the non-homogeneous analog, i.e. given L and a vector y, find a vector in L closest to y.
These lattice problems have been introduced in the previous century, and have been studied since. Minkowski and Dirichlet tried, with little success, to come up with approximation algorithms for these problems. It was much later that the lattice reduction algorithm was presented by Lenstra, Lenstra and Lovasz [LLL82], achieving a polynomial-time algorithm approximating the Shortest Lattice Vector to within the exponential factor 2^{n/2}, where n is the dimension of the lattice. Babai [Bab86] applied LLL's methods to present an algorithm that approximates CVP to within a similar factor. Schnorr [Sch85] improved on LLL's technique, reducing the factor of approximation to (1 + ε)^n, for any constant ε > 0, for both CVP and SVP. These positive approximation results hold for the l_p norm for any p ≥ 1, yet are quite weak, achieving only extremely large (exponential) approximation factors. The shortest vector problem is particularly important, quoting [ABSS93], because even the above relatively weak approximation algorithms have been used in a host of applications, including integer programming, solving low-density subset-sum problems and breaking knapsack based codes [LO85], simultaneous diophantine approximation and factoring polynomials over the rationals [LLL82], and strongly polynomial-time algorithms in combinatorial optimization [FT85].


Interest in lattice problems has been recently renewed due to a result of Ajtai [Ajt96], showing a reduction, from a version of SVP, to the average-case of the same problem.
Only recently [Ajt98] showed a randomized reduction from the NP-complete problem Subset-Sum to SVP. This has been improved [CN98], showing approximation hardness for some small factor (1 + n^{−ε}). Very recently [Mic98] has significantly strengthened Ajtai's result, showing SVP hard to approximate to within some constant factor.
The above results all apply to SVP_p, for finite p. SVP with the maximum norm, l∞, appears to be a harder problem. A g-approximation algorithm for SVP_2 implies a √n·g-approximation algorithm for SVP∞, since for every vector v, ||v||∞ ≤ ||v||_2 ≤ ||v||∞·√n. Thus hardness for approximating SVP∞ to within a factor √n·g will imply the hardness for approximating SVP_2 to within factor g. Lagarias showed SVP∞ to be NP-hard in its exact decision version. Arora et al. [ABSS93] utilized the PCP characterization of NP to show that both CVP (for the l_p norm for any p) and SVP∞ are quasi-NP-hard to approximate to within 2^{(log n)^{1−ε}} for any constant ε > 0. Recently, the hardness result for approximating CVP has been strengthened [DKS98, DKRS99] showing that it is NP-hard to approximate to within a factor of n^{Ω(1)/log log n} (where n is the lattice dimension). In this paper we similarly strengthen the hardness result for approximating SVP∞.
So far there is still a huge gap between the positive results, showing approximations for SVP and CVP with exponential factors, and the above hardness results. Nevertheless, some other results provide a discouraging indication for improving the hardness result beyond a certain factor. [GG98] showed that approximating both SVP_2 and CVP_2 to within √n and approximating SVP∞ and CVP∞ to within n/O(log n) is in NP ∩ co-AM. Hence it is unlikely for any of these problems to be NP-hard.

Our Result

We prove that approximating SVP∞ and CVP∞ to within a factor of n^{c/log log n} is NP-hard (where n is the lattice dimension and c > 0 is some arbitrary constant).

Technique

We obtain our result by modifying (and slightly simplifying) the framework of [DKS98, DKRS99]. Starting out from SAT, we construct a new SAT instance that has the additional property that it is either totally satisfiable, or, not even weakly-satisfiable in some specific sense (to be elaborated upon below). We refer to such a SAT instance as an SSAT∞ instance (this is a variant of [DKS98]'s SSAT). The construction reducing SAT to SSAT∞ is the main part of the paper. The construction has a tree-like recursive structure that is a simplification of techniques from [DKS98, DKRS99], along with some additional observations tailored to the l∞ norm.

We finally obtain our result by reducing SSAT∞ to SVP∞ and to CVP∞. These reductions are relatively simple combinatorial reductions, utilizing an additional idea from [ABSS93].
Hardness-of-approximation results are naturally divided into those that are obtained via reduction from PCP, and those that are not. Although the best previous hardness result for SVP∞ [ABSS93] relies on the PCP characterization of NP, our proof does not. We do, however, utilize some techniques similar to those used in the proof of the PCP characterization of NP itself. In fact, the nature of the SVP∞ problem eliminates some of the technical complications from [DFK+99, DKS98, DKRS99]. Thus, we believe that SVP∞ makes a good candidate (out of all of the lattice problems) for pushing the hardness-of-approximation factor to within polynomial range.

Structure of the Paper

Section 2 presents a variant of the SSAT problem from [DKS98] which we call SSAT∞. It then proceeds with some standard (and not so standard) definitions. Section 3 gives the reduction from SAT to SSAT∞, whose correctness is proven in Section 4. Finally, in Section 5 we describe the (simple) reduction from SSAT∞ to SVP∞ and to CVP∞, establishing the hardness of approximating SVP∞ and CVP∞.

2 Definitions

2.1 SSAT∞
A SAT instance is a set Ψ = {ψ_1, …, ψ_n} of tests (Boolean functions) over variables V = {v_1, …, v_m}. We denote by R_ψ the set of satisfying assignments for ψ ∈ Ψ. The Cook-Levin [Coo71, Lev73] theorem states that it is NP-hard to distinguish whether or not the system is satisfiable (i.e. whether there is an assignment to the variables that satisfies all of the tests). We next define SSAT∞, a version of SAT that has the additional property that when the instance is not satisfiable, it is not even 'weakly-satisfiable' in a sense that will be formally defined below.
We recall the following definitions (Definitions 1, 2 and 3) from [DKS98].
Definition 1 (Super-Assignment to Tests). A super-assignment is a function S mapping to each ψ ∈ Ψ a value from Z^{R_ψ}. S(ψ) is a vector of integer coefficients, one for each value r ∈ R_ψ. Denote by S(ψ)[r] the r-th coordinate of S(ψ).
If S(ψ) = 0 we say that S(ψ) is trivial. If S(ψ)[r] ≠ 0, we say that the value r appears in S(ψ). A natural assignment (an assignment in the usual sense) is identified with a super-assignment that assigns each ψ ∈ Ψ a unit vector with a 1 in the corresponding coordinate. In this case, exactly one value appears in each S(ψ).

We next define the projection of a super-assignment to a test onto each of its variables. Consistency between tests will amount to equality of projections on mutual variables.
Definition 2 (Projection). Let S be a super-assignment to the tests. We define the projection of S(ψ) on a variable x of ψ, π_x(S(ψ)) ∈ Z^{|F|}, in the natural way:

   ∀a ∈ F:  π_x(S(ψ))[a] := Σ_{r ∈ R_ψ, r|_x = a} S(ψ)[r]
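As a small illustration (our own, not part of the paper), the projection of Definition 2 simply sums, for each field value a, the coefficients of all satisfying assignments that set x to a; the dictionary-based representation below is an assumption made for readability.

```python
def project(super_assignment, satisfying_assignments, x):
    """pi_x(S(psi)): sum the integer coefficients of all r in R_psi with r|_x = a.

    super_assignment: dict mapping an index r to its integer coefficient S(psi)[r]
    satisfying_assignments: dict mapping the same indices to assignments,
        where an assignment is a dict {variable: field value}
    """
    projection = {}
    for r, coeff in super_assignment.items():
        a = satisfying_assignments[r][x]
        projection[a] = projection.get(a, 0) + coeff
    return projection  # maps each value a in F to pi_x(S(psi))[a]
```

Two tests are then consistent on a mutual variable x exactly when their two projections agree, which is Definition 3 below.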

We shall now proceed to define the notion of consistency between tests. If the projections of two tests on each mutual variable x are equal (in other words, they both give x the same super-assignment), we say that the super-assignments of the tests are consistent (match).
Definition 3 (Consistency). Let S be a super-assignment to the tests in Ψ. S is consistent if for every pair of tests ψ_i and ψ_j with a mutual variable x,

   π_x(S(ψ_i)) = π_x(S(ψ_j)).

Given a system Ψ = {ψ_1, …, ψ_n}, a super-assignment S : Ψ → Z^{R_ψ} is called not-all-zero if there is at least one test ψ ∈ Ψ for which S(ψ) ≠ 0. The norm of a super-assignment S is defined

   ||S|| := max_{ψ∈Ψ} (||S(ψ)||_1),

where ||S(ψ)||_1 is the standard l_1 norm. The norm of a natural super-assignment is 1.
The gap of SSAT∞ is formulated in terms of the norm of the minimal super-assignment that maintains consistency.
Definition 4 (g-SSAT∞). An instance of SSAT∞ with parameter g,

   I = (Ψ = {ψ_1, …, ψ_n}, V = {v_1, …, v_m}, {R_{ψ_1}, …, R_{ψ_n}}),

consists of a set Ψ of tests over a common set V of variables that take values in a field F. The parameters m and |F| are always bounded by some polynomial in n. Each test ψ ∈ Ψ has associated with it a list R_ψ of assignments to its variables, called the satisfying assignments or the range of the test ψ. The problem is to distinguish between the following two cases:
Yes: There is a consistent natural assignment for Ψ.
No: No not-all-zero consistent super-assignment is of norm ≤ g.
Remark. The definition of SSAT∞ differs from that of SSAT only in the characterization of when a super-assignment falls into the 'no' category. On one hand, SSAT∞ imposes a weaker requirement of not-all-zero rather than the non-triviality of SSAT. On the other hand, the norm of a super-assignment S is measured by a 'stronger' measure, taking the maximum of ||S(ψ)||_1 over all ψ, rather than the average as in SSAT.

Theorem 1 (SSAT∞ Theorem). SSAT∞ is NP-hard for g = n^{c/log log n} for some c > 0.

We conjecture that a stronger statement is true, which would imply that SVP∞ is NP-hard to approximate to within a constant power of the dimension.

Conjecture. SSAT∞ is NP-hard for g = n^c for some constant c > 0.

2.2 LDFs, Super-LDFs

Throughout the paper, let F denote a finite field F = Z_p for some prime number p > 1. We will need the following definitions.
Definition 5 (low degree function - [r, d]-LDF). A function f : F^d → F is said to have degree r if its values are the point evaluation of a polynomial on F^d with degree ≤ r in each variable. In this case we say that f is an [r, d]-LDF, or f ∈ LDF_{r,d}.
Sometimes we omit the parameters and refer simply to an LDF.

Definition 6 (low degree extension). Let m, d be natural numbers, and let H ⊆ F such that |H|^d = m. A vector (a_0, …, a_{m−1}) ∈ F^m can be naturally identified with a function A : H^d → F by looking at points in H^d as representing numbers in base |H|.
There exists exactly one [|H| − 1, d]-LDF Â : F^d → F that extends A. Â is called the |H| − 1 degree extension of A in F.

A (D + 2)-dimensional affine subspace ((D + 2)-cube for short) C ⊆ F^d is said to be parallel to the axes if it can be written as C = x + span(e_{i_1}, …, e_{i_{D+2}}), where x ∈ F^d and e_i ∈ F^d is the i-th axis vector, e_i = (0, …, 1, …, 0). We write the parameterization of the cube C as follows:

   C(z) := x + Σ_{j=1}^{D+2} z_j e_{i_j} ∈ F^d,   for z = (z_1, …, z_{D+2}) ∈ F^{D+2}.

We will need the following (simple) proposition.

Proposition 1. Let f : F^d → F. Suppose that for every parallel (D + 2)-cube C ⊆ F^d the function f_C : F^{D+2} → F defined by

   ∀x ∈ F^{D+2} :  f_C(x) := f(C(x))

is an [r, D + 2]-LDF. Then f is an [r, d]-LDF.

Similar to the definition of super-assignments, we define a super-[r, d]-LDF (or a super-LDF for short) G ∈ Z^{LDF_{r,d}} to be a vector of integer coefficients, with one coefficient G[P] per LDF P ∈ LDF_{r,d}. This definition arises naturally from the fact that the tests in our final construction will range over LDFs. We further define the norm of a super-LDF to be the l_1 norm of the corresponding coefficient vector.
We say that an LDF P ∈ LDF_{r,d} appears in G iff G[P] ≠ 0. A point x is called ambiguous for a super-LDF G, if there are two LDFs P_1 ≠ P_2 appearing in G such that P_1(x) = P_2(x). The following (simple) property of low-norm super-LDFs is heavily used in this paper.
Proposition 2 (Low Ambiguity). Let G be a super-[r, d]-LDF of norm ≤ g. The fraction of ambiguous points for G is at most amb(r, d, g) := (g choose 2) · rd/|F|.

Proof. The number of non-zero coordinates in an integer vector whose l_1 norm is ≤ g is at most g. There are at most (g choose 2) pairs of LDFs appearing in G, and each pair agrees on at most (rd/|F|)·|F|^d of the points in F^d. □

The following embedding-extension technique taken from [DFK+99] is used in our construction.
Definition 7 (embedding extension). Let b ≥ 2, k > 1 and t be natural numbers. We define the embedding extension mapping σ_b : F^t → F^{tk} as follows. σ_b maps any point x = (ξ_1, …, ξ_t) ∈ F^t to y ∈ F^{tk}, y = σ_b(x) = (η_1, …, η_{tk}), by

   σ_b(ξ_1, …, ξ_t) := (ξ_1, (ξ_1)^b, (ξ_1)^{b²}, …, (ξ_1)^{b^{k−1}}, …, ξ_t, (ξ_t)^b, (ξ_t)^{b²}, …, (ξ_t)^{b^{k−1}}).
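Informally (this sketch is ours, not the paper's), the embedding extension replaces each coordinate ξ by its k powers ξ, ξ^b, ξ^{b²}, …, ξ^{b^{k−1}}, so that a dependence of degree up to b^k − 1 on ξ can be rewritten as a dependence of degree at most b − 1 in each of the new coordinates:

```python
def embedding_extension(x, b, k, p):
    """Map (xi_1, ..., xi_t) in F^t to F^(t*k) over F = Z_p by listing, for each
    coordinate, its powers xi^(b^0), xi^(b^1), ..., xi^(b^(k-1))."""
    y = []
    for xi in x:
        for j in range(k):
            y.append(pow(xi, b ** j, p))
    return tuple(y)

# e.g. with b = 2, k = 3: a monomial xi^6 = (xi^2)^1 * (xi^4)^1 has degree
# at most b - 1 = 1 in each of the new coordinates xi^2 and xi^4,
# which is the content of Proposition 3 below.
```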

The following (simple) proposition shows that any LDF on F^t can be represented by an LDF on F^{tk} with significantly lower degree:
Proposition 3. Let f : F^t → F be a [b^k − 1, t]-LDF, for integers t > 0, b > 1, k > 1. There is a [b − 1, tk]-LDF f_ext : F^{tk} → F such that

   ∀x ∈ F^t :  f(x) = f_ext(σ_b(x)).

For any [b − 1, kt]-LDF f, its 'restriction' to the manifold, f|_{σ_b} : F^t → F, is defined as

   ∀x ∈ F^t :  f|_{σ_b}(x) := f(σ_b(x)),

and is a [b^k − 1, t]-LDF (the degree in a variable ξ_i of f|_{σ_b} is at most (b − 1)(b^0 + b^1 + ⋯ + b^{k−1}) = b^k − 1).

Let G̃ be a super-[b^k − 1, t]-LDF (i.e. a vector in Z^{LDF_{b^k−1,t}}). Its embedding-extension is the super-[b − 1, tk]-LDF G defined by

   ∀f ∈ LDF_{b−1,tk} :  G[f] := G̃[f|_{σ_b}].

In a similar manner, the restriction G̃ of a super-[b − 1, tk]-LDF G is a super-[b^k − 1, t]-LDF defined by

   ∀f ∈ LDF_{b^k−1,t} :  G̃[f] := G[f_ext].

The following proposition holds (e.g. by a counting argument):

Proposition 4. Let G_1, G_2 be two super-[b − 1, tk]-LDFs, and let G̃_1, G̃_2 be their respective restrictions (with parameter b). G̃_1 = G̃_2 if and only if G_1 = G_2.

3 The Construction

We prove that SSAT∞ is NP-hard via a reduction from SAT, described herein. We adopt the whole framework of the construction from [DKRS99], and refer the reader there for a more detailed exposition.
Let φ = {φ_1, …, φ_n} be an instance of SAT, viewed as a set of Boolean tests over Boolean variables V = {x_1, …, x_m} (m = n^c for some constant c > 0) such that each test depends on D = O(1) variables. Cook's theorem [Coo71] states that it is NP-hard to decide whether there is an assignment for V satisfying all of the tests in φ.
Starting from φ, we shall construct an SSAT∞ test-system Ψ over variables V_Ψ ⊇ V. Our new variables V_Ψ will be non-Boolean, ranging over a field F, with |F| = n^{c/log log n} for some constant c > 0. An assignment to V_Ψ will be interpreted as an assignment to V by identifying the value 0 ∈ F with the Boolean value true and any other non-zero value with false.

3.1 Constructing the CR-Forest

In order to construct the SSAT∞ instance I = (Ψ, V, {R_ψ}_{ψ∈Ψ}) we need to describe for each test ψ ∈ Ψ which variables it depends on, and its satisfying assignments R_ψ. We begin by constructing the CR-forest, which is a combinatorial object holding the underlying structure of Ψ. The forest F_n(φ) will have a tree T_φ for every test φ of the SAT instance. Each node in the forest will have a set of variables associated with it. For every leaf there will be one test depending on the variables associated with that leaf.
Let us (briefly) describe one tree T_φ in the forest F_n(φ).
– Every tree will be of depth K ≤ log log n (however, not all of the leaves will be at the bottom level).
– Each node v in the tree will have a domain dom_v = F^d of points (dom_v = F^{d_0} in case v is the root node) associated with it.
– The offsprings of a non-leaf node v will be labeled each by a distinct (D + 2)-cube C of dom_v (this part is slightly simpler than in [DKRS99]):

   labels(v) := {C | C is a (D + 2)-cube in dom_v}.

– The points in the domain dom_v of each node v will be mapped to some of Ψ's variables, by the injection var_v : dom_v → V_Ψ. This mapping essentially describes the relation of a node to its parent, and is defined inductively as follows. For each node v, we denote by V_v the set of 'fresh new' variables mapped from dom_v (i.e. none of the nodes defined inductively so far have points mapped to these variables). Altogether

   V_Ψ := ⋃_{φ, v∈T_φ} V_v.

– For the root node, var_root : dom_root → V_Ψ is defined (exactly as in [DKRS99]) by mapping H^{d_0} ⊆ dom_root = F^{d_0} to V and the rest of the points to the rest of V_root ⊆ V_Ψ (i.e. the low-degree-extension of V). It is important that var_root is defined independently of φ.
– For a non-root node v with parent u, the points of the cube C_v ∈ labels(u) labeling v are mapped into the domain dom_v by the embedding extension mapping σ_{b_v} : C_v → dom_v, defined above in Section 2.2 (the parameter b_v specified below depends on the specific node v, rather than just on v's level as in [DKRS99]). These points are u's points that are 'passed on' to the offspring v. We think of the point y = σ_{b_v}(x) ∈ dom_v as 'representing' the point x ∈ C_v ⊆ dom_u, and define var_v : dom_v → V_Ψ as follows.
Definition 8 (var_v, for a non-root node v). Let v be a non-root node, let u be v's parent, and let C_v ⊆ dom_u be the label attached to v. For each point y ∈ σ_{b_v}(C_v) ⊆ dom_v define var_v(y) := var_u(σ_{b_v}^{−1}(y)), i.e. points that 'originated' from C_v are mapped to the previous-level variables that their pre-images in C_v were mapped to. For each 'new' point y ∈ dom_v \ σ_{b_v}(C_v) we define var_v(y) to be a distinct variable from V_v.
The parameters used for the embedding extension mappings σ_{b_v} are t = D + 2, k = d/t = a. We set the degree of the root node r_root = |H| = |F|^{1/10}, and r_v and b_v (for non-root nodes v) are defined by the following recursive formulas:

   b_v = (r_u + 1)^{1/a},            if C_v is parallel to the axes,
   b_v = (r_u(D + 2) + 1)^{1/a},     otherwise;
   r_v = b_v − 1.

We stop the recursion and define a node to be a leaf (i.e. define its labels to be empty) whenever r_v ≤ 2(D + 2). A simple calculation (to appear in the complete version) shows that b_v, r_v decrease with the level of v until for some level K < log log n, r_v ≤ 2(D + 2) = O(1). (This may happen to some nodes sooner than others, therefore not all of the leaves are in level K.)
We now complete the construction by describing the tests and their satisfying assignments.
Definition 9 (Tests). Ψ will have one test ψ_v for each leaf v in the forest. ψ_v will depend on the variables in var_v(dom_v). The set of satisfying assignments for ψ_v's variables, R_{ψ_v}, will consist of assignments A that satisfy the following two conditions:

1. A is an [r_v, d]-LDF on var_v(dom_v).
2. If v ∈ T_φ for a SAT test φ and φ's variables appear in var_v(dom_v), then A must satisfy φ.

4 Correctness of the Construction

4.1 Completeness

Lemma 1 (completeness). If there is an assignment A : V → {true, false} satisfying all of the tests in φ, then there is a natural assignment A_Ψ : V_Ψ → F satisfying all of the tests in Ψ.

We extend A in the obvious manner, i.e. by taking its low-degree-extension (see Definition 6) to the variables V, and then repeatedly taking the embedding extension of the previous-level variables, until we've assigned all of the variables in the system. More formally:

Proof. We construct an assignment A_Ψ : V_Ψ → F by inductively obtaining [r_i, d]-LDFs P_v : dom_v → F for every level-i node v of every tree in the CR-forest, as follows. We first set (for every φ) P_root to be the low degree extension (see Definition 6) of A (we think of A as assigning each variable a value in {0, 1} ⊆ F rather than {true, false}, see discussion in the beginning of Section 3). Assume we've defined an [r_u, d]-LDF P_u consistently for all level-i nodes, and let v be an offspring of u, labeled by C_v. The restriction f = P_u|_{C_v} of P_u to the cube C_v is an [r, D + 2]-LDF where r = r_u or r = r_u(D + 2) depending on whether C_v is parallel to the axes or not. f can be written as a [(r + 1)^{1/a} − 1, a(D + 2)]-LDF f_ext over the larger domain F^d, as promised by Proposition 3. We define P_v := f_ext to be that [r_v, d]-LDF (recall that d = a(D + 2) and b_v = (r + 1)^{1/a}).
Finally, for a variable x̃ ∈ var_v(dom_v), x̃ = var_v(x), we set A_Ψ(x̃) := P_v(x). The construction implies that there are no collisions, i.e. var_v(x′) = var_v(x) implies P_v(x) = P_v(x′).

4.2 Soundness

We need to show that a 'no' instance of SAT is mapped to a 'no' instance of SSAT∞. We assume that the constructed SSAT∞ instance has a consistent non-trivial super-assignment of norm ≤ g, and show that the SAT instance φ we started with is satisfiable.
Lemma 2 (Soundness). Let g := |F|^{1/10}. If there exists a consistent super-assignment of norm ≤ g for Ψ, then φ is satisfiable.

Let A be a consistent non-trivial super-assignment for Ψ, of norm ||A|| ≤ g. It induces (by projection) a super-assignment to the variables

   m : V_Ψ → Z^{|F|},

i.e. for every variable x ∈ V_Ψ, m assigns a vector π_x(A(ψ)) of integer coefficients, one per value in F, where ψ is some test depending on x. Since A is consistent, m is well defined (independent of the choice of test ψ). Alternatively, we view m as a labeling of the points ⋃_{v∈F_n(φ)} dom_v by a 'super-value', a formal linear combination of values from F. The label of the point x ∈ dom_v for some v ∈ F_n(φ) is simply m(var_v(x)), and with a slight abuse of notation, is sometimes denoted m(x). m is used as the underlying point super-assignment for the rest of the proof, and will serve as an anchor by which we test consistency.

The central task of our proof is to show that if a tree has a non-trivial leaf, then there is a non-trivial super-LDF for the domain in the root node that is consistent with m. We will later want to construct from these super-LDFs an assignment that satisfies all of the tests in φ. For this purpose, we need the super-LDFs along the way to be legal.
Definition 10 (legal). An LDF P is called legal for a node v ∈ T_φ (for some φ), if it satisfies φ in the sense that if φ's variables have pre-images x_1, …, x_D ∈ dom_v, then P(x_1), …, P(x_D) satisfy φ. A super-LDF G is called legal for v ∈ T_φ if for every LDF P appearing in G, P is legal for v ∈ T_φ.
The following lemma encapsulates the key inductive step in our soundness proof.
Lemma 3. Let u ∈ nodes_i for some 0 ≤ i < K. There is a legal super-[r_u, d]-LDF G_u with ||G_u||_1 ≤ ||m|| := max_x ||m(x)||_1 such that for every x ∈ dom_u, π_x(G_u) = m(x). Furthermore, if there is a node v in u's sub-tree for which G_v ≠ 0 then G_u ≠ 0.
Due to space limitations, the proof of this lemma is omitted, and appears in the full version of this paper.
In order to complete the soundness proof, we need to find a satisfying assignment for φ. We obtained, in Lemma 3, a super-[r_0, d_0]-LDF G_φ for each root node root_φ, such that ∀x ∈ dom_root = F^{d_0}, m(x) = π_x(G_φ). Note that indeed, for every pair of tests φ ≠ φ′, the corresponding super-LDFs must be equal, G_φ = G_{φ′} (denote them by G). This follows because they are point-wise equal, π_x(G_φ) = m(x) = π_x(G_{φ′}), and so the difference super-LDF G_φ − G_{φ′} is trivial on every point, and must therefore (again, by Proposition 2, low-ambiguity) be trivial.
If A is not trivial, then there is at least one test ψ_v ∈ Ψ for which A(ψ_v) ≠ 0. Thus, denoting by φ the test for which v is a leaf in T_φ, Lemma 3 implies G = G_φ ≠ 0. Take an LDF f that appears in G, and define for every v ∈ V, A(v) := f(x) where x ∈ H^{d_0} is the point mapped to v. Since G is legal, φ is totally satisfied by A.

5 From SSAT∞ to SVP∞

In this section we show the reduction from g-SSAT∞ to the problem of approximating SVP∞. This reduction follows the same lines of the reduction in

[ABSS93] from Pseudo-Label-Cover to SVP∞. We begin by formally defining the gap-version of SVP∞ (presented in Section 1), which is the standard method to turn an approximation problem into a decision problem.
Definition 11 (g-SVP∞). Given a lattice L and a number d > 0, distinguish between the following two cases:
Yes. There is a non-zero lattice vector v ∈ L with ||v||∞ ≤ d.
No. Every non-zero lattice vector v ∈ L has ||v||∞ > g·d.
We will show a reduction from g-SSAT∞ to g-SVP∞, thereby implying SVP∞ to be NP-hard to approximate to within a factor of g = n^{Ω(1)/log log n}.
Let I = (Ψ, V, {R_ψ}) be an instance of g-SSAT∞, where Ψ = {ψ_1, …, ψ_n} is a set of tests over variables V = {v_1, …, v_m}, and R_{ψ_i} is the set of satisfying assignments for ψ_i ∈ Ψ. We construct a g-SVP∞ instance (L(B), d) where d := 1 and B is an integer matrix whose columns form the basis for the lattice L(B).
The matrix B will have a column v_{[ψ,r]} for every pair of a test ψ ∈ Ψ and an assignment r ∈ R_ψ for it. There will be one additional special column t. The matrix B will have two kinds of rows, consistency rows and norm-measuring rows, defined as follows.

Consistency Rows. B will have |F| + 1 rows for each threesome (ψ_i, ψ_j, x) where ψ_i and ψ_j are tests that depend on a mutual variable x. Only the columns of ψ_i and ψ_j will have non-zero values in these rows.
The special column t will have g in each consistency row, and zero in the other rows.
For a pair of tests ψ_i and ψ_j that depend on a mutual variable x, let's concentrate on the sub-matrix consisting of the columns of these tests, and the |F| + 1 rows of the pair ψ_i, ψ_j, viewed as a pair of matrices G_1 of dimension (|F| + 1) × |R_{ψ_i}| and G_2 of dimension (|F| + 1) × |R_{ψ_j}|. Let r ∈ R_{ψ_i} be a satisfying assignment for ψ_i and r′ ∈ R_{ψ_j} be a satisfying assignment for ψ_j. The r-th column in G_1 equals g times the unit vector e_i where i = r|_x (i.e. a vector with zeros everywhere and a g in the r|_x-th coordinate). The r′-th column in G_2 equals g·(1 − e_i) where i = r′|_x and 1 is the all-one vector (i.e. g everywhere except a zero in the r′|_x-th coordinate).
Notice that any zero-sum linear combination of the vectors {e_i, 1 − e_i}_i must give e_i the same coefficient as 1 − e_i, because the vectors {1, e_i}_i are linearly independent.

Norm-measuring Rows. There will be a set of |R_ψ| rows designated to each test ψ ∈ Ψ, in which only ψ's columns have non-zero values. The matrix B, when restricted to these rows and to the columns of ψ, will be the (|R_ψ| × |R_ψ|) Hadamard matrix H (we assume for simplicity that |R_ψ| is a power of 2, thus such a matrix exists, see [Bol86] p. 74). Recall that the Hadamard matrix H_n of order 2^n × 2^n is defined by H_0 = (1) and

   H_n = ( H_{n−1}   H_{n−1} ;  H_{n−1}   −H_{n−1} ).
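For illustration only (this snippet is not part of the formal reduction, and the function name is ours), the recursive definition of H_n can be realized as follows:

```python
def hadamard(n):
    """Return the 2^n x 2^n Hadamard matrix H_n as a list of rows of +/-1."""
    H = [[1]]                                        # H_0 = (1)
    for _ in range(n):
        H = [row + row for row in H] + \
            [row + [-a for a in row] for row in H]   # [[H, H], [H, -H]]
    return H

# every column of H_n has entries in {+1, -1}, which is why a single column of
# the norm-measuring block contributes only 1 to the l_infinity norm of the
# lattice vector built in the completeness argument below
print(hadamard(2))
```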

Fig. 1. The matrix B. In the consistency rows (one row per value of x for each triple ψ_i, ψ_j, x), the columns of ψ_i carry g·e_{r|x}, the columns of ψ_j carry g·(1 − e_{r′|x}), and the special column t carries g. In the norm-measuring rows, the columns of each test ψ form the Hadamard matrix H_ψ, and all other entries are zero.

The vector t, as mentioned earlier, will be zero on these rows.

Proposition 5 (Completeness). If there is a natural assignment to Ψ, then there is a non-zero lattice vector v ∈ L(B) with ||v||∞ = 1.

Proof. Let A be a consistent natural assignment for Ψ. We claim that

   v = t − Σ_{ψ∈Ψ} v_{[ψ,A(ψ)]}

is a lattice vector with ||v||∞ = 1. Restricting Σ_{ψ∈Ψ} v_{[ψ,A(ψ)]} to an arbitrary row in the consistency rows (corresponding to a pair of tests ψ_i, ψ_j with mutual variable x) gives g, because A(ψ_i)|_x = A(ψ_j)|_x. Subtracting this from t gives zero in each consistency row.
In the norm-measuring rows, since every test ψ ∈ Ψ is assigned one value by A, v restricted to ψ's rows equals some column of the Hadamard matrix, which is a ±1 matrix. Altogether, ||v||∞ = 1 as claimed.

Proposition 6 (Soundness). If there is a non-zero lattice vector v ∈ L(B) with ||v||∞ < g, then there is a consistent non-trivial super-assignment A for Ψ, for which ||A|| < g.

Proof. Let

   v = c_t·t + Σ_{ψ,r} c_{[ψ,r]}·v_{[ψ,r]}

be a lattice vector with ||v||∞ < g. The entries in the consistency rows of every lattice vector are integer multiples of g. The assumption ||v||∞ < g implies that v is zero on these rows.
Define a super-assignment A to Ψ by setting, for each ψ ∈ Ψ and r ∈ R_ψ,

   A(ψ)[r] := c_{[ψ,r]}.

To see that A is consistent, let ψ_i, ψ_j ∈ Ψ both depend on the variable x. Notice that (as mentioned above) any zero-sum linear combination of the vectors {1, e_k, 1 − e_k}_k must give e_k and 1 − e_k the same coefficient, because the vectors {1, e_k}_k are linearly independent. This implies that for any value k ∈ F for x,

   Σ_{r: r|_x = k} c_{[ψ_i, r]} = Σ_{r′: r′|_x = k} c_{[ψ_j, r′]}.

This, in turn, means that π_x(A(ψ_i)) = π_x(A(ψ_j)), thus A is consistent.
A is also not-all-zero because v ≠ 0 (if only c_t were non-zero, then ||v||∞ = g). The norm of A is defined as

   ||A|| := max_ψ (||A(ψ)||_1).

The vector v restricted to the norm-measuring rows of ψ is exactly H·A(ψ). Now since (1/√|R_ψ|)·H is an (|R_ψ| × |R_ψ|) orthonormal matrix, we have

   (1/√|R_ψ|)·||H·A(ψ)||_2 = ||A(ψ)||_2.

Since for every z ∈ IR^n, ||z||_2 ≤ √n·||z||∞, we obtain ||H·A(ψ)||∞ ≥ ||A(ψ)||_2. Now for every integer vector z, √(||z||_1) ≤ ||z||_2, and altogether

   √(||A(ψ)||_1) ≤ ||A(ψ)||_2 ≤ ||H·A(ψ)||∞ ≤ ||v||∞ < g,

showing ||A|| := max_{ψ∈Ψ}(||A(ψ)||_1) < g as claimed.

Finally, if Ψ is an SSAT∞ no instance, then the norm of any consistent super-assignment A must be at least g, and so the norm of the shortest lattice vector in L(B) must be at least g. This completes the proof of the reduction.

The reduction to CVP∞ is quite similar, taking t to be the target vector, and is omitted.

References
[ABSS93] S. Arora, L. Babai, J. Stern, and Z. Sweedyk. The hardness of approximate optima in lattices, codes and linear equations. In Proc. 34th IEEE Symp. on Foundations of Computer Science, pages 724–733, 1993.
[Ajt96] M. Ajtai. Generating hard instances of lattice problems. In Proc. 28th ACM Symp. on Theory of Computing, pages 99–108, 1996.
[Ajt98] Miklos Ajtai. The shortest vector problem in L2 is NP-hard for randomized reductions. In Proceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC-98), pages 10–19, New York, May 23–26 1998. ACM Press.
[Bab86] L. Babai. On Lovasz's lattice reduction and the nearest lattice point problem. Combinatorica, 6:1–14, 1986.
[Bol86] B. Bollobas. Combinatorics. Cambridge University Press, 1986.
[CN98] J.Y. Cai and A. Nerurkar. Approximating the SVP to within a factor (1 + 1/dim^ε) is NP-hard under randomized reductions. In Proc. of the 13th Annual IEEE Conference on Computational Complexity, pages 46–55, 1998.
[Coo71] S. Cook. The complexity of theorem-proving procedures. In Proc. 3rd ACM Symp. on Theory of Computing, pages 151–158, 1971.
[DFK+99] Dinur, Fischer, Kindler, Raz, and Safra. PCP characterizations of NP: Towards a polynomially-small error-probability. In STOC: ACM Symposium on Theory of Computing (STOC), 1999.
[DKRS99] I. Dinur, G. Kindler, R. Raz, and S. Safra. Approximating-CVP to within almost-polynomial factors is NP-hard. Manuscript, 1999.
[DKS98] Dinur, Kindler, and Safra. Approximating-CVP to within almost-polynomial factors is NP-hard. In FOCS: IEEE Symposium on Foundations of Computer Science (FOCS), 1998.
[FT85] Andras Frank and Eva Tardos. An application of simultaneous approximation in combinatorial optimization. In 26th Annual Symposium on Foundations of Computer Science, pages 459–463, Portland, Oregon, 21–23 October 1985. IEEE.
[GG98] O. Goldreich and S. Goldwasser. On the limits of non-approximability of lattice problems. In Proc. 30th ACM Symp. on Theory of Computing, pages 1–9, 1998.
[Lev73] L. Levin. Universal'nye perebornye zadachi (Universal search problems; in Russian). Problemy Peredachi Informatsii, 9(3):265–266, 1973.
[LLL82] A.K. Lenstra, H.W. Lenstra, and L. Lovasz. Factoring polynomials with rational coefficients. Math. Ann., 261:513–534, 1982.
[LO85] J.C. Lagarias and A.M. Odlyzko. Solving low-density subset sum problems. Journal of the ACM, 32(1):229–246, January 1985.
[Mic98] D. Micciancio. The shortest vector in a lattice is hard to approximate to within some constant. In Proc. 39th IEEE Symp. on Foundations of Computer Science, 1998.
[Sch85] C.P. Schnorr. A hierarchy of polynomial-time basis reduction algorithms. In Proceedings of Conference on Algorithms, Pecs (Hungary), pages 375–386. North-Holland, 1985.
Convergence Analysis of
Simulated Annealing-Based Algorithms Solving
Flow Shop Scheduling Problems⋆

Kathleen Steinhöfel1 , Andreas Albrecht2 , and Chak-Kuen Wong2


1
GMD National Research Center for Information Technology,
Kekulestr. 7, D-12489 Berlin, Germany
2
Dept. of Computer Science and Engineering,
The Chinese University of Hong Kong, Shatin, N.T., Hong Kong

Abstract. In the paper, we apply logarithmic cooling schedules of inhomogeneous Markov chains to the flow shop scheduling problem with the objective to minimize the makespan. In our detailed convergence analysis, we prove a lower bound of the number of steps which are sufficient to approach an optimum solution with a certain probability. The result is related to the maximum escape depth Γ from local minima of the underlying energy landscape. The number of steps k which are required to approach with probability 1 − ε the minimum value of the makespan is lower bounded by n^{O(Γ)}·log^{O(1)}(1/ε). The auxiliary computations are of polynomial complexity. Since the model cannot be approximated arbitrarily closely in the general case (unless P = NP), the approach might be used to obtain approximation algorithms that work well for the average case.
1 Introduction
In the flow shop scheduling problem n jobs have to be processed on m different machines. Each job consists of a sequence of tasks that have to be processed during an uninterrupted time period of a fixed length. The order in which each job is processed by the machines is the same for all jobs. A schedule is an allocation of the tasks to time intervals on the machines and the aim is to find a schedule that minimizes the overall completion time which is called the makespan.
Flow shop scheduling has long been identified as having a number of important practical applications. Baumgärtel addresses in [4] the flow shop problem in order to deal with the planning of material flow in car plants. His approach was applied to the logistics for the Mercedes Benz automobile. The NP-hardness of the general problem setting with m ≥ 3 was shown by Garey, Johnson, and Sethi
⋆ Research partially supported by the Strategic Research Program at the Chinese University of Hong Kong under Grant No. SRP 9505, by a Hong Kong Government RGC Earmarked Grant, Ref. No. CUHK 4367/99E, and by the HK-Germany Joint Research Scheme under Grant No. D/9800710.


[10] in 1976. The existence of a polynomial approximation scheme for the flow shop scheduling problem with an arbitrary fixed number of machines is demonstrated by Hall in [12]. A recent work of Williamson et al. constitutes theoretical evidence that the general problem, which is considered in the present paper, is hard to solve even approximately. They proved that finding a schedule that is shorter than 5/4 times the optimum is NP-hard [23].
We are concentrating on the convergence analysis of simulated annealing-based algorithms which employ a logarithmic cooling schedule. The algorithm employs a simple neighborhood which is reversible and ensures a priori that transitions always result in a feasible solution. The neighborhood relation determines a landscape of the objective function over the configuration space F of feasible solutions of a given flow shop scheduling problem. Let a_S(k) denote the probability to obtain the schedule S ∈ F after k steps of a logarithmic cooling schedule. The problem is to find an upper bound for k such that Σ_{S∈F_min} a_S(k) > 1 − ε for schedules S minimizing the makespan. The general framework of logarithmic cooling schedules has been studied intensely, e.g., by B. Hajek [11] and O. Catoni [5,6].
Our convergence result, i.e., the lower bound of the number of steps k, is based on a very detailed analysis of transition probabilities between neighboring elements of the configuration space F. We obtain a run-time of n^{O(Γ)}·log^{O(1)}(1/ε) to have with probability 1 − ε a schedule with the minimum value of the makespan, where Γ is a parameter of the energy landscape characterizing the escape depth from local minima.

2 The Flow Shop Problem

The flow shop scheduling problem can be formalized as follows. There are a set J of l jobs and a set M of m machines. Each job has exactly one task to be processed on each machine. Therefore, we have n := l·m tasks, each with a given processing time p(t) ∈ IN. There is a binary relation R on the set of tasks T that decomposes T into chains corresponding to the jobs. The binary relation, which represents precedences between the tasks, is defined as follows: For every t ∈ T there exists at most one t′ such that (t, t′) ∈ R. If (t, t′) ∈ R, then J(t) = J(t′) and there is no x ∉ {t, t′} such that (t, x) ∈ R or (x, t′) ∈ R. For any (v, w) ∈ R, v has to be performed before w. R induces a total ordering of the tasks belonging to the same job. There exist no precedences between tasks of different jobs. Clearly, if (v, w) ∈ R then M(v) ≠ M(w). The order in which a job passes all machines is the same for all jobs. As the task number of a task t we will denote the number of tasks preceding t within its job. We can therefore assume that all tasks with task number i are processed on machine M_i. A schedule is a function S : T → IN_0 that for each task t defines a starting time S(t).
The length, respectively the makespan, of a schedule S is defined by

   (1)  λ(S) := max_{v∈T} (S(v) + p(v)),

i.e., the earliest time at which all tasks are completed. The problem is to find an optimum schedule that is feasible and of minimum length. A flow shop scheduling problem can be represented by a disjunctive graph, a model introduced by Roy and Sussmann in [17]. The disjunctive graph is a graph G = (V, A, E, ω), which is defined as follows:

   V = T ∪ {I, O};
   A = {[v, w] | v, w ∈ T, (v, w) ∈ R}
       ∪ {[I, w] | w ∈ T, ¬∃v ∈ T : (v, w) ∈ R}
       ∪ {[v, O] | v ∈ T, ¬∃w ∈ T : (v, w) ∈ R};
   E = {{v, w} | v, w ∈ T, v ≠ w, M(v) = M(w)};
   ω : V → IN.

The vertices in V represent the tasks. In addition, there are a source (I) and a sink (O) which are two dummy vertices. All vertices in V are weighted. The weight ω(v) of a vertex is given by the processing time p(v), ω(v) := p(v), (ω(I) = ω(O) = 0). The arcs in A represent the given precedences between the tasks. The edges in E represent the machine capacity constraints, i.e., {v, w} ∈ E with v, w ∈ T and M(v) = M(w) denotes the disjunctive constraint, and the two ways to settle the disjunction correspond to the two possible orientations of {v, w}. The source I has arcs emanating to all the first tasks of the jobs and the sink O has arcs coming from all final tasks of jobs.
An orientation on E is a function Ω : E → T × T such that Ω({v, w}) ∈ {⟨v, w⟩, ⟨w, v⟩} for each {v, w} ∈ E. A schedule is feasible if the corresponding orientation Ω on E (Ω(E) = {Ω(e) | e ∈ E}) results in a directed graph (called digraph) D := G′ = (V, A ∪ Ω(E)) which is acyclic.
A path P from x_i to x_j, i, j ∈ IN, i < j, x_i, x_j ∈ V, of the digraph D is a sequence of vertices (x_i, x_{i+1}, …, x_j) ∈ V such that for all i ≤ k < j, [x_k, x_{k+1}] ∈ A or ⟨x_k, x_{k+1}⟩ ∈ Ω(E).
The length of a path P(x_i, …, x_j) is defined by the sum of the weights of all vertices in P: λ(P(x_i, …, x_j)) = Σ_{k=i}^{j} ω(x_k). The makespan of a feasible schedule is determined by the length of a longest path (i.e., a critical path) in the digraph D. The problem of minimizing the makespan therefore can be reduced to finding an orientation Ω on E that minimizes the length λ(P_max).
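Purely as an illustration of this reduction (our own sketch, with ad-hoc data structures not taken from the paper), the makespan of a feasible schedule can be evaluated by one longest-path computation over the acyclic digraph D, where vertex weights are the processing times:

```python
def makespan(vertices, arcs, weight):
    """Length of a longest (critical) path in an acyclic digraph.

    vertices: iterable of vertex ids (including source I and sink O, weight 0)
    arcs: list of (u, v) pairs, the union of A and the chosen orientation Omega(E)
    weight: dict vertex -> processing time
    """
    succ = {v: [] for v in vertices}
    indeg = {v: 0 for v in vertices}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    dist = {v: weight[v] for v in vertices}   # longest path ending in v
    queue = [v for v in vertices if indeg[v] == 0]
    while queue:                               # Kahn-style topological sweep
        u = queue.pop()
        for v in succ[u]:
            dist[v] = max(dist[v], dist[u] + weight[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return max(dist.values())
```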

3 Basic Definitions
Simulated annealing algorithms are acting within a configuration space in accordance with a certain neighborhood structure or a set of transition rules, where the particular steps are controlled by the value of an objective function. The configuration space, i.e., the set of feasible solutions of a given problem instance, is denoted by F. For all instances, the number of tasks of each job equals the number of machines and each job has precisely one operation on each machine. Therefore, the size of F can be upper bounded in the following way: In the disjunctive graph G there are at most l! possible orientations to process l tasks on a single machine. Hence, we have |F| ≤ (l!)^m.

To describe the neighborhood of a solution S ∈ F, we define a neighborhood function η : F → ℘(F). The neighborhood of S is given by η(S) ⊆ F, and each solution in η(S) is called a neighbor of S. Van Laarhoven et al. [22] propose a neighborhood function η_L for solving job shop scheduling problems which is based on interchanging two adjacent tasks of a block. A block is a maximal sequence of adjacent tasks that are processed on the same machine and do belong to a longest path. We will use their neighborhood function with the extension that we allow changing the orientation of an arbitrary arc which connects two tasks on the same machine:

(i) Choosing two vertices v and w such that M(v) = M(w) = k with e = ⟨v, w⟩ ∈ Ω(E);
(ii) Reversing the order of e such that the resulting arc e′ ∈ Ω′(E) is ⟨w, v⟩;
(iii) If there exists an arc ⟨u, v⟩ such that v ≠ u, M(u) = k, then replace the arc ⟨u, v⟩ by ⟨u, w⟩;
(iv) If there exists an arc ⟨w, x⟩ such that w ≠ x, M(x) = k, then replace the arc ⟨w, x⟩ by ⟨v, x⟩.

Thus, the neighborhood structure is characterized by

Definition 1. The schedule S′ is a neighbor of S, S′ ∈ η(S), if S′ can be obtained by the transition rules (i)-(iv) or S′ = S.

Our choice is motivated by two facts:

– In contrast to the job shop scheduling, the transition rules do guarantee for the flow shop a priori that the resulting schedule is feasible, i.e., that the corresponding digraph is acyclic.
– The extension of allowing to reverse the orientation of an arbitrary arc leads to an important property of the neighborhood function, namely reversibility.

Thus, the neighborhood structure is such that the algorithm visits only digraphs corresponding to feasible solutions and is equipped with a symmetry property which is required by our convergence analysis.

Lemma 1. Suppose that e = ⟨v, w⟩ ∈ Ω(E) is an arbitrary arc of an acyclic digraph D. Let D′ be the digraph obtained from D by reversing the arc e. Then D′ is also acyclic.

Proof: Suppose D′ is cyclic. Because D is acyclic, the arc ⟨w, v⟩ is part of the cycle in D′. Consequently, there is a path P = (v, x_1, x_2, …, x_i, w) in D′. Since w is processed before v on machine M_k, at least two arcs of the path P are connecting vertices of the same job. From the definition of the flow shop problem that implies at least two vertices have a task number greater than k. Neither within a job nor within a machine there is an arc ⟨y, z⟩ such that the task number of y is greater than the task number of z. This contradicts that the path P exists in D′. Hence, D′ is acyclic. q.e.d.
Convergence Analysis of Simulated Annealing-Based Algorithms 281

As already mentioned in Section 2, the objective is to minimize the makespan


of feasible schedules. Hence, we de ne Z(S) := (Pmax ), where Pmax is a longest
path in D(S). Furthermore, we set
 
(2) Fmin := S S F and S 0 S 0 F Z(S 0 ) Z(S)
For the special case of L, Van Laarhoven et al. have proved the following
Theorem 1 [22] For each schedule S Fmin , there exists a nite sequence of
transitions leading from S to an element of Fmin .
The probability of generating a solution S 0 from S can be expressed by
( 1
if S 0 (S)
0
(3) G[S S ] :=
0 otherwise,
with n − m + 1 which follows from De nition 1.
Since the neighborhood function L from [22] is a special case of our transition
rules 1 − 4, we have:
Lemma Given S F Fmin , there exists S 0 (S) such that G[S S 0 ] > 0.
The acceptance probability A[S S 0 ], S 0 (S) F, is given by:
(
1 if Z( 0 ) − Z( ) 0
(4) A[S S 0 ] := Z(S )−Z(S)
e− c otherwise,
where c is a control parameter having the interpretation of a temperature in an-
nealing procedures. Finally, the probability of performing the transition between
S and S 0 , S S 0 F, is de ned by
8
< G[S S 0 ] A[S S 0 ] if S 0 = S
(5) Pr S S 0 = 1 − P G[S ] A[S ] otherwise.
:
6= S

Let aS (k) denote the probability of being in the con guration S after k steps
performed for the same value of c. The probability aS (k) can be calculated in
accordance with
X
(6) aS (k) := a (k − 1) Pr S

The recursive application of (6) de nes a Markov chain of probabilities aS (k).


If the parameter c = c(k) is a constant c, the chain is said to be a homoge-
neous Markov chain; otherwise, if c(k) is lowered at any step, the sequence of
probability vectors a(k) is an inhomogeneous Markov chain.
We consider a cooling schedule which de nes a special type of inhomogeneous
Markov chains. For this cooling schedule, the value c(k) changes in accordance
with
Γ
(7) c(k) = k = 0 1
ln(k + 2)
282 Kathleen Steinhöfel, Andreas Albrecht, and Chak-Kuen Wong

The choice of c(k) is motivated by Hajek’s Theorem [11] on logarithmic cooling


schedules for inhomogeneous
 Markov chains. If there exists S0 S1 Sr
F S0 = S Sr = S 0 such that G[Su Su+1 ] > 0 u = 0 1 (r − 1) and
Z(Su ) h, for all u = 0 1 r, we denote height(S S0) h. The schedule
S is a local minimum, if S F Fmin and Z(S 0 ) > Z(S) for all S 0 L (S) S.
By depth(Smin ) we denote the smallest h such that there exists a S 0 F, where
Z(S 0 ) < Z(Smin ), which is reachable at height Z(Smin ) + h.
The following convergence property has been proved by B. Hajek:
Theorem [11] Given a con guration space C and a cooling schedule de ned
by
Γ
c(k) = k = 0 1
ln(k + 2)
P
the asymptotic convergence H2C aH (k) − 1 of the stochastic algorithm, which
k!1
is based on ( ), (4), and (5), is guaranteed if and only if

(i) H H 0 C H0 H1 Hr C H0 = H Hr = H 0 : G[Hu Hu+1 ] > 0
l=0 1 (r − 1);
(ii) h : height(H H 0) h height(H 0 H) h;
(iii) Γ max depth(Hmin ).
Hmin

The condition (i) expresses the connectivity of the con guration space. As al-
ready mentioned above, with the choice of our neighborhood relation we can
guarantee the mutual reachability of schedules. Therefore, Hajek’s Theorem can
be applied to our con guration space F with the neighborhood relation .
Before we perform the convergence analysis of the logarithmic cooling sched-
ule de ned in (7), we point out some properties of the con guration space and
the neighborhood function. Let S and S 0 be feasible schedules and S 0 (S).
To obtain S 0 from S, we chose the arc e = v w with M (v) = M (w).
If Z(S) < Z(S 0 ), then only a path containing one of the selected vertices
v w can determine the new makespan after the transition move. It can be shown
that all paths whose length increase contain the edge e0 = w v . Therefore, we
have the following upper bound.
Lemma 3 The increase of the objective function
 Z in a single step according
to (S − S 0 ) can be upper by p(v) + p(w) .
The reversibility of the neighborhood function implies for the maximum dis-
tance of neighbors S 0 (S) to Fmin in relation to S itself: If the minimum
number of transitions to reach from S an optimum element is N , then for any
S0 (S) the minimum number of transitions is at most N + 1.

4 Convergence Analysis
Our convergence results will be derived from a careful analysis of the exchange
of probabilities among feasible solutions which belong to adjacent distance levels
Convergence Analysis of Simulated Annealing-Based Algorithms 283

to optimum schedules, i.e., in addition to the value of the objective function, the
elements of the con guration space are further distinguished by the minimal
number of transitions required to reach an optimum schedule. We rst introduce
a recurrent formula for the expansion of probabilities and then we prove the main
result on the convergence rate which relates properties of the con guration space
to the speed of convergence. Throughout the section we employ the fact that
for a proper choice of Γ the logarithmic cooling schedule leads to an optimum
solution.
To express the relation between S and S 0 according to their value of the
objective function we will use <Z , >Z , and =Z to simplify the expressions:
S <Z S 0 instead of S0 (S) & (Z(S) < Z(S 0 )),
0
S >Z S instead of S0 (S) & (Z(S) > Z(S 0 )),
S =Z S instead of S = S & S 0
0 0
(S) & (Z(S) = Z(S 0 )).
Furthermore, we denote:
p(S) := S <Z S 0 q(S) := S =Z S 0 r(S) := S >Z S 0
These notations imply
(8) p(S) + q(S) + r(S) = (S) − 1 = m l − m − 1
The equation is valid because there are m (l − 1) arcs which are allowed to
be switched and S belongs to its own neighborhood. Therefore, the size of the
neighborhood is independent of the particular schedule S, and we set n0 :=
m l − m.
Now, we analyze the probability aS (k) to be in the schedule S F after k
steps of the logarithmic cooling schedule de ned in (7), and we use the notation
1 Z(S)−Z(S )
(9) = e− c(k) k 0
 Z(S)−Z(S
Γ
)
k+2
By using (3) till (5), one obtains from (6) by straightforward calculations
 p(S)
X 
p(S) + 1 1 1
(10) aS (k) = aS (k − 1) 0
−  Z(S +
n n0 )−Z(S)
Γ
i=1 k+1
S <Z S
p(S)+q(S) r(S)
X aS (k − 1) X aSj (k − 1) 1
+ +
n0 n0  Z(S)−Z(S
Γ
j)
i=1 j=1 k+1
S Z S S j <Z S

The representation (expansion) will be used in the following as the main relation
reducing aS (k) to probabilities from previous steps. We introduce the following
partition of the set of schedules with respect to the value of the objective func-
tion:
h
[ 
L0 := Fmin and Lh+1 := S : S F S0 S0 F Li Z(S 0 ) Z(S)
i=0
284 Kathleen Steinhöfel, Andreas Albrecht, and Chak-Kuen Wong

The highest level within F is denoted by Lhmax . Given S F, we further denote


by Wmin (S) := [S Sk−1 S 0 ] a shortest sequence of transitions from S to

Fmin , i.e., S 0 Fmin . Thus, we have for the distance d(S) := length Wmin (S) .
We introduce another partition of F with respect to d(S) :
s[
-1
S Di d(S) = i 0 and Ds = Di i.e., F = Ds
i=1

Thus, we distinguish between distance levels Di related to the minimal number


of transitions required to reach an optimal schedule from Fmin and the levels Lh
which are de ned by the objective function. By de nition, we have D0 := L0 =
Fmin . We will use the following abbreviations:
1
(11) f (S 0 S t) := and
 Z(S )−Z(S)
k+2−t Γ

p(S)
p(S) + 1 X 1 − Z(S )−Z(S)
(12) KS (k − t) := 0
− k+2−t Γ

n n0
i=1
S <Z S

We are going backwards from the k th step and expanding aS (k)Pin accordance
with (10). Our aim is to nd a close upper bound for the value S 62 D0 aS (k)
in terms of probabilities from previous steps.
During the expansion (10) of aS (k), S D0 , terms according to S are gener-
ated as well as according to all neighbors S 0 of S. Some terms generated by the
expansion of S contain the factor aS (k − 1) and can therefore be summarized
with terms generated by the expansion of S 0 . However, it is important to distin-
guish between elements from D1 and elements from Di , i 2. For all S D1 ,
we obtain the following:

p(S) + 1 + q(S) + r(S)
(13) aS (k − 1) −
n0
p(S)
X1 p(S)
X1 
1 1
−  Z(S )−Z(S) +  Z(S )−Z(S) = aS (k − 1)
n0 Γ n0 Γ
i=1 k+1 i=1 k+1
S <Z S S <Z S

In case of S D1 , some neighbors S 0 of S are elements of D0 and do not generate


the terms related to S >Z PS 0 because the aS (k) are not expanded since they
are not present in the sum S 62 D0 aS (k). Therefore, r0 (S) r(S) many terms
are missing for S D1 and the following arithmetic term is generated:
 
r0 (S)
(14) aS (k − 1) 1 −
n0
where r0 (S) := S 0 : S 0 (S) S 0 D0 . On the other hand, the expansion of
aS (k) generates terms related to S 0 D0 with S 0 <Z S and containing aS (k−1)
Convergence Analysis of Simulated Annealing-Based Algorithms 285

as a factor. Those terms are not canceled by expansions of aS (k). All S D1


therefore generate the following term:

r (S)
X aSj (k − 1) 1
(15)
n0  Z(S)−Z(S
Γ
j)
j=1 k+1
Sj 2 D0 \ (S)

Now, we consider the entire sum and take the negative product aS (k) r0 (S) n0
separately. By using the abbreviations introduced in (12) we derive the following
lemma.
P
Lemma 4 After one step of the expansion of S62D0 aS (k), the sum can be
represented by
X X X r0 (S)
aS (k) = aS (k − 1) − aS (k − 1) +
n0
S 62 D0 S 62 D0 S 2 D1

r (S)
X X f (S Sj 1)
+ aSj (k − 1)
n0
S 2 D1 j=1
Sj 2 D0 \ (S)


The diminishing factor 1 − r0 (S) n0 appears by de nition for all elements
of D1 . At subsequent reduction steps, the factor is transmitted successively
to all probabilities from higher distance levels Di because any element of Di
has at least one neighbor from Di−1 . The main task is now to analyze how this
diminishing factor changes, if it is propagated to the next higher distance level.
We denote
X X X
(16) aS (k) = (S t) aS (k − t) + (S 0 t) aS (k − t)
S 62 D0 S 62 D0 S 2 D0

0
P and (S t) are the factors at probabilities after t
i.e., the coe cients (S t)
steps of an expansion of S 62 D0 aS (k). Hence, for S D1 we have (S 1) =
1 − r0 (S) n0 , and (S 1) = 1 for the remaining S Ds (D0 D1 ). For S 0 D0
we have from Lemma 4:
p(S )
X f (Si S 0 1)
(17) (S 0 1) =
n0
i=1
S 2 D1 ^ S 2 (S )

Starting from step (k − 1), the generated probabilities aS (k − u) are expanded in


the same way as all other probabilities. We set (S j) := 1 − (S j) because we
are mainly interested in the convergence (S j) 0. We perform an inductive
step from (k − t + 1) to (k − t) and obtain for t 2:
286 Kathleen Steinhöfel, Andreas Albrecht, and Chak-Kuen Wong

Lemma 5 The following recurrent relation is valid for the coe cients (S t),
t 2:
X (S 0 t − 1) X (S 00 t − 1)
(S t) = (S t − 1) KS (k − t) + + f (S 00 S t)
n0 n0
SZ S S<Z S

Furthermore, for the three special cases S Dj j > t, S D1 t = 1, and


S D0 t = 1 we have, (S t) = 0, (S t) = r0 (S) n0 , and (S t) = 1 −
Pp(S) 0
j=1 f (Sj S 1) n , with Sj D1 S (Sj ) respectively.
Exactly the same structure of the equation is valid for (S t) which will be used
for elements
P of D0 only because these elements are not present in thePoriginal
sum S62D0 aS (k). Now, any (S t) and (S t) is expressed by a sum u Tu of
arithmetic terms. We consider in more details the terms associated withP elements
S 0 of D0 and S 1 Pof D1 . We assume a representation (S 0 t − 1) = T (S 0 ),
and (S t − 1) = T (S), S D P0.
If we consider r0 (S 1 ) n0 and S 0 <Z S 1 f (S 1 S 0 t) n0 separately, the di cul-
ties arising from the de nition (S t) := 1 − (S t) can be avoided, i.e., we have
to take into account only changing signs of terms during the transmission from
D1 to D0 and vice versa.
P
De nition The two expressions r0 (S 1 ) n0 , and S 0 <Z S 1 f (S 1 S 0 t) n0 , are
called source terms of (S 1 t) and (S 0 t), respectively.
P
During an expansion of S62D0 aS (k) backwards according to (13), the source
terms are distributed permanently to higher distance levels Dj . Therefore, at
higher distance levels the notion of a source term can be de ned by an inductive
step:
De nition 3 For all S Di , i > 1, any term which is generated according
to the equation of Lemma 5 from a source term of (S 0 t − 1), where S 0
Di−1 \ (S), is said to be a source term of (S t).
We introduce a counter e(T ) to terms T which indicates the step at which the
term has been generated from source terms. The value e(T ) is called the rate of
a term and we set e(T ) = 1 for source terms T .
The value e(T ) > 1 is assigned to terms related to D0 and D1 in a slightly
0
di erent way compared to higher distanceP levels because at the rst step the S
do not participate in the expansion of S62D0 aS (k). Furthermore, in the case
of D0 and D1 we have to take into account the changing signs of terms which
result from the simultaneous consideration of (S 1 t) (for D1 ) and (S 0 t) (for
D0 ).

De nition 4 A term T 0 is called a j th rate term of (S 0 t), S 0 D0 and


j 2, if either S 0 = −T and e(T ) = j − 1 for some (S t − 1), S D1 \ (S 0 ),
or e(T 0 ) = j − 1 for some (S 0 t − 1), S 0 D0 \ (S 0 ).
A term T is called a j th rate term of (S t), S 1 D1 and j 2, if e(T ) =
j − 2 for some (S 0 t − 1), S 0 D2 \ (S 1 ), e(T ) = j − 1 for some (S 0 t − 1),
Convergence Analysis of Simulated Annealing-Based Algorithms 287

S 0 D1 \ (S 1 ), or T = −T 0 and e(T 0 ) = j − 1 for some S 0 D0 \ (S 1 ) with


respect to (S 0 t − 1).
A term T is called a j th rate term of (S t), S Di and i j 2, if e(T ) =
j − 1 for some (S 0 t − 1), S 0 Di+1 \ (S), e(T ) = j − 1 for some (S 0 t − 1),
S 0 Di \ (S), or T is a j th rate term of (S t − 1) for some S Di−1 .

The classi cation of terms will be used for a partition of the summation over
all terms which constitute particular values (S 1 t) and (S 0 t). Let Tj (S t) be
the set of j th rate arithmetic terms of (S 1 t) ( (S 0 t)) related to S Ds . We
set
X
(18) Aj (S t) := T
T 2Tj (S t)

The same notation is used in case of S = S 0 D0 with respect to (S 1 t), and


we obtain by induction

t−i+1
X t
X
(19) (S t) = Aj (S t) and (S 0 t) = Aj (S 0 t)
j=1 j=1

For S Di = D1 D0 and j 2 we obtain immediately from Lemma 5 and


De nition 4:

(20) Aj (S t) = Aj−1 (S t − 1) KS (k − t) +
X Aj−2 (S 0 t − 1) X Aj−2 (S 00 t − 1)
+ + f (S 00 S t) +
n0 n0
S Z S S >Z S
S 2 D +1 S 2 D +1
X Aj−1 (S 0 t − 1) X Aj−1 (S 00 t − 1)
+ + f (S 00 S t) +
n0 n0
S Z S S >Z S
S 2D S 2D
X Aj (S 0 t − 1) X Aj (S 00 t − 1)
+ + f (S 00 S t)
n0 n0
S Z S S >Z S
S 2 D −1 S 2 D −1

In case of S D1 and j 2, we have in accordance with De nition 4:

(21) Aj (S t) = Aj−1 (S t − 1) KS (k − t) +
X Aj−1 (S 0 t − 1) X Aj−1 (S 00 t − 1)
+ + f (S 00 S t) +
n0 n0
S Z S S >Z S
S 2 D1 S 2 D1
X Aj−2 (S 00 t − 1) X Aj−1 (S 0 t − 1)
+ 0
f (S 00 S t) −
n n0
S >Z S S 2D0 \ (S)
S 2 D2
288 Kathleen Steinhöfel, Andreas Albrecht, and Chak-Kuen Wong

Finally, the corresponding relation for S 0 is given by


X Aj−1 (S t − 1)
(22) Aj (S 0 t) = Aj−1 (S 0 t − 1) KS 0 (k − t) − 0
f (S S 0 t)
0
n
S>Z S

We incorporate (20) till (22) in the following upper bound:


Lemma 6 Given S F, k nO(Γ ) , there exist constants a b c > 1 such that
1 b
Aj (S t) < j a 2−k

where j k c is required.
The proof is performed by induction, using the representations (20) till (22) for
increasing i and j, and we employ similar relations as in [19], Lemma 10 and
Lemma 11. In the present case, we utilize the reversibility of the neighborhood
relation and therefore the lower bound on k depends directly on Γ .
We compare the computation of (S t) (and (S 0 t)) for two di erent values
t = k1 and t = k2 , i.e., (S t) is calculated backwards from k1 and k2 , respec-
tively. To distinguish between (S t) and related values, which are de ned for
di erent k1 and k2 , we will use an additional upper index. At this point, we
use again the representation (19) of (S t) (and the corresponding equation for
(S 0 t)).
Lemma 7 Given k2 k1 and S Di , then

A12 (S t) = A22 (S k2 − k1 + t) if t i+2

The proposition can be proved straightforward by induction over i, i.e., the sets
Di . Lemma 7 implies that at step s + 2 (with respect to k1 )

A12 (S s + 2) = A22 (S k2 − k1 + s + 2) for all S Ds

For A11 (S t), the corresponding equality is already satis ed in case of t s. The
relation can be extended to all values j 2:
Lemma 8 Given k2 k1 , j 1, and S Di , then

A1j (S t) = A2j (S k2 − k1 + t) 2 (j − 1) + i
if t
P
We recall that our main goal is to upper bound the sum S62D0 aS (k). When
a(0) denotes the initial probability distribution, we have from (16):
X X
(23) aS (k1 ) − aS (k2 )
S62D0 S62D0
X  X X 
(S k1 ) aS (0) + (S 0 k1 ) − (S 0 k1 ) aS (0)
S62D0 S 2D0 S 2D0
Convergence Analysis of Simulated Annealing-Based Algorithms 289

Lemma 9 Given k2 k1 > nO(Γ ), then


X  1
(S k2 ) − (S k1 ) aS (0) < 2− k1
S62D0

for a suitable constant > 1.


The proof follows straightforward from Lemma 6 and Lemma 8. The same rela-
tion can be derived for the (S 0 k1 2 ). Now, we can immediately prove
Theorem 3 The condition
1
k > nO(Γ ) logO(1)

implies for arbitrary initial probability distributions a(0) and > 0:


X X
aS (k) < and therefore, aS (k) > 1 −
S62D0 S 2D0

Proof: We choose k in accordance with Lemma 9 and we consider


X X  X
aS (k) = aS (k) − aS (k2 ) + aS (k2 )
S62D0 S62D0 S62D0
X 
= (S k2 ) − (S k) aS (0) +
S62D0
X  X
+ (S 0 k) − (S 0 k2 ) aS (0) + aS (k2 )
S 2D0 S62D0

The value k2 from LemmaP9 is larger but independent of k1 = k, i.e., we can


take a k2 > k such that S62D0 aS (k2 ) < 3. Here, we employ Theorem 1
and 2, i.e., if the constant Γ from (7) is su ciently large, the inhomogeneous
simulated annealing procedure de ned by (3) till (5) tends to the global minimum
P Z on F . We obtain the
of  stated
P inequality, 0
if additionally both di erences
S62D0 (S k2 ) − (S k) and S 2D0 (S k) − (S 0 k2 ) are smaller than
3. Lemma 9 implies that the condition on the di erences is satis ed in case of
1
k1 > log (3 ). q.e.d.

References
1. E.H.L. Aarts. Local Search in Combinatorial Optimization. Wiley, New York, 1997.
2. E.H.L. Aarts, P.J.M. Van Laarhoven, J.K. Lenstra, and N.L.J. Ulder. A Com-
utational Study of Local Search Algorithms for Sho Scheduling. ORSA J. on
Computing, 6:118 125, 1994.
3. E.H.L. Aarts and J.H.M. Korst. Simulated Annealing and Boltzmann Machines:
A Stochastic Approach. Wiley, New York, 1989.
4. H. Baumgärtel. Distributed Constraint Processing for Production Logistics. In
PACT’97 Practical Application of Constraint Technology, Black ool, UK, 1997.
290 Kathleen Steinhöfel, Andreas Albrecht, and Chak-Kuen Wong

5. O. Catoni. Rough Large Deviation Estimates for Simulated Annealing: A lica-


tions to Ex onential Schedules. Annals of Probability, 20(3):1109 1146, 1992.
6. O. Catoni. Metro olis, Simulated Annealing, and Iterated Energy Transformation
Algorithms: Theory and Ex eriments. J. of Complexity, 12(4):595 623, 1996.
7. C. Chen, V.S. Vem ati, and N. Aljaber. An A lication of Genetic Algorithms for
Flow Sho Problems. uropean J. of Operational Research, 80:389 396, 1995.
8. P. Chretienne, E.G. Co man, Jr., J.K. Lenstra, and Z. Liu. Scheduling Theory and
Its Applications. Wiley, New York, 1995.
9. M.K. El-Najdawi. Multi-Cyclic Flow Sho Scheduling: An A lication in Multi-
Stage, Multi-Product Production Processes. International J. of Production Re-
search, 35:3323 3332, 1997.
10. M.R. Garey, D.S. Johnson, and R. Sethi. The Com lexity of Flow Sho and Job
Sho Scheduling. Mathematics of Operations Research, 1:117 129, 1976.
11. B. Hajek. Cooling Schedules for O timal Annealing. Mathematics of Operations
Research, 13:311 329, 1988.
12. L.A. Hall. A roximability of Flow Sho Scheduling. In 36th Annual Symposium
on Foundations of Computer Science, . 82 91, Milwaukee, Wisconsin, 1995.
13. C.Y. Lee and L. Lei, editors. Scheduling: Theory and Applications. Annals of O -
erations Research, Journal Edition. Baltzer Science Publ. BV, Amsterdam, 1997.
14. G. Liu, P.B. Luh, and R. Resch. Scheduling Permutation Flow Sho s Using The La-
grangian Relaxation Technique. Annals of Operations Research, 70:171 189, 1997.
15. E. Nowicki and C. Smutnicki. The Flow Sho with Parallel Machines: A Tabu
Search A roach. uropean J. of Operational Research, 106:226 253, 1998.
16. M. Pinedo. Scheduling: Theory, Algorithms, and Systems. Prentice Hall Inter-
national Series in Industrial and Systems Engineering. Prentice Hall, Englewood
Cli s, N.J., 1995.
17. B. Roy and B. Sussmann. Les problemes d’Ordonnancement avec Constraints Dis-
jonctives. Note DS No.9 bis. SEMA, 1964.
18. D.L. Santos, J.L. Hunsucker, and D.E. Deal. Global Lower Bounds for Flow Sho s
with Multi le Processors. uropean J. of Operational Research, 80:112 120, 1995.
19. K. Steinhöfel, A. Albrecht, and C.K. Wong. On Various Cooling Schedules for
Simulated Annealing A lied to the Job Sho Problem. In M. Luby, J. Rolim,
and M. Serna, editors, Proc. RANDOM’98, ages 260 279, Lecture Notes in
Com uter Science, vol. 1518, 1998.
20. K. Steinhöfel, A. Albrecht, and C.K. Wong. Two Simulated Annealing-Based
Heuristics for the Job Sho Scheduling Problem. uropean J. of Operational Re-
search, 118(3):524 548,1999.
21. J.D. Ullman. NP-Com lete Scheduling Problems. J. of Computer and System
Science, 10(3):384 393, 1975.
22. P.J.M. Van Laarhoven, E.H.L. Aarts, and J.K. Lenstra. Job Sho Scheduling by
Simulated Annealing. Operations Research, 40(1):113 125, 1992.
23. D.P. Williamson, L.A. Hall, J.A. Hoogeveen, C.A.J. Hurkens, J.K. Lenstra,
S.V. Sevast’janov, and D.B. Shmoys. Short Sho Schedules. Operations Research,
45:288 294, 1997.
On the Lovasz Number of Certain Circulant
Graphs

Valentin E. Brimkov1, Bruno Codenotti2 , Valentino Crespi1 , and


Mauro Leoncini2 3
1
Department of Mathematics, astern Mediterranean University,
Famagusta, TRNC
brimkov, cre pi .a @mozart.emu.edu.tr
2
Istituto di Matematica Computazionale del CNR,
Via S. Maria 46, 56126-Pisa, Italy
codenotti, leoncini @imc.pi.cnr.it
3
Facolta di conomia, Universita di Foggia,
Via IV Novembre 1, 71100 Foggia, Italy

Abstract. The theta function of a graph, also known as the Lovasz


number, has the remarkable property of being computable in polynomial
time, despite being sandwiched between two hard to compute integers,
i.e., clique and chromatic number. Very little is known about the explicit
value of the theta function for special classes of graphs. In this paper we
provide the explicit formula for the Lovasz number of the union of two
cycles, in two special cases, and a practically e cient algorithm, for the
general case.

1 Introduction
The notion of capacity of a graph was introduced by Shannon in [14], and after
that labeled as Shannon capacity. This concept arises in connection with a graph
representation for the problem of communicating messages in a zero-error chan-
nel. One considers a graph G, whose vertices are letters from a given alphabet,
and where adjacency indicates that two letters can be confused. In this setting,
the maximum number of one-letter messages that can be sent without danger
of confusion is given by the independence number of G, here denoted by (G).
If (Gk ) denotes the maximum number of k-letter messages that can be safely
communicated, we see that (Gk ) (G)k . Moreover one can readily show that
equality does not hold in generalp (see, e.g., [11]). The Shannon capacity of G
is the number (G) = limk!1 k (Gk ) which, by the previous observations,
satis es (G) (G), where equality does not need to occur.
It was very early recognized that the determination of the Shannon capacity
is a very di cult problem, even for small and simple graphs (see [8, 13]). In
a famous paper of 1979, Lovasz introduced the theta function (G), with the
explicit goal of estimating (G) [11].
Shannon capacity and Lovasz theta function attracted a lot of interest in the
scienti c community, because of the applications to communication issues, but

G. Bongiovanni, G. Gambosi, R. Petreschi ( ds.): CIAC 2000, LNCS 1767, pp. 291 305, 2000.

c Springer-Verlag Berlin Heidelberg 2000
292 Valentin . Brimkov et al.

also due to the connections with some central combinatorial and computational
questions in graph theory, like computing the largest clique and nding the
chromatic number of a graph (see [2, 3, 4, 6] for a sample of the wealth of
di erent results and applications of (G) and (G)). Despite a lot of work in
the eld, nding the explicit value of the theta function for interesting special
classes of graphs is still an open problem.
In this paper we present some results on the theta function of circulant
graphs, i.e., graphs which admit a circulant adjacency matrix. We recall that a
circulant matrix is fully determined by its rst row, each other row being a cyclic
shift of the previous one. Such graphs span a wide spectrum, whose extremes are
the single cycle and the complete graph. We either give a formula or an algorithm
for computing the Lovasz number of circulant graphs given by the union of
two cycles. The algorithm is based on the computation of the intersection of
halfplanes and (although its running time is O(n log n) in the worst case, as
compared with the linear time achievable through linear programming) is very
e cient in practice, since it exploits the particular geometric structure of the
intersection.

2 Preliminaries
There are several equivalent de nitions for the Lovasz theta function (see, e.g.,
the survey by Knuth [10]). We give here the one that comes out of Theorem 6
in [11], because it requires only little technical machinery.

De nition 1. Let G be a graph and let A be the family of matrices A = (aij )


such that aij = 0 if i and j are adjacent in G. Also, let 1 (A) 2 (A)
n (A) denote the eigenvalues of A. Then
 
1 (A)
(G) = max 1 −
A2A n (A)

Combining the fact that (G) (G) with the easy lower bound (C5 )
5, Lovasz has been able to determine exactly the capacity of C5 , the pentagon,
which turns out to be 5.
For several families of simple graphs, the value of (G) is given by explicit
formulas. For instance, in the case of odd cycles of length n we have

n cos( n)
(Cn ) =
1 + cos( n)

We now sketch the proof of correctness of the above formula (see [10] for more
details), because it will be instrumental to the more general results obtained in
this paper.
With reference to the de nition of the Lovasz number which resorts to the
minimum of the largest eigenvalue over all feasible matrices (Section 6 in [10]), in
the case of n-cycles, we have that a feasible matrix has ones everywhere, except
On the Lovasz Number of Certain Circulant Graphs 293

on the superdiagonal, subdiagonal and the upper-right and lower-left corners, i.e.
it can be written as C = J +xP +xP −1 , where J is a matrix whose entries are all
equal to one, and P is the permutation matrix taking j into (j + 1) mod n. It is
well known and easy to see that the eigenvalues of C are n+2x, and x( j + −j ),
for j = 1 n − 1, where = e2 i n . The minimum over x of the maximum of
these values is obtained when n + 2x = −2x cos n, which immediately leads
to the above formula.

3 The Function of Circulant Graphs of Degree 4


n−1
Let n be an odd integer and let j be such that 1 < j 2 . Let C(n j)
denote the circulant graph with vertex set 0 n − 1 and edge set i i +
1 mod n i i + j mod n i = 0 n − 1 . By using the approach sketched in
[10], we can easily obtain the following result.

Lemma 1. Let f0 (x y) = n + 2x + 2y and, for some xed value of j, fi (x y) =


2x cos 2n i + 2y cos 2 nij , i = 1 n − 1. Then
 
n−1
(C(n j)) = min max fi (x y) i = 0 1 (1)
xy i 2

Proof. Follows from the same arguments which lead to the known formula for
the Lovasz number of odd cycles [10] (i.e., taking advantage of the fact that we
can restrict the set of feasible matrices within the family of circulant matrices)
and observing that, for i 1, fi (x y) = fn−i (x y).

3.1 A Linear Programming Formulation


Throughout the rest of this paper we will consider the following linear program-
ming formulation of (1).

minimize z
n−1
s.t. fi (x y) − z 0 i=0 2 (2)
z 0
where the fi (x y)’s are de ned in Lemma 1.
Consider the intersection C of the closed halfspaces de ned by z 0 and
n−1
fi (x y) − z 0, i = 1 2 (which is not empty, since any point (0 0 k),
k 0, satis es all the inequalities). C is a polyhedral cone with the apex at the
origin. This follows from the two following facts, which can be easily veri ed: (1)
the equations fi (x y) − z = 0, i 1, de T ne hyperplanes through the origin; (2)
for any z0 > 0, the projection z0 of C z = z0 onto the xy plane is a polygon,
i.e., z0 is bounded1 (see Fig. 1, which corresponds to the graph C(13 2)).
Consider now the rst constraint of formulation (2). The region represented
by such constraint is the halfspace above the plane A with equation n + 2x +
1
In the appendix we shall give a rigorous proof of this fact for the case j = 2.
294 Valentin . Brimkov et al.

1.8

1.6

1.4

1.2

Z1
0.8

0.6

0.4 1

0.2 0 Y
0
–1.4 –1.2 –1 –0.8 –0.6 –0.4 –1
–0.2 0 0.2 0.4 0.6 0.8 1 1.2 1.4
X

Fig. 1. The polyedral cone for n = 13 and j = 2 cut at z = 2

2y − z = 0. It is then easy to see that the minimum z of (2) will correspond to


the point P = (x y z) of C that is the last met by a sweeping line, parallel to
the line y = −x, which moves on the surface of A towards the negative ortant
(we will simply refer to these as to the extremal vertices). In particular, x and
y are the coordinates of the extremal vertex v of the convex polygon z in the
third quadrant. The lines which de ne v have equations 2x cos +2y cos( j) = z
and 2x cos + 2y cos( j) = z, where = 2 ni1 and = 2 ni , for some indices
i1 and i2 . The key property, which we will exploit both to determine a closed
formula for the (in the cases j = 2 and j = 3) and to implement an e cient
algorithm for the general case of circulant graphs of degree 4, is that i1 and i2
can be computed using any projection polygon z0 , z0 > 0, and determining
its extremal point in the third quadrant. Once i1 and i2 are known, z can be
computed by solving the following linear system
8
< 2x cos + 2y cos(j ) − z = 0
2x cos + 2y cos(j ) − z = 0 (3)
:
2x + 2y − z = −n

3. The Special Case =


The detailed proof of the following theorem is deferred to the appendix.
Theorem 1.
 
− cos( 2n n 3 ) − cos( 2n ( n 3 + 1))
1
(C(n 2)) = n 1 − 2
(4)
(cos( 2n n 3 ) − 1)(cos( 2n ( n 3 + 1)) − 1)

3.3 The Special Case =3


Consider again the projection polygon z0 , for some z0 > 0. We know from
Section 3.1 that the value of is the optimal value z of the objective function in
On the Lovasz Number of Certain Circulant Graphs 295

Fig. . Lines forming the extremal vertex P

the linear program (2), and that this value is achieved at the extremal vertex P
of z in the third quadrant. Also, we know that any projection polygon z0 can
be used to determine the two lines fi (x y) − z = 0, i 1, which form P . It turns
out that nding such lines is easy when j = 3. In the following we will say that
the line li has positive x-intercept (resp., y-intercept) if the intersection between
li and the x-axis (resp., y-axis) has positive x-coordinate (resp., y-coordinate),
otherwise we will say that the intercept is negative. The crucial observation is
the following. Among the lines with negative x- and y-intercepts, lbn 2c is the
one for which these intersections are closest to the origin. It then follows that
P must lay on this line and (after a moment of thought) that the second line
forming P must be searched among those with positive slope. Let xi and yi
denote the coordinates of the intersection between the line li and the line lbn 2c .
Now, since lbn 2c is slightly steeper than the line with equation y = −x, the line
sought will be the one with positive x-intercept, negative y-intercept, and such
that yi is maximum (see Fig. 2). We shall prove that such line is the one with
index n = n−3 6 .

To this end, observe rst that the requirement of positive x-intercept and
n
negative y-intercept implies 12 < i < n4 (recall that li has equation y =
− cos 6n x + cosz06 ). To prove that ln maximizes yi we show that, for any integer
cos
n n
n
i = n in the interval 12 < i < n4 , the three points vi = (xi yi ), vn = (xn yn ) and
O = (0 0) form a clockwise circuit. We already know (see the Appendix) that
this amounts to proving that d(vi vn O) = xi yn − yi xn < 0. This is easy; the
only formal di culty is working with the integer part of n−3 6 . Clearly this might
be circumvent by dealing with the three di erent cases, namely n = 6k + 1,
n = 6k + 3, and n = 6k + 5, for some positive integer k. For simplicity we shall
prove the rst case only. Now, for n = 6k + 1 we have n−3 6 = n−1
6 and, using
z0 = 2,
296 Valentin . Brimkov et al.

cos 3n −cos n − cos n −cos( 3 − 3n )


xn = cos 3
cos( 3 − 3n )+cos
yn = cos 3
cos( 3 − 3n )+cos
n n n n

3
cos n+cos 6n cos n +cos n
xi = 3 6 yi = cos 6n −cos 3
cos n cos n −cos n cos n cos n n cos n

After some relatively simple algebra we obtain


+ +γ
d(vi vn O) =   
cos 3
n cos 3 − 3n + cos2 n cos 3n cos 2n i − cos n cos 6n i

where 
= cos n cos 6n i + cos n 
= cos( 2n i ) cos n − cos 3n 
γ = cos 3 − 3n cos 3n + cos 6n i
It is easy to check that γ > 0 while the denominator of d(vi vn O) is
negative for the admissible values of i. We are now able to determine the value
of the function of C(n 3).
Theorem .
n−3 n−3 !
2 2
cos2 −cos cos 6 +cos2 6 −1
(C(n 3)) = n 1 − n
2
n
n−3
n n
2 n−3 (5)
(cos +1)(cos 6 −1)(1−cos +cos 6 )
n n n n

Proof. (C(n 3)) is the value of z in the solution to the


 linear system (3) where
cos +cos cos +cos −1
j = 3, i.e., z = n 1 − (1−cos )(cos −1)(cos +cos +1) where = 2 ni1 and
= 2 ni . By the previous results we know that i1 = n2 and i2 = n−3
6 .
Plugging these values into the expression for z we get the desired result.

4 An cient Algorithm and Computational Results


Although the Lovasz number can be computed in polynomial time, the available
algorithms are far from simple and e cient (see, e.g., [1]). It is thus desirable
to devise e cient algorithms tailored to the computation of for special classes
of graphs. By reduction to linear programming, the theta function of circulant
graphs can be computed in linear time, provided that the number of cycles is
independent of n. The corresponding algorithms are not necessarily e cient in
practice, though. We briefly describe a practically e cient algorithm for com-
puting (C(n j)), i.e., in case of two cycles.
The algorithm rst determines the 2 lines forming the extremal vertex of 1
in the third quadrant, then solves the resulting 3 3 linear system (i.e., the
system (3)). More precisely, the algorithm incrementally builds the intersection
of the halfplanes which de ne 1 (considering only the third quadrant) and
keeps track of the extremal point. The running time is O(n log n) in the worst
case (i.e., it does not improve upon the optimal algorithms for computing the
intersection of n arbitrary halfplanes or, equivalently, the convex hull of n points
On the Lovasz Number of Certain Circulant Graphs 297

in the plane). However, it does make use of the properties of the lines bounding
the halfplanes to keep the number of vertices of the incremental intersection
close to the minimum possible. In some cases (such as C(n 2)) this is still Ω(n),
but for most values of n and j it turns out to be substantially smaller.
Using the above algorithm we have performed some preliminary experiments
to get insights about the behavior of the theta function for the special class of
circulant graphs considered in this abstract. Actually, since the value sandwiched
by the clique and the chromatic number of C(n j) is the theta function of C(n j)
(i.e., the complementary graph of C(n j)), the results refer to (C(n j)) =
n
(C(n j)) .
Table 4 shows (C(n j)) approximated to the four decimal place, for a num-
ber of values of n and j. It is immediate to note that, for a xed value of j,
the values of the theta function seem to slowly approach, as n grows, the lower
bound (given by the clique number), which happens to be 2 almost always (ob-
vious exceptions occur when 3 divides n and j = n3 .

Table 1. Some computed values of (C(n j))


n−3
4 5 6 7 bn
4c dn
4e bn
3c bn
3c+1
51 2 2446 2 0474 2 1227 2 0838 2 1297 2 2446 3 2 0173 2 2446
101 2.2383 2.0121 2.1122 2.0228 2.2383 2.1162 2.2383 2.0044 2.2383
201 2.2366 2.0031 2.1103 2.0059 2.2366 2.1113 3 2.0011 2.2366
301 2.2363 2.0014 2.1099 2.0027 2.2363 2.1099 2.0005 2.2363 2.2363
401 2.2362 2.0008 2.11 2.0015 2.2362 2.1102 2.2362 2.0003 2.2362
501 2.2362 2.0005 2.11 2.001 2.2362 2.1102 3 2.0002 2.2362
1001 2.2361 2.0001 2.1099 2.0002 2.2361 2.1099 2.2361 2 2.2361
2001 2.2361 2 2.1099 2.0001 2.2361 2.1099 3 2 2.2361
3001 2.2361 2 2.1099 2 2.2361 2.1099 2 2.2361 2.2361
4001 2.2361 2 2.1099 2 2.2361 2.1099 2.2361 2 2.2361
5001 2.2361 2 2.1099 2 2.2361 2.1099 3 2 2.2361
10001 2.2361 2 2.1099 2 2.2361 2.1099 2.2361 2 2.2361

This is con rmed by the results in Table 2, which depicts the behavior of
the relative distance dnj of (C(n j)) from the clique number. We only con-
sider odd values of j (so that the clique number is always 2); we also rule out
the cases where j = n3 , for which we know there is a (relatively) large gap be-
tween clique number and theta function. More precisely, Table 2 shows: (1) the
maximum relative distance M = maxj n dnj , where n ranges P over all odd in-
tegers from 9 to n; (2) the average relative distance = N1n j n dnj , where
Nn is the number of admissible pairs (n j); (3) the average quadratic distance
P 2
= N1n j n (dnj − ) .
The regularities presented by the value of the theta function and by the
geometric structure of the optimal lines suggest the possibilities of further ana-
lytic investigations.
p
For instance, we have observed that, for j = 4, the formula
n −1− 5
i = 2 arccos 4 seems to correctly predict the index of the rst optimal
line, in perfect agreement with the experimental results. In general, for j even
and j << n, up to a value , the optimal point seems to always correspond
298 Valentin . Brimkov et al.

Table . Relative distances of (C(n j)) from the clique number

n M
101 0.372402 0.056077 0.004343
201 0.372402 0.033712 0.002600
301 0.372402 0.024840 0.001876
401 0.372402 0.019897 0.001471
501 0.372402 0.016734 0.001214
1001 0.372402 0.009657 0.000653

to two consecutive indices. For j odd, the rst line giving the optimal point is
almost always obtained at the index n−1
2 ; the second line varies with j, but with
a regular behaviour.

5 Conclusions

This paper has provided a rst step towards extending the class of graphs for
whose theta function either a formula or a very e cient algorithm is available.
Work in progress by the authors [5] aims at nding an e cient algorithm for
more general circulant graphs. We believe that the results of this paper together
with the above mentioned more general results will contribute to shedding new
lights on the properties of this fascinating function.

References
[1] Alizadeh, F., Haeberly, J.-P. A., Nayakkankuppam, M., Overton, M., Schmieta,
S.: SDPPACK User’s Guide.
http://www.cs.nyu.edu/faculty/overton/sdppack/sdppack.html
[2] Alon, N.: On the Capacity of Digraphs. uropean J. Combinatorics, 19 (1998)
1 5
[3] Alon, N., Orlitsky, A.: Repeated Communication and Ramsey Graphs. I
Trans. on Inf. Theory, 41 (1995) 1276 1289
[4] Ashley, Siegel: A Note on the Shannon Capacity of Run-Length-Limited Codes.
I Trans. on Inf. Theory, 33 (1987)
[5] Brimkov, V. ., Codenotti, B., Crespi, V., Leoncini, M.: cient Computation
of the Lovasz Number of Circulant Graphs. In preparation
[6] Farber, M.: An Analogue of the Shannon Capacity of a Graph. SIAM J. on
Alg. and Disc. Methods, 7 (1986) 67 72
[7] Feige, U.: Randomized Graph Products, Chromatic Numbers, and the Lovasz
-Function. Proc. of the 27th STOC (1995) 635 640
[8] Haemers, W.: An Upper Bound for the Shannon Capacity of a Graph. Colloq.
Math. Soc. Janos Bolyai, 5 (1978) 267-272
[9] Haemers, W.: On Some Problems of Lovasz Concerning the Shannon Capacity
of Graphs. I Trans. on Inf. Theory, 5 (1979) 231 232
On the Lovasz Number of Certain Circulant Graphs 299

[10] Knuth, D. .: The Sandwich Theorem. lectronic J. Combinatorics, 1 (1994)


[11] Lovasz, L.: On the Shannon Capacity of a Graph. I Trans. on Inf. Theory,
5 (1979) 1 7
[12] O’Rourke, J.: Computational Geometry in C. Cambridge University Press
(1994)
[13] Rosenfeld, M.: On a Problem of Shannon. Proc. Amer. Math. Soc., 18 (1967)
315 319
[14] Shannon, C. .: The Zero- rror Capacity of a Noisy Channel. IR Trans. In-
form. Theory, IT- (1956) 8 19
[15] Szegedy, M.: A Note on the Number of Lovasz and the Generalized Delsarte
Bound. Proc. of the 35th FOCS, (1994) 36 41

Appendix

In this appendix we shall prove Theorem 1. Before that, we need to establish


the following subsidiary result.

Theorem 3. Let n be odd and n 7. Also, as in Section 3, let C be the inter-


section of the halspaces de ned by the inequalities z 0 and fi (x y) − z 0,
n−1
i = 1 2 . Then, C is a polyhedral cone with the apex at the origin. The
1-dimensional faces of C (i.e., the edges of C are the intersections (in the half-
space z 0 of consecutive pairs of planes Pi , P (i) , where Pi is de ned by
the equation fi (x y) − z = 0 and

i + 1 if i < n−1
2
s(i) =
1 otherwise.

Proof. It is su cient to show that (1) for any z0 > 0, z0 is bounded2 (i.e., is
a polygon), so that the polyhedron is indeed a pointed cone with the apex at
the origin, and (2) z0 has exactly n−1 2 vertices, formed by the intersections of
pairs of consecutive lines li and l (i) , where li has equation fi (x y) − z0 = 0,
n−1
i=1 2 . To this end, we rst establish a couple of preliminary results.
Given any two intersecting lines l and l0 in the 2D space, we will say that
l0 is clockwise from l if the two lines can be overlapped by rotating l clockwise
around the intersection point by an angle of less than 2 radians.
n−1
Lemma . For n 11, li+1 is clockwise from li , i = 1 2 − 1.

Proof. The equation de ning li can be written as y = mj x + qi , where mt =


t
cos
− cos 4nt . Thus − 2 < i = arctg(mi ) < 2 is the angle between the positive
n
x-axis and li . It is clearly su cient to prove the following statements.

1. If i i+1 > 0 then i > i+1 ;


2. if i > 0 and i+1 < 0 then i − i+1 < 2;
3. if i < 0 and i+1 > 0 then i+1 − i > 2.
2
T
Recall that z0 is the projection of C z = z0 onto the xy plane.
300 Valentin . Brimkov et al.

It is not di cult to see that Condition 1 (i.e., i i+1 > 0) occurs if and only
if 4 (k − 1) < cos i < cos i+1 < 4 k, for some k 1 2 3 4 . Since the
denominator of mt does not vanish when t [i i + 1], then mt is a continuous
function. We shall then prove that i > i+1 by showing that, for t [i i + 1],
mt is a monotone decreasing function of t. Indeed, we have
dmt 2 − sin t cos 2 t + 2 cos t sin 2 t
=−
dt n cos2 2 t
2 sin t cos 2 t − 4 sin t cos2 t
=
n cos2 2 t
2 sin t (cos 2 t − 4 cos2 t )
=
n cos2 2 t
2 sin t (2 cos2 t − 1 − 4 cos2 t )
=
n cos2 2 t
2 sin t (−1 − 2 cos2 t )
= <0
n cos2 2 t
The proof of statements 2 and 3 becomes simpler if we assume that n be large
enough (although only statement 3 requires that n 11 in order to hold true).
Suppose rst that i > 0 and i+1 < 0. This only happens if 2n i < 2 < 2 (i+1) n
(i.e., if both angles are close to 2 ). For n large enough this clearly means that
both mi and −mi+1 are positive and close to zero, which in turn implies that
i − i+1 is close to 0.
The proof of statement 3 is similar. The condition i < 0 and i+1 > 0 occurs
if either 2n i < 4 < 2 (i+1)
n or 2n i < 34 < 2 (i+1)
n . It both cases, −mi and mi+1
approach in nity as n grows, which means that i+1 − i approaches .

As an example, in Fig. 3 we see (following the clockwise order) that, for n = 13


and i = 1 2 3, the line li+1 is indeed clockwise from li .
n−1
T
Lemma 3. For i = 1 2 − 1, let vi = li li+1 denote the intersection point
n−1
between li−1 and li . Then any two points vi and vi+1 , for i = 1 2 − 2,
together with the origin form a clockwise circuit.
Proof. It is well known (see, e.g., [12]) that three arbitrary points a = (a0 a1 ),
b = (b0 b1 ), and c = (c0 c1 ) form a clockwise circuit if and only if

a0 a 1 1

d(a b c) = b0 b1 1 < 0
c0 c1 1

In our case c0 = c1 = 0, so that the above determinant simpli es to a0 b1 − a1 b0 ,


where a0 and a1 (resp., b0 and b1 ) are the coordinates of vi (resp., vi+1 ). To
determine a0 and a1 we can solve the 2 2 linear system (where, for simplicity,
we have set z0 = 1) 
fi (x y) − 1 = 0
fi+1 (x y) − 1 = 0
On the Lovasz Number of Certain Circulant Graphs 301

2 2

1.5 1.5

1 1

0.5 0.5

0 0

−0.5 −0.5

−1 −1

−1.5 −1.5

−2 −2
−1 0 1 2 3 −1 0 1 2 3
2 2

1.5 1.5

1 1

0.5 0.5

0 0

−0.5 −0.5

−1 −1

−1.5 −1.5

−2 −2
−2 −1 0 1 2 −2 −1 0 1 2

Fig. 3. Lines add in clockwise order (n = 13)

obtaining

cos(2t(i + 1)) − cos(2ti)


x=
2(cos(2t(i + 1)) cos(ti) − cos(2ti) cos(t(i + 1)))

and
cos(ti) cos(2ti) − cos(2ti) cos(t(i + 1))
y=
2 cos(2ti)(cos(2t(i + 1)) cos(ti) − cos(2ti) cos(t(i + 1)))
where t = 2n .
For vi+1 we clearly obtain similar values (the correspondence being is exactly
given by replacing i with i + 1 everywhere). After some simple but quite tedious
algebra, the corresponding formula for the determinant simpli es to

cos(t) cos(t(i + 1)) − cos(ti)


d(vi vi+1 O) =
(2 cos(ti) cos(t(i + 1)) + 1)(2 cos(t(i + 1)) cos(t(i + 2)) + 1)

We now prove that d = d(vi vi+1 O) < 0. Consider the numerator of d. Since
cos(ti) > cos(t(i+1)) (recall that 0 < ti < t(i+1) < t(i+2) < ), the numerator
is clearly negative when cos(ti) > 0. However, since cos(t) cos(t(i+1))−cos(ti) =
cos(t(i + 2)) − cos(t) cos(t(i + 1)) we can easily see that the numerator is also
negative when cos(ti) < 0. It remains to show that the denominator of d is
302 Valentin . Brimkov et al.

positive. The denominator is the product of two terms, and the same argument
applies to each of them. Clearly 2 cos(ti) cos(t(i+1))+1 > 0 when cos(ti) cos(t(i+
1)) > 0. Hence the term might be negative only when ti < 2 < t(i + 1). In this
case, however, both angles are close to 2 and thus 2 cos(ti) cos(t(i + 1)) is small
compared to 1 (as in the proof of Lemma 2, this fact is obvious for large n,
although it holds for any n 7).

We are now able to complete the proof of Theorem 3. As in Lemma 3, let vi


n−1
denote the intersection point of li−1 and li , i = 1 2 − 1. Also, let v
n−1

denote the intersection point of l n−1 and l1 . By lemmas 2 and 3, we know that
any three consecutive vertices of the closed polygon L = [v1 v n−1 ] make a
right turn (except, possibly, for the two triples which include v n−1 and v1 ). We
also know that the angle i , as a function of i, changes sign three times only,
starting from a negative value for i = 1. Hence the polygon L may possibly have
the three shapes depicted in Fig. 4: (1) L is convex, (2) L is simple but not
convex, (3) L is not simple.

Fig. 4. Three possible forms for the polygon L of Theorem 3

Case 2 would clearly correspond to z0 being unbounded, while case 3 would


imply that the number of vertices of z0 is less than n−1
2 and that not all of them
are formed by the intersection of consecutive lines. Hence, to prove the result,
we have to prove that only Case 1 indeed occurs. But this is easy. In fact, cases
2 and 3 can be ruled out simply by observing that that the three points v n−1 ,
v1 , and O make a left turn, while in case 1 they make a right turn. Computing
the appropriate determinant d we get
−(2 − 1)( + 1)(4 2 − 2 − 1)
d=
2(4 3 − 2 − 1)(32 6 − 48 4 + 20 2 − 1)
where = cos n . Now, the numerator is negative for x > 8090169945 (the
largest root of 4x2 − 2x − 1 = 0) while the denominator is positive for x >
On the Lovasz Number of Certain Circulant Graphs 303

8846461772 (the unique real root of 4x3 − 2x − 1 = 0). But for n = 7 we already
have = 9009688678; hence d < 0 for any n 7 and the three points make a
right turn, as required.
As the last observation, we recall that the proof holds for odd n 11 (because
of Lemma 2). However, the result is true for any odd n 7, as can be seen by
directly checking the cases n = 7 and n = 9 (see Fig. 5).

Fig. 5. Projection z0 for n = 7 (left) and n = 9

Proof of Theorem 1. Consider the linear system (3) with j = 2. By Theorem



3, we know that = 2n i and = 2 (i+1) n , for some i 1 n−1
2 − 1 . The
solution to (3) is given by x = n2 (coscos−1)(cos
+cos
, y = −n 1
4 (cos −1)(cos −1) , and
 1
 −1)
−cos −cos
z = n + 2x + 2y = n 1 − (cos −1)(cos −1) . Our goal is now to determine the
value of i which
 minimizes z. More precisely, we will compute the minimum,
n−1
over the set 1 2 2 − 1 , of the following function

cos 2nx + cos 2 (x+1)


n − 12
gn (x) =
(cos 2 nx − 1)(cos 2 (x+1)
n − 1)
gn (x) is a continuous function in the open interval (0 2 ), with limx!0+ =
limx!2 − = + . Computing the derivative we obtain
 3
csc nx csc (1+x)
n
gn0 (x) = − hn (x)
8n
where hn (x) = sin (2x−1)n + sin (2x+3)
n + sin (4x+1)
n + sin (4x+3)
n . Clearly,
0
gn (x) = 0 if and only if hn (x) = 0. As a rst rough argument (which is useful
304 Valentin . Brimkov et al.

just to locate the zero x of hn ), we note that hn (x) > 0 if (4x+3)n . This
implies that x must be greater than n−34 . But then, for n large enough, we may
approximate hn (x) with hn (x) = 2 sin 2xn +2 sin 4xn which vanishes at x = n3 .
So x n3 and we see that 2 < 2x−1 < 2x+3 < and < 4x+1 < 4x+3 < 32 .
We now use this result to obtain tight bounds for x. We observe that hn (x) is
positive if
   
(4x + 3) (4x + 3) (4x + 1)
sin 2 − = max sin
n n sin n
 
(2x − 1) (2x + 3)
< min sin sin
n n
(2x + 3)
= sin
n
which amounts to saying that x cannot be less than n3 − 1. Analogously, hn (x)
is negative if
   
(4x + 1) (4x + 3) (4x + 1)
sin 2 −
= min sin
n n sin n
 
(2x − 1) (2x + 3)
> max sin sin
n n
(2x − 1)
= sin
n
which implies that x cannot be larger than n3 . This fact allows us to conclude
that the integer value which minimizes gn (x) (and hence z) is one among n3 −
1, n3 and n3 . We now prove that the value sought is n3 by showing that
gn ( n3 − 1) − gn ( n3 ) and gn ( n3 ) − gn ( n3 ) are both positive. For simplicity,
we shall use the following notation: fb c−1 = cos 2 (bnn3c−1) , fb c = cos 2 (bn
n
3c)
,
2 (dn 3e) 2 (dn 3e+1)
fd e = cos n , and fd e+1 = cos n . We have

n n fd e + fd e+1 − 1 2 fb c + fd e − 1 2
gn ( ) − gn ( )= −
3 3 (fd e − 1)(fd e+1 − 1) (fb c − 1)(fd e − 1)
fd e fb c − fd e + fd e+1 fb c − fd e+1 − 12 fb c + 1
= 2

(fb c − 1)(fd e − 1)(fd e+1 − 1)
fb c fd e+1 − fb c + fd e fd e+1 − fd e − 12 fd e+1 + 1
2
(fb c − 1)(fd e − 1)(fd e+1 − 1)
(fb c − fd e+1 )( 12 + fd e )
=
(fb c − 1)(fd e − 1)(fd e+1 − 1)
The last expression is positive since the denominator is negative, fb c −fd e+1 > 0,
and fd e < − 12 . Similarly,

n n fb c−1 + fb c − 1 2 fb c + fd e − 1 2
gn ( − 1) − gn ( )= −
3 3 (fb c−1 − 1)(fb c − 1) (fb c − 1)(fd e − 1)
On the Lovasz Number of Certain Circulant Graphs 305

fb c−1 fd e − fb c−1 + fb c fd e − fb c − 12 fd e + 1
= 2

(fb c−1 − 1)(fb c − 1)(fd e − 1)
fb c fb c−1 − fb c + fd e fb c−1 − fd e − 12 fb c−1 + 1
2
(fb c−1 − 1)(fb c − 1)(fd e − 1)
(fd e − fb c−1 )( 12 + fb c )
=
(fb c−1 − 1)(fb c − 1)(fd e − 1))

and again the numerator is negative since fd e − fb c−1 < 0 and fb c > − 21 .
Speeding Up Pattern Matching by Text
Compression

Yusuke Shibata1 , Takuya Kida1 , Shuichi Fukamachi2 , Masayuki Takeda1 ,


Ayumi Shinohara1 , Takeshi Shinohara2, and Setsuo Arikawa1
1
Dept. of Informatics, Kyushu University 33, Fukuoka 812-8581, Japan
yusuke, kida, takeda, ayumi, arikawa @i.kyus u-u.ac.jp
2
Dept. of Arti cial Intelligence, Kyushu Institute of Technology,
Iizuka 320-8520, Japan
fukamati, s ino @ai.kyutec .ac.jp

Ab tract. Byte pair encoding (BP ) is a simple universal text com-


pression scheme. Decompression is very fast and requires small work
space. Moreover, it is easy to decompress an arbitrary part of the orig-
inal text. However, it has not been so popular since the compression is
rather slow and the compression ratio is not as good as other methods
such as Lempel-Ziv type compression.
In this paper, we bring out a potential advantage of BP compression.
We show that it is very suitable from a practical view point of com-
pressed pattern matching, where the goal is to nd a pattern directly in
compressed text without decompressing it explicitly. We compare run-
ning times to nd a pattern in (1) BP compressed les, (2) Lempel-Ziv-
Welch compressed les, and (3) original text les, in various situations.
xperimental results show that pattern matching in BP compressed
text is even faster than matching in the original text. Thus the BP
compression reduces not only the disk space but also the searching time.

1 Introduction
Pattern matching is one of the most fundamental operations in string processing.
The problem is to nd all occurrences of a given pattern in a given text. A lot of
classical or advanced pattern matching algorithms have been proposed (see [8,1]).
The time complexity of pattern matching algorithm is measured by the number
of symbol comparisons between pattern and text symbols. The Knuth-Morris-
Pratt (KMP) algorithm [19] is the rst one which runs in linear time proportional
to the sum of the pattern length m and the text length n. The algorithm re-
quires additional memory proportional to the pattern length m. One interesting
research direction is to develop an algorithm which uses only constant amount
of memory, preserving the linear time complexity (see [11,7,5,13,12]). Another
important direction is to develop an algorithm which makes a sublinear number
of comparisons on the average, as in the Boyer-Moore (BM) algorithm [4] and
its variants (see [24]). The lower bound of the average case time complexity is
known to be O(n log m m) [27], and this bound is achieved by the algorithm
presented in [6].

G. Bongiovanni, G. Gambosi, R. Petreschi ( ds.): CIAC 2000, LNCS 1767, pp. 306 315, 2000.

c Springer-Verlag Berlin Heidelberg 2000
Speeding Up Pattern Matching by Text Compression 307

From a practical viewpoint, the constant hidden behind O-notation plays


an important role. Horspool’s variant [14] and Sunday’s variant [22] of the BM
algorithm are widely known to be very fast in practice. In fact, the former is
incorporated into a software package Agrep, which is understood as the fastest
pattern matching tool developed by Wu and Manber [25].
Recently, a new trend for accelerating pattern matching has emerged: speed-
ing up pattern matc ing by text compression. It was rst introduced by Manber
[20]. Contrary to the traditional aim of text compression to reduce space re-
quirement of text les on secondary disk storage devices , text is compressed
in order to speed up the pattern matching process.
It should be mentioned that the problem of pattern matching in compressed
text without decoding, which is often referred to as compressed pattern matc -
ing, has been studied extensively in this decade. The motivation is to investigate
the complexity of this problem for various compression methods from the view-
point of combinatorial pattern matching. It is theoretically interesting, and in
practice some algorithms proposed are indeed faster than a regular decompres-
sion followed by a simple search. In fact, Kida et al. [18,17] and Navarro et al.
[21] independently presented compressed pattern matching algorithms for the
Lempel-Ziv-Welch (LZW) compression which run faster than a decompression
followed by a search. However, the algorithms are slow in comparison with pat-
tern matching in uncompressed text if we compare the CPU time. In other words,
the LZW compression did not speed up the pattern matching.
When searching text les stored in secondary disk storage, the running time
is the sum of le I/O time and CPU time. Obviously, text compression yields a
reduction in the le I/O time at nearly the same rate as the compression ratio.
However, in the case of an adaptive compression method, such as Lempel-Ziv
family (LZ77, LZSS, LZ78, LZW), a considerable amount of CPU time is devoted
to an extra e ort to keep track of the compression mechanism. In order to reduce
both of le I/O time and CPU time, we have to nd out a compression scheme
that requires no such extra e ort. Thus we must re-estimate the performance of
existing compression methods or develop a new compression method in the light
of the new criterion: t e time for nding a pattern in compressed text directly.
As an e ective tool for such re-estimation, we introduced in [16] a unify-
ing framework, named collage system, which abstracts various dictionary-based
compression methods, such as Lempel-Ziv family, and the static dictionary meth-
ods. We developed a general compressed pattern matching algorithm for strings
described in terms of collage system. Therefore, any of the compression meth-
ods that can be described in the framework has a compressed pattern matching
algorithm as an instance.
Byte pair encoding (BPE, for short) [10], which falls within the framework of
collage systems, is a simple universal text compression scheme based on pattern
substitution [15]. The basic operation of the compression is to substitute a single
character code which did not appear in the text for a pair of consecutive char-
acters which appears frequently in the text. This operation is repeated
until either all character codes are used up or no pair of consecutive characters
appears frequently. Thus the compressed text consists of two parts: the substi-
tution table and the substituted text. Decompression is very fast and requires
only a small work space. Moreover, partial decompression is possible, since the com-
pression depends only on the substitution. This is a big advantage of BPE in
comparison with adaptive dictionary-based methods. Despite such advantages,
the BPE method has received little attention until now. The reason for this lies
mainly in the following two disadvantages: the compression is terribly slow, and
the compression ratio is not as good as that of other methods such as Lempel-Ziv
type compression.
In this paper, we bring out a potential advantage of BPE compression, that
is, we show that BPE is very suitable for speeding up pattern matching. Man-
ber [20] also introduced a slightly simpler compression method. However, since its
compression ratio is only about 70% for typical English texts, the
improvement in the search time cannot be better than this rate. The com-
pression ratio of BPE is about 60% for typical English texts, and nearly 30%
for biological sequences. We propose a compressed pattern matching algorithm
which is basically an instance of the general one mentioned above. Experimental
results show that, in terms of CPU time, the performance of the proposed
algorithm running on BPE compressed files of biological sequences is better than
that of Agrep running on the uncompressed files of the same sequences. This is not
the case for English text files. Moreover, the results show that, in terms of elapsed
time, the algorithm clearly outperforms Agrep even for English text files.
It should be stated that de Moura et al. [9] proposed a compression scheme that
uses a word-based Huffman encoding with a byte-oriented code. The compres-
sion ratio for typical English texts is about 30%. They presented a compressed
pattern matching algorithm and showed that it is twice as fast as Agrep on
uncompressed text in the case of exact matching. However, the compression method
is not applicable to biological sequences because they cannot be segmented into
words. For the same reason, it cannot be used for natural language texts written
in Japanese, in which there are no blank symbols between words.
Recall that the key idea of the Boyer-Moore type algorithms is to skip sym-
bols of the text, so that they do not read all the text symbols on the average. The
algorithms are intended to avoid 'redundant' symbol comparisons. Analogously,
our algorithm also skips symbols of the text in the sense that more than one symbol
is encoded as one character code. In other words, our algorithm avoids processing
redundant information about the text. Note that the redundancy varies depend-
ing on the pattern in the case of the Boyer-Moore type algorithms, whereas it
depends only on the text in the case of speeding up by compression.
The rest of the paper is organized as follows. In Section 2, we introduce
the byte pair encoding scheme, discuss its implementation, and estimate its
performance in comparison with Compress and Gzip. Section 3 is devoted to
compressed pattern matching in BPE compressed files, where we give two im-
plementations, one based on automata and the other on the bit-parallel approach.
In Section 4, we report our experimental results comparing the practical behavior of these al-
gorithms. Section 5 concludes the discussion and mentions some directions for
future work.

2 Byte Pair Encoding

In this section we describe the byte pair encoding scheme, discuss its imple-
mentation, and then estimate the performance of this compression scheme in
comparison with the widely-known compression tools Compress and Gzip.

2.1 Compression Algorithm

The BPE compression is a simple version of the pattern-substitution method [10].
It utilizes the character codes which did not appear in the text to represent
frequently occurring strings, namely, strings whose frequencies are greater
than some threshold. The compression algorithm repeats the following task until
all character codes are used up or no frequent pair appears in the text:

Find the most frequent pair of consecutive character codes in the
text, and then substitute an unused code for the occurrences of the pair.

For example, suppose that the text to be compressed is

T0 = ABABCDEBDEFABDEABC

Since the most frequent pair is AB, we substitute a code G for AB, and obtain the
new text
T1 = GGCDEBDEFGDEGC

Then the most frequent pair is DE, and we substitute a code H for it to obtain

T2 = GGCHBHFGHGC

By substituting a code I for GC, we obtain

T3 = GIHBHFGHI

The text length is shortened from |T0| = 18 to |T3| = 9. In exchange, we have to encode
the substitution pairs AB → G, DE → H, and GC → I.
More precisely, we encode a table which stores for every character code what
it represents. Note that a character code can represent either (1) the character
itself, (2) a code-pair, or (3) nothing. Let us call such a table a substitution table. In
practical implementations, an original text file is split into a number of fixed-size
blocks, and the compression algorithm is then applied to each block. Therefore
a substitution table is encoded for each block.
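To make the procedure concrete, the following Python sketch (our illustration, not the implementation of [10]) runs the compression loop described above on the running example. It ignores the splitting into blocks and the encoding of the substitution table, and it simply stops when no pair occurs at least twice or when the supplied unused codes are exhausted.

from collections import Counter

def bpe_compress(text, unused_codes):
    # Repeatedly replace the most frequent pair of adjacent codes by an
    # unused single code, as long as some pair occurs at least twice.
    table = {}                       # new code -> the pair it stands for
    for code in unused_codes:
        pairs = Counter(zip(text, text[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:                 # no frequent pair is left
            break
        table[code] = pair
        out, i = [], 0               # substitute the code for every occurrence
        while i < len(text):
            if i + 1 < len(text) and (text[i], text[i + 1]) == pair:
                out.append(code)
                i += 2
            else:
                out.append(text[i])
                i += 1
        text = ''.join(out)
    return text, table

compressed, table = bpe_compress("ABABCDEBDEFABDEABC", "GHI")
print(compressed)   # GIHBHFGHI
print(table)        # {'G': ('A', 'B'), 'H': ('D', 'E'), 'I': ('G', 'C')}

In a real implementation the threshold on pair frequencies and the pool of unused codes would be determined per block, as described above.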
2.2 Speeding Up of Compression
In [10] an implementation of BPE compression is presented, which seems quite
simple. It requires O(σN) time, where N is the original text length and σ is the
number of character codes. The time complexity can be improved to O(σ + N)
by using a relatively simple technique, but this improvement did not reduce the
compression time in practice. Thus, we decided to reduce the compression time
at some sacrifice in the compression ratio.
The idea is to use a substitution table obtained from a small part of the text
(e.g. the first block) for encoding the whole text. The disadvantage is that the
compression ratio decreases when the frequency distribution of character pairs
varies over different parts of the text. The advantage is that a substitution table
is encoded only once. This is a desirable property from the practical viewpoint of
compressed pattern matching, in the sense that any preprocessing task which
depends on the substitution table has to be performed only once, since the table
never changes.
Fast execution of the substitutions according to the table is achieved by an
efficient multiple key replacement technique [2,23], in which a one-way sequential
transducer is built from a given collection of replacement pairs and performs
the task in a single pass over the text. When the keys overlap, it replaces the
first-occurring, longest possible key. The running time is linear in the total
length of the original and the substituted texts.
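The following sketch illustrates the effect of such a replacement pass under simplifying assumptions: it is a plain greedy longest-match scanner written by us, not the transducer of [2,23], but it conveys the idea of encoding the whole text in one pass with a fixed substitution table (here the table obtained for the running example).

def expansions(table):
    # Map each code in the fixed substitution table to the string it represents.
    def expand(code):
        if code not in table:
            return code
        left, right = table[code]
        return expand(left) + expand(right)
    return {code: expand(code) for code in table}

def encode_with_fixed_table(text, table):
    # Greedy one-pass encoding: at each position emit the code whose
    # expansion is the longest string matching at that position.
    exp = sorted(expansions(table).items(), key=lambda kv: -len(kv[1]))
    out, i = [], 0
    while i < len(text):
        for code, s in exp:
            if text.startswith(s, i):
                out.append(code)
                i += len(s)
                break
        else:                        # no expansion matches: copy the character
            out.append(text[i])
            i += 1
    return ''.join(out)

table = {'G': ('A', 'B'), 'H': ('D', 'E'), 'I': ('G', 'C')}   # G=AB, H=DE, I=GC
print(encode_with_fixed_table("ABABCDEBDEFABDEABC", table))   # GIHBHFGHI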

2.3 Comparison with Compress and Gzip


We compared the performance of BPE compression with those of Compress and
Gzip. We implemented the BPE compression algorithm both in the standard way
described in [10] and in the modified way stated in Section 2.2. The Compress
program has an option to specify, in bits, the upper bound on the number of
strings in the dictionary, and we used Compress with specifications of 12 bits and
16 bits. Thus we tested five compression programs.
We estimated the compression ratios of the five compression programs for
the four texts shown in Table 1. The results are shown in Table 2. We can see
that the compression ratios of BPE are worse than those of Compress and Gzip,

Table 1. Four Text Files.

file (size) | annotation
Brown corpus (6.4 Mbyte) | A well-known collection of English sentences, which was compiled in the early 1960s at Brown University, USA.
Medline (60.3 Mbyte) | A clinically-oriented subset of Medline, consisting of 348,566 references.
Genbank1 (43.3 Mbyte) | A subset of the GenBank database, an annotated collection of all publicly available DNA sequences.
Genbank2 (17.1 Mbyte) | The file obtained by removing all fields other than accession number and nucleotide sequence from the above one.
especially for English texts. We also estimated the CPU times for compression
and decompression. Although we omit the results here for lack of space,
we observed that the BPE compression was originally very slow, and that it is
drastically accelerated by the modification stated in Section 2.2. In fact, the original
BPE compression is 4–5 times slower than Gzip, whereas the modified one
is 4–5 times faster than Gzip and is competitive with Compress with the 12-bit
option.
Thus, BPE is not so good according to the traditional criteria. This is the reason
why it has received little attention until now. However, it has the following
properties which are quite attractive from the practical viewpoint of compressed
pattern matching: (1) no bit-wise operations are required since all the codes
are 8 bits long; (2) decompression requires a very small amount of memory; and
(3) partial decompression is possible, that is, we can decompress any portion of
the compressed text.
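Property (3) is easy to see: since each code stands for a fixed string, any fragment of the compressed text can be expanded independently of the rest. The following small sketch (ours, for illustration only) decodes a BPE-compressed string, or any portion of it, by recursively expanding codes through the substitution table.

def decode(compressed, table):
    # Expand a (portion of a) BPE-compressed string back to plain text.
    def expand(code):
        if code not in table:
            return code              # an original character
        left, right = table[code]    # the pair this code stands for
        return expand(left) + expand(right)
    return ''.join(expand(c) for c in compressed)

table = {'G': ('A', 'B'), 'H': ('D', 'E'), 'I': ('G', 'C')}
print(decode("GIHBHFGHI", table))        # ABABCDEBDEFABDEABC
print(decode("GIHBHFGHI"[3:6], table))   # BHF expands to BDEF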
In the next section, we will show how we can perform compressed pattern
matching efficiently in the case of BPE compression.

Table 2. Compression Ratios (%).

file | BPE standard | BPE modified | Compress 12bit | Compress 16bit | Gzip
Brown corpus (6.8Mb) | 51.08 | 59.02 | 51.67 | 43.75 | 39.04
Medline (60.3Mb) | 56.20 | 59.07 | 54.32 | 42.34 | 33.35
Genbank1 (43.3Mb) | 46.79 | 51.36 | 43.73 | 32.55 | 24.84
Genbank2 (17.1Mb) | 30.80 | 32.50 | 29.63 | 26.80 | 23.15

3 Pattern Matching in BPE Compressed Texts

For searching a compressed text, the most naive approach would be the one
which applies an ordinary string matching routine while expanding the original text on
the fly. Another approach is to encode a given pattern and apply a string
matching routine in order to find the encoded pattern directly in the compressed
text. The problem with this approach is that the encoded pattern is not unique.
A solution due to Manber [20] was to devise a way to restrict the number of
possible encodings for any string.
The approach we take here is basically an instance of the general compressed
pattern matching algorithm for strings described in terms of a collage system [16].
As stated in the Introduction, a collage system is a unifying framework that abstracts
most of the existing dictionary-based compression methods. In the framework, a
string is described by a pair of a dictionary D and a sequence S of tokens
representing phrases in D. A dictionary D is a sequence of assignments where
the basic operations are concatenation, repetition, and prefix (suffix) truncation.
A text compressed by BPE is described by a collage system with no truncation
operations. For a collage system with no truncation, the general compressed
pattern matching algorithm runs in O(||D|| + |S| + m^2 + r) time using O(||D|| + m^2)
space, where ||D|| denotes the size of the dictionary D, |S| is the length of
the sequence S, and r is the number of pattern occurrences.
The basic idea of the general algorithm is to simulate the move of the KMP
automaton for input D and S. Note that one token of sequence S may represent
a string of length more than one, which causes a series of state transitions. The
idea is to substitute just one state transition for each such consecutive state
transitions. More formally, let δ : Q × Σ → Q be the state transition function of
the KMP automaton, where Σ is the alphabet and Q is the set of states. Extend
δ into the function δ̂ : Q × Σ* → Q by

   δ̂(q, ε) = q   and   δ̂(q, ua) = δ(δ̂(q, u), a),

where q ∈ Q, u ∈ Σ*, and a ∈ Σ. Let D be the set of phrases in the dictionary. Let
Jump be the restriction of δ̂ to the domain Q × D.
By identifying a token with the phrase it represents, we can define a new
automaton which takes as input a sequence of tokens and makes state transitions
by using Jump. The state transition of the new machine caused by a token corre-
sponds to the consecutive state transitions of the KMP automaton caused by the
phrase represented by the token. Thus, we can simulate the state transitions of
the KMP automaton by using the new machine. However, the KMP automaton
may pass through the final state during the consecutive transitions. Hence the
new machine should be a Mealy-type sequential machine with output function
Output : Q × D → 2^N defined by

   Output(q, u) = { i ∈ N | 1 ≤ i ≤ |u| and δ̂(q, u[1..i]) is the final state },

where N denotes the set of natural numbers, and u[1..i] denotes the length-i
prefix of string u.
In [16], efficient realizations of the functions Jump and Output were discussed
for the general case. In the case of BPE compression, a simpler implementation is
possible. We take two implementations. One is to realize the state transition
function Jump defined on Q × D as a two-dimensional array of size |Q| × |D|.
The array size is not critical since the number of phrases in D is at most 256
in BPE compression. This is not the case with LZW, in which |D| can be as large
as the compressed text size.
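As an illustration of this automata-based implementation, the following Python sketch (our own, with hypothetical function names, not the authors' code) builds the KMP automaton for a pattern, tabulates Jump and Output for every state and every phrase, and then scans the token sequence, reporting the end positions of pattern occurrences in the original text.

def kmp_automaton(pattern):
    # KMP failure function and a single-character transition function;
    # state m = len(pattern) is the unique final state.
    m = len(pattern)
    fail = [0] * (m + 1)
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i + 1] = k
    def step(q, a):
        while q > 0 and (q == m or pattern[q] != a):
            q = fail[q]
        return q + 1 if q < m and pattern[q] == a else q
    return m, step

def jump_and_output(pattern, phrases):
    # For every state q and phrase u: Jump(q,u) = state reached after reading u,
    # Output(q,u) = positions inside u at which an occurrence of the pattern ends.
    m, step = kmp_automaton(pattern)
    jump, output = {}, {}
    for code, u in phrases.items():
        for q in range(m + 1):
            state, hits = q, []
            for i, a in enumerate(u, 1):
                state = step(state, a)
                if state == m:
                    hits.append(i)
            jump[q, code] = state
            output[q, code] = hits
    return jump, output

def search_bpe(compressed, phrases, pattern):
    # Scan the compressed text token by token, never expanding it.
    jump, output = jump_and_output(pattern, phrases)
    q, pos, ends = 0, 0, []
    for code in compressed:
        ends.extend(pos + i for i in output[q, code])
        q = jump[q, code]
        pos += len(phrases[code])
    return ends

# hypothetical usage on the running example: every code, including a plain
# character, is mapped to the string it represents
phrases = {c: c for c in "ABCDEF"}
phrases.update({'G': "AB", 'H': "DE", 'I': "ABC"})
print(search_bpe("GIHBHFGHI", phrases, "DEF"))   # [11]: "DEF" ends at position 11 of T0

Storing jump and output as dictionaries indexed by (state, code) corresponds to the two-dimensional array of size |Q| × |D| mentioned above; with at most 256 phrases and m+1 states this table is small.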
Another implementation is one utilizing the bit-parallel paradigm, in a
similar way to what we did for LZW compression [17]. Technical details are omitted
for lack of space.

4 Experimental Results
We estimated the running time of the proposed algorithms running on BPE
compressed files. We tested the two implementations mentioned in the previous
section. For comparison, we tested the algorithm of [17] in searching LZW
compressed files. We also tested the KMP algorithm, the Shift-Or algorithm
[26,3], and Agrep (the Boyer-Moore-Horspool algorithm) in searching uncom-
pressed files. The performance of BM-type algorithms strongly depends upon
the pattern length m, and therefore the running time of Agrep was measured for
m = 4, 8, 16. The performance of each algorithm other than Agrep is indepen-
dent of the pattern length. The text files we used are the same as the four text
files mentioned in Section 2. The machine used is a PC with a Pentium III pro-
cessor at 500 MHz running the TurboLinux 4.0 operating system. The data transfer
speed was about 7.7 Mbyte/sec.
The results are shown in Table 3, where we included the preprocessing time.
In this table, (a) and (b) stand for the automata and the bit-parallel implemen-
tations stated in the previous section, respectively.

Table 3. Performance Comparisons.

CPU time (sec)
file | BPE (a) | BPE (b) | LZW [17] | KMP | Shift-Or | Agrep m=4 | Agrep m=8 | Agrep m=16
Brown Corpus | 0.09 | 0.16 | 0.94 | 0.13 | 0.11 | 0.09 | 0.07 | 0.07
Medline | 1.03 | 1.43 | 6.98 | 1.48 | 1.28 | 0.85 | 0.69 | 0.63
Genbank1 | 0.52 | 0.89 | 4.17 | 0.81 | 0.76 | 0.72 | 0.58 | 0.53
Genbank2 | 0.13 | 0.22 | 1.33 | 0.32 | 0.29 | 0.27 | 0.32 | 0.32

elapsed time (sec)
file | BPE (a) | BPE (b) | LZW [17] | KMP | Shift-Or | Agrep m=4 | Agrep m=8 | Agrep m=16
Brown Corpus | 0.59 | 0.54 | 1.17 | 0.91 | 1.01 | 0.91 | 0.90 | 0.90
Medline | 4.98 | 4.95 | 7.53 | 8.38 | 8.26 | 8.01 | 7.89 | 7.99
Genbank1 | 3.04 | 2.95 | 4.48 | 6.26 | 6.32 | 6.08 | 5.67 | 5.64
Genbank2 | 0.76 | 0.73 | 1.46 | 2.28 | 2.33 | 2.19 | 2.18 | 2.14

First of all, it is observed that, in terms of CPU time, the automata-based
implementation of the proposed algorithm in searching BPE compressed files is
faster than each of the routines except Agrep. Compared with Agrep, it does well
for Genbank1 and Genbank2, but not for the other two files. The reason for this is
that the performance of the proposed algorithm depends on the compression ratio.
Recall that Genbank1 and Genbank2 are compressed considerably better than
the Brown corpus and Medline (see Table 2).
From a practical viewpoint, the running speed in terms of elapsed time is also impor-
tant, although it is not easy to measure elapsed time accurately. Table 3
shows that the proposed algorithm is the fastest in the elapsed time comparison.

5 Conclusion

We have shown potential advantages of BPE compression from the viewpoint of
compressed pattern matching.
The number of tokens in BPE is limited to 256 so that all the tokens are en-
coded in 8 bits. The compression ratio can be improved if we raise the limit
on the number of tokens. A further improvement is possible by using variable-
length codewords. However, from the viewpoint of compressed pattern matching
it is preferable to use fixed-length 8-bit codewords, since we want to keep
the search at the byte level for efficiency.
One future direction of this study is to develop approximate pattern
matching algorithms for BPE compressed text.

References

1. A. Apostolico and Z. Galil. Pattern Matching Algorithms. Oxford University Press, New York, 1997.
2. S. Arikawa and S. Shiraishi. Pattern matching machines for replacing several character strings. Bulletin of Informatics and Cybernetics, 21(1–2):101–111, 1984.
3. R. Baeza-Yates and G. H. Gonnet. A new approach to text searching. Comm. ACM, 35(10):74–82, 1992.
4. R. S. Boyer and J. S. Moore. A fast string searching algorithm. Comm. ACM, 20(10):762–772, 1977.
5. D. Breslauer. Saving comparisons in the Crochemore-Perrin string matching algorithm. In Proc. 1st European Symp. on Algorithms, pages 61–72, 1993.
6. M. Crochemore, A. Czumaj, L. Gasieniec, S. Jarominek, T. Lecroq, W. Plandowski, and W. Rytter. Speeding up two string-matching algorithms. Algorithmica, 12(4/5):247–267, 1994.
7. M. Crochemore and D. Perrin. Two-way string-matching. J. ACM, 38(3):651–675, 1991.
8. M. Crochemore and W. Rytter. Text Algorithms. Oxford University Press, New York, 1994.
9. E. S. de Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Direct pattern matching on compressed text. In Proc. 5th International Symp. on String Processing and Information Retrieval, pages 90–95. IEEE Computer Society, 1998.
10. P. Gage. A new algorithm for data compression. The C Users Journal, 12(2), 1994.
11. Z. Galil and J. Seiferas. Time-space-optimal string matching. J. Comput. System Sci., 26(3):280–294, 1983.
12. L. Gasieniec, W. Plandowski, and W. Rytter. Constant-space string matching with smaller number of comparisons: Sequential sampling. In Proc. 6th Ann. Symp. on Combinatorial Pattern Matching, pages 78–89. Springer-Verlag, 1995.
13. L. Gasieniec, W. Plandowski, and W. Rytter. The zooming method: a recursive approach to time-space efficient string-matching. Theoret. Comput. Sci., 147(1/2):19–30, 1995.
14. R. N. Horspool. Practical fast searching in strings. Software-Practice and Experience, 10:501–506, 1980.
15. G. C. Jewell. Text compaction for information retrieval. IEEE SMC Newsletter, 5, 1976.
16. T. Kida, Y. Shibata, M. Takeda, A. Shinohara, and S. Arikawa. A unifying framework for compressed pattern matching. In Proc. 6th International Symp. on String Processing and Information Retrieval, pages 89–96. IEEE Computer Society, 1999.
17. T. Kida, M. Takeda, A. Shinohara, and S. Arikawa. Shift-And approach to pattern matching in LZW compressed text. In Proc. 10th Ann. Symp. on Combinatorial Pattern Matching, pages 1–13. Springer-Verlag, 1999.
18. T. Kida, M. Takeda, A. Shinohara, M. Miyazaki, and S. Arikawa. Multiple pattern matching in LZW compressed text. In J. A. Storer and M. Cohn, editors, Proc. Data Compression Conference ’98, pages 103–112. IEEE Computer Society, 1998.
19. D. E. Knuth, J. H. Morris, and V. R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323–350, 1977.
20. U. Manber. A text compression scheme that allows fast searching directly in the compressed file. In Proc. Combinatorial Pattern Matching, volume 807 of Lecture Notes in Computer Science, pages 113–124. Springer-Verlag, 1994.
21. G. Navarro and M. Raffinot. A general practical approach to pattern matching over Ziv-Lempel compressed text. In Proc. 10th Ann. Symp. on Combinatorial Pattern Matching, pages 14–36. Springer-Verlag, 1999.
22. D. M. Sunday. A very fast substring search algorithm. Comm. ACM, 33(8):132–142, 1990.
23. M. Takeda. An efficient multiple string replacing algorithm using patterns with pictures. Advances in Software Science and Technology, 2:131–151, 1990.
24. B. W. Watson and G. Zwaan. A taxonomy of sublinear multiple keyword pattern matching algorithms. Science of Computer Programming, 27(2):85–118, 1996.
25. S. Wu and U. Manber. Agrep - a fast approximate pattern-matching tool. In Usenix Winter 1992 Technical Conference, pages 153–162, 1992.
26. S. Wu and U. Manber. Fast text searching allowing errors. Comm. ACM, 35(10):83–91, October 1992.
27. A. C.-C. Yao. The complexity of pattern matching for a random string. SIAM J. Comput., 8(3):368–387, 1979.
Author Index

Abdalla, A., 17
Albrecht, A., 277
Arikawa, S., 306
Asano, T., 187
Ausiello, G., 1
Barcucci, E., 199
Battiato, S., 226
Blom, M., 137
Böckenhauer, H.-J., 72
Brimkov, V. E., 291
Brunetti, S., 199
Cantone, D., 150, 226
Caporaso, S., 239
Catalano, D., 226
Chin, F. Y., 163
Cincotti, G., 150, 226
Codenotti, B., 291
Covino, E., 239
Crespi, V., 291
Damaschke, P., 63
Del Lungo, A., 199
Deo, N., 17
Dinur, I., 263
Fukamachi, S., 306
Gramm, J., 174
Hanke, S., 253
Harayama, T., 187
Hauptmeier, D., 125
Hofri, M., 226
Hromkovic, J., 72
Kida, T., 306
Klasing, R., 72
Krumke, S. O., 125, 137
Leonardi, S., 1
Leoncini, M., 291
Marchetti-Spaccamela, A., 1
Merlini, D., 211
Nandy, S. C., 187
Neyer, G., 113
Niedermeier, R., 174
Nivat, M., 199
Noilhan, F., 87
Paepe, W. de, 137
Pani, G., 239
Rambau, J., 125
Redstone, J., 32
Ruzzo, W. L., 32
Santha, M., 87
Seibert, S., 72, 102
Shibata, Y., 306
Shinohara, A., 306
Shinohara, T., 306
Soisalon-Soininen, E., 253
Sprugnoli, R., 211
Steinhöfel, K., 277
Stougie, L., 137
Takeda, M., 306
Unger, W., 72, 102
Vega, W. F. de la, 59
Verri, M. C., 211
Wagner, F., 113
Wang, C. A., 163
Wong, C. K., 277
Yang, B., 163
Zaks, S., 44
