15
overlaps with
s
1
(there exists at least one such string, namely
s
1
itself ). In general, if
e
i
<n
we set
b
i
+1
=
e
i
+ 1 and denote by
e
i
+1
the largest index of a string that overlaps with
s
b
i
+1
.Eventually,
we will get
e
t
=
n
for some
t
n
.
For each pair of strings (
s
b
i
;s
e
i
), let
k
i
>
0 be the length of the overlap b etween their leftmost
occurrences in
s
(this may b e dierent from their maximum overlap). Let
i
=
b
i
e
i
k
i
. Clearly,
f
set(
i
)
j
1
i
t
g
is a solution for
S
, of cost
P
i
j
i
j
.
The critical observation is that
i
does not overlap
i
+2
.We will prove this claim for
i
=1;
the same argument applies to arbitrary
i
. Assume for a contradiction that
1
overlaps
3
. Then
the o ccurrence of
s
b
3
in
s
overlaps the occurrence of
s
e
1
. Because
e
1
<b
2
<b
3
, it follows that the
occurrence of
s
b
3
overlaps the o ccurrence of
s
b
2
. Ths contradicts the fact that
e
2
is the highest
indexed string that overlaps with
s
b
2
.
Because of this observation, eachsymbol of
s
is covered by at most two of the
i
's. Hence
OPT
S
P
i
j
i
j
2
OPT.
2
Lemma 2.8 immediately gives:
Theorem 2.9
Algorithm 2.7 is a
2
H
n
factor algorithm for the shortest superstring problem.
Exercise 2.10
A more elab orate argumentshows that in fact Algorithm 2.2 achieves an ap-
proximation factor of
H
m
, where
m
is the cardinality of the largest sp ecied subset of
U
. Prove
this approximation guarantee.
Exercise 2.11
Show that a similar greedy strategy achieves an approximation guarantee of
H
n
for set multi-cover, a generalization of set cover in whichanintegral coverage requirement is also
specied for each element, and sets can b e picked multiple numb er of times to satisfy all coverage
requirements. Assume that the cost of picking
alpha
copies of set
S
i
is
cost(
S
i
).
Exercise 2.12
By giving an appropriate tight example, show that the analysis of Algorithm
2.2 cannot b e improved even if all sp ecied sets have unit cost. Hint: Consider running the greedy
algorithm on a vertex cover instance.
Exercise 2.13
The
maximum coverage problem
is the following: Given a universe
U
of
n
elements, with non-negativeweights specied, a collection of subsets of
U
,
S
1
;:::;S
l
, and an integer
k
, pick
k
sets so as to maximize the weight of elements covered. Show that the obvious algorithm,
of greedily picking the best set in each iteration until
k
sets are picked, achieves an approximation
factor of
(1
;
(1
;
1
k
)
k
)
>
(1
;
1
e
)
:
Exercise 2.14
Using set cover, obtain approximation algorithms for the following variants of
the shortest superstring problem (here
s
R
is the reverse of string
s
):
1. Find the shortest string
s
that contains for each string
s
i
2
S
,both
s
i
and
s
R
i
as subtrings.
2. Find the shortest string
s
that contains for each string
s
i
2
S
, either
s
i
or
s
R
i
as a subtring.
(Notice that this suces for the data compression application given earlier.)