3 Warmup: A Single Pass Streaming Algorithm
In this section, we p resent a sin gle-pass streaming algorithm for maintaining a (2 + ǫ)-approximate
solution to th e densest subgraph problem. The algorithm handles a dynamic (turnstile) stream of
edge insertions/deletions in
˜
O(n) space. In particular, we do not worry about the update time of
our algorithm. Our main resu lt in th is section is summarized in Theorem
3.1.
Theorem 3.1. We can process a dynamic stream of updates in the graph G in
˜
O(n) space , and
with high probability return a (2 + O(ǫ))-approximation of d
∗
= max
S⊆V
ρ(S) at the end of the
stream.
Throughout this section, we fix a small constant ǫ ∈ (0, 1/2) and a sufficiently large constant
c > 1. Moreover, we set α ← (1 + ǫ)/(1 − ǫ), L ← 2 + ⌈log
(1+ǫ)
n⌉. The main technical lemma is
below and states that we can construct a (α, d, L)-decomposition by sampling
˜
O(n) edges.
Lemma 3.2. Fix an integer d > 0, and let S be a collection of cm(L − 1) log n/d mutually in-
dependent random samples (each consisting of one edge) from the edge-set E of the input graph
G = (V, E). With high probability we can construct from S an (α, d, L)-decomposition (Z
1
, . . . , Z
L
)
of G, using only
˜
O((n + m/d)) bits of space.
Proof. We partition the samples in S evenly among (L − 1) groups {S
i
} , i ∈ [L − 1]. Thus, each
S
i
is a collection of cm log n/d mutually independent random samples from the edge-set E, and,
furthermore, the collections {S
i
} , i ∈ [L − 1], themselves are mutually independent. Our algorithm
works as follows.
• Set Z
1
← V .
• For i = 1 to (L − 1): Set Z
i+1
← {v ∈ Z
i
: D
v
(Z
i
, S
i
) ≥ (1 − ǫ)αc log n}.
To analyze the correctness of the algorithm, d efine the (random) sets A
i
= {v ∈ Z
i
: D
v
(Z
i
, E) >
αd} and B
i
= {v ∈ Z
i
: D
v
(Z
i
, E) < d} for all i ∈ [L − 1]. Note that for all i ∈ [L − 1], the
random sets Z
i
, A
i
, B
i
are completely d etermin ed by the outcomes of the samples in {S
j
} , j < i.
In particular, the samples in S
i
are chosen independently of the sets Z
i
, A
i
, B
i
. Let E
i
be the event
that (a) Z
i+1
⊇ A
i
and (b) Z
i+1
∩ B
i
= ∅. By Definition
2.1, the output (Z
1
, . . . , Z
L
) is a valid
(α, d, L)-decomposition of G iff the event
T
L−1
i=1
E
i
occurs. Consider any i ∈ [L − 1]. Below, we show
that the event E
i
occurs with high probability. The lemma f ollows by taking a union bou nd over
all i ∈ [L − 1].
Fix any instantiation of the random set Z
i
. Condition on this event, and note that this event
completely determines the sets A
i
, B
i
. Consider any node v ∈ A
i
. Let X
v,i
(j) ∈ {0, 1} be an
indicator random variable for the event that the j
th
sample in S
i
is of the form (u, v), with u ∈
N
v
(Z
i
). Note that th e random variables {X
v,i
(j)}, j, are mutually independent. Furthermore, we
have E[X
v,i
(j)|Z
i
] = D
v
(Z
i
)/m > αd/m for all j. Since there are cm log n/d such samples in S
i
,
by lin earity of expectation we get: E[D
v
(Z
i
, S
i
)|Z
i
] =
P
j
E[X
v,i
(j)|Z
i
] > (cm log n/d) · (αd/m) =
αc log n. The node v is included in Z
i+1
iff D
v
(Z
i
, S
i
) ≥ (1 − ǫ)αc log n, and this event, in turn,
occurs with high p robability (by Chernoff bound ). Taking a union bound over all nodes v ∈ A
i
,
we conclude that P r[Z
i+1
⊇ A
i
| Z
i
] ≥ 1 − 1/(p oly n). Using a similar line of reasoning, we get
that Pr[Z
i+1
∩ B
i
= ∅ | Z
i
] ≥ 1 − 1/(poly n). I nvoking a union bound over these two events, we get
Pr[E
i
| Z
i
] ≥ 1 − 1/(poly n). Since this holds for all possible instantiations of Z
i
, th e event E
i
itself
occurs w ith high probability.
The space requ irement of the algorithm, ignoring poly log factors, is proportional to the number
of samples in S (which is cm(L − 1) log n/d) plus the number of nodes in V (which is n). Since
7