3 Algorithms
In this section we present algorithms that seek to optimize (2.2) and (2.3). Our algorithms are iterative in
nature, and are directly inspired by the efficient algorithms of Lee and Seung [20]. Appealing properties
include ease of implementation and computational efficiency.
Note that the problems (2.2) and (2.3) are not jointly convexin B and C, so it is not easy to obtain globally
optimal solutions in polynomial time. Our iterative procedures start by initializing B and C randomly or
otherwise. Then, B and C are alternately updated until there is no further appreciable change in the objective
function value.
3.1 Algorithms for (2.2)
We utilize the concept of auxiliary functions [20] for our derivations. It is sufficient to illustrate our methods
using a single column of C (or row of B), since our divergences are separable.
Definition 3.1 (Auxiliary function). A function G(c, c
′
) is called an auxiliary function for F (c) if:
1. G(c, c) = F (c), and
2. G(c, c
′
) ≥ F (c) for all c
′
.
Auxiliary functions turn out to be useful due to the following lemma.
Lemma 3.2 (Iterative minimization). If G(c, c
′
) is an auxiliary function for F (c), then F is non-increasing
under the update
c
t+1
= argmin
c
G(c, c
t
).
Proof. F (c
t+1
) ≤ G(c
t+1
, c
t
) ≤ G(c
t
, c
t
) = F (c
t
).
As can be observed, the sequence formed by the iterative application of Lemma 3.2 leads to a monotonic
decrease in the objective function value F (c). For an algorithm that iteratively updates c in its quest to min-
imize F (c), the method for proving convergence boils down to the construction of an appropriate auxiliary
function. Auxiliary functions have been used in many places before, see for example [5, 20].
We now construct simple auxiliary functions for (2.2) that yield multiplicative updates. To avoid clutter
we drop the functions α and β from (2.2), noting that our methods can easily be extended to incorporate these
functions.
Suppose B is fixed and we wish to compute an updated column of C. We wish to minimize
F (c) = D
ϕ
(Bc, a), (3.1)
where a is the column of A corresponding to the column c of C. The lemma below shows how to construct
an auxiliary function for (3.1). For convenience of notation we use ψ to denote ∇ϕ for the rest of this section.
Lemma 3.3 (Auxiliary function). The function
G(c, c
′
) =
X
ij
λ
ij
ϕ
b
ij
c
j
λ
ij
−
X
i
ϕ(a
i
) − ψ(a
i
)
(Bc)
i
− a
i
,
(3.2)
with λ
ij
= (b
ij
c
′
j
)/(
P
l
b
il
c
′
l
), is an auxiliary function for (3.1). Note that by definition
P
j
λ
ij
= 1, and as
both b
ij
and c
′
j
are nonnegative, λ
ij
≥ 0.
Proof. It is easy to verify that G(c, c) = F (c), since
P
j
λ
ij
= 1. Using the convexity of ϕ, we conclude
that if
P
j
λ
ij
= 1 and λ
ij
≥ 0, then
F (c) =
X
i
ϕ
X
j
b
ij
c
j
− ϕ(a
i
) − ψ(a
i
)
(Bc)
i
− a
i
≤
X
ij
λ
ij
ϕ
b
ij
c
j
λ
ij
−
X
i
ϕ(a
i
) − ψ(a
i
)
(Bc)
i
− a
i
= G(c, c
′
).
3