the considerable reduction of the first term in the overall objective, and the net result is a reduction of the overall objective. Therefore, by minimizing the overall objective, the optimization process actually moves the inconsistent parts from the original similarity matrix $W^{(i)}$ to the matrix $E^{(i)}$, which is the core principle of how we simultaneously model multi-view consistency and multi-view inconsistency in a unified optimization framework.
We shall rewrite the objective function in order to better apply optimization techniques to solve it. Let $B$ be a $v \times v$ matrix whose diagonal elements are $\beta$ and whose off-diagonal elements are $\gamma$. Then our objective function can be written in the more compact form
$$
\begin{aligned}
\min_{\alpha,\, S,\, A^{(1)}, \ldots, A^{(v)}} \quad & \sum_{i=1}^{v} \lambda_i \left\lVert \alpha_i A^{(i)} - S \right\rVert_F^2 \\
& + \sum_{i,j=1}^{v} B_{ij}\, \lambda_i \lambda_j \alpha_i \alpha_j\, \operatorname{sum}\!\Bigl( \bigl( W^{(i)} - A^{(i)} \bigr) \circ \bigl( W^{(j)} - A^{(j)} \bigr) \Bigr) \\
\text{s.t.} \quad & \alpha^{\top}\mathbf{1} = 1, \quad \alpha \geq 0, \quad S \geq 0, \\
& W^{(i)} \geq A^{(i)} \geq 0, \quad i = 1, \ldots, v,
\end{aligned}
\tag{9}
$$
where $\lambda_i$ is a parameter that incorporates the importance of the $i$-th view, with a higher value indicating greater importance. In particular, the parameter $\lambda_i$ gives users an opportunity to incorporate prior knowledge. Yet in this paper we mainly focus on the unsupervised multi-view clustering task, and $\lambda_i = 1$ can simply be used for all views when no prior knowledge is involved. When prior knowledge is involved, the requirement is $\lambda_i > 0$, because a negative value does not make sense, and if one wants to set $\lambda_i = 0$ then the $i$-th view can simply be removed from the objective. In the next section, the optimization problem will be theoretically analyzed, and a highly efficient algorithm will be developed to approximately solve it.
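To make the compact form concrete, below is a minimal NumPy sketch of how objective (9) could be evaluated; the function and variable names are illustrative and not part of the proposed method.

```python
import numpy as np

def objective(alpha, S, A, W, lam, B):
    """Evaluate objective (9).

    alpha : (v,) array of view weights (nonnegative, sums to 1);
    lam   : (v,) array of view-importance parameters;
    A, W  : lists of v (n, n) arrays with 0 <= A[i] <= W[i];
    B     : (v, v) array with beta on the diagonal, gamma off it.
    """
    v = len(W)
    # Consistency term: each weighted view should stay close to the
    # unified matrix S.
    consistency = sum(
        lam[i] * np.linalg.norm(alpha[i] * A[i] - S, 'fro') ** 2
        for i in range(v))
    # Inconsistency term: pairwise Hadamard products of the removed
    # parts E[i] = W[i] - A[i]; sum(.) adds up all matrix entries.
    E = [W[i] - A[i] for i in range(v)]
    inconsistency = sum(
        B[i, j] * lam[i] * lam[j] * alpha[i] * alpha[j] * np.sum(E[i] * E[j])
        for i in range(v) for j in range(v))
    return consistency + inconsistency

# B with beta on the diagonal and gamma off the diagonal can be built as
# B = np.full((v, v), gamma) + (beta - gamma) * np.eye(v)
```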
4 OPTIMIZATION
Though the proposed framework, if optimized properly, can identify multi-view inconsistency and fuse the consistent parts into a unified graph, the optimization of objective (9) is difficult for three reasons. Firstly, objective (9) is not jointly convex in all variables, which rules out directly applying convex optimization techniques. Secondly, the variables are heavily coupled, i.e., different variables are multiplied together, which raises the degree of the objective in the variables to as high as four. Thirdly, and unfortunately, the optimization of objective (9) turns out to be NP-hard, which will be explained in Section 4.7.1. In the preliminary version of this paper [12], a projection-based method is applied to optimize the objective. Although the projection-based method is a good heuristic, we observe that the objective increases in some optimization iterations, and it may yield unsatisfactory clustering performance on some datasets, as we will show in the Supplementary Material.
Based on a better understanding of the problem, we develop the following approaches to tackle these difficulties. Firstly, we simplify the constraints by proving in Lemma 1 that one constraint is automatically satisfied. Secondly, objective (9) can be rewritten as two forms of quadratic functions, corresponding to Subproblem (1) and Subproblem (3), respectively, so that we can optimize them alternately. Finally, Subproblem (3) consists of at least $n$ nonconvex quadratic programs (QPs) with box constraints, which are difficult to solve since they are NP-hard [24]. By proving that these QPs share the same Hessian, which has a desirable property (Lemma 2), we are able to modify the d.c. algorithm [13] to solve them all at once, which is much more efficient than solving them sequentially.
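For intuition, here is a minimal sketch of the d.c. (difference-of-convex) idea applied to a single box-constrained QP. The splitting and step size below are standard textbook choices, not the specific modification developed later in this paper.

```python
import numpy as np

def dca_box_qp(H, b, lo, hi, x0, iters=200):
    """D.c. sketch for min 0.5 * x^T H x + b^T x  s.t.  lo <= x <= hi.

    Writing H = rho*I - (rho*I - H) with rho >= lambda_max(H) splits the
    objective into a difference of two convex quadratics; linearizing the
    concave part at the current iterate yields a simple projected update.
    H is assumed symmetric.
    """
    rho = max(float(np.max(np.linalg.eigvalsh(H))), 1e-12)
    x = x0.astype(float)
    for _ in range(iters):
        # Each d.c. iteration reduces to a box projection of a gradient step.
        x = np.clip(x - (H @ x + b) / rho, lo, hi)
    return x
```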
To facilitate these approaches, a multi-view dense representation is further proposed, in which the nonzero elements of the adjacency matrices of all views are arranged into a dense matrix. A significant advantage of the proposed approaches is that they mainly involve matrix-vector and matrix-matrix multiplications, which contributes to their high efficiency on large-scale problems.
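As a rough illustration of this idea (the exact layout used by the method is defined where the representation is introduced; the helper below is only an assumption of ours), one way to pack the per-view adjacency entries at the union of nonzero positions into a dense matrix is:

```python
import numpy as np

def multiview_dense(W):
    """Pack all views' entries at the union of nonzero positions.

    W : list of v (n, n) adjacency matrices (dense here for simplicity).
    Returns a (v, m) dense matrix plus the m (row, col) positions kept.
    """
    mask = np.zeros(W[0].shape, dtype=bool)
    for Wi in W:                                 # union of nonzero patterns
        mask |= (Wi != 0)
    rows, cols = np.nonzero(mask)
    D = np.stack([Wi[rows, cols] for Wi in W])   # (v, m) dense matrix
    return D, rows, cols
```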
4.1 Constraint Simplification
We first show that the constraint $S \geq 0$ in Problem (9) can be removed while the global minimizer(s) remain the same. Define the following sets:
$$
\begin{aligned}
G_0 &= \{\, \alpha \geq 0 \mid \alpha^{\top}\mathbf{1} = 1 \,\}, &(10)\\
G_i &= \{\, A^{(i)} \mid W^{(i)} \geq A^{(i)} \geq 0 \,\}, \quad i = 1, \ldots, v, &(11)\\
G &= G_0 \times G_1 \times \cdots \times G_v \times \mathbb{R}^{n \times n}, &(12)\\
G_{+} &= G_0 \times G_1 \times \cdots \times G_v \times \mathbb{R}^{n \times n}_{\geq 0}, &(13)\\
G_{-} &= G_0 \times G_1 \times \cdots \times G_v \times \bigl( \mathbb{R}^{n \times n} \setminus \mathbb{R}^{n \times n}_{\geq 0} \bigr), &(14)
\end{aligned}
$$
where $\times$ denotes the Cartesian product. Clearly, $G_{+}, G_{-} \subset G$, $G_{+} \cup G_{-} = G$, and $G_{+} \cap G_{-} = \emptyset$. Then objective function (9) is denoted by $f : G \to \mathbb{R}$. The following lemma shows that with the constraint $S \geq 0$ removed, the minimizer of the resulting problem still satisfies $S \geq 0$.
Lemma 1. For every minimizer $x^{*}$ of the problem
$$
\min_{x}\ f(x), \quad \text{s.t. } x \in G, \tag{15}
$$
we have $x^{*} \in G_{+}$.
Proof: Suppose $x^{*} \notin G_{+}$. Then $x^{*} \in G_{-}$. Suppose $x^{*} = (\alpha, A^{(1)}, \ldots, A^{(v)}, S)$, where $S_{pq} < 0$ for some $p, q \in \{1, \ldots, n\}$. Let $V = \{(p, q) \mid S_{pq} < 0\}$. Let $\tilde{x} = (\alpha, A^{(1)}, \ldots, A^{(v)}, \tilde{S}) \in G_{+}$, where $\tilde{S} \in \mathbb{R}^{n \times n}_{\geq 0}$ is such that $\tilde{S}_{pq} = 0$ for all $(p, q) \in V$ and $\tilde{S}_{pq} = S_{pq}$ for all $(p, q) \notin V$. Let
$$
c = \sum_{k=1}^{v} \lambda_k \sum_{(p,q) \notin V} \bigl( \alpha_k A^{(k)}_{pq} - S_{pq} \bigr)^2 + \sum_{i,j=1}^{v} B_{ij} \lambda_i \lambda_j \alpha_i \alpha_j \operatorname{sum}\!\Bigl( \bigl( W^{(i)} - A^{(i)} \bigr) \circ \bigl( W^{(j)} - A^{(j)} \bigr) \Bigr).
$$
Then
$$
\begin{aligned}
f(x^{*}) &= f(\alpha, A^{(1)}, \ldots, A^{(v)}, S) \\
&= \sum_{k=1}^{v} \lambda_k \sum_{(p,q) \in V} \bigl( \alpha_k A^{(k)}_{pq} - S_{pq} \bigr)^2 + c \\
&= \sum_{k=1}^{v} \lambda_k \sum_{(p,q) \in V} \Bigl( \bigl( \alpha_k A^{(k)}_{pq} \bigr)^2 + S_{pq}^2 - 2 S_{pq} \alpha_k A^{(k)}_{pq} \Bigr) + c \\
&> \sum_{k=1}^{v} \lambda_k \sum_{(p,q) \in V} \bigl( \alpha_k A^{(k)}_{pq} \bigr)^2 + c \\
&= f(\tilde{x}),
\end{aligned}
$$
where the strict inequality holds because $V$ is nonempty and each $(p, q) \in V$ satisfies $S_{pq}^2 > 0$ and $-2 S_{pq} \alpha_k A^{(k)}_{pq} \geq 0$, since $S_{pq} < 0$, $\alpha \geq 0$, and $A^{(k)} \geq 0$.
This contradicts the assumption that $x^{*}$ is a minimizer of problem (15). Therefore, we conclude that $x^{*} \in G_{+}$.
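As a quick numerical sanity check of Lemma 1 (illustrative only, reusing the `objective` helper sketched after Problem (9), with values we chose arbitrarily), clipping the negative entries of $S$ to zero should never increase the objective:

```python
import numpy as np

rng = np.random.default_rng(0)
n, v = 5, 3
W = [rng.random((n, n)) for _ in range(v)]
A = [Wi * rng.random((n, n)) for Wi in W]   # satisfies 0 <= A[i] <= W[i]
alpha = np.full(v, 1.0 / v)                 # alpha >= 0, sums to 1
lam = np.ones(v)
beta, gamma = 1.0, 0.5                      # illustrative values only
B = np.full((v, v), gamma) + (beta - gamma) * np.eye(v)

S = rng.random((n, n)) - 0.3                # some entries are negative
S_clip = np.maximum(S, 0.0)                 # zero out the negative part
assert objective(alpha, S_clip, A, W, lam, B) <= objective(alpha, S, A, W, lam, B)
```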