subspace clustering method based on the latent representations
learned by autoencoders. Li et al. [42] incorporated adversarial training [43] with autoencoders in a unified multi-view clustering framework. Fan et al. [44] introduced graph
autoencoders with graph constraints for multi-view clustering.
Yin et al. [45] proposed variational autoencoder based multi-
view clustering, where the shared features were learned with a
mixture of Gaussian distributions. Xu et al. [46] proposed to learn disentangled multi-view representations for clustering with common and peculiar variables in variational autoencoders. Rather than relying on the reconstruction objectives of autoencoders, some works (e.g., [47], [48], [49]) proposed to penalize the representation space with regularization constraints for deep multi-view clustering. For example, Zhou et al. [47] utilized encoder networks to extract informative features and leveraged Gaussian kernel matrices to avoid feature degeneration.
However, real-world multi-view data often contain missing data in some views, rendering existing multi-view clustering methods inapplicable. Therefore, deep incomplete multi-view clustering has become an important topic that has attracted researchers' attention in recent years.
B. Incomplete Multi-View Clustering
Incomplete multi-view clustering (IMVC) is also called partial multi-view clustering in the literature. Traditional IMVC methods utilize classic machine learning techniques such as non-negative matrix factorization, the kernel trick, graph learning, and tensor techniques. Li et al. [22] proposed a non-negative matrix factorization based method to handle incomplete multi-view data. Hu et al. [26] incorporated weighted and regularized matrix factorization into an online IMVC framework. A recent matrix factorization based method [50] employed cosine similarity to preserve the manifold structures. Matrix factorization based IMVC usually recovers a non-negative matrix for the missing data from the available data, as the toy sketch below illustrates.
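The following is a toy illustration of this recovery idea: a generic masked non-negative matrix factorization, not the algorithm of any cited work; the `rank`, `iters`, and mask conventions are assumptions.

```python
import numpy as np

def masked_nmf(X, M, rank=5, iters=200, eps=1e-9):
    """X: (N, D) non-negative data; M: (N, D) binary mask, 1 = observed entry."""
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    for _ in range(iters):               # weighted multiplicative updates:
        W *= ((M * X) @ H.T) / ((M * (W @ H)) @ H.T + eps)
        H *= (W.T @ (M * X)) / (W.T @ (M * (W @ H)) + eps)
    # Keep the observed entries; fill the missing ones from the low-rank model.
    return np.where(M == 1, X, W @ H)
```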
Similarly, kernel based IMVC usually imputes the kernel matrix of incomplete multi-view data by utilizing that of the complete multi-view data. For example, Guo et al. [23] presented a kernel similarity based method with an anchor strategy for partial multi-view clustering. Liu et al. [24] proposed a multiple kernel IMVC method, which completes each incomplete base kernel matrix of the incomplete views with a learned consensus matrix; a schematic sketch of this kernel-completion idea follows.
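This sketch only illustrates the general idea (it is not the exact algorithm of [24]); the consensus kernel `K_consensus` is assumed to be learned elsewhere and given.

```python
import numpy as np

def complete_kernel(K_v, observed, K_consensus):
    """K_v: (N, N) base kernel, unreliable where samples are unobserved;
    observed: boolean (N,) mask of samples available in this view;
    K_consensus: (N, N) consensus kernel learned across views (assumed given)."""
    K_full = K_consensus.copy()
    obs = np.ix_(observed, observed)     # keep the reliable observed sub-block
    K_full[obs] = K_v[obs]               # borrow all remaining entries from
    return K_full                        # the consensus kernel
```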
Graph based IMVC leverages graph structure information to improve the ability to recognize cluster patterns. For instance, the work in [31] established graph regularization to enforce consistency between the available data and the values imputed for the missing data. A recent graph based method [51] considered both instance-to-anchor and instance-to-instance similarities for spectral clustering. Fang et al. [52] leveraged biological evolution theory to handle unbalanced incompleteness in IMVC.
Recent tensor based IMVC works (e.g., [27], [28], [29]) usually introduce low-rank tensor constraints to characterize the high-order correlations and inner structure among multiple views. In recent years, deep learning based IMVC has attracted increasing attention. One natural motivation is that the generative adversarial net (GAN [43]) can be applied to generate the missing data of incomplete multi-view data [33], [37]. Besides, Wen et al. [35] proposed a cognitive deep incomplete multi-view clustering network, where a nearest neighbor graph was constructed and the missing data were filled with average values. Wei et al. [34] utilized shared subspace representations to reconstruct the missing data via the decoder of each individual view. Recent work [36] stacked dual
prediction networks on autoencoders to perform data recovery
for incomplete data. Xu et al. [53] proposed to mine the non-
linear cluster complementarity among the incomplete multi-
view data. Tang et al. [54] proposed to dynamically impute
missing views with the learned semantic neighbors.
Most traditional and deep IMVC methods handle incomplete multi-view data with imputation/recovery/inference strategies. However, inaccurate imputed values for the missing data negatively affect clustering performance, and this issue becomes more likely as the amount of missing data grows. Additionally, previous IMVC methods usually learn common representations from the complete multi-view data and generalize them to the incomplete multi-view data, which might cause a distribution discrepancy between the complete data and the incomplete data. In this paper, we propose an imputation-free deep IMVC method that incorporates distribution alignment into feature learning to address the above issues.
III. METHOD
Notations: In this paper, $\{X^{v} \in \mathbb{R}^{N \times D^{v}}\}_{v=1}^{V}$ represents a multi-view data set with $V$ views, where $D^{v}$ is the dimensionality of samples in the $v$-th view and $N$ is the number of samples. Moreover, we employ an indicator matrix $A \in \{0, 1\}^{N \times V}$, where for $a_{iv} \in A$, $a_{iv} = 0$ denotes that the data of the $i$-th sample in the $v$-th view is missing, and $a_{iv} = 1$ represents that the data is not missing. Denoting the complete data of all views as $\{X_{C}^{v}\}_{v=1}^{V}$ and the incomplete data of each individual view as $X_{I}^{v}$, respectively, for each $x_{i}^{v}$, if $\sum_{v=1}^{V} a_{iv} = V$, then $x_{i}^{v} \in X_{C}^{v}$; otherwise, $x_{i}^{v} \in X_{I}^{v}$. Therefore, $[X_{C}^{v}; X_{I}^{v}] \in \mathbb{R}^{N^{v} \times D^{v}}$ and the missing data result in $N^{v} \leq N$. Table I lists the defined notations and descriptions.
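To make the notation concrete, here is a minimal sketch with hypothetical toy sizes for $N$, $V$, and $D^{v}$; the data and the missing pattern are randomly generated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, V, dims = 6, 2, [4, 5]               # toy sizes: N samples, V views, D^v

# Indicator matrix A in {0,1}^{N x V}: a_iv = 0 means sample i misses view v.
A = rng.integers(0, 2, size=(N, V))
A[A.sum(axis=1) == 0] = 1               # keep every sample in at least one view

complete = A.sum(axis=1) == V           # samples observed in all V views
for v in range(V):
    avail = A[:, v] == 1                # the N^v <= N rows observed in view v
    Xv = rng.standard_normal((N, dims[v]))[avail]
    Xv_C = Xv[complete[avail]]          # x_i^v with sum_v a_iv = V  -> X_C^v
    Xv_I = Xv[~complete[avail]]         # x_i^v with a missing view  -> X_I^v
    print(f"view {v}: [X_C^v; X_I^v] shape =", np.vstack([Xv_C, Xv_I]).shape)
```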
A. Motivation and Framework
Deep autoencoders have been widely applied in IMVC methods due to their ability to learn clustering-friendly features [33], [35], [36]. Concretely, these methods optimize the reconstruction loss $\mathcal{L}_{REC}$ of all views by
$$\mathcal{L}_{REC} = \sum_{v=1}^{V} \mathcal{L}_{REC}^{v} = \sum_{v=1}^{V} \left\| X^{v} - D_{\theta^{v}}\left(E_{\psi^{v}}(X^{v})\right) \right\|_{F}^{2}, \tag{1}$$
where $E_{\psi^{v}}$ and $D_{\theta^{v}}$ denote the encoder and decoder networks of the $v$-th view, respectively. The encoder network converts the raw data $[X_{C}^{v}; X_{I}^{v}]$ into the view-specific features $[Z_{C}^{v}; Z_{I}^{v}] \in \mathbb{R}^{N^{v} \times L}$ to learn underlying characteristics, i.e.,
$$[Z_{C}^{v}; Z_{I}^{v}] = E_{\psi^{v}}([X_{C}^{v}; X_{I}^{v}]). \tag{2}$$
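For illustration, Eqs. (1) and (2) can be sketched in PyTorch as follows; the layer widths, the feature dimensionality $L$, and the toy view dimensions are assumptions rather than the paper's actual architecture.

```python
import torch
import torch.nn as nn

L_dim, dims = 10, [4, 5]                     # assumed L and D^v for V = 2 views

# Per-view encoder E_{psi^v} and decoder D_{theta^v} (illustrative widths).
encoders = nn.ModuleList(nn.Sequential(nn.Linear(d, 32), nn.ReLU(),
                                       nn.Linear(32, L_dim)) for d in dims)
decoders = nn.ModuleList(nn.Sequential(nn.Linear(L_dim, 32), nn.ReLU(),
                                       nn.Linear(32, d)) for d in dims)

def reconstruction_loss(X):                  # X[v]: (N^v, D^v) available data
    loss = 0.0
    for v, Xv in enumerate(X):
        Zv = encoders[v](Xv)                 # Eq. (2): [Z_C^v; Z_I^v] = E_{psi^v}(.)
        Xv_hat = decoders[v](Zv)             # D_{theta^v}(E_{psi^v}(X^v))
        loss = loss + ((Xv - Xv_hat) ** 2).sum()   # squared Frobenius norm
    return loss                              # Eq. (1): summed over all V views
```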