Deep Spectral Clustering using Dual Autoencoder Network
Xu Yang¹, Cheng Deng¹∗, Feng Zheng², Junchi Yan³, Wei Liu⁴∗
¹ School of Electronic Engineering, Xidian University, Xian 710071, China
² Department of Computer Science and Engineering, Southern University of Science and Technology
³ Department of CSE, and MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University
⁴ Tencent AI Lab, Shenzhen, China
{xuyang.xd, chdeng.xd}@gmail.com, zhengf@sustc.edu.cn,
yanjunchi@sjtu.edu.cn, wl2223@columbia.edu
Abstract
Clustering methods have recently attracted ever-increasing
attention in learning and vision. Deep clustering combines
embedding and clustering to obtain an optimal embedding
subspace for clustering, which can be more effective than
conventional clustering methods.
In this paper, we propose a joint learning framework
for discriminative embedding and spectral clustering. We
first devise a dual autoencoder network, which enforces the
reconstruction constraint for the latent representations and
their noisy versions, to embed the inputs into a latent space
for clustering. As such, the learned latent representations
can be more robust to noise. Mutual information estimation
is then utilized to provide more discriminative information
from the inputs. Furthermore, a deep spectral clustering
method is applied to embed the latent representations
into the eigenspace and subsequently cluster them, which
can fully exploit the relationships among inputs to achieve
optimal clustering results. Experimental results on benchmark
datasets show that our method can significantly outperform
state-of-the-art clustering approaches.
1. Introduction
As an important task in the unsupervised learning [39, 8, 20]
and vision communities, clustering has been widely used
in image segmentation [33], image categorization [41], and
digital media analysis [1]. The goal of clustering is to find
a partition that keeps similar data points in the same
cluster and dissimilar ones in different clusters. In recent
years, many clustering methods have been proposed, such
as K-means clustering [24], spectral clustering [27, 42], and
non-negative matrix factorization clustering [37], among
which K-means and spectral clustering are two well-known
∗ Corresponding author.
Figure 1. Visualization of the discriminative embedding capability
on MNIST-test with the t-SNE algorithm. (a) Raw data: the space of
raw data; (b) ConvAE: data points in the latent subspace of a
convolutional autoencoder; (c) Our method: data points in the latent
subspace of the proposed autoencoder network. Our method provides
a more discriminative embedding subspace.
conventional algorithms that are applicable to a wide range
of tasks. However, these shallow clustering methods
depend on low-level features of the inputs, such as raw
pixels, SIFT [28], or HOG [7]. Their distance metrics only
describe local relationships in the data space and are
limited in representing the latent dependencies among
the inputs [3].
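For reference, the conventional (shallow) spectral clustering pipeline mentioned above can be sketched as follows. This is only a minimal illustration of one common normalized variant applied directly to raw features, not the method proposed in this paper; the function names `spectral_embedding` and `kmeans`, the bandwidth `sigma`, the farthest-point initialization, and the toy data are all assumptions made for the example.

```python
import numpy as np

def spectral_embedding(X, n_clusters, sigma=1.0):
    """Embed points into the eigenspace of a normalized graph Laplacian."""
    # Pairwise squared Euclidean distances in the raw feature space.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian affinity: it only reflects local relationships in data space.
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the smallest ones.
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :n_clusters]
    # Row-normalize so that each embedded point lies on the unit sphere.
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, n_clusters, n_iter=50):
    """Plain K-means with a deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then update the centers.
        dists = np.sum((U[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dists, axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = U[labels == k].mean(axis=0)
    return labels

# Toy data (assumed for illustration): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(0.0, 0.3, (20, 2)) + 5.0])
labels = kmeans(spectral_embedding(X, n_clusters=2), n_clusters=2)
```

Because the affinity here is computed on the raw inputs, the quality of the partition depends entirely on how well Euclidean distance in that space reflects cluster structure, which is the limitation of shallow methods noted above.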
This paper presents a novel deep learning based unsupervised
clustering approach. Deep clustering, which integrates the
embedding and clustering processes to obtain an optimal
embedding subspace for clustering, can be more effective
than shallow clustering methods. The main reason is
that deep clustering methods can effectively model the
distribution of the inputs and capture their non-linear
properties, making them more suitable for real-world
clustering scenarios.
Recently, many clustering methods have been advanced by
deep generative approaches, such as autoencoder
networks [25]. The popularity of the autoencoder network lies
in its powerful ability to capture high-dimensional probability
distributions of the inputs without supervised information.
The encoder model projects the inputs into the latent