Learning to Cluster Faces via Confidence and Connectivity Estimation
Lei Yang¹, Dapeng Chen², Xiaohang Zhan¹, Rui Zhao², Chen Change Loy³, Dahua Lin¹
¹The Chinese University of Hong Kong, ²SenseTime Group Limited, ³Nanyang Technological University
{yl016, zx017, dhlin}@ie.cuhk.edu.hk, {chendapeng, zhaorui}@sensetime.com, ccloy@ntu.edu.sg
Abstract
Face clustering is an essential tool for exploiting unlabeled
face data, and has a wide range of applications, including
face annotation and retrieval. Recent works show
that supervised clustering can yield noticeable performance
gains. However, these methods usually involve heuristic steps
and require numerous overlapped subgraphs, severely
restricting their accuracy and efficiency. In this paper, we
propose a fully learnable clustering framework that does not
require a large number of overlapped subgraphs. Instead,
we transform the clustering problem into two sub-problems.
Specifically, two graph convolutional networks,
named GCN-V and GCN-E, are designed to estimate the
confidence of vertices and the connectivity of edges,
respectively. With the vertex confidence and edge connectivity, we
can naturally organize more relevant vertices on the affinity
graph and group them into clusters. Experiments on two
large-scale benchmarks show that our method significantly
improves clustering accuracy, and thus the performance of
recognition models trained on top, while being an order of
magnitude more efficient than existing supervised methods.
1. Introduction
Thanks to the explosive growth of annotated face
datasets [19, 11, 17], face recognition has witnessed great
progress in recent years [31, 27, 33, 7, 40]. Along with this
trend, the ever-increasing demand for annotated data has
resulted in prohibitive annotation costs. To exploit massive
unlabeled face images, recent studies [14, 39, 35, 38]
provide a promising clustering-based pipeline and demonstrate
its effectiveness in improving the face recognition model.
They first perform clustering to generate “pseudo labels”
for unlabeled images and then leverage them to train the
model in a supervised way. The key to the success of these
approaches lies in an effective face clustering algorithm.
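The grouping idea stated in the abstract (link each vertex to a more confident, strongly connected neighbor, then read clusters off the resulting trees) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `cluster_by_confidence`, the array layout, and the cut-off `tau` are assumptions made for illustration, and the confidence and connectivity scores are taken as given inputs rather than predicted by GCN-V and GCN-E.

```python
import numpy as np

def cluster_by_confidence(conf, nbrs, conn, tau=0.5):
    """Group vertices into trees and return one integer label per vertex.

    conf: (n,) vertex confidence scores (assumed to be given).
    nbrs: (n, k) neighbor indices on the affinity graph.
    conn: (n, k) connectivity score of each edge i -> nbrs[i][m].
    tau:  hypothetical cut-off; weaker edges are never used as links.
    """
    conf = np.asarray(conf, dtype=float)
    n = len(conf)
    parent = np.arange(n)  # every vertex starts as its own root
    for i in range(n):
        best_j, best_c = i, tau
        for j, c in zip(nbrs[i], conn[i]):
            # Candidates are neighbors with strictly higher confidence;
            # among them, keep the one with the strongest connection.
            if conf[j] > conf[i] and c >= best_c:
                best_j, best_c = j, c
        parent[i] = best_j

    # Links always point to strictly higher confidence, so there are no
    # cycles. Visiting vertices by descending confidence guarantees that a
    # parent is labeled before any of its children.
    labels = np.full(n, -1)
    next_label = 0
    for i in np.argsort(-conf):
        if parent[i] == i:  # root of a tree: start a new cluster
            labels[i] = next_label
            next_label += 1
        else:               # inherit the cluster of the parent
            labels[i] = labels[parent[i]]
    return labels

# Toy graph with five vertices and two neighbors each (made-up numbers).
conf = np.array([0.9, 0.6, 0.5, 0.8, 0.4])
nbrs = np.array([[1, 2], [0, 2], [1, 3], [4, 2], [3, 2]])
conn = np.array([[0.9, 0.8], [0.9, 0.7], [0.8, 0.2], [0.9, 0.2], [0.9, 0.3]])
labels = cluster_by_confidence(conf, nbrs, conn)
print(labels.tolist())  # [0, 0, 0, 1, 1]
```

In this toy example, vertices 0 and 3 have no more confident neighbor and become roots, so the graph splits into two trees. Vertex 2 has a more confident neighbor in the other tree (vertex 3), but the weak connection keeps it with vertex 1.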
Figure 1: The core idea of our approach. Vertices with different
colors represent different classes. Previous methods group all
vertices in the box into a cluster as they are densely connected, while
our approach, learning to estimate the confidence of belonging to a
specific class, is able to detect unconfident vertices that lie among
multiple classes. With the estimated vertex confidence, we further
learn to predict the edge connectivity. By connecting each vertex
to a neighbor with higher confidence and strongest connection,
we partition the affinity graph into trees, each of which naturally
represents a cluster. (Figure labels: Confident, Unconfident,
Affinity Graph, Strong Connectivity, Clusters.)
Existing face clustering methods roughly fall into two
categories, namely, unsupervised methods and supervised
methods. Unsupervised approaches, such as K-means [22]
and DBSCAN [9], rely on specific assumptions and lack
the capability to cope with the complex cluster structures
in real-world datasets. To improve adaptivity to different
data, supervised clustering methods have been proposed
[35, 38] to learn the cluster patterns. Yet, both their accuracy
and efficiency are far from satisfactory. In particular,
to cluster large-scale face data, existing supervised
approaches organize the data into numerous small
subgraphs, leading to two main issues. First, processing
subgraphs involves heuristic steps based on simple assumptions.
Both subgraph generation [38] and prediction aggregation
[35] depend on heuristic procedures, thus limiting
their performance upper bound. Furthermore, the subgraphs
required by these approaches are usually highly overlapped,
arXiv:2004.00445v2 [cs.CV] 3 Apr 2020