Generalized Transitive Distance with Minimum Spanning Random Forest
Zhiding Yu
1
, Weiyang Liu
2
, Wenbo Liu
1
, Xi Peng
3
, Zhuo Hui
1
, B. V. K. Vijaya Kumar
1
1
Dept. of Electrical and Computer Eng., Carnegie Mellon University
2
School of Electronic and Computer Eng., Peking University, P.R. China
3
I2R, Agency for Sci., Tech. and Research (A*STAR), Singapore
yzhiding@andrew.cmu.edu, wyliu@pku.edu.cn, pangsaai@gmail.com, kumar@ece.cmu.edu
Abstract
Transitive distance is an ultrametric with elegant
properties for clustering. Conventional transitive
distance can be found by referring to the mini-
mum spanning tree (MST). We show that such dis-
tance metric can be generalized onto a minimum s-
panning random forest (MSRF) with element-wise
max pooling over the set of transitive distance ma-
trices from an MSRF. Our proposed approach is
both intuitively reasonable and theoretically attrac-
tive. Intuitively, max pooling alleviates undesired
short links with single MST when noise is present.
Theoretically, one can see that the distance metric
obtained max pooling is still an ultrametric, render-
ing many good clustering properties. Comprehen-
sive experiments on data clustering and image seg-
mentation show that MSRF with max pooling im-
proves the clustering performance over single MST
and achieves state of the art performance on the
Berkeley Segmentation Dataset.
1 Introduction
Over the past decades, clustering has been and is still one of
the most important and fundamental machine learning prob-
lem. A number of clustering methods have been proposed,
ranging from the famous k-means algorithm and graph-based
approaches (such as single linkage algorithm)
[
Sibson, 1973
]
,
to the family of mode seeking
[
Comaniciu and Meer, 2002
]
,
spectral clustering
[
Ng et al., 2002; Zelnik-Manor and Per-
ona, 2004; Shi and Malik, 2000
]
, and subspace cluster-
ing
[
Elhamifar and Vidal, 2009; Liu et al., 2013; 2013;
Peng et al., 2013; 2015
]
. Despite the large variety of different
methods, some general principles are commonly considered
when evaluating the performance among different methods.
These principles include:
• Ability to discover clusters with arbitrary shape.
• Robustness against noise
• Scalability
The family of spectral clustering methods received much
attention and found wide applications for the excellent clus-
tering performance. Given n data points, eigendecomposition
(a)
(b)
Figure 1: (a) Clustering with transitive distance on a single
MST. (b) Clustering with the proposed framework.
is conducted on an n × n normalized pairwise similarity ma-
trix, followed by k-means to generate clusters. A key reason
for spectral clustering’s success lies in its ability to discover
non-convex latent structures. This comes from the the fac-
t that eigendecomposition projects data in to a kernel space
with nicely shaped clusters.
Spectral clustering is not the only family of methods that
can handle clusters with arbitrary shapes. Transitive distance
clustering (also known as path based clustering) provides an
elegant and intuitive non-eigendecomposition alternative also
effective in handling non-convex clusters. Specifically, tran-
sitive distance emphasizes the connectivity rather than abso-
lute distance between pairwise data. This is achieved by find-
ing the set of largest hops (edges) along all possible connect-
ing paths and defining the pairwise distance as the minimum
hop.
[
Fischer and Buhmann, 2003b
]
proposed the concept
of transitive distance and an agglomerative bottom-up clus-
tering framework. The idea of connectivity kernel was later
proposed in
[
Fischer et al., 2004
]
. Other works include the
transitive closure
[
Ding et al., 2006
]
and transitive affinity
[
Chang and Yeung, 2005; 2008
]
.
There exist some inherent connections between transitive
distance and minimum spanning tree. It was proved that the
transitive distance edge for all pairwise data lies on the mini-
mum spanning tree, if the maximum order (number of nodes)
of a path is equal to n. This, however, does not necessarily
mean that transitive distance clustering is identical to early
graph based method such as single linkage algorithm. There
are many nice properties associated with transitive distance,
one of them being that the transitive distance is an ultrametric