
Power Iteration Clustering
Frank Lin frank@cs.cmu.edu
William W. Cohen wcohen@cs.cmu.edu
Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213 USA
Abstract
We present a simple and scalable graph clus-
tering method called power iteration cluster-
ing (PIC). PIC finds a very low-dimensional
embedding of a dataset using truncated
power iteration on a normalized pair-wise
similarity matrix of the data. This em-
bedding turns out to be an effective cluster
indicator, consistently outperforming widely
used spectral methods such as NCut on real
datasets. PIC is very fast on large datasets,
running over 1,000 times faster than an NCut
implementation based on the state-of-the-art
IRAM eigenvector computation technique.
1. Introduction
We present a simple and scalable clustering method
called power iteration clustering (hereafter PIC). In
essence, it finds a very low-dimensional data embed-
ding using truncated power iteration on a normalized
pair-wise similarity matrix of the data points, and this
embedding turns out to be an effective cluster indica-
tor.
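
As a preview, the core computation is small enough to sketch in a few lines of NumPy. This is an illustrative sketch under our own naming (pic_embedding, num_iters), with a fixed iteration count standing in for the convergence test developed in Section 2; it is not a verbatim transcription of the algorithm:

```python
import numpy as np

def pic_embedding(A, num_iters=50):
    # A: (n, n) symmetric, nonnegative pairwise similarity matrix.
    # Row-normalize to obtain W = D^{-1} A.
    W = A / A.sum(axis=1, keepdims=True)
    v = np.random.rand(A.shape[0])      # random starting vector
    for _ in range(num_iters):          # truncated power iteration
        v = W @ v
        v /= np.abs(v).sum()            # rescale to keep entries bounded
    return v                            # low-dimensional cluster indicator
```

The returned vector is the one-dimensional embedding; clusters are then read off by running k-means on its entries.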
In presenting PIC, we make connections to and make
comparisons with spectral clustering, a well-known, el-
egant and effective clustering method. PIC and spec-
tral clustering methods are similar in that both em-
bed data points in a low-dimensional subspace derived
from the similarity matrix, and this embedding pro-
vides clustering results directly or through a k-means
algorithm. They are different in what this embedding
is and how it is derived. In spectral clustering the
embedding is formed by the bottom eigenvectors of
the Laplacian of a similarity matrix. In PIC the em-
bedding is an approximation to an eigenvalue-weighted
linear combination of all the eigenvectors of a nor-
malized similarity matrix. This embedding turns out
to be very effective for clustering, and in comparison
to spectral clustering, the cost (in space and time) of
explicitly calculating eigenvectors is replaced by that
of a small number of matrix-vector multiplications.
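
To make "eigenvalue-weighted linear combination" concrete, recall the standard expansion behind power iteration. Assuming $W$ has a full set of eigenvectors $e_1, \ldots, e_n$ with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$, and writing the starting vector as $v^{(0)} = c_1 e_1 + \cdots + c_n e_n$, repeated multiplication gives

$$v^{(t)} = W^t v^{(0)} = c_1 \lambda_1^t e_1 + c_2 \lambda_2^t e_2 + \cdots + c_n \lambda_n^t e_n.$$

Stopping the iteration early therefore yields a single vector in which every eigenvector is present, each weighted by its starting coefficient and a power of its eigenvalue.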
We test PIC on a number of different types of datasets
and obtain comparable or better clusters than existing
spectral methods. However, the highlights of PIC are
its simplicity and scalability — we demonstrate that
a basic implementation of this method is able to par-
tition a network dataset of 100 million edges within
a few seconds on a single machine, without sampling,
grouping, or other preprocessing of the data.
This work is presented as follows: we begin by describ-
ing power iteration and how its convergence property
indicates cluster membership and how we can use it
to cluster data (Section 2). Then we show experimen-
tal results of PIC on a number of real and synthetic
datasets and compare them to those of spectral cluster-
ing, both in cluster quality (Section 3) and scalability
(Section 4). Next, we survey related work (Section 5),
differentiating PIC from clustering methods that em-
ploy matrix powering and from methods that modify
the “traditional” spectral clustering to improve ac-
curacy or scalability. Finally, we conclude with why
we believe this simple and scalable clustering method
is very practical — easily implemented, parallelized,
and well-suited to very large datasets.
2. Power Iteration Clustering
2.1. Notation and Background
Given a dataset $X = \{x_1, x_2, \ldots, x_n\}$, a similarity function $s(x_i, x_j)$ is a function where $s(x_i, x_j) = s(x_j, x_i)$ and $s \geq 0$ if $i \neq j$, and following previous work (Shi & Malik, 2000), $s = 0$ if $i = j$. An affinity matrix $A \in \mathbb{R}^{n \times n}$ is defined by $A_{ij} = s(x_i, x_j)$. The degree matrix $D$ associated with $A$ is a diagonal matrix with $d_{ii} = \sum_j A_{ij}$. A normalized affinity matrix $W$ is defined as $D^{-1}A$.
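As a concrete illustration of these definitions, the following sketch builds $A$, $D$, and $W$ from a data matrix; the Gaussian kernel is only one example choice of similarity function $s$, and the names here are ours:

```python
import numpy as np

def normalized_affinity(X, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of X.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2 * sigma ** 2))    # A_ij = s(x_i, x_j), symmetric
    np.fill_diagonal(A, 0.0)              # s = 0 if i = j
    D = np.diag(A.sum(axis=1))            # d_ii = sum_j A_ij
    W = A / A.sum(axis=1, keepdims=True)  # W = D^{-1} A (row-stochastic)
    return A, D, W
```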
Below we will view $W$ interchangeably as a matrix, and an undirected graph with nodes