Clustering by Passing Messages Between Data Points
Brendan J. Frey* and Delbert Dueck
Clustering data by identifying a subset of representative examples is important for processing
sensory signals and detecting patterns in data. Such “exemplars” can be found by randomly
choosing an initial subset of data points and then iteratively refining it, but this works well only if
that initial choice is close to a good solution. We devised a method called “affinity propagation,”
which takes as input measures of similarity between pairs of data points. Real-valued messages are
exchanged between data points until a high-quality set of exemplars and corresponding clusters
gradually emerges. We used affinity propagation to cluster images of faces, detect genes in
microarray data, identify representative sentences in this manuscript, and identify cities that are
efficiently accessed by airline travel. Affinity propagation found clusters with much lower error than
other methods, and it did so in less than one-hundredth the amount of time.
Clustering data based on a measure of similarity is a critical step in scientific data analysis and in engineering systems. A common approach is to use data to learn a set of centers such that the sum of squared errors between data points and their nearest centers is small. When the centers are selected from actual data points, they are called "exemplars." The popular k-centers clustering technique (1) begins with an initial set of randomly selected exemplars and iteratively refines this set so as to decrease the sum of squared errors. k-centers clustering is quite sensitive to the initial selection of exemplars, so it is usually rerun many times with different initializations in an attempt to find a good solution. However, this works well only when the number of clusters is small and chances are good that at least one random initialization is close to a good solution. We take a quite different approach and introduce a method that simultaneously considers all data points as potential exemplars. By viewing each data point as a node in a network, we devised a method that recursively transmits real-valued messages along edges of the network until a good set of exemplars and corresponding clusters emerges. As described later, messages are updated on the basis of simple formulas that search for minima of an appropriately chosen energy function. At any point in time, the magnitude of each message reflects the current affinity that one data point has for choosing another data point as its exemplar, so we call our method "affinity propagation." Figure 1A illustrates how clusters gradually emerge during the message-passing procedure.
Affinity propagation takes as input a collection of real-valued similarities between data points, where the similarity s(i,k) indicates how well the data point with index k is suited to be the exemplar for data point i. When the goal is to minimize squared error, each similarity is set to a negative squared error (Euclidean distance): for points $x_i$ and $x_k$, $s(i,k) = -\lVert x_i - x_k \rVert^2$. Indeed, the method described here can be applied when the optimization criterion is much more general. Later, we describe tasks where similarities are derived for pairs of images, pairs of microarray measurements, pairs of English sentences, and pairs of cities. When an exemplar-dependent probability model is available, s(i,k) can be set to the log-likelihood of data point i given that its exemplar is point k. Alternatively, when appropriate, similarities may be set by hand.
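For the squared-error case just described, a minimal sketch of the similarity computation (assuming NumPy and a data matrix X with one point per row; the function name is ours, not the authors'):

```python
import numpy as np

def negative_squared_error_similarities(X):
    """s(i,k) = -||x_i - x_k||^2 for every pair of rows of X."""
    # Pairwise differences via broadcasting: shape (n, n, d).
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    return -(diff ** 2).sum(axis=2)  # shape (n, n)
```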
Rather than requiring that the number of clusters be prespecified, affinity propagation takes as input a real number s(k,k) for each data point k so that data points with larger values of s(k,k) are more likely to be chosen as exemplars. These values are referred to as "preferences." The number of identified exemplars (number of clusters) is influenced by the values of the input preferences, but also emerges from the message-passing procedure. If, a priori, all data points are equally suitable as exemplars, the preferences should be set to a common value; this value can be varied to produce different numbers of clusters. The shared value could be the median of the input similarities (resulting in a moderate number of clusters) or their minimum (resulting in a small number of clusters).
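A sketch of this preference-setting step under the same assumptions as above; whether the median is taken over all similarities or only the off-diagonal entries is a detail the text leaves open, so treat the choice below as ours:

```python
import numpy as np

def with_preferences(S, preference=None):
    """Return a copy of the similarity matrix S with a common value
    s(k,k) placed on the diagonal. The default, the median of the
    off-diagonal similarities, tends to yield a moderate number of
    clusters; passing S.min() instead yields fewer clusters."""
    S = S.copy()
    if preference is None:
        preference = np.median(S[~np.eye(len(S), dtype=bool)])
    S[np.diag_indices_from(S)] = preference
    return S
```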
There are two kinds of message exchanged between data points, and each takes into account a different kind of competition. Messages can be combined at any stage to decide which points are exemplars and, for every other point, which exemplar it belongs to. The "responsibility" r(i,k), sent from data point i to candidate exemplar point k, reflects the accumulated evidence for how well-suited point k is to serve as the exemplar for point i, taking into account other potential exemplars for point i (Fig. 1B). The "availability" a(i,k), sent from candidate exemplar point k to point i, reflects the accumulated evidence for how appropriate it would be for point i to choose point k as its exemplar, taking into account the support from other points that point k should be an exemplar (Fig. 1C). r(i,k) and a(i,k) can be viewed as log-probability ratios. To begin with, the availabilities are initialized to zero: a(i,k) = 0. Then, the responsibilities are computed using the rule
$$r(i,k) \;\leftarrow\; s(i,k) \,-\, \max_{k'\,\text{s.t.}\,k' \neq k} \bigl\{\, a(i,k') + s(i,k') \,\bigr\} \qquad (1)$$
In the first iteration, because the availabilities are zero, r(i,k) is set to the input similarity between point i and point k as its exemplar, minus the largest of the similarities between point i and other candidate exemplars. This competitive update is data-driven and does not take into account how many other points favor each candidate exemplar. In later iterations, when some points are effectively assigned to other exemplars, their availabilities will drop below zero as prescribed by the update rule below. These negative availabilities will decrease the effective values of some of the input similarities s(i,k′) in the above rule, removing the corresponding candidate exemplars from competition. For k = i, the responsibility r(k,k) is set to the input preference that point k be chosen as an exemplar, s(k,k), minus the largest of the similarities between point i and all other candidate exemplars. This "self-responsibility" reflects accumulated evidence that point k is an exemplar, based on its input preference tempered by how ill-suited it is to be assigned to another exemplar.
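In code, Eq. 1 can be applied to all pairs at once. The following vectorized sketch (our variable names, continuing the NumPy setting of the earlier sketches) tracks the largest and second-largest entry of each row of a(i,·) + s(i,·), so that the maximum over k′ ≠ k never includes column k itself:

```python
import numpy as np

def update_responsibilities(S, A):
    """One sweep of Eq. 1 over the (n, n) similarity matrix S and
    availability matrix A:
        r(i,k) <- s(i,k) - max over k' != k of { a(i,k') + s(i,k') }."""
    n = len(S)
    AS = A + S
    rows = np.arange(n)
    # Largest and second-largest entry of a(i,.) + s(i,.) in each row.
    k_max = np.argmax(AS, axis=1)
    first = AS[rows, k_max]
    AS[rows, k_max] = -np.inf
    second = np.max(AS, axis=1)
    # The maximum over k' != k equals the row maximum everywhere except
    # in the column attaining it, where it is the second-largest entry.
    R = S - first[:, np.newaxis]
    R[rows, k_max] = S[rows, k_max] - second
    return R
```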
Whereas the above responsibility update lets all candidate exemplars compete for ownership of a data point, the following availability update gathers evidence from data points as to whether each candidate exemplar would make a good exemplar:
$$a(i,k) \;\leftarrow\; \min\Bigl\{\, 0,\; r(k,k) + \sum_{i'\,\text{s.t.}\,i' \notin \{i,k\}} \max\bigl\{ 0, r(i',k) \bigr\} \,\Bigr\} \qquad (2)$$
The availability a(i,k) is set to the self-responsibility r(k,k) plus the sum of the positive responsibilities candidate exemplar k receives from other points. Only the positive portions of incoming responsibilities are added, because it is only necessary for a good exemplar to explain some data points well (positive responsibilities), regardless of how poorly it explains other data points (negative responsibilities). If the self-responsibility r(k,k) is negative (indicating that point k is currently better suited as belonging to another exemplar rather than being an exemplar itself), the availability of point k as an exemplar can be increased if some other points have positive responsibilities for point k being their exemplar. To limit the influence of strong incoming positive responsibilities, the total sum is thresholded so that it cannot go above zero.
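A matching sketch of Eq. 2, again vectorized over all pairs. Eq. 2 covers only the off-diagonal entries; the diagonal "self-availability" follows a separate rule that the paper gives after this passage, so the line restoring it below is our assumption rather than part of Eq. 2. The short driver at the end combines the two messages as the text above licenses, though the stopping rule and the damping used by the full procedure are details outside this excerpt:

```python
import numpy as np

def update_availabilities(R):
    """One sweep of Eq. 2 over the (n, n) responsibility matrix R:
        a(i,k) <- min{ 0, r(k,k) + sum over i' not in {i,k}
                                   of max{0, r(i',k)} }."""
    n = len(R)
    diag = np.diag_indices(n)
    Rp = np.maximum(R, 0)      # positive parts of the responsibilities
    Rp[diag] = R[diag]         # keep the self-responsibility r(k,k) as-is
    col_sums = Rp.sum(axis=0)  # r(k,k) + sum_{i' != k} max{0, r(i',k)}
    A = col_sums[np.newaxis, :] - Rp  # drop each point's own positive term
    self_avail = A[diag].copy()       # sum_{i' != k} max{0, r(i',k)}
    A = np.minimum(A, 0)              # the thresholding at zero in Eq. 2
    A[diag] = self_avail              # self-availability: rule given later in the paper
    return A

# Minimal driver, alternating the two updates for a fixed budget.
# S is a similarity matrix with preferences on its diagonal (see the
# earlier sketches); reading off exemplars from a(i,k) + r(i,k) is one
# way to combine the messages, as the text above allows.
S = with_preferences(negative_squared_error_similarities(np.random.randn(50, 2)))
A = np.zeros_like(S)
for _ in range(200):
    R = update_responsibilities(S, A)
    A = update_availabilities(R)
exemplars = np.argmax(A + R, axis=1)  # each point's chosen exemplar
```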
Department of Electrical and Computer Engineering, University of Toronto, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada.
*To whom correspondence should be addressed. E-mail: frey@psi.toronto.edu