The Community-search Problem and
How to Plan a Successful Cocktail Party
Mauro Sozio
∗
Max-Planck-Institut für Informatik
Saarbrücken, Germany
msozio@mpi-inf.mpg.de
Aristides Gionis
Yahoo! Research
Barcelona, Spain
gionis@yahoo-inc.com
ABSTRACT
A lot of research in graph mining has been devoted in the dis-
covery of communities. Most of the work has focused in the
scenario where communities need to be discovered with only
reference to the input graph. However, for many interest-
ing applications one is interested in finding the community
formed by a given set of nodes. In this paper we study a
query-dependent variant of the community-detection prob-
lem, which we call the community-search problem:givena
graph G,andasetofquery nodes in the graph, we seek to
find a subgraph of G that contains the query nodes and it
is densely connected.
We motivate a measure of density based on minimum de-
gree and distance constraints, and we develop an optimum
greedy algorithm for this measure. We proceed by charac-
terizing a class of monotone constraints and we generalize
our algorithm to compute optimum solutions satisfying any
set of monotone constraints. Finally we modify the greedy
algorithm and we present two heuristic algorithms that find
communities of size no greater than a specified upper bound.
Our experimental evaluation on real datasets demonstrates
the efficiency of the proposed algorithms and the quality of
the solutions we obtain.
Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications—
Data mining; G.2.2 [Discrete Mathematics]: Graph The-
ory—Graph algorithms
General Terms
Algorithms, Experimentation, Theory
Keywords
graph mining, community detection, social networks, graph
algorithms
∗
Part of this work was done while the author was visiting
Yahoo! Research, Barcelona.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
KDD’10, July 25–28, 2010, Washington, DC, USA.
Copyright 2010 ACM 978-1-4503-0055-1/10/07 ...$10.00.
1. INTRODUCTION
Graphs is one of most ubiquitous data representations,
and they find applications in a wide range of areas including
biology, physics, social sciences, and information technology.
With the increasing availability of very large networks, there
is need for designing algorithmic data-analysis tools and for
developing applications that exploit the latent structure in
the data.
Discovering communities in graphs and social networks
has drawn a large amount of attention in recent years [9,
13, 15, 16, 25]. It has been one of them most well-studied
problems of graph mining. Most of the work has focused
in the scenario where communities need to be discovered in
an a-priori manner, with only reference to the input graph.
However, in many application scenarios we are interested in
discovering the community defined by a given set of nodes.
For instance, if Bob and Alice take the same tango class,
and Charles is Bob’s boss, the community formed by Bob
and Alice is very different than the community formed by
Bob and Charles.
In this paper we study a query-dependent variant of the
community detection problem, which we call the community-
search problem. Our problem formulation takes as input a
graph G,andasetofquery nodes, and the task is to find
a densely connected subgraph of G that contains the query
nodes. The problem we study has many interesting appli-
cations in areas such as social-network analysis, collabora-
tive tagging systems, query-log analysis, biology, and others,
Some motivating examples are the following.
Finding a group or organizing an event: Anumber
of scientists obtain funding to organize a workshop. They
figure out that the chances of success of the workshop will be
higher if they invite a set of colleagues so that each scientist
is well acquainted, and perhaps have even worked together,
with a number of other participants.
Tag suggestion: In a social-media environment, a tag graph
relates similar tags. For instance, in a photo-sharing portal
two tags are related if they co-occur frequently in a given
set of photos. Given a new photo being uploaded, and a
number of initial tags that the user provides for this photo,
the system can suggest to the user a number of additional
tags. A good set of suggestions is a set of tags related to
each other and related to the original tags.
Biology: A biologist has identified a number of proteins
that regulate a gene of interest, and she would like to study
further a candidate list of other proteins that are likely to
participate in the regulation process. Such a candidate set