An Adaptive Clustering Approach for Group Detection in the Crowd
Jie Shao¹, Nan Dong², Qian Zhao¹
¹School of Electrical and Information Engineering, Shanghai University of Electric Power, Shanghai, China 200090
²Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, China 201210
dongn@sari.ac.cn
Abstract—Collective motion groups play an important role
in pedestrian crowd analysis and social event detection. As the basis of group modeling in the crowd, a collective motion group detection algorithm is proposed in this paper. Compared with other state-of-the-art group detection methods, ours is more robust in complex, crowded motion scenes involving various random traffic flows and different motion types. First, we introduce an automatic foreground detection strategy and then, as preprocessing, generate dense tracklets by tracking salient points in the foreground area. The salient point tracklets are then represented by spatio-temporal features. Using an adaptive initialization clustering technique, a hierarchical clustering model is built to partition the crowd into groups according to different features, layer by layer. We demonstrate the effectiveness and robustness of our algorithm quantitatively and qualitatively on various real crowd videos.
Keywords: group detection; adaptive clustering; hierarchical clustering system
I. INTRODUCTION
Pedestrian crowd behavior analysis has been a popular topic in computer vision in recent years, as it is related to many fields, such as safety and civil engineering, transportation research, and social science. Moussaid et al. [1] revealed that much of the pedestrian traffic in a crowd is made up of groups. Their research also reported that pedestrian groups have an important impact on overall traffic efficiency. In this paper, we focus on collective motion group detection. Collective motion group detection can segment a crowd according to the collective behaviors of the people in it, and it is the basis for further mid-level analysis of social events involving interactions within groups and between groups [2].

Groups are commonly discovered by one of two kinds of approaches: bottom-up and top-down. The bottom-up approach starts with single tracklets as separate clusters and gradually builds bigger groups by merging them.
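To make the bottom-up idea concrete, the sketch below starts with each tracklet as its own cluster and repeatedly merges the two closest clusters (single linkage) until no pair is closer than a threshold. It is a generic agglomerative illustration, not the procedure of Ge et al. [3] or of this paper; the tracklet summary `(x, y, vx, vy)`, the velocity weight `vel_weight`, and the threshold `merge_dist` are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical tracklet summary: (mean x, mean y, mean vx, mean vy).
Tracklet = Tuple[float, float, float, float]


def feature_distance(a: Tracklet, b: Tracklet, vel_weight: float = 10.0) -> float:
    """Positional distance plus a weighted velocity difference (assumed weighting)."""
    pos = math.hypot(a[0] - b[0], a[1] - b[1])
    vel = math.hypot(a[2] - b[2], a[3] - b[3])
    return pos + vel_weight * vel


def bottom_up_groups(tracklets: List[Tracklet], merge_dist: float) -> List[List[int]]:
    """Agglomerative single-linkage merging: start from singletons and merge the
    closest pair of clusters while their distance is below merge_dist."""
    clusters = [[i] for i in range(len(tracklets))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members of the two clusters.
                d = min(feature_distance(tracklets[p], tracklets[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] >= merge_dist:
            break  # no remaining pair is close enough to merge
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return [sorted(c) for c in clusters]
```

For example, with three nearby right-moving tracklets `(0, 0, 1, 0)`, `(1, 0, 1, 0)`, `(0, 1, 1, 0)` and two distant left-moving ones `(50, 50, -1, 0)`, `(51, 50, -1, 0)`, a `merge_dist` of 5.0 yields `[[0, 1, 2], [3, 4]]`.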
A representative bottom-up grouping approach was proposed by Ge et al. [3]. They first detected individuals using Reversible Jump Markov Chain Monte Carlo (RJMCMC) and performed multi-target tracking to acquire the trajectories of all individuals. The trajectories were then merged according to certain criteria to form groups. However, it is difficult to detect all individuals accurately in most dense crowds. A better solution was proposed by Zhou [4], who exploited a prior of coherent motion, called coherent neighbor invariance, to characterize the local relationships of individuals in coherent motion. Based on this prior, coherent filtering was proposed to detect coherent motion patterns. Salient point detection, rather than target detection, was applied to build coherent filtering clusters based on the correlations of the distances and velocities of these points. We are inspired by their work on building tracklets from salient points. However, their method needs the background of each video as prior knowledge; we improve on it by introducing an automatic background subtraction strategy.

Nan Dong is the corresponding author.

Alternatively, the top-down approach starts with the entire crowd and iteratively separates it into subgroups. Wang et al. [5] addressed a similar problem of coherent motion detection. They first produced a coherent motion field based on optical flow, and then introduced a two-step clustering process to construct semantic regions from coherent motions. Their experimental results showed state-of-the-art performance. However, the coherent motions detected in their experiments were limited to long-term stable motions, such as traffic in fixed lanes or marathons. In other words, mixed traffic scenes with discordant or random motions may be a problem for their method.
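The coherent-neighbor idea attributed to Zhou [4] can be approximated, for a single frame, as follows: each point keeps only those of its K nearest spatial neighbors whose velocities are strongly correlated with its own, and the kept pairs are joined into clusters with union-find. This is a simplified sketch, not the full coherent filtering algorithm, which also checks that neighborship stays invariant over consecutive frames; the cosine-similarity velocity correlation, the function name `coherent_neighbor_groups`, and the parameters `k` and `lam` are assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]     # (x, y) position of a salient point
Velocity = Tuple[float, float]  # (vx, vy) displacement per frame


def _find(parent: List[int], i: int) -> int:
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def coherent_neighbor_groups(pos: List[Point], vel: List[Velocity],
                             k: int = 3, lam: float = 0.8) -> List[List[int]]:
    """Link each point to those of its k nearest neighbours whose velocity
    correlation exceeds lam, then return the connected components."""
    n = len(pos)
    parent = list(range(n))
    for i in range(n):
        # k nearest spatial neighbours of point i (excluding i itself).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(pos[i], pos[j]))
        for j in others[:k]:
            ni, nj = math.hypot(*vel[i]), math.hypot(*vel[j])
            if ni == 0 or nj == 0:
                continue  # a static point has no defined motion direction
            # Velocity correlation measured as cosine similarity (assumed measure).
            corr = (vel[i][0] * vel[j][0] + vel[i][1] * vel[j][1]) / (ni * nj)
            if corr > lam:
                parent[_find(parent, i)] = _find(parent, j)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(_find(parent, i), []).append(i)
    return sorted(groups.values())
```

With two adjacent rows of points, `(0, 0)`, `(1, 0)`, `(2, 0)` moving right and `(0, 1)`, `(1, 1)`, `(2, 1)` moving left, the spatial neighbors cross both rows, but only same-direction pairs pass the correlation test, giving `[[0, 1, 2], [3, 4, 5]]`.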
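The top-down strategy can likewise be illustrated with a generic divisive procedure: all tracklets start in one group, and a group is split at the widest gap along any single feature axis until no gap exceeds a threshold. This is not the coherent-motion-field method of Wang et al. [5]; the gap-splitting rule, the feature layout, and the names `top_down_split` and `min_gap` are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def top_down_split(features: List[Sequence[float]], min_gap: float) -> List[List[int]]:
    """Divisive sketch: begin with one group holding every item and keep splitting
    a group at the widest gap along a single feature axis while that gap exceeds min_gap."""
    if not features:
        return []
    pending = [list(range(len(features)))]
    done: List[List[int]] = []
    while pending:
        group = pending.pop()
        best_gap = 0.0
        best_cut: Optional[Tuple[int, float]] = None  # (axis, value on the low side of the gap)
        for axis in range(len(features[group[0]])):
            order = sorted(group, key=lambda i: features[i][axis])
            for a, b in zip(order, order[1:]):
                gap = features[b][axis] - features[a][axis]
                if gap > best_gap:
                    best_gap, best_cut = gap, (axis, features[a][axis])
        if best_cut is None or best_gap <= min_gap:
            done.append(sorted(group))  # compact enough: stop splitting this group
            continue
        axis, cut = best_cut
        # Items on the low side of the gap form one subgroup, the rest form the other.
        pending.append([i for i in group if features[i][axis] <= cut])
        pending.append([i for i in group if features[i][axis] > cut])
    return sorted(done)
```

With features `(x, vx)` of `(0, 1)`, `(1, 1)`, `(0.5, 1)`, `(10, -1)`, `(11, -1)` and `min_gap=3.0`, the crowd is first cut at the large positional gap and then left intact, giving `[[0, 1, 2], [3, 4]]`.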
In this paper, we present an adaptive-clustering-based group detection algorithm. The goal of our work is to segment motion flows and detect collective motion groups. We represent collective motions with spatio-temporal features of salient point (KLT point [6]) tracklets and then adaptively cluster these points into groups, which makes the algorithm robust to scenes involving disordered traffic and groups with different motion types.
Our work makes two main contributions:
1. Our algorithm needs no prior information, either for salient point extraction or for clustering. Salient points are extracted from the foreground after background subtraction in preprocessing, so that we can both remove noise points in the background and build tracklets in the foreground only. In addition, our clustering method uses adaptive initialization.
2. A hierarchical clustering model is built to group the spatio-temporal features of tracklets layer by layer, so that groups with different motion modalities can all be detected.
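The clustering rule behind the hierarchical, adaptively initialized model is not specified at this point, so the sketch below only illustrates the general layer-by-layer idea under stated assumptions: layer 1 groups tracklets by motion direction, and layer 2 splits each direction group by position. Leader clustering, in which cluster seeds are taken from the data and the number of clusters is not fixed in advance, stands in for the paper's adaptive initialization; it is an assumed substitute, and the names `leader_cluster` and `hierarchical_groups` and the tolerances `angle_tol` and `space_tol` are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def leader_cluster(items: List[int], dist: Callable[[int, int], float],
                   radius: float) -> List[List[int]]:
    """Leader clustering (assumed stand-in for adaptive initialization): an item
    farther than `radius` from every existing leader seeds a new cluster."""
    leaders: List[int] = []
    clusters: List[List[int]] = []
    for i in items:
        dists = [dist(i, leader) for leader in leaders]
        if dists and min(dists) <= radius:
            clusters[dists.index(min(dists))].append(i)
        else:
            leaders.append(i)  # item i seeds a new cluster
            clusters.append([i])
    return clusters


def hierarchical_groups(pos: List[Tuple[float, float]], vel: List[Tuple[float, float]],
                        angle_tol: float = 30.0, space_tol: float = 5.0) -> List[List[int]]:
    """Layer 1 splits tracklets by motion direction; layer 2 splits each
    direction cluster by spatial proximity."""
    heading = [math.degrees(math.atan2(vy, vx)) for vx, vy in vel]

    def angle_dist(i: int, j: int) -> float:
        # Smallest absolute heading difference in degrees, handling wrap-around at +/-180.
        return abs((heading[i] - heading[j] + 180.0) % 360.0 - 180.0)

    def space_dist(i: int, j: int) -> float:
        return math.dist(pos[i], pos[j])

    groups: List[List[int]] = []
    for direction_cluster in leader_cluster(list(range(len(pos))), angle_dist, angle_tol):
        groups.extend(leader_cluster(direction_cluster, space_dist, space_tol))
    return sorted(sorted(g) for g in groups)
```

Separating direction before position lets two same-direction flows far apart, or two opposite flows side by side, end up in different groups, which is the motivation the text gives for clustering layer by layer.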
II. THE ADAPTIVE CLUSTERING APPROACH FOR GROUP DETECTION
As shown in Fig. 1, the framework of our algorithm consists of two parts: preprocessing and adaptive clustering. In the preprocessing part, foreground objects are detected by Gaussian Mixture Models (GMMs) [7] and shown in white. KLT points are then extracted from the foreground and tracked frame by frame. In the clustering part, a multi-feature-based hierarchical model is built layer by layer using adaptive initialization clustering. Three steps are performed to implement our clustering process, as shown at the bottom left of Fig. 1.
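As a rough illustration of GMM-based foreground detection, the sketch below keeps a small mixture of Gaussians per grey-level pixel in the spirit of Stauffer-Grimson background modelling: components ranked by weight over standard deviation form the background until their cumulative weight exceeds a ratio, and a pixel that matches none of those components is labelled foreground. The class name `PixelGMM`, the helper `foreground_masks`, and all parameter values (`k`, `alpha`, `init_var`, `match_sigmas`, `bg_ratio`) are assumptions rather than the settings of this paper, and the mean and variance updates use `alpha` directly, a common simplification.

```python
import math
from typing import List


class PixelGMM:
    """Mixture of up to k Gaussians over one pixel's grey level (simplified,
    Stauffer-Grimson-style; parameter values are illustrative assumptions)."""

    def __init__(self, k: int = 3, alpha: float = 0.05, init_var: float = 225.0,
                 match_sigmas: float = 2.5, bg_ratio: float = 0.7) -> None:
        self.k, self.alpha, self.init_var = k, alpha, init_var
        self.match_sigmas, self.bg_ratio = match_sigmas, bg_ratio
        self.w: List[float] = []
        self.mu: List[float] = []
        self.var: List[float] = []

    def _background_ids(self) -> List[int]:
        # Rank by w / sigma; top components up to bg_ratio cumulative weight model the background.
        order = sorted(range(len(self.w)),
                       key=lambda i: self.w[i] / math.sqrt(self.var[i]), reverse=True)
        chosen, total = [], 0.0
        for i in order:
            chosen.append(i)
            total += self.w[i]
            if total > self.bg_ratio:
                break
        return chosen

    def update(self, x: float) -> bool:
        """Classify x against the current model, then update it; True means foreground."""
        # Closest component, in standard deviations, lying within match_sigmas.
        match, best = None, self.match_sigmas
        for i in range(len(self.w)):
            z = abs(x - self.mu[i]) / math.sqrt(self.var[i])
            if z <= best:
                match, best = i, z
        is_foreground = match is None or match not in self._background_ids()
        # Decay every weight and reinforce the matched component.
        self.w = [(1 - self.alpha) * wi + (self.alpha if i == match else 0.0)
                  for i, wi in enumerate(self.w)]
        if match is not None:
            d = x - self.mu[match]
            self.mu[match] += self.alpha * d
            self.var[match] = max((1 - self.alpha) * self.var[match] + self.alpha * d * d, 1.0)
        elif len(self.w) < self.k:
            # No match and a free slot: add a wide component centred on x.
            self.w.append(self.alpha)
            self.mu.append(x)
            self.var.append(self.init_var)
        else:
            # No match and no free slot: replace the weakest component.
            weakest = min(range(self.k), key=lambda i: self.w[i])
            self.w[weakest], self.mu[weakest], self.var[weakest] = self.alpha, x, self.init_var
        total = sum(self.w)
        self.w = [wi / total for wi in self.w]
        return is_foreground


def foreground_masks(frames: List[List[List[float]]]) -> List[List[List[bool]]]:
    """Run one PixelGMM per pixel over a sequence of grey frames (lists of rows)."""
    h, w = len(frames[0]), len(frames[0][0])
    models = [[PixelGMM() for _ in range(w)] for _ in range(h)]
    return [[[models[r][c].update(f[r][c]) for c in range(w)] for r in range(h)]
            for f in frames]
```

The first frame is always reported as foreground because every model starts empty. In practice, a library implementation such as OpenCV's `cv2.createBackgroundSubtractorMOG2` would normally replace this per-pixel loop.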
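KLT points are conventionally selected with the Shi-Tomasi criterion: a pixel is salient when the smaller eigenvalue of the 2x2 gradient structure tensor over a small window is large. The sketch below evaluates that response only where a foreground mask is set, mirroring the idea of extracting salient points from the foreground only. Images are plain lists of rows; the function name `shi_tomasi_points` and the parameters `win`, `min_response`, and `max_points` are hypothetical, and the non-maximum suppression and minimum-distance rule of production implementations are omitted.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
Mask = List[List[bool]]


def shi_tomasi_points(img: Image, mask: Mask, win: int = 1,
                      min_response: float = 1.0,
                      max_points: int = 50) -> List[Tuple[int, int]]:
    """Return up to max_points (row, col) foreground pixels ranked by the
    Shi-Tomasi response (minimum eigenvalue of the structure tensor)."""
    h, w = len(img), len(img[0])
    # Central-difference gradients; left at zero on the image border.
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx[r][c] = (img[r][c + 1] - img[r][c - 1]) / 2.0
            gy[r][c] = (img[r + 1][c] - img[r - 1][c]) / 2.0
    scored = []
    for r in range(win + 1, h - win - 1):
        for c in range(win + 1, w - win - 1):
            if not mask[r][c]:
                continue  # background pixels never become salient points
            sxx = sxy = syy = 0.0
            for dr in range(-win, win + 1):
                for dc in range(-win, win + 1):
                    ix, iy = gx[r + dr][c + dc], gy[r + dr][c + dc]
                    sxx += ix * ix
                    sxy += ix * iy
                    syy += iy * iy
            # Smaller eigenvalue of the symmetric matrix [[sxx, sxy], [sxy, syy]].
            half_tr = (sxx + syy) / 2.0
            det = sxx * syy - sxy * sxy
            lam_min = half_tr - math.sqrt(max(half_tr * half_tr - det, 0.0))
            if lam_min >= min_response:
                scored.append((lam_min, r, c))
    scored.sort(reverse=True)
    return [(r, c) for _, r, c in scored[:max_points]]


# Example: a bright 8x8 square on a dark 16x16 image; only the left half is foreground.
img = [[100.0 if 4 <= r <= 11 and 4 <= c <= 11 else 0.0 for c in range(16)] for r in range(16)]
left_half = [[c < 8 for c in range(16)] for r in range(16)]
points = shi_tomasi_points(img, left_half)
```

Straight edges give a zero minimum eigenvalue, so in the example every returned point lies near one of the two square corners inside the mask. OpenCV's `cv2.goodFeaturesToTrack` accepts a `mask` argument that serves the same purpose.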
978-1-4673-8353-0/15/$31.00 ©2015 IEEE 77