Static Crowd Scene Analysis via Deep Network
with Multi-branch Dilated Convolution Blocks
Haoran Liu
College of Computer and Information Engineering
Jiangxi Normal University
Nanchang, China
Aiwen Jiang*
College of Computer and Information Engineering
Jiangxi Normal University
Nanchang, China
Corresponding Author: jiangaiwen@jxnu.edu.cn
Qiaosi Yi
College of Computer and Information Engineering
Jiangxi Normal University
Nanchang, China
Xiaolin Deng
College of Computer and Information Engineering
Jiangxi Normal University
Nanchang, China
Jianyi Wan
College of Computer and Information Engineering
Jiangxi Normal University
Nanchang, China
Mingwen Wang
College of Computer and Information Engineering
Jiangxi Normal University
Nanchang, China
Abstract—In this paper, we have proposed a static crowd scene
analysis network built from multi-branch dilated convolution blocks,
called MDBNet. It addresses the joint task of estimating the crowd
count and a high-quality density map from a single static image. The
proposed MDBNet follows a one-stage object detection framework
and consists of two parts: pre-trained convolutional layers as
the front end for high-level feature extraction, and cascaded
multi-branch dilated convolution blocks as the back end for
aggregating context information over different ranges. Pixel-wise
objectness probabilities are predicted and regressed to generate
the density map. The proposed MDBNet is easy to train and has
strong learning ability. We have tested it on two public
datasets (the ShanghaiTech dataset and the UCF_CC_50 dataset).
On almost all evaluation criteria, the proposed method
achieves superior performance. In particular, on structure quality
criteria, including our newly introduced spatial adjusted
mutual information measurement, MDBNet reports new
state-of-the-art performance. The source code will be released
upon publication of our work.
I. INTRODUCTION
Stampedes, which happen frequently at large events around
the world, have caused serious disasters. For example, many
people died or were injured in the fatal Shanghai Bund
stampede during the New Year celebrations of 2015. If the
population density of the scene had been accurately
estimated and corresponding security measures arranged
in advance, such incidents might have been reduced or
avoided. Therefore, accurate knowledge of the crowd size and
crowd distribution in a public space is essential. With
the ubiquitous installation of surveillance cameras in cities and
urban areas, crowd scene analysis from images or videos has become
an important practical and research topic in the computer vision
community.
Since crowds are distributed unevenly across scenes, as
shown in Fig. 1, estimating the population size alone is not
enough. Distribution maps provide more accurate and
comprehensive information. Because a count is, in principle,
density times area, integrating a crowd density map over the
image gives the overall crowd count.
In recent work, crowd estimation has evolved from
simple crowd counting, which outputs the number of people
in the target scene, to density map estimation, which
explicitly shows the spatial pattern of the crowd distribution.
In this paper, we focus on the joint task of estimating the crowd
count and a high-quality density map from a single static image.
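
The relation between a density map and the crowd count can be made concrete with a small sketch. The Python code below is illustrative only and is not taken from the MDBNet implementation: the function names `density_map` and `crowd_count`, the fixed Gaussian width `sigma`, and the head coordinates are assumptions made for this example. Each annotated head contributes a Gaussian blob normalized to unit mass, so summing the map (a discrete integral) recovers the number of annotated people.

```python
import math


def density_map(points, height, width, sigma=4.0):
    """Build a density map from head annotations given as (row, col).

    Assumption for this sketch: a fixed-width Gaussian per head,
    renormalized inside the image so each person has unit mass.
    """
    density = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)  # truncate the kernel at three sigma
    for r, c in points:
        if not (0 <= r < height and 0 <= c < width):
            continue  # ignore annotations that fall outside the image
        r0, r1 = max(0, r - radius), min(height - 1, r + radius)
        c0, c1 = max(0, c - radius), min(width - 1, c + radius)
        weights = {}
        for i in range(r0, r1 + 1):
            for j in range(c0, c1 + 1):
                d2 = (i - r) ** 2 + (j - c) ** 2
                weights[(i, j)] = math.exp(-d2 / (2.0 * sigma ** 2))
        total = sum(weights.values())
        for (i, j), w in weights.items():
            density[i][j] += w / total  # each head integrates to 1
    return density


def crowd_count(density):
    """Discrete integral of the density map over the whole image."""
    return sum(sum(row) for row in density)


# Hypothetical annotations: three heads in a 40 x 60 image.
heads = [(10, 10), (0, 0), (30, 45)]
dmap = density_map(heads, height=40, width=60)
print(round(crowd_count(dmap), 6))
```

Summing over a sub-region of the same map gives a local count in the same way, which is why a density map carries more information than a single number.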
Fig. 1. Images in the first row are three samples from the ShanghaiTech Part B
dataset. Heat maps in the second row show the respective density maps.
Besides public safety, crowd analysis has wide applications
in traffic monitoring, flow monitoring, city planning, etc.
IJCNN 2019 - International Joint Conference on Neural Networks, Budapest Hungary, 14-19 July 2019
978-1-7281-2009-6/$31.00 ©2019 IEEE