Cross-scene Crowd Counting via Deep Convolutional Neural Networks
Cong Zhang¹,²   Hongsheng Li²,³   Xiaogang Wang²   Xiaokang Yang¹
¹ Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University
² Department of Electronic Engineering, The Chinese University of Hong Kong
³ School of Electronic Engineering, University of Electronic Science and Technology of China
{zhangcong0929,lihongsheng}@gmail.com   xgwang@ee.cuhk.edu.hk   xkyang@sjtu.edu.cn
Abstract
Cross-scene crowd counting is a challenging task in which people are counted in new target surveillance crowd scenes unseen in the training set, without requiring laborious data annotation for those scenes. The performance of most existing crowd counting methods drops significantly when they are applied to an unseen scene. To address this problem, we propose a deep convolutional neural network (CNN) for crowd counting that is trained alternately with two related learning objectives, crowd density and crowd count. This switchable learning approach is able to reach a better local optimum for both objectives. To handle an unseen target crowd scene, we present a data-driven method to fine-tune the trained CNN model for the target scene. A new dataset including 108 crowd scenes with nearly 200,000 head annotations is introduced to better evaluate the accuracy of cross-scene crowd counting methods. Extensive experiments on the proposed dataset and two other existing datasets demonstrate the effectiveness and reliability of our approach.
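To make the alternating-objective idea concrete, the sketch below shows one possible training loop in PyTorch: a shared convolutional backbone with a density-map head and a count head, switching the loss between the two objectives on a fixed schedule. The layer configuration, the epoch-level switching schedule, and all names (CrowdCNN, train_switchable) are illustrative assumptions, not the architecture or schedule used in this paper.

```python
import torch
import torch.nn as nn

class CrowdCNN(nn.Module):
    """Shared backbone with a density-map head and a count head (illustrative)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 7, padding=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 7, padding=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.density_head = nn.Conv2d(64, 1, 1)          # per-pixel density map
        self.count_head = nn.Sequential(                 # scalar count per patch
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1)
        )

    def forward(self, x):
        f = self.features(x)
        return self.density_head(f), self.count_head(f)

def train_switchable(model, loader, epochs=10, lr=1e-4):
    """Alternate between the density objective and the count objective."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    mse = nn.MSELoss()
    for epoch in range(epochs):
        use_density = (epoch % 2 == 0)   # assumed epoch-level switching schedule
        for patches, density_gt, count_gt in loader:
            density_pred, count_pred = model(patches)
            if use_density:
                loss = mse(density_pred, density_gt)          # density-map regression
            else:
                loss = mse(count_pred.squeeze(1), count_gt)   # count regression
            opt.zero_grad()
            loss.backward()
            opt.step()
```

In this sketch the ground-truth density maps are assumed to be resized to the backbone's output resolution; the point is only that both objectives share the same features and take turns driving the gradient updates.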
1. Introduction
Counting pedestrians in crowd videos attracts considerable attention because of the intense demand from video surveillance, and it is especially important for metropolitan security. Crowd counting is a challenging task due to severe occlusions, scene perspective distortions and diverse crowd distributions. Since pedestrian detection and tracking are difficult in crowded scenes, most state-of-the-art methods [6, 4, 5, 17] are regression based, and their goal is to learn a mapping between low-level features and crowd counts. However, these works are scene-specific, i.e., a crowd counting model learned for a particular scene can only be applied to the same scene. Given an unseen scene or a changed scene layout, the model has to be re-trained with new annotations. Few works focus on cross-scene crowd counting, even though it is important for practical applications.
In this paper, we propose a framework for cross-scene crowd counting that requires no extra annotations for a new target scene. Our goal is to learn a mapping from images to crowd counts, and then to apply this mapping to unseen target scenes for cross-scene crowd counting. To achieve this goal, we need to overcome the following challenges. 1) Effective features must be developed to describe crowds. Previous works used general hand-crafted features, which have low representational power for crowds; new descriptors specially designed or learned for crowd scenes are needed. 2) Different scenes have different perspective distortions, crowd distributions and lighting conditions. Without additional training data, a model trained on one specific scene is difficult to apply to other scenes. 3) For most recent works, foreground segmentation is indispensable for crowd counting, but crowd segmentation is a challenging problem and cannot be obtained accurately in most crowded scenes; a scene may also contain stationary crowds without movement. 4) Existing crowd counting datasets are insufficient to support and evaluate cross-scene counting research. The largest one [8] contains only 50 static images of different crowd scenes collected from Flickr, while the widely used UCSD dataset [4] and the Mall dataset [6] consist only of video clips collected from one or two scenes.
Considering these challenges, we propose a Convolutional Neural Network (CNN) based framework for cross-scene crowd counting. After a CNN is trained with a fixed dataset, a data-driven method is introduced to fine-tune (adapt) the learned CNN to an unseen target scene, where training samples similar to the target scene are retrieved from the training scenes for fine-tuning.
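As a rough sketch of this retrieval idea (not the paper's actual similarity criterion), each candidate training sample could be summarized by a fixed-length scene descriptor, with the samples nearest to the target scene's descriptor kept for fine-tuning. The descriptor contents, the helper names, and the use of Euclidean distance below are all illustrative assumptions.

```python
import numpy as np

def retrieve_similar(train_descriptors, target_descriptor, k=100):
    """Indices of the k training samples whose descriptors are closest to the target scene.

    train_descriptors: (N, D) array of per-sample scene descriptors (assumed given).
    target_descriptor: (D,) descriptor computed from the unseen target scene.
    """
    dists = np.linalg.norm(train_descriptors - target_descriptor, axis=1)
    return np.argsort(dists)[:k]

# The selected subset would then be used to continue training the pretrained
# model so it adapts to the target scene, e.g. (make_loader is hypothetical):
#   idx = retrieve_similar(train_desc, target_desc, k=200)
#   train_switchable(pretrained_model, make_loader([train_samples[i] for i in idx]))
```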
Figure 1 illustrates the overall framework of our proposed method. Our cross-scene crowd density estimation and counting framework has the following advantages:
1. Our CNN model is trained for crowd scenes by a switchable learning process with two learning objectives, crowd density maps and crowd counts. The two different but related objectives can alternately assist each other to