Human Detection Based on Pyramidal Statistics of Oriented Filtering and an Online-Learned Scene Geometrical Model

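This resource is presented as an application of the Extreme Learning Machine (ELM) to human detection, built on pyramidal statistics of oriented filtering and an online-learned scene geometrical model. The paper proposes a new descriptor, Pyramidal Statistics of Oriented Filtering (PSOF), to represent human shape, and uses a geometrical model learned online to reject detection outliers and make detection more robust.

The core content covers the following points:

1. **Extreme Learning Machine (ELM)**: ELM is a fast learning algorithm for single-hidden-layer feedforward neural networks. The input weights and hidden-layer biases are set randomly and left fixed, so training reduces to a single step: solving for the hidden-to-output weights by minimizing the output error. The listing associates ELM with the human detection task, possibly for feature extraction and classification, although the abstract reproduced below does not itself mention ELM.
2. **Pyramidal Statistics of Oriented Filtering (PSOF)**: A new image descriptor that uses a Gabor filter bank to obtain multi-scale, pixel-level orientation information, which makes it more robust to image noise and blur. Unlike traditional one-scale gradient-based methods, PSOF computes locally normalized statistics over a pyramid structure to capture object shape.
3. **Gabor filters**: Gabor filters are widely used in vision processing and model the sensitivity of the human visual system to texture and edges. In PSOF they extract orientation features at multiple scales, which helps identify human shape cues across scales and orientations.
4. **Image pyramid**: An image pyramid is a set of images built from the original by repeated downsampling or upsampling, with each level representing a different scale. In PSOF the pyramid structure allows statistics to be gathered at several scales, which helps capture human targets of different sizes.
5. **Online-learned scene geometrical model**: To improve detection accuracy, the paper learns a geometrical model online. The model is updated over time to follow the scene and is used to exclude detections that violate perspective projection.
6. **Robustness**: Robustness measures how stable a detector remains under noise, blur, and other interference. Combining PSOF with the online-learned geometrical model improves the robustness of human detection in complex environments.
7. **Outlier rejection**: Some detections do not follow the normal geometric pattern of humans in the scene and are treated as outliers. The online-learned geometrical model lets the system identify and discard them, improving detection accuracy.

In short, the paper introduces the PSOF descriptor and combines it with an online-learned geometrical model to provide an effective approach to robust human detection in complex environments. It is a useful resource for readers who want to study machine learning, and ELM in particular, in computer vision applications.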
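The PSOF idea summarized above (a Gabor filter bank, then locally normalized statistics over a pyramid) can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the kernel sizes and wavelengths, the choice of summed response magnitude as the statistic, and the per-cell L2 normalization are all assumptions made for illustration.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, wavelength):
    """Real (even) Gabor kernel: isotropic Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    kernel = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / wavelength)
    return kernel - kernel.mean()  # zero mean, so flat regions give no response

def filter2d(image, kernel):
    """Same-size 2-D correlation with edge-replicated borders (odd kernel sizes)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(image, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out

def psof_like_descriptor(image, n_orient=4,
                         scales=((7, 2.0, 4.0), (11, 3.5, 7.0)),  # (size, sigma, wavelength), assumed
                         levels=3, eps=1e-6):
    """Concatenate per-cell orientation statistics over a spatial pyramid."""
    image = np.asarray(image, dtype=float)
    thetas = [k * np.pi / n_orient for k in range(n_orient)]
    features = []
    for size, sigma, wavelength in scales:
        # One response-magnitude map per orientation at this filter scale.
        energy = np.stack([np.abs(filter2d(image, gabor_kernel(size, sigma, t, wavelength)))
                           for t in thetas])
        for level in range(levels):
            cells = 2 ** level  # 1x1, 2x2, 4x4 grids; the image must be at least this large
            for rows in np.array_split(np.arange(image.shape[0]), cells):
                for cols in np.array_split(np.arange(image.shape[1]), cells):
                    stats = energy[:, rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1].sum(axis=(1, 2))
                    features.append(stats / (np.linalg.norm(stats) + eps))  # local normalization
    return np.concatenate(features)

# Usage: a synthetic window containing one vertical step edge.
window = np.zeros((32, 16))
window[:, 8:] = 1.0
descriptor = psof_like_descriptor(window)
```

With the defaults above the descriptor has 2 scales x 21 cells x 4 orientations = 168 entries. For the vertical-edge window, the orientation whose carrier varies horizontally dominates the top-level cell.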

Human detection based on pyramidal statistics of oriented filtering and online learned scene geometrical model

Min Li, Qi Hu, Yu Wang, Weishan Dong

IBM China Research Laboratory, Diamond A, Zhongguancun Software Park, Haidian District, Beijing 100193, China

article info

Article history:

Received 24 November 2011

Received in revised form 15 April 2012

Accepted 30 August 2012

Communicated by Liang Wang

Available online 26 September 2012

Keywords:

Object detection

Human detection

Pyramidal statistics of oriented filtering

Online-learned geometrical model

abstract

We study the problem of robust human detection. In this paper, a new descriptor, Pyramidal Statistics of Oriented Filtering (PSOF), is proposed for human shape representation. Unlike traditional one-scale gradient-based methods, the PSOF descriptor utilizes a Gabor filter bank to obtain multi-scale pixel-level orientation information and makes use of locally normalized pyramidal statistics of these Gabor responses to represent object shape, which shows great robustness to image noise and blur. Besides, to exclude detection outliers that violate perspective projection in image sequence, a geometrical model is learned online to describe the relationship between object’s average height and the foot-point coordinate. Experimental results on both static images and video sequences show that PSOF detector performs much better than one of the state-of-the-art detectors.
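The abstract does not give a formula for the geometrical model. As a rough illustration only, the sketch below assumes that the expected height of a person in pixels is linear in the image row of the foot point, fits that line incrementally from accepted detections, and rejects detections whose height deviates too far from the prediction. The class name `OnlineHeightModel` and the parameters `tolerance` and `min_samples` are hypothetical.

```python
class OnlineHeightModel:
    """Incrementally fitted line: height ~ slope * foot_y + intercept (assumed form)."""

    def __init__(self, tolerance=0.3, min_samples=5):
        self.tolerance = tolerance      # allowed relative deviation from the predicted height
        self.min_samples = min_samples  # accept everything until this many samples are seen
        self.n = 0
        self.sum_y = self.sum_h = self.sum_yy = self.sum_yh = 0.0

    def update(self, foot_y, height):
        """Add one accepted detection to the running least-squares sums."""
        self.n += 1
        self.sum_y += foot_y
        self.sum_h += height
        self.sum_yy += foot_y * foot_y
        self.sum_yh += foot_y * height

    def predict(self, foot_y):
        """Expected height at this foot-point row, or None if the fit is undetermined."""
        denom = self.n * self.sum_yy - self.sum_y ** 2
        if self.n < 2 or abs(denom) < 1e-12:
            return None
        slope = (self.n * self.sum_yh - self.sum_y * self.sum_h) / denom
        intercept = (self.sum_h - slope * self.sum_y) / self.n
        return slope * foot_y + intercept

    def is_inlier(self, foot_y, height):
        """True if the detection is consistent with the current model."""
        expected = self.predict(foot_y)
        if self.n < self.min_samples or expected is None or expected <= 0:
            return True  # not enough evidence to reject anything yet
        return abs(height - expected) <= self.tolerance * expected


# Usage: feed (foot_y, height) pairs from a detector, updating only with inliers.
model = OnlineHeightModel()
for foot_y, height in [(100, 40), (200, 80), (300, 120), (400, 160), (250, 100)]:
    if model.is_inlier(foot_y, height):
        model.update(foot_y, height)
```

Running sums keep the update at constant cost per detection, which suits a model that has to follow a video stream. A real system would likely also need forgetting or robust fitting, which this sketch omits.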

1. Introduction

Human detection has drawn much attention in computer vision community during the last decade [19,7,15,14,28,3,8,29,22], because human is one of the most important objects in many applications, such as visual surveillance, intelligent transportation system, HCI (Human Computer Interaction) and robotics. However, human detection is still facing many challenges, including wide range of human poses, image blur and low contrast imaging condition.

Effective feature representation plays a key role in human detection. Early efforts on human detection focus on Haar wavelet features [19,15,27,28]. Haar feature computes the gray difference of adjacent regions at different scales, and can effectively describe structures like human eyes, nose, and lip. Viola and Jones [27] improve the computation efficiency of Haar features by a novel technique called integral image, which can calculate a single Haar feature at any scale with a constant computational cost. Efficient feature computation and a cascade classifier structure make Viola’s detector achieve great success in face detection [27]. However, the classification performance of Haar features is poor in real surveillance scenes because of large human-pose changes and illumination variations. Since human shape is illumination-invariant and distinctive comparing to background structures, recent work mainly focuses on shape based human detection.
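To make the constant-cost property of the integral image concrete, here is a minimal sketch in plain Python. The function names are illustrative, and the two-rectangle feature is just one example of a Haar-like feature, not the specific feature set used in [27].

```python
def integral_image(image):
    """Return an (H+1) x (W+1) table; table[r][c] is the sum of the first r rows and c columns."""
    height, width = len(image), len(image[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for r in range(height):
        row_sum = 0
        for c in range(width):
            row_sum += image[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table

def box_sum(table, top, left, bottom, right):
    """Sum of pixels in rows [top, bottom) and columns [left, right) using four lookups."""
    return table[bottom][right] - table[top][right] - table[bottom][left] + table[top][left]

def haar_two_rect_horizontal(table, top, left, height, width):
    """Two-rectangle Haar-like feature: right-half sum minus left-half sum."""
    mid = left + width // 2
    left_sum = box_sum(table, top, left, top + height, mid)
    right_sum = box_sum(table, top, mid, top + height, left + width)
    return right_sum - left_sum

# Usage on a tiny 3 x 4 image.
pixels = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12]]
table = integral_image(pixels)
feature = haar_two_rect_horizontal(table, 0, 0, 3, 4)
```

Each `box_sum` call costs four table lookups regardless of the rectangle size, so a Haar-like feature costs the same at every scale once the table has been built in a single pass over the image.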

Dalal et al. [3] proposed a novel feature set called HOG (histograms of oriented gradients), which uses locally normalized histograms of gradients to represent shape information. Experimental results show that HOG descriptor provides much better classification performance than Haar features in human detection in complex scenes [3]. Zhu et al. [30] extend the HOG descriptor and utilize a cascade classifier structure to increase detection speed. Li et al. [9,11] further study the performance of HOG descriptor in head–shoulder based human detection in crowded scenes and find that HOG works much more effectively than the SIFT (scale-invariant feature transformation) [13] descriptor and Haar features. In [29], a set of edgelet (a short segment of line or curve) features is proposed to represent human shape and exhibits good detection performance in crowded scenes. Sabzmeydani and Mori [22] propose a set of shapelet features (mid-level features) generated from low-level gradient information using AdaBoost for human detection. In [25], a pedestrian detection method based on the covariance matrix descriptor [24] is proposed and shows better performance on the INRIA dataset [3] than the HOG descriptor, but an experimental study conducted by Paisitkriangkrai et al. [18] shows that the covariance matrix descriptor is slightly inferior to the HOG descriptor on the DaimlerChrysler pedestrian dataset created in [16].
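The phrase "locally normalized histograms of gradients" can be illustrated with a simplified sketch. This is not the reference HOG implementation from [3]: it omits bin interpolation and Gaussian weighting, and the cell size of 8 pixels, the 9 unsigned orientation bins, and the 2 x 2 cell blocks are assumed settings chosen for illustration.

```python
import numpy as np

def hog_like(image, cell=8, bins=9, eps=1e-6):
    """Simplified HOG-style descriptor: cell histograms plus 2x2 block L2 normalization."""
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)                    # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation in [0, pi)
    bin_index = np.minimum((angle / np.pi * bins).astype(int), bins - 1)

    # Magnitude-weighted orientation histogram for each cell.
    n_rows, n_cols = image.shape[0] // cell, image.shape[1] // cell
    hist = np.zeros((n_rows, n_cols, bins))
    for r in range(n_rows):
        for c in range(n_cols):
            window = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bin_index[window].ravel(),
                                     weights=magnitude[window].ravel(),
                                     minlength=bins)

    # Local normalization: each overlapping 2x2 block of cells is L2-normalized.
    blocks = []
    for r in range(n_rows - 1):
        for c in range(n_cols - 1):
            block = hist[r:r + 2, c:c + 2].ravel()
            blocks.append(block / np.sqrt(np.sum(block ** 2) + eps ** 2))
    return np.concatenate(blocks)

# Usage: a 16 x 16 patch with one vertical step edge yields a single 36-D block.
patch = np.zeros((16, 16))
patch[:, 8:] = 1.0
descriptor = hog_like(patch)
```

Normalizing within each block is what makes the descriptor tolerant of local changes in contrast: scaling the patch intensities scales every histogram in a block by the same factor, which the division largely cancels.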

This paper aims to propose a human detection method that not only has excellent detection performance in good imaging condition, shown in Fig. 1(a), but can also work well under bad imaging conditions, such as blur and low contrast with much noise, shown in Fig. 1(b) and (c). Since features based on gradients or edges are often sensitive to image noise or blur, we propose a

Contents lists available at SciVerse ScienceDirect

journal home page: www.elsevier.com/locate/neucom

Neurocomputing

http://dx.doi.org/10.1016/j.neucom.2012.08.025

Corresponding author.

E-mail addresses: minliml@cn.ibm.com, ziwenwilliamson@gmail.com (M. Li),

huqihq@cn.ibm.com (Q. Hu), yuwangbj@cn.ibm.com (Y. Wang),

dongweis@cn.ibm.com (W. Dong).

Neurocomputing 101 (2013) 338–346
