fitting error and associated model parameters can be
learned from examples [9].
2.2.2 Discriminative Models
In contrast to the generative models, discriminative
models approximate the Bayesian maximum-a-posteriori
decision by learning the parameters of a discriminant
function (decision boundary) between the pedestrian and
nonpedestrian classes from training examples. We will
discuss the merits and drawbacks of several feature
representations and continue with a review of classifier
architectures and techniques to break down the complexity
of the pedestrian class.
Features. Local filters operating on pixel intensities are a
frequently used feature set [59]. Nonadaptive Haar wavelet
features have been popularized by Papageorgiou and
Poggio [53] and adapted by many others [48], [64], [74].
This overcomplete feature dictionary represents local in-
tensity differences at various locations, scales, and orienta-
tions. Their simplicity and fast evaluation using integral
images [41], [74] contributed to the popularity of Haar
wavelet features. However, the many-ti mes redundant
representation, due to overlapping spatial shifts, requires
mechanisms to select the most appropriate subset of features
out of the vast amount of possible features. Initially, this
selection was manually designed for the pedestrian class, by
incorporating prior knowledge about the geometric config-
uration of the human body [48], [53], [64]. Later, automatic
feature selection procedures, e.g., variants of AdaBoost [18],
were employed to select the most discriminative feature
subset [74].
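The fast evaluation mentioned above rests on the integral image (summed-area table), which reduces any rectangular sum, and hence any two-rectangle Haar-like feature, to a handful of array lookups. The following is a minimal sketch of this mechanism; the function names are illustrative, not taken from the cited works:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero-padded first row/column,
    so that ii[r, c] = sum of img[:r, :c]."""
    ii = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def box_sum(ii, top, left, h, w):
    """Sum over the h x w box with top-left corner (top, left),
    obtained from four lookups into the integral image."""
    return (ii[top + h, left + w] - ii[top, left + w]
            - ii[top + h, left] + ii[top, left])

def haar_horizontal(ii, top, left, h, w):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return (box_sum(ii, top, left, h, half)
            - box_sum(ii, top, left + half, h, half))
```

Once the table is built in a single pass, every feature evaluation costs a constant number of lookups regardless of rectangle size, which is what makes exhaustive evaluation of the overcomplete dictionary tractable.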
The automatic extraction of a subset of nonadaptive
features can be regarded as optimizing the features for the
classification task. Likewise, the particular configuration of
spatial features has been included in the actual optimiza-
tion itself, yielding feature sets that adapt to the under-
lying data set during training. Such features are referred to
as local receptive fields [19], [23], [49], [68], [75], in
reference to neural structures in the human visual cortex
[24]. Recent studies have empirically demonstrated the
superiority of adaptive local receptive field features over
nonadaptive Haar wavelet features with regard to pedes-
trian classification [49], [68].
Another class of local intensity-based features is code-
book feature patches, extracted around interesting points in
the image [1], [39], [40], [61]. A codebook of distinctive
object feature patches, along with their geometric relations, is
learned from training data; clustering in the space of feature
patches then yields a compact representation
of the underlying pedestrian class. Based on this represen-
tation, feature vectors have been extracted including
information about the presence and geometric relation of
codebook patches [1], [39], [40], [61].
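The core of such approaches is the clustering step that compresses many observed patches into a small set of codebook entries, followed by assigning new patches to their nearest entry. A minimal sketch, using a few hand-rolled k-means iterations on flattened patch vectors (the function names and a simple occurrence histogram are assumptions of this sketch, not the exact formulations of the cited works):

```python
import numpy as np

def build_codebook(patches, k, iters=10, seed=0):
    """Cluster flattened feature patches with a few rounds of k-means;
    the cluster centres serve as codebook entries."""
    rng = np.random.default_rng(seed)
    centres = patches[rng.choice(len(patches), k, replace=False)]
    for _ in range(iters):
        # assign each patch to its nearest centre
        d = np.linalg.norm(patches[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = patches[labels == j].mean(axis=0)
    return centres

def occurrence_vector(patches, centres):
    """Histogram of codebook activations over one image's patches."""
    d = np.linalg.norm(patches[:, None, :] - centres[None, :, :], axis=2)
    return np.bincount(d.argmin(axis=1), minlength=len(centres))
```

The cited methods additionally store the geometric relation of each activation to the object centre, so that matched codebook entries can vote for object hypotheses rather than merely being counted.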
Others have focused on discontinuities in the image
brightness function in terms of models of local edge
structure. Well-normalized image gradient orientation histo-
grams, computed over local image blocks, have become
popular in both dense [11], [62], [63], [80], [83] (HOG,
histograms of oriented gradients) and sparse representations
[42] (SIFT, scale-invariant feature transform), where sparse-
ness arises from preprocessing with an interest-point
detector. Initially, dense gradient orientation histograms
were computed using local image blocks at a single fixed
scale [11], [62] to limit the dimensionality of the feature vector
and computational costs. Extensions to variable-sized blocks
have been presented in [63], [80], [83]. Results indicate a
performance improvement over the original HOG approach.
Recently, local spatial variation and correlation of gradient-
based features have been encoded using covariance matrix
descriptors which increase robustness toward illumination
changes [71].
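The essence of these gradient-histogram descriptors can be sketched in a few lines: compute image gradients, accumulate magnitude-weighted orientation votes per local cell, and normalize. The sketch below simplifies HOG by normalizing each cell on its own, whereas the original scheme normalizes overlapping blocks of cells; cell size and bin count follow common defaults, not a specific cited configuration:

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9, eps=1e-6):
    """Simplified HOG: magnitude-weighted orientation histograms
    over non-overlapping cells, L2-normalised per cell."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation
    h, w = img.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            a = ang[y:y + cell, x:x + cell].ravel()
            m = mag[y:y + cell, x:x + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist / (np.linalg.norm(hist) + eps))
    return np.concatenate(feats)
```

The block normalization omitted here is what gives HOG much of its robustness to local illumination and contrast changes; the variable-sized-block extensions cited above vary the cell and block geometry while keeping this same histogram core.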
Yet others have designed local shape filters that
explicitly incorporate the spatial configuration of salient
edge-like structures. Multiscale features based on horizon-
tal and vertical co-occurrence groups of dominant gradient
orientation have been introduced by Mikolajczyk et al. [45].
Manually designed sets of edgelets, representing local line
or curve segments, have been proposed to capture edge
structure [76]. An extension to these predefined edgelet
features has recently been introduced with regard to
adapting the local edgelet features to the underlying image
data [60]. So-called shapelet features are assembled from
low-level oriented gradient responses using AdaBoost, to
yield more discriminative local features. Again, variants of
AdaBoost are frequently used to select the most discrimi-
native subset of features.
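The recurring AdaBoost-based selection works because each boosting round fits a weak learner, typically a single-feature threshold "stump", to reweighted training data, so the sequence of chosen stumps doubles as a greedy ranking of features. A minimal sketch of discrete AdaBoost in this role, with labels in {-1, +1} (an illustrative implementation, not the exact variant of any cited work):

```python
import numpy as np

def adaboost_select(X, y, rounds=5):
    """Each round picks the feature (column of X) whose best threshold
    stump minimises the weighted error, then reweights the samples."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    chosen = []
    for _ in range(rounds):
        best = None
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] >= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)       # guard the log
        alpha = 0.5 * np.log((1 - err) / err)       # stump weight
        pred = sign * np.where(X[:, j] >= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)              # upweight mistakes
        w /= w.sum()
        chosen.append((j, thr, sign, alpha))
    return chosen
```

Because misclassified samples gain weight after each round, later rounds are forced toward features that complement the ones already selected, rather than redundant near-duplicates from the overcomplete dictionary.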
As an extension to spatial features, spatiotemporal
features have been proposed to capture human motion
[12], [15], [65], [74], especially gait [27], [38], [56], [75]. For
example, Haar wavelets and local shape filters have been
extended to the temporal domain by incorporating intensity
differences over time [65], [74]. Local receptive field features
have been generalized to spatiotemporal receptive fields
[27], [75]. HOGs have been extended to histograms of
differential optical flow [12]. Several papers compared the
performance of otherwise identical spatial and spatiotem-
poral features [12], [74] and reported superior performance
of the latter at the drawback of requiring temporally aligned
training samples.
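In the simplest of these extensions, a spatial filter response is differenced across consecutive frames so that the feature reacts to motion rather than static appearance. A toy sketch in that spirit, using a two-rectangle spatial response (this exact construction is an assumption for illustration, not the formulation of the cited works):

```python
import numpy as np

def temporal_haar(frames, top, left, h, w):
    """Spatial left-minus-right rectangle response, differenced
    across consecutive frames to capture intensity change over time."""
    def spatial(img):
        half = w // 2
        left_sum = img[top:top + h, left:left + half].sum()
        right_sum = img[top:top + h, left + half:left + w].sum()
        return left_sum - right_sum
    resp = np.array([spatial(f) for f in frames])
    return np.diff(resp)   # response change between consecutive frames
```

The requirement for temporally aligned training samples noted above follows directly from such constructions: the same image region must correspond to the same body part across the frames being differenced.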
Classifier architectures. Discriminative classification
techniques aim at determining an optimal decision bound-
ary between pattern classes in a feature space. Feed-forward
multilayer neural networks [33] implement linear discrimi-
nant functions in the feature space in which input patterns
have been mapped nonlinearly, e.g., by using the pre-
viously described feature sets. Optimality of the decision
boundary is assessed by minimizing an error criterion with
respect to the network parameters, e.g., mean squared error
[33]. In the context of pedestrian detection, multilayer
neural networks have been applied particularly in conjunc-
tion with adaptive local receptive field features as non-
linearities in the hidden network layer [19], [23], [49], [68],
[75]. This architecture unifies feature extraction and
classification within a single model.
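The unification can be seen in the forward pass itself: the hidden layer applies a nonlinearity to local image patches (the receptive fields), and the output layer forms a discriminant over the resulting hidden activations. A toy forward pass along these lines, where the weight shapes and the tanh nonlinearity are assumptions of this sketch rather than the architectures of the cited works:

```python
import numpy as np

def forward_lrf(img, rf_weights, out_weights, rf=5, stride=5):
    """Toy network with local receptive fields: each group of hidden
    units sees one rf x rf patch of the input (weights shared across
    positions), followed by a linear output unit."""
    hidden = []
    h, w = img.shape
    for y in range(0, h - rf + 1, stride):
        for x in range(0, w - rf + 1, stride):
            patch = img[y:y + rf, x:x + rf].ravel()
            hidden.append(np.tanh(rf_weights @ patch))  # nonlinear hidden layer
    return out_weights @ np.concatenate(hidden)         # linear discriminant
```

Training such a model by backpropagation adjusts `rf_weights` and `out_weights` jointly, which is precisely why the learned receptive fields act as features optimized for the classification task rather than a fixed, hand-designed dictionary.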
Support Vector Machines (SVMs) [73] have evolved as a
powerful tool to solve pattern classification problems. In
contrast to neural networks, SVMs do not minimize some
artificial error metric but maximize the margin of a linear
decision boundary (hyperplane) to achieve maximum
separation between the object classes. Regarding pedestrian
classification, linear SVM classifiers have been used in
2182 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 31, NO. 12, DECEMBER 2009