A Combined Model for Scan Path in Pedestrian
Searching
Lijuan Duan, Zeming Zhao, Wei Ma*, Jili Gu,
Zhen Yang
College of Computer Science and Technology
Beijing University of Technology, China
{ljduan, mawei, yangzhen}@bjut.edu.cn
{zhaozeming, gujili}@emails.bjut.edu.cn
Yuanhua Qiao
College of Applied Science
Beijing University of Technology, China
qiaoyuanhua@bjut.edu.cn
Abstract—Target searching, i.e., rapidly locating target objects in images or videos, has attracted much attention in computer vision. A comprehensive understanding of the factors that influence human visual searching is essential for designing target searching algorithms for computer vision systems. In this paper, we propose a combined model that generates scan paths for computer vision systems to follow when searching for targets in images. The model explores and integrates three factors that influence human visual searching: top-down target information, spatial context, and bottom-up visual saliency. The effectiveness of the combined model is evaluated by comparing the generated scan paths with the fixation sequences of human observers locating targets in the same images. The same evaluation strategy is used to learn the optimal weighting coefficients of the factors through linear search. Meanwhile, the performance of each individual factor and of their arbitrary combinations is examined. Extensive experiments show that top-down target information is the most important factor influencing the accuracy of target searching, whereas the effect of bottom-up visual saliency is limited. Any combination of the three factors performs better than each single factor alone. The scan paths generated by the full combined model are the most similar to the human fixation sequences.
Keywords—visual attention; bottom-up visual saliency;
top-down target information; spatial context
I. INTRODUCTION
Human visual attention, one of the most important mechanisms in biological vision systems [1], [3], [4], guides us to rapidly locate targets of a specific kind in images. A comprehensive understanding of the factors that influence human visual searching is essential for designing computer vision systems. In this paper, we explore three factors, bottom-up visual saliency, top-down target information and spatial context, which influence how human vision systems search for targets (pedestrians) in images. These factors have been experimentally evaluated, separately or jointly, in the literature [5], [6], [7]. This paper presents a combined model that integrates the three factors with optimal weights to guide target searching for computer vision systems. The weights are learned by linear search [2]. The performance of the combined model in generating scan paths is evaluated by comparison with human scan paths.
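To make the combination concrete, the following minimal sketch (in Python, not from the paper) illustrates one plausible way to fuse three factor maps with weights constrained to sum to one, and to pick the weights by a coarse linear (grid) search scored against human scan paths. The function and parameter names, the fusion rule, and the scoring interface score_fn are all assumptions for illustration.

```python
import itertools
import numpy as np

def combine_maps(saliency, target, context, w):
    # Weighted fusion of the three factor maps (hypothetical form;
    # the paper's exact fusion rule may differ).
    guide = w[0] * saliency + w[1] * target + w[2] * context
    return guide / (guide.max() + 1e-8)  # normalize roughly to [0, 1]

def linear_search_weights(factor_maps, human_paths, score_fn, step=0.1):
    # Coarse linear (grid) search over weight triples summing to 1,
    # scored by similarity between generated and human scan paths.
    best_w, best_score = None, -np.inf
    for w1, w2 in itertools.product(np.arange(0.0, 1.0 + step, step), repeat=2):
        w3 = 1.0 - w1 - w2
        if w3 < -1e-9:
            continue
        w = (w1, w2, max(w3, 0.0))
        score = np.mean([score_fn(combine_maps(s, t, c, w), path)
                         for (s, t, c), path in zip(factor_maps, human_paths)])
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score
```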
Psychological studies show that, at each moment, humans are attracted to salient parts of images [6], [8], [9]. Bottom-up saliency cues are therefore considered helpful for guiding computational visual searching, as experimentally demonstrated by Itti et al. [10]. On the other hand, during visual searching, humans fixate not only on the target but also on regions or objects whose shapes resemble the target [11], [12]. For example, when searching for a pedestrian, objects with a rectangular shape, or with a circle on top, attract attention. Spatial context provides rich cues to target positions for human vision [13], [14], [15], and it is widely used in object detection [14] and recognition [16].
Based on the above facts, this paper experimentally explores each factor and presents a method that combines them for efficient target searching in images. The proposed method is described in Section II, and the experimental evaluation follows in Section III.
This research is partially sponsored by the Natural Science Foundation of China (Nos. 61003105, 61175115 and 61370113), the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (CIT&TCD201304035), the Jing-Hua Talents Project of Beijing University of Technology (2014-JH-L06), the Ri-Xin Talents Project of Beijing University of Technology (2014-RX-L06), and the International Communication Ability Development Plan for Young Teachers of Beijing University of Technology (No. 2014-16).
Fig. 1. The workflow of scan path generation. The saliency map and target map are computed from the input image. The searching guide map is obtained by combining these maps with the spatial context map. At each round of fixation choosing, a winner-take-all (WTA) strategy is applied to select the next fixation.
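The fixation-choosing step of this workflow can be sketched as follows (Python, not from the paper): a winner-take-all pick of the global maximum of the guide map, repeated for a fixed number of fixations. Suppressing the visited neighborhood before the next round, and the suppression radius, are assumptions added here for illustration; the paper's exact rule may differ.

```python
import numpy as np

def generate_scan_path(guide_map, n_fixations=5, suppress_radius=30):
    # Winner-take-all (WTA) fixation choosing: repeatedly pick the global
    # maximum of the guide map. Zeroing the visited neighborhood afterwards
    # is an assumed inhibition-of-return-style step, not taken from the paper.
    g = guide_map.astype(float).copy()
    h, w = g.shape
    ys, xs = np.mgrid[0:h, 0:w]
    path = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(g), g.shape)  # WTA pick
        path.append((int(x), int(y)))
        g[(ys - y) ** 2 + (xs - x) ** 2 <= suppress_radius ** 2] = 0.0
    return path
```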