Learning to predict where human gaze is using quaternion DCT based
regional saliency detection
Ting Li*, Yi Xu, Chongyang Zhang
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
Shanghai Key Laboratory of Digital Media Processing and Transmissions, Shanghai, China
ABSTRACT
Many current visual attention approaches use semantic features to capture human gaze accurately. However, these
approaches demand high computational cost and can hardly be applied in everyday use. Recently, some quaternion-based
saliency detection models, such as PQFT (phase spectrum of the Quaternion Fourier Transform) and QDCT (Quaternion
Discrete Cosine Transform), have been proposed to meet the real-time requirements of human gaze tracking tasks. However,
these methods apply PQFT and QDCT globally to locate jump edges of the input, so they can hardly detect object
boundaries accurately. To address this problem, we improve the QDCT-based saliency detection model by introducing a
superpixel-wise regional saliency detection mechanism. The local smoothness of the saliency value distribution is
emphasized to distinguish background noise from salient regions. We propose a measure called saliency confidence that
separates patches belonging to the salient object from those of the background by deciding whether image patches belong
to the same region: when an image patch belongs to a region consisting of other salient patches, this patch should be
salient as well. We therefore use the saliency confidence map to derive background and foreground weights for optimizing
the saliency map obtained by QDCT. The optimization is carried out with the least squares method and unifies local and
global saliency by combining QDCT with a measure of similarity between image superpixels. We evaluate our model on
four commonly used datasets (Toronto, MIT, OSIE and ASD) using standard precision-recall (PR) curves, mean absolute
error (MAE) and area under the curve (AUC). In comparison with most state-of-the-art models, our approach achieves
higher consistency with human perception without any training, locates human gaze accurately even against cluttered
backgrounds, and achieves a better compromise between speed and accuracy.
Keywords: saliency detection, superpixels, quaternion transform, optimization model
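The least-squares refinement described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the foreground/background weights and the pairwise smoothness weights below are toy values standing in for the QDCT-derived saliency confidence and superpixel similarity, and the closed-form linear system is one standard way to solve such an objective.

```python
import numpy as np

def refine_saliency(w_fg, w_bg, W):
    """Least-squares refinement of per-superpixel saliency values.

    Minimizes  sum_i w_bg[i]*s_i**2 + w_fg[i]*(s_i - 1)**2
             + sum_{i,j} W[i, j]*(s_i - s_j)**2
    (background term pulls toward 0, foreground term toward 1,
    smoothness term makes similar superpixels agree).
    Setting the gradient to zero gives a linear system.
    """
    # Graph Laplacian of the symmetric pairwise smoothness weights
    L = np.diag(W.sum(axis=1)) - W
    A = np.diag(w_bg + w_fg) + 2.0 * L
    # Right-hand side comes from the foreground (s_i - 1)^2 term
    return np.linalg.solve(A, w_fg)

# Toy example: 4 superpixels on a chain, the last two strongly foreground
w_fg = np.array([0.1, 0.1, 0.9, 0.9])
w_bg = 1.0 - w_fg
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
s = refine_saliency(w_fg, w_bg, W)
```

Because the system matrix is diagonally dominant with non-positive off-diagonal entries, the refined values stay in [0, 1] and increase smoothly from the background superpixels toward the foreground ones.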
1. INTRODUCTION
In the last two decades, saliency detection has been extensively studied in the fields of artificial intelligence, computer
vision and video analysis, providing visual attention cues for the solution of ill-posed problems. Many aspects attract
human attention, including bottom-up, top-down and knowledge-driven visual cues. In early efforts, many models were
motivated by the neural selective attention model of Koch & Ullman (1985). Itti et al.9 proposed a biologically inspired
model that uses multiple low-level features such as color, intensity and orientation at multiple scales together with a
center-surround mechanism. After a saliency map is computed for each feature channel, the maps are normalized and
combined into a master saliency map using a winner-take-all strategy. However, this model suffers from high
computational complexity and over-parameterization; moreover, it does not in fact match human saccades according to
eye-tracking data. To achieve higher consistency with the human visual system, other saliency detection models6,10,11
based on machine learning were proposed, for example the well-known Judd's model6, which uses a set of low-level,
mid-level and high-level image features and requires training with a linear support vector machine. This kind of
approach demands high computational cost due to the training step. However, human vision can effortlessly judge the
importance of image regions and locate a salient object without any training, even in a totally strange environment or
cluttered scene.
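The Itti-style pipeline described above (multi-scale features, center-surround differences, normalization, combination) can be sketched roughly as follows. This is a simplified illustration, not Itti et al.'s implementation: a separable box filter stands in for the Gaussian pyramid, a plain average stands in for the winner-take-all combination, and only the intensity channel is shown.

```python
import numpy as np

def box_blur(img, k):
    # Crude smoothing at "scale" k: separable-style box filter of radius k,
    # standing in for one level of a Gaussian pyramid
    pad = np.pad(img, k, mode='edge')
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            out += pad[k + dy:k + dy + h, k + dx:k + dx + w]
    return out / (2 * k + 1) ** 2

def center_surround_saliency(intensity, scales=((1, 4), (1, 8), (2, 8))):
    """Center-surround feature maps, normalized and combined."""
    maps = []
    for c, s in scales:
        # Center-surround difference: fine scale minus coarse scale
        cs = np.abs(box_blur(intensity, c) - box_blur(intensity, s))
        rng = cs.max() - cs.min()
        if rng > 0:                      # normalize each feature map to [0, 1]
            cs = (cs - cs.min()) / rng
        maps.append(cs)
    return np.mean(maps, axis=0)         # combine into a master saliency map

# Toy input: a bright square on a dark background
img = np.zeros((32, 32))
img[12:20, 12:20] = 1.0
sal = center_surround_saliency(img)
```

On this toy input, the saliency map responds around the square (where center and surround disagree) and stays near zero in the uniform background, which is the behavior the center-surround mechanism is designed to produce.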
*E-mail: tina_ww@sjtu.edu.cn;
Proc. of SPIE Vol. 9217 92171K-1