September 28, 2015 16:10 Sparse Coding and Its Applications in Computer Vision – 9in x 6in b2310 page 5
Introduction 5
[van den Berg and Friedlander (2009); Cotter et al. (2005); Duarte et al.
(2005); Mishali and Eldar (2008, 2009)]. In many situations, the sparse
coefficients tend to cluster, so a clustering prior that exploits correlations
between neighboring coefficients is enforced in the optimization algorithms
in order to obtain a better representation. In group sparsity, the data are
inherently represented by a small number of pre-defined groups of data
samples, so a sparsifying term defined over the groups is used to promote
this property. In the case of multiple measurements, a specific form of
group sparsity called joint structured sparsity has been explored for joint
sparse representation and heterogeneous feature fusion [Zhang et al.
(2012a,b)]: not only is the sparsity of each measurement utilized, but the
structural information shared across the sparse representation vectors of
the multiple measurements is exploited as well.
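For intuition, a group-sparsity penalty is typically applied through its proximal operator, which shrinks each pre-defined group of coefficients as a single unit, so that entire groups are either retained or zeroed out. Below is a minimal NumPy sketch of this group soft-thresholding step, assuming the groups are given as lists of coefficient indices; the function name and signature are illustrative, not taken from any of the cited works:

```python
import numpy as np

def group_soft_threshold(alpha, groups, tau):
    """Proximal operator of the l2,1 (group-sparsity) penalty:
    each pre-defined group of coefficients is shrunk toward zero
    as a unit, so whole groups are either kept (rescaled) or
    zeroed out, promoting the group-sparsity property."""
    out = alpha.copy()
    for g in groups:
        norm = np.linalg.norm(alpha[g])
        # groups whose l2 norm falls below tau are eliminated entirely
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out[g] = scale * alpha[g]
    return out
```

Applying this operator inside an iterative solver is what drives whole groups of coefficients to zero, rather than individual entries as in plain l1 shrinkage.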
Discrimination. Although sparse coding was originally proposed as a
generative model, it also performs surprisingly well in many classification
problems. The Sparse Representation-based Classification (SRC) proposed
in [Wright et al. (2009)] is a pioneering work in this direction. In SRC, a
signal x from class c is assumed to lie in or near a low-dimensional subspace
spanned by the atoms in the class-specific dictionary D_c. If we try to
represent x using the composite dictionary D = [D_1, ..., D_C] for all the C
classes, the resulting sparse code α = [α_1; ...; α_C] is supposed to have its
non-zero coefficients concentrated in α_c, the block associated with its class.
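The SRC decision rule can be sketched as follows: solve an l1-regularized least-squares problem over the composite dictionary, then assign x to the class whose atoms give the smallest reconstruction residual. This is a simplified stand-in that uses a basic ISTA proximal-gradient loop for the l1 minimization (the original work employs dedicated l1 solvers), and all names and parameters here are illustrative:

```python
import numpy as np

def src_classify(x, D, labels, lam=0.1, n_iter=200):
    """SRC sketch. D has training samples (atoms) as columns;
    labels[i] is the class of column i. Solves
    min_a 0.5*||x - D a||^2 + lam*||a||_1 by ISTA, then picks
    the class whose atoms best reconstruct x."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    alpha = np.zeros(D.shape[1])
    for _ in range(n_iter):                  # ISTA: gradient step + soft threshold
        grad = D.T @ (D @ alpha - x)
        z = alpha - grad / L
        alpha = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    residuals = {}
    for c in np.unique(labels):
        mask = (labels == c)                 # keep only the class-c block of alpha
        residuals[c] = np.linalg.norm(x - D[:, mask] @ alpha[mask])
    return min(residuals, key=residuals.get)
```

Because the non-zero coefficients concentrate in the block of the true class, the residual computed from that block alone is markedly smaller than the residuals of the other classes.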
In the original work of [Wright et al. (2009)], D_c consists of all the
training samples from class c, which is not practical if the total class number
or the training set is large. Quite a few recent papers try to learn
dictionaries that are more compact and more discriminative by augmenting
the reconstruction objective in (1.4) with additional discrimination terms,
such as the Fisher discriminant criterion [Yang et al. (2011c)], structural
incoherence [Ramirez et al. (2010)], class residual difference [Mairal et al.
(2008b); Yang et al. (2011b)], and mutual information [Qiu et al. (2011)].
Sparse codes generated by discriminative dictionaries are also used as input
features for general classification models other than SRC [Bradley and
Bagnell (2008); Yang et al. (2010b); Jiang et al. (2011); Mairal et al.
(2012)]. In addition to natural images, discriminative sparse representation
learning has also been actively applied to other imaging modalities, such as
hyperspectral image classification [Wang et al. (2014a, 2015a)].
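As a schematic illustration of such an augmented objective, the sketch below adds a Fisher-style scatter term on the sparse codes (within-class scatter minus between-class scatter) to a reconstruction-plus-sparsity loss. The exact discrimination term differs across the cited works; this stand-in is not any specific paper's formulation, and all names here are illustrative:

```python
import numpy as np

def discriminative_loss(X, D, A, labels, lam=0.1, gamma=0.5):
    """Schematic augmented objective: reconstruction error + l1
    sparsity + a Fisher-style term that rewards sparse codes
    that are tight within each class and spread across classes."""
    recon = 0.5 * np.sum((X - D @ A) ** 2)       # data-fidelity term
    sparse = lam * np.sum(np.abs(A))             # l1 sparsity penalty
    mu = A.mean(axis=1, keepdims=True)           # global mean code
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        Ac = A[:, labels == c]
        mc = Ac.mean(axis=1, keepdims=True)      # class mean code
        within += np.sum((Ac - mc) ** 2)         # within-class scatter
        between += Ac.shape[1] * np.sum((mc - mu) ** 2)  # between-class scatter
    return recon + sparse + gamma * (within - between)
```

Minimizing this loss over D and A (with suitable normalization of the dictionary atoms) favors codes that are simultaneously sparse, faithful to the data, and well separated by class, which is the common thread among the discrimination terms surveyed above.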