IET Signal Processing
Research Article

Dictionary learning based on M-PCA-N for audio signal sparse representation

ISSN 1751-9675
Received on 15th March 2016; Revised on 7th September 2017; Accepted on 12th September 2017; E-First on 6th October 2017
doi: 10.1049/iet-spr.2015.0277
www.ietdl.org
Jichen Yang¹,², Qianhua He², Yanxiong Li², Leian Liu¹, Jianhong Li³, Xiaohui Feng²

¹School of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Zhongkai Road No. 1, Haizhu District, Guangzhou, People's Republic of China
²School of Electronic and Information Engineering, South China University of Technology, Wushan Road No. 381, Tianhe District, Guangzhou, People's Republic of China
³Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou Higher Education Mega Center, Panyu District, Guangzhou, People's Republic of China

E-mail: eeyxli@scut.edu.cn
Abstract: The currently popular dictionary learning algorithms for sparse representation of signals are K-means singular value decomposition (K-SVD) and its extensions. They use only a rank-1 approximation to update one atom at a time and are therefore unable to cope with large dictionaries efficiently. To tackle these two problems, this study proposes M-Principal Component Analysis-N (M-PCA-N), an algorithm for dictionary learning and sparse representation. First, M-Principal Component Analysis (M-PCA) uses information from the top M ranks of the SVD decomposition to update M atoms at a time. Then, to further exploit the information in the remaining ranks, M-PCA-N is proposed on the basis of M-PCA by transforming the information from the following N non-principal ranks onto the top M principal ranks. The mathematical formulation shows that M-PCA may be seen as a generalisation of K-SVD. Experimental results on the BBC Sound Effects Library show that M-PCA-N not only lowers the mean squared error (MSE) between the original signal and its approximation in audio signal sparse representation, but also achieves higher audio signal classification precision than K-SVD.
1 Introduction
Sparse representations of signals have received continuous attention in recent years [1–4]. In signal processing, the most appropriate linear combination of atoms from an overcomplete dictionary is used to obtain an approximation of the signal; in this way, an approximation whose characteristics are close to those of natural signals such as acoustic signals can be achieved. Sparse representation allows a large amount of information to be represented by a small number of exemplary signals and is widely used in signal processing areas such as communication [5, 6], compressive sampling [1, 7], ground penetrating radar signal classification [8], audio inpainting [9], image sparse representation [10–13], signal recovery [14–16], denoising [17, 18], deblurring [19], compression [20], source separation, mapping, classification [21], synthetic aperture radar target recognition [22, 23], image classification [24, 25], face recognition [26, 27], voice conversion [28, 29] and so on.
Suppose a dictionary matrix $D \in \mathbb{R}^{n \times K}$ with $K$ atoms $\{d_j\}_{j=1}^{K}$. For a signal $Y \in \mathbb{R}^{n}$, the goal of sparse representation may either be exact, $Y = DX$, or approximate, $Y \simeq DX$, satisfying $\| Y - DX \|_F^2 \le \varepsilon$, where the vector $X \in \mathbb{R}^{K}$ contains the representation coefficients of the signal $Y$ [11]. This can also be written as

$$\min_{D, X} \| Y - DX \|_F^2 \quad \text{subject to} \quad \forall i,\ \| x_i \|_0 \le T_0 \tag{1}$$

where $T_0$ is the target sparsity.
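To make the objective in (1) concrete, the following is a minimal numpy sketch (not from the paper; all variable names are ours) that evaluates the squared Frobenius approximation error and checks the per-column sparsity constraint for a toy dictionary and coefficient matrix:

```python
import numpy as np

def approximation_error(Y, D, X):
    """Squared Frobenius error ||Y - DX||_F^2 between signals and their approximation."""
    R = Y - D @ X
    return np.sum(R ** 2)

def satisfies_sparsity(X, T0):
    """Check ||x_i||_0 <= T0 for every coefficient column x_i."""
    return bool(np.all(np.count_nonzero(X, axis=0) <= T0))

# Toy example: random unit-norm dictionary and exactly T0-sparse coefficients.
rng = np.random.default_rng(0)
n, K, L, T0 = 64, 128, 10, 5                 # signal dim, atoms, examples, sparsity
D = rng.standard_normal((n, K))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
X = np.zeros((K, L))
for i in range(L):
    support = rng.choice(K, T0, replace=False)
    X[support, i] = rng.standard_normal(T0)
Y = D @ X                                    # signals built from the sparse codes
print(approximation_error(Y, D, X), satisfies_sparsity(X, T0))
```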
Dictionary fitting is not trivial, and considerable work has been carried out in this area over recent decades. Around 1999, the method of optimal directions (MOD) was presented by Engan et al. [30–32] to update the dictionary. However, MOD requires the computation of a matrix inverse in the dictionary update, which is almost impracticable for large dictionaries [11]. In 2006, K-SVD was presented to learn D from Y [11] in image signal processing by using singular value decomposition (SVD) [33]. This is an iterative method that alternates between sparse coding of the examples based on the current dictionary and an update of the dictionary atoms so as to better fit the data; the sparse representation coefficients are thus updated together with the dictionary columns. Extended K-SVD (EK-SVD) [12] was presented in 2008 on the basis of K-SVD with a dictionary-optimisation process: EK-SVD starts with a large number of dictionary elements and systematically prunes the under-utilised or similar-looking elements to produce a well-trained dictionary with no redundant elements [12].
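The rank-1 atom update that K-SVD performs can be sketched as follows. This is a minimal illustration of the standard single-atom update, not the authors' implementation; the function name and argument layout are ours:

```python
import numpy as np

def ksvd_atom_update(Y, D, X, k):
    """Rank-1 (K-SVD style) update of atom k and its coefficient row.

    Y : (n, L) training signals, D : (n, K) dictionary, X : (K, L) sparse codes.
    Only the examples currently using atom k are considered.
    """
    omega = np.flatnonzero(X[k, :])            # examples that use atom k
    if omega.size == 0:
        return D, X                            # atom unused; leave it unchanged
    # Representation error without atom k's contribution, restricted to omega.
    E_k = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
    # Best rank-1 approximation of the restricted error matrix.
    U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
    D[:, k] = U[:, 0]                          # new atom = first left singular vector
    X[k, omega] = s[0] * Vt[0, :]              # updated coefficients for these examples
    return D, X
```

Only the first singular vector pair is used here; the information carried by the remaining ranks of the SVD is discarded, which is exactly the limitation that motivates M-PCA and M-PCA-N.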
In summary, dictionary learning consists of two stages: sparse coding and dictionary updating. Sparse coding is the process of computing the representation coefficients X based on the given signals Y and the dictionary D [11]. Dictionary updating is the process of updating the atoms of the dictionary using the training data; a sketch of how the two stages alternate is given below.
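The following is a minimal sketch of this alternation, assuming placeholder callables `sparse_coding` (e.g. OMP, sketched further below) and `atom_update` (e.g. the rank-1 update above); it is not the paper's algorithm, only the generic two-stage loop:

```python
import numpy as np

def learn_dictionary(Y, D0, T0, sparse_coding, atom_update, n_iter=20):
    """Alternate sparse coding of Y with the current D and per-atom updates of D.

    sparse_coding(Y, D, T0) -> X   and   atom_update(Y, D, X, k) -> (D, X)
    are placeholders for the two stages described in the text.
    """
    D = D0.copy()
    for _ in range(n_iter):
        X = sparse_coding(Y, D, T0)            # stage 1: fix D, solve for X
        for k in range(D.shape[1]):            # stage 2: fix supports, update atoms
            D, X = atom_update(Y, D, X, k)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)  # keep atoms unit-norm
    return D, X
```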
The earliest work in the field of sparse coding is [34], in which Mallat and Zhang proposed matching pursuit (MP) in 1993. MP is an iterative algorithm that, at every step of the decomposition, projects the residual onto the dictionary atom that best matches it. The selected atoms are not orthogonal to one another, so in later steps of the decomposition the residual vector can still have components in the space spanned by the previously selected atoms. To solve this problem, Pati et al. [35] proposed orthogonal MP (OMP), a modification of MP, also in 1993; it maintains full backward orthogonality of the residual at every step and thereby improves convergence. On the basis of OMP, several extended algorithms such as look-ahead OMP [36, 37], generalised OMP [38] and stagewise OMP [39] have been proposed. MP remains the simplest method for sparse coding and OMP is the most popular because of its effectiveness.
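For reference, a minimal numpy sketch of OMP for a single signal is shown below (assuming unit-norm atoms and T0 >= 1; variable names are ours, not the paper's):

```python
import numpy as np

def omp(y, D, T0):
    """Orthogonal matching pursuit: greedily select the atom most correlated with
    the residual, then re-fit all selected coefficients by least squares so the
    residual stays orthogonal to the span of the chosen atoms."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(T0):
        correlations = D.T @ residual
        k = int(np.argmax(np.abs(correlations)))   # best-matching atom
        if k not in support:
            support.append(k)
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs      # orthogonal to selected atoms
    x[support] = coeffs
    return x
```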
Conventional methods of dictionary updating, whether K-SVD or EK-SVD, use only a rank-1 approximation to update one atom at a time, neglecting the information from the other ranks. They are thus unable to cope with large dictionaries efficiently. Moreover, these algorithms are all used for image signal processing, whereas there is more variation in audio signal processing compared to image signal