Sparse Coding for Sound Event Classification
Mingming Zhang¹˒², Weifeng Li¹˒², Longbiao Wang³, Jianguo Wei⁴, Zhiyong Wu¹˒², Qingmin Liao¹˒²
¹Shenzhen Key Lab. of Information Sci&Tech/Shenzhen Engineering Lab. of IS&DRM
²Department of Electronic Engineering/Graduate School at Shenzhen, Tsinghua University, China
³Nagaoka University of Technology, Japan
⁴School of Computer Science and Technology, Tianjin University, China
Abstract—Sound event classification algorithms are generally based on speech recognition methods, comprising two stages: feature extraction and model training. To improve classification performance, researchers typically search for more effective sound features or classifiers, which is difficult. In recent years, sparse coding has provided a class of effective algorithms for capturing high-level representations of input data. In this paper, we present a sound event classification method based on sparse coding and a supervised learning model: the sparse coding coefficients are used as sound event features to train the classification model. Experimental results demonstrate a clear improvement in sound event classification.
I. INTRODUCTION
Non-speech sound event classification is used in many important applications, such as music genre classification [1-4], security surveillance [5], environment detection [6-7], and health care. Sound event detection and classification systems generally use methods derived from speech recognition, which contain two steps: first, sound event features such as MFCCs (Mel-Frequency Cepstral Coefficients) or PLP (Perceptual Linear Prediction) are extracted from labeled training sounds; second, a classifier such as an SVM (Support Vector Machine), GMM (Gaussian Mixture Model), or HMM (Hidden Markov Model) is trained on the extracted features. A great deal of related work has been done over the last twenty years. [8] relied on the wavelet transform for detection and on an unsupervised order estimation of GMMs. The basic idea of [9] was to embed probabilistic distances into a classical SVM to classify sound events. [10] presented an efficient, robust sound classification algorithm based on hidden Markov models, while [11] proposed a novel feature extraction method using spectrogram image features. The main difference among the aforementioned methods lies in the particular combination of general features and classifiers.
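As a rough sketch of this two-step pipeline, the following example pairs a toy frame-level spectral feature extractor (a greatly simplified stand-in for MFCC/PLP) with a nearest-centroid classifier (a minimal stand-in for the GMM/SVM/HMM back-ends). All names and parameter choices here are illustrative, not taken from the cited systems.

```python
import numpy as np

def extract_features(signal, frame_len=256, hop=128, n_bands=13):
    """Toy clip-level spectral feature (a stand-in for MFCC/PLP)."""
    frames = [signal[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1))       # per-frame magnitude spectrum
    bands = np.array_split(spec, n_bands, axis=1)    # crude frequency-band grouping
    feats = np.log(np.stack([b.mean(axis=1) for b in bands], axis=1) + 1e-8)
    return feats.mean(axis=0)                        # average over frames

class CentroidClassifier:
    """Minimal classifier standing in for the GMM/SVM/HMM back-ends."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None], axis=2)
        return self.classes_[d.argmin(axis=1)]
```

With any real system, the feature extractor and classifier would each be replaced by the stronger components surveyed above; the two-step structure is the point.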
Sparse coding, first introduced by Olshausen [12], is an algorithm that tries to find a high-level representation of an input signal. It learns a dictionary of "basis functions" such that the input signal can be represented by a linear combination of the basis functions with a sparse coefficient vector. In recent years, sparse coding has received more and more attention in many research fields, especially in image processing tasks such as image noise reduction, image restoration, image classification, and face recognition [13-14].
In audio signal processing, sparse coding can be used for speaker identification [15], speech recognition [16-17], speech enhancement [18], and so on. Compared with image processing, however, sparse coding has received less attention in audio signal processing, especially for sound event classification. [19] proposed a joint sparsity classification method that exploits the inner correlation between observations for acoustic signal classification. [20] presented an algorithm for computing shift-invariant sparse coding (SISC) solutions and applied it to audio classification. [21] employed sparse coding of auditory temporal modulations for music genre classification. Sparse coding represents each example with a few non-zero coefficients and thereby obtains a high-level representation of the example; the sparse coefficients can therefore be used as new sound event features for classification with supervised learning.
In this paper, we propose to learn a high-level representation of the input sound event features via sparse coding, and then to train a supervised classifier for our classification task.
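A minimal sketch of this idea, assuming scikit-learn's DictionaryLearning and LinearSVC and synthetic stand-in feature vectors (the data, dictionary size, and regularization values below are illustrative only, not the paper's settings):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Synthetic stand-in data: rows are per-clip feature vectors for two sound classes.
X0 = rng.standard_normal((40, 16)) + 2.0   # class-0 clips
X1 = rng.standard_normal((40, 16)) - 2.0   # class-1 clips
X = np.vstack([X0, X1])
y = np.array([0] * 40 + [1] * 40)

# Step 1: learn a dictionary and encode each clip as sparse coefficients.
coder = DictionaryLearning(n_components=12, alpha=1.0, max_iter=200,
                           transform_algorithm='lasso_lars', random_state=0)
codes = coder.fit(X).transform(X)          # sparse codes = new high-level features

# Step 2: train a supervised classifier on the sparse codes.
clf = LinearSVC(dual=False).fit(codes, y)
print(clf.score(codes, y))
```

The sparse codes replace the raw features as the classifier's input, which is the core of the proposed method.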
This paper is organized as follows: Section 2 presents the general sparse coding algorithm; Section 3 describes the proposed method; Section 4 gives detailed experimental results and their analysis; finally, we draw our conclusions in Section 5.
II. SPARSE CODING
In this section, we give a brief description of the sparse coding algorithm, covering both coefficient learning and dictionary learning.
Given a signal sample $x \in \mathbb{R}^{m \times 1}$ and a dictionary $D \in \mathbb{R}^{m \times n}$, the signal $x$ can be described by a linear combination of some atoms of the dictionary $D$ as follows:

$$x = Ds \tag{1}$$

The sparse representation $s \in \mathbb{R}^{n \times 1}$ of $x$ can be estimated by solving

$$\min_{D,\,s} \; \|x - Ds\|_2^2 \quad \text{s.t.} \;\; \phi(s) < \sigma, \;\; \sum_i D_{ij}^2 \le c, \;\; \forall j = 1, \ldots, n \tag{2}$$

where $D = [d_1 \; d_2 \; \cdots \; d_n]$ is the dictionary with column vector $d_j$ as the $j$-th atom, and $s$ is the coefficient vector. The above sparse coding problem can therefore be seen as a constrained optimization problem:

$$\min_{D,\,s} \; \|x - Ds\|_2^2 + \beta \sum_j \phi(s_j) \quad \text{s.t.} \;\; \sum_i D_{ij}^2 \le c, \;\; \forall j = 1, \ldots, n \tag{3}$$
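With the dictionary $D$ fixed, the coefficient-learning step of Eq. (3) can be solved by iterative shrinkage-thresholding (ISTA) when $\phi(\cdot)$ is taken as the L1 penalty, a common convex choice; the paper leaves $\phi$ general, so this is one concrete instantiation, sketched in NumPy:

```python
import numpy as np

def ista(x, D, beta, n_iter=500):
    """Estimate the sparse code s minimizing ||x - D s||_2^2 + beta * ||s||_1
    (Eq. (3) with D fixed and phi taken as the L1 penalty), via ISTA."""
    L = 2 * np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2 * D.T @ (D @ s - x)           # gradient of the quadratic term
        z = s - grad / L                       # gradient descent step
        s = np.sign(z) * np.maximum(np.abs(z) - beta / L, 0.0)  # soft threshold
    return s
```

The soft-thresholding step is what drives most coefficients exactly to zero, producing the sparse codes used as features; the dictionary-update step would alternate with this one during dictionary learning.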