Learning Music Emotion Primitives via Supervised
Dynamic Clustering
Yang Liu (1,2), Yan Liu (3), Xiang Zhang (3,4), Gong Chen (3), Kejun Zhang (4)
(1) Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, P. R. China
(2) Institute of Research and Continuing Education, Hong Kong Baptist University, Shenzhen, P. R. China
(3) Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, P. R. China
(4) College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, P. R. China
csygliu@comp.hkbu.edu.hk, csyliu@comp.polyu.edu.hk, csxgzhang@comp.polyu.edu.hk,
csgchen@comp.polyu.edu.hk, zhangkejun@zju.edu.cn
ABSTRACT
This paper explores a fundamental problem in music emotion analysis: how to segment a music sequence into a set of basic emotive units, which we name emotion primitives. Existing work on music emotion analysis is mainly based on fixed-length music segments, which often makes accurate emotion recognition difficult. A short music segment, such as an individual music frame, may fail to evoke an emotional response, while a long music segment, such as an entire song, may convey various emotions over time. Moreover, the minimum length of a music segment varies with the type of emotion. To address these problems, we propose a novel method dubbed supervised dynamic clustering (SDC) to automatically decompose a music sequence into meaningful segments of various lengths. First, the music sequence is represented as a set of music frames. Then, the music frames are clustered according to their valence-arousal values in the emotion space, and the clustering results are used to initialize the music segmentation. After that, a dynamic programming scheme is employed to jointly optimize the subsequent segmentation and grouping in the music feature space. Experimental results on a standard dataset show both the effectiveness and the rationality of the proposed method.
Keywords
Music emotion analysis; emotion primitives; supervised dynamic
clustering
1. INTRODUCTION
Music, loosely described as organized sound, can convey and evoke various emotions. Music emotion analysis, which attracts much attention from researchers in various disciplines such as musicology [4], psychology [11], and computer science [27], plays a crucial role in many real-world applications such as music recommendation [24] and music therapy [18].
With the huge amount of available musical data and the rapid
development of computing resources, computational approaches that automate music emotion analysis have taken on increasing interest and importance [15, 29, 22, 24, 25, 12]. Lu et al.
[15] proposed a GMM-based model to detect the moods in music.
Yang et al. [29] presented a regression approach for music emotion
recognition. Trohidis et al. [22] formulated music emotion analy-
sis as a multi-label classification problem. Wang et al. [24] built a
probabilistic model for music recommendation. Wu et al. [25] pro-
posed a multi-label multi-layer multi-instance multi-view learning
scheme for music emotion recognition. Liu et al. [12] introduced a
dimensionality reduction algorithm to model the relations between
low-level music features and high-level emotions.
Although tremendous progress has been made in music emotion
analysis, one of the fundamental problems, i.e., how to segment the
music sequence into a set of plausible units according to the emo-
tions, is seldom investigated. The inherent difficulty in the problem
mainly stems from the variety and complexity of music emotion
labels, a relatively large range of temporal scale for different music
emotions, and the intra-emotion variation of the music sequences.
In this paper, we work on this fundamental problem. We name these basic units music emotion primitives and explore machine learning techniques to learn them from music data with human-annotated emotions. To address the aforementioned challenges, a novel computational model dubbed supervised dynamic clustering (SDC) is presented to jointly optimize the segmentation and clustering of music sequences under the supervision of the emotion information.
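Before the formal development in Section 2, the dynamic-programming idea can be previewed with a simplified sketch: given fixed cluster centroids in the music feature space, every candidate segment is assigned to its best centroid, and the minimum-cost boundary placement (with a minimum segment length) is found by dynamic programming. This is an assumption-laden illustration, not the exact SDC objective; `dp_segment`, `min_len`, and the squared-distance cost are illustrative choices.

```python
import numpy as np

def dp_segment(X, centroids, min_len=5):
    """Segment frame features X (T x d) so that each segment is assigned
    to one centroid and the total within-segment squared distance is
    minimized; boundaries are found by dynamic programming."""
    T, K = len(X), len(centroids)
    # dist[t, k]: squared distance of frame t to centroid k
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # prefix[j] - prefix[i]: per-centroid cost of frames i..j-1
    prefix = np.vstack([np.zeros(K), np.cumsum(dist, axis=0)])

    def seg_cost(i, j):
        return (prefix[j] - prefix[i]).min()  # best single centroid for i..j-1

    D = np.full(T + 1, np.inf)  # D[t]: optimal cost of frames 0..t-1
    D[0] = 0.0
    back = np.zeros(T + 1, dtype=int)
    for t in range(min_len, T + 1):
        for i in range(0, t - min_len + 1):
            c = D[i] + seg_cost(i, t)
            if c < D[t]:
                D[t], back[t] = c, i
    cuts, t = [], T
    while t > 0:  # backtrack to recover the boundary positions
        cuts.append(t)
        t = back[t]
    return sorted(cuts)
```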
To the best of our knowledge, this is the first work to decompose music sequences into plausible emotion primitives, although learning motion primitives for visual data has already achieved significant progress [6, 9, 13, 14, 30]. One recent related work is [7], which also modeled the dynamics of music emotions over time. However, the objective and methodology in [7] differ substantially from ours: [7] aimed at performing emotion-based music retrieval via dynamic time warping [2], whereas our work aims to discover the emotion primitives of music with the proposed SDC.
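For context, dynamic time warping [2] computes a minimum-cost alignment between two sequences of possibly different lengths. The following is a standard textbook implementation for reference only; it is not the specific formulation used in [7].

```python
import numpy as np

def dtw(a, b):
    """Classic dynamic time warping distance between two 1-D sequences,
    using squared difference as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three admissible partial alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```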
The rest of this paper is organized as follows. In Section 2, we propose SDC to learn the music emotion primitives. In Section 3, we schematically illustrate the learning outcomes and statistically evaluate the performance of the proposed method on a standard dataset. Finally, Section 4 concludes the paper and outlines future work.
2. SUPERVISED DYNAMIC CLUSTERING
Most of the computational models for music emotion analysis
are based on two kinds of emotion representations: the categorical