Enhancement Methods for Optimizing Similarity Matrices in Music Audio Analysis


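"By introducing robust and scalable audio features and incorporating contextual information, the similarity matrices used in music audio analysis are enhanced so as to reduce time complexity, space complexity, and the difficulty of extracting structural information. The method significantly reduces matrix size and simplifies the structure extraction step, and it applies to tasks such as audio summarization and audio synchronization."

Similarity matrices play a central role in music audio analysis. However, their quadratic time and space complexity, together with the intricacy of extracting the desired structural information from them, often stands in the way of practical applications. The paper proposes an approach for enhancing the structural properties of similarity matrices.

First, the paper introduces a new class of audio features that absorb local temporal variations, so that the underlying similarity of two passages is still captured when the audio varies over short time spans. These robust and scalable features are aimed at real-world audio signals.

Second, the paper incorporates contextual information into the local similarity measure. Taking the neighborhood of each audio segment into account improves the quality of the similarity matrix and makes it easier to extract structural information from it.

As a result of these two contributions, the enhanced similarity matrix is significantly smaller, which lowers storage and computation requirements and simplifies the structure extraction step. The paper sketches two applications: audio summarization, which extracts key or representative segments from a long audio stream to form a short overview, and audio synchronization, which aligns multiple audio sources. For both, the enhanced matrices lead to algorithms that are effective and computationally feasible.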


ENHANCING SIMILARITY MATRICES FOR MUSIC AUDIO ANALYSIS

Meinard Müller, Frank Kurth

Department of Computer Science III, University of Bonn

Römerstr. 164, D-53117 Bonn, Germany

{meinard, frank}@cs.uni-bonn.de

ABSTRACT

Similarity matrices have become an important tool in music audio analysis. However, the quadratic time and space complexity as well as the intricacy of extracting the desired structural information from these matrices are often prohibitive with regard to real-world applications. In this paper, we describe an approach for enhancing the structural properties of similarity matrices based on two concepts: first, we introduce a new class of robust and scalable audio features which absorb local temporal variations. As a second contribution, we then incorporate contextual information into the local similarity measure. The resulting enhancement leads to a significant reduction in matrix size and also eases the structure extraction step. As an example, we sketch the application of our techniques to the problems of audio summarization and audio synchronization, obtaining effective and computationally feasible algorithms.

1. INTRODUCTION

The concept of similarity matrices has been introduced to the music context by Foote in order to visualize the time structure of audio and music [1]. The general idea is as follows: given two audio data streams, one first transforms them into sequences $V := (v^1, v^2, \ldots, v^N)$ and $W := (w^1, w^2, \ldots, w^M)$ of feature vectors $v^n \in \mathcal{F}$, $1 \le n \le N$, and $w^m \in \mathcal{F}$, $1 \le m \le M$. Here, $\mathcal{F}$ denotes a suitable feature space, e.g., a space of spectral, MFCC, or chroma vectors. Based on a suitable similarity measure $d : \mathcal{F} \times \mathcal{F} \to \mathbb{R}$, one can form a similarity matrix $S = (d(v^n, w^m))_{n,m}$ by pairwise comparison of the features $v^n$ and $w^m$. In the case that $V = W$, the resulting matrix is also referred to as a self-similarity matrix.
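The pairwise construction described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the cosine measure is only one possible choice for d (the text leaves d open), and the toy three-dimensional vectors stand in for real spectral, MFCC, or chroma features.

```python
import math


def cosine_similarity(v, w):
    """One possible similarity measure d: F x F -> R (an assumption, not the paper's choice)."""
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    if norm_v == 0.0 or norm_w == 0.0:
        return 0.0  # convention chosen here for all-zero (silent) frames
    return dot / (norm_v * norm_w)


def similarity_matrix(V, W, d=cosine_similarity):
    """Return S with S[n][m] = d(V[n], W[m]); needs O(N*M) time and space."""
    return [[d(v, w) for w in W] for v in V]


# Toy feature sequence: frame 2 repeats frame 0.
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
S = similarity_matrix(V, V)  # passing the same sequence twice gives a self-similarity matrix
```

In this toy case the repeated frame shows up as a high off-diagonal entry `S[0][2]`, which is the single-frame analogue of the diagonal paths discussed in the text.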

Similarity matrices have proven to be a valuable tool in audio analysis. In Sect. 3, we address two such analysis tasks: audio summarization and audio synchronization. The underlying principle is that similar segments are revealed as paths along diagonals in the corresponding similarity matrix. As an example, we consider the first 94 seconds of an Ormandy interpretation of Brahms' Hungarian Dance No. 5, having the musical form $A_1A_2B_1B_2$ (segment $A_2$ being a repetition of $A_1$ and $B_2$ being a repetition of $B_1$). The self-similarity matrix (with respect to some suitable audio features), shown in Fig. 1, reveals this structure: the path in the lower left corner indicates that the segment between 1 and 22 is similar to the segment between 22 and 42 (measured in seconds), whereas the curved path in the upper right corner indicates that the segment between 42 and 69 is similar to the segment between 69 and 89. Note that in the Ormandy interpretation, the tempo of $B_2$ is much faster than that of $B_1$, which is revealed by the gradient of the path encoding the relative tempo difference between the two segments.
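As a rough worked example of how a path's gradient relates to tempo, the segment boundaries quoted above for the two B segments (42 s to 69 s and 69 s to 89 s) give an average duration ratio. This is only a sketch: it assumes both segments contain the same musical material and summarizes the curved path by a single average slope, which the text does not do.

```python
def duration(start_s, end_s):
    """Length of a segment in seconds."""
    return end_s - start_s


first_b = duration(42, 69)   # first occurrence of the B material
second_b = duration(69, 89)  # its repetition

# The same material played in less time means a faster tempo, so this ratio
# approximates the average relative tempo of the repetition.
average_tempo_ratio = first_b / second_b
```

A ratio above 1 corresponds to a path whose average gradient deviates from the main-diagonal direction; a curved path, as in this recording, means the local ratio changes along the segment.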

Fig. 1. Self-similarity matrices of the first 94 seconds of an Ormandy interpretation of Brahms' Hungarian Dance No. 5. The musical form $A_1A_2B_1B_2$ is revealed by the path structure. The left side shows a matrix with a feature sampling rate of 10 Hz. The right side shows an enhanced similarity matrix ($S^{\min}_{10,2}(21, 5)$) with a feature sampling rate of 1 Hz.

There are two major problems in music audio analysis based on similarity matrices: the first problem concerns the robust extraction of suitable paths revealing the structural similarity relations between the underlying audio streams. So far, this problem has been studied under the constant tempo assumption, which typically holds for pop music, see Sect. 3.1 for references. For the case, however, that musically similar segments exhibit significant local tempo variations (as often holds for Western classical music), there are as yet no effective and efficient solutions. The second problem concerns the high time and space complexity O(NM) to compute and store the similarity matrices, which makes the usage of similarity matrices infeasible for large N and M. Here, reducing the numbers N and M by simply increasing the feature analysis window often destroys the structural properties of the similarity matrices, see Fig. 5.
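To make the O(NM) cost concrete, the following sketch counts the entries of a self-similarity matrix for the 94-second excerpt at the two feature sampling rates named in the Fig. 1 caption (10 Hz and 1 Hz). The helper name is hypothetical, and the count ignores any storage overhead per entry.

```python
def matrix_entries(duration_s, feature_rate_hz):
    """Number of entries in an N x N self-similarity matrix, with N = duration * rate."""
    n = int(duration_s * feature_rate_hz)
    return n * n


fine = matrix_entries(94, 10)   # 940 feature vectors
coarse = matrix_entries(94, 1)  # 94 feature vectors
reduction = fine // coarse      # factor by which the matrix shrinks
```

Lowering the feature rate by a factor of 10 shrinks the matrix by a factor of 100, which is why coarse features are attractive as long as the path structure survives the reduction.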

In this paper, we suggest an approach for enhancing the path structure of similarity matrices, which constitutes an important step towards a solution of the above mentioned problems. In particular, we cope with the delicate tradeoff between needing coarse and robust features on the one hand and requiring sufficient flexibility to deal with local tempo variations on the other hand. Our basic idea towards finding a good tradeoff can be summarized as follows. Instead of relying on one single mechanism, we take care of the temporal variations on various levels simultaneously: on the "feature level" (using statistical features to absorb micro-variations), on the "local distance measure level" (including flexible contextual information to account for local variations) as well as on the "path extraction level" (accounting for coarse global time variations). In Sect. 2, we describe this approach in detail and apply the techniques to the class of chroma features. In Sect. 3, we then sketch the impact of our matrix enhancement techniques to the problems of music summarization
