Modeling Temporal Tonal Relations in Polyphonic Music Through Deep Networks
with a Novel Image-Based Representation
Ching-Hua Chuan (1,2)
(1) University of North Florida
(2) University of Miami
c.chuan@miami.edu

Dorien Herremans (3,4)
(3) Singapore University of Technology and Design
(4) Institute of High Performance Computing, A*STAR, Singapore
dorien_herremans@sutd.edu.sg
Abstract
We propose an end-to-end approach for modeling polyphonic
music with a novel graphical representation, based on music
theory, in a deep neural network. Despite the success of deep
learning in various applications, it remains a challenge to in-
corporate existing domain knowledge in a network without
affecting its training routines. In this paper we present a novel
approach for predictive music modeling and music generation
that incorporates domain knowledge in its representation. In
this work, music is transformed into a 2D representation, inspired by the tonnetz from music theory, which graphically encodes musical relationships between pitches. This represen-
tation is incorporated in a deep network structure consist-
ing of multilayered convolutional neural networks (CNN, for
learning an efficient abstract encoding of the representation)
and recurrent neural networks with long short-term memory
cells (LSTM, for capturing temporal dependencies in music
sequences). We empirically evaluate the nature and the effec-
tiveness of the network by using a dataset of classical mu-
sic from various composers. We investigate the effect of pa-
rameters including the number of convolution feature maps,
pooling strategies, and three configurations of the network: LSTM without CNN, and LSTM with CNN (with and without pre-training). Visualizations of the feature maps and filters in
the CNN are explored, and a comparison is made between
the proposed tonnetz-inspired representation and pianoroll,
a commonly used representation of music in computational
systems. Experimental results show that the tonnetz represen-
tation produces musical sequences that are more tonally sta-
ble and contain more repeated patterns than sequences gen-
erated by pianoroll-based models, a finding that is directly
useful for tackling current challenges in music and AI such
as smart music generation.
Introduction
Predictive models of music have been explored by re-
searchers since the very beginning of the field of computer
music (Brooks et al. 1957). Such models are useful for ap-
plications in music analysis (Qi, Paisley, and Carin 2007);
music cognition (Schellenberg 1996); improvement of tran-
scription systems (Sigtia, Benetos, and Dixon 2016); music
generation (Herremans et al. 2015); and others. Applications
such as the latter represent various fundamental challenges
in artificial intelligence for music. In recent years, there has
been a growing interest in deep neural networks for model-
ing music due to their power to capture complex hidden re-
lationships. The launch of recent projects such as Magenta, a deep learning and music project by the Google Brain team with a focus on music generation, testifies to the importance and recent popularity of music and AI. With this project we aim to further advance the capability of deep networks to model music by proposing a novel image-based representation inspired by music theory.
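The details of our representation follow later in the paper; as a first intuition, the sketch below lays pitch classes out on a conventional tonnetz-style grid, in which horizontal neighbors are a perfect fifth apart and vertical neighbors a major third apart, so that diagonal neighbors differ by a minor third. The grid size and coordinate scheme here are illustrative assumptions, not the exact encoding used in our experiments.

    import numpy as np

    def tonnetz_slice(active_pitch_classes, rows=12, cols=12):
        """Binary tonnetz-style image for one time slice.

        Cell (r, c) holds pitch class (7*c + 4*r) mod 12: one step right
        is a perfect fifth (+7 semitones), one step up a major third (+4),
        and the up-left diagonal a minor third (-3).
        """
        grid = np.zeros((rows, cols), dtype=np.float32)
        for r in range(rows):
            for c in range(cols):
                if (7 * c + 4 * r) % 12 in active_pitch_classes:
                    grid[r, c] = 1.0
        return grid

    # A C major triad (pitch classes 0, 4, 7) forms a compact cluster
    c_major = tonnetz_slice({0, 4, 7})

Tonally related pitches thus become spatially adjacent, which is precisely the kind of local structure a convolutional layer can exploit.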
Recent deep learning projects in the field of music include the work of Eck and Schmidhuber (2002), in which a recurrent neural network (RNN) with LSTM cells is used to generate improvisations (first chord sequences, followed by monophonic melodies) for 12-bar blues. They represent music as notes whose pitches fall within a range of 25 possible pitches (C3 to C5) and that occur at fixed time intervals. The network therefore has 25 outputs, each of which is considered independently; a decision threshold of 0.5 is used to select each note as a statistically independent event in a chord.
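A minimal sketch of this independent-thresholding scheme (our illustration; the variable names and the C3 = MIDI 48 convention are assumptions, not from the original paper):

    import numpy as np

    # One time step of model output: 25 sigmoid activations, one per
    # pitch in the two-octave range C3..C5 (25 semitones inclusive).
    probs = np.random.rand(25)              # stand-in for the network's output
    active = np.flatnonzero(probs >= 0.5)   # each pitch thresholded independently
    chord = [48 + i for i in active]        # MIDI numbers, assuming C3 = 48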
More recently, a pianoroll representation of 88 keys was used by Boulanger-Lewandowski, Bengio, and Vincent (2012) to train an RNN. The authors integrate the notion of chords by placing restricted Boltzmann machines on top of an RNN to model the distribution of simultaneously played notes in the next time slice, conditioned on the previous time slice.
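For concreteness, a pianoroll slices time into fixed steps and marks, for each of the 88 piano keys, whether it sounds in each step. A minimal sketch of this encoding (our own illustration; MIDI note 21 = A0 is the lowest piano key):

    import numpy as np

    def pianoroll(notes, n_steps, n_keys=88, lowest_midi=21):
        """Binary 88-key pianoroll; notes are (midi_pitch, start, end) tuples."""
        roll = np.zeros((n_steps, n_keys), dtype=np.float32)
        for pitch, start, end in notes:
            roll[start:end, pitch - lowest_midi] = 1.0  # mark the note as sounding
        return roll

    # C major triad (C4, E4, G4) held for four of eight time steps
    roll = pianoroll([(60, 0, 4), (64, 0, 4), (67, 0, 4)], n_steps=8)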
In Huang, Duvenaud, and Gajos (2016), a chord sequence is modeled as a string of symbols. Chord embeddings are learned from a corpus using Word2vec based on the skip-gram model (Mikolov et al. 2013), so that each chord is described by its sequential context. A Word2vec approach is also used in Herremans and Chuan (2017) to model and generate polyphonic music. For a more complete overview of music generation systems, the reader is referred to Herremans, Chuan, and Chew (2017). While music can typically be represented in either audio or symbolic format, the focus of this paper is on the latter.
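As an illustration of the chord-embedding idea (a toy sketch, not the exact setup of either paper; assumes the gensim 4.x API):

    from gensim.models import Word2Vec

    # Each "sentence" is a chord progression; each chord symbol is a token.
    progressions = [
        ["C", "Am", "F", "G"],
        ["C", "F", "G", "C"],
        ["Am", "F", "C", "G"],
    ]
    model = Word2Vec(progressions, vector_size=16, window=2,
                     sg=1,          # sg=1 selects the skip-gram model
                     min_count=1, epochs=200, seed=1)
    print(model.wv.most_similar("C"))  # chords in similar contexts rank high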
The widespread adoption of deep learning in areas such as image recognition is due to the high accuracy of models when abundant data is available, and to end-to-end training, which eliminates the need for hand-crafted features. Music, however, is a domain where well-annotated datasets are relatively scarce, but one with a long history of theoretical deliberation. It is therefore important to explore how such