Application of LSTM Generative Adversarial Networks to Multi-Category MIDI Music Generation


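"This research paper explores how an LSTM-based generative adversarial network (GAN) can be used to generate multi-category MIDI music. By combining multi-layer recurrent neural networks (RNNs) with the GAN framework, the authors aim to model the rules of music theory and generate diverse pieces of music that sound good."

Music generation with neural networks has become a central topic in deep learning, especially since deep neural networks showed a strong ability to learn from large datasets. The paper proposes a new music score generation model built on multi-layer RNNs and a GAN architecture. First, MIDI sequences are fed into the model and parsed into note length, frequency, intensity, and timing information. Music theory rules are then introduced by setting the initial sequences to musical chords. During training, the model learns the distribution of the music. The experimental results show that the network structure is feasible and can generate several types of music that sound good. Keywords: music generation, generative adversarial network (GAN), recurrent neural network (RNN), MIDI, chords.

1. Introduction
Computer music generation has attracted interest since 1959, when the first music composition algorithm was proposed, and advances in technology have since made these methods more refined and sophisticated. LSTM, a special variant of the RNN, is widely used because it effectively captures long-term dependencies in sequential data. A generative adversarial network consists of two parts, a generator and a discriminator, which compete through adversarial learning so that the generator produces samples increasingly close to the real data.

2. Method
The method has two key components: the LSTM network and the GAN framework. The LSTM network learns musical structure, while the GAN provides a mechanism for evaluating and improving the quality of the generated music. During training, the LSTM generator tries to produce realistic music sequences, and the discriminator tries to distinguish these generated sequences from real MIDI sequences. Over repeated iterations the generator's output improves, with the goal of becoming hard to distinguish from real music.

3. Experiments and results
To validate the model, the researchers trained it on MIDI datasets of different music types. The results indicate that the model can generate multiple categories of music at an acceptable listening quality, supporting its effectiveness and practicality.

4. Discussion and future work
Although the model makes notable progress in music generation, there is still room for improvement. Future challenges include increasing the novelty and diversity of the generated music and incorporating more complex music-theory elements into the generation process. The interpretability of the model is another important research direction. Overall, the work offers a new perspective on music generation and shows the potential of deep learning for musical creativity. Combining LSTM and GAN may open new possibilities for automatic composition systems and provide new tools for AI in music creation.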
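The summary says music theory rules are introduced by setting the initial sequence to musical chords, but it does not say which chords are used or how they are encoded. The Python sketch below shows one possible way to build such a chord seed as lists of MIDI note numbers. It assumes triads only, the common convention that MIDI note 60 is C4, and one chord per time step; `chord_pitches`, `chord_seed`, and the example progression are hypothetical illustrations, not the authors' code.

```python
# Hypothetical sketch: build a chord-based seed sequence for a generator.
# Triad interval patterns, in semitones above the root (standard music theory).
TRIAD_INTERVALS = {
    "major": (0, 4, 7),
    "minor": (0, 3, 7),
    "diminished": (0, 3, 6),
    "augmented": (0, 4, 8),
}

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_pitches(root_name: str, quality: str, octave: int = 4) -> list[int]:
    """Return MIDI note numbers for a triad (assumes MIDI 60 == C4)."""
    root = 12 * (octave + 1) + NOTE_NAMES.index(root_name)
    return [root + step for step in TRIAD_INTERVALS[quality]]


def chord_seed(progression: list[tuple[str, str]], octave: int = 4) -> list[list[int]]:
    """Turn a chord progression into an initial sequence, one chord per time step."""
    return [chord_pitches(root, quality, octave) for root, quality in progression]


if __name__ == "__main__":
    # I-vi-IV-V in C major, used here only as an example progression.
    seed = chord_seed([("C", "major"), ("A", "minor"), ("F", "major"), ("G", "major")])
    print(seed)  # [[60, 64, 67], [69, 72, 76], [65, 69, 72], [67, 71, 74]]
```

A generator could then be conditioned on, or primed with, this sequence instead of random notes; how the paper actually feeds chords into its LSTM is not stated in this preview.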

Multi-category MIDI music generation based on LSTM Generative adversarial network

Yutian Wang, Guochen Yu, JuanJuan Cai, and Hui Wang¹,*

1. Key Laboratory of Media Audio & Video (Communication University of China), Ministry of Education

{wangyutian, yuguochen, caijuanjuan, hwang}@cuc.edu.cn

Abstract. Music generation by neural networks has become a central issue since deep neural networks demonstrated their ability to learn from big data collections. This paper proposes a music score generation model that employs multi-layer RNNs and a GAN scheme. First, MIDI sequences are passed to the model and parsed into note lengths, frequencies, intensities, and timing; music theory rules are then introduced, and the initial sequences are set to musical chords. The distribution of the music is learned during training. The experimental results show that this is a feasible network structure that can generate multi-category music with a good listening experience.
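The abstract parses MIDI sequences into note lengths, frequencies, intensities, and timing. A minimal sketch of that feature extraction follows, assuming the notes have already been decoded from a MIDI file into (pitch, velocity, start, end) records. `NoteEvent`, `note_features`, and the scaling of velocity to [0, 1] are hypothetical choices for illustration; the frequency uses the standard equal-temperament formula with A4 (MIDI note 69) tuned to 440 Hz.

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    """Hypothetical, already-decoded MIDI note; times are in seconds."""
    pitch: int      # MIDI note number, 0-127
    velocity: int   # MIDI velocity, 0-127
    start: float
    end: float


def midi_to_hz(pitch: int) -> float:
    """Equal-temperament frequency with A4 (MIDI 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12)


def note_features(notes: list[NoteEvent]) -> list[tuple[float, float, float, float]]:
    """Map each note to (length, frequency, intensity, timing).

    length    = duration in seconds
    frequency = pitch converted to Hz
    intensity = velocity scaled to [0, 1] (an assumed normalisation)
    timing    = onset time in seconds
    Notes are sorted by onset so the output is a time-ordered sequence.
    """
    ordered = sorted(notes, key=lambda n: n.start)
    return [
        (n.end - n.start, midi_to_hz(n.pitch), n.velocity / 127, n.start)
        for n in ordered
    ]


if __name__ == "__main__":
    melody = [NoteEvent(64, 100, 0.5, 1.0), NoteEvent(69, 127, 0.0, 0.5)]
    for features in note_features(melody):
        print(features)
```

Sorting by onset matters because a sequence model such as an LSTM consumes the features step by step in time order.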

Keywords: Music generation, GAN, RNN, MIDI, chords.

1 Introduction

Computers have been used to generate music since 1959, when the first music composition algorithm was proposed [1]. In recent years, with the rise of deep learning, the ability of neural networks to learn from big data has led researchers to apply DNNs to music generation [2]. A large number of deep neural network models for music generation have been proposed, most of which use RNNs and their modified architectures [3,4,5,6,7], presumably because music generation is inherently a sequence generation task [7]. Music generation is mainly divided into symbolic-domain generation (generating MIDI files) and audio-domain generation (generating WAV files). Well-known examples include the MelodyRNN models [5] for symbolic-domain generation and the SampleRNN model [6] for audio-domain generation. Generally, these models encode musical features (such as pitch and frequency) into latent codes and generate a reasonably realistic music sequence by decoding these codes. However, the encoder-decoder architecture has unavoidable shortcomings. For example, the selected latent codes may not be ideal representations, and some crucial information is likely lost during dimensionality reduction [8].
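To make symbolic-domain input concrete, the sketch below builds a binary piano roll, one widely used way of turning MIDI notes into a matrix that a neural network can consume: rows are time steps and columns are the 88 piano keys, MIDI 21 (A0) through 108 (C8). This is an illustration under assumptions; the paper's own model is described as working from note-level features, not necessarily a piano roll, and `piano_roll` and its (pitch, start_step, end_step) input format are hypothetical.

```python
LOWEST_KEY = 21  # MIDI note number of A0, the lowest key on an 88-key piano
NUM_KEYS = 88    # covers MIDI 21 (A0) through 108 (C8)


def piano_roll(notes: list[tuple[int, int, int]], num_steps: int) -> list[list[int]]:
    """Return a num_steps x 88 binary matrix (1 = key sounding at that step).

    Each note is (pitch, start_step, end_step) with end_step exclusive.
    Pitches outside the piano range are skipped.
    """
    roll = [[0] * NUM_KEYS for _ in range(num_steps)]
    for pitch, start, end in notes:
        key = pitch - LOWEST_KEY
        if not 0 <= key < NUM_KEYS:
            continue
        for t in range(max(start, 0), min(end, num_steps)):
            roll[t][key] = 1
    return roll


if __name__ == "__main__":
    # C major triad for two steps, then a single G for two steps.
    notes = [(60, 0, 2), (64, 0, 2), (67, 0, 2), (67, 2, 4)]
    roll = piano_roll(notes, num_steps=4)
    for step, row in enumerate(roll):
        active = [LOWEST_KEY + k for k, on in enumerate(row) if on]
        print(step, active)
```

Each printed row lists the MIDI notes sounding at that step. A fixed 88-column grid keeps every pitch explicit, at the cost of a sparse, high-dimensional input that encoder-decoder models then compress into latent codes.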

Generative adversarial networks (GANs) are designed to generate realistic data and have been widely acclaimed [9]. This framework can closely fit the true data distribution without needing Markov chains for repeated sampling [10]. Moreover, a GAN can generate data without explicit inference during learning and avoids the need to compute intractable probabilities.
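For reference, the standard GAN objective from the original formulation is min_G max_D V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]. The Python sketch below turns that objective into per-batch losses computed from the discriminator's output probabilities. It is a generic illustration, not the loss reported in this paper; it also shows the non-saturating generator loss, a widely used variant that replaces log(1 - D(G(z))) with -log D(G(z)). The function names and the `EPS` clamp are hypothetical choices.

```python
import math

EPS = 1e-7  # keeps log() finite when D outputs exactly 0 or 1


def _clamp(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def discriminator_loss(d_real: list[float], d_fake: list[float]) -> float:
    """Binary cross-entropy for D: -(mean log D(x) + mean log(1 - D(G(z))))."""
    real_term = sum(math.log(_clamp(p)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - _clamp(p)) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss_minimax(d_fake: list[float]) -> float:
    """Original minimax form: G minimises mean log(1 - D(G(z)))."""
    return sum(math.log(1.0 - _clamp(p)) for p in d_fake) / len(d_fake)


def generator_loss_non_saturating(d_fake: list[float]) -> float:
    """Non-saturating variant: G minimises -mean log D(G(z))."""
    return -sum(math.log(_clamp(p)) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # A confused discriminator (0.5 everywhere) versus a confident one.
    print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
    print(discriminator_loss([0.9, 0.95], [0.1, 0.05]))
```

When D outputs 0.5 for every sample (it cannot tell real from generated), `discriminator_loss` equals 2 log 2, about 1.386, which is the value at the theoretical equilibrium. In an LSTM-based GAN such as the one described here, `d_fake` would be D's scores for sequences produced by the LSTM generator.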

Recurrent neural networks (RNNs) are widely used to model sequential data [11] and work well for modelling text and audio sequences [12]. In 2002, Eck modeled blues songs with 25 discrete tone values [13]. Another work combined RNNs with restricted Boltzmann machines to represent 88 distinct tones [14]. Just two years ago, Daniel Johnson used
