
Multi-emotional single-track music generation model based on LSTM
WANG Xicheng, LI Wei
(Institute of Network Technology, Beijing University of Posts and Telecommunications)
Brief author introduction: WANG Xicheng (1995-), male, graduate student; research direction: cloud computing and service-oriented computing.
Corresponding author: LI Wei, male, associate professor; research direction: business network intelligence. E-mail: liwei@bupt.edu.cn
Abstract: With the popularity of short video platforms, it has become very common for users to create and share videos. As an integral part of short videos, background music plays an important role in emotional expression. However, the background music currently available on short video platforms is limited in variety, and its use also raises copyright issues. In this paper, a multi-emotional single-track music generation model is proposed by improving an existing music generation model. By analyzing the advantages and disadvantages of the original network and the lookback mechanism, and taking the actual application scenario into account, the LB-Attention model is proposed. Note position information, musical emotion information, and an attention mechanism are introduced into the model to meet the requirements of the application scenario. A comparison of the generated results and performance indicators of the original model and the proposed model shows that the proposed model produces music of good quality. The performance of the LB-Attention model is similar to that of the original model and can basically meet the needs of the application scenario.
Key words: machine learning; short videos; music generation; attention mechanism
0 Introduction
In recent years, with the popularity of mobile devices and the expansion of wireless network coverage, short video platforms have gradually emerged. Beyond traditional content such as pictures and text, it has become increasingly common for users to share their lives through videos. In video creation, background music is an indispensable element for setting the mood and conveying emotion. However, background music on short video platforms suffers from several problems. On the one hand, popular tracks are reused by large numbers of users, which leads to a lack of uniqueness. On the other hand, copyright disputes caused by the casual use of popular music are an issue that platforms cannot avoid. As a result, users face considerable limitations in their choice of background music.
Automatic music generation by computer has a long history. According to the form of the output, music generation models can be roughly divided into two types: models that directly generate music waveform files, and models that generate symbolic data such as MIDI. Although the former can also be used to generate music, it is limited by issues of sound quality and file size; such models are better suited to audio generation problems such as speech synthesis, whose content cannot be represented symbolically. The latter, thanks to the compactness of symbolic notation, is more often used for music generation.
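As a minimal illustration of why the symbolic route is compact, the sketch below encodes a three-note melody as discrete MIDI note events. The choice of the pretty_midi library and the toy melody are assumptions for illustration only; the paper does not prescribe a specific toolkit.

```python
import pretty_midi

# Symbolic (MIDI) music data: each note is a discrete event with a
# pitch, velocity, and start/end time, rather than raw audio samples.
pm = pretty_midi.PrettyMIDI()
piano = pretty_midi.Instrument(program=0)  # program 0 = Acoustic Grand Piano

# A short C-major arpeggio: (MIDI pitch, start_sec, end_sec); pitch 60 = middle C.
for pitch, start, end in [(60, 0.0, 0.5), (64, 0.5, 1.0), (67, 1.0, 1.5)]:
    piano.notes.append(
        pretty_midi.Note(velocity=100, pitch=pitch, start=start, end=end))

pm.instruments.append(piano)
pm.write('example.mid')  # a few hundred bytes, versus megabytes of raw waveform
```

The same melody stored as uncompressed 44.1 kHz audio would occupy roughly 170 KB per second of 16-bit stereo samples, which is the size and quality trade-off noted above.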
Building on existing models for generating symbolic music data, this paper improves them to meet short video platforms' demand for multi-emotional music. While optimizing the generation quality of the model, emotion and style information is added so that users can generate background music whose emotion matches the video content.
1 Related Works
Research in music generation can be traced back to the 18th century in Europe. In recent