
Multi-emotional single-track music generation model based on LSTM
WANG Xicheng, LI Wei
(Institute of Network Technology, Beijing University of Posts and Telecommunications)
Brief author introduction: WANG Xicheng (1995-), male, graduate student; research direction: cloud computing and service-oriented computing.
Corresponding author: LI Wei, male, associate professor; research direction: business network intelligence. E-mail: liwei@bupt.edu.cn
Abstract: With the popularity of short video platforms, it has become very common for users to create and share videos. As an integral part of short videos, background music plays an important role in emotional expression. However, the background music currently available on short video platforms is limited in variety, and its use also raises copyright issues. In this paper, a multi-emotional single-track music generation model is proposed by improving an existing music generation model. By analyzing the advantages and disadvantages of the original network and the lookback mechanism, and taking the actual application scenario into account, the LB-Attention model is proposed. Note position information, musical emotion information, and an attention mechanism are introduced into the model to meet the requirements of the application scenario. A comparison of the generated results and performance indicators of the original model and the proposed model shows that the proposed model produces music of good quality. The performance of the LB-Attention model is similar to that of the original model and can basically meet the needs of the application scenario.
Key words: machine learning; short videos; music generation; attention mechanism
0 Introduction
In recent years, with the popularity of mobile devices and the expansion of wireless network coverage, short video platforms have gradually emerged. Beyond traditional content such as pictures and text, it has become increasingly common for users to share their lives through videos. In video creation, background music is an indispensable element for setting the mood and conveying emotion. However, background music on short video platforms suffers from several problems. On the one hand, popular tracks are reused by large numbers of users, which leads to a lack of uniqueness. On the other hand, copyright disputes caused by the casual use of popular music are an issue that platforms cannot avoid. As a result, users face considerable limitations in their choice of background music.
Automatic music generation by computer has a long history. According to the form of the output, music generation models can be roughly divided into two types: models that directly generate music waveform files, and models that generate symbolic data such as MIDI. Although the former can also be used to generate music, it is limited by issues of sound quality and file size; such models are better suited to audio generation problems such as speech synthesis, whose content cannot be represented symbolically. The latter, thanks to the compactness of symbolic notation, is more often used for music generation.
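As a minimal illustration of why the symbolic route is compact, the sketch below encodes a three-note melody as discrete MIDI note events. The choice of the pretty_midi library and the toy melody are assumptions for illustration only; the paper does not prescribe a specific toolkit.

```python
import pretty_midi

# Symbolic (MIDI) music data: each note is a discrete event with a
# pitch, velocity, and start/end time, rather than raw audio samples.
pm = pretty_midi.PrettyMIDI()
piano = pretty_midi.Instrument(program=0)  # program 0 = Acoustic Grand Piano

# A short C-major arpeggio: (MIDI pitch, start_sec, end_sec); pitch 60 = middle C.
for pitch, start, end in [(60, 0.0, 0.5), (64, 0.5, 1.0), (67, 1.0, 1.5)]:
    piano.notes.append(
        pretty_midi.Note(velocity=100, pitch=pitch, start=start, end=end))

pm.instruments.append(piano)
pm.write('example.mid')  # a few hundred bytes, versus megabytes of raw waveform
```

The same melody stored as uncompressed 44.1 kHz audio would occupy roughly 170 KB per second of 16-bit stereo samples, which is the size and quality trade-off noted above.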
Building on existing models for generating symbolic music data, this paper improves them to meet short video platforms' demand for multi-emotional music. While optimizing the generation quality of the model, emotion and style information is added so that users can generate background music whose emotion matches the video content.
1 Related Works
Research in music generation can be traced back to the 18th century in Europe. In recent