Applying PAD Three Dimensional Emotion Model to Convert Prosody of Emotional Speech

Xiaoyong Lu¹,³, Hongwu Yang²*, Aibao Zhou¹
¹College of Psychology, Northwest Normal University, Lanzhou
²College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou
³College of Computer Science and Engineering, Northwest Normal University, Lanzhou
Email: yanghw@nwnu.edu.cn
Abstract—Happiness has attracted much attention from researchers in various fields. This paper realizes prosodic conversion of emotional speech for happiness computing in speech communication. An emotional speech corpus comprising 11 kinds of typical emotional utterances is designed, where each utterance is labeled with its PAD values in a psychological sense. A five-scale tone model is employed to model the pitch contour of emotional utterances at the syllable level. A generalized regression neural network (GRNN) based prosody conversion model is built to realize the transformation of the pitch contour, duration and pause duration of emotional utterances, in which the PAD values of the emotion and context parameters are adopted to predict the prosodic features. Emotional utterances are then re-synthesized with the STRAIGHT algorithm by modifying pitch contour, duration and pause duration. Experimental results on the Emotional Mean Opinion Score (EMOS) demonstrate that speech converted by the proposed method can express the corresponding feelings.

Index Terms—happiness, PAD emotion model, five-scale tone model, generalized regression neural network (GRNN), STRAIGHT, prosody conversion.
I. INTRODUCTION
Happiness is an eternal pursuit of human beings and the ultimate goal of social development. The basic outline of happiness includes not only a cognitive component but also an emotional component [1]. Since happiness is one of the most important aspects of human communication, it has been a hot topic in human-computer speech communication, including speech synthesis and speech recognition. A speech synthesis system can synthesize human-like utterances. Though current speech synthesis systems are generally accepted by users for their intelligibility and naturalness, synthetic speech is primarily presented to users with a neutral intonation that lacks rich emotional expression. Therefore, high-performance speech synthesis has become a hot research topic in speech engineering in recent years [2]. Emotional speech synthesis mainly adopts synthesis methods based on the Hidden Markov Model (HMM) [3] and large-corpus based concatenation methods [4]. Although the former can use speaker adaptation transforms [5-6] to realize emotional speech synthesis, the quality of the synthesized speech is hardly acceptable to users. Though speech synthesis by large-corpus based concatenation can achieve high naturalness, it is very difficult to record corpora for different emotions. Therefore, some studies proposed methods to realize emotional speech synthesis through prosody conversion. Four basic emotions are selected in [7] to realize the conversion of the related prosodic characteristics of emotional speech. The PAD three dimensional emotion model is also employed [8-9] to obtain synthetic emotional speech. Emotional speech conversion has also been achieved using the PAD emotion model [10], and SVR has been used to predict emotional prosody parameters [11]. However, these studies lack modeling of the fundamental frequency contour.
In order to convert the F0 envelope in emotional speech conversion, this paper builds a text corpus of 11 kinds of typical emotions and records the corresponding speech corpus. The PAD values of the speech corpus are labeled with a psychological method. We also build a syllabic F0 model with the five-scale tone model [12]. A model for predicting the prosodic parameters of emotional speech is constructed with a generalized regression neural network (GRNN). The model predicts the prosodic features of the target emotional speech according to the PAD values and contextual features of sentences. Finally, the STRAIGHT [13] algorithm is exploited to achieve emotional speech conversion. The experimental results show that the converted speech can express the target emotion.
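The core of a GRNN, in Specht's formulation, is a Gaussian-kernel weighted average of training targets, which makes the prediction step very compact. The following is a minimal illustrative sketch of how PAD values could map to a prosodic feature; the feature values, target quantity (a hypothetical mean-F0 scaling factor), and smoothing parameter sigma are assumptions for illustration, not the paper's actual data or configuration.

```python
import numpy as np

def grnn_predict(X_train, y_train, x, sigma=0.5):
    """GRNN prediction: a Gaussian-kernel weighted average of the
    training targets, weighted by distance from the query input."""
    # Squared Euclidean distances between the query and all training inputs
    d2 = np.sum((X_train - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))   # pattern-layer activations
    return np.dot(w, y_train) / np.sum(w)  # summation / output layers

# Toy example: inputs are (P, A, D) values in [-1, 1], the target is a
# hypothetical mean-F0 scaling factor (all values are illustrative).
X = np.array([[0.8, 0.6, 0.4],    # e.g. a happy utterance
              [-0.6, 0.4, -0.3],  # e.g. a fearful utterance
              [0.0, 0.0, 0.0]])   # neutral
y = np.array([1.3, 1.1, 1.0])

# A query near the "happy" point yields a prediction near its target
print(grnn_predict(X, y, np.array([0.7, 0.5, 0.3])))
```

Because the prediction is a smooth interpolation over stored training pairs, a GRNN needs no iterative training, which suits the relatively small emotional corpora described here; only sigma must be tuned.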
II. PAD THREE DIMENSIONAL EMOTION MODEL
The main methods for describing emotion [14] include categorical representation and dimensional representation. Since the categorical representation has difficulty describing mixed emotions, this paper adopts the PAD three dimensional emotion model to describe emotional speech.

The PAD three dimensional emotion model [15] is composed of three dimensions: 1) Pleasure-Displeasure, which denotes the positive or negative character of the emotional state; 2) Arousal-Nonarousal, which denotes the level of psychological activation and alertness; 3) Dominance-Submissiveness, which denotes the degree of control over, or influence by, others and the external environment.
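The three dimensions above can be represented as a point in a cube with each axis in [-1, 1]. As a minimal sketch, the mapping below assumes a 9-point rating scale per axis (a common choice for PAD annotation instruments); the paper's actual rating scale and any example values here are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class PAD:
    """A point in PAD space; each axis lies in [-1, 1]."""
    pleasure: float   # Pleasure-Displeasure: positive vs. negative state
    arousal: float    # Arousal-Nonarousal: activation / alertness level
    dominance: float  # Dominance-Submissiveness: control vs. being controlled

def from_nine_point(p, a, d):
    """Map hypothetical 9-point ratings (1..9, midpoint 5) to [-1, 1]."""
    scale = lambda r: (r - 5.0) / 4.0
    return PAD(scale(p), scale(a), scale(d))

# e.g. a pleasant, fairly aroused, mildly dominant state
print(from_nine_point(8, 7, 6))
```

A dimensional label of this kind lets one utterance carry a mixture of emotions as a single continuous vector, which is exactly what the categorical representation cannot express.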
A. Text corpus
We select one or two common emotions that represent each quadrant of the PAD three dimensional space. These common emotions comprise 11 categories
978-1-4799-6284-6/14/$31.00 © 2014 IEEE