An Efficient Speaker Identification Method Based on FrFT and an RBF Neural Network


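This paper presents a speaker identification method that combines optimized fractional Fourier transform (FrFT) spectrograms with a radial basis function (RBF) neural network. The authors, Penghua Li et al., are affiliated with the College of Automation at Chongqing University of Posts and Telecommunications and with a key laboratory of the National Institute of Automotive Engineering.

The core technique is FrFT-based spectrogram generation. The FrFT generalizes the Fourier transform to fractional orders; by varying the transform order, it can describe a speech signal in finer detail than the conventional Fourier transform. This makes it well suited to time-varying, non-stationary signals and helps improve identification accuracy.

To reduce the computational cost of later stages, each FrFT spectrogram is converted into a low-dimensional vector by the local binary patterns (LBP) operator. LBP is a simple and effective image descriptor: it encodes the gray-level differences between each pixel and its neighbors, which preserves local structure while reducing dimensionality and speeding up the search.

A particle swarm optimization (PSO) algorithm searches for the optimal spectrogram. PSO imitates the foraging behavior of a bird flock: each particle uses its position and velocity to explore the search space for a global optimum. The fitness function combines between-class distances and within-class distances to measure how well a candidate spectrogram separates different speakers.

The method thus pairs the fine time-frequency description of the FrFT with the nonlinear mapping ability of the RBF network, whose compact structure lets it learn a speaker model from the low-dimensional LBP features. According to the paper, selecting optimized spectrograms and training the RBF network on them improves speaker identification performance while reducing the computational burden.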
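The LBP encoding and the class-separability idea behind the fitness function can be sketched as follows. This is only an illustration under stated assumptions: the basic 3x3 LBP variant, the 256-bin histogram, and the ratio form of the fitness are choices made here, and the names `lbp_histogram` and `separability_fitness` are hypothetical. The excerpt does not give the paper's exact operator settings or fitness formula.

```python
import numpy as np

def lbp_histogram(image):
    """Basic 3x3 LBP: threshold the 8 neighbours against the centre pixel
    and return a normalised 256-bin histogram of the resulting codes."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    # Neighbour offsets (row, col), clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def separability_fitness(vectors, labels):
    """Assumed fitness: mean distance between class centroids divided by
    the mean distance of samples to their own centroid (larger is better).
    Requires at least two classes."""
    vectors = np.asarray(vectors, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([vectors[labels == c].mean(axis=0) for c in classes])
    within = np.mean([
        np.linalg.norm(vectors[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    pairwise = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    n = len(classes)
    between = pairwise.sum() / (n * (n - 1))  # mean over ordered pairs i != j
    return between / (within + 1e-12)

# Example: two toy "spectrograms" per speaker, reduced to LBP vectors.
rng = np.random.default_rng(0)
images = [rng.random((32, 32)) for _ in range(4)]
features = [lbp_histogram(im) for im in images]
score = separability_fitness(features, [0, 0, 1, 1])
```

In a PSO search, a score of this kind would be evaluated for each candidate set of spectrograms, and the swarm would move toward candidates with higher values.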

Speaker Identification Using FrFT-based Spectrogram and RBF Neural Network

Penghua Li, Yuanyuan Li, Dechao Luo, Hongping Luo

1. Automotive Electronics Engineering Research Center, College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China

E-mail: lipenghua88@163.com

2. Key Laboratory of Vehicle Emission and Economizing Energy, National Institute of Automotive Engineering, Chongqing, 400039, China

Abstract: This paper addresses a speaker identification problem using optimized spectrograms and a radial basis function (RBF) neural network. The proposed approach applies the fractional Fourier transform (FrFT) to obtain spectrograms with different orders, which gives a much more refined description of the speech signals. To reduce the computational complexity, these spectrograms are converted into low-dimensional vectors by the local binary patterns (LBP) operator. The LBP vectors compose the search space of a particle swarm optimization (PSO) algorithm, which is designed to find the optimal spectrogram. The fitness function of the PSO algorithm is built from between-class distances and within-class distances. Once the optimal LBP vectors are obtained, a similarity criterion is used to find the fractional orders corresponding to the optimal spectrograms. Then, the optimal speech features are fed to the RBF network for training and testing. Numerical experiments indicate that our approach achieves an acceptable recognition rate with high accuracy.

Key Words: Speaker Identification, Spectrogram, Fractional Fourier Transform, Radial Basis Function Neural Network

1 INTRODUCTION

Applying spectrograms to identify an unknown speaker among several speakers, an approach that originated at Bell Laboratories, has long attracted widespread attention from researchers [1].

The traditional spectrogram, created by the short-time Fourier transform (STFT) or the fast Fourier transform (FFT), is a simple but effective way to achieve good speaker identification performance. However, these spectrogram-based identification techniques rest on the hypothesis that speech signals are short-time stationary, which makes them unsuitable for processing signals whose frequencies vary with time [2].

In fact, the STFT provides only the averaged properties of a speech signal. Specifically, short-time window analysis approximates rapidly changing formant trajectories as collective states. In addition, short-time analysis flattens the formants when the window closes dullnes signal, so many features of the speech signal are ignored and the fine structure of speech cannot be seen. Moreover, the texture features contained in the spectrogram are blurred by environmental noise during sampling. To investigate higher-resolution analysis of spectrograms extracted from non-stationary signals, the signal processing community has paid considerable attention to the fractional Fourier transform (FrFT) in recent years [3][4].

This work is jointly supported by the National Natural Science Foundation (61403053), the Youth Science and Technology Innovation Talents Project of Chongqing (cstc2013kjrc-qnrc40005, CSTC2013kjrc-tdjs40010), and the Science and Technology Project of Chongqing Municipal Education Commission (KJ1400404).
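For reference, the fractional Fourier transform mentioned in the introduction is commonly defined as below. The notation here (order p, rotation angle α, kernel K_α) follows the standard textbook convention and is an assumption; the excerpt does not show which symbols the paper itself uses.

```latex
X_p(u) = \int_{-\infty}^{\infty} x(t)\, K_\alpha(t,u)\, dt,
\qquad \alpha = \frac{p\pi}{2},

K_\alpha(t,u) =
\begin{cases}
\sqrt{\dfrac{1 - j\cot\alpha}{2\pi}}\,
\exp\!\left( j\,\dfrac{t^{2}+u^{2}}{2}\cot\alpha - j\,ut\csc\alpha \right),
  & \alpha \neq k\pi, \\[2ex]
\delta(t-u), & \alpha = 2k\pi, \\[1ex]
\delta(t+u), & \alpha = (2k+1)\pi,
\end{cases}
\qquad k \in \mathbb{Z}.
```

With p = 1 (α = π/2) the kernel reduces to e^{-jut}/√(2π), the ordinary Fourier transform, and with p = 0 the transform returns the signal unchanged. Intermediate orders give the family of representations from which spectrograms "with different orders" can be built.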

In this paper, we present a speech feature extraction approach that uses the PSO algorithm to search for the optimal FrFT-based spectrograms. The speech signals are processed by the FrFT to obtain spectrograms with different orders. These FrFT-based spectrograms give a much more refined description of the speech signals and compose the search space of the particle swarm optimization (PSO) algorithm. To reduce the computational complexity, the spectrograms are converted into low-dimensional vectors by the local binary patterns (LBP) operator. After the optimal LBP vectors are obtained, a similarity criterion is exploited to find the fractional orders corresponding to the optimal spectrograms. Once the radial basis function (RBF) neural network has been configured, these optimal speech features are fed to the network for training and testing. Comparative experiments, which use FrFT-based spectrograms and FFT-based spectrograms to identify the speakers, are carried out to evaluate the performance of the proposed approach.
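The final stage, feeding feature vectors to an RBF network for training and testing, might look like the minimal sketch below. It assumes Gaussian basis functions, centres picked directly from the training samples, a single shared width, and output weights fitted by linear least squares. None of these choices is stated in the excerpt, and the class name `RBFNetwork` is hypothetical.

```python
import numpy as np

class RBFNetwork:
    """Gaussian RBF hidden layer followed by a linear output layer."""

    def __init__(self, centers, width):
        self.centers = np.asarray(centers, dtype=float)  # (n_hidden, n_features)
        self.width = float(width)                        # shared Gaussian width
        self.weights = None                              # (n_hidden, n_classes)

    def _hidden(self, X):
        # Squared Euclidean distance from every sample to every centre.
        X = np.asarray(X, dtype=float)
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def fit(self, X, labels, n_classes):
        # One-hot targets; output weights solved in the least-squares sense.
        H = self._hidden(X)
        T = np.eye(n_classes)[np.asarray(labels)]
        self.weights, *_ = np.linalg.lstsq(H, T, rcond=None)
        return self

    def predict(self, X):
        # The speaker with the largest output activation wins.
        return np.argmax(self._hidden(X) @ self.weights, axis=1)

# Toy usage: two "speakers" with well-separated 2-D feature vectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(3.0, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
net = RBFNetwork(centers=X[::5], width=1.0).fit(X, y, n_classes=2)
predicted = net.predict(X)
```

In the setting described above, `X` would hold the LBP vectors of the selected spectrograms and `y` the speaker indices. The centre selection and width would normally be tuned rather than fixed as they are here.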

2 FRFT-BASED SPECTROGRAM

The STFT-based spectrogram is a simple but effective time-frequency analysis tool that reveals much information closely related to phonetic features. The STFT of a speech signal x(n) is defined as

$$X(n,\omega)=\sum_{m=-\infty}^{\infty} x(m)\,w(n-m)\,e^{-j\omega m} \tag{1}$$

where X(n, ω) denotes the frame-based signal conversion, and w(n) is the short-time window function which sliding

Proceedings of the 34th Chinese Control Conference, July 28-30, 2015, Hangzhou, China, p. 3674
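Equation (1) can be evaluated directly on a grid of discrete frequencies, which gives a conventional STFT spectrogram for comparison with the FrFT-based ones. The sketch below is an illustration only: the function names, the finite window supported on 0..L-1, the hop size, and the frequency grid ω_k = 2πk/n_freq are all assumptions introduced here.

```python
import numpy as np

def stft_direct(x, window, hop, n_freq):
    """Evaluate X(n, w_k) = sum_m x(m) * window(n - m) * exp(-j * w_k * m)
    at w_k = 2*pi*k/n_freq for frame positions n = L-1, L-1+hop, ..."""
    x = np.asarray(x, dtype=float)
    window = np.asarray(window, dtype=float)
    L = len(window)
    omegas = 2.0 * np.pi * np.arange(n_freq) / n_freq
    positions = range(L - 1, len(x), hop)
    X = np.zeros((len(positions), n_freq), dtype=complex)
    for i, n in enumerate(positions):
        m = np.arange(n - L + 1, n + 1)      # samples where window(n - m) != 0
        frame = x[m] * window[n - m]         # x(m) * window(n - m)
        X[i] = np.exp(-1j * np.outer(omegas, m)) @ frame
    return X

def spectrogram(x, window, hop, n_freq):
    """Squared magnitude of the STFT, one row per frame."""
    return np.abs(stft_direct(x, window, hop, n_freq)) ** 2

# Example: a pure tone centred on frequency bin 8 of a 64-point grid.
N = 64
t = np.arange(512)
tone = np.cos(2.0 * np.pi * 8 * t / N)
S = spectrogram(tone, np.hanning(N), hop=32, n_freq=N)
```

The direct sum is written for readability rather than speed; in practice each frame would be transformed with an FFT, which yields the same magnitudes.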
