Speaker Identification Using FrFT-based Spectrogram and RBF Neural Network
Penghua Li¹, Yuanyuan Li¹, Dechao Luo², Hongping Luo¹
1. Automotive Electronics Engineering Research Center, College of Automation, Chongqing University of Posts and
Telecommunications, Chongqing, 400065, China
E-mail: lipenghua88@163.com
2. Key Laboratory of Vehicle Emission and Economizing Energy, National Institute of Automotive Engineering,
Chongqing, 400039, China
Abstract: This paper addresses a speaker identification problem using optimized spectrograms and a radial basis function (RBF) neural network. The proposed approach applies the fractional Fourier transform (FrFT) to obtain spectrograms of different orders, which give a much more refined description of the speech signals. To reduce the computational complexity, these spectrograms are converted into low-dimensional vectors by the local binary patterns (LBP) operator. The LBP vectors form the search space of a particle swarm optimization (PSO) algorithm designed to find the optimal spectrogram. The fitness function of the PSO algorithm is built from between-class and within-class distances. After the optimal LBP vectors are obtained, a similarity criterion is used to find the fractional orders corresponding to the optimal spectrograms. The optimal speech features are then fed to the RBF network for training and testing. Numerical experiments indicate that the proposed approach achieves an acceptable recognition rate with high accuracy.
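The search step summarized above can be illustrated with a generic global-best PSO followed by a similarity lookup that maps the optimum back to a candidate. This is a minimal sketch under stated assumptions: the inertia and acceleration constants are common textbook values rather than the paper's settings, cosine similarity stands in for the paper's unspecified similarity criterion, and `pso_maximize` and `nearest_by_cosine` are hypothetical helper names.

```python
import numpy as np


def pso_maximize(fitness, lower, upper, n_particles=20, n_iter=100, seed=0,
                 w=0.7298, c1=1.49618, c2=1.49618):
    """Global-best particle swarm optimization (maximization) inside box bounds.

    The constants w, c1, c2 are assumed textbook values, not the paper's.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmax()].copy()
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + pull toward each particle's best + pull toward the swarm's best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([fitness(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmax()].copy()
    return gbest, pbest_val.max()


def nearest_by_cosine(target, candidates):
    """Index of the candidate vector with the highest cosine similarity to target."""
    C = np.asarray(candidates, dtype=float)
    t = np.asarray(target, dtype=float)
    sims = C @ t / (np.linalg.norm(C, axis=1) * np.linalg.norm(t) + 1e-12)
    return int(sims.argmax())


if __name__ == "__main__":
    # Toy fitness (hypothetical): peak at 1.1 inside the search box [0.5, 1.5].
    best, value = pso_maximize(lambda p: -np.sum((p - 1.1) ** 2), [0.5], [1.5])
    print(best, value)
```

In the paper's setting, `candidates` would presumably hold the LBP vectors of the spectrograms at each fractional order, so the index returned by `nearest_by_cosine` would identify the corresponding order.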
Key Words: Speaker Identification, Spectrogram, Fractional Fourier Transform, Radial Basis Function Neural Network
1 INTRODUCTION
Using spectrograms to identify an unknown speaker among several candidate speakers, an idea that originated at Bell Laboratories, has long attracted widespread attention from researchers [1].
The traditional spectrogram, computed by the short-time Fourier transform (STFT) or the fast Fourier transform (FFT), is a simple but effective tool for achieving good speaker identification performance. However, these spectrogram-based identification techniques rest on the hypothesis that speech signals are short-time stationary, so they are not suitable for processing signals whose frequencies vary with time [2].
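For reference, a traditional STFT magnitude spectrogram can be sketched in a few lines of NumPy. The Hann window, the 256-sample frame and the 128-sample hop are illustrative assumptions, not values from the paper, and `stft_spectrogram` is a hypothetical name. Each frame is treated as short-time stationary, which is exactly the hypothesis questioned above.

```python
import numpy as np


def stft_spectrogram(x, frame_len=256, hop=128):
    """Magnitude spectrogram of a real signal using a sliding Hann window.

    Frame length, hop size and window type are illustrative assumptions.
    Returns an array of shape (frame_len // 2 + 1, number of frames).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # One windowed frame per row; each frame is assumed approximately stationary.
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One FFT per frame gives the frequency content at that time position.
    return np.abs(np.fft.rfft(frames, axis=1)).T


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    S = stft_spectrogram(np.sin(2 * np.pi * 1000 * t))  # 1 kHz test tone
    print(S.shape, S[:, 0].argmax() * fs / 256)  # peak frequency in Hz
```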
This work is jointly supported by the National Natural Science Foundation (61403053), the Youth Science and Technology Innovation Talents Project of Chongqing (cstc2013kjrc-qnrc40005, CSTC2013kjrc-tdjs40010), and the Science and Technology Project of Chongqing Municipal Education Commission (KJ1400404).

In fact, the STFT provides only the averaged properties of a speech signal. Specifically, short-time window analysis approximates rapidly changing formant trajectories as collective states. In addition, short-time analysis flattens the formants when the window closes over a dull signal segment, so many features of the speech signal are ignored and its fine structure cannot be seen. Moreover, the texture features contained in the spectrogram are blurred by environmental noise during sampling. To achieve higher-resolution spectrogram analysis of non-stationary signals, the signal processing community has devoted considerable attention to the fractional Fourier transform (FrFT) in recent years [3][4].
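To make the FrFT concrete, the sketch below evaluates the standard continuous FrFT of order a, X_a(u) = sqrt(1 - j cot α) ∫ x(t) exp(jπ(u² cot α - 2ut csc α + t² cot α)) dt with α = aπ/2, by a direct Riemann sum. This section does not specify how the authors discretize the FrFT, so the centered grid t_n = n/√N, the even-length restriction and the name `frft` are assumptions. It is an O(N²) sampling-type approximation, not an exactly unitary discrete FrFT.

```python
import numpy as np


def frft(x, a):
    """Order-a fractional Fourier transform by a direct Riemann sum of its kernel.

    Assumes an even number of samples on the centered grid
    t_n = n / sqrt(N), n = -N/2, ..., N/2 - 1, with the output on the same grid.
    Accuracy degrades as a approaches an even integer (the chirp becomes very
    steep), so the exact orders 0 and 2 (mod 4) are handled separately.
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N % 2:
        raise ValueError("this sketch assumes an even number of samples")
    a = float(a) % 4.0
    if np.isclose(a, 0.0) or np.isclose(a, 4.0):
        return x.copy()  # order 0: identity
    if np.isclose(a, 2.0):
        return np.concatenate([x[:1], x[:0:-1]])  # order 2: x(-t) on the grid
    alpha = a * np.pi / 2
    cot = np.cos(alpha) / np.sin(alpha)
    csc = 1.0 / np.sin(alpha)
    dt = 1.0 / np.sqrt(N)
    t = (np.arange(N) - N // 2) * dt  # centered sample grid, reused for u
    u = t
    # Kernel phase pi*(u^2 cot - 2 u t csc + t^2 cot); rows indexed by u.
    phase = cot * u[:, None] ** 2 - 2.0 * csc * np.outer(u, t) + cot * t[None, :] ** 2
    kernel = np.exp(1j * np.pi * phase)
    # Principal square root; matches the usual convention with alpha in (-pi, pi).
    amp = np.sqrt(1.0 - 1j * cot)
    return amp * dt * (kernel @ x)


if __name__ == "__main__":
    N = 256
    t = (np.arange(N) - N // 2) / np.sqrt(N)
    gauss = np.exp(-np.pi * t ** 2)  # eigenfunction of the continuous FrFT
    out = frft(gauss, 0.5)
    print(np.max(np.abs(out - gauss)[np.abs(t) <= 2]))
```

At a = 1 the sum reduces to a centered DFT scaled by 1/√N. Replacing the per-frame FFT of an STFT spectrogram with `frft(frame, a)` would give one plausible order-a spectrogram; whether this matches the authors' construction is an assumption.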
In this paper, we present a speech feature extraction approach that uses the particle swarm optimization (PSO) algorithm to search for the optimal FrFT-based spectrograms. The speech signals are processed by the FrFT to obtain spectrograms of different orders. These FrFT-based spectrograms give a much more refined description of the speech signals and form the search space of the PSO algorithm. To reduce the computational complexity, the spectrograms are converted into low-dimensional vectors by the local binary patterns (LBP) operator. After the optimal LBP vectors are obtained, a similarity criterion is used to find the fractional orders corresponding to the optimal spectrograms. Once the radial basis function (RBF) neural network has been configured, these optimal speech features are fed to it for training and testing. Comparative experiments that identify speakers using FrFT-based and FFT-based spectrograms are carried out to evaluate the performance of the proposed approach.
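The LBP compression and the class-separability fitness described above might look like the following sketch. It assumes the basic 3×3, 8-neighbour LBP operator with a normalized 256-bin histogram as the low-dimensional vector, and a fitness equal to the ratio of between-class to within-class scatter. The paper states only that the fitness is built from between-class and within-class distances, so the exact formula and the helper names `lbp_histogram` and `class_separability` are hypothetical.

```python
import numpy as np


def lbp_histogram(img):
    """Basic 3x3 LBP codes of a 2-D array, summarized as a normalized 256-bin histogram.

    Bit k of an interior pixel's code is 1 when its k-th neighbour is >= the centre.
    """
    img = np.asarray(img, dtype=float)
    if img.shape[0] < 3 or img.shape[1] < 3:
        raise ValueError("need at least a 3x3 array")
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    # Neighbour offsets, clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


def class_separability(vectors, labels):
    """Assumed fitness: between-class scatter divided by within-class scatter."""
    X = np.asarray(vectors, dtype=float)
    y = np.asarray(labels)
    overall = X.mean(axis=0)
    between = within = 0.0
    for cls in np.unique(y):
        Xc = X[y == cls]
        mc = Xc.mean(axis=0)
        between += len(Xc) * np.sum((mc - overall) ** 2)
        within += np.sum((Xc - mc) ** 2)
    return between / max(within, 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    spectrogram = rng.random((129, 61))  # stand-in for a real spectrogram
    print(lbp_histogram(spectrogram).shape)
```

Under this assumption, a larger `class_separability` value for the LBP vectors of one fractional order than for another would mark that order's spectrograms as more discriminative between speakers.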
2 FRFT-BASED SPECTROGRAM
The STFT-based spectrogram is a simple but effective time-frequency analysis tool that reveals a great deal of information closely related to phonetic features. The STFT of a speech signal x(n) is defined as
$$X(n,\omega)=\sum_{m=-\infty}^{\infty} x(m)\,w(n-m)\,e^{-j\omega m} \tag{1}$$
where X(n, ω) denotes the frame-based transform of the signal, and w(n) is the short-time window function that slides
Proceedings of the 34th Chinese Control Conference, July 28-30, 2015, Hangzhou, China