Fundamental Frequency Estimation in Speech Signals: A Variable Rate Particle Filter Approach

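"This article discusses a method for estimating the fundamental frequency of speech signals with a variable rate particle filter, a powerful Bayesian inference method for tracking dynamic parameters in nonlinear state-space models."

In speech signal processing, estimating the fundamental frequency is a key task for both researchers and industry. The fundamental frequency corresponds to the vibration frequency of the vocal folds and matters for applications such as speech recognition, emotion analysis, and speech synthesis. Traditional estimators rely on periodicity, for example cepstrum analysis and the autocorrelation function, but they can perform poorly in noisy conditions or on non-stationary speech.

The particle filter (PF) is a Bayesian filtering technique that has attracted wide attention because it handles nonlinear and non-Gaussian dynamic systems well. In this study, the authors Geliang Zhang and Simon Godsill propose a speech model based on a time-varying source-filter model and use a variable rate particle filter (VRPF) to estimate the pitch periods of the speech signal. Because the model accounts for the time-varying nature of speech production, it improves estimation accuracy.

To improve performance further, the authors also implement a Rao-Blackwellised variable rate particle filter (RBVRPF). Rao-Blackwellisation integrates out the part of the state that can be handled analytically, so that particles are needed only for the remaining part; this reduces the dimension of the sampled space and the variance of the estimates for a given number of particles.

In comparisons with the YIN algorithm, an existing state-of-the-art fundamental frequency estimator, simulation results show that the VRPF and RBVRPF give more accurate estimates even under strong background noise.

Index terms: variable rate particle filter, fundamental frequency estimation, Rao-Blackwellisation.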

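As a point of reference for the periodicity-based estimators mentioned in the summary, here is a minimal sketch of an autocorrelation pitch estimator in Python. It is neither the method of Zhang and Godsill nor the YIN algorithm. The function name `autocorr_f0`, the 60 to 400 Hz search range, and the synthetic two-harmonic frame are all assumptions chosen for illustration.

```python
import math


def autocorr_f0(signal, sample_rate, f_min=60.0, f_max=400.0):
    """Return the frequency whose lag maximizes the raw autocorrelation."""
    lag_min = int(sample_rate / f_max)  # shortest period considered
    lag_max = int(sample_rate / f_min)  # longest period considered
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        value = sum(signal[n] * signal[n + lag]
                    for n in range(len(signal) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Hypothetical clean test frame: a 200 Hz tone plus its second harmonic.
sample_rate = 8000
true_f0 = 200.0
frame = [
    math.sin(2 * math.pi * true_f0 * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 2 * true_f0 * n / sample_rate)
    for n in range(400)
]
estimate = autocorr_f0(frame, sample_rate)
```

On a clean periodic frame like this one the peak lag equals the true period. The summary's point is that such frame-based estimators degrade once noise or non-stationarity distorts the autocorrelation peaks.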
892 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 24, NO. 5, MAY 2016

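Algorithm 1.

Goal: Track $T_{1:n}$, which are contained in $x_{1:t}$, given $y_{1:t}$ and $T_0$.

1) Initialize $\{x_0^{(i)}\}_{i=1}^{N}$. Set up all the fixed hyperparameters and $\{w_0^{(i)}\}_{i=1}^{N}$. To initialize $\{x_0^{(i)}\}_{i=1}^{N}$, use the joint estimation technique based on the first period of speech data $y_{1:T_0}$, according to (14), see later. Then sample $\{s_0^{(i)}\}_{i=1}^{N}$ based on $a_0$, $A_0$, $B_0$ according to (2) and (5). Set $P^{(i)} = T_0$.

2) for $t = 1 : t_{\mathrm{end}}$

   a) for $i = 1 : N$ do

      i) Set $n^{(i)} = n^{(i)}_{(t-1)}$.

      ii) While $t > P^{(i)}$:

         A) Add a new pitch period, $n^{(i)} \leftarrow n^{(i)} + 1$.

         B) Sample a new pitch period and other coefficients:

            $T^{(i)}_{n^{(i)}} \sim U\big(\max[T^{(i)}_{t-1} - \sigma,\ T_{\mathrm{low}}],\ \min[T^{(i)}_{t-1} + \sigma,\ T_{\mathrm{upp}}]\big)$,

            $a^{(i)}_{n^{(i)},p} \sim N\big(a^{(i)}_{n^{(i)}-1,p},\ \sigma_{a,p}\big)$,

            $A^{(i)}_{n^{(i)},k} \sim N\big(A^{(i)}_{n^{(i)}-1,k},\ \sigma_{A,k}\big)$,

            $B^{(i)}_{n^{(i)},k} \sim N\big(B^{(i)}_{n^{(i)}-1,k},\ \sigma_{B,k}\big)$.

         C) Update $P^{(i)} \leftarrow P^{(i)} + T^{(i)}_{n^{(i)}}$.

      iii) Sample a new signal value $s_t^{(i)}$ based on $a^{(i)}_{n^{(i)},p}$, $A^{(i)}_{n^{(i)},k}$, $B^{(i)}_{n^{(i)},k}$ and $T^{(i)}_{n^{(i)}}$, as in (2) and (5). Now $[x^{(i)}_{1:t}] = [x^{(i)}_{1:t-1},\ x^{(i)}_{t}]$, as defined in (34).

      iv) Compute the importance weight $w_t^{(i)}$ of each particle:

          $w_t^{(i)} \propto w_{t-1}^{(i)}\, p\big(y_t \mid s_t^{(i)}, \sigma\big)$.

   b) end for

   c) Renormalize: $\tilde{w}_t^{(i)} = w_t^{(i)} \big/ \sum_{i=1}^{N} w_t^{(i)}$, $i = 1, 2, \ldots, N$.

   d) If $t = k \cdot \mathrm{BlockSize}$, where $k$ is a positive integer:

      i) Resample $\{x^{(i)}_{1:t}\}_{i=1}^{N}$ when $N_{\mathrm{eff}} < N/2$. $N_{\mathrm{eff}}$ denotes the effective sample size and is calculated as $N_{\mathrm{eff}} = 1 \big/ \sum_{i=1}^{N} \big(\tilde{w}_t^{(i)}\big)^2$.

      ii) [An expression of the form $\sum_{i=1}^{N} \tilde{w}_t^{(i)}(\cdot)$; the remaining symbols are not legible in this copy.]

3) end for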
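The pitch-period proposal in step 2(a)(ii)(B) and the renormalize/resample logic in steps 2(c) and 2(d) can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' implementation. All function names are hypothetical, and multinomial resampling with a reset to uniform weights is an assumption, since Algorithm 1 does not name a resampling scheme. The effective sample size uses the standard definition, one over the sum of squared normalized weights.

```python
import random


def propose_pitch_period(prev_period, sigma, t_low, t_upp, rng):
    """Draw a period uniformly within +/- sigma of the previous one,
    clipped to the allowed range [t_low, t_upp]."""
    lower = max(prev_period - sigma, t_low)
    upper = min(prev_period + sigma, t_upp)
    return rng.uniform(lower, upper)


def normalize(weights):
    """Scale the weights so that they sum to one (step 2c)."""
    total = sum(weights)
    return [w / total for w in weights]


def effective_sample_size(norm_weights):
    """N_eff = 1 / sum of squared normalized weights."""
    return 1.0 / sum(w * w for w in norm_weights)


def multinomial_resample(particles, norm_weights, rng):
    """Assumed scheme: draw N particles with replacement, then reset
    all weights to 1/N."""
    n = len(particles)
    resampled = rng.choices(particles, weights=norm_weights, k=n)
    return resampled, [1.0 / n] * n


def maybe_resample(particles, weights, t, block_size, rng):
    """Resample only at block boundaries and only if N_eff < N/2 (step 2d)."""
    norm_weights = normalize(weights)
    at_block_boundary = t % block_size == 0
    degenerate = effective_sample_size(norm_weights) < len(particles) / 2
    if at_block_boundary and degenerate:
        return multinomial_resample(particles, norm_weights, rng)
    return particles, norm_weights


# Hypothetical toy values: four particles, one of which dominates.
rng = random.Random(0)
particles = [80.0, 81.0, 82.0, 83.0]
weights = [0.97, 0.01, 0.01, 0.01]
ess = effective_sample_size(normalize(weights))
new_particles, new_weights = maybe_resample(
    particles, weights, t=10, block_size=5, rng=rng)
proposed = propose_pitch_period(80.0, 5.0, 78.0, 200.0, rng)
```

Checking the resampling condition only every `block_size` samples, as step 2(d) does, avoids resampling at every time step.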

track them [17]. For example, when the input source is modeled as an almost periodic signal, if 20 harmonics are assumed to exist in the input source (K = 20) and an 11th-order AR model is used (M = 11), the number of parameters in the whole model is 2K + M + 1 = 52. To estimate a 52-dimensional vector with a moderate number of particles, for example 1000, a good initialization method is needed at the start of the algorithm for the particle filter to work.

B. Joint Source-Filter Estimation Method

It has been proposed in [8] that a joint source-filter optimization approach can be used to estimate glottal flow using the LF model of the glottal flow derivative when the input source is modeled as glottal pulses. We suggest that, after some modification of the input source model, this approach can also be applied when the input source is modeled as almost periodic signals, where it serves as a joint source-filter estimation technique to initialize the parameters of the whole model. Details of the modification are described in Appendix A; here we only state the results. Write the parameters of the proposed almost periodic source-filter model except for $T$, i.e., $\{a_p, A_k, B_k\}_{p=1:M,\ k=0:K}$ (upper index $n$ omitted here, see Appendix A), into a vector $\mathbf{a}$, where

$$\mathbf{a} = [a_1, \ldots, a_M,\ A_0, \ldots, A_K,\ B_0, \ldots, B_K]^{\mathsf{T}} \tag{13}$$

Then it is possible to jointly estimate the parameters in $\mathbf{a}$ using the following equation:

$$\mathbf{a} = R^{-1}\mathbf{p} \tag{14}$$

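where $R$ is the block matrix of (15). Of its blocks, only one that enters with a negative sign, $-R_{(\cdot)}$, is legible in this copy. Equations (16) to (20) define the blocks; the left-hand symbols and most subscripts on $C$ are not legible, so only the entry patterns are given.

$$\begin{pmatrix} C(1,1) & \cdots & C(M,1) \\ \vdots & \ddots & \vdots \\ C(1,M) & \cdots & C(M,M) \end{pmatrix} \tag{16}$$

(17) [not legible in this copy]

where

$$\begin{pmatrix} C(0,1) & \cdots & C(0,1) \\ \vdots & \ddots & \vdots \\ C(0,M) & \cdots & C(0,M) \end{pmatrix} \tag{18}$$

and

$$\begin{pmatrix} C(0,1) & \cdots & C(0,1) \\ \vdots & \ddots & \vdots \\ C(0,M) & \cdots & C(0,M) \end{pmatrix} \tag{19}$$

$$\begin{pmatrix} C_{0,0}(0,0) & \cdots & C_{0,2K+2}(0,0) \\ C_{1,0}(0,0) & \cdots & C_{1,2K+2}(0,0) \\ \vdots & \ddots & \vdots \\ C_{2K,0}(0,0) & \cdots & C_{2K,2K+1}(0,0) \\ C_{2K+1,0}(0,0) & \cdots & C_{2K+1,2K+1}(0,0) \end{pmatrix} \tag{20}$$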
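Numerically, the joint estimate $\mathbf{a} = R^{-1}\mathbf{p}$ of (14) is a linear solve; the inverse of $R$ does not need to be formed explicitly. The sketch below illustrates this in plain Python with Gaussian elimination and partial pivoting. The function name `solve_linear_system` and the 3-by-3 values of `R` and `p` are hypothetical toy inputs, not quantities built from (15) to (20).

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate the entries below the pivot.
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


# Hypothetical toy system: build p from a known parameter vector, then
# recover that vector from R and p.
R = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
a_true = [1.0, -2.0, 0.5]
p = [sum(R[i][j] * a_true[j] for j in range(3)) for i in range(3)]
a_hat = solve_linear_system(R, p)
```

For the model sizes discussed earlier (K = 20, M = 11), the system has a few dozen unknowns, so a direct solve of this kind is inexpensive. In practice a library routine would normally replace the hand-written elimination.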
