A Study on the Generalizability of Session-Aware Frequency-Band Weighting Features in i-vector Speaker Verification Systems
, o7U§À§x•
∗
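Center for Speech and Language Technologies, Division of Technical Innovation and Development, Tsinghua National Laboratory for Information Science and Technology
Center for Speech and Language Technologies, Research Institute of Information Technology, Tsinghua University
Department of Computer Science and Technology, Tsinghua University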
v-jwua@microsoft.com,liltcslt@hotmail.com,wangdong99@mails.tsinghua.edu.cn
∗
Corresponding author: Fang Zheng, E-mail: fzheng@tsinghua.edu.cn
Abstract
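Mel-frequency cepstral coefficients (MFCC) are among the most commonly used features in speaker verification systems. They contain not only speaker information but also other information, such as speech content and channel characteristics. The session-aware (time-varying) frequency-band weighting method weights the filter banks (Fbanks) of the MFCC feature, emphasizing band components with high speaker discriminability and weakening those with low discriminability, and thus helps to improve speaker verification performance. Most current research on band weighting has been conducted within the GMM-UBM framework; this paper studies, within the i-vector framework, how well band weighting extends to new data. We find that, because the transformation (loading) matrix of the i-vector model is learned without supervision, band-weighted MFCC features offer no advantage in a conventional i-vector system with cosine-distance scoring. Once the discriminative LDA and PLDA models are added, however, the benefit of the optimized bands is realized, and the recognition performance of the verification system improves substantially. Our experiments also show that the session-aware band-weighting method generalizes well in i-vector systems: weighting parameters learned on a bilingual database can be applied successfully to an i-vector system trained on another, large database.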
Keywords: speaker verification, i-vector, frequency-band weighting
Abstract
MFCC is one of the most popular features used in speaker verification; it carries not only speaker information but also information about speech content and channels. A session-aware Fbank weighting approach has been proposed, in which the Fbanks that are more sensitive to session variability are de-weighted so that speaker-discriminative banks are given prominence. Most current research on Fbank weighting has been conducted within the GMM-UBM framework. In this paper, we study the contribution of Fbank weighting in the state-of-the-art i-vector architecture. We find that, because the loading matrix of the i-vector model is learned without supervision, Fbank weighting shows no advantage in i-vector systems when simple cosine-distance scoring is used. However, when discriminative models such as LDA/PLDA are applied, the advantage of Fbank weighting is recovered, leading to a significant performance improvement. We also verify that the weighting parameters generalize well: parameters trained on a small bilingual database can be applied successfully to another i-vector system trained on a large multi-channel database.
Index Terms: speaker verification, i-vector, frequency weighting
1. Introduction
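With economic development and an increasingly complex social life, verifying a person's identity through biometric characteristics is receiving more and more attention. Speaker verification is a technology that verifies a speaker's identity by his or her voice; owing to its reliability and ease of use, it is widely applied in fields that require strict identity verification, such as finance, public security, and immigration control. A basic speaker verification system consists of three units: speech feature extraction, speaker modeling, and speaker scoring and decision. The main task of the feature extraction unit is to extract from the speech signal the information that reflects the speaker's identity and represent it as feature vectors; this unit has an important effect on system performance. Commonly used feature extraction methods include linear predictive cepstral coefficients (LPCC, Linear Predictive Cepstral Coefficients) [1], Mel-frequency cepstral coefficients (MFCC, Mel-Frequency Cepstral Coefficients) [2][3], and perceptual linear prediction (PLP, Perceptual Linear Predictive) [4], of which MFCC is the most widely used.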
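For concreteness, MFCCs are obtained by applying a discrete cosine transform (DCT) to the log energies of a Mel filterbank (Fbank), and the band-weighting methods discussed in this paper operate on these Fbank channels. The Python sketch below assumes the log Fbank energies are already computed and that a per-band weight multiplies each log energy before the DCT; `dct_matrix`, `weighted_mfcc`, and this placement of the weights are illustrative assumptions, not the exact procedure of the cited works.

```python
import numpy as np


def dct_matrix(n_bands, n_ceps):
    """Orthonormal DCT-II basis of shape (n_ceps, n_bands)."""
    n = np.arange(n_bands)
    k = np.arange(n_ceps)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_bands)) * np.sqrt(2.0 / n_bands)
    basis[0] /= np.sqrt(2.0)  # the k = 0 row gets the extra 1/sqrt(2) factor
    return basis


def weighted_mfcc(log_fbank, weights=None, n_ceps=13):
    """Convert log Fbank energies (n_frames, n_bands) into MFCCs (n_frames, n_ceps).

    If `weights` (shape (n_bands,)) is given, each log Fbank channel is scaled
    by its weight before the DCT; applying it here is an assumption.
    """
    log_fbank = np.asarray(log_fbank, dtype=float)
    if weights is not None:
        # Broadcast the per-band weights over all frames.
        log_fbank = log_fbank * np.asarray(weights, dtype=float)
    return log_fbank @ dct_matrix(log_fbank.shape[1], n_ceps).T
```

With `weights=None` this reduces to plain MFCCs, so weighted and unweighted features come from the same code path.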
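Although MFCC features have been very successful in speaker verification, this feature, being based on short-time analysis, has clear shortcomings. In particular, MFCC features contain not only speaker information but also speech content and channel information, so MFCC-based speaker verification must rely on relatively long training and test data. To improve speaker verification performance, an important research direction is to refine MFCC features by emphasizing components related to speaker characteristics and suppressing components unrelated to the speaker, thereby increasing the speaker discriminability of the features. Starting from the speech production mechanism, Lu and Dang [5][6] studied the relationship between frequency components and the articulatory organs. They found that the glottis and the piriform fossa carry clear speaker-specific characteristics and mainly affect the low-frequency region from 100 Hz to 400 Hz and the high-frequency region from 4,000 Hz to 5,000 Hz. Based on this finding, they proposed an F-ratio criterion to measure how well each frequency band discriminates between speakers, and weighted each band accordingly.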
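A minimal Python sketch of a per-band F-ratio, assuming log Fbank frames grouped by speaker: for each band it divides the variance of the per-speaker means by the average within-speaker variance. The function names and the rule that turns F-ratios into weights (rescaling them so they average 1) are hypothetical; [5][6] may define the weights differently.

```python
import numpy as np


def band_f_ratio(frames_per_speaker):
    """F-ratio of each band: between-speaker variance / mean within-speaker variance.

    frames_per_speaker: list with one array per speaker, each of shape
                        (n_frames_i, n_bands) holding log Fbank energies.
    """
    means = np.stack([f.mean(axis=0) for f in frames_per_speaker])  # (n_speakers, n_bands)
    within = np.stack([f.var(axis=0) for f in frames_per_speaker]).mean(axis=0)
    between = means.var(axis=0)
    return between / within  # assumes every band has non-zero within-speaker variance


def f_ratio_weights(f_ratio):
    """Hypothetical mapping: scale F-ratios so that the weights average to 1."""
    f_ratio = np.asarray(f_ratio, dtype=float)
    return f_ratio / f_ratio.mean()
```

A band whose mean shifts from speaker to speaker gets a large F-ratio, while a band that looks the same for every speaker gets an F-ratio near zero and is de-weighted.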
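To address the time-varying nature of speech (session variability) encountered in practical applications, Wang et al. improved the band-weighting parameters by taking temporal variation into account [8]. They first recorded a time-varying database in which the same speakers repeatedly read the same set of sentences over a long period of time [12]. By analyzing these data, Wang et al. found that some frequency bands vary considerably across sessions while others vary little. Using these statistics as parameters to weight the frequency bands, they obtained better results than by using speaker information alone.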
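The session statistics can be sketched in the same style. Assuming each speaker has several sessions of the same sentences, the Python code below measures how far each band's session-level mean moves across sessions. The inverse-variability rule in `session_aware_weights` is a hypothetical stand-in for the mapping actually used in [8], chosen only to show bands with large session variation being de-weighted.

```python
import numpy as np


def session_variability(sessions_per_speaker):
    """Per-band variance of session-level means, averaged over speakers.

    sessions_per_speaker: list (one entry per speaker) of lists of arrays;
        each array has shape (n_frames, n_bands) and holds one session's
        log Fbank energies for the same set of sentences.
    """
    per_speaker = []
    for sessions in sessions_per_speaker:
        session_means = np.stack([s.mean(axis=0) for s in sessions])  # (n_sessions, n_bands)
        per_speaker.append(session_means.var(axis=0))
    return np.mean(per_speaker, axis=0)


def session_aware_weights(variability, eps=1e-8):
    """Hypothetical rule: weight each band by the inverse of its session
    variability, then rescale so that the weights average to 1."""
    inv = 1.0 / (np.asarray(variability, dtype=float) + eps)
    return inv / inv.mean()
```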
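However, research on these band-weighting methods is still insufficient; in particular, the generalizability of the band-weighting parameters has not yet been studied. First, the experiments above were all conducted within the GMM-UBM framework, and the performance of the method with other speaker verification approaches has not been verified. Second, the datasets in those experiments contained 35 speakers [5] and 60 speakers [8], respectively, and no verification on a large dataset has been carried out. Third, in those experiments the band-weighting parameters and the speaker models were trained on the same database, so the performance of the method on new datasets remains to be verified.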
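Building on the work of Wang et al. [8], this paper examines the generalizability of band-weighting coefficients based on session (time-varying) information. First, we verify the effectiveness of the method on the i-vector model. The i-vector approach [9] is an extension of the traditional GMM-UBM approach: by mapping the speaker model to a single representative vector in a low-dimensional space, it removes the constraint that the Gaussian components of a GMM-UBM system are independent of one another, and thereby improves system performance.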
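For reference, the total variability model underlying the i-vector approach is usually written as below. Here M is the utterance-dependent GMM mean supervector, m the UBM mean supervector, T the low-rank total variability (loading) matrix, and w the latent factor; N is the block-diagonal matrix of zeroth-order Baum-Welch statistics, F̃ the centered first-order statistics supervector, and Σ the block-diagonal UBM covariance. These symbols follow common usage and are assumed to match [9]. T is estimated by EM without speaker labels, which is the unsupervised training of the loading matrix referred to in the abstract.

```latex
\begin{aligned}
\mathbf{M} &= \mathbf{m} + \mathbf{T}\mathbf{w}, \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), \\
\hat{\mathbf{w}} &= \left(\mathbf{I} + \mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{N}\mathbf{T}\right)^{-1}\mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\tilde{\mathbf{F}}.
\end{aligned}
```

The i-vector is the posterior mean \(\hat{\mathbf{w}}\) of the latent factor given the utterance's statistics.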
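The i-vector approach can be combined with linear discriminant analysis (LDA, Linear Discriminant Analysis) [9] or probabilistic linear discriminant analysis (PLDA, Probabilistic Linear Discriminant Analysis) [10] to further improve the accuracy of speaker verification systems.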
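To make the scoring back ends concrete, the Python sketch below scores a pair of i-vectors by cosine similarity, either directly or after an LDA projection estimated from speaker-labeled i-vectors. It is a generic textbook LDA (the generalized eigenproblem S_b v = λ S_w v, solved by whitening S_w with its Cholesky factor), not the configuration used in this paper; the function names, the ridge term, and the toy data are assumptions, and PLDA is not shown.

```python
import numpy as np


def cosine_score(a, b):
    """Cosine similarity between two (optionally projected) i-vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def train_lda(ivectors, labels, n_dims):
    """Return an LDA projection matrix of shape (dim, n_dims)."""
    X = np.asarray(ivectors, dtype=float)
    labels = np.asarray(labels)
    mu = X.mean(axis=0)
    dim = X.shape[1]
    s_w = np.zeros((dim, dim))  # within-speaker scatter
    s_b = np.zeros((dim, dim))  # between-speaker scatter
    for spk in np.unique(labels):
        Xs = X[labels == spk]
        mu_s = Xs.mean(axis=0)
        centered = Xs - mu_s
        s_w += centered.T @ centered
        diff = (mu_s - mu)[:, None]
        s_b += len(Xs) * (diff @ diff.T)
    s_w += 1e-6 * np.eye(dim)  # small ridge so the Cholesky factorization succeeds
    L_inv = np.linalg.inv(np.linalg.cholesky(s_w))  # s_w = L L^T
    evals, evecs = np.linalg.eigh(L_inv @ s_b @ L_inv.T)
    order = np.argsort(evals)[::-1][:n_dims]  # largest eigenvalues first
    return L_inv.T @ evecs[:, order]  # map back to the original space: v = L^{-T} u
```

For example, with two speakers whose i-vectors differ only in the first coordinate while a large session effect sits in the second, cosine similarity on the raw vectors is high across speakers, whereas after a one-dimensional LDA projection same-speaker pairs score close to 1 and cross-speaker pairs close to -1.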
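Because an i-vector system maps the mean vectors of the GMM Gaussian components into a low-dimensional space, and combining it with LDA/PLDA adds a further discriminative linear projection, the following question arises: does the band-weighting method still make a contribution in an i-vector system similar to its contribution in a GMM-UBM system? This paper investigates this question experimentally. Second, within the i-vector framework, this paper studies the band-weighting method on large-scale multi-channel data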