Source Cell Phone Verification Based on Sparse Representation: Driven by SCUTPHONE Speech Data


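This paper addresses an important emerging research topic in digital media forensics: source cell phone verification. Previous research has mainly focused on source recording device identification, and this paper fills the gap in source cell phone verification. The authors propose using sparse representation to improve the performance of cell phone verification systems. Sparse representation, widely used in signal processing and machine learning, expresses data with as few nonzero coefficients as possible over some basis or dictionary. This helps capture the essential characteristics of the data. In cell phone verification, it helps extract the patterns in speech recordings that are specific to a particular phone.

The authors propose three sparse representation schemes. The first uses an exemplar dictionary built from previously collected recordings of specific phones, which can characterize the distinctive acoustic properties of individual devices. The second uses an unsupervised learned dictionary (here K-SVD), which learns the diversity among devices automatically but may sacrifice some discriminative power. The third uses a supervised learned dictionary learned from labeled data, which accounts for both representational and discriminative power. The dictionaries are built from Gaussian supervectors (GSVs) based on MFCCs (mel-frequency cepstral coefficients), a common technique for capturing the device characteristics inherent in speech recordings. MFCCs convert the audio signal into a compact feature representation, and the GSV summarizes their statistics to further improve discrimination.

In the experiments, the authors validate the proposed methods on the SCUTPHONE dataset, which contains speech recordings from 15 different cell phones, and evaluate three types of speech recordings. The results show that the proposed scheme outperforms three alternatives: the exemplar dictionary based scheme, the unsupervised (K-SVD) learned dictionary based scheme, and two other baseline methods. The paper also examines how two factors affect verification performance: the number of target examples in the exemplar dictionary and the size of the unsupervised learned dictionary. This analysis matters for tuning system parameters in practice. The work extends research on source recording device verification and introduces sparse representation as an effective, statistical-learning-based tool for cell phone verification. It contributes both to the theory of digital media forensics and to practical applications.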

L. Zou et al. / Digital Signal Processing 62 (2017) 125–136 127

Fig. 1. Block diagram of source cell phone verification scheme based on sparse representation and exemplar dictionary.

show that the proposed scheme outperforms the exemplar dictionary based scheme, the unsupervised learned dictionary (here K-SVD) based scheme and two other baseline methods. In addition, we analyze two factors affecting source cell phone verification performance: the number of target examples in the exemplar dictionary and the size of the learned dictionary (by K-SVD).

The rest of this paper is organized as follows. Section 2 describes the method for extracting the intrinsic fingerprint of the recording device. Section 3 presents the sparse representation based source cell phone verification schemes. Experimental setup and results are provided in Section 4. Finally, conclusions and future work are given in Section 5.

2. Recording device characterization

Over the last decade, various features have been used to capture the intrinsic characteristics of recording devices. Broadly speaking, these features fall into three categories: time domain, frequency domain and cepstral domain. Specifically, mel-cepstral domain features such as MFCCs have shown good performance in source recording device recognition [26,27,44–46]. The Gaussian supervector (GSV) is a high-dimensional vector (a.k.a. supervector) built from low-dimensional feature vectors (e.g., MFCCs). It has been successfully applied to represent the intrinsic fingerprint of a recording device [26]. The signals in speech recordings carry information related not only to the recording device but also to the speech content, such as speaker and linguistic information. Such a signal can therefore be regarded as the frequency response of the device, contextualized by the speech content. GSV reduces the effect of speech content variability by using a statistical characterization of the frequency domain information of these contextualized signals.

The extraction procedure for a GSV from a speech recording is summarized as follows. Suppose that $\lambda_{\mathrm{UBM}} = \{\omega_i, \mu_i^{\mathrm{UBM}}, \Sigma_i\}_{i=1}^{M}$ is a diagonal-covariance universal background model (UBM) with $M$ mixture components. Let $X = \{x_t\}_{t=1}^{T}$ be the feature vectors (here MFCCs) extracted from a speech recording. The corresponding GMM is adapted from the UBM by maximum a posteriori (MAP) adaptation of the means [67,68]. More specifically, the sufficient statistics for the weight and mean parameters of mixture $i$ are first computed as $n_i = \sum_{t=1}^{T} P(i \mid x_t)$ and $E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} P(i \mid x_t)\, x_t$, respectively. The $i$th adapted mean vector $\hat{\mu}_i$ is then computed as a weighted sum of the sufficient statistic for the mean and the UBM mean:

$$\hat{\mu}_i = \alpha_i E_i(x) + (1 - \alpha_i)\, \mu_i^{\mathrm{UBM}}$$

Here, $\alpha_i$ is a data-dependent adaptation factor, defined as $\alpha_i = n_i / (n_i + r)$, where $r$ is a fixed relevance factor. Suppose that $\lambda_a = \{\omega_i, \mu_i^{a}, \Sigma_i\}$ and $\lambda_b = \{\omega_i, \mu_i^{b}, \Sigma_i\}$ are the mean-adapted GMMs corresponding to two speech recordings. The Kullback–Leibler (KL) divergence kernel is then defined as the inner product of the corresponding GMM mean supervectors. Each supervector is a concatenation of the weighted GMM mean vectors [69]:

$$K(\lambda_a, \lambda_b) = \sum_{i=1}^{M} \left( \sqrt{\omega_i}\, \Sigma_i^{-1/2} \mu_i^{a} \right)^{\top} \left( \sqrt{\omega_i}\, \Sigma_i^{-1/2} \mu_i^{b} \right) \qquad (1)$$

where $M$ is the number of mixture components.
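To make the GSV pipeline concrete, here is a minimal NumPy sketch. It assumes a diagonal-covariance UBM whose parameters are already trained; UBM training and MFCC extraction are omitted. The function names `map_adapt_means`, `gsv` and `kl_kernel` and the default relevance factor r = 16 are illustrative assumptions, not values taken from the paper. The sketch computes $n_i$, $E_i(x)$ and the MAP-adapted means. It then stacks $\sqrt{\omega_i}\,\Sigma_i^{-1/2}\hat{\mu}_i$ into a supervector and evaluates Eq. (1) as a plain inner product.

```python
import numpy as np


def map_adapt_means(X, weights, means, variances, r=16.0):
    """MAP-adapt the means of a diagonal-covariance UBM to frames X (T x F).

    weights: (M,), means: (M, F), variances: (M, F).
    r is the relevance factor; the default of 16 is an assumed, illustrative value.
    """
    # Log-density of every frame under every Gaussian component -> shape (T, M)
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances[None, :, :], axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :])
    log_joint = np.log(weights)[None, :] + log_gauss
    log_joint -= log_joint.max(axis=1, keepdims=True)  # avoid underflow in exp
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)            # P(i | x_t)

    n = post.sum(axis=0)                                # n_i
    E = (post.T @ X) / np.maximum(n, 1e-10)[:, None]    # E_i(x), guarded for n_i = 0
    alpha = n / (n + r)                                 # alpha_i = n_i / (n_i + r)
    return alpha[:, None] * E + (1.0 - alpha)[:, None] * means


def gsv(adapted_means, weights, variances):
    """Concatenate sqrt(w_i) * Sigma_i^{-1/2} * mu_i over all M components."""
    scaled = np.sqrt(weights)[:, None] * adapted_means / np.sqrt(variances)
    return scaled.ravel()  # length M * F


def kl_kernel(sv_a, sv_b):
    """Eq. (1): the KL kernel reduces to the inner product of two supervectors."""
    return float(sv_a @ sv_b)
```

Each supervector already carries the factor $\sqrt{\omega_i}\,\Sigma_i^{-1/2}$, so the kernel of Eq. (1) is an ordinary dot product. This is convenient when the GSVs are later used directly as dictionary atoms.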

3. Cell phone verification by sparse representation

3.1. Scheme based on exemplar dictionary

We first present the exemplar dictionary based source cell phone verification scheme. The corresponding block diagram is shown in Fig. 1. During the verification process, for a claimed device, $N_t$ target training examples (here GSVs), represented as $\{a_i^{t}\}_{i=1}^{N_t}$, are placed together to construct $D_t = [a_1^{t}, a_2^{t}, \cdots, a_{N_t}^{t}] \in \mathbb{R}^{M \times N_t}$. At the same time, $N_b$ non-target background examples, represented as $\{a_i^{b}\}_{i=1}^{N_b}$, are selected from the background supervectors to construct $D_b = [a_1^{b}, a_2^{b}, \cdots, a_{N_b}^{b}] \in \mathbb{R}^{M \times N_b}$, such that $N_b \gg N_t$. Thus, the exemplar dictionary is constructed by concatenating $D_t$ and $D_b$:

$$D = [D_t, D_b] = [a_1^{t}, \cdots, a_{N_t}^{t}, a_1^{b}, \cdots, a_{N_b}^{b}] \in \mathbb{R}^{M \times N} \qquad (2)$$

where $N = N_t + N_b$. Note that $M < N$ must hold for the dictionary to be redundant and overcomplete. The atoms in dictionary $D$ are normalized to unit $\ell_2$-norm as in [55]. Then, given a test vector $y \in \mathbb{R}^{M}$ with unit $\ell_2$-norm, suppose that $y$ can be linearly represented with respect to $D$ as

$$y = Dx = [D_t, D_b] \begin{bmatrix} x_t \\ x_b \end{bmatrix} \qquad (3)$$

where $x$ is the coefficient vector. Fig. 2 shows an example of sparse coefficient vectors for a target trial and a non-target trial. Suppose $y$ belongs to a valid test, i.e., it comes from a speech recording made with the claimed device. Then $y$ will lie approximately in the linear span of the columns of $D_t$. Thus, the non-zero entries of $x$ associated with $D_t$ (i.e., $x_t$) will be large compared to the non-zero entries of $x$ associated with $D_b$ (i.e., $x_b$), as shown in Fig. 2(a). On the other hand, suppose $y$ belongs to an invalid test, i.e., it comes from a speech recording that was not made with the claimed device. In that case, the non-zero coefficients will be spread sparsely across $D_t$ and $D_b$, as shown in Fig. 2(b). The sparse solution to (3) can be obtained by solving the following optimization problem [66]:
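The optimization problem of [66] is not included in this excerpt. As an illustrative stand-in, the sketch below uses orthogonal matching pursuit (OMP), a standard greedy solver for sparse coding under a limit on the number of nonzero coefficients. The sketch proceeds in three steps:

1. Build $D = [D_t, D_b]$ with unit $\ell_2$-norm atoms, as in Eq. (2).
2. Sparse-code a unit-norm test vector $y$, as in Eq. (3).
3. Report the fraction of coefficient magnitude that falls on $D_t$.

The helper names, the sparsity level `k` and this ratio score are hypothetical choices made for illustration. The paper's actual solver and decision rule may differ.

```python
import numpy as np


def build_exemplar_dictionary(target_gsvs, background_gsvs):
    """Stack D = [D_t, D_b] (columns are GSVs) and scale each atom to unit l2-norm."""
    D = np.hstack([target_gsvs, background_gsvs])
    return D / np.linalg.norm(D, axis=0, keepdims=True)


def omp(D, y, k):
    """Orthogonal matching pursuit: greedy sparse code of y using at most k atoms."""
    x = np.zeros(D.shape[1])
    support, coef = [], np.zeros(0)
    residual = y.copy()
    for _ in range(k):
        # Pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break  # all correlations are ~0; nothing left to explain
        support.append(j)
        # Re-fit all selected coefficients jointly by least squares
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    if support:
        x[support] = coef
    return x


def target_share(x, n_target):
    """Hypothetical score: fraction of |x| mass on the first n_target atoms (D_t)."""
    total = np.abs(x).sum()
    return float(np.abs(x[:n_target]).sum() / total) if total > 0 else 0.0
```

A share near 1 corresponds to the target-trial pattern in Fig. 2(a). A share split between $D_t$ and $D_b$ corresponds to the non-target pattern in Fig. 2(b). A real system would still need a threshold on whatever score is chosen.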
