SPEECH WATERMARKING BASED ON ROBUST PRINCIPAL COMPONENT ANALYSIS
AND FORMANT MANIPULATIONS
Shengbei WANG, Weitao YUAN, Jianming WANG∗
School of Computer Science & Software Engineering
Tianjin Polytechnic University
Binshuixi Road, Xiqing District, Tianjin, China

Masashi UNOKI†
School of Information Science
Japan Advanced Institute of Science and Technology
1-1 Asahidai, Nomi, Ishikawa, Japan
ABSTRACT
This paper proposes a watermarking method for speech signals based on Robust Principal Component Analysis (RPCA) and formant manipulations. As the spectrogram of speech has a relatively sparse structure, the core information of speech is extracted into a sparse matrix with RPCA so that formants can be estimated more accurately with Linear Prediction (LP), even under noise and interference, which significantly improves the robustness of the proposed method. We investigate how the formants can be controlled and manipulated to make the watermarking method effective. Watermarks are embedded into speech by controlling the shape and power of formants through a stable and robust parameterization, line spectral frequencies (LSFs). Evaluations of inaudibility and robustness show that the proposed method not only satisfies inaudibility but also provides robustness against general signal processing and different speech codecs that is better than that of other methods.
Index Terms— Robust principal component analysis, Linear
prediction, Formant, Line spectral frequencies, Robustness
1. INTRODUCTION
Speech signals are an important information carrier in many social applications such as WeChat and GoogleTalk. However, modern digital technologies have put the security of speech at risk. Watermarking is a promising solution for protecting speech signals. A general watermarking method should be inaudible to human perception, blind for watermark extraction, and robust against signal processing and codecs. However, there is a trade-off among these competing requirements, e.g., robustness is usually improved at the expense of inaudibility, and vice versa. Realizing watermarking with all of the desired properties therefore remains a challenging problem. This work focuses on exploring inaudible, blind, and robust speech watermarking.
There has been significant research into speech watermarking in recent years. A typical category of watermarking exploits the characteristics of the human auditory system (HAS) to achieve inaudibility [1, 2]. For instance, watermarks can be embedded into the phase of speech based on the fact that the HAS is not sensitive to slight phase modifications [3, 4]. Methods based on quantization index modulation (QIM) [5, 6] form another category, in which much effort has been devoted to selecting suitable features to balance inaudibility and robustness. Spread spectrum is a well-known technique widely employed for robust watermarking [7, 8, 9]. Aside from these categories, hybrid watermarking [10, 11, 12] has been verified to have superior robustness, since watermarks are doubly embedded, which enables them to be reliably extracted. Despite these achievements, many existing methods cannot reach a balance between inaudibility and robustness. In particular, robustness against codecs is highly desired for speech watermarking, yet many methods are not completely robust against different speech codecs.
∗Thanks to grant No. 2017KJ089, the Natural Science Foundation of Tianjin (No. 17JCQNJC00100 and No. 16JCYBJC41500), and the National Natural Science Foundation of China (No. 6137104 and No. 61602344) for funding.
†This work was also supported by a Grant-in-Aid for Scientific Research (B) (No. 17H01761) and the I-D DATA foundation.
A common problem in the watermarking field is that many methods can extract watermarks in ideal situations (without noise or interference), but when the watermarked signal is contaminated by noise or interference, extraction of the embedded watermarks fails, which leads to weak robustness. We previously proposed two formant-enhancement based watermarking methods [13, 14]. However, their robustness against speech codecs was not satisfactory, e.g., [13] was not robust against any speech codec and [14] was not robust against G.729 at high capacities. This paper proposes a speech watermarking method based on robust principal component analysis (RPCA) and formant manipulations. RPCA is employed to extract the core information of speech so that formants can be estimated correctly even under interference caused by speech processing and codecs. Watermarks are embedded into formants of relatively low power by controlling line spectral frequencies (LSFs) to maintain speech quality. The main contribution of this paper is that RPCA is introduced to watermarking for the first time; its introduction significantly attenuates the influence of various kinds of interference on the watermark extraction process, which improves robustness. The effectiveness of the proposed method is demonstrated in the experiments.
2. PROPOSED METHOD
Linear Prediction (LP) is widely used to separate the vocal tract and excitation information in the source-filter model of speech production. The coefficients derived from LP analysis provide important information about a key acoustic feature, i.e., formants. Nevertheless, when speech is smeared by interference such as background noise and reverberation, the estimated LP envelope and formants can be severely distorted. As the proposed method embeds watermarks into formants, it is necessary to ensure that formants can be correctly estimated even under interference.
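As a concrete illustration of this step (a generic textbook sketch, not the authors' exact configuration), LP coefficients can be obtained with the autocorrelation method and the Levinson-Durbin recursion, and formant candidates read off from the angles of the complex roots of the prediction polynomial A(z):

```python
import numpy as np

def lp_coefficients(x, order):
    """LP coefficients [1, a1, ..., ap] via autocorrelation + Levinson-Durbin."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags
    a = np.array([1.0])
    err = r[0]                                        # prediction error power
    for i in range(1, order + 1):
        acc = r[i] + a[1:] @ r[i - 1:0:-1]
        k = -acc / err                                # reflection coefficient
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        err *= 1.0 - k * k
    return a

def formant_frequencies(a, fs):
    """Formant candidates from roots of A(z) in the upper half plane (Hz)."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 1e-6]              # keep one of each pair
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))
```

For real speech one would additionally pre-emphasize, window each analysis frame, and pick an LP order of roughly fs/1000 + 2; those details are omitted here for brevity.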
In general, speech varies significantly and continuously over time and its power concentrates around the formants, so the spectrogram of speech has a relatively sparse structure. Based on this fact, some
978-1-5386-4658-8/18/$31.00 ©2018 IEEE ICASSP 2018