Regular paper

An improved vector quantization method using deep neural network

Wenbin Jiang a,*, Peilin Liu a, Fei Wen a,b

a Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, PR China
b Air Control and Navigation Institution, Air Force Engineering University, Xi'an 710000, China

* Corresponding author. E-mail addresses: jwb361@sjtu.edu.cn (W. Jiang), liupeilin@sjtu.edu.cn (P. Liu), wenfei@sjtu.edu.cn (F. Wen).
Article history: Received 19 March 2016; Accepted 5 December 2016

Keywords: Deep neural network; Vector quantization; Auto-encoder; Binary coding

Abstract
To address the challenging problem of vector quantization (VQ) for high-dimensional vectors with many coding bits, this work proposes a novel deep neural network (DNN) based VQ method. The method uses a k-means based vector quantizer as the encoder and a DNN as the decoder. The decoder is initialized with the decoder network of a deep auto-encoder, fed with the codes produced by the k-means based vector quantizer, and trained to minimize the coding error of the VQ system. Experiments on speech spectrogram coding demonstrate that, compared with the k-means based method and a recently introduced DNN-based method, the proposed method significantly reduces the coding error. Furthermore, in experiments on coding multi-frame speech spectrograms, the proposed method achieves about an 11% relative gain over the k-means based method in terms of segmental signal-to-noise ratio (SegSNR).

© 2016 Elsevier GmbH. All rights reserved.
1. Introduction
Vector quantization (VQ) is a fundamental technique for data compression, used in applications such as video and audio coding. In traditional VQ methods, the k-means or Linde-Buzo-Gray (LBG) algorithm [1,2] is most commonly used for codebook training (clustering). However, for large vector dimensions and codebook sizes, direct use of VQ suffers from a serious complexity barrier: both storage and nearest-codeword search scale with the codebook size, which grows exponentially with the number of coding bits. Constrained VQ methods, such as partitioned VQ, are commonly used to reduce the storage and computational complexity [3,4]. Unfortunately, these compromises may severely increase the coding error.
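To make the complexity barrier concrete, the following minimal sketch (ours, not the paper's code) implements plain k-means codebook training and table-lookup quantization in NumPy; the function names and parameters are illustrative assumptions.

```python
import numpy as np

def train_codebook(data, n_bits, n_iter=20, seed=0):
    """Plain k-means codebook training (illustrative sketch). With N
    coding bits the codebook holds 2**N codewords, so N = 32 already
    implies ~4.3e9 centroids: storage and nearest-neighbour search both
    scale with 2**N, which is the complexity barrier discussed above."""
    rng = np.random.default_rng(seed)
    k = 2 ** n_bits
    codebook = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # assignment step: nearest centroid per vector, O(n * k * dim)
        dists = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):  # update step: recompute each centroid
            members = data[labels == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

def quantize(x, codebook):
    # encoder: index of the nearest codeword; decoder: table lookup
    idx = int(((codebook - x) ** 2).sum(-1).argmin())
    return idx, codebook[idx]
```

Partitioned VQ sidesteps this cost by splitting the vector into sub-vectors quantized independently, which is exactly the compromise that inflates the coding error.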
Recently, inspired by the success of deep neural networks (DNNs) in data dimensionality reduction [5,6], DNN-based approaches have been developed to address this problem [7–9]. In [7], a deep auto-encoder (DAE) with a binary coding layer was learned to code the high-dimensional vector. In speech spectrogram coding, this method showed a considerable performance gain over traditional VQ technology. Nevertheless, when many of the activations of the coding units are far from binary, quantizing them to binary values may cause large distortions. To make the activations of the coding layer as close to binary as possible, one effective approach is to add Gaussian noise to the input of the coding layer [8]. Another is to force the coding layer to be binary during the forward pass of fine-tuning [9]. All of these works aim to obtain a binary coding layer from the real-valued activations of a DNN. In principle, quantizing a floating-point value to a single bit inevitably causes distortion.
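As an illustration of this binary-coding-layer approach, here is a minimal PyTorch sketch of a DAE whose coding activations are pushed toward 0/1 by injecting Gaussian noise before the coding sigmoid, in the spirit of [7,8]; the layer sizes and noise level are our own illustrative assumptions, not those of the cited papers.

```python
import torch
import torch.nn as nn

class BinaryCodingDAE(nn.Module):
    """Sketch of a DAE with a binary coding layer (assumed sizes)."""

    def __init__(self, dim=512, code_bits=64, noise_std=4.0):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.Sigmoid(),
                                 nn.Linear(256, code_bits))
        self.dec = nn.Sequential(nn.Linear(code_bits, 256), nn.Sigmoid(),
                                 nn.Linear(256, dim))
        self.noise_std = noise_std

    def forward(self, x):
        pre = self.enc(x)
        if self.training:                  # noise trick in the spirit of [8]
            pre = pre + self.noise_std * torch.randn_like(pre)
        code = torch.sigmoid(pre)          # driven toward near-binary values
        return self.dec(code)

    @torch.no_grad()
    def encode(self, x):
        # at test time the code is hard-thresholded to single bits,
        # which is where the quantization distortion enters
        return (torch.sigmoid(self.enc(x)) > 0.5).float()
```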
More recently, the authors of [10] used the traditional VQ method (k-means) as an initializer to learn a DNN-based vector quantizer for acoustic information retrieval. The output of the vector quantizer is the codeword label obtained by the traditional VQ method, and the output layer of the neural network is a softmax layer with one node per codeword. In fact, this architecture is designed to learn speech content information from the initializer. However, as the authors note, the frame accuracy is low (below 50%) on the training and development sets, so the architecture is unsuitable for data compression applications. Moreover, it is generally impractical to implement a VQ system with such an architecture when the number of coding bits N is large, since the number of nodes in the softmax output layer, which equals the codeword number, is 2^N in this case.
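The scalability problem can be seen directly in a sketch of such a classifier-style quantizer; the sizes below are hypothetical, not taken from [10].

```python
import torch.nn as nn

# Classifier-style quantizer in the spirit of [10] (illustrative sizes):
# the DNN predicts the k-means codeword label through a softmax layer
# with one output node per codeword, i.e. 2**n_bits nodes in total.
n_bits = 10                       # workable; but n_bits = 32 would need
out_dim = 2 ** n_bits             # ~4.3e9 output nodes -- impractical
quantizer = nn.Sequential(
    nn.Linear(512, 1024), nn.Sigmoid(),
    nn.Linear(1024, out_dim),     # logits over all 2**n_bits codewords
)
# Trained with cross-entropy against the k-means codeword index; the
# output layer grows exponentially with the number of coding bits.
```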
This work proposes a novel DNN-based VQ method that achieves improved performance for quantizing high-dimensional vectors with a large codebook. First, we learn a DAE using greedy layer-wise pre-training and back-propagation fine-tuning. Then, a DNN, initialized with the decoder network of the DAE and fed with the codes obtained by the traditional VQ method, is trained as the VQ decoder. Unlike the DNN architectures with a binary coding layer in [7] and a binary output layer in [10], the input data of the proposed DNN architecture is binary. From the viewpoint of a VQ system, the method in [7] learns both an encoder and a decoder.
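The following sketch summarizes our reading of the proposed training step: the k-means quantizer acts as a fixed encoder, and only the DNN decoder is trained, with the mean-squared coding error as the loss. All names (`decoder`, `assign`, `index_to_bits`) and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def index_to_bits(idx, n_bits):
    # hypothetical encoding: the N-bit binary representation of the
    # nearest-codeword index, used as the decoder's binary input
    return ((idx.unsqueeze(-1) >> torch.arange(n_bits)) & 1).float()

def train_decoder(decoder, loader, assign, n_bits, epochs=10, lr=1e-3):
    """assign(x) -> nearest-codeword indices from the k-means encoder;
    decoder is assumed to be initialized from the DAE's decoder half."""
    opt = torch.optim.Adam(decoder.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for x in loader:
            bits = index_to_bits(assign(x), n_bits)  # fixed encoder
            opt.zero_grad()
            loss = mse(decoder(bits), x)  # coding error of the VQ system
            loss.backward()
            opt.step()
    return decoder
```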