Regular paper

An improved vector quantization method using deep neural network

Wenbin Jiang a,*, Peilin Liu a, Fei Wen a,b

a Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, PR China
b Air Control and Navigation Institution, Air Force Engineering University, Xi'an 710000, China

* Corresponding author. E-mail addresses: jwb361@sjtu.edu.cn (W. Jiang), liupeilin@sjtu.edu.cn (P. Liu), wenfei@sjtu.edu.cn (F. Wen).
Article history: Received 19 March 2016; Accepted 5 December 2016

Keywords: Deep neural network; Vector quantization; Auto-encoder; Binary coding

Abstract
To address the challenging problem of vector quantization (VQ) for high-dimensional vectors with many coding bits, this work proposes a novel deep neural network (DNN) based VQ method. The method uses a k-means based vector quantizer as the encoder and a DNN as the decoder. The decoder is initialized with the decoder network of a deep auto-encoder, fed with the codes produced by the k-means based vector quantizer, and trained to minimize the coding error of the VQ system. Experiments on speech spectrogram coding demonstrate that, compared with the k-means based method and a recently introduced DNN-based method, the proposed method significantly reduces the coding error. Furthermore, in experiments on coding multi-frame speech spectrograms, the proposed method achieves about an 11% relative gain over the k-means based method in terms of segmental signal-to-noise ratio (SegSNR).

© 2016 Elsevier GmbH. All rights reserved.
1. Introduction
Vector quantization (VQ) is a fundamental technique for data compression, used in applications such as video and audio coding. In traditional VQ methods, the k-means or Linde-Buzo-Gray (LBG) algorithm [1,2] is most commonly used for codebook training (clustering). However, for large vector dimensions and codebook sizes, direct use of VQ suffers from a serious complexity barrier: both storage and nearest-codeword search scale with the codebook size, which grows exponentially with the number of coding bits. Constrained VQ methods, such as partitioned VQ, are commonly used to reduce the storage and computational complexity [3,4]. Unfortunately, these compromises may severely increase the coding error.
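To make the complexity barrier concrete, the following minimal sketch (ours, not the paper's code) implements plain k-means codebook training and table-lookup quantization in NumPy; the function names and parameters are illustrative assumptions.

```python
import numpy as np

def train_codebook(data, n_bits, n_iter=20, seed=0):
    """Plain k-means codebook training (illustrative sketch). With N
    coding bits the codebook holds 2**N codewords, so N = 32 already
    implies ~4.3e9 centroids: storage and nearest-neighbour search both
    scale with 2**N, which is the complexity barrier discussed above."""
    rng = np.random.default_rng(seed)
    k = 2 ** n_bits
    codebook = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # assignment step: nearest centroid per vector, O(n * k * dim)
        dists = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):  # update step: recompute each centroid
            members = data[labels == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

def quantize(x, codebook):
    # encoder: index of the nearest codeword; decoder: table lookup
    idx = int(((codebook - x) ** 2).sum(-1).argmin())
    return idx, codebook[idx]
```

Partitioned VQ sidesteps this cost by splitting the vector into sub-vectors quantized independently, which is exactly the compromise that inflates the coding error.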
Recently, inspired by the success of deep neural networks (DNNs) in data dimensionality reduction [5,6], DNN-based approaches have been developed to address this problem [7–9]. In [7], a deep auto-encoder (DAE) with a binary coding layer was learned to code the high-dimensional vector. In speech spectrogram coding, this method showed a considerable performance gain over traditional VQ technology. Nevertheless, when many of the activations of the coding units are far from binary, quantizing them to binary values may cause large distortions. To make the activations of the coding layer as close to binary as possible, one effective approach is to add Gaussian noise to the input of the coding layer [8]. Another is to force the coding layer to be binary during the forward pass of fine-tuning [9]. All of these works aim to obtain a binary coding layer from the real-valued activations of a DNN. In principle, quantizing a floating-point value to a single bit inevitably causes distortion.
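As an illustration of this binary-coding-layer approach, here is a minimal PyTorch sketch of a DAE whose coding activations are pushed toward 0/1 by injecting Gaussian noise before the coding sigmoid, in the spirit of [7,8]; the layer sizes and noise level are our own illustrative assumptions, not those of the cited papers.

```python
import torch
import torch.nn as nn

class BinaryCodingDAE(nn.Module):
    """Sketch of a DAE with a binary coding layer (assumed sizes)."""

    def __init__(self, dim=512, code_bits=64, noise_std=4.0):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.Sigmoid(),
                                 nn.Linear(256, code_bits))
        self.dec = nn.Sequential(nn.Linear(code_bits, 256), nn.Sigmoid(),
                                 nn.Linear(256, dim))
        self.noise_std = noise_std

    def forward(self, x):
        pre = self.enc(x)
        if self.training:                  # noise trick in the spirit of [8]
            pre = pre + self.noise_std * torch.randn_like(pre)
        code = torch.sigmoid(pre)          # driven toward near-binary values
        return self.dec(code)

    @torch.no_grad()
    def encode(self, x):
        # at test time the code is hard-thresholded to single bits,
        # which is where the quantization distortion enters
        return (torch.sigmoid(self.enc(x)) > 0.5).float()
```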
More recently, the authors of [10] used the traditional VQ method (k-means) as an initializer to learn a DNN-based vector quantizer for acoustic information retrieval. The output of the vector quantizer is the codeword label obtained by the traditional VQ method, and the output layer of the neural network is a softmax layer with one node per codeword. In fact, this architecture is designed to learn speech content information from the initializer. However, as the authors note, the frame accuracy is low (below 50%) on the training and development sets, so the architecture is unsuitable for data compression applications. Moreover, it is generally impractical to implement a VQ system with such an architecture when the number of coding bits N is large, since the number of nodes in the softmax output layer, which equals the codeword number, is 2^N in this case.
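The scalability problem can be seen directly in a sketch of such a classifier-style quantizer; the sizes below are hypothetical, not taken from [10].

```python
import torch.nn as nn

# Classifier-style quantizer in the spirit of [10] (illustrative sizes):
# the DNN predicts the k-means codeword label through a softmax layer
# with one output node per codeword, i.e. 2**n_bits nodes in total.
n_bits = 10                       # workable; but n_bits = 32 would need
out_dim = 2 ** n_bits             # ~4.3e9 output nodes -- impractical
quantizer = nn.Sequential(
    nn.Linear(512, 1024), nn.Sigmoid(),
    nn.Linear(1024, out_dim),     # logits over all 2**n_bits codewords
)
# Trained with cross-entropy against the k-means codeword index; the
# output layer grows exponentially with the number of coding bits.
```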
This work proposes a novel DNN-based VQ method that achieves improved performance for quantizing high-dimensional vectors with a large codebook. First, we learn a DAE using greedy layer-wise pre-training and back-propagation fine-tuning. Then, a DNN, initialized with the decoder network of the DAE and fed with the codes obtained by the traditional VQ method, is trained as the VQ decoder. Unlike the DNN architectures with a binary coding layer in [7] and a binary output layer in [10], the input data of the proposed DNN architecture is binary. From the viewpoint of a VQ system, the method in [7] learns both an encoder and a decoder.
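The following sketch summarizes our reading of the proposed training step: the k-means quantizer acts as a fixed encoder, and only the DNN decoder is trained, with the mean-squared coding error as the loss. All names (`decoder`, `assign`, `index_to_bits`) and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def index_to_bits(idx, n_bits):
    # hypothetical encoding: the N-bit binary representation of the
    # nearest-codeword index, used as the decoder's binary input
    return ((idx.unsqueeze(-1) >> torch.arange(n_bits)) & 1).float()

def train_decoder(decoder, loader, assign, n_bits, epochs=10, lr=1e-3):
    """assign(x) -> nearest-codeword indices from the k-means encoder;
    decoder is assumed to be initialized from the DAE's decoder half."""
    opt = torch.optim.Adam(decoder.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for x in loader:
            bits = index_to_bits(assign(x), n_bits)  # fixed encoder
            opt.zero_grad()
            loss = mse(decoder(bits), x)  # coding error of the VQ system
            loss.backward()
            opt.step()
    return decoder
```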