LOSSY AUDIO SIGNAL COMPRESSION VIA STRUCTURED SPARSE DECOMPOSITION
AND COMPRESSED SENSING
Sumxin Jiang, Rendong Ying, Zhenqi Lu, Peilin Liu and Zenghui Zhang
Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
liupeilin@sjtu.edu.cn
ABSTRACT
In this paper, we propose a method for lossy audio signal
compression via structured sparse decomposition and
compressed sensing (CS). In this method, a least absolute
shrinkage and selection operator (LASSO) is employed to
decompose the audio signals, in a sparse and structured manner,
into tonal and transient layers, and both resulting layers are
then compressed by a CS method. By employing a new penalty term,
which takes advantage of the structure information of trans-
form coefficients, the LASSO is able to achieve a better sparse
approximation of the audio signal than traditional methods
do. In addition, we propose a sparsity allocation algorithm,
which adjusts the sparsity between the two resulting layers,
thus improving the performance of CS. Experimental results
showed that the new method provided a better compression
performance than conventional methods did.
Index Terms— Compressed sensing, sparse approxima-
tion, audio compression, Lasso
1. INTRODUCTION
The emerging theory of compressed sensing (CS) [1], [2] is
a sub-Nyquist sampling strategy, which combines data
acquisition with data compression to enable a new generation of
signal acquisition schemes. This novel acquisition scheme
operates near the intrinsic information rate of the signal rather
than its ambient data rate [3], thus substantially surpassing
the limitations of classical Nyquist sampling theory. The CS
theory is constructed on the assumption that the signal has a
sparse or compressible linear representation in a predefined
dictionary. Therefore, the construction of an appropriate dic-
tionary is one of the key issues in CS theory.
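To make the role of the dictionary concrete, the following minimal sketch (an illustration under our own assumptions, not the method proposed in this paper) draws a signal that is exactly sparse in an orthonormal DCT dictionary, acquires it with a random Gaussian measurement matrix, and recovers it with orthogonal matching pursuit (OMP). The DCT dictionary, the Gaussian matrix, the OMP solver, the sizes n, m, k, and all function names are hypothetical choices made only for illustration.

```python
import numpy as np


def dct_dictionary(n):
    """Return an n x n orthonormal DCT-II dictionary whose columns are atoms."""
    k = np.arange(n)[:, None]  # atom (frequency) index
    t = np.arange(n)[None, :]  # time index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)    # rescale the DC row so that c @ c.T = I
    return c.T


def omp(a, y, k):
    """Greedy k-sparse solution of y = a @ x by orthogonal matching pursuit."""
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        support.append(int(np.argmax(np.abs(a.T @ residual))))
        # Re-fit all selected columns by least squares.
        coef, *_ = np.linalg.lstsq(a[:, support], y, rcond=None)
        residual = y - a[:, support] @ coef
    x = np.zeros(a.shape[1])
    x[support] = coef
    return x


rng = np.random.default_rng(0)
n, m, k = 256, 128, 4  # signal length, number of measurements, sparsity (assumed)

psi = dct_dictionary(n)  # sparsifying dictionary
alpha = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
alpha[idx] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
signal = psi @ alpha  # time-domain signal, k-sparse in psi

phi = rng.standard_normal((m, n)) / np.sqrt(m)  # random measurement matrix
y = phi @ signal                                # m < n compressed measurements

alpha_hat = omp(phi @ psi, y, k)
signal_hat = psi @ alpha_hat
rel_err = np.linalg.norm(signal_hat - signal) / np.linalg.norm(signal)
print(f"relative reconstruction error: {rel_err:.2e}")
```

Recovery of this kind relies on the signal being sparse in psi. If the dictionary is poorly matched to the signal, the same number of measurements generally yields a worse reconstruction, which is why dictionary construction matters.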
This work was partially supported by the National Natural Science
Foundation of China under grant numbers 61171171 and 61102169.
With respect to CS for audio signals [4], [5], finding
a dictionary in which the audio signals can be sparsely
represented is usually the primary task. As audio signals
are time-varying and consequently can hardly be
sparsely decomposed within a single orthogonal dictionary
[6], sub-optimal methods [7], [8], which aim at the best
sparse approximation of the audio signal, have been proposed in
recent years. Most of these methods are based on the
structure properties of the audio signal in a certain transform
domain. M. Kowalski and B. Torresani [9] found that the
sparse and structured decomposition of audio signals on
dictionaries can be achieved through explicit modeling in the
coefficient domain. They reformulated the sparse decomposition of
audio signals as a regression problem, which can be resolved
using a least absolute shrinkage and selection operator
(LASSO) with mixed-norm constraints [10], [11]. In their work, a
family of structured shrinkage operators is implemented and
evaluated, such as Elitist LASSO (E-LASSO), Group LASSO
(G-LASSO), and Elitist-Group LASSO (EG-LASSO).
However, these operators can hardly utilize the dependencies
among the neighborhoods within different coefficient groups
(inter-dependencies). To exploit the inter-dependencies and
to introduce more flexibility in the coefficient domain model-
ing, the authors [12] further proposed the social sparsity con-
vex operators. With these social sparsity convex operators,
the audio signals, which exhibit obvious structures in the co-
efficient domain, can be efficiently and sparsely decomposed,
if a suitable set of weighted neighborhoods is selected.
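The difference between these shrinkage operators can be illustrated with a short sketch. The code below is a simplified illustration under our own assumptions, not the implementation of [9] or [12]. It applies plain LASSO soft thresholding, a group-LASSO shrinkage over non-overlapping groups (here, the rows of a coefficient array), and a windowed group-LASSO shrinkage in which each coefficient is scaled according to the weighted energy of its own neighborhood. The function names, the one-dimensional neighborhood along the second axis, the window weights, and the threshold value are all hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def lasso_shrink(c, lam):
    """Plain soft thresholding: each coefficient is judged on its own magnitude."""
    mag = np.abs(c)
    return c * np.maximum(0.0, 1.0 - lam / np.maximum(mag, EPS))


def group_lasso_shrink(c, lam):
    """Group-LASSO shrinkage: each row is one group, kept or discarded as a whole."""
    norm = np.sqrt(np.sum(np.abs(c) ** 2, axis=1, keepdims=True))
    return c * np.maximum(0.0, 1.0 - lam / np.maximum(norm, EPS))


def windowed_group_lasso_shrink(c, lam, weights):
    """Neighborhood shrinkage: each coefficient is scaled by the weighted
    energy of the coefficients around it (overlapping neighborhoods)."""
    energy = np.abs(c) ** 2
    # Weighted neighborhood energy along each row; a symmetric window is
    # assumed, so the flip performed by np.convolve has no effect.
    local = np.apply_along_axis(
        lambda row: np.convolve(row, weights, mode="same"), 1, energy
    )
    norm = np.sqrt(local)
    return c * np.maximum(0.0, 1.0 - lam / np.maximum(norm, EPS))


# Toy coefficients: row 0 holds a run of three moderate values,
# row 1 holds a single isolated value.
coeffs = np.array([
    [0.0, 0.6, 0.6, 0.6, 0.0],
    [0.0, 0.0, 0.9, 0.0, 0.0],
])
lam = 1.0
window = np.array([1.0, 1.0, 1.0])  # hypothetical neighborhood weights

out_lasso = lasso_shrink(coeffs, lam)
out_group = group_lasso_shrink(coeffs, lam)
out_window = windowed_group_lasso_shrink(coeffs, lam, window)

print(out_lasso)
print(out_group)
print(out_window)
```

In this toy case every coefficient lies below the threshold, so plain soft thresholding discards all of them, whereas the neighborhood-based operator retains the center of the run in row 0 because its neighbors contribute energy. The isolated coefficient in row 1 receives no such support and is removed.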
In this paper, a new audio compression method, which
combines the social sparsity convex operators with CS theory, is
proposed. Audio signals are usually composed of tonal
components, which are sparse in the frequency domain, and transient
components, which are sparse in the time domain [13].
Considering these structure properties, we first use the social sparsity
convex operators to decompose the audio signals into tonal and
transient layers, and then, further compress the two resulting
layers using a CS method. As the convex operators can make
full use of the structure information in coefficient domain, the
obtained tonal and transient components will form a fine ap-
proximation to the original audio signal. Moreover, a new
weighted neighborhood window is proposed for the convex
operator, thus improving the performance of sparse decom-
position. In addition, because of the time-varying property of
audio signals, which leads to a non-constant ratio of the tonal
and transient layers, an algorithm is presented to allocate the
sparsity between the two layers, thus substantially improving
the performance of the CS method.
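The idea of splitting a signal into a tonal layer and a transient layer can be sketched on a toy example. The code below is only an illustration under our own assumptions and is not the decomposition used in this paper: it replaces the social sparsity operators with plain soft thresholding, represents the tonal layer in an orthonormal DCT dictionary and the transient layer directly in the time domain, and alternates between the two layers (block coordinate relaxation). The toy signal, the thresholds, the iteration count, and all names are hypothetical.

```python
import numpy as np


def dct_dictionary(n):
    """Return an n x n orthonormal DCT-II dictionary whose columns are atoms."""
    k = np.arange(n)[:, None]  # atom (frequency) index
    t = np.arange(n)[None, :]  # time index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)    # rescale the DC row so that c @ c.T = I
    return c.T


def soft(c, lam):
    """Soft thresholding, the shrinkage associated with an l1 penalty."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)


n = 256
psi = dct_dictionary(n)

# Toy signal: one DCT atom (tonal part) plus one isolated spike (transient part).
tonal_true = 3.0 * psi[:, 20]
transient_true = np.zeros(n)
transient_true[100] = 2.0
y = tonal_true + transient_true

lam_tonal, lam_transient = 0.5, 0.5  # assumed thresholds
a = np.zeros(n)  # DCT coefficients of the tonal layer
b = np.zeros(n)  # time samples of the transient layer

for _ in range(50):
    # Update the tonal layer from what the transient layer does not explain.
    a = soft(psi.T @ (y - b), lam_tonal)
    # Update the transient layer from what the tonal layer does not explain.
    b = soft(y - psi @ a, lam_transient)

tonal_layer = psi @ a
transient_layer = b
print("nonzero tonal coefficients:", np.count_nonzero(a))
print("nonzero transient coefficients:", np.count_nonzero(b))
```

Each layer ends up sparse in its own domain. The amplitudes are biased toward zero by the shrinkage, so the two layers sum to an approximation of y rather than to y itself.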
The organization of this paper is as follows. Section 2
introduces the structured shrinkage operators used for audio