PANG et al.: LPC FROM THEORETICAL ANALYSIS OF LCSs FOR LOCALLY LINEAR CLASSIFICATION 2939
TABLE I
SOME NOTATIONS IN THIS PAPER
smooth function. We have for all $x \in \mathbb{R}^D$

$$\Big| f(x) - \sum_{v \in C} \gamma_v(x) f(v) \Big| \le \alpha \|x - \gamma(x)\|^2 + \beta \sum_{v \in C} |\gamma_v(x)| \, \|v - \gamma(x)\|^{1+p}. \quad (2)$$
Lemma 1 indicates that the nonlinear function $f(x)$ is bounded by the weighted sum of the reconstruction error $\|x - \gamma(x)\|^2$ and the locality error $\sum_{v \in C} |\gamma_v(x)| \, \|v - \gamma(x)\|^{1+p}$.
A common practice to define a coding scheme in (2) is to assign a specific value to $p$ [36]. However, the optimal value of $p$ is difficult to determine.
B. Local Gaussian Coding and Local Student Coding
If a sample $x$ is closer to an anchor $v_m$, then, according to the principle of locality, the local coding $\gamma_m(x)$ should be larger, and vice versa [27], [44]. Local Gaussian coding (LGC) and local student coding (LSC) represent the short-tail decay function and the heavy-tail one, respectively. More concretely, LGC presumes that the relation between samples and anchors is

$$\gamma_v^{\mathrm{lgc}}(x; v, \sigma) \propto \exp\!\Big(\!-\frac{\|v - x\|^2}{\sigma^2}\Big) \quad (3)$$
where $v \in \mathbb{R}^D$ is an anchor, and the hyper-parameter $\sigma$ controls the weight decay speed to achieve locality. LSC instead uses the Student t-distribution with one degree of freedom, which coincides with the Cauchy distribution:

$$\gamma_v^{\mathrm{lsc}}(x; v, \sigma) \propto \big(\sigma^2 + \|v - x\|^2\big)^{-1} \quad (4)$$
where the hyper-parameter $\sigma$ also controls the weight decay speed. The Student t-distribution has the nice property that $(\sigma^2 + \|v - x\|^2)^{-1}$ approaches an inverse square law for a large pairwise distance $\|v - x\|$.
Both LGC (3) and LSC (4) are explicit coding schemes, as both $e^{-d^2/\sigma^2}$ and $(\sigma^2 + d^2)^{-1}$ explicitly predefine how the local codings change with respect to the distances $d = \|v_m - x\|$ (see Fig. 1).
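As a concrete illustration, the two explicit schemes (3) and (4) can be sketched in NumPy. The function names and the sum-to-one normalization are our own assumptions, since (3) and (4) only state proportionality:

```python
import numpy as np

def lgc_codings(x, anchors, sigma):
    """Local Gaussian coding (3): gamma_v(x) proportional to exp(-||v-x||^2 / sigma^2).

    x:       sample, shape (D,)
    anchors: anchor set C, shape (M, D)
    sigma:   bandwidth controlling the weight decay speed
    Normalizing the codings to sum to one is our assumption; (3) fixes
    them only up to a constant.
    """
    d2 = np.sum((anchors - x) ** 2, axis=1)   # squared distances ||v - x||^2
    w = np.exp(-d2 / sigma ** 2)              # short-tail (Gaussian) decay
    return w / w.sum()

def lsc_codings(x, anchors, sigma):
    """Local student coding (4): gamma_v(x) proportional to (sigma^2 + ||v-x||^2)^(-1)."""
    d2 = np.sum((anchors - x) ** 2, axis=1)
    w = 1.0 / (sigma ** 2 + d2)               # heavy-tail (Cauchy) decay
    return w / w.sum()
```

With either scheme the nearest anchor receives the largest coding, matching the locality principle above; for large distances the Gaussian weights vanish much faster than the Cauchy ones, which is exactly the short-tail versus heavy-tail contrast in Fig. 1.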
1) Upper Bound of Locality Error: Inspired by the idea in [48], we theoretically compare the locality errors of different coding schemes to answer the question: given several different coding schemes, which one has a lower locality error than the others?
Fig. 1. Decay ability of the two LCSs: $e^{-d^2/\sigma^2}$ for (3) and $1/(\sigma^2 + d^2)$ for (4).
Theorem 1 (Locality Error of LGC): Let $(\gamma, C)$ be an LGC in (3) on $\mathbb{R}^D$, and let $f$ be an $(\alpha, \beta, p)$-Lipschitz smooth function. For all $v \in C$, we denote $1 \le \|v\| \le h$, and

$$d_u = \max_{v \in C} \max_x \|x - v\|.$$

Then the locality error in Lemma 1 has the following upper bound:
$$\sum_{v \in C} |\gamma_v(x)| \, \|v - \gamma(x)\|^{1+p} \le
\begin{cases}
\Big( 5h^2 + \dfrac{2}{|C|} \Big( \dfrac{d_u^2}{\sigma^2} - 1 \Big) \Big)^{\frac{1+p}{2}} & \text{if } d_u \ge \sigma, \\[6pt]
\Big( 5h^2 + \dfrac{2}{|C|} \Big( 1 - \dfrac{d_u^2}{\sigma^2} \Big) \Big)^{\frac{1+p}{2}} & \text{otherwise.}
\end{cases} \quad (5)$$
Theorem 1 shows that the locality error of LGC is controlled by the hyper-parameter $\sigma$: if the choice of $\sigma$ makes $d_u^2/\sigma^2 < 1$, we obtain a lower error than otherwise; moreover, the condition $d_u^2 < \sigma^2$ implies that there exists an optimal hyper-parameter $\sigma$ for LGC.
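The bound (5) is easy to evaluate numerically. The sketch below (our own helper, with hypothetical argument names) makes the $\sigma$-dependence in Theorem 1 explicit: both branches are minimized when $\sigma = d_u$, i.e. when $d_u^2/\sigma^2 = 1$.

```python
def lgc_locality_bound(h, card_C, d_u, sigma, p):
    """Evaluate the upper bound (5) on the LGC locality error.

    h:      upper bound on the anchor norms, 1 <= ||v|| <= h
    card_C: number of anchors |C|
    d_u:    largest sample-anchor distance
    sigma:  LGC bandwidth; p: (alpha, beta, p)-Lipschitz smoothness exponent
    """
    r = d_u ** 2 / sigma ** 2
    # The two branches of (5) differ only in the sign of (r - 1).
    inner = 5 * h ** 2 + (2.0 / card_C) * ((r - 1) if d_u >= sigma else (1 - r))
    return inner ** ((1 + p) / 2)
```

For instance, with $h = 1$, $|C| = 10$, $d_u = 1$, and $p = 1$, the bound equals $5$ at $\sigma = d_u$ and grows as $\sigma$ moves away from $d_u$ in either direction, which is the optimal-$\sigma$ behavior described above.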
Theorem 2 (Locality Error of LSC): Let $(\gamma, C)$ be an LSC (4) on $\mathbb{R}^D$, and $f$ be an $(\alpha, \beta, p)$-Lipschitz smooth function. For all $v \in C$, we denote $1 \le \|v\| \le h$, and

$$d_l = \min_{v \in C} \max_x \|x - v\|, \qquad d_u = \max_{v \in C} \max_x \|x - v\|$$
$$c_l = \min_{v \in C} \gamma_v^{\mathrm{lsc}}(x) \big(\sigma^2 + \|v - x\|^2\big), \qquad c_u = \max_{v \in C} \gamma_v^{\mathrm{lsc}}(x) \big(\sigma^2 + \|v - x\|^2\big).$$

Then the locality error in Lemma 1 has the following upper bound:
$$\sum_{v \in C} |\gamma_v(x)| \, \|v - \gamma(x)\|^{1+p} \le \Big( 5h^2 - \frac{2 c_l d_l^2}{c_u |C| \big(\sigma^2 + d_u^2\big)} \Big)^{\frac{1+p}{2}}. \quad (6)$$
Theorem 2 indicates that reducing the hyper-parameter $\sigma$ also decreases the locality error of LSC, because the term $-2 c_l d_l^2 / \big( c_u |C| (\sigma^2 + d_u^2) \big)$ is always negative.
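This monotonicity can be checked numerically with a small sketch of (6) (our own helper; $c_l$, $c_u$, $d_l$, $d_u$ are treated as given positive constants here, although in the theorem they depend on the data and the anchors):

```python
def lsc_locality_bound(h, card_C, d_l, d_u, c_l, c_u, sigma, p):
    """Evaluate the upper bound (6) on the LSC locality error.

    The subtracted term 2*c_l*d_l^2 / (c_u*|C|*(sigma^2 + d_u^2)) is
    positive, so shrinking sigma enlarges it and tightens the bound.
    """
    inner = 5 * h ** 2 - (2 * c_l * d_l ** 2) / (card_C * c_u * (sigma ** 2 + d_u ** 2))
    return inner ** ((1 + p) / 2)
```

For any fixed positive constants the bound is strictly increasing in $\sigma$, matching the discussion above.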
Example 1 (Drawback of Explicit Coding Scheme):
A simple example, illustrated in Fig. 2, reconstructs a