Density weighted support vector data description
Myungraee Cha, Jun Seok Kim, Jun-Geol Baek*
School of Industrial Management Engineering, Korea University, Anam-dong, Seongbuk-gu, 136-701 Seoul, Republic of Korea
Keywords:
One-class classification (OCC)
Support vector data description (SVDD)
Density weighted SVDD (DW-SVDD)
k-Nearest neighbor approach
Abstract
One-class classification (OCC) has received considerable attention because it remains useful when statistically representative non-target data are unavailable. In this situation, the objective of OCC is to find the optimal description of the target data so that outliers, or non-target data, can be identified more reliably. Support vector data description (SVDD), one OCC method, is widely used because it produces flexible description boundaries without requiring assumptions about the data distribution. By mapping the target dataset into a high-dimensional feature space, SVDD finds a spherical description boundary for the target data. In this process, SVDD considers only the kernel-based distance between each data point and the spherical description, not the density distribution of the data. Consequently, data points in high-density regions may be left out of the description, which degrades classification performance. To solve this problem, we propose a new SVDD that introduces the notion of a density weight: the relative density of each data point, estimated from the density distribution of the target data with the k-nearest neighbor (k-NN) approach. By incorporating this weight into the search for an optimal description, the new method prioritizes data points in high-density regions, so that the optimal description shifts toward these regions. We demonstrate the improved performance of the new SVDD on various datasets from the UCI repository.
© 2013 Elsevier Ltd. All rights reserved.
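The abstract describes the density weight only qualitatively. As a minimal sketch under an assumed definition (not necessarily the exact formula used later in the paper), the weight of each target point can be taken as ρ_i = 1 − d_k(x_i) / max_j d_k(x_j), where d_k(x_i) is the Euclidean distance from x_i to its k-th nearest neighbor. Points in dense regions have small k-NN distances and therefore weights near 1, while the sparsest point receives weight 0. The function names `knn_distance` and `density_weights` and the default `k=3` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_distance(X: List[Point], k: int) -> List[float]:
    """Distance from each point to its k-th nearest neighbor (the point itself excluded)."""
    if not 1 <= k < len(X):
        raise ValueError("k must satisfy 1 <= k < len(X)")
    out = []
    for i, x in enumerate(X):
        # Brute-force O(n^2) search; fine for a sketch, not for large datasets.
        d = sorted(math.dist(x, y) for j, y in enumerate(X) if j != i)
        out.append(d[k - 1])
    return out


def density_weights(X: List[Point], k: int = 3) -> List[float]:
    """Hypothetical relative density weight: rho_i = 1 - d_k(x_i) / max_j d_k(x_j)."""
    dk = knn_distance(X, k)
    dmax = max(dk)
    if dmax == 0.0:  # all points coincide: treat every point as equally dense
        return [1.0] * len(X)
    return [1.0 - d / dmax for d in dk]


if __name__ == "__main__":
    # Four tightly packed points and one isolated point.
    X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (2.0, 2.0)]
    print(density_weights(X, k=2))
```

With `k=2`, the four clustered points each get a weight of about 0.96 and the isolated point gets 0.0. In a density-weighted SVDD, such weights would presumably scale each point's slack penalty (for example, replacing C with ρ_i C); that coupling is likewise an assumption of this sketch.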
1. Introduction
One-class classification (OCC) addresses the classification problem in which suitable negative cases for training are absent. The subject of a great deal of past research, OCC aims to find the best description of a dataset using only objects from one class, known as the target data. If the target data are described accurately, the description can be used to distinguish other classes even when non-target data are insufficient (Tax & Duin, 2002); OCC has therefore attracted attention in exceptional situations where it is difficult to gather data for other classes or where no other classes exist (Khan & Madden, 2010; Mazhelis, 2006).
Support vector data description (SVDD) is a widely used example of OCC. The objective of SVDD is to find a set of support vectors (SVs) that describe a spherical boundary around the target data after mapping them into a high-dimensional feature space. Because the sphere is defined in feature space, its image in the original input space need not be spherical, so SVDD has a flexible description boundary.
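To make the spherical boundary concrete, here is a minimal sketch assuming the standard SVDD formulation: minimize R² + C Σ_i ξ_i subject to ‖φ(x_i) − a‖² ≤ R² + ξ_i and ξ_i ≥ 0. Its dual maximizes Σ_i α_i K(x_i, x_i) − Σ_i Σ_j α_i α_j K(x_i, x_j) subject to Σ_i α_i = 1 and 0 ≤ α_i ≤ C, and a point z is accepted when ‖φ(z) − a‖² = K(z, z) − 2 Σ_i α_i K(x_i, z) + Σ_i Σ_j α_i α_j K(x_i, x_j) ≤ R². The Gaussian kernel, the SMO-style pairwise solver, and the names `SVDD`, `C`, `gamma`, `caps`, `dist2`, and `contains` are assumptions chosen for brevity, not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]


def rbf_kernel(x: Point, y: Point, gamma: float) -> float:
    """Gaussian kernel K(x, y) = exp(-gamma * ||x - y||^2); note K(x, x) = 1."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class SVDD:
    """Minimal SVDD sketch using an SMO-style pairwise solver for the dual.

    Minimizes sum_ij a_i a_j K_ij - sum_i a_i K_ii (the negated dual)
    subject to sum_i a_i = 1 and 0 <= a_i <= u_i, where u_i = C by default.
    """

    def __init__(self, C: float = 1.0, gamma: float = 0.5,
                 tol: float = 1e-6, max_iter: int = 10_000) -> None:
        self.C, self.gamma, self.tol, self.max_iter = C, gamma, tol, max_iter

    def fit(self, X: List[Point], caps: Optional[List[float]] = None) -> "SVDD":
        n = len(X)
        u = list(caps) if caps is not None else [self.C] * n
        if sum(u) < 1.0:
            raise ValueError("infeasible: upper bounds must sum to at least 1")
        K = [[rbf_kernel(a, b, self.gamma) for b in X] for a in X]
        # Feasible start: assign weight greedily until the alphas sum to 1.
        alpha, rest = [0.0] * n, 1.0
        for i in range(n):
            alpha[i] = min(u[i], rest)
            rest -= alpha[i]
        # Gradient of the minimized objective: g_i = 2 (K alpha)_i - K_ii.
        g = [2 * sum(K[i][j] * alpha[j] for j in range(n)) - K[i][i] for i in range(n)]
        for _ in range(self.max_iter):
            # Maximal violating pair: alpha_i may still grow, alpha_j may still shrink.
            up = [k for k in range(n) if alpha[k] < u[k] - self.tol]
            down = [k for k in range(n) if alpha[k] > self.tol]
            if not up or not down:
                break
            i = min(up, key=lambda k: g[k])
            j = max(down, key=lambda k: g[k])
            if g[j] - g[i] < self.tol:
                break  # optimality (KKT) conditions hold up to tol
            eta = max(K[i][i] + K[j][j] - 2 * K[i][j], 1e-12)
            # Exact line search along e_i - e_j, clipped to keep both alphas in their boxes.
            delta = min((g[j] - g[i]) / (2 * eta), u[i] - alpha[i], alpha[j])
            alpha[i] += delta
            alpha[j] -= delta
            for k in range(n):
                g[k] += 2 * delta * (K[k][i] - K[k][j])
        self.X, self.alpha = X, alpha
        self.aKa = sum(alpha[i] * alpha[j] * K[i][j] for i in range(n) for j in range(n))
        # R^2 is the squared distance of the unbounded SVs (0 < alpha_i < u_i) to the center.
        free = [k for k in range(n) if self.tol < alpha[k] < u[k] - self.tol]
        if free:
            self.R2 = sum(self.dist2(X[k]) for k in free) / len(free)
        else:  # no unbounded SV: fall back to the closest support vector
            self.R2 = min(self.dist2(X[k]) for k in range(n) if alpha[k] > self.tol)
        return self

    def dist2(self, z: Point) -> float:
        """Squared feature-space distance from phi(z) to the center a = sum_i alpha_i phi(x_i)."""
        cross = sum(a * rbf_kernel(x, z, self.gamma) for a, x in zip(self.alpha, self.X) if a > 0)
        return 1.0 - 2.0 * cross + self.aKa  # leading 1.0 is K(z, z) for the Gaussian kernel

    def contains(self, z: Point) -> bool:
        """True if z lies inside or on the description boundary."""
        return self.dist2(z) <= self.R2 + 1e-9


if __name__ == "__main__":
    X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
    model = SVDD(C=1.0, gamma=0.5).fit(X)
    print(model.contains((0.5, 0.5)), model.contains((5.0, 5.0)))
```

On the unit-square example the four corners become the support vectors, so the central point is accepted and the distant point (5, 5) is rejected. With a smaller C (e.g. 0.3) and an extra isolated training point, that point's α is pushed to its upper bound and it falls outside the description. The optional `caps` argument allows a different upper bound per point, which is how per-point slack penalties C_i enter the dual as 0 ≤ α_i ≤ C_i. Note that the boundary depends only on kernel distances, not on how densely the points are packed.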
SVDD was developed from support vector machines to compensate for weaknesses in earlier OCC research. Before support vectors were used, many classification methods were based on estimating the probability distribution of the target dataset, which imposed severe limitations on datasets that did not follow a specific distribution (Tax & Duin, 2002). In contrast, SVDD is easily applied to real-world data because it makes no assumptions about the data distribution (Grinblat, Uzal, & Granitto, 2013; Sjöstrand, Hansen, Larsson, & Larsen, 2007).
In addition to requiring no distributional assumptions, SVDD is used in various fields for its flexible description boundaries. As a feature extraction methodology, it can produce a representative set of target data for image retrieval (Lai, Tax, Duin, Pękalska, & Paclík, 2004), facial images (Lee, Park, & Lee, 2006), and pattern recognition (Dong, Zhaohui, & Wanfeng, 2001; Zhao, Wang, & Xiao, 2013). SVDD has also been used for outlier detection with image sensory devices (Bovolo, Camps-Valls, & Bruzzone, 2010; Guo, Chen, & Tsai, 2009), intrusion detection (Kang, Jeong, & Kong, 2012), and mura inspection of thin-film transistor liquid-crystal displays (TFT-LCDs) (Liu, Lin, Hsueh, & Lee, 2009; Liu, Liu, & Chen, 2011). When outliers are treated as process faults, SVDD can also be used to identify faults in a dataset (Liu, Liu, & Chen, 2010; Luo, Cui, & Wang, 2011; Zhang, Liu, Xie, & Li, 2009).
However, although conventional SVDD is advantageous for data domain description, it has a major limitation. To determine the optimal description of the target data, SVDD takes into account only the kernel-based distance between the spherical boundary and the data points, not the distribution of the data. When SVDD sets the description boundary without considering the density distribution of the data, it is possible that the boundary will pass through
0957-4174/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.eswa.2013.11.025
* Corresponding author. Tel.: +82 2 3290 3396; fax: +82 2 929 5888.
E-mail address: jungeol@korea.ac.kr (J.-G. Baek).
Expert Systems with Applications 41 (2014) 3343–3350