GO-PEAS: A Scalable Yet Accurate Grid-Based
Outlier Detection Method Using Novel Pruning
Searching Techniques
Hongzhou Li¹, Ji Zhang²(✉), Yonglong Luo³, Fulong Chen³, and Liang Chang¹
¹ Guangxi Key Laboratory of Trusted Software, Guilin University
of Electronic Technology, Guilin, China
homzh@163.com, changl@guet.edu.cn
² University of Southern Queensland, Toowoomba, Australia
ji.zhang@usq.edu.au
³ Anhui Normal University, Wuhu, China
ylluo@ustc.edu.cn, long005@mail.ahnu.edu.cn
Abstract. In this paper, we propose GO-PEAS (Grid-based Outlier
detection with Pruning Searching techniques), a scalable yet accurate
grid-based outlier detection method. GO-PEAS incorporates novel
techniques that prune unnecessary regions of the data space,
substantially improving detection speed and making the method more
scalable for large data sources. Furthermore, the detection accuracy of
GO-PEAS is guaranteed to be consistent with that of its baseline
version, which does not use these enhancement techniques. Experimental
evaluation demonstrates the improved scalability and good
effectiveness of GO-PEAS.
1 Introduction
Outlier detection is an important data analytics and mining problem that aims to
find objects and/or patterns that are considerably dissimilar, exceptional and
inconsistent with respect to the majority of the data in an input database.
Outlier detection has become one of the key enabling technologies for a wide
range of applications in industry, business, security and engineering, where
outliers represent abnormal patterns that are critical for domain-specific
decision-making and actions.
Due to its importance in various areas, considerable research effort has been
devoted to outlier detection, and a number of detection techniques have been
proposed that leverage different detection mechanisms and algorithms. The
majority of them deal with traditional relational datasets and can be generally
classified into the distribution-based methods [2], the distance-based methods
[4,10], the density-based methods [8,11,13,16–18] and the clustering-based
methods [6,9], which feature different levels of performance in terms of
detection accuracy and efficiency. Research on outlier detection has also been
carried out for other types of datasets such as temporal data
© Springer International Publishing Switzerland 2016
T. Ray et al. (Eds.): ACALCI 2016, LNAI 9592, pp. 125–133, 2016.
DOI: 10.1007/978-3-319-28270-1_11