Engineering Applications of Artificial Intelligence 68 (2018) 53–62
Accelerating nearest neighbor partitioning neural network classifier based
on CUDA
Lin Wang a,*, Xuehui Zhu a, Bo Yang a, Jifeng Guo a, Shuangrong Liu a, Meihui Li a, Jian Zhu c, Ajith Abraham b

a Shandong Provincial Key Laboratory of Network based Intelligent Computing, University of Jinan, Jinan, 250022, China
b Machine Intelligence Research Labs (MIR Labs), Scientific Network for Innovation and Research Excellence, Auburn, WA, 98071, USA
c Department of Radiation Oncology, Shandong Cancer Hospital and Institute, Jinan 250117, China
Article info
Keywords:
Parallel nearest neighbor partitioning
Neural network classifier
Compute unified device architecture
Graphics processing units
Abstract
The nearest neighbor partitioning (NNP) method is a high-performance approach for improving traditional neural network classifiers. However, the construction process of the NNP model is very time-consuming, particularly for large data sets, which limits its range of application. In this study, a parallel NNP method is proposed to accelerate NNP based on the Compute Unified Device Architecture (CUDA). In this method, blocks are used to evaluate potential neural networks, and threads are used to perform parallel subtasks. Experimental results demonstrate that the proposed parallel method improves the performance of the NNP neural network classifier. Furthermore, the application of parallel NNP to performance evaluation of cement microstructure indicates that the proposed approach performs favorably.
© 2017 Elsevier Ltd. All rights reserved.
1. Introduction
Classification is a useful tool in the fields of data mining and machine
learning. It aims to learn a classification function from existing
data or to construct a classification model, i.e., the "classifier". After
learning, the function or model not only maps the records in the
training data to a specific class but is also able to predict unknown
samples. In data mining, "classifier" is a general term for models that
classify samples, including neural networks (Lu et al., 1996;
Misra et al., 2008; Gao and Ji, 2005; Hassan, 2011), support vector
machines (López and Suykens, 2011; Vapnik, 1997; Chua, 2003), rule-based
systems, decision trees (Quinlan, 1986; Freund, 1995), data
gravitation (Peng et al., 2014, 2017), etc. The artificial neural network is
a general model for approximating nonlinear functions; it directly
maps each sample to a centroid that belongs to a class. It has been successfully
applied to many classification tasks (Casta et al., 2011; An et al., 2011;
Avci, 2012; Yaakob and Jain, 2012; Kang and Park, 2009).
In traditional neural networks, the positions, number, and labels
of the centroids are fixed during training. However,
in the optimization process, mapping samples to fixed centroids
reduces the possibility of finding the optimal neural network. The
floating centroid method (FCM) (Wang et al., 2012) was therefore proposed to
solve this problem. Its centroids are obtained through a clustering
* Corresponding author. E-mail address: ise_wanglin@ujn.edu.cn (L. Wang).
method (Hartigan and Wong, 1979; Zhou et al., 2016) and are distributed
automatically over the entire partition space. However, the FCM also
has a limitation: it cannot yield flexible decision boundaries.
The nearest neighbor partitioning (NNP) method (Wang et al., 2017)
overcomes this limitation by generating flexible decision boundaries in
sphere-like zones. The goodness of each candidate neural network in
NNP is evaluated using a nearest-neighbor criterion. NNP easily
yields flexible decision boundaries and partitions, thus increasing
the probability of discovering the optimal solution.
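As a rough illustration of the idea above (a sketch, not the authors' implementation), the following Python snippet shows a simple nearest-neighbor assignment: each mapped sample receives the label of its nearest centroid, so centroids placed freely in the partition space (e.g., by clustering) can carve out flexible, sphere-like partitions. All function names and the toy data are hypothetical.

```python
import numpy as np

def nearest_neighbor_labels(points, centroids, centroid_labels):
    """Assign each mapped point the label of its nearest centroid.

    Unlike fixed, pre-labeled target positions, any set of centroids
    (e.g., obtained by clustering the mapped samples) can define the
    partition, which is the intuition behind the nearest-neighbor
    criterion described above.
    """
    # Pairwise squared Euclidean distances, shape (n_points, n_centroids).
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    # Label of the nearest centroid for each point.
    return centroid_labels[d2.argmin(axis=1)]

# Toy example: two centroids per class placed freely in a 2-D partition space.
centroids = np.array([[0.0, 0.0], [2.0, 2.0], [0.0, 2.0], [2.0, 0.0]])
centroid_labels = np.array([0, 0, 1, 1])
points = np.array([[0.1, 0.1], [1.9, 1.8], [0.2, 1.9]])
print(nearest_neighbor_labels(points, centroids, centroid_labels))  # [0 0 1]
```

Because class regions are induced by proximity to whichever centroids exist, a class may occupy several disjoint sphere-like zones, which is what makes the resulting decision boundaries flexible.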
Despite NNP's significant improvement in accuracy, its efficiency,
particularly on large data sets, remains problematic. The
collection of scientific data is becoming easier owing to advances in
experimental devices and technologies, and in recent years the
efficiency of approaches to big data has drawn wide attention.
In NNP, similarity computation is an important procedure of the
nearest-neighbor criterion in the training stage (Baraldi et al.,
2016; Yu et al., 2015), and the distribution of the points it evaluates
in the target function is essential. However, the dramatic
increase in data size raises the complexity of similarity computation
and thus degrades the efficiency of NNP. Furthermore, the efficiency of
the other stages of NNP, including sample mapping, normalization, and
target function calculation, is also strongly influenced by the size of
https://doi.org/10.1016/j.engappai.2017.10.023
Received 26 May 2017; Received in revised form 1 October 2017; Accepted 30 October 2017
Available online 21 November 2017
0952-1976/© 2017 Elsevier Ltd. All rights reserved.