KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. 8, NO. 10, October 2014 3381
introduced a new classification of behavior recognition systems based on computer clusters.
It integrated increments from data stream mining and active learning. Dilectin [16] classified
the real-time data for tsunami warnings using a KNN (K-Nearest Neighbor) algorithm and
dynamically detected data. In the study of human face gender recognition, Yang [17]
proposed a new framework for reaching faster face gender recognition using a local ternary
pattern and an extreme learning machine. Syed [18] proposed a batch incremental learning
algorithm based on SVM (BatchSVM), which could learn incrementally and effectively as the
data changed. The training data sets used in network text classification are massive and
fast-changing. Ding [19] suggested the FVI-SVM algorithm, which reduced the volume of
the training data set using the KKT (Karush-Kuhn-Tucker) conditions and improved
classification efficiency. Gonzalez-Mendoza [20] used the KKT optimality conditions to
present a strategy for implementing the SVM-QP (the quadratic optimization problem of
Support Vector Machines). Among the algorithms studied, the BatchSVM incremental
learning algorithm continually accumulates support vectors, so when data keep arriving
without end, the computational burden becomes too great. In addition, the incremental
learning algorithm based on the KKT conditions filters out many incremental samples
during training, so its classification accuracy is lower.
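To make the KKT-based filtering concrete, the following is a minimal sketch, assuming a linear decision function f(x) = w.x + b from an already-trained SVM. The helper names (linear_decision, split_by_kkt), the weights, and the sample values are hypothetical illustrations and are not taken from [19] or [20]. A new sample that took no part in training effectively has a zero multiplier, so it satisfies the KKT conditions when y f(x) >= 1 and violates them otherwise. Only the violators are kept for the next training round.

```python
from typing import Callable, List, Sequence, Tuple

# A sample is (feature vector, label), with label in {-1, +1}.
Sample = Tuple[Sequence[float], int]


def linear_decision(w: Sequence[float], b: float) -> Callable[[Sequence[float]], float]:
    """Return f(x) = w . x + b for a hypothetical, already-trained linear SVM."""
    def f(x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return f


def split_by_kkt(
    samples: List[Sample], f: Callable[[Sequence[float]], float]
) -> Tuple[List[Sample], List[Sample]]:
    """Split new samples into KKT violators and KKT-satisfying samples.

    A sample outside the current model has multiplier alpha = 0, so the
    KKT conditions require y * f(x) >= 1. Samples with y * f(x) < 1 lie
    inside the margin or are misclassified, and carry new information.
    """
    violators: List[Sample] = []
    satisfied: List[Sample] = []
    for x, y in samples:
        if y * f(x) < 1.0:
            violators.append((x, y))
        else:
            satisfied.append((x, y))
    return violators, satisfied


# Assumed model and incremental samples, for illustration only.
f = linear_decision(w=[1.0, 0.0], b=0.0)
incremental = [
    ([2.0, 0.5], +1),   # y*f(x) = 2.0: satisfies KKT, filtered out
    ([0.5, 1.0], +1),   # y*f(x) = 0.5: inside the margin, kept
    ([-3.0, 0.0], -1),  # y*f(x) = 3.0: satisfies KKT, filtered out
    ([0.2, 0.0], -1),   # y*f(x) = -0.2: misclassified, kept
]
violators, satisfied = split_by_kkt(incremental, f)
```

In this toy case half of the incremental samples are discarded before retraining, which illustrates both the efficiency gain and the risk noted above: samples that are filtered out can no longer influence the updated classifier.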
Among weighted variants of the SVM algorithm, Zheng [21] treated the gray image as a
digital terrain model and developed an adaptive weighted least squares support vector
machine (LS-SVM) to iteratively estimate the optimal gray surface underlying the
noisy image. The LS-SVM handled Gaussian noise, while the weighted LS-SVM handled
outliers and non-Gaussian noise. Xing [22] presented a feature selection and
weighted support vector machine (FSWSVM) classifier-based algorithm to detect ships
using polarimetric SAR imagery. For the online classification of data streams with
imbalanced class distributions, Zhu [23] proposed an incremental Linear Proximal Support
Vector Machine (LPSVM), called DCIL-IncLPSVM, which provided robust
learning performance under class imbalance. Fergani [24] presented a new way to
deal with the class imbalance problem: an efficient method for choosing a suitable
regularization parameter C for the soft-margin Support Vector Machine (C-SVM) in order
to facilitate automatic recognition of activities in a smart home environment. None of these
papers considered the imbalance in data volume between the history sample set and the
incremental sample set in incremental learning algorithms. In this paper, we address
this issue with weighted processing to
improve the classification accuracy.
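As a generic illustration of weighting two sample sets of unequal size, the sketch below uses inverse-size weights so that the history set and the incremental set contribute equally in total. This is not the weighting scheme proposed in this paper. The function name balanced_set_weights and the set sizes are assumptions chosen only for the example.

```python
from typing import Tuple


def balanced_set_weights(n_history: int, n_incremental: int) -> Tuple[float, float]:
    """Return per-sample weights (w_history, w_incremental).

    Each set receives weight total / (2 * set_size), so the weighted mass
    of both sets is the same and the overall mass equals the sample count.
    """
    if n_history <= 0 or n_incremental <= 0:
        raise ValueError("both sample sets must be non-empty")
    total = n_history + n_incremental
    w_history = total / (2 * n_history)
    w_incremental = total / (2 * n_incremental)
    return w_history, w_incremental


# Assumed sizes: a large history set and a small sliding-window increment.
w_hist, w_inc = balanced_set_weights(n_history=900, n_incremental=100)
```

Such weights could, for example, be expanded into a per-sample weight vector and passed to a trainer that accepts one, such as the sample_weight argument of scikit-learn's SVC.fit.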
2.2 Problem analysis
Traditional incremental learning algorithms have several defects. For example, the
standard SVM [9] is not an incremental algorithm. Although its classification accuracy is
relatively high, its training time grows with the data volume. By
discarding useless samples and keeping only the useful support vectors, the BatchSVM
[18] works with a smaller data set. But this advantage disappears
as the number of support vectors grows: continuous iteration and accumulation make the
retained subset ever larger. Moreover, a data stream, which is continuous and arrives in
real time, places a heavy burden on computation. An incremental learning algorithm
based on the KKT conditions [20] also reduces the number of incremental samples
that take part in the next step of training. This lowers the classification accuracy
because many incremental samples are filtered out at the same time. Moreover, during data
stream processing, we take the data of a sliding window as the incremental sample set and