An SVM Incremental Learning Algorithm for Large-Scale Data Streams in a Cloud Computing Environment


"这篇研究论文探讨了在云计算环境中基于支持向量机（SVM）的大规模数据流增量学习算法。该工作由中国国家自然科学基金资助，并由来自北京科技大学、烟台工程技术学院和清华大学的科研团队共同完成。文章介绍了SVM在处理大规模数据流时的增量学习方法，旨在提高云计算环境中的学习效率和准确性。" 在云计算环境中，处理海量数据是一项巨大的挑战，特别是对于实时或连续的数据流，传统的机器学习算法可能无法有效地应对。支持向量机（Support Vector Machine, SVM）是一种强大的监督学习模型，广泛应用于分类和回归分析。然而，原始的SVM算法并不适用于处理不断变化和增长的数据流，因为它需要重新训练整个数据集，这在大数据背景下效率极低。论文提出了一种基于SVM的增量学习算法，旨在解决这个问题。增量学习允许模型在接收到新数据时逐步更新，而无需重新处理全部历史数据。这种方式大大减少了计算成本，提高了处理大规模数据流的效率。在云计算的背景下，这种算法可以分布式地运行，利用云计算的并行处理能力进一步加速学习过程。论文中，作者可能详细讨论了以下关键点： 1. **增量学习策略**：如何设计一个有效的机制，使得SVM可以在接收新数据实例时仅更新部分模型参数，而不是整个模型。 2. **适应性调整**：如何使模型能够适应数据流中的概念漂移，即数据分布的变化。 3. **内存管理**：在处理大规模数据时，如何有效地存储和管理训练样本，以保持模型的性能和避免内存溢出。 4. **性能评估**：可能通过模拟实验和真实数据集来验证算法的性能，比较其与非增量学习方法的差异，如准确率、召回率、F1分数等指标。 5. **并行化实现**：如何利用云计算平台的并行计算资源，将增量学习算法进行分布式优化，提高处理速度。这篇研究论文为云计算环境下的大规模数据流处理提供了一个创新解决方案，通过SVM的增量学习方法，实现了对动态数据流的高效、准确的学习，这对于实时数据分析和预测具有重要意义。

KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. 8, NO. 10, October 2014 3381

introduced a new classification of behavior recognition systems based on computer clusters. It integrated increments from data stream mining and active learning. Dilectin [16] classified real-time data for tsunami warnings using a KNN (K-Nearest Neighbor) algorithm and dynamically detected data. In the study of human face gender recognition, Yang [17] proposed a new framework for faster face gender recognition using a local ternary pattern and an extreme learning machine. Syed [18] proposed a batch incremental learning algorithm based on SVM, which could carry out incremental learning effectively as the data changed. The training data set used in network text classification was massive and fast-changing. Ding [19] suggested the FVI-SVM algorithm, which reduced the volume of the training data set using a KKT (Karush-Kuhn-Tucker) condition and improved classification efficiency. Gonzalez-Mendoza [20] considered the KKT optimality conditions in order to present a strategy for implementing the SVM-QP (the quadratic optimization problem of Support Vector Machines). Among the algorithms studied, the BatchSVM incremental learning algorithm continually accumulated support vectors, so when the data grew without bound, the computational burden became too great. In addition, the incremental learning algorithm based on the KKT conditions filtered out many incremental samples during training, so its classification accuracy was lower.
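A minimal sketch of the KKT-based filtering mentioned above may help. It assumes a linear decision function f(x) = w·x + b from an already trained SVM and uses the standard observation that a new sample (x, y) is consistent with the KKT conditions of the current solution when y·f(x) ≥ 1. Only the violating samples are kept for the next training round. The function names are hypothetical and the excerpt does not specify the exact rules used in [19] or [20].

```python
def decision(w, b, x):
    """Signed score w.x + b of a linear SVM for one sample."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def violates_kkt(w, b, x, y):
    """True if a new sample falls inside the margin or is misclassified.

    A sample with y * f(x) >= 1 would get a zero Lagrange multiplier under
    the current model, so it carries no new information for retraining.
    """
    return y * decision(w, b, x) < 1.0


def filter_increment(w, b, samples):
    """Keep only the incremental samples that violate the KKT conditions."""
    return [(x, y) for x, y in samples if violates_kkt(w, b, x, y)]


# Usage with an assumed current model w = (1, 0), b = 0.
increment = [
    ([3.0, 0.0], 1),    # far outside the margin
    ([0.5, 0.0], 1),    # inside the margin
    ([1.0, 0.0], -1),   # misclassified
]
kept = filter_increment([1.0, 0.0], 0.0, increment)
```

The filter makes the trade-off described in the text visible: every sample with y·f(x) ≥ 1 is discarded, so a model that is slightly wrong can throw away many samples that would have corrected it later.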

In a weighting method of the SVM algorithm, which treated the gray image as a digital terrain model, Zheng [21] developed an adaptive weighted least squares support vector machine (weighted LS-SVM) to iteratively estimate the optimal gray surface underlying the noisy image. The LS-SVM worked on Gaussian noise, while the weighted LS-SVM worked on outliers and non-Gaussian noise. Xing [22] presented an algorithm based on a feature selection and weighted support vector machine (FSWSVM) classifier to detect ships using polarimetric SAR imagery. For the online classification of data streams with imbalanced class distribution, Zhu [23] gave an incremental Linear Proximal Support Vector Machine, named the LPSVM and also called the DCIL-IncLPSVM, which provided robust learning performance in the case of class imbalance. Fergani [24] presented a new way to deal with the class imbalance problem: an efficient method for choosing a suitable regularization parameter C of the soft-margin Support Vector Machine (C-SVM) in order to facilitate automatic recognition of activities in a smart home environment. None of these papers considered the case in which the data volumes of the history sample set and the incremental sample set are imbalanced in incremental learning algorithms. During the research reported in this paper, we paid attention to this issue and handled it using weighted processing to improve the classification accuracy.
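The weighted processing mentioned above is not specified in this excerpt. As a hedged illustration of the general idea, the sketch below assumes one simple scheme: each sample is weighted inversely to the size of the set it belongs to, so that the history set and the incremental set contribute the same total weight to a hinge-loss objective. The function names and the weighting formula are assumptions, not the paper's method.

```python
def set_weights(n_history, n_increment):
    """Per-sample weights that give both sets the same total weight.

    Assumed scheme: weight = total / (2 * set_size). Both sets must be
    non-empty.
    """
    if n_history <= 0 or n_increment <= 0:
        raise ValueError("both sample sets must be non-empty")
    total = n_history + n_increment
    return total / (2.0 * n_history), total / (2.0 * n_increment)


def hinge(w, b, x, y):
    """Hinge loss max(0, 1 - y * (w.x + b)) for one sample."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)


def weighted_hinge_loss(w, b, history, increment):
    """Sum of hinge losses, with each set scaled by its own weight."""
    w_hist, w_inc = set_weights(len(history), len(increment))
    loss_hist = sum(hinge(w, b, x, y) for x, y in history)
    loss_inc = sum(hinge(w, b, x, y) for x, y in increment)
    return w_hist * loss_hist + w_inc * loss_inc


# Usage: 900 history samples versus 100 incremental samples.
w_hist, w_inc = set_weights(900, 100)
```

With these weights, a small incremental set is not drowned out by a much larger history set. Other choices, such as weights that decay with sample age, would fit the same interface.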

2.2 Problem analysis

There are several defects in traditional incremental learning algorithms. For example, the standard SVM [9] is not an incremental algorithm. Although its classification accuracy is higher, its training time grows with the data volume. By discarding useless samples and keeping only the useful support vector sets, the BatchSVM [18] reduces the volume of the data set. But this advantage disappears when the number of support vectors grows: the subset becomes larger as a result of continuous iteration and accumulation. Moreover, a data stream, which is continuous and arrives in real time, places a heavy burden on computation. An incremental learning algorithm based on a KKT condition [20] also reduces the number of incremental samples that take part in the next training step. This lowers the classification accuracy because many incremental samples are filtered out at the same time. Moreover, during data stream processing, we take the data of a sliding window as the incremental sample set and

