Fast Incremental Learning: Optimizing Linear Discriminant Analysis for Mixture Streaming Data


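The research paper "Fast Online Incremental Learning on Mixture Streaming Data" addresses feature learning on streaming data, and in particular the challenges this poses for linear discriminant analysis (LDA). As streaming data grow rapidly, conventional LDA algorithms often cannot be updated efficiently with samples that arrive sequentially. The authors propose a method to solve this problem.

First, they introduce a new fast batch LDA algorithm (FLDA/QR). It uses the cluster centers to solve a lower triangular system that is optimized by the Cholesky factorization, which speeds up processing. The Cholesky factorization writes a symmetric positive-definite matrix as the product of a lower triangular matrix and its transpose, so linear systems can be solved efficiently by two triangular substitutions.

Next, to exploit the dynamic nature of streaming data, the authors develop an exact incremental algorithm, Incremental FLDA/QR (IFLDA/QR). Unlike most existing methods, which handle only new classes or a small number of new samples, IFLDA/QR handles new labeled samples in existing classes, samples of an entirely new (novel) class, and chunks that mix both kinds. The Gram-Schmidt process with reorthogonalization in IFLDA/QR is significantly more space- and time-efficient than the rank-one QR updating used by most existing methods. It reduces storage and computation so that the model can be updated in real time on a data stream without keeping all historical data in memory, which matters most for large-scale, high-dimensional streams.

The paper supports these claims with theoretical analysis and numerical experiments. Compared with the state of the art, IFLDA/QR lowers space and time costs by a factor of 2 to 10 while keeping classification accuracy comparable. The paper thus offers a fast and resource-efficient approach to online incremental learning on mixture streaming data, with practical value for data processing and classification tasks.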
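To make the role of the Cholesky factorization concrete, the following is a minimal sketch of solving a symmetric positive-definite system with one factorization and two triangular substitutions. It is a generic illustration, not the paper's FLDA/QR algorithm; the function name `cholesky_solve` and the test matrix are assumptions made for this example.

```python
import numpy as np

def cholesky_solve(A, b):
    """Solve A x = b for a symmetric positive-definite A via A = L L^T.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    L = np.linalg.cholesky(A)  # lower triangular factor
    n = L.shape[0]

    # Forward substitution: solve L y = b.
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]

    # Back substitution: solve L^T x = y.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - L[i + 1:, i] @ x[i + 1:]) / L[i, i]
    return x

# Build a small symmetric positive-definite system (assumed test data).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5.0 * np.eye(5)
b = rng.standard_normal(5)
x = cholesky_solve(A, b)
```

Each substitution costs O(n^2) operations once the factor is known, which is why a triangular system is cheap to solve compared with forming an explicit inverse.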

Fast Online Incremental Learning on Mixture Streaming Data

Yi Wang, Xin Fan, Zhongxuan Luo

School of Software

Dalian University of Technology

& Key Laboratory for Ubiquitous Network

and Service Software of Liaoning Province

Dalian, China

{dlutwangyi, xin.fan, zxluo}@dlut.edu.cn

Tianzhu Wang

No. 254, Deta Leisure Town

Jinzhou New District

Dalian, China

wangtz@126.com

Maomao Min

School of Software

Dalian University of Technology

Dalian, China

neilfvhv@gmail.com

Jiebo Luo

Department of Computer Science

University of Rochester

Rochester, NY 14627, USA

jluo@cs.rochester.edu

Abstract

The explosion of streaming data poses challenges to feature learning methods including linear discriminant analysis (LDA). Many existing LDA algorithms are not efficient enough to incrementally update with samples that sequentially arrive in various manners. First, we propose a new fast batch LDA (FLDA/QR) learning algorithm that uses the cluster centers to solve a lower triangular system that is optimized by the Cholesky-factorization. To take advantage of the intrinsically incremental mechanism of the matrix, we further develop an exact incremental algorithm (IFLDA/QR). The Gram-Schmidt process with reorthogonalization in IFLDA/QR significantly saves the space and time expenses compared with the rank-one QR-updating of most existing methods. IFLDA/QR is able to handle streaming data containing 1) new labeled samples in the existing classes, 2) samples of an entirely new (novel) class, and more significantly, 3) a chunk of examples mixed with those in 1) and 2). Both theoretical analysis and numerical experiments have demonstrated much lower space and time costs (2 to 10 times faster) than the state of the art, with comparable classification accuracy.
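As a rough illustration of the Gram-Schmidt process with reorthogonalization mentioned above, the sketch below appends one column to an existing thin QR factorization. It is the textbook column-append update under assumed names (`qr_append_column`, a fixed two-pass scheme, an arbitrary tolerance), not the paper's IFLDA/QR procedure, whose details are not given in this excerpt.

```python
import numpy as np

def qr_append_column(Q, R, a, tol=1e-12):
    """Given A = Q R (Q: d x k orthonormal columns, R: k x k upper triangular),
    return the thin QR factors of [A, a].

    Hypothetical helper: classical Gram-Schmidt applied twice.
    """
    # First pass: project a onto the current basis and remove that part.
    r = Q.T @ a
    v = a - Q @ r

    # Second pass (reorthogonalization): remove what rounding left behind.
    s = Q.T @ v
    v = v - Q @ s
    r = r + s

    rho = np.linalg.norm(v)
    if rho <= tol * max(np.linalg.norm(a), 1.0):
        raise ValueError("new column is numerically in the span of Q")

    k = R.shape[0]
    Q_new = np.hstack([Q, (v / rho)[:, None]])
    R_new = np.zeros((k + 1, k + 1))
    R_new[:k, :k] = R
    R_new[:k, k] = r
    R_new[k, k] = rho
    return Q_new, R_new

# Assumed test data: factor a 6 x 3 matrix, then append a fourth column.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
a = rng.standard_normal(6)
Q, R = np.linalg.qr(A)  # reduced QR: Q is 6 x 3, R is 3 x 3
Q2, R2 = qr_append_column(Q, R, a)
```

The update touches only the new column, so its cost is O(dk) per appended column and it needs no access to the earlier columns of A, only to Q and R.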

Introduction

Streaming data are explosive with the boom of mobile networks, social media, and video cameras in this decade. To analyze this type of data, an efficient and incremental learning strategy is necessary. Moreover, these data are mixed with known or novel class labels, arriving one by one or chunk by chunk. These characteristics greatly challenge the existing learning methods. We aim at developing an extremely fast incremental learning method for mixture streaming data.
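To picture what a mixed chunk means in practice, here is a minimal sketch of the bookkeeping it requires: per-class centroids and counts are updated when a chunk contains samples of both known and novel classes. This covers only running class means, not the discriminant update itself; `update_centroids`, the dictionary layout, and the integer labels are assumptions made for this example.

```python
import numpy as np

def update_centroids(centroids, counts, X_chunk, y_chunk):
    """Update running class means in place with one labeled chunk.

    centroids: dict mapping int label -> mean vector
    counts:    dict mapping int label -> number of samples seen so far
    Hypothetical helper; integer class labels are assumed.
    """
    for label in np.unique(y_chunk):
        key = int(label)
        Xc = X_chunk[y_chunk == label]
        m = Xc.shape[0]
        if key in counts:
            # Known class: merge the old mean with the new samples.
            n = counts[key]
            centroids[key] = (n * centroids[key] + Xc.sum(axis=0)) / (n + m)
            counts[key] = n + m
        else:
            # Novel class: start a new centroid.
            centroids[key] = Xc.mean(axis=0)
            counts[key] = m
    return centroids, counts

centroids, counts = {}, {}

# First chunk: classes 0 and 1.
X1 = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]])
y1 = np.array([0, 0, 1])
update_centroids(centroids, counts, X1, y1)

# Second chunk mixes a known class (0) with a novel class (2).
X2 = np.array([[4.0, 4.0], [1.0, 5.0]])
y2 = np.array([0, 2])
update_centroids(centroids, counts, X2, y2)
```

After the second chunk, class 0 has three samples with centroid (2, 2), class 1 is unchanged, and class 2 has been created, all without revisiting the first chunk.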

2017, Association for the Advancement of Artificial

Classical linear discriminant analysis (LDA) (Chen et al. 2000) and its recent variants (Sharma and Paliwal 2015; Kong and Ding 2014; Luo, Ding, and Huang 2011) can effectively extract features for classification. Recently, incremental LDA (ILDA) methods have been emerging to address the above challenges arising from streaming data. Some of these ILDA methods attempt to modify the objective function, while others use new optimization strategies to accelerate the calculation. The method of IDR/QR (Ye et al. 2005) uses the QR-decomposition to update the within-class scatter matrix (S_w) and between-class scatter matrix (S_b). The methods of ICLDA (Lu, Zou, and Wang 2012), IDR/new (Lu, Jian, and Wang 2015) and ILDA/QR (Chu et al. 2015) also follow the idea of using QR-decomposition, and employ various techniques to reduce the matrix size for the decomposition. However, their updating schemes, the key to any incremental algorithm, are not space and time efficient for online learning.
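For readers unfamiliar with the two scatter matrices, the sketch below computes the within-class and between-class scatter matrices, written S_w and S_b here, directly from labeled data in batch mode. It assumes the common unnormalized definitions (some references divide by the sample count) and is not any of the incremental update schemes cited above; `scatter_matrices` is a name chosen for this example.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (S_w, S_b) for data X of shape (n_samples, n_features).

    Assumed unnormalized definitions:
      S_w = sum over classes c, samples x in c of (x - mu_c)(x - mu_c)^T
      S_b = sum over classes c of n_c (mu_c - mu)(mu_c - mu)^T
    """
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
    return S_w, S_b

# Assumed toy data: 30 samples, 4 features, 3 classes.
rng = np.random.default_rng(2)
X = rng.standard_normal((30, 4))
y = np.repeat(np.arange(3), 10)
S_w, S_b = scatter_matrices(X, y)

# Total scatter, used to check the identity S_t = S_w + S_b.
Xc_all = X - X.mean(axis=0)
S_t = Xc_all.T @ Xc_all
```

Both matrices are d x d, so recomputing them from scratch on every new sample scales poorly with the feature dimension, which is the cost that incremental schemes try to avoid.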

Researchers also resort to approximating the scatter matrices to improve efficiency. Liu et al. (Liu, Jiang, and Zhou 2009) give the least-square solution of LDA (LS-ILDA), but the incremental update of the pseudo-inverse of the data matrix is still time consuming. Kim et al. (Kim et al. 2011) leverage the sufficient spanning set approximation to S_b and the total scatter matrix S_t by updating the principal eigenvectors and eigenvalues of these two matrices. The time complexity significantly decreases, but there exists an evident gap between the accuracies of the incremental and batch versions.

In this paper, we devise an online incremental LDA algorithm that requires much lower (2 to 10 times) computational complexity and less space than the state of the art for mixture streaming data. The algorithm is illustrated in Fig. 1. Specifically, our contributions are twofold:

1. We take advantage of the QR-decomposition on a lower triangular matrix (Chu et al. 2015), and propose a new

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)
