Asynchronism-based principal component analysis for time series data mining
Hailin Li
College of Business Administration, Huaqiao University, Quanzhou 362021, China
Tel.: +86 595 22693815. E-mail address: hailin@mail.dlut.edu.cn
Article info
Keywords:
Asynchronous correlation
Covariance matrix
Principal component analysis
Time series data mining
Dynamic time warping
Abstract
Principal component analysis (PCA) is often applied to dimensionality reduction in time series data mining. However, PCA is based on the synchronous covariance, which is not very effective in some cases. In this paper, an asynchronism-based principal component analysis (APCA) is proposed to reduce the dimensionality of univariate time series. In APCA, an asynchronous method based on dynamic time warping (DTW) is developed to obtain interpolated time series derived from the original ones. The correlation coefficient or covariance between the interpolated time series represents the correlation between the original ones. In this way, a novel and valid principal component analysis based on the asynchronous covariance is achieved to reduce the dimensionality. The results of several experiments demonstrate that the proposed APCA outperforms PCA for dimensionality reduction in the field of time series data mining.
© 2013 Elsevier Ltd. All rights reserved.
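(For concreteness, the following minimal Python sketch follows the pipeline the abstract describes: DTW aligns two series, the aligned copies stand in for the interpolated series, and PCA is then run on the resulting asynchronous correlation matrix. All names are illustrative, and the textbook DTW used here is an assumption, not necessarily the paper's exact formulation.)

import numpy as np

def dtw_path(x, y):
    # Dynamic-programming DTW; returns the optimal warping path
    # as a list of index pairs into x and y.
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], n, m
    while i > 1 or j > 1:
        path.append((i - 1, j - 1))
        step = np.argmin((D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.append((0, 0))
    return path[::-1]

def asynchronous_corr(x, y):
    # Stretch both series along the warping path (standing in for the
    # "interpolated" series) and measure ordinary correlation on the
    # aligned copies.
    path = dtw_path(np.asarray(x), np.asarray(y))
    xi = np.array([x[i] for i, _ in path])
    yi = np.array([y[j] for _, j in path])
    return np.corrcoef(xi, yi)[0, 1]

def apca_components(dataset):
    # Build the asynchronous correlation matrix over a list of
    # univariate series and eigendecompose it, exactly as classical
    # PCA does with the synchronous matrix.
    n = len(dataset)
    R = np.eye(n)
    for a in range(n):
        for b in range(a + 1, n):
            R[a, b] = R[b, a] = asynchronous_corr(dataset[a], dataset[b])
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]   # descending explained variance
    return eigvals[order], eigvecs[:, order]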
1. Introduction
Time series are among the most important research objects in the field of data mining, and the set of techniques applied to such data is called time series data mining (TSDM) (Esling & Agon, 2012). However, since the high dimensionality of time series often renders standard data mining techniques inefficient, methods have been devised to reduce that dimensionality.
So far, many methods exist to resolve this problem, and they fall into two kinds of dimensionality reduction. One kind is based on a single univariate time series, such as the discrete Fourier transformation (DFT) (Agrawal, Faloutsos, & Swami, 1993), discrete wavelet transformation (DWT) (Maharaj & Urso, 2011; Struzik & Siebes, 1998, 1999), polynomial representation (PR) (Fuchs, Gruber, Pree, & Sick, 2010), piecewise linear approximation (PLA) (Keogh, Chu, Hart, & Pazzani, 2001; Papadakis & Kaburlasos, 2010; Shatkay & Zdonik, 1996), piecewise aggregate approximation (PAA) (Keogh, Chakrabarti, Pazzani, & Mehrotra, 2000; Li & Guo, 2011), and symbolic aggregate approximation (SAX) (Lee, Wu, & Lee, 2009; Lin, Keogh, Lonardi, & Chiu, 2003). These methods reduce dimensionality from the viewpoint of a single series: they concentrate on transforming one time series so that the dimension of the reduced representation is lower than that of the original (a minimal PAA sketch follows this paragraph). The other kind is based on the whole time series dataset, such as singular value decomposition (SVD) (Spiegel, Gaebler, & Lommatzsch, 2011), principal component analysis (PCA) (Singhal & Seborg, 2002) and independent component analysis (ICA) (Cichocki & Amari, 2002).
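(To make the first kind concrete, the following minimal Python sketch shows PAA, the simplest of the representations listed above; the function name and the example series are illustrative, not from the cited papers.)

import numpy as np

def paa(series, segments):
    # Piecewise aggregate approximation: split the series into
    # `segments` equal-width frames and keep each frame's mean.
    frames = np.array_split(np.asarray(series, dtype=float), segments)
    return np.array([frame.mean() for frame in frames])

# A 12-point series reduced to 4 segment means:
# paa(range(1, 13), 4) -> array([ 2.,  5.,  8., 11.])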
SVD and PCA are often regarded as the same method, in that both retain the first several principal components to represent the whole dataset, whereas ICA is a development of principal component analysis and factor analysis. In the field of time series data mining, these methods are often combined with corresponding similarity measures to discover information and knowledge from a time series dataset. Krzanowski (1979) used PCA to construct the principal components and chose the first k of them to represent a multivariate time series; the similarity between two time series is then calculated from the cosines of the angles between the corresponding principal components (sketched after this paragraph). Singhal and Seborg (2005) proposed a new approach, S_dist, to compute the similarity based on PCA, which outperforms the earlier methods.
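(The following minimal Python sketch illustrates the Krzanowski-style PCA similarity just described, assuming each multivariate time series is stored as a time-by-variable array; the function names are illustrative, and this is the unweighted 1979 measure rather than Singhal and Seborg's S_dist.)

import numpy as np

def pca_loadings(X, k):
    # First k principal directions (columns) of a series X whose
    # rows are time points and whose columns are variables.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                     # shape: (n_variables, k)

def krzanowski_similarity(X, Y, k):
    # Average squared cosine of the angles between the two
    # k-dimensional principal subspaces: 1 for identical subspaces,
    # 0 for mutually orthogonal ones.
    L, M = pca_loadings(X, k), pca_loadings(Y, k)
    return np.trace(L.T @ M @ M.T @ L) / k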
Karamitopoulos and Evangelidis (2010) used PCA to construct the feature space of the queried time series and projected every time series onto that space; the error between two reconstructed time series is then taken as the distance between the query time series and the queried one. SVD is often based on PCA and uses the Karhunen-Loève (KL) decomposition to reduce the dimensionality of time series. Li, Khan, and Prabhakaran (2006) proposed two methods to choose the feature vectors and used them to classify time series. Weng and Shen (2008) extended the traditional SVD to a two-dimensional SVD (2dSVD) that extracts the principal components from the column–column and row–row directions to compute the covariance matrix. Since feature extraction is one of the most important tasks for ICA, it has been applied to the analysis of time series. Wu and Yu (2005) used FastICA (Hyvärinen, 1999) to