Dynamic Time Warping (DTW) [35,45] was proposed to automatically handle the time deformations and varying speeds associated with time-dependent data (e.g., in speech recognition). However, clustering time series data with the DTW distance is computationally expensive; Begum et al. therefore proposed an accelerated DTW clustering method with an admissible pruning strategy [4]. To cluster short time series data sets, Möller-Levet et al. proposed computing the distance between two short time series as the sum of the squared differences of their corresponding slopes [33].
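As a concrete illustration of these two distances, the minimal Python sketch below (our own illustrative code, not the implementations from the cited papers) computes a basic dynamic-programming DTW distance and the slope-based short-time-series distance, assuming NumPy arrays and evenly spaced time stamps.

```python
import numpy as np

def dtw_distance(x, y):
    """Basic O(len(x)*len(y)) dynamic-programming DTW, without pruning or a warping window."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return np.sqrt(cost[n, m])

def sts_distance(x, y):
    """Slope-based short-time-series distance: sum of squared differences of
    corresponding slopes, assuming unit spacing between time stamps."""
    dx = np.diff(x)   # slopes of x between consecutive time stamps
    dy = np.diff(y)   # slopes of y
    return np.sqrt(np.sum((dx - dy) ** 2))
```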
Another group of time series clustering algorithms has been developed based on edit distance, mainly including the longest common subsequence (LCSS) model [40,41] and the edit distance on real sequence (EDR) model [10]. Chen et al. proposed the Spatial Assembling Distance (SpADe) method [12], whose main idea is to discover matching time segments, called patterns, within an entire time series while allowing shifting and scaling in both the temporal and amplitude dimensions. Bahadori et al. proposed a Functional
Subspace Clustering (FSC) algorithm [3] which extended the power and flexibility of subspace clustering to time series
data by permitting the deformations that underlie many popular functional similarity measures. To overcome the issues of
high dimensionality, contextual constraints, and temporal smoothness, Cai et al. proposed a comprehensive method named
FACETS [6] to simultaneously capture all these aspects by using tensor factorization and performing careful regularizations
to tackle both contextual and temporal issues. Ferreira and Zhao [17] transformed a set of time series objects into a network
by using different distance functions. More specifically, every time series object is represented by a vertex and the most
similar vertices are connected to uncover clusters. A fast clustering method for large-scale time series data, named YADING, was developed in [16]; in this method, time series objects are allocated to clusters that are initially induced from sampled subsets of the input data. Overall, these existing algorithms do not take into account the possible subspaces of a time series data set.
2.2. Subsequence clustering
Prior to 2003, subsequence time series clustering was generally accepted as a valid technique for time series analysis
[14,18,25]. For example, Fu et al. discovered patterns from stock data by using a subsequence time series analysis technique [18]. However, in 2003 Keogh et al. claimed that subsequence time series clustering was meaningless, because the centroids produced by subsequence time series clustering become sinusoidal pseudo-patterns for almost all kinds of time series data [30]. Further work [9,24,38] was subsequently published to explain this problem; for example, Idé theoretically explained why the centroids of subsequence time series produced by the k-means clustering method naturally form sinusoidal patterns, relating them to the Fourier basis [24].
Nevertheless, some researchers continued to develop new methods that produce meaningful subsequence clustering results [7,8]. Chen claimed that subsequence clustering can indeed be meaningful if distances are measured correctly in a delay space [9]. However, the sliding-window technique adopted for constructing the delay space is generally suboptimal, because the points that represent the series dynamics in the delay space tend to align closely along its bisectrix.
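To make the delay-space idea concrete, the following minimal Python sketch (our own illustrative naming and parameters, not the construction used in [9]) builds a delay space by sliding a window of a given width over a series; for a slowly varying series the resulting points have nearly equal coordinates and therefore lie close to the bisectrix, which is the alignment problem noted above.

```python
import numpy as np

def delay_embed(series, width, step=1):
    """Map a 1-D series into delay space: each sliding window of length
    `width` becomes one point in R^width."""
    series = np.asarray(series, dtype=float)
    n_windows = (len(series) - width) // step + 1
    return np.stack([series[i * step : i * step + width] for i in range(n_windows)])

# A slowly varying series yields delay-space points whose coordinates are
# almost equal, i.e. points that lie close to the bisectrix (main diagonal).
t = np.linspace(0, 10, 200)
points = delay_embed(np.sin(0.3 * t), width=3)
spread_from_bisectrix = np.std(points - points.mean(axis=1, keepdims=True))
print(points.shape, round(float(spread_from_bisectrix), 4))
```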
Since similarity computation is the bottleneck for clustering large-scale time series data, Rakthanmanon et al. proposed the UCR-DTW method [36], which is capable of searching and mining trillions of time series subsequences under dynamic time warping. Most of the existing subsequence time series clustering approaches can only handle problems in which certain parameters are predefined and the range of subsequence-width variability is small; however, such assumptions turn out to be unrealistic for many real-world applications. Therefore, Madicar et al. proposed an enhanced
parameter-free subsequence time series clustering algorithm for high-variability-width data [31] . Zolhavarieh et al. offered
a solution to perform online pattern recognition for subsequence time series clustering [46] . Agarwal et al. discovered that
high-quality subsequence clusters could be uncovered from vehicular sensor data by using bounded spherical clustering, and the proposed method is characterized by linear time complexity [1]. Based on the encouraging results of this work [1], the authors found that the problem of generating meaningless subsequence clusters [30] could be resolved by using bounded spherical clustering.
2.3. Characteristics of the proposed TSkmeans algorithm
Unlike the traditional clustering methods that operate on the whole sequence, our proposed TSkmeans algorithm iteratively discovers the subspaces of the whole sequence and then clusters objects based on the uncovered subspaces instead of the whole sequence. Following this idea, we propose a new k-means type clustering framework that can smoothly assign weights to different time stamps for clustering time series data.
Although some weighted k -means type algorithms have been proposed by using different weighting methods
[21,23,26,39] , these algorithms aim to cluster data whose features do not have a chronological order. To effectively explore
the temporal sequence information associated with time series data, the proposed TSkmeans algorithm tries to smooth the
weights of adjacent time stamps, and hence the uncovered subspaces become more meaningful for clustering time series
data.
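As a rough illustration of this idea only (this is not the TSkmeans objective or update rules derived later in the paper; the weight update, the smoothing scheme, and the beta exponent are our own illustrative choices), the following Python sketch runs a weighted k-means in which each time stamp carries a weight, weights are re-estimated from within-cluster dispersion, and a three-point moving average smooths the weights of adjacent time stamps.

```python
import numpy as np

def smoothed_weighted_kmeans(X, k, n_iter=20, beta=2.0, seed=None):
    """Illustrative weighted k-means for time series stored as rows of X.
    One weight per time stamp; weights are derived from within-cluster
    dispersion and then smoothed over adjacent time stamps."""
    rng = np.random.default_rng(seed)
    n, T = X.shape
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    w = np.full(T, 1.0 / T)                       # one weight per time stamp
    for _ in range(n_iter):
        # Assign each series to its closest center under the weighted distance.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2 * w ** beta).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update centers as per-cluster means.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
        # Time stamps with low within-cluster dispersion get higher weights.
        disp = ((X - centers[labels]) ** 2).sum(axis=0)
        w = 1.0 / (disp + 1e-12)
        # Smooth the weights of adjacent time stamps (3-point moving average).
        w = np.convolve(w, np.ones(3) / 3.0, mode="same")
        w /= w.sum()
    return labels, centers, w

# Toy usage: the last 10 time stamps separate the two groups and are less
# noisy, so they should receive larger smoothed weights.
rng = np.random.default_rng(0)
A = np.hstack([rng.normal(0, 3, (20, 10)), rng.normal(+5, 0.3, (20, 10))])
B = np.hstack([rng.normal(0, 3, (20, 10)), rng.normal(-5, 0.3, (20, 10))])
labels, centers, w = smoothed_weighted_kmeans(np.vstack([A, B]), k=2, seed=1)
print(labels)
print(np.round(w, 3))
```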