to describe the probability distribution of sequences. A test sequence is then assigned to the class whose model yields the highest likelihood of generating it.
Autoregressive models [18] and time-delay embedding [19] are designed for univariate time series, but their extension to vector sequences is non-trivial. The hidden Markov model (HMM) [20] is one of the most popular probabilistic dynamic system models; it models the joint probability of hidden states and observations. The conditional
random field (CRF) [21] directly models the conditional
probability of the hidden nodes given the observations, but
it requires more data and computation for training. Some model-based approaches, such as the HMM, can also be applied to concatenated sequences to perform implicit segmentation-based classification [2], where the sequence segmentation is obtained as a by-product of recognition via so-called cross training.
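For illustration, a minimal sketch of such likelihood-based classification with one HMM per class might look as follows; the hmmlearn library, Gaussian emissions, and the number of hidden states are our own illustrative assumptions, not the configurations used in the cited work:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed available: pip install hmmlearn

def train_class_models(train_seqs_by_class, n_states=5):
    """Fit one HMM per class; each sequence is an (N_i, d) array."""
    models = {}
    for label, seqs in train_seqs_by_class.items():
        X = np.vstack(seqs)               # stack the frames of all sequences
        lengths = [len(s) for s in seqs]  # frame counts delimit the sequences
        model = GaussianHMM(n_components=n_states, covariance_type="diag")
        model.fit(X, lengths)
        models[label] = model
    return models

def classify(models, test_seq):
    """Assign test_seq to the class whose model gives the highest log-likelihood."""
    return max(models, key=lambda c: models[c].score(test_seq))
```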
Rather than employing models for classification, our pro-
posed approach utilizes the properties of such models to
define the statistics of the sequence class.
Distance-Based Approaches. The core of distance-based
approaches is to define a distance that measures the similar-
ity between sequences. For vector sequences with unequal
lengths, dynamic time warping (DTW) [22] is the most
widely used distance measure. Given two sequences
$\mathbf{X} = [x_1, x_2, \ldots, x_{N_x}] \in \mathbb{R}^{d \times N_x}$ and $\mathbf{Y} = [y_1, y_2, \ldots, y_{N_y}] \in \mathbb{R}^{d \times N_y}$ with lengths $N_x$ and $N_y$, respectively, DTW finds the optimal alignment from all possible sets of correspondences between vectors, which minimizes the sum of pairwise vector-to-vector distances

$$\min_{\mathbf{p}^x,\, \mathbf{p}^y} \sum_{t=1}^{T} \left\| x_{p^x_t} - y_{p^y_t} \right\|^2, \qquad (1)$$
where $T$ is the number of steps required to align the two sequences. $\mathbf{p}^x = [p^x_1, p^x_2, \ldots, p^x_T]^\top \in \{1{:}N_x\}^{T \times 1}$ and $\mathbf{p}^y = [p^y_1, p^y_2, \ldots, p^y_T]^\top \in \{1{:}N_y\}^{T \times 1}$ denote the aligned indices between vectors in sequences $\mathbf{X}$ and $\mathbf{Y}$ (the warping paths), respectively. Boundary, continuity, and monotonicity constraints are imposed on $\mathbf{p}^x$ and $\mathbf{p}^y$. Eq. (1) can be efficiently solved by dynamic programming.
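For concreteness, Eq. (1) admits the standard $O(N_x N_y)$ dynamic-programming solution; the following NumPy sketch (an illustration, not the implementation evaluated in this paper) computes the DTW distance under the usual boundary, continuity, and monotonicity constraints:

```python
import numpy as np

def dtw_distance(X, Y):
    """DTW distance between X (d x Nx) and Y (d x Ny)."""
    Nx, Ny = X.shape[1], Y.shape[1]
    # Pairwise squared Euclidean distances between column vectors.
    cost = ((X[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)
    # D[i, j]: minimal accumulated cost of aligning the first i columns
    # of X with the first j columns of Y.
    D = np.full((Nx + 1, Ny + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Nx + 1):
        for j in range(1, Ny + 1):
            D[i, j] = cost[i - 1, j - 1] + min(
                D[i - 1, j],      # advance in X only
                D[i, j - 1],      # advance in Y only
                D[i - 1, j - 1],  # advance in both
            )
    return D[Nx, Ny]
```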
Some variations of DTW based on stretching and align-
ment have been proposed. The constrained DTW [23] restricts how far the alignment can deviate through a locality constraint. The longest common subsequence distance [24] allows unmatched elements in the stretched
sequences. In [25], each sequence is first mapped to a semi-
continuous HMM, and then the DTW distance between the
mixture weight vectors of the two HMMs is used as the final
distance between the two sequences.
Feature-Based Approaches. The basic concept of feature-
based approaches is to represent a sequence as a global fea-
ture vector such that vector sequences are mapped to feature
vectors with fixed dimensionality. Conventional supervised
learning methods, such as LDA, k-nearest neighbor and sup-
port vector machine (SVM), can then be applied.
For univariate time series, discrete Fourier trans-
forms [26] and discrete wavelet transforms [27] are applied
to individual time series, and only a portion of the coefficients is preserved. Consequently, the entire time series is
transformed to a new, shorter representation. These meth-
ods actually reduce the length of sequences rather than the
dimensionality of the component vectors, which differs
from the focus of this paper.
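As a sketch of this idea (for a univariate series; the number of retained coefficients k is an arbitrary illustrative choice):

```python
import numpy as np

def dft_representation(series, k=16):
    """Shorten a univariate time series by keeping only its first k
    (low-frequency) Fourier coefficients."""
    coeffs = np.fft.rfft(series)  # FFT for real-valued input
    kept = coeffs[:k]             # discard the high-frequency tail
    # Concatenate real and imaginary parts into a fixed-length vector.
    return np.concatenate([kept.real, kept.imag])
```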
A sequence can either be represented by a collection of
subsequences called shapelets in [28], or be transformed
into a bag-of-features representation as in [29]. In [16], the
vector representation of a sequence is obtained by mean
pooling the mapped vectors in the sequence. In [15], for
each video, local descriptors are pooled by the bag-of-words
or by Fisher vector to form a single vector representation
without considering the temporal positions of the descrip-
tors. To encode the temporal dynamics into the final repre-
sentation, the pooled time series [30] summarizes the
changes in the frame-wide feature elements over time, and
rank pooling [31] learns a function capable of ordering the frame-wide features in time and uses the function parameters as
the representation.
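As an illustration, the following sketch contrasts order-less mean pooling with a simplified form of rank pooling; it follows the spirit of [31] but substitutes a closed-form ridge regression for the ranking machine used there:

```python
import numpy as np

def mean_pooling(frames):
    """Order-less pooling: frames (N, d) -> a single d-dimensional mean."""
    return frames.mean(axis=0)

def rank_pooling(frames, reg=1.0):
    """Simplified rank pooling: learn weights w such that w @ v_t grows
    with the frame index t, and use w as the sequence representation."""
    N, d = frames.shape
    # Time-varying mean smooths the features while keeping their evolution.
    V = np.cumsum(frames, axis=0) / np.arange(1, N + 1)[:, None]
    t = np.arange(1, N + 1, dtype=float)
    # Closed-form ridge regression: w = (V^T V + reg*I)^{-1} V^T t.
    return np.linalg.solve(V.T @ V + reg * np.eye(d), V.T @ t)
```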
2.4 Dimensionality Reduction for Sequences (DRS)
Although DRS has begun to receive attention, it has
remained an under-explored area. In [3], [32], DTW is com-
bined with canonical correlation analysis to align multidi-
mensional feature sequences. Although these methods also
perform DRS, they can only be applied to multi-modal
sequences for alignment and cannot be extended to multi-
sequence classes for classification. In [33], a sequence-kernel-based DR approach that combines spatial, temporal, and
periodic information is proposed for time series data, where
labels are associated with the vectors in long time series.
The method is kernel-based, and its task is to predict a class
label for each frame.
In speech and handwriting recognition, LDA has been performed based on HMMs [1], [34]. This consists of two steps: 1) an HMM for each class is trained to create pseudo state labels for the vectors in the training sequences, and 2) LDA is then performed by treating all states of all HMMs as individual classes. We refer to this method as “state-LDA”.
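A minimal sketch of state-LDA under our own illustrative choices (hmmlearn GaussianHMMs, sklearn's LDA, and arbitrary state/output dimensions):

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def state_lda(train_seqs_by_class, n_states=5, out_dim=3):
    """Step 1: per-class HMMs assign pseudo state labels to every frame.
    Step 2: LDA treats each (class, state) pair as a separate class."""
    frames, labels = [], []
    for c, seqs in enumerate(train_seqs_by_class.values()):
        X = np.vstack(seqs)
        lengths = [len(s) for s in seqs]
        hmm = GaussianHMM(n_components=n_states).fit(X, lengths)
        states = hmm.predict(X, lengths)      # Viterbi pseudo state labels
        frames.append(X)
        labels.append(c * n_states + states)  # unique id per (class, state)
    X_all, y_all = np.vstack(frames), np.concatenate(labels)
    # out_dim must be smaller than the number of distinct (class, state) labels.
    lda = LinearDiscriminantAnalysis(n_components=out_dim).fit(X_all, y_all)
    return lda  # lda.transform(...) then reduces each frame's dimensionality
```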
HLDA [11] can also be applied using a similar process,
which is denoted as “state-HLDA”. These methods ignore the temporal dependencies: the states within the same HMM are not independent, and a true label is associated with the entire state sequence rather than with a single state. In contrast, our proposed method exploits the temporal dependency and holistic structure in two respects: 1) the extracted statistics encode the hidden dynamics and temporal information, and 2) the objective function aims to holistically discriminate between different sequence classes.
An advantage of DRS over the feature-based methods introduced in Section 2.3 is that it preserves the length and structure of concatenated vector sequences. After DRS, the sequences remain unsegmented, but the differences between segments corresponding to different patterns become starker; therefore, more precise segmentation may be obtained. Either re-segmentation or a holistic approach can be applied in subsequent procedures. In contrast, feature-based methods require a pre-segmentation of the long sequences, which induces irreversible errors. Of course, feature-based methods can also be applied to the sequences after DRS for better classification performance.
Recurrent neural networks (RNNs) [35], such as long short-term memory (LSTM) networks [36], are powerful models for