random forest which is composed of a series of GA search-based binary decision trees, as shown in the flowchart in the red box. In addition to the MC-based recognition, another scheme based on S-T correlation matching (within the blue box) is adopted. Specifically, we first describe each STIP by a descriptor that contains three parts: the PCA of the original image patch, a HOG, and the distribution of nearby STIPs. We then cluster the STIPs with the k-means algorithm, so that each video can be described by a series of STIP occurrence sequences that serve as a template of that video. Finally, the spatial correlation score between two videos is calculated within the MC framework in a way similar to the "histogram intersection kernel", whereas the temporal correlation score is calculated by a biological sequence matching algorithm, the Needleman–Wunsch algorithm [59]. Experiments on the UT-Interaction dataset demonstrate that the MC- and S-T correlation-based methods each work well separately, and that their combination outperforms other common machine learning methods and most state-of-the-art works. The details of "fusion" in Fig. 1 are discussed in Sect. 4.4.
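To make the two correlation scores concrete, the sketch below (a minimal Python illustration, not the paper's implementation) pairs a histogram-intersection-style spatial score with a Needleman–Wunsch alignment over sequences of STIP cluster labels; the match, mismatch, and gap scores are hypothetical placeholders.

```python
# Minimal sketch (not the paper's implementation) of the two
# correlation scores used by the S-T matching scheme.

def histogram_intersection(h1, h2):
    """Spatial score in the style of the histogram intersection kernel."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def needleman_wunsch(seq_a, seq_b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Temporal score: global alignment of two STIP-cluster label
    sequences (match/mismatch/gap values are hypothetical placeholders)."""
    n, m = len(seq_a), len(seq_b)
    # score[i][j]: best alignment score of seq_a[:i] against seq_b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (
                match if seq_a[i - 1] == seq_b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    return score[n][m]

# Usage: sequences of k-means cluster labels observed over time
print(needleman_wunsch([3, 1, 4, 1, 5], [3, 1, 1, 5]))  # -> 3.0
```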
4 Approach
4.1 STIP-based mid-level feature extraction
4.1.1 STIP extraction based on voxel variance
Numerous studies [30, 45, 60] have confirmed the superi-
ority of Dollar’s STIPs over Laptev’s counterparts. How-
ever, Dollar’s method constructs motion saliency maps by
2-D spatial Gaussian filtering and 1-D temporal Gabor
filtering, which still has considerable computational load,
especially when the video volume is large. Here, we use an
even more straightforward method presented in [31] to
extract STIPs. A sliding window is used to calculate the
motion saliency maps from groups of frames within the
window. As shown in Fig. 2, each pixel value of the mo-
tion saliency map (corresponding to the center frame of the
window) is just the variance of the voxel values in the same
location of a group of frames within the window. As
pointed out by [31], the sliding window size plays an im-
portant role: too many frames in a group will blur the
saliency map and make it difficult to distinguish even between "walk" and "run". An empirical choice for the window size is 5–10 frames, and we choose 7 in our experiments.
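The following is a minimal sketch of this computation, assuming `frames` is a (T, H, W) grayscale array; the window size of 7 follows the choice above, and border frames without a full window are simply left blank.

```python
import numpy as np

def motion_saliency_maps(frames, window=7):
    """Per-pixel temporal variance over a sliding window of frames.
    `frames` is assumed to be a (T, H, W) grayscale array; border
    frames without a full window are left as zeros."""
    half = window // 2
    maps = np.zeros(frames.shape, dtype=np.float64)
    for t in range(half, len(frames) - half):
        group = frames[t - half : t + half + 1]  # frames inside the window
        maps[t] = group.var(axis=0)              # voxel variance per pixel
    return maps
```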
The STIPs are extracted by finding the local maxima of
the saliency maps. We use the threshold in Eq. (1) to detect local maxima (non-maximum suppression is applied to avoid STIPs that are too close together):

\[
\text{threshold} = \text{mean} + (\text{max} - \text{mean}) \times 0.005 \tag{1}
\]
where mean and max correspond to the mean and max-
imum of the pixel values of all the saliency maps in a
video. We also compare such STIPs with Dollar's counterparts (with thresholds generated in the same way) and find that they have similar densities, whereas the former are much faster to compute (examples are given in Fig. 3).
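A sketch of the corresponding detection step is given below, reusing the saliency maps from the previous sketch; the greedy suppression radius is a hypothetical parameter, since the text above does not specify how close is "too close".

```python
import numpy as np

def detect_stips(maps, radius=10):
    """Detect STIPs on the saliency maps using the global threshold of
    Eq. (1); a greedy loop keeps the strongest responses and drops any
    within `radius` pixels of an already kept point (the radius is a
    hypothetical parameter)."""
    thr = maps.mean() + (maps.max() - maps.mean()) * 0.005  # Eq. (1)
    stips = []
    for t, smap in enumerate(maps):
        ys, xs = np.nonzero(smap > thr)
        order = np.argsort(smap[ys, xs])[::-1]  # strongest responses first
        kept = []
        for i in order:
            y, x = int(ys[i]), int(xs[i])
            if all((y - ky) ** 2 + (x - kx) ** 2 > radius ** 2
                   for ky, kx in kept):
                kept.append((y, x))
        stips.extend((t, y, x) for y, x in kept)
    return stips
```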
4.1.2 Motion context (MC)
The motion context (MC) feature, which captures global information about motion and shape, is used to train a random forest. The idea of MC comes from the "shape context (SC)" [51], which uses a log-polar diagram (centered at a reference edge point) to measure the distribution of the edge points of an object. Similarly, MC also uses a log-polar diagram, but measures the distribution of STIPs rather than edge points. An MC descriptor, which is also a histogram, can be constructed from each frame. In practice, however, we discard frames with fewer than 30 STIPs, thus avoiding overly sparse histograms that correspond to frames without obvious motion. As depicted in Fig. 4, we use a log-polar diagram containing 24 sub-regions to generate a 24-D histogram called the MC descriptor.
The diagram's center $(c_x, c_y)$ and diameter $D$ are determined by

\[
\begin{cases}
(c_x, c_y) = \left( \dfrac{x_{\min} + x_{\max}}{2},\ \dfrac{y_{\min} + y_{\max}}{2} \right) \\[6pt]
D = g \cdot \max(x_{\max} - x_{\min},\ y_{\max} - y_{\min})
\end{cases} \tag{2}
\]
where $x_{\min}$, $x_{\max}$, $y_{\min}$ and $y_{\max}$ denote the extrema of all the STIPs' coordinates in the current frame, and the coefficient $g$ ($g = 1.2$) is used to enlarge $D$ so that it covers most STIPs.
Specifically, the ratio of the three radial intervals of the log-polar diagram is $1 : \ln 3 : \ln^2 3$. Similar to [31], we define the MC's main orientation as the fan sector containing the most STIPs. To ensure the invariance of the MC feature under mirrored motions, we align the MCs so that their main orientations always lie on the right side (Fig. 5).
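The sketch below assembles these pieces into a 24-D MC descriptor, assuming the 24 sub-regions split into 8 angular sectors by 3 radial rings (consistent with Fig. 4, but an assumption on our part), with radial interval widths in the stated $1 : \ln 3 : \ln^2 3$ ratio.

```python
import numpy as np

def motion_context(stips, g=1.2, n_ang=8, n_rad=3):
    """24-D MC descriptor for one frame. `stips` holds the (x, y)
    coordinates of the frame's STIPs; `g` is the coefficient of Eq. (2).
    The 8 x 3 sector/ring split is an assumption consistent with Fig. 4."""
    pts = np.asarray(stips, dtype=np.float64)
    cx = (pts[:, 0].min() + pts[:, 0].max()) / 2    # Eq. (2): center
    cy = (pts[:, 1].min() + pts[:, 1].max()) / 2
    D = g * max(pts[:, 0].max() - pts[:, 0].min(),  # Eq. (2): diameter
                pts[:, 1].max() - pts[:, 1].min())
    dx, dy = pts[:, 0] - cx, pts[:, 1] - cy
    r = np.hypot(dx, dy)
    theta = np.arctan2(dy, dx) % (2 * np.pi)
    # Ring edges with interval widths in the ratio 1 : ln 3 : ln^2 3
    iv = np.array([1.0, np.log(3), np.log(3) ** 2])
    edges = (D / 2) * np.cumsum(iv) / iv.sum()
    rad_bin = np.minimum(np.searchsorted(edges, r), n_rad - 1)
    ang_bin = (theta // (2 * np.pi / n_ang)).astype(int) % n_ang
    hist = np.zeros((n_ang, n_rad))
    for a, b in zip(ang_bin, rad_bin):
        hist[a, b] += 1
    # Align the main orientation (sector with most STIPs) to the right
    # side (sector 0), as the paper does for mirror invariance
    hist = np.roll(hist, -int(hist.sum(axis=1).argmax()), axis=0)
    return hist.ravel()                             # 24-D MC descriptor
```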
Fig. 2 [31] Illustration of motion saliency map calculation based on voxel variance