Efficient Parallel Frequent-Pattern Mining on NVM: The PevFP-tree Algorithm


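Scalable frequent-pattern mining on nonvolatile memories (NVMs) is an important research area. In the big-data era, the limitations and energy consumption of DRAM (dynamic random-access memory) are increasingly prominent, and NVMs are becoming a preferred choice because of their energy efficiency and storage density. However, most existing frequent-pattern mining algorithms are designed for DRAM and give little consideration to NVM characteristics, so they face major write-amplification and energy-consumption challenges when run on NVMs.

Traditional frequent-pattern mining algorithms such as Apriori and FP-Growth rely on frequent itemsets or frequent-itemset tree structures. These are relatively efficient in DRAM, but on NVMs such as phase-change memory, whose storage mechanism differs from DRAM and whose write latency is higher, the same algorithms may hit performance bottlenecks. Write amplification stems from frequent write and erase operations and can significantly shorten NVM lifetime, while the energy problem is caused by data movement and compute-intensive operations.

The paper proposes PevFP-tree, a parallel frequent-pattern mining solution intended to overcome the difficulties of frequent-pattern mining on NVMs. PevFP-tree is a design optimized for NVM characteristics, with the following key points:

1. **Adaptive architecture**: PevFP-tree adopts data structures and algorithms suited to how NVMs operate, in order to reduce unnecessary writes and energy consumption. This may include using persistent data structures to store frequent patterns, reducing frequent rewrites, and exploiting the inherent persistence of NVMs.
2. **Parallel processing**: To speed up mining, PevFP-tree uses parallel computing. By decomposing tasks and data, work can proceed in parallel on multiple processor cores or hardware components, lowering the load on each one and reducing overall latency.
3. **Energy optimization**: Through scheduling and caching strategies, PevFP-tree reduces unnecessary energy consumption, keeping power low while meeting performance requirements.
4. **Fault tolerance**: NVM reliability is also a challenge. PevFP-tree may include error detection and correction mechanisms, as well as data redundancy, to prevent data loss from affecting the mining results.
5. **Scalability**: The design takes scalability into account so that it can adapt to growing datasets and support efficient frequent-pattern mining on larger NVMs.

As an NVM-oriented frequent-pattern mining scheme, PevFP-tree addresses the performance bottlenecks of traditional methods on NVMs and shows how efficiency, energy consumption, and scalability can be balanced by design. The work is relevant to green computing and to the adoption of NVM technology in big-data analytics.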

Scalable Frequent-Pattern Mining on Nonvolatile Memories

Yi Lin 1,2, Po-Chun Huang, Duo Liu 1,2,∗, and Liang Liang

Key Lab. of Dependable Service Computing in Cyber Physical Society (Chongqing Univ.), Ministry of Education

College of Computer Science, Chongqing University, China, liuduo@cqu.edu.cn

College of Communication Engineering, Chongqing University, China

Department of Computer Science and Engineering, Yuan Ze University, Taiwan, pchuang@saturn.yzu.edu.tw

Abstract: Frequent-pattern mining is a common means to reveal the hidden trends behind data. However, most frequent-pattern mining algorithms are designed for DRAM rather than for energy-economic nonvolatile memories (NVMs). Due to the huge differences between the characteristics of NVMs and those of DRAM, existing frequent-pattern mining algorithms suffer from serious write-amplification or energy-consumption overheads when used on NVMs. The design complexity is exacerbated when parallel computing is used to speed up the mining process. This paper proposes PevFP-tree, a parallel frequent-pattern mining solution for NVMs such as phase-change memory (PCM). By considering the NVM characteristics, PevFP-tree accelerates the mining process and enhances its energy efficiency. Moreover, PevFP-tree offers superior scalability in terms of the degree of parallelism of the mining algorithm and the branching factor of its tree structure. The efficacy of PevFP-tree is evaluated by experiments based on realistic datasets.

I. Introduction

Data mining has recently become a prominent technology for discovering the valuable trends behind data [18]. Within data mining, frequent-pattern mining is a key problem that identifies the frequently occurring itemsets or patterns in a given dataset. While many frequent-pattern mining methods such as Apriori [4] and the frequent-pattern tree (FP-tree) [8] have been widely used, they are primarily designed for volatile and energy-inefficient dynamic random-access memory (DRAM). To enhance the persistence and energy efficiency of frequent-pattern mining, modern nonvolatile memories (NVMs) such as phase-change memory (PCM) [9] are attractive alternatives to DRAM for keeping the mining metadata, thereby facilitating high-performance in-memory data analytics. Unfortunately, the distinct characteristics of NVMs, such as skewed (asymmetric) write performance and energy [9, 19], might degrade the performance and energy efficiency of existing mining methods on NVMs. Although the problem can be alleviated by taking the NVM characteristics into account in the design of frequent-pattern mining methods [15], the scalability of the mining methods is still limited when the dataset to be mined is gigantic and the number of distinct data items is huge. This paper is therefore motivated to propose a highly scalable method for the in-memory frequent-pattern mining problem. Specifically, we augment an existing frequent-pattern mining method, the popular FP-tree approach, to exploit the performance benefits of the symmetric multi-processor (SMP) parallel computing architecture [12]. Meanwhile, the NVM characteristics are taken into account in the design of the augmented method, so as to alleviate the undesirable degradation of the performance and energy efficiency of frequent-pattern mining.

∗ Duo Liu is the corresponding author.

There has been excellent work on the frequent-pattern mining problem. For example, Apriori [4] incrementally generates the candidates of frequently occurring patterns in a dataset. To reduce the memory space consumed by storing excessive candidate patterns (many of which share repeated data items), an FP-construct method is proposed to keep the candidates of frequent patterns in a compact tree structure called the FP-tree [8]. Based on the FP-tree, several variants, including AFPIM [11], CATS tree [6], and CanTree [14], have been presented to enhance the adaptability of the mining tree structure. In addition, other work improves the scalability of frequent-pattern mining by implementing the methods as MapReduce algorithms on big-data computing platforms such as Hadoop [21]. Later, to alleviate the performance, energy, and endurance problems of the above work on NVMs, the evergreen frequent-pattern tree (EvFP-tree) jointly considers the intrinsic characteristics of popular NVMs, such as PCM, to optimize the performance, energy efficiency, and NVM lifetime of the tree construction process [15]. However, how to enable highly scalable, high-performance, energy-efficient frequent-pattern mining on NVMs remains a missing piece, and this paper aims to fill the gap.
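As a rough illustration of why an FP-tree is compact, the sketch below builds one in Python following the textbook formulation: count item supports, drop infrequent items, and insert each transaction in descending-support order so that common prefixes share nodes. The names `FPNode`, `build_fp_tree`, and `count_nodes`, the toy transactions, and the alphabetical tie-breaking rule are assumptions made for illustration; they are not taken from [8] or from this paper.

```python
from collections import Counter


class FPNode:
    """One node of a (hypothetical, simplified) FP-tree."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_support):
    """Build an FP-tree; returns (root, supports of the frequent items)."""
    # Pass 1: support of every item (each item counted once per transaction).
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Pass 2: insert transactions, most frequent item first, so that
    # transactions with common prefixes share tree nodes.
    root = FPNode()
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))  # ties: by name
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"],
                ["b", "c"], ["a", "b", "c", "d"]]
root, frequent = build_fp_tree(transactions, min_support=2)
print(count_nodes(root), "nodes for",
      sum(len([i for i in set(t) if i in frequent]) for t in transactions),
      "frequent item occurrences")
```

In this toy dataset the 12 frequent item occurrences collapse into 6 tree nodes, because most transactions share the prefix `a` or `a, b`.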

Numerous NVM technologies have been proposed as high-performance, energy-economic choices of storage or memory media. NAND flash memory is widely used as the storage medium in embedded systems and personal computers for its high density (as compared to other NVMs) and fast performance (as compared to hard disks) [10]. Unlike other NVMs, the basic unit of a read/write operation of NAND flash memory is a page [10]. Meanwhile, there exist byte-addressable NVMs, such as PCM [5, 17, 16] and STT-RAM [19]. Although different NVMs come with different intrinsic characteristics, some common characteristics exist: (a) A write operation typically takes more (7X–10X) time and energy than a read on most NVMs [5]. It is therefore key to reduce NVM writes so as to optimize the performance and energy efficiency of the NVM system. (b) For many NVMs, an NVM cell endures only a limited number of writes before it wears out; in other words, an NVM cell has a limited write lifetime. For example, the lifetime of a PCM cell is typically 10^(…) to 10^(…) writes [5]. Thus, the NVM cells should be utilized evenly by a wear-leveling facility so that the system attains a reasonable lifetime. Although NVMs provide outstanding access performance and energy consumption as compared with DRAM or hard disks, existing algorithms designed for DRAM or hard disks often need revision to adapt to the intrinsic characteristics of NVMs, so as to fully exploit their advantages.
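To make characteristics (a) and (b) concrete, the following Python sketch compares two ways of applying a stream of counter increments to NVM cells. It is a toy cost model, not the scheme used by EvFP-tree or PevFP-tree: the cost unit (one NVM read), the choice of the lower bound 7 from the quoted 7X–10X range, the functions `direct_updates` and `batched_updates`, and the idea of coalescing updates in a volatile buffer are all assumptions introduced here for illustration.

```python
from collections import Counter

# Assumption: one NVM write costs 7 reads (lower end of the 7X-10X range).
WRITE_TO_READ_RATIO = 7


def cost(reads, writes, ratio=WRITE_TO_READ_RATIO):
    """Total access cost, expressed in units of one NVM read."""
    return reads + ratio * writes


def direct_updates(stream):
    """Every increment reads and rewrites its NVM cell immediately.

    Returns (total cost, writes to the most-written cell).
    """
    wear = Counter(stream)  # writes per cell
    n = len(stream)
    return cost(n, n), max(wear.values())


def batched_updates(stream, batch_size):
    """Coalesce increments in a volatile buffer, one batch at a time.

    Each cell touched in a batch is read once and written once.
    Returns (total cost, writes to the most-written cell).
    """
    wear = Counter()
    reads = writes = 0
    for start in range(0, len(stream), batch_size):
        touched = set(stream[start:start + batch_size])
        reads += len(touched)
        writes += len(touched)
        wear.update(touched)  # one write per touched cell
    return cost(reads, writes), max(wear.values())


stream = ["a"] * 6 + ["b"] * 2  # 8 increments, heavily skewed to cell "a"
direct = direct_updates(stream)
batched = batched_updates(stream, batch_size=4)
print("direct :", direct)   # (64, 6)
print("batched:", batched)  # (24, 2)
```

Under these assumptions, coalescing cuts the access cost from 64 to 24 read-units and the writes to the hottest cell from 6 to 2, which is the kind of saving that motivates both write reduction and wear leveling.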

This paper proposes the parallel evergreen frequent-pattern tree (PevFP-tree), a highly scalable frequent-pattern mining method for byte-addressable NVMs. In particular, to accelerate the frequent-pattern mining process, the huge-scale dataset is partitioned into multiple parts, each of which can be mined by a processor/core in parallel. The mining results of each part of the dataset are kept in a separate FP-tree called a local FP-tree. We then propose an efficient link-merge algorithm to merge all local FP-trees into one global FP-tree,
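The partition, mine, and merge flow described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the link-merge algorithm itself is not described in this excerpt, so a naive recursive merge that adds counts of matching nodes is swapped in; a global item order is computed up front so that all local trees are structurally compatible; and `ThreadPoolExecutor` stands in for the processors/cores (in CPython, threads illustrate the structure but give no real speedup for this CPU-bound work). All names (`Node`, `build_local_tree`, `merge_into`, `parallel_fp_tree`, `to_dict`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Simplified FP-tree node: a count plus children keyed by item."""

    def __init__(self):
        self.count = 0
        self.children = {}


def global_item_order(transactions, min_support):
    """Rank frequent items by descending support (ties: by name)."""
    counts = Counter(i for t in transactions for i in set(t))
    ranked = sorted((i for i in counts if counts[i] >= min_support),
                    key=lambda i: (-counts[i], i))
    return {item: rank for rank, item in enumerate(ranked)}


def build_local_tree(part, order):
    """Build the local FP-tree of one partition using the shared order."""
    root = Node()
    for t in part:
        node = root
        for item in sorted((i for i in set(t) if i in order),
                           key=order.__getitem__):
            node = node.children.setdefault(item, Node())
            node.count += 1
    return root


def merge_into(target, source):
    """Naive merge (NOT the paper's link-merge): add counts node by node."""
    for item, src_child in source.children.items():
        dst_child = target.children.setdefault(item, Node())
        dst_child.count += src_child.count
        merge_into(dst_child, src_child)


def parallel_fp_tree(transactions, min_support, workers=2):
    order = global_item_order(transactions, min_support)
    parts = [transactions[k::workers] for k in range(workers)]  # round-robin
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local_trees = list(pool.map(lambda p: build_local_tree(p, order),
                                    parts))
    global_root = Node()
    for local in local_trees:
        merge_into(global_root, local)
    return global_root


def to_dict(node):
    """Nested-dict view of a tree, convenient for comparing two trees."""
    return {i: (c.count, to_dict(c)) for i, c in node.children.items()}


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"],
                ["b", "c"], ["a", "b", "c", "d"]]
tree = parallel_fp_tree(transactions, min_support=2, workers=2)
print(to_dict(tree))
```

Because every partition uses the same item order, the merged tree is identical to the one built from the whole dataset by a single worker; the cost of the merge step is what a dedicated algorithm such as link-merge would aim to reduce.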

