Scalable Frequent-Pattern Mining on Nonvolatile Memories
Yi Lin
1,2
, Po-Chun Huang
4
,DuoLiu
1,2 ∗
, and Liang Liang
3
1
Key Lab. of Dependable Service Computing in Cyber Physical Society (Chongqing Univ.), Ministry of Education
2
College of Computer Science, Chongqing University, China, liuduo@cqu.edu.cn
3
College of Communication Engineering, Chongqing University, China
4
Department of Computer Science and Engineering, Yuan Ze University, Taiwan, pchuang@saturn.yzu.edu.tw
Abstract— Frequent-pattern mining is a common
means to reveal the hidden trends behind data. However,
most frequent-pattern mining algorithms are designed for
DRAM, instead of the energy-economic nonvolatile mem-
ories (NVMs). Due to the huge differences between the
characteristics of NVMs and those of DRAM, existing
frequent-pattern mining algorithms suffer from serious
overheads of write amplification or energy consumption
as used on NVMs. The design complexity is exaggerat-
ed when parallel computing is used to speedup the min-
ing process. This paper proposes PevFP-tree, a parallel
frequent-pattern mining solution for NVMs, e.g., phase-
change memory (PCM). By considering the NVM charac-
teristics, PevFP-tree accelerates the mining process and
enhance the energy efficiency. Moreover, PevFP-tree of-
fers superior scalability in terms of the degree of paral-
lelism of the mining algorithm and the branching factor of
its tree structure. The efficacy of PevFP-tree is evaluated
by experiments based on realistic datasets.
I. Introduction
Recently, data mining is a highlighted technology to dis-
cover the valuable trends behind data [18]. In the data min-
ing area, frequent-pattern mining is a key problem that i-
dentifies the frequent-occurring itemsets or patterns in a giv-
en dataset. While many frequent-pattern mining methods
like Apriori [4] and frequent-pattern tree (FP-tree) [8] have
been widely used, they are primarily designed for volatile and
energy-inefficient dynamic random-access memory (DRAM).
To enhance the persistence and energy efficiency of frequent-
pattern mining, modern nonvolatile memories (NVMs) such
as phase-change memory (PCM) [9] are considerable alterna-
tives to DRAM to keep the mining metadata, so as to facili-
tate the high-performance in-memory data analytics. Unfor-
tunately, the distinct characteristics of NVMs, such as skewed
write performance and energy [9, 19], might degrade the per-
formance and energy efficiency of existing mining method-
s on NVMs. Although the problem could be alleviated by
jointly concerning the NVM characteristics into the design
of frequent-pattern mining methods [15], the scalability of
the mining methods is still limited where the to-be-mined
dataset is gigantic and the number of distinct data items
is huge. This paper is therefore motivated by proposing a
highly scalable method for the in-memory frequent-pattern
mining problem. To be specific, we augment an existing
frequent-pattern mining method, i.e., the popular FP-tree
approach, to utilize the performance benefits of symmetric
multi-processor (SMP) parallel computing architecture [12].
Meanwhile, the NVM characteristics are jointly considered
into the design of the augmented method, so as to alleviate
the undesirable degradation of the performance and energy
efficiency of frequent-pattern mining.
There have been excellent work for frequent-pattern mining
problem. For example, Apriori [4] proposes to incremental-
ly generate the candidates of frequent-occurring patterns in
∗
Duo Liu is the corresponding author.
a dataset. To reduce the memory space consumption to s-
tore excessive candidate patterns (where many of them have
repetitive data items), an FP-construct method is proposed
to keep the candidates of frequent patterns in a compact tree
structure called FP-tree [8]. Based on FP-tree, several relat-
ed variants of FP-tree, including AFPIM [11], CATS tree [6],
and CanTree [14] are also presented to enhance the adaptabil-
ity of the mining tree structure. In addition, some other work
improve the scalability of frequent-pattern mining method by
implementing the methods as MapReduce algorithms on big
data computing platforms, such as Hadoop [21]. After that,
to alleviate the performance, energy, and endurance problems
of above work on NVMs, the evergreen frequent-pattern tree
(EvFP-tree) jointly considers the intrinsic characteristics of
popular NVMs, such as PCM, to optimize the performance,
energy efficiency, and NVM lifetime of the tree construction
process [15]. However, it remains a missing piece on how
to enable highly-scalable, high-performance, energy-efficient
frequent-pattern mining on NVMs, and this paper aims to
fill up the gap.
There have been numerous NVM technologies proposed
as high-performance, energy-economic choices of storage or
memory media. NAND flash memory is widely used as the
storage medium in embedded systems or personal comput-
ers for its high density (as compared to other NVMs) and
fast performance (as compared to hard disk) [10]. Different
from other NVMs, the basic unit of a read/write operation
of NAND flash memory is page [10]. Meanwhile, there exist
byte-addressable NVMs, such as PCM [5, 17, 16] and STT-
RAM [19]. Although different NVMs come with different
intrinsic characteristics, some common characteristics exist:
(a) A write operation typically takes more (7X–10X) time and
energy than a read one on most NVMs [5]. It is therefore key
to reduce the NVM writes so as to optimize the performance
and energy efficiency of the NVM system. (b) For many N-
VMs, an NVM cell endures only a limited number of writes
before it becomes worn out. In other words, an NVM cell
has limited lifetime of writes. For example, the lifetime of a
PCM cell is typically 10
6
–10
9
writes [5]. Thus, the NVM cells
should be equally utilized by wear-leveling facility to endure
a reasonable lifetime of the system. Although NVMs provide
outstanding access performance and energy consumption as
compared with DRAM or hard disk, revision of existing algo-
rithms designed for DRAM or hard disk is often necessary to
adapt to the intrinsic characteristics of NVMs, so as to fully
exploit the advantages of NVMs.
This paper proposes parallel evergreen frequent-pattern
tree (PevFP-tree), a highly scalable frequent-pattern mining
method for byte-addressable NVMs. In particular, to ac-
celerate the frequent-pattern mining process, the huge-scale
dataset is partitioned into multiple parts, where each part
of the dataset can be mined by a processor/core in paral-
lel to accelerate the mining process. The mining results of
each part of the dataset will be kept in a separate FP-tree
called local FP-tree. We then propose an efficient link-merge
algorithm to merge all local FP-trees into one global FP-tree,
978-1-5090-1558-0/17/$31.00 ©2017 IEEE
7A-4
578