Mining Colossal Frequent Patterns
• F. Zhu, X. Yan, J. Han, P. S. Yu, and H. Cheng, “Mining Colossal Frequent Patterns by Core Pattern Fusion”, ICDE'07.
• We have many algorithms, but can we mine large (i.e., colossal) patterns, even ones of size only around 50 to 100? Unfortunately, not!
• Why not? The curse of “downward closure” of frequent patterns.
• The “downward closure” property: any sub-pattern of a frequent pattern is frequent.
• Example: if (a1, a2, …, a100) is frequent, then a1, a2, …, a100, (a1, a2), (a1, a3), …, (a1, a100), (a1, a2, a3), … are all frequent! There are about 2^100 such frequent itemsets!
• Whether we use breadth-first search (e.g., Apriori) or depth-first search (FPgrowth), we have to examine that many patterns.
• Thus the downward closure property leads to explosion!

Please translate this into Chinese and explain it.
Time: 2024-04-01 22:30:43 Views: 115
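The paper “Mining Colossal Frequent Patterns by Core Pattern Fusion” points out that we have many algorithms for mining frequent patterns, but asks whether we can mine large (i.e., colossal) patterns, for example patterns of size 50 to 100. Unfortunately, we cannot. Why not? The cause is the “downward closure” property of frequent patterns: any sub-pattern of a frequent pattern is itself frequent. For example, if (a1, a2, …, a100) is frequent, then a1, a2, …, a100, (a1, a2), (a1, a3), …, (a1, a100), (a1, a2, a3), … are all frequent. There are about 2^100 such frequent itemsets (exactly 2^100 - 1 non-empty ones, roughly 1.27 × 10^30). Whether we use breadth-first search (e.g., Apriori) or depth-first search (FPgrowth), we have to examine that many patterns. The downward closure property therefore leads to a pattern explosion.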
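The counting argument can be checked on a toy example. The following is only a minimal sketch and is not taken from the paper: the function names `count_subpatterns` and `frequent_itemsets` and the three-transaction database are made up for illustration, and the enumeration is plain brute force rather than Apriori or FPgrowth. It shows that a single frequent 4-item pattern already forces 2^4 - 1 = 15 frequent itemsets, and that the same formula gives about 1.27 × 10^30 for 100 items.

```python
from itertools import combinations


def count_subpatterns(n: int) -> int:
    """Number of non-empty sub-itemsets of an n-item pattern: 2^n - 1."""
    return 2 ** n - 1


def frequent_itemsets(transactions, min_support):
    """Brute-force search: return every itemset whose support count
    (number of transactions containing it) is at least min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[candidate] = support
    return result


# Hypothetical toy database: the pattern (a, b, c, d) occurs twice.
transactions = [frozenset("abcd"), frozenset("abcd"), frozenset("ae")]
found = frequent_itemsets(transactions, min_support=2)

print(len(found))              # 15: every non-empty subset of {a, b, c, d}
print(count_subpatterns(4))    # 15
print(count_subpatterns(100))  # 1267650600228229401496703205375
```

The brute-force loop is used here only because it makes the count easy to verify by hand; any complete frequent-itemset miner would have to report the same 15 itemsets, which is exactly the explosion described above.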