基于python实现FP-Growth算法,要求测试集定义“交易”为订单ID,“购买商品”为一次订单内购买的商品,
时间: 2024-05-01 21:20:19 浏览: 98
好的,首先我们需要了解什么是FP-Growth算法。
FP-Growth算法是一种用于挖掘频繁项集的算法,它通过构建FP-Tree树来快速发现频繁项集。FP-Tree树是一种前缀树,它将所有的事务按照出现频率从高到低排序,并将它们插入到树中,同时保留每个事务中出现的所有项。
基于这个思想,我们可以实现一个简单的FP-Growth算法。下面是代码实现:
```python
class FPNode:
def __init__(self, item=None, count=1):
self.item = item
self.count = count
self.parent = None
self.link = None
self.children = {}
def add_child(self, child):
if child.item not in self.children:
self.children[child.item] = child
child.parent = self
else:
self.children[child.item].count += child.count
def get_child(self, item):
return self.children.get(item, None)
def increment(self):
self.count += 1
def display(self, indent=1):
print(' ' * indent, self.item, ' ', self.count)
for child in self.children.values():
child.display(indent+1)
def build_tree(transactions, minsup):
header_table = {}
for transaction in transactions:
for item in transaction:
header_table[item] = header_table.get(item, 0) + transactions[transaction]
for k in list(header_table):
if header_table[k] < minsup:
del(header_table[k])
freq_items = set(header_table.keys())
if len(freq_items) == 0:
return None, None
for k in header_table:
header_table[k] = [header_table[k], None]
root = FPNode()
for transaction, count in transactions.items():
ordered_items = [x for x in transaction if x in freq_items]
ordered_items.sort(key=lambda x: header_table[x][0], reverse=True)
if len(ordered_items) > 0:
current_node = root
for item in ordered_items:
current_node = current_node.get_child(item) or current_node.add_child(FPNode(item, 0))
current_node.increment()
header_table[item][1] = header_table[item][1] or current_node
return root, header_table
def find_prefix_path(tree_node):
path = {}
while tree_node is not None:
prefix_path = []
current_node = tree_node
while current_node.parent is not None:
prefix_path.append(current_node.item)
current_node = current_node.parent
if len(prefix_path) > 1:
path[frozenset(prefix_path[1:])] = tree_node.count
tree_node = tree_node.link
return path
def mine_tree(tree, header_table, minsup, pre_fix, freq_item_list):
sorted_items = [x[0] for x in sorted(header_table.items(), key=lambda x: x[1][0])]
for item in sorted_items:
new_freq_set = pre_fix.copy()
new_freq_set.add(item)
freq_item_list.append(new_freq_set)
conditional_pattern_bases = find_prefix_path(header_table[item][1])
conditional_tree, conditional_header = build_tree(conditional_pattern_bases, minsup)
if conditional_header is not None:
mine_tree(conditional_tree, conditional_header, minsup,
new_freq_set, freq_item_list)
def fp_growth(transactions, minsup):
tree, header_table = build_tree(transactions, minsup)
freq_item_list = []
mine_tree(tree, header_table, minsup, set(), freq_item_list)
return freq_item_list
```
这里的`transactions`是一个字典,键为订单ID,值为该订单中购买的商品列表。我们需要将其转换成一个列表,每个元素为一个购买商品的集合。下面是一个样例:
```python
transactions = {
1: ['A', 'B', 'C', 'D'],
2: ['B', 'C', 'E'],
3: ['A', 'B', 'C', 'E'],
4: ['B', 'D', 'E'],
5: ['A', 'B', 'C', 'D', 'E'],
6: ['A', 'C', 'E']
}
transaction_list = [[item] for order, items in transactions.items() for item in items]
```
然后我们就可以调用`fp_growth`函数来找出频繁项集了:
```python
freq_itemsets = fp_growth(transaction_list, 3)
for itemset in freq_itemsets:
print(itemset)
```
这里的`minsup`参数表示最小支持度,我们可以根据需要进行调整。
阅读全文