Apriori algorithm: a Python implementation
Date: 2023-11-21 12:04:09
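Below is a Python implementation of the Apriori algorithm. The `support_threshold` argument is an absolute count (the minimum number of transactions that must contain an itemset), not a fraction: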
```python
def apriori(transactions, support_threshold):
    # Frequent itemsets found so far
    freq_itemsets = []
    # First scan: count the support of every single item, then drop the
    # items that do not meet the support threshold
    candidates = {}
    for transaction in transactions:
        for item in transaction:
            if item in candidates:
                candidates[item] += 1
            else:
                candidates[item] = 1
    candidates = {frozenset([item]): support for item, support in candidates.items() if support >= support_threshold}
    # Generate larger frequent itemsets level by level
    while candidates:
        # Frequent k-itemsets of the current level, as a list
        itemsets = list(candidates.keys())
        # Frequent (k+1)-itemsets for the next level
        new_candidates = {}
        # Try every pair of frequent k-itemsets
        for i in range(len(itemsets)):
            for j in range(i + 1, len(itemsets)):
                itemset1 = itemsets[i]
                itemset2 = itemsets[j]
                # Two k-itemsets can be joined only if they have the same size and
                # differ in exactly one item, so that their union has k+1 items
                if len(itemset1) != len(itemset2) or len(itemset1.union(itemset2)) != len(itemset1) + 1:
                    continue
                # Merge the two itemsets and count the transactions that contain the result
                new_itemset = itemset1.union(itemset2)
                support = sum(1 for transaction in transactions if new_itemset.issubset(transaction))
                if support >= support_threshold:
                    new_candidates[new_itemset] = support
        # Record this level's frequent itemsets and move on to the next level
        freq_itemsets.extend(list(candidates.keys()))
        candidates = new_candidates
    return freq_itemsets
```
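The function above counts the support of every (k+1)-item union it forms. The classic Apriori pruning step avoids part of that work: a candidate can be discarded without scanning the transactions if any of its k-item subsets is not frequent. The following is a minimal sketch of that step, not part of the implementation above; `generate_candidates` is a hypothetical helper name, and it assumes all input itemsets are frozensets of the same size.

```python
from itertools import combinations


def generate_candidates(frequent_k):
    """Join frequent k-itemsets into (k+1)-item candidates, then prune them.

    Hypothetical helper: assumes every itemset in frequent_k is a frozenset
    of the same size k.
    """
    frequent_k = set(frequent_k)
    candidates = set()
    for a, b in combinations(frequent_k, 2):
        union = a | b
        # Keep only unions that grow the itemset by exactly one item
        if len(union) != len(a) + 1:
            continue
        # Apriori property: every k-item subset of a frequent (k+1)-itemset
        # must itself be frequent, otherwise the candidate is pruned
        if all(frozenset(sub) in frequent_k for sub in combinations(union, len(a))):
            candidates.add(union)
    return candidates


# Example: {A, B, D} is pruned because {A, D} is not frequent,
# and {B, C, D} is pruned because {C, D} is not frequent.
pairs = [frozenset(p) for p in ("AB", "AC", "BC", "BD")]
survivors = generate_candidates(pairs)
print(survivors)
```

Only the candidate containing A, B and C survives here; its support would still have to be counted against the transactions before it can be called frequent.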
Usage example:
```python
transactions = [
    {'A', 'B', 'C', 'D'},
    {'A', 'B', 'C'},
    {'A', 'B', 'D'},
    {'A', 'B'},
    {'A', 'C', 'D'},
    {'A', 'C'},
    {'A', 'D'},
    {'A'},
    {'B', 'C', 'D'},
    {'B', 'C'},
    {'B', 'D'},
    {'B'},
    {'C'},
    {'C', 'D'},
    {'D'}
]
support_threshold = 3
freq_itemsets = apriori(transactions, support_threshold)
print(freq_itemsets)
```
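Output (the order of the itemsets within each size, and of the items inside each frozenset, may differ between runs):
```
[frozenset({'A'}),
 frozenset({'B'}),
 frozenset({'C'}),
 frozenset({'D'}),
 frozenset({'A', 'B'}),
 frozenset({'A', 'C'}),
 frozenset({'A', 'D'}),
 frozenset({'B', 'C'}),
 frozenset({'B', 'D'}),
 frozenset({'C', 'D'})]
```
Each frozenset is one frequent itemset; for example, frozenset({'A', 'B'}) is the frequent itemset containing A and B. In this dataset every single item appears in 8 transactions and every pair in 4, so both meet the threshold of 3. Every three-item set appears in only 2 transactions and the four-item set in 1, so none of them is frequent.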
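On a dataset this small, a brute-force count is a quick way to sanity-check which itemsets reach the threshold. The sketch below assumes the same 15 transactions as the usage example (every non-empty subset of A, B, C, D); `support_counts` is a hypothetical helper name introduced only for this check.

```python
from itertools import combinations

items = "ABCD"
# The 15 transactions from the usage example: every non-empty subset of {A, B, C, D}
transactions = [set(c) for r in range(1, 5) for c in combinations(items, r)]


def support_counts(transactions, size):
    """Map each itemset of the given size to the number of transactions containing it."""
    universe = sorted(set().union(*transactions))
    return {
        frozenset(c): sum(1 for t in transactions if set(c) <= t)
        for c in combinations(universe, size)
    }


# Print the distinct support values found for each itemset size
for size in range(1, 5):
    print(size, sorted(set(support_counts(transactions, size).values())))
# 1 [8]
# 2 [4]
# 3 [2]
# 4 [1]
```

With a threshold of 3, only itemsets of size 1 and 2 qualify, which gives 4 + 6 = 10 frequent itemsets.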