Implementing the Apriori Algorithm
Posted: 2023-12-01 18:53:09
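Apriori is a frequent-itemset mining algorithm: it finds every set of items whose support (the fraction of transactions that contain it) reaches a given minimum. It works level by level, building candidate k-itemsets only from the frequent (k-1)-itemsets. Below is a simple Python implementation: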
```
from itertools import combinations


# Generate candidate k-itemsets from the frequent (k-1)-itemsets
def generate_candidates(itemset, k):
    prev = set(itemset)
    candidates = set()
    n = len(itemset)
    for i in range(n):
        for j in range(i + 1, n):
            # Join step: keep unions of two (k-1)-itemsets that have exactly k items
            union = itemset[i] | itemset[j]
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must itself be frequent
            if all(frozenset(sub) in prev for sub in combinations(union, k - 1)):
                candidates.add(union)
    # Sort so that the order of the result is deterministic
    return sorted(candidates, key=sorted)


# Apriori: return the frequent itemsets of every size
def apriori(data, min_support):
    transactions = [frozenset(t) for t in data]
    n = len(transactions)  # support is a fraction of the number of transactions
    # Level-1 candidates: one itemset per distinct item
    items = sorted({item for t in transactions for item in t})
    candidate_set = [frozenset([item]) for item in items]
    all_frequent = []
    k = 1
    while candidate_set:
        # Count the transactions that contain each candidate
        support_count = {}
        for item in candidate_set:
            for transaction in transactions:
                if item.issubset(transaction):
                    support_count[item] = support_count.get(item, 0) + 1
        # Keep the candidates that reach the minimum support
        frequent_itemset = [item for item in candidate_set
                            if support_count.get(item, 0) / n >= min_support]
        all_frequent.extend(frequent_itemset)
        k += 1
        candidate_set = generate_candidates(frequent_itemset, k)
    return all_frequent
```
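For reference, the quantity that the code compares against `min_support` can be written as below. The symbols D (the list of transactions), X and Y (itemsets) are notation introduced here for illustration and do not appear in the original text; `\operatorname` assumes the `amsmath` package. The second statement is the downward-closure property, which is why it is enough to build candidates from itemsets that are already frequent.

```latex
\[
\operatorname{supp}(X) = \frac{\lvert \{\, t \in D : X \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
X \subseteq Y \;\Longrightarrow\; \operatorname{supp}(X) \ge \operatorname{supp}(Y).
\]
```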
Usage example:
```
data = [['A', 'B', 'C'], ['B', 'C'], ['A', 'B', 'D'], ['A', 'C', 'D'], ['B', 'D']]
min_support = 0.5
result = apriori(data, min_support)
print(result)
```
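The output is:
```
[frozenset({'A'}), frozenset({'B'}), frozenset({'C'}), frozenset({'D'})]
```
The single items {'A'}, {'B'}, {'C'} and {'D'} are all frequent, since each appears in at least 3 of the 5 transactions. No larger itemset is frequent at this threshold: the most common pairs, such as {'A', 'B'}, appear in only 2 of the 5 transactions (support 0.4), which is below 0.5.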
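As a sanity check, the supports for the example data can also be computed by brute force, without any candidate pruning. This is only a minimal sketch for verification: the helper `support` and the dictionary `supports` are hypothetical names introduced here, and enumerating every combination is practical only for tiny datasets like this one.

```python
from itertools import combinations

data = [['A', 'B', 'C'], ['B', 'C'], ['A', 'B', 'D'], ['A', 'C', 'D'], ['B', 'D']]
transactions = [set(t) for t in data]
items = sorted(set().union(*transactions))


def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return hits / len(transactions)


# Supports of all 1- and 2-itemsets, computed without any pruning
supports = {combo: support(combo)
            for size in (1, 2)
            for combo in combinations(items, size)}

for combo, value in supports.items():
    print(combo, value)
```

On this data every single item has support of at least 0.6, while no pair exceeds 0.4, so a threshold of 0.5 separates the two groups.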