Implementing the Apriori Algorithm by Hand in Python
Date: 2024-05-02 14:17:59
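Apriori is a frequent itemset mining algorithm: it finds the itemsets that occur frequently in a dataset, and association rules can then be generated from those frequent itemsets. Below is a simple Python implementation: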
```python
# Load the example dataset: each transaction is a list of items
def load_data():
    data = [['bread', 'milk', 'cheese'],
            ['bread', 'milk'],
            ['bread', 'diaper', 'beer', 'egg'],
            ['milk', 'diaper', 'beer', 'cola'],
            ['bread', 'milk', 'diaper', 'beer'],
            ['bread', 'cola']]
    return data


# Generate candidate k-itemsets from the frequent (k-1)-itemsets
def generate_candidates(prev_frequent, k):
    # Itemsets are sorted tuples, so comparing prefixes is meaningful
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            # Join two itemsets whose first k-2 items are identical
            if prev[i][:k-2] == prev[j][:k-2]:
                candidate = prev[i] + (prev[j][-1],)
                # Prune: every (k-1)-subset of a frequent k-itemset must itself be frequent
                if all(candidate[:m] + candidate[m+1:] in prev_set for m in range(k)):
                    candidates.append(candidate)
    return candidates


# Keep only the candidates whose support reaches min_support
def filter_items(data, candidates, min_support):
    item_counts = {}
    for item in candidates:
        for transaction in data:
            if set(item).issubset(transaction):
                item_counts[item] = item_counts.get(item, 0) + 1
    frequent_items = [item for item, count in item_counts.items()
                      if count / len(data) >= min_support]
    return frequent_items


# Run the Apriori algorithm level by level
def apriori(data, min_support):
    # Candidate 1-itemsets: every distinct item, as a one-element tuple
    candidates = sorted({(item,) for transaction in data for item in transaction})
    frequent_items = []
    k = 1
    while candidates:
        frequent_items_k = filter_items(data, candidates, min_support)
        if len(frequent_items_k) == 0:
            break
        frequent_items.extend(frequent_items_k)
        k += 1
        candidates = generate_candidates(frequent_items_k, k)
    return frequent_items


# Test
data = load_data()
frequent_items = apriori(data, 0.5)
print(frequent_items)
```
In this implementation, load_data returns the example dataset; generate_candidates builds the candidate k-itemsets from the frequent (k-1)-itemsets; filter_items keeps the candidates whose support reaches min_support; and apriori drives the level-by-level search. Running it prints:
```
[('beer',), ('bread',), ('diaper',), ('milk',), ('beer', 'diaper'), ('bread', 'milk')]
```
This result shows that the itemsets whose support is at least 0.5 (that is, those contained in at least 3 of the 6 transactions) are: beer, bread, diaper, milk, beer+diaper, and bread+milk. No 3-itemset reaches the threshold.
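The introduction mentions association rules, but the implementation stops at frequent itemsets. As a minimal sketch of that next step, the block below derives rules from two 2-itemsets that each have support 0.5 in this dataset. The helper names `support` and `generate_rules`, the `min_confidence` parameter, and the 0.7 threshold are assumptions chosen for illustration and are not part of the original code. Confidence of a rule A -> B is taken as support(A ∪ B) / support(A).

```python
from itertools import combinations

# Same six transactions as the example dataset, stored as sets
transactions = [
    {'bread', 'milk', 'cheese'},
    {'bread', 'milk'},
    {'bread', 'diaper', 'beer', 'egg'},
    {'milk', 'diaper', 'beer', 'cola'},
    {'bread', 'milk', 'diaper', 'beer'},
    {'bread', 'cola'},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def generate_rules(frequent_itemsets, min_confidence):
    """Hypothetical helper: return (antecedent, consequent, confidence) triples."""
    rules = []
    for itemset in frequent_itemsets:
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                consequent = tuple(i for i in itemset if i not in antecedent)
                confidence = support(itemset) / support(antecedent)
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules


# Sample input: two 2-itemsets that each appear in 3 of the 6 transactions
frequent = [('beer', 'diaper'), ('bread', 'milk')]
for antecedent, consequent, confidence in generate_rules(frequent, 0.7):
    print(antecedent, '->', consequent, round(confidence, 2))
```

With the assumed threshold of 0.7, the rule bread -> milk is dropped (confidence 0.6, since bread appears in 5 transactions but bread+milk in only 3), while milk -> bread (0.75), beer -> diaper (1.0), and diaper -> beer (1.0) are kept.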