Implementing Apriori Association Rules in Python
Posted: 2024-04-30 14:18:06
Apriori is a classic algorithm for mining frequent itemsets and association rules. The following steps implement it in Python.
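Two quantities drive the algorithm: the support of an itemset (the fraction of transactions that contain it) and the confidence of a rule X -> Y (the support of X and Y together divided by the support of X). As a minimal sketch, assuming hypothetical helper names `support` and `confidence` and a made-up toy dataset that is not part of the implementation in the steps that follow:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent union consequent) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical toy baskets, used only for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

bread_milk_support = support({"bread", "milk"}, baskets)
milk_given_bread = confidence({"bread"}, {"milk"}, baskets)
print(bread_milk_support)  # 2 of 4 baskets contain both items
print(milk_given_bread)    # 2 of the 3 bread baskets also contain milk
```

An itemset is called frequent when its support reaches `min_support`, and a rule is kept when its confidence reaches `min_confidence`.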
1. Import the required libraries
```python
import itertools
from collections import defaultdict
```
2. Define the function `get_frequent_itemsets`, which finds the frequent itemsets
```python
def get_frequent_itemsets(transactions, min_support):
    # Count how many transactions contain each individual item.
    item_counts = defaultdict(int)
    for transaction in transactions:
        for item in transaction:
            item_counts[item] += 1
    # Convert the relative support threshold into an absolute count.
    min_count = min_support * len(transactions)
    current_level = set(frozenset([item]) for item, count in item_counts.items() if count >= min_count)
    if len(current_level) == 0:
        return set()
    # Collect the frequent itemsets of every size, not only the largest one.
    frequent_itemsets = set(current_level)
    k = 2
    while True:
        new_frequent_itemsets = set()
        # Join pairs of frequent (k-1)-itemsets to form candidate k-itemsets.
        for itemset in itertools.combinations(current_level, 2):
            union = itemset[0].union(itemset[1])
            if len(union) == k and union not in new_frequent_itemsets:
                count = sum(1 for transaction in transactions if union.issubset(transaction))
                if count >= min_count:
                    new_frequent_itemsets.add(union)
        if len(new_frequent_itemsets) == 0:
            break
        frequent_itemsets |= new_frequent_itemsets
        current_level = new_frequent_itemsets
        k += 1
    return frequent_itemsets
```
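The Apriori property (every subset of a frequent itemset is itself frequent) allows candidates to be discarded before any transaction is scanned. The function above counts each size-k union directly; the following is a sketch of the optional pruning step, where `prune_candidates` is a hypothetical helper name and the sample itemsets are invented for illustration:

```python
from itertools import combinations


def prune_candidates(candidates, prev_frequent):
    """Keep only candidates whose (k-1)-item subsets are all in `prev_frequent`."""
    kept = set()
    for candidate in candidates:
        k = len(candidate)
        subsets = (frozenset(s) for s in combinations(candidate, k - 1))
        if all(s in prev_frequent for s in subsets):
            kept.add(candidate)
    return kept


# Illustrative input: frequent 2-itemsets and two 3-item candidates.
prev_frequent = {frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("AD")}
candidates = {frozenset("ABC"), frozenset("ABD")}
pruned = prune_candidates(candidates, prev_frequent)
print(pruned)  # ABD is dropped because its subset {B, D} is not frequent
```

Pruning does not change the result; it only avoids counting candidates that cannot be frequent, which matters when scanning the transactions is expensive.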
3. Define the function `get_association_rules`, which derives the association rules
```python
def get_association_rules(frequent_itemsets, min_confidence):
    # Note: this function reads the module-level `transactions` defined in step 4.
    rules = []
    for itemset in frequent_itemsets:
        if len(itemset) > 1:
            # Support count of the full itemset, computed once per itemset.
            itemset_count = sum(1 for transaction in transactions if itemset.issubset(transaction))
            for item in itemset:
                antecedent = itemset - set([item])
                consequent = set([item])
                antecedent_count = sum(1 for transaction in transactions if antecedent.issubset(transaction))
                # confidence(X -> Y) = support(X union Y) / support(X)
                confidence = itemset_count / antecedent_count
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules
```
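The function above only produces rules with a single item on the right-hand side. A sketch of the more general form, in which every non-empty proper subset of an itemset is tried as the antecedent, might look like the following. The name `all_rules`, its explicit `transactions` parameter, and the sample data are assumptions made for illustration:

```python
from itertools import combinations


def all_rules(itemset, transactions, min_confidence):
    """Return (antecedent, consequent, confidence) for every split of `itemset`."""
    itemset = frozenset(itemset)
    full_count = sum(1 for t in transactions if itemset <= t)
    rules = []
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            antecedent = frozenset(combo)
            ante_count = sum(1 for t in transactions if antecedent <= t)
            if ante_count == 0:
                continue  # antecedent never occurs, so confidence is undefined
            conf = full_count / ante_count
            if conf >= min_confidence:
                rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Hypothetical sample data, used only for illustration.
sample = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"A"}]
found = all_rules({"A", "B", "C"}, sample, min_confidence=0.5)
for antecedent, consequent, conf in found:
    print(sorted(antecedent), "->", sorted(consequent), conf)
```

Passing the transactions in as an argument, rather than reading a global, also makes the function easier to reuse and test.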
4. Define the dataset and parameters
```python
transactions = [
    {'A', 'B', 'C', 'D'},
    {'A', 'B', 'C'},
    {'A', 'B', 'D'},
    {'A', 'B'},
    {'A', 'C', 'D'},
    {'A', 'C'},
    {'A', 'D'},
    {'A'},
]
min_support = 0.5
min_confidence = 0.6
```
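To see what these thresholds imply for this dataset, the item frequencies can be tallied directly. A small sketch (the dataset is repeated so the snippet runs on its own; `Counter` is used here purely for illustration and is not part of the implementation above):

```python
from collections import Counter

transactions = [
    {'A', 'B', 'C', 'D'},
    {'A', 'B', 'C'},
    {'A', 'B', 'D'},
    {'A', 'B'},
    {'A', 'C', 'D'},
    {'A', 'C'},
    {'A', 'D'},
    {'A'},
]
min_support = 0.5

# Number of transactions that contain each item.
item_counts = Counter(item for t in transactions for item in t)
# A support of 0.5 over 8 transactions means at least 4 occurrences.
min_count = min_support * len(transactions)

for item in sorted(item_counts):
    print(item, item_counts[item], item_counts[item] >= min_count)
```

With a minimum count of 4, every single item passes. Among the pairs, only those containing A reach the threshold (each appears in 4 transactions), while B-C, B-D, and C-D each appear in only 2.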
5. Compute the frequent itemsets and association rules
```python
frequent_itemsets = get_frequent_itemsets(transactions, min_support)
association_rules = get_association_rules(frequent_itemsets, min_confidence)
```
6. Print the results
```python
print('Frequent itemsets:')
for itemset in frequent_itemsets:
    print(itemset)
print('\nAssociation rules:')
for rule in association_rules:
    print(rule[0], '->', rule[1], ':', rule[2])
```