Code for Implementing an Association Rule Algorithm
Date: 2024-04-14 19:25:02
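Association rule mining is a method for discovering relationships between the items in a dataset. The best-known algorithm for it is Apriori. Below is a simple Apriori-style implementation as an example: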
```python
# Import the required library
from itertools import combinations


# Generate candidate k-itemsets from the transactions
def generate_candidates(data, k):
    candidates = set()
    for transaction in data:
        # Sort and de-duplicate so each itemset has one canonical tuple form
        for itemset in combinations(sorted(set(transaction)), k):
            candidates.add(itemset)
    return candidates


# Compute the support of each candidate and keep the frequent ones
def calculate_support(data, candidates, min_support):
    support_counts = {}
    for transaction in data:
        items = set(transaction)
        for candidate in candidates:
            if items.issuperset(candidate):
                support_counts[candidate] = support_counts.get(candidate, 0) + 1
    num_transactions = len(data)
    frequent_items = []
    for candidate, count in support_counts.items():
        support = count / num_transactions
        if support >= min_support:
            frequent_items.append((candidate, support))
    frequent_items.sort()  # reproducible output order
    return frequent_items


# Generate association rules from the frequent itemsets
def generate_rules(frequent_items, min_confidence):
    rules = []
    for itemset, support in frequent_items:
        if len(itemset) > 1:
            for i in range(1, len(itemset)):
                for antecedent in combinations(itemset, i):
                    # Sort the consequent so its tuple order is deterministic
                    consequent = tuple(sorted(set(itemset) - set(antecedent)))
                    antecedent_support = get_support(frequent_items, antecedent)
                    # Skip if the antecedent is not among the frequent
                    # itemsets (avoids division by zero)
                    if antecedent_support == 0:
                        continue
                    confidence = support / antecedent_support
                    if confidence >= min_confidence:
                        rules.append((antecedent, consequent, confidence))
    return rules


# Look up the support of an itemset (0 if it is not frequent)
def get_support(frequent_items, itemset):
    for item, support in frequent_items:
        if item == itemset:
            return support
    return 0


# Sample dataset
data = [
    ['A', 'B', 'C'],
    ['A', 'B'],
    ['A', 'C'],
    ['B', 'C'],
    ['A', 'B', 'C', 'D'],
    ['B', 'D']
]

# Set the minimum support and minimum confidence
min_support = 0.3
min_confidence = 0.6

# Mine frequent itemsets level by level, then derive the rules
frequent_items = []
k = 1
while True:
    candidates = generate_candidates(data, k)
    frequent_items_k = calculate_support(data, candidates, min_support)
    if not frequent_items_k:
        break
    frequent_items.extend(frequent_items_k)
    k += 1
rules = generate_rules(frequent_items, min_confidence)

# Print the results
print("Frequent itemsets:")
for itemset, support in frequent_items:
    print(itemset, "Support:", support)
print("\nAssociation rules:")
for antecedent, consequent, confidence in rules:
    print(antecedent, "=>", consequent, "Confidence:", confidence)
```
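This code implements a simplified, level-wise version of Apriori: given a dataset, a minimum support, and a minimum confidence, it finds the frequent itemsets and the association rules. Note that `generate_candidates` enumerates every k-item combination found in each transaction instead of joining the frequent (k-1)-itemsets, so it does not apply Apriori's pruning step and can become slow on long transactions. You can adapt it to your own dataset and requirements.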
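
For comparison, here is a minimal sketch of the join-and-prune candidate generation that classic Apriori uses. The function name `apriori_gen` is a hypothetical helper, not part of the code above, and the sketch assumes every itemset is a sorted tuple. The `frequent_2` list is the set of frequent 2-itemsets of the sample dataset at `min_support = 0.3`, worked out by hand.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Build candidate k-itemsets from frequent (k-1)-itemsets (sorted tuples).

    Hypothetical helper for illustration only.
    """
    prev = set(prev_frequent)
    candidates = set()
    for a in prev:
        for b in prev:
            # Join step: merge two itemsets that share their first k-2 items
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidate = a + (b[-1],)
                # Prune step: every (k-1)-subset must itself be frequent
                if all(sub in prev for sub in combinations(candidate, k - 1)):
                    candidates.add(candidate)
    return candidates


# Frequent 2-itemsets of the sample dataset at min_support = 0.3
frequent_2 = [('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'D')]
print(sorted(apriori_gen(frequent_2, 3)))  # [('A', 'B', 'C')]
```

Here `('B', 'C', 'D')` is produced by the join step but pruned, because its subset `('C', 'D')` is not frequent. One possible way to use this would be to call it in place of `generate_candidates` for k >= 2, passing the frequent itemsets found at the previous level, so that support is only counted for candidates that can still be frequent.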