Apriori Algorithm Code for Association Rule Mining
Date: 2023-12-26 22:06:03
Below is a simple implementation of the Apriori algorithm for association rule mining:
```python
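import itertools


def apriori(transactions, support_threshold, confidence_threshold):
    """Return association rules as (antecedent, consequent, confidence) tuples.

    Both thresholds are fractions between 0 and 1.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return []

    # Count the support of each individual item
    counts = {}
    for transaction in transactions:
        for item in transaction:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1

    # Drop items that do not meet the support threshold
    current = {s: c for s, c in counts.items() if c / n >= support_threshold}
    # Support counts of every frequent itemset found so far, across all sizes
    support = dict(current)

    # Build frequent itemsets level by level
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a, b in itertools.combinations(current, 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in itertools.combinations(c, k - 1))}
        if not candidates:
            break
        # Count the support of each candidate itemset
        item_counts = {c: 0 for c in candidates}
        for transaction in transactions:
            for c in candidates:
                if c.issubset(transaction):
                    item_counts[c] += 1
        # Drop candidates that do not meet the support threshold
        current = {c: cnt for c, cnt in item_counts.items()
                   if cnt / n >= support_threshold}
        support.update(current)
        k += 1

    # Build association rules with a single-item consequent
    rules = []
    for itemset, count in support.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            consequent = frozenset([item])
            # The antecedent is a subset of a frequent itemset, so it is in `support`
            confidence = count / support[antecedent]
            if confidence >= confidence_threshold:
                rules.append((antecedent, consequent, confidence))
    return rules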
```
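This implementation takes three parameters: transactions (the list of transactions), support_threshold (the minimum support), and confidence_threshold (the minimum confidence). The transaction list should be a list in which each element represents one transaction, given as a set of items. The support and confidence thresholds should be floats between 0 and 1: support is the fraction of transactions that contain an itemset, and the confidence of a rule is the support of the whole itemset divided by the support of its antecedent. The function returns a list of rules, each consisting of an antecedent, a consequent, and a confidence.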
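To make the input format and the two thresholds concrete, here is a minimal sketch that computes support and confidence by hand on a small dataset. The transactions and the helper names `count`, `support`, and `confidence` are assumptions made up for illustration; they are not part of the implementation above.

```python
# Hypothetical toy data: a list of transactions, each one a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset` (between 0 and 1)."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


# {bread, milk} appears in 3 of 5 transactions
print(support({"bread", "milk"}, transactions))       # 0.6
# bread appears in 4 transactions, 3 of which also contain milk
print(confidence({"bread"}, {"milk"}, transactions))  # 0.75
```

With a support threshold of 0.5 and a confidence threshold of 0.7, the rule bread → milk would pass both checks on this data, since its support is 0.6 and its confidence is 0.75.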