Computing association rules in Python and keeping only rules with lift > 1
Posted: 2024-03-10 14:43:42 Views: 165
Good question. Below is sample Python code that implements the Apriori algorithm, computes association rules, and keeps only the rules with lift > 1:
```python
from collections import defaultdict
from itertools import combinations


def itemsets_from_transactions(transactions):
    """Count the transactions containing each item, keyed by 1-item frozensets."""
    itemsets = defaultdict(int)
    for transaction in transactions:
        for item in set(transaction):  # set() so a repeated item counts once
            itemsets[frozenset([item])] += 1
    return itemsets


def filter_itemsets_by_support(itemsets, min_support):
    """Keep only itemsets whose support count is at least min_support."""
    return {itemset: support for itemset, support in itemsets.items() if support >= min_support}


def generate_candidates(itemsets, length):
    """Build candidates of the given length from the frequent (length - 1)-itemsets."""
    item_list = sorted(set().union(*itemsets))
    candidates = (frozenset(c) for c in combinations(item_list, length))
    # Apriori pruning: every (length - 1)-subset of a candidate must be frequent.
    return {c for c in candidates
            if all(frozenset(s) in itemsets for s in combinations(c, length - 1))}


def frequent_itemsets(transactions, min_support):
    """Yield one {itemset: support count} dict per itemset size, starting at size 1."""
    transactions = [frozenset(t) for t in transactions]
    itemsets = filter_itemsets_by_support(itemsets_from_transactions(transactions), min_support)
    length = 1
    while itemsets:
        yield itemsets
        length += 1
        candidates = generate_candidates(itemsets, length)
        counts = defaultdict(int)
        for transaction in transactions:
            for candidate in candidates:
                if candidate <= transaction:  # candidate is a subset of the transaction
                    counts[candidate] += 1
        itemsets = filter_itemsets_by_support(counts, min_support)


def rules_from_itemset(itemset, body, supports, n_transactions):
    """Return (body, head, confidence, lift) for body => head if lift > 1, else None."""
    head = itemset - body
    if not head or not body:
        return None
    confidence = supports[itemset] / supports[body]
    lift = confidence / (supports[head] / n_transactions)
    if lift > 1:
        return (body, head, confidence, lift)
    return None


def generate_rules(levels, transactions, min_confidence):
    """Yield the rules with confidence >= min_confidence and lift > 1."""
    # Merge the per-size dicts into a single support lookup table.
    supports = {}
    for level in levels:
        supports.update(level)
    for itemset in supports:
        # Try every non-empty proper subset of the itemset as the rule body.
        for length in range(1, len(itemset)):
            for combination in combinations(sorted(itemset), length):
                rule = rules_from_itemset(itemset, frozenset(combination), supports, len(transactions))
                if rule and rule[2] >= min_confidence:
                    yield rule


# Test code
transactions = [['A', 'B', 'C'], ['B', 'C'], ['A', 'B', 'D'], ['B', 'D'], ['A', 'C'], ['C', 'D']]
min_support = 2  # absolute number of transactions, not a fraction
min_confidence = 0.5
levels = list(frequent_itemsets(transactions, min_support))
rules = list(generate_rules(levels, transactions, min_confidence))
# In this sample no frequent pair has lift above 1, so the loop prints nothing.
for body, head, confidence, lift in rules:
    print(f'{sorted(body)} => {sorted(head)} (confidence: {confidence:.2f}, lift: {lift:.2f})')
```
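In this example we define several helper functions that implement the Apriori algorithm and compute the association rules. The `frequent_itemsets` function takes a list of transactions and a minimum support (here an absolute transaction count) and produces the frequent itemsets. The `generate_rules` function takes the frequent itemsets, the list of transactions, and a minimum confidence, and produces the association rules that meet the minimum confidence and have lift > 1.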
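A small test dataset demonstrates the example. Each rule that meets the conditions is printed with its confidence and lift. Note that for this particular six-transaction sample, the frequent pairs {A,B}, {A,C} and {B,D} have lift exactly 1 and {B,C} has lift 0.75, so the strict lift > 1 filter leaves no rules; use data with stronger co-occurrence to see non-empty output.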
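To make the confidence and lift figures concrete, here is a minimal sketch that computes them for a single rule straight from the transaction list. The helper name `rule_metrics` is hypothetical (it is not part of the code above), and the sketch assumes the rule body occurs in at least one transaction.

```python
def rule_metrics(transactions, body, head):
    """Return (support, confidence, lift) for the rule body => head.

    Hypothetical helper for illustration; assumes body occurs at least once.
    """
    body, head = frozenset(body), frozenset(head)
    n = len(transactions)
    # Count the transactions containing the body, the head, and both together.
    n_body = sum(1 for t in transactions if body <= set(t))
    n_head = sum(1 for t in transactions if head <= set(t))
    n_both = sum(1 for t in transactions if (body | head) <= set(t))
    support = n_both / n
    confidence = n_both / n_body      # estimate of P(head | body)
    lift = confidence / (n_head / n)  # above 1 suggests a positive association
    return support, confidence, lift


transactions = [['A', 'B', 'C'], ['B', 'C'], ['A', 'B', 'D'], ['B', 'D'], ['A', 'C'], ['C', 'D']]
for body, head in [({'A'}, {'B'}), ({'B'}, {'C'})]:
    support, confidence, lift = rule_metrics(transactions, body, head)
    print(f'{sorted(body)} => {sorted(head)}: support={support:.2f}, '
          f'confidence={confidence:.2f}, lift={lift:.2f}')
```

For A => B the lift comes out at 1 (A and B co-occur exactly as often as independence would predict), and for B => C it is 0.75, which is why neither rule passes a lift > 1 filter.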
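Note that this implementation is not complete, and you may need to adapt it to your own data. Apriori can also become very slow on large datasets, because every candidate is checked against every transaction; for large data, consider optimizing the support counting or switching to another algorithm such as FP-Growth.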
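One possible optimization, shown here as a sketch rather than as part of the original code, is a vertical data layout in the style of the Eclat algorithm: store for each item the set of transaction indices that contain it, then obtain the support of an itemset by intersecting those sets instead of rescanning all transactions. The names `build_tidsets` and `support_count` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_tidsets(transactions):
    """Map each item to the set of indices of the transactions containing it."""
    tidsets = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets[item].add(tid)
    return tidsets


def support_count(itemset, tidsets):
    """Support of a non-empty itemset: size of the intersection of its tid-sets."""
    return len(set.intersection(*(tidsets[item] for item in itemset)))


transactions = [['A', 'B', 'C'], ['B', 'C'], ['A', 'B', 'D'], ['B', 'D'], ['A', 'C'], ['C', 'D']]
tidsets = build_tidsets(transactions)
for pair in combinations(sorted(tidsets), 2):
    print(pair, support_count(pair, tidsets))
```

The transactions are scanned only once to build the index; after that, each support lookup costs a few set intersections. Whether this is actually faster depends on the data, so treat it as a starting point to measure, not a guaranteed speedup.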