A Practical Code Example for Association Rules
Date: 2024-06-11 11:02:07
Association rule mining is a widely used data-mining technique for discovering relationships between itemsets in large datasets. Below is a simple worked example.
Suppose we have a supermarket transaction dataset in which each record lists the items bought by one customer. We can use the Apriori algorithm to mine the frequent itemsets and their association rules, revealing which products tend to be purchased together.
Here is the Python code:
```python
from itertools import combinations
from collections import defaultdict

def read_data(filename):
    # One transaction per line, items separated by commas.
    # Materialize as a list so the transactions can be scanned repeatedly.
    with open(filename, 'r') as f:
        return [frozenset(line.strip().split(',')) for line in f if line.strip()]

def generate_candidates(itemsets, k):
    # Join pairs of frequent (k-1)-itemsets whose union has exactly k items.
    candidates = set()
    for itemset1 in itemsets:
        for itemset2 in itemsets:
            union = itemset1 | itemset2
            if len(union) == k:
                candidates.add(union)
    return candidates

def filter_candidates(transactions, candidates, min_support):
    # Keep only the candidates whose support reaches the threshold.
    counts = defaultdict(int)
    for transaction in transactions:
        for candidate in candidates:
            if candidate.issubset(transaction):
                counts[candidate] += 1
    num_transactions = len(transactions)
    return [itemset for itemset, count in counts.items()
            if count / num_transactions >= min_support]

def itemsets_from_transactions(transactions, min_support):
    # Level 1: frequent single items.
    item_counts = defaultdict(int)
    for transaction in transactions:
        for item in transaction:
            item_counts[item] += 1
    num_transactions = len(transactions)
    current = [frozenset([item]) for item, count in item_counts.items()
               if count / num_transactions >= min_support]
    # Grow itemsets level by level, keeping every frequent itemset found
    # (not just the largest level), so rule generation can look up any subset.
    freq_itemsets = list(current)
    k = 2
    while current:
        candidates = generate_candidates(current, k)
        current = filter_candidates(transactions, candidates, min_support)
        freq_itemsets.extend(current)
        k += 1
    return freq_itemsets

def generate_rules(freq_itemsets, support_count, min_confidence):
    rules = []
    for itemset in freq_itemsets:
        if len(itemset) > 1:
            subsets = [frozenset(x) for x in combinations(itemset, len(itemset) - 1)]
            for antecedent in subsets:
                consequent = itemset - antecedent
                # Antecedents of frequent itemsets are frequent too (downward
                # closure), so their counts are always present.
                confidence = support_count[itemset] / support_count[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules

if __name__ == '__main__':
    transactions = read_data('transactions.txt')
    min_support = 0.5
    min_confidence = 0.8
    freq_itemsets = itemsets_from_transactions(transactions, min_support)
    # Count the support of every frequent itemset once, up front.
    support_count = defaultdict(int)
    for transaction in transactions:
        for itemset in freq_itemsets:
            if itemset.issubset(transaction):
                support_count[itemset] += 1
    rules = generate_rules(freq_itemsets, support_count, min_confidence)
    # Sort by confidence, highest first.
    for antecedent, consequent, confidence in sorted(rules, key=lambda r: r[2], reverse=True):
        support = support_count[antecedent | consequent] / len(transactions)
        print('{} => {} (Confidence: {:.2f}, Support: {:.2f})'.format(
            set(antecedent), set(consequent), confidence, support))
```
The code above uses the Apriori algorithm to mine frequent itemsets and their association rules, printing every rule that meets the minimum confidence threshold together with its confidence and support. The input file transactions.txt is expected to contain one transaction per line, with items separated by commas.
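As a quick sanity check on the two measures the code relies on, support and confidence can be computed by hand on a tiny transaction set. This is a minimal sketch; the item names and transactions are made up for illustration:

```python
# Five hypothetical transactions (item names invented for illustration).
transactions = [
    frozenset({'milk', 'bread'}),
    frozenset({'milk', 'bread', 'butter'}),
    frozenset({'milk', 'butter'}),
    frozenset({'bread', 'butter'}),
    frozenset({'milk', 'bread'}),
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # Estimated P(consequent | antecedent): support of the union
    # divided by the support of the antecedent alone.
    return support(antecedent | consequent) / support(antecedent)

# {milk, bread} appears in 3 of 5 transactions; milk appears in 4 of 5.
print(support(frozenset({'milk', 'bread'})))                   # 0.6
print(confidence(frozenset({'milk'}), frozenset({'bread'})))   # 0.75
```

With min_support = 0.5 and min_confidence = 0.8 as in the script above, {milk, bread} would be a frequent itemset here, but the rule {milk} => {bread} would be filtered out (0.75 < 0.8).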