Implementing book recommendation based on association rule mining in Python
Time: 2024-05-12 14:13:38 Views: 83
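Association rule mining discovers relationships between sets of items, which can then be used to recommend related books. Below is a simple book recommender built on association rule mining:
1. Data preprocessing: turn each user's list of purchased books into one itemset (a transaction) containing one or more books.
2. Frequent itemset mining: find every itemset whose support meets the minimum support. Apriori is the classic algorithm for this step; the code below uses a simpler exhaustive search over combinations.
3. Association rule mining: split each frequent itemset into an antecedent and a consequent, and keep the rules whose confidence meets the minimum confidence.
4. Recommendation: for each user, find the rules whose antecedent is contained in the books already purchased, rank the books in their consequents by confidence, and recommend the highest-ranked ones.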
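The quantities used in steps 1 to 3 can be sketched in a few lines. This is only an illustration under stated assumptions: the purchase log, the user IDs, and the helper names `build_transactions`, `support`, and `confidence` are hypothetical and do not come from the original answer.

```python
from collections import defaultdict


def build_transactions(purchases):
    """Group (user, book) purchase records into one set of books per user."""
    baskets = defaultdict(set)
    for user, book in purchases:
        baskets[user].add(book)
    return dict(baskets)


def support(itemset, transactions):
    """Fraction of transactions that contain every book in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical purchase log: one (user, book) pair per purchase.
purchases = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "B"), ("u3", "C"),
    ("u4", "A"), ("u4", "C"),
]
baskets = build_transactions(purchases)
transactions = list(baskets.values())

# {A, B} appears in 2 of the 4 baskets.
print(support({"A", "B"}, transactions))
# 2 of the 3 baskets that contain A also contain B.
print(confidence({"A"}, {"B"}, transactions))
```

Support measures how common an itemset is overall, while confidence measures how often the consequent appears given the antecedent, which is why the two thresholds are applied at different steps.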
Here is the Python implementation:
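```python
from collections import defaultdict
from itertools import combinations


class BookRecommendation:
    def __init__(self, data, min_support, min_confidence):
        # Store each transaction as a set so subset tests are cheap.
        self.data = [set(transaction) for transaction in data]
        self.min_support = min_support
        self.min_confidence = min_confidence
        # Sort the catalogue so itemsets are enumerated in a stable order.
        self.books = sorted({book for transaction in self.data for book in transaction})
        self.itemsets = {}  # frozenset of books -> support
        self.rules = defaultdict(list)  # antecedent -> [(consequent, confidence)]

    def find_frequent_itemsets(self):
        # Exhaustive search over combinations, smallest itemsets first.
        # A superset is never more frequent than its subsets, so the search
        # can stop at the first size that yields no frequent itemset.
        for size in range(1, len(self.books) + 1):
            found = False
            for combo in combinations(self.books, size):
                itemset = frozenset(combo)
                count = sum(1 for transaction in self.data if itemset <= transaction)
                support = count / len(self.data)
                if support >= self.min_support:
                    self.itemsets[itemset] = support
                    found = True
            if not found:
                break

    def find_association_rules(self):
        for itemset, support in self.itemsets.items():
            for size in range(1, len(itemset)):
                for combo in combinations(sorted(itemset), size):
                    antecedent = frozenset(combo)
                    consequent = itemset - antecedent
                    # Every subset of a frequent itemset is itself frequent,
                    # so the antecedent is always present in self.itemsets.
                    confidence = support / self.itemsets[antecedent]
                    if confidence >= self.min_confidence:
                        self.rules[antecedent].append((consequent, confidence))

    def recommend_books(self, purchased_books):
        purchased = set(purchased_books)
        recommendations = defaultdict(float)
        for antecedent, consequents in self.rules.items():
            if antecedent <= purchased:
                for consequent, confidence in consequents:
                    # Skip books the user already owns.
                    for book in consequent - purchased:
                        recommendations[book] += confidence
        # Highest accumulated confidence first.
        return sorted(recommendations.items(), key=lambda x: x[1], reverse=True)
```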
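The `find_frequent_itemsets` method above tests combinations of books exhaustively, which grows exponentially with the size of the catalogue. Apriori instead builds candidates level by level and only counts a candidate if all of its smaller subsets are already frequent. The following is a minimal sketch of that idea, not part of the original class; the function name `apriori_frequent_itemsets` is hypothetical, and the data is the same ten transactions used in the usage example.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} using level-wise candidate generation."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single books.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        previous = list(current)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a, b in combinations(previous, 2) if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep the candidates that meet the support threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


data = [
    ['A', 'B', 'C', 'D'], ['A', 'B', 'C'], ['A', 'B'], ['A', 'C', 'D'], ['A', 'D'],
    ['B', 'C', 'D'], ['B', 'C'], ['B', 'D'], ['C', 'D'], ['C'],
]
frequent = apriori_frequent_itemsets(data, min_support=0.3)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On this small data set both approaches find the same itemsets; the pruning only pays off when the catalogue is large enough that most combinations can be ruled out without scanning the transactions.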
Usage example:
```python
data = [
['A', 'B', 'C', 'D'],
['A', 'B', 'C'],
['A', 'B'],
['A', 'C', 'D'],
['A', 'D'],
['B', 'C', 'D'],
['B', 'C'],
['B', 'D'],
['C', 'D'],
['C']
]
recommendation = BookRecommendation(data, min_support=0.3, min_confidence=0.8)
recommendation.find_frequent_itemsets()
recommendation.find_association_rules()
print(recommendation.recommend_books(purchased_books={'A', 'B'}))
```
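Output:
```
[]
```
The example uses four books, A through D, with each user's purchases given as a list. Frequent itemsets are found with a minimum support of 0.3 (all four single books and all six pairs qualify), and rules must reach a minimum confidence of 0.8. On this data no rule reaches that threshold: the strongest rules, B → C and D → C, have a confidence of 4/6 ≈ 0.67, so a user who bought A and B gets an empty recommendation list. With a lower threshold such as `min_confidence=0.6`, the rules A → C, A → D, and B → C apply to this user, and among the books not yet purchased C, backed by two rules, ranks ahead of D. A book's score is the sum of the confidences of the rules that recommend it, so it can exceed 1.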
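When choosing `min_confidence`, it can help to tabulate rule confidences directly from the transactions. The sketch below does this for single-book rules only, using the ten example transactions; `pair_rule_confidences` is a hypothetical helper and is not part of the class above.

```python
from itertools import permutations


def pair_rule_confidences(transactions):
    """Confidence of every single-book rule X -> Y, computed from raw counts."""
    transactions = [set(t) for t in transactions]
    books = sorted({book for t in transactions for book in t})
    confidences = {}
    for x, y in permutations(books, 2):
        with_x = sum(1 for t in transactions if x in t)
        with_both = sum(1 for t in transactions if x in t and y in t)
        if with_x:
            confidences[(x, y)] = with_both / with_x
    return confidences


data = [
    ['A', 'B', 'C', 'D'], ['A', 'B', 'C'], ['A', 'B'], ['A', 'C', 'D'], ['A', 'D'],
    ['B', 'C', 'D'], ['B', 'C'], ['B', 'D'], ['C', 'D'], ['C'],
]
conf = pair_rule_confidences(data)
# List the rules from strongest to weakest.
for (x, y), c in sorted(conf.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{x} -> {y}: {c:.2f}")
```

Sorting the rules this way shows at a glance which thresholds leave any rules to recommend from.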