What are the applications of Apriori in Python?
Date: 2023-09-15 08:23:51
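Apriori is a classic association rule mining algorithm for finding frequent itemsets and association rules in large datasets. In Python it is available through third-party libraries such as apyori and mlxtend; pyfpgrowth, which is sometimes mentioned in this context, implements the related FP-Growth algorithm rather than Apriori. Apriori can reveal which products tend to be bought together, which is useful for e-commerce recommendation systems. It is also applied in market analysis, medical diagnosis, network security, and other fields.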
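To make the idea more concrete, here is a minimal sketch of the pruning principle that Apriori relies on: every subset of a frequent itemset must itself be frequent, so a candidate with an infrequent subset can be discarded without counting it. The function name `prune_candidates` and the toy itemsets are illustrative assumptions, not part of any library.

```python
from itertools import combinations

def prune_candidates(candidates, frequent_prev):
    """Keep only candidates whose (k-1)-item subsets are all frequent.

    `candidates` holds frozensets of size k, `frequent_prev` holds the
    frequent (k-1)-itemsets. Both names are hypothetical.
    """
    frequent_prev = set(frequent_prev)
    kept = []
    for candidate in candidates:
        k = len(candidate)
        subsets = (frozenset(s) for s in combinations(candidate, k - 1))
        if all(s in frequent_prev for s in subsets):
            kept.append(candidate)
    return kept

# Made-up input: four frequent 2-itemsets and two 3-item candidates
frequent_pairs = [frozenset(p) for p in (("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"))]
candidates = [frozenset("abc"), frozenset("abd")]
survivors = prune_candidates(candidates, frequent_pairs)
print([sorted(s) for s in survivors])  # [['a', 'b', 'c']]
```

The candidate `{a, b, d}` is dropped because its subset `{a, d}` is not among the frequent pairs, so its support never needs to be counted.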
Related questions
A Python Apriori application example
Below is a simple pure-Python Apriori example:
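```python
import itertools

# Dataset: each inner list is one transaction (shopping basket)
data = [['milk', 'bread', 'diapers'],
        ['cola', 'bread', 'diapers', 'beer'],
        ['milk', 'diapers', 'beer', 'eggs'],
        ['bread', 'milk', 'diapers', 'beer'],
        ['bread', 'milk', 'diapers', 'cola']]

# Support count: number of transactions that contain every item of the itemset
def support(itemset):
    count = 0
    for d in data:
        if set(itemset).issubset(set(d)):
            count += 1
    return count

# Confidence of the rule a -> b: support(a | b) / support(a)
def confidence(rule):
    a, b = rule
    return support(a | b) / support(a)

# Generate the candidate 1-itemsets (one per distinct item)
def get_all_candidates(data):
    items = sorted({item for d in data for item in d})
    return [frozenset([item]) for item in items]

# Level-wise search: keep the frequent k-itemsets, then join them into
# (k+1)-item candidates until no candidates remain
def get_frequent_itemsets(candidates, min_support):
    frequent_itemsets = []
    while candidates:
        level = [c for c in candidates if support(c) >= min_support]
        frequent_itemsets.extend(level)
        level_set = set(level)
        size = len(candidates[0]) + 1
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Apriori pruning: drop candidates that have an infrequent k-item subset.
        # Sorting keeps the result order independent of set iteration order.
        candidates = sorted(
            (c for c in joined
             if all(frozenset(s) in level_set
                    for s in itertools.combinations(c, size - 1))),
            key=sorted)
    return frequent_itemsets

# Generate rules a -> b from every frequent itemset with at least two items
def get_association_rules(frequent_itemsets, min_confidence):
    association_rules = []
    for f in frequent_itemsets:
        for i in range(1, len(f)):
            for s in itertools.combinations(sorted(f), i):
                a = frozenset(s)
                b = f - a
                conf = confidence((a, b))
                if conf >= min_confidence:
                    association_rules.append((sorted(a), sorted(b), conf))
    return association_rules

# Test: minimum support count 2, minimum confidence 0.7
candidates = get_all_candidates(data)
frequent_itemsets = get_frequent_itemsets(candidates, 2)
association_rules = get_association_rules(frequent_itemsets, 0.7)
for rule in association_rules:
    print(rule)
```
Output:
```
(['beer'], ['diapers'], 1.0)
(['cola'], ['bread'], 1.0)
(['bread'], ['diapers'], 1.0)
(['diapers'], ['bread'], 0.8)
(['bread'], ['milk'], 0.75)
(['milk'], ['bread'], 0.75)
(['cola'], ['diapers'], 1.0)
(['diapers'], ['milk'], 0.8)
(['milk'], ['diapers'], 1.0)
(['beer', 'bread'], ['diapers'], 1.0)
(['beer', 'milk'], ['diapers'], 1.0)
(['cola'], ['bread', 'diapers'], 1.0)
(['bread', 'cola'], ['diapers'], 1.0)
(['cola', 'diapers'], ['bread'], 1.0)
(['bread'], ['diapers', 'milk'], 0.75)
(['milk'], ['bread', 'diapers'], 0.75)
(['bread', 'diapers'], ['milk'], 0.75)
(['bread', 'milk'], ['diapers'], 1.0)
(['diapers', 'milk'], ['bread'], 0.75)
```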
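A Python Apriori application example
Below is a simple example that uses the Apriori implementation from the apyori library:
Suppose we have the following shopping basket data: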
```
basket1 = ['apple', 'banana', 'orange']
basket2 = ['apple', 'banana', 'pear']
basket3 = ['apple', 'banana']
basket4 = ['apple', 'pear']
basket5 = ['banana', 'orange']
basket6 = ['banana', 'pear']
basket7 = ['apple']
basket8 = ['pear']
basket9 = ['orange']
basket10 = ['orange', 'pear']
```
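We want to find the frequent itemsets and the association rules.
First, we define a function that converts the basket data into a format suitable for the Apriori algorithm: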
```python
def create_itemsets(data):
    # Convert each basket (a list of items) into a set, dropping duplicates
    itemsets = []
    for basket in data:
        itemset = set(basket)
        itemsets.append(itemset)
    return itemsets
```
Next, we can use the Apriori algorithm to find the frequent itemsets:
```python
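from apyori import apriori

# Collect the ten baskets defined above into one list of transactions
data = [basket1, basket2, basket3, basket4, basket5, basket6, basket7, basket8, basket9, basket10]
itemsets = create_itemsets(data)
# min_support=0.3 keeps itemsets that occur in at least 30% of the baskets
results = list(apriori(itemsets, min_support=0.3))
for result in results:
    print(result.items, result.support)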
```
This prints:
```
frozenset({'apple'}) 0.5
frozenset({'banana'}) 0.5
frozenset({'orange'}) 0.4
frozenset({'pear'}) 0.5
frozenset({'apple', 'banana'}) 0.3
```
These are the frequent itemsets with a support of at least 0.3 (the order of the items inside a printed `frozenset` may vary between runs). For example, `frozenset({'apple', 'banana'})` has a support of 0.3 because apples and bananas appear together in 3 of the 10 baskets.
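The support values can be cross-checked without any library by counting itemsets directly. The sketch below is a brute-force count, not Apriori itself, and the helper name `itemset_supports` is a hypothetical choice for illustration; it repeats the ten baskets so that it runs on its own.

```python
from collections import Counter
from itertools import combinations

# Same ten baskets as in the example above
baskets = [
    ['apple', 'banana', 'orange'],
    ['apple', 'banana', 'pear'],
    ['apple', 'banana'],
    ['apple', 'pear'],
    ['banana', 'orange'],
    ['banana', 'pear'],
    ['apple'],
    ['pear'],
    ['orange'],
    ['orange', 'pear'],
]

def itemset_supports(baskets, max_size=2):
    """Return {itemset tuple: support} for all itemsets with up to max_size items."""
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset: n / len(baskets) for itemset, n in counts.items()}

supports = itemset_supports(baskets)
frequent = {k: v for k, v in supports.items() if v >= 0.3}
for itemset, value in sorted(frequent.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, value)
# ('apple',) 0.5
# ('banana',) 0.5
# ('orange',) 0.4
# ('pear',) 0.5
# ('apple', 'banana') 0.3
```

Enumerating every combination is only practical for small baskets; the point of Apriori is to avoid counting candidates that cannot be frequent.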
Finally, we can call the same function with a minimum confidence to find association rules. Among the itemsets with a support of at least 0.3, the highest rule confidence in this data is 0.6, so a threshold of 0.7 would return no rules; the example uses `min_confidence=0.6`:
```python
results = list(apriori(itemsets, min_support=0.3, min_confidence=0.6))
for result in results:
    # Each record can hold several rules, so iterate over all of them
    for stat in result.ordered_statistics:
        if not stat.items_base:
            continue  # skip entries with an empty left-hand side
        lhs = sorted(stat.items_base)
        rhs = sorted(stat.items_add)
        print(lhs, "=>", rhs, "support:", result.support, "confidence:", stat.confidence)
```
This prints:
```
['apple'] => ['banana'] support: 0.3 confidence: 0.6
['banana'] => ['apple'] support: 0.3 confidence: 0.6
```
These are the association rules with a support of at least 0.3 and a confidence of at least 0.6. For example, `['apple'] => ['banana']` means that 60% of the baskets containing apples also contain bananas (3 of 5). The reverse rule has the same confidence because bananas also appear in 5 baskets.
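As a sanity check, the confidence of a rule can be recomputed directly from the baskets. The sketch below uses hypothetical helper names (`support`, `rule_metrics`) and also computes lift, which the text above does not discuss; it is included as an assumption about what a reader may want to inspect next, since apyori reports a `lift` value alongside each confidence.

```python
# Same ten baskets as in the example above
baskets = [
    ['apple', 'banana', 'orange'],
    ['apple', 'banana', 'pear'],
    ['apple', 'banana'],
    ['apple', 'pear'],
    ['banana', 'orange'],
    ['banana', 'pear'],
    ['apple'],
    ['pear'],
    ['orange'],
    ['orange', 'pear'],
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(b) for b in baskets) / len(baskets)

def rule_metrics(lhs, rhs, baskets):
    """Return (support, confidence, lift) for the rule lhs => rhs."""
    both = support(set(lhs) | set(rhs), baskets)
    conf = both / support(lhs, baskets)   # P(rhs | lhs)
    lift = conf / support(rhs, baskets)   # confidence relative to P(rhs)
    return both, conf, lift

s, c, l = rule_metrics(['apple'], ['banana'], baskets)
print(s, c, round(l, 2))  # 0.3 0.6 1.2
```

A lift above 1 suggests that the two items occur together more often than they would if they were independent.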