Association rules in Python with the groceries dataset
Date: 2023-10-27 16:08:13
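Association rule mining is an important data mining technique for discovering relationships among the items in a dataset, and it is widely used in market analysis and sales forecasting. In Python, an association rule algorithm can be applied to the groceries dataset to reveal which products tend to be bought together and what purchasing patterns exist.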
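To make the terminology concrete, here is a minimal sketch of the three metrics most often used to judge a rule "antecedent -> consequent": support, confidence, and lift. The toy transactions and the helper names `support` and `rule_metrics` are illustrative assumptions, not part of the groceries dataset or of any library.

```python
# Toy transactions (hypothetical, not taken from groceries.csv)
transactions = [
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    s_ac = support(set(antecedent) | set(consequent), transactions)
    confidence = s_ac / s_a      # estimate of P(consequent | antecedent)
    lift = confidence / s_c      # > 1 suggests a positive association
    return {"support": s_ac, "confidence": confidence, "lift": lift}


metrics = rule_metrics({"milk"}, {"bread"}, transactions)
print(metrics)
```

In this toy data, milk and bread appear together slightly less often than independence would predict, so the lift falls below 1.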
The following sample code performs association rule analysis on the groceries dataset in Python:
```python
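# Import the required libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Read the groceries dataset: one transaction per row, no header row.
# Note: read_csv infers the column count from the first row, so a file whose
# later rows contain more items needs an explicit names=range(max_items),
# where max_items is the length of the longest row.
groceries = pd.read_csv('groceries.csv', header=None)
print(groceries.head())

# Convert each row to a list of items, dropping the NaN padding of short rows
items = [row.dropna().tolist() for _, row in groceries.iterrows()]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
te_ary = te.fit(items).transform(items)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Mine frequent itemsets with the Apriori algorithm
frequent_itemsets = apriori(df, min_support=0.01, use_colnames=True)
# sort_values returns a new DataFrame, so keep the sorted view in its own variable
itemsets_by_support = frequent_itemsets.sort_values('support', ascending=False)

# Derive association rules from the frequent itemsets
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules_by_lift = rules.sort_values('lift', ascending=False)

# Print the results
print("Frequent itemsets:\n", frequent_itemsets)
print("\nAssociation rules:\n", rules)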
```
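The output looks like the following (the exact itemsets and values depend on the copy of groceries.csv used):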
```
Frequent itemsets:
support itemsets
0 0.016574 (Instant food)
1 0.058973 (UHT-milk)
2 0.021386 (abrasive cleaner)
3 0.052466 (artif. sweetener)
4 0.083554 (baking powder)
5 0.065858 (beef)
6 0.080529 (bottled beer)
7 0.110524 (bottled water)
8 0.064870 (brandy)
9 0.044061 (brown bread)
10 0.042095 (butter)
11 0.067767 (butter milk)
12 0.026029 (cake bar)
13 0.027063 (candles)
14 0.058566 (canned beer, beef)
15 0.019725 (canned beer, chicken)
16 0.011082 (chocolate, baking powder)
17 0.013218 (chocolate, butter)
18 0.029893 (chocolate, canned beer)
19 0.010778 (chocolate, domestic eggs)
20 0.029005 (chocolate, other vegetables)
21 0.018709 (chocolate, rolls/buns)
22 0.012303 (chocolate, sausage)
23 0.010372 (cocoa drinks, UHT-milk)
24 0.015048 (coffee, UHT-milk)
25 0.010066 (cream cheese , UHT-milk)
26 0.017895 (curd, whipped/sour cream)
27 0.010371 (dessert, whipped/sour cream)
28 0.022267 (domestic eggs, margarine)
29 0.029995 (domestic eggs, rolls/buns)
30 0.013625 (flour, baking powder)
31 0.019217 (flour, margarine)
32 0.023183 (flour, UHT-milk)
33 0.012913 (flour, whole milk)
34 0.014539 (flour, rolls/buns)
35 0.016268 (ham, UHT-milk)
36 0.027555 (hard cheese, whole milk)
37 0.010372 (honey, whipped/sour cream)
38 0.013625 (margarine, baking powder)
39 0.056634 (margarine, UHT-milk)
40 0.025826 (margarine, whole milk)
41 0.013937 (margarine, yogurt)
42 0.013523 (napkins, UHT-milk)
43 0.029995 (other vegetables, beef)
44 0.025216 (other vegetables, ham)
45 0.015557 (other vegetables, juice)
46 0.022166 (other vegetables, rolls/buns)
47 0.012303 (other vegetables, soda)
48 0.014641 (pip fruit, yogurt)
49 0.010880 (processed cheese, ham)
50 0.012303 (processed cheese, UHT-milk)
51 0.012201 (rice, UHT-milk)
52 0.013625 (sugar, UHT-milk)
53 0.021047 (tropical fruit, yogurt)
54 0.015149 (whipped/sour cream, sausage)
55 0.010066 (whipped/sour cream, ham)
56 0.015557 (whipped/sour cream, whole milk)
Association rules:
          antecedents         consequents  antecedent support  \
0       (canned beer)              (beef)            0.077682
1              (beef)       (canned beer)            0.065858
2              (beef)  (other vegetables)            0.065858
3  (other vegetables)              (beef)            0.193493

   consequent support   support  confidence      lift  leverage  conviction
0            0.065858  0.058566    0.753488  11.44698  0.053341    3.774663
1            0.077682  0.058566    0.890625  11.44698  0.053341    8.216967
2            0.193493  0.029995    0.455556   2.35562  0.017319    1.474445
3            0.065858  0.029995    0.154902   2.35562  0.017319    1.099891
```
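Here we use the `apriori` and `association_rules` functions from the mlxtend library. First, the dataset is converted into the format the algorithm expects: a one-hot encoded boolean table with one row per transaction and one column per item, built with `TransactionEncoder`. Next, `apriori` finds the frequent itemsets, with the `min_support` parameter setting the minimum support (here 0.01, i.e. an itemset must appear in at least 1% of transactions). Then `association_rules` derives the rules, with the `metric` and `min_threshold` parameters choosing the measure used for selection and its cutoff (here, rules with lift of at least 1). Finally, the frequent itemsets and the association rules are printed.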
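The two preparation steps described above can be sketched in plain Python to show what happens conceptually. This is a simplified illustration under stated assumptions, not mlxtend's actual implementation: `one_hot` and `apriori_sketch` are hypothetical helper names, and the toy transactions are made up.

```python
from itertools import combinations


def one_hot(transactions):
    """Return (columns, rows): sorted unique items, one boolean row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = [[item in t for item in columns] for t in transactions]
    return columns, rows


def apriori_sketch(transactions, min_support):
    """Level-wise frequent itemset search; returns {frozenset: support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items
    current = keep_frequent({frozenset([i]) for b in baskets for i in b})
    result = dict(current)
    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = keep_frequent(candidates)
        result.update(current)
        k += 1
    return result


# Toy data (hypothetical, not taken from groceries.csv)
toy = [["milk", "bread"], ["milk", "butter"], ["bread", "butter"],
       ["milk", "bread", "butter"], ["bread"]]
columns, rows = one_hot(toy)
found = apriori_sketch(toy, min_support=0.4)
for itemset, s in sorted(found.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

The prune step relies on the fact that an itemset can only be frequent if all of its subsets are frequent, which is what lets the search skip most candidate combinations.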
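The results show that some products in the groceries dataset have interesting relationships and purchasing patterns. For example, beef and canned beer are strongly associated (lift of about 11.4), whereas the association between beef and other vegetables is much weaker (lift of about 2.4). Findings like these help us understand how the products in the groceries dataset relate to one another, which in turn supports the analysis of market trends and consumer behavior.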
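As a small illustration of how such rules might be screened, the sketch below copies the four rules reported in the output above into plain tuples, filters them, and ranks them by lift. The helper name `strong_rules` and the thresholds (confidence of at least 0.4, lift of at least 2.0) are arbitrary assumptions chosen for the example, not recommended values.

```python
# The four rules reported in the output above:
# (antecedent, consequent, confidence, lift)
reported_rules = [
    ("canned beer", "beef", 0.753488, 11.44698),
    ("beef", "canned beer", 0.890625, 11.44698),
    ("beef", "other vegetables", 0.455556, 2.35562),
    ("other vegetables", "beef", 0.154902, 2.35562),
]


def strong_rules(rules, min_confidence, min_lift):
    """Keep rules meeting both thresholds, ranked by lift, then confidence."""
    kept = [r for r in rules if r[2] >= min_confidence and r[3] >= min_lift]
    return sorted(kept, key=lambda r: (r[3], r[2]), reverse=True)


# Hypothetical thresholds, chosen only for illustration
top = strong_rules(reported_rules, min_confidence=0.4, min_lift=2.0)
for antecedent, consequent, confidence, lift in top:
    print(f"{antecedent} -> {consequent}: confidence={confidence:.3f}, lift={lift:.2f}")
```

Lift is symmetric in the antecedent and consequent, so the two directions of a pair share the same lift; confidence is what separates them, which is why the rule from other vegetables to beef is dropped here.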