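II. Association rule algorithm (15 points)

Problem: A supermarket's sales data contains purchase records for different products. Design a program that uses an association rule algorithm to discover associations between products, that is, the frequent itemsets of products bought together and the association rules derived from them.

Requirements:
- Design a program that uses an association rule algorithm (such as Apriori) to find frequent itemsets and association rules.
- Provide the necessary input interface, for example the purchase records and parameters such as the minimum support and minimum confidence.
- Output the frequent itemsets and the association rules.

Note: sample sales data is shown below (牛奶 = milk, 面包 = bread, 鸡蛋 = eggs, 黄油 = butter):
["牛奶", "面包", "鸡蛋"], ["面包", "黄油"], ["牛奶", "鸡蛋"], ["面包", "黄油"], ["牛奶", "面包", "黄油", "鸡蛋"], ["面包", "鸡蛋"], ["牛奶", "面包", "黄油"], ["牛奶", "鸡蛋"], ["牛奶", "面包", "黄油", "鸡蛋"], ["牛奶", "面包", "黄油"]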
Time: 2023-07-05 17:24:51
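Below is an example implementation in Python that uses the Apriori algorithm to find frequent itemsets and association rules. The purchase records, minimum support, and minimum confidence are passed in as function arguments, and the program outputs the frequent itemsets and the association rules.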
```python
def load_data():
    # Sales data: one list of purchased items per transaction
    # (牛奶 = milk, 面包 = bread, 鸡蛋 = eggs, 黄油 = butter)
    data = [["牛奶", "面包", "鸡蛋"],
            ["面包", "黄油"],
            ["牛奶", "鸡蛋"],
            ["面包", "黄油"],
            ["牛奶", "面包", "黄油", "鸡蛋"],
            ["面包", "鸡蛋"],
            ["牛奶", "面包", "黄油"],
            ["牛奶", "鸡蛋"],
            ["牛奶", "面包", "黄油", "鸡蛋"],
            ["牛奶", "面包", "黄油"]]
    return data


def create_C1(data):
    # Build the candidate 1-itemsets C1: one frozenset per distinct item
    C1 = set()
    for transaction in data:
        for item in transaction:
            C1.add(frozenset([item]))
    return C1


def support_count(data, Ck, min_support):
    # Compute the support of every candidate in Ck and return the itemsets
    # whose support is >= min_support, together with their support values
    item_count = {}
    for transaction in data:
        for item in Ck:
            if item.issubset(transaction):
                if item not in item_count:
                    item_count[item] = 1
                else:
                    item_count[item] += 1
    num_transactions = float(len(data))
    frequent_items = []
    support_data = {}
    for item in item_count:
        support = item_count[item] / num_transactions
        if support >= min_support:
            frequent_items.append(item)
            support_data[item] = support
    return frequent_items, support_data


def apriori_gen(Lk, k):
    # Join step: Lk holds (k-1)-itemsets; two of them are merged into a
    # k-item candidate when their first k-2 items (in sorted order) match
    Ck = []
    len_Lk = len(Lk)
    for i in range(len_Lk):
        for j in range(i+1, len_Lk):
            # Sort before slicing: a frozenset has no defined iteration order,
            # so slicing first could compare arbitrary items
            L1 = sorted(Lk[i])[:k-2]
            L2 = sorted(Lk[j])[:k-2]
            if L1 == L2:
                Ck.append(Lk[i] | Lk[j])
    return Ck


def apriori(data, min_support=0.5, min_confidence=0.7):
    # Find frequent itemsets and association rules with the Apriori algorithm
    C1 = create_C1(data)
    L1, support_data = support_count(data, C1, min_support)
    L = [L1]
    k = 2
    # Grow the itemsets level by level until a level yields nothing frequent
    while len(L[k-2]) > 0:
        Ck = apriori_gen(L[k-2], k)
        Lk, support_k = support_count(data, Ck, min_support)
        support_data.update(support_k)
        L.append(Lk)
        k += 1
    rules = []
    # Rules need at least two items, so start from the 2-itemsets in L[1]
    for i in range(1, len(L)):
        for freq_set in L[i]:
            H1 = [frozenset([item]) for item in freq_set]
            rules_from_conseq(freq_set, H1, support_data, rules, min_confidence)
    return L, rules


def calc_confidence(freq_set, H, support_data, rules, min_confidence):
    # Compute the confidence of (freq_set - conseq) --> conseq for every
    # consequent in H; record and return the consequents that pass
    pruned_H = []
    for conseq in H:
        conf = support_data[freq_set] / support_data[freq_set - conseq]
        if conf >= min_confidence:
            print(freq_set - conseq, "-->", conseq, "conf:", conf)
            rules.append((freq_set - conseq, conseq, conf))
            pruned_H.append(conseq)
    return pruned_H


def rules_from_conseq(freq_set, H, support_data, rules, min_confidence):
    # Generate rules from freq_set using the candidate consequents in H
    if not H:
        return
    m = len(H[0])
    if len(freq_set) > m:
        # Evaluate the current consequents first, so that rules with a
        # single-item consequent are not skipped for itemsets of size >= 3
        kept = calc_confidence(freq_set, H, support_data, rules, min_confidence)
        # Merge the surviving consequents into larger ones and recurse
        if len(kept) > 1 and len(freq_set) > m + 1:
            Hmp1 = apriori_gen(kept, m+1)
            rules_from_conseq(freq_set, Hmp1, support_data, rules, min_confidence)


if __name__ == '__main__':
    data = load_data()
    # The rules are printed by calc_confidence as they are found
    L, rules = apriori(data, min_support=0.2, min_confidence=0.7)
    # Print the frequent itemsets, grouped by size
    for size, Lk in enumerate(L, start=1):
        for itemset in Lk:
            print("frequent %d-itemset:" % size, set(itemset))
```
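The code above uses the Apriori algorithm to find frequent itemsets and association rules. The program first builds the candidate 1-itemsets C1 from the data, then calls support_count to compute their support and keep the itemsets whose support is at least min_support, along with their support values. Next, apriori_gen joins the frequent itemsets of the current level into candidates that are one item larger, and support_count again filters those candidates by min_support. The program repeats this step until no new frequent itemsets are produced. Finally, calc_confidence and rules_from_conseq compute the confidence of each candidate rule, that is, the support of the whole itemset divided by the support of the rule's antecedent, and output the rules whose confidence is at least min_confidence. This program uses a minimum support of 0.2 and a minimum confidence of 0.7; adjust both to suit your data.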
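To cross-check the support and confidence values, here is a minimal brute-force sketch that is not part of the original answer. It assumes the same sample transactions; the helper names `support`, `confidence`, and `brute_force_frequent` are hypothetical and introduced only for illustration. It enumerates every possible itemset, which is exponential in the number of distinct items, so it is only practical for tiny datasets like this one.

```python
from itertools import combinations

# Sample transactions from the question
# (牛奶 = milk, 面包 = bread, 鸡蛋 = eggs, 黄油 = butter)
transactions = [
    ["牛奶", "面包", "鸡蛋"],
    ["面包", "黄油"],
    ["牛奶", "鸡蛋"],
    ["面包", "黄油"],
    ["牛奶", "面包", "黄油", "鸡蛋"],
    ["面包", "鸡蛋"],
    ["牛奶", "面包", "黄油"],
    ["牛奶", "鸡蛋"],
    ["牛奶", "面包", "黄油", "鸡蛋"],
    ["牛奶", "面包", "黄油"],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(itemset <= set(t) for t in transactions)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def brute_force_frequent(transactions, min_support):
    """Map every itemset with support >= min_support to its support."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                frequent[frozenset(combo)] = s
    return frequent


print(support(["面包", "黄油"], transactions))        # 0.6
print(confidence(["黄油"], ["面包"], transactions))   # 1.0
print(len(brute_force_frequent(transactions, 0.2)))  # 15
```

On this data every non-empty combination of the four items appears in at least 2 of the 10 transactions, so with min_support=0.2 all 15 itemsets are frequent. Comparing this count and the individual support values against the output of `apriori` is a quick way to catch a mistake in candidate generation.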