```python
from apriori import *

# Load the discretized dataset for association-rule extraction
dataset = load_data('data/apriori.txt')

# generate_big_rules extracts association rules from the frequent itemsets
def generate_big_rules(L, support_data, min_conf):
    big_rule_list = []
    sub_set_list = []
    for i in range(0, len(L)):
        for freq_set in L[i]:  # freq_set: e.g. ('B4') or ('B4', 'C4', 'H4')
            for sub_set in sub_set_list:
                #**************** BEGIN *****************
                if sub_set.issubset(freq_set):
                    # confidence(sub_set -> freq_set - sub_set)
                    conf = support_data[freq_set] / support_data[sub_set]
                    # rule: antecedent, consequent, support, confidence
                    big_rule = (sub_set, freq_set - sub_set,
                                support_data[freq_set], conf)
                    if conf >= min_conf and big_rule not in big_rule_list:
                        big_rule_list.append(big_rule)
                #**************** END *******************
            sub_set_list.append(freq_set)
    return big_rule_list

def task():
    L, support_data = generate_L(dataset, k=4, min_support=0.06)
    # Mine association rules from the frequent itemsets with confidence 0.75
    big_rules_list = generate_big_rules(L, support_data, min_conf=0.75)
    return big_rules_list
```
Posted: 2023-07-19 20:55:35 · Views: 124
This code extracts association rules. The steps are as follows:
1. Use the load_data function from apriori.py to read the discretized dataset from a file.
2. Define generate_big_rules to extract the association rules. Its parameters are the frequent itemsets L, the support data support_data, and the minimum confidence min_conf.
3. Inside the function, two nested loops traverse the frequent itemsets in L. For each frequent itemset freq_set, every previously seen itemset sub_set in sub_set_list is tested: if sub_set is a subset of freq_set, the confidence of the rule sub_set -> (freq_set - sub_set) is computed as support(freq_set) / support(sub_set), and the rule is saved when it meets min_conf.
4. Finally, the function returns the rule list big_rule_list.
5. The task function calls generate_L to obtain the frequent itemsets L and the support data support_data, then calls generate_big_rules to obtain the rule list big_rules_list and returns it.
Note that generate_L is not shown here, so it must be defined elsewhere (e.g. in apriori.py).
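The confidence computation inside the BEGIN/END block boils down to one formula: confidence(A -> B) = support(A ∪ B) / support(A). A minimal sketch with hypothetical support values (not taken from the actual data/apriori.txt dataset):

```python
# A minimal sketch of the confidence computation in the BEGIN/END block.
# The support values below are hypothetical, for illustration only --
# they do not come from data/apriori.txt.
support_data = {
    frozenset(['B4']): 0.20,
    frozenset(['B4', 'C4']): 0.12,
}

freq_set = frozenset(['B4', 'C4'])
sub_set = frozenset(['B4'])

# confidence(sub_set -> freq_set - sub_set) = support(freq_set) / support(sub_set)
conf = support_data[freq_set] / support_data[sub_set]

# A rule is stored as (antecedent, consequent, support, confidence)
big_rule = (sub_set, freq_set - sub_set, support_data[freq_set], conf)
```

Here the rule B4 -> C4 gets confidence 0.12 / 0.20 = 0.6, so it would be rejected at min_conf = 0.75.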
Related question
```python
def create_C1(dataset):
    C1 = []
    for transaction in dataset:
        for item in transaction:
            if not [item] in C1:
                C1.append([item])
    C1.sort()
    return list(map(frozenset, C1))

# print(len(create_C1(dataset)))

def scan_D(D, Ck, min_support):
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                if not can in ssCnt:
                    ssCnt[can] = 1
                else:
                    ssCnt[can] += 1
    num_items = float(len(D))
    retList = []
    support_data = {}
    for key in ssCnt:
        support = ssCnt[key] / num_items
        if support >= min_support:
            retList.insert(0, key)
        support_data[key] = support
    return retList, support_data

def apriori_gen(Lk, k):
    retList = []
    len_Lk = len(Lk)
    for i in range(len_Lk):
        for j in range(i + 1, len_Lk):
            L1 = list(Lk[i])[:k - 2]
            L2 = list(Lk[j])[:k - 2]
            L1.sort()
            L2.sort()
            if L1 == L2:
                retList.append(Lk[i] | Lk[j])
    return retList

def apriori(dataset, min_support):
    C1 = create_C1(dataset)
    D = list(map(set, dataset))
    L1, support_data = scan_D(D, C1, min_support)
    L = [L1]
    k = 2
    while len(L[k - 2]) > 0:
        Ck = apriori_gen(L[k - 2], k)
        Lk, support_k = scan_D(D, Ck, min_support)
        support_data.update(support_k)
        L.append(Lk)
        k += 1
    return L, support_data
```
If I want to add confidence to this code, how do I do it?
Adding confidence to this code amounts to implementing association-rule generation on top of the Apriori algorithm:
1. From each frequent itemset in L, generate candidate rules: every way of splitting the itemset into an antecedent and a consequent.
2. For each candidate rule, compute its confidence and keep only the rules that meet the minimum confidence.
Here is the modified code:
```python
def generate_rules(L, support_data, min_conf):
    rules_list = []
    for i in range(1, len(L)):  # L[0] holds 1-itemsets, which yield no rules
        for freq_set in L[i]:
            H1 = [frozenset([item]) for item in freq_set]
            if i > 1:
                rules_from_conseq(freq_set, H1, support_data, rules_list, min_conf)
            else:
                calc_conf(freq_set, H1, support_data, rules_list, min_conf)
    return rules_list

def calc_conf(freq_set, H, support_data, brl, min_conf):
    """Evaluate candidate consequents in H; keep those meeting min_conf."""
    pruned_h = []
    for conseq in H:
        # confidence(freq_set - conseq -> conseq)
        conf = support_data[freq_set] / support_data[freq_set - conseq]
        if conf >= min_conf:
            print(freq_set - conseq, '-->', conseq, 'conf:', conf)
            brl.append((freq_set - conseq, conseq, conf))
            pruned_h.append(conseq)
    return pruned_h

def rules_from_conseq(freq_set, H, support_data, brl, min_conf):
    """Recursively grow candidate consequents, evaluating each size first."""
    m = len(H[0])
    # Evaluate the current consequents before merging them into larger ones,
    # so 1-item consequents of itemsets of size >= 3 are not skipped.
    H = calc_conf(freq_set, H, support_data, brl, min_conf)
    if len(freq_set) > (m + 1) and len(H) > 1:
        Hmp1 = apriori_gen(H, m + 1)
        rules_from_conseq(freq_set, Hmp1, support_data, brl, min_conf)

def apriori(dataset, min_support, min_conf):
    C1 = create_C1(dataset)
    D = list(map(set, dataset))
    L1, support_data = scan_D(D, C1, min_support)
    L = [L1]
    k = 2
    while len(L[k - 2]) > 0:
        Ck = apriori_gen(L[k - 2], k)
        Lk, support_k = scan_D(D, Ck, min_support)
        support_data.update(support_k)
        L.append(Lk)
        k += 1
    rules_list = generate_rules(L, support_data, min_conf)
    return L, support_data, rules_list
```
Here, generate_rules drives rule generation, calc_conf computes each rule's confidence and prunes rules below the threshold, rules_from_conseq recursively grows the candidate consequents, and apriori runs the Apriori algorithm followed by rule generation, returning the frequent itemsets, the support data, and the rule list. When calling apriori, pass both the minimum support and the minimum confidence thresholds.
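As a sanity check for the pipeline above, the same rules can be found by brute force, enumerating every itemset and every antecedent/consequent split directly. This is an independent sketch (not the code from the answer) and is only feasible for tiny datasets; the toy dataset below is hypothetical:

```python
from itertools import combinations

def brute_force_rules(dataset, min_support, min_conf):
    """Enumerate all frequent itemsets and rules by exhaustive counting."""
    transactions = [frozenset(t) for t in dataset]
    n = len(transactions)
    items = sorted(set().union(*transactions))

    # support of every itemset meeting min_support
    support = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = frozenset(cand)
            count = sum(1 for t in transactions if s <= t)
            if count / n >= min_support:
                support[s] = count / n

    # every split (antecedent -> consequent) of every frequent itemset
    rules = []
    for s, sup in support.items():
        if len(s) < 2:
            continue
        for k in range(1, len(s)):
            for ante in combinations(sorted(s), k):
                a = frozenset(ante)
                conf = sup / support[a]
                if conf >= min_conf:
                    rules.append((a, s - a, conf))
    return support, rules

# Hypothetical toy dataset, for illustration only
dataset = [['a', 'b', 'd'], ['b', 'c'], ['a', 'b', 'd'], ['a', 'c']]
support, rules = brute_force_rules(dataset, min_support=0.5, min_conf=0.8)
```

The exhaustive search is exponential in the number of items, which is exactly the cost Apriori's candidate pruning avoids, but on small inputs both must agree.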
Given the following transaction records:

Transaction ID | Items
#1 | apple, banana, coca-cola, doughnut
#2 | banana, coca-cola
#3 | banana, doughnut
#4 | apple, coca-cola
#5 | apple, banana, doughnut
#6 | apple, banana, coca-cola

1. Build the FP-tree using a minimum support min_sup = 2. Show how the tree evolves for each transaction.
2. Use the FP-Growth algorithm to discover frequent itemsets from the FP-tree.
3. With the previous transaction records, use the Apriori algorithm on this dataset and verify that it generates the same set of frequent itemsets with min_sup = 2.
4. Suppose that {apple, banana, doughnut} is a frequent itemset; derive all its association rules with min_confidence = 70%.
Building the FP-tree (min_sup = 2):
First, scan the data to count item frequencies: banana (5), apple (4), coca-cola (4), doughnut (3). All four items meet min_sup = 2, so every transaction is reordered by descending frequency (ties broken alphabetically, so apple before coca-cola): banana, apple, coca-cola, doughnut, abbreviated b, a, c, d below. Each node is shown as item:count.
Transaction #1 (b, a, c, d):
```
root
└─ b:1
   └─ a:1
      └─ c:1
         └─ d:1
```
Transaction #2 (b, c) shares the prefix b, then branches:
```
root
└─ b:2
   ├─ a:1
   │  └─ c:1
   │     └─ d:1
   └─ c:1
```
Transaction #3 (b, d):
```
root
└─ b:3
   ├─ a:1
   │  └─ c:1
   │     └─ d:1
   ├─ c:1
   └─ d:1
```
Transaction #4 (a, c) starts a new branch from the root:
```
root
├─ b:3
│  ├─ a:1
│  │  └─ c:1
│  │     └─ d:1
│  ├─ c:1
│  └─ d:1
└─ a:1
   └─ c:1
```
Transaction #5 (b, a, d):
```
root
├─ b:4
│  ├─ a:2
│  │  ├─ c:1
│  │  │  └─ d:1
│  │  └─ d:1
│  ├─ c:1
│  └─ d:1
└─ a:1
   └─ c:1
```
Transaction #6 (b, a, c) gives the final tree:
```
root
├─ b:5
│  ├─ a:3
│  │  ├─ c:2
│  │  │  └─ d:1
│  │  └─ d:1
│  ├─ c:1
│  └─ d:1
└─ a:1
   └─ c:1
```
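The construction can be reproduced with a compact FP-tree builder. This is a sketch (`Node` and `build_fp_tree` are illustrative names; only the tree is built, not the full FP-Growth miner), assuming items are ordered by descending frequency with alphabetical tie-breaking:

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, its count, and its children by item."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # First scan: count items and keep those meeting min_sup
    freq = Counter(i for t in transactions for i in t)
    # Sort key: descending count, then alphabetical for ties
    order = {i: (-c, i) for i, c in freq.items() if c >= min_sup}
    root = Node(None, None)
    # Second scan: insert each reordered transaction as a path
    for t in transactions:
        path = sorted((i for i in t if i in order), key=order.get)
        node = root
        for item in path:
            node = node.children.setdefault(item, Node(item, node))
            node.count += 1
    return root

transactions = [
    {'apple', 'banana', 'coca-cola', 'doughnut'},
    {'banana', 'coca-cola'},
    {'banana', 'doughnut'},
    {'apple', 'coca-cola'},
    {'apple', 'banana', 'doughnut'},
    {'apple', 'banana', 'coca-cola'},
]
root = build_fp_tree(transactions, min_sup=2)
```

Walking `root.children` reproduces the node counts shown in the final tree, e.g. banana:5 at the top with apple:3 beneath it.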
Using the FP-Growth algorithm to discover frequent itemsets, mining suffixes from the least frequent item upward:
Suffix d (count 3): conditional pattern bases {b, a, c}:1, {b, a}:1, {b}:1. In the conditional tree, b:3 and a:2 are frequent (c:1 is pruned). Frequent itemsets:
- {d} (3)
- {b, d} (3)
- {a, d} (2)
- {a, b, d} (2)
Suffix c (count 4): conditional pattern bases {b, a}:2, {b}:1, {a}:1. Conditional counts: b:3, a:3, with {b, a} occurring together twice. Frequent itemsets:
- {c} (4)
- {b, c} (3)
- {a, c} (3)
- {a, b, c} (2)
Suffix a (count 4): conditional pattern base {b}:3. Frequent itemsets:
- {a} (4)
- {a, b} (3)
Suffix b (count 5): no conditional pattern base. Frequent itemset:
- {b} (5)
All frequent itemsets with minimum support 2 (b = banana, a = apple, c = coca-cola, d = doughnut):
- {banana} (5)
- {apple} (4)
- {coca-cola} (4)
- {doughnut} (3)
- {apple, banana} (3)
- {banana, coca-cola} (3)
- {apple, coca-cola} (3)
- {banana, doughnut} (3)
- {apple, doughnut} (2)
- {apple, banana, coca-cola} (2)
- {apple, banana, doughnut} (2)
Using the Apriori algorithm to verify the frequent itemsets with minimum support of 2:
Starting with 1-itemsets (all frequent):
- {apple} (4)
- {banana} (5)
- {coca-cola} (4)
- {doughnut} (3)
Next, the 2-itemsets:
- {apple, banana} (3)
- {apple, coca-cola} (3)
- {apple, doughnut} (2)
- {banana, coca-cola} (3)
- {banana, doughnut} (3)
- {coca-cola, doughnut} (1) — pruned, below min_sup
Then the 3-itemsets, generated only from frequent 2-itemsets:
- {apple, banana, coca-cola} (2)
- {apple, banana, doughnut} (2)
- {apple, coca-cola, doughnut} and {banana, coca-cola, doughnut} are never generated, since their subset {coca-cola, doughnut} is infrequent.
No 4-itemset candidate survives, since no candidate has all of its 3-subsets frequent.
The Apriori algorithm therefore generates exactly the same frequent itemsets as FP-Growth with minimum support of 2: four 1-itemsets, five 2-itemsets, and two 3-itemsets.
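These counts can be double-checked by brute force, simply recounting every candidate itemset over the six transactions. This is an independent sketch, not the Apriori implementation itself:

```python
from itertools import combinations

# Recount every itemset over the six transactions (min_sup = 2, absolute count)
transactions = [
    {'apple', 'banana', 'coca-cola', 'doughnut'},
    {'banana', 'coca-cola'},
    {'banana', 'doughnut'},
    {'apple', 'coca-cola'},
    {'apple', 'banana', 'doughnut'},
    {'apple', 'banana', 'coca-cola'},
]
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        count = sum(1 for t in transactions if set(cand) <= t)
        if count >= 2:
            frequent[frozenset(cand)] = count
```

The resulting dictionary contains eleven itemsets, matching the lists above, and notably excludes {coca-cola, doughnut}, which occurs only once.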
Deriving all association rules with 70% minimum confidence for the frequent itemset {apple, banana, doughnut}, which has support count 2:
First, list all non-empty proper subsets of {apple, banana, doughnut} as candidate antecedents:
- {apple, banana}
- {apple, doughnut}
- {banana, doughnut}
- {apple}
- {banana}
- {doughnut}
Next, calculate each rule's confidence as support({apple, banana, doughnut}) / support(antecedent):
- {apple, banana} -> {doughnut} (2/3 ≈ 67%)
- {apple, doughnut} -> {banana} (2/2 = 100%)
- {banana, doughnut} -> {apple} (2/3 ≈ 67%)
- {apple} -> {banana, doughnut} (2/4 = 50%)
- {banana} -> {apple, doughnut} (2/5 = 40%)
- {doughnut} -> {apple, banana} (2/3 ≈ 67%)
The only association rule meeting the 70% minimum confidence for the frequent itemset {apple, banana, doughnut} is:
- {apple, doughnut} -> {banana}
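The confidences can likewise be recomputed directly from the six transactions. This sketch enumerates every antecedent of the itemset and checks it against the 70% threshold:

```python
from itertools import combinations

transactions = [
    {'apple', 'banana', 'coca-cola', 'doughnut'},
    {'banana', 'coca-cola'},
    {'banana', 'doughnut'},
    {'apple', 'coca-cola'},
    {'apple', 'banana', 'doughnut'},
    {'apple', 'banana', 'coca-cola'},
]

def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

target = {'apple', 'banana', 'doughnut'}
rules = []
for k in range(1, len(target)):
    for ante in combinations(sorted(target), k):
        # confidence(ante -> target - ante) = count(target) / count(ante)
        conf = count(target) / count(ante)
        rules.append((set(ante), target - set(ante), conf))

passing = [r for r in rules if r[2] >= 0.7]
```

Of the six candidate rules, only {apple, doughnut} -> {banana} survives the 70% cutoff.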