Association Rule Mining: Python Code
Posted: 2023-06-16 19:03:06
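Association rule mining is a widely used data mining technique, and the Apriori algorithm is a common way to find the frequent itemsets it relies on. The following sample code implements Apriori in Python: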
```python
from itertools import combinations


def load_dataset():
    """
    Load the sample dataset.
    :return: list of transactions
    """
    dataset = [['milk', 'bread', 'eggs'],
               ['milk', 'bread', 'butter'],
               ['milk', 'bread'],
               ['milk', 'butter'],
               ['bread', 'eggs'],
               ['bread', 'butter'],
               ['eggs', 'butter'],
               ['milk', 'bread', 'eggs', 'butter'],
               ['milk']]
    return dataset


def create_c1(dataset):
    """
    Build the candidate 1-itemsets C1.
    :param dataset: list of transactions
    :return: candidate 1-itemsets C1
    """
    c1 = []
    for transaction in dataset:
        for item in transaction:
            if [item] not in c1:
                c1.append([item])
    c1.sort()
    # frozenset is hashable, so each itemset can be used as a dictionary key
    return list(map(frozenset, c1))


def scan_dataset(dataset, candidate_set, min_support):
    """
    Scan the dataset and compute the support of each candidate.
    :param dataset: list of transactions
    :param candidate_set: candidate itemsets
    :param min_support: minimum support
    :return: itemsets whose support is >= min_support, and the support of each counted itemset
    """
    item_count = {}
    for transaction in dataset:
        for candidate in candidate_set:
            if candidate.issubset(transaction):
                item_count[candidate] = item_count.get(candidate, 0) + 1
    num_transactions = len(dataset)
    support_data = {}
    frequent_items = []
    for item in item_count:
        support = item_count[item] / num_transactions
        if support >= min_support:
            frequent_items.append(item)
        support_data[item] = support
    return frequent_items, support_data


def apriori_gen(frequent_items, k):
    """
    Generate candidate k-itemsets from the frequent (k-1)-itemsets.
    :param frequent_items: frequent (k-1)-itemsets
    :param k: number of items in each candidate
    :return: candidate k-itemsets
    """
    candidate_set = []
    len_frequent_items = len(frequent_items)
    for i in range(len_frequent_items):
        for j in range(i + 1, len_frequent_items):
            # Sort before slicing: frozenset iteration order is arbitrary,
            # so the first k-2 items are only comparable once sorted
            l1 = sorted(frequent_items[i])[:k - 2]
            l2 = sorted(frequent_items[j])[:k - 2]
            if l1 == l2:
                candidate_set.append(frequent_items[i] | frequent_items[j])
    return candidate_set


def apriori(dataset, min_support):
    """
    Apriori algorithm.
    :param dataset: list of transactions
    :param min_support: minimum support
    :return: all frequent itemsets (grouped by size) and the support of each itemset
    """
    candidate_set = create_c1(dataset)
    frequent_items, support_data = scan_dataset(dataset, candidate_set, min_support)
    frequent_items = [frequent_items]
    k = 2
    # Stop once the previous level produced no frequent itemsets
    while len(frequent_items[k - 2]) > 0:
        candidate_set = apriori_gen(frequent_items[k - 2], k)
        frequent_items_k, support_data_k = scan_dataset(dataset, candidate_set, min_support)
        support_data.update(support_data_k)
        frequent_items.append(frequent_items_k)
        k += 1
    return frequent_items, support_data


def generate_rules(frequent_items, support_data, min_confidence):
    """
    Generate association rules from the frequent itemsets and their supports.
    :param frequent_items: frequent itemsets grouped by size
    :param support_data: support of each itemset
    :param min_confidence: minimum confidence
    :return: list of (antecedent, consequent, confidence) tuples
    """
    rules_list = []
    # frequent_items[0] holds 1-itemsets, which cannot be split into a rule
    for i in range(1, len(frequent_items)):
        for frequent_item in frequent_items[i]:
            # Try every non-empty proper subset as the antecedent
            for size in range(1, len(frequent_item)):
                for subset in map(frozenset, combinations(frequent_item, size)):
                    confidence = support_data[frequent_item] / support_data[subset]
                    if confidence >= min_confidence:
                        rules_list.append((subset, frequent_item - subset, confidence))
    return rules_list


if __name__ == '__main__':
    dataset = load_dataset()
    min_support = 0.5
    frequent_items, support_data = apriori(dataset, min_support)
    min_confidence = 0.7
    rules = generate_rules(frequent_items, support_data, min_confidence)
    for rule in rules:
        print(rule[0], '->', rule[1], 'confidence:', rule[2])
```
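This example runs the Apriori algorithm on a dataset of nine transactions. `create_c1` builds the candidate 1-itemsets C1, and `scan_dataset` scans the dataset, computes the support of each candidate, and keeps the itemsets that reach `min_support`. `apriori_gen` then joins the frequent (k-1)-itemsets into candidate k-itemsets, which `scan_dataset` filters in the same way. This repeats until a pass produces no frequent itemsets. Finally, `generate_rules` derives association rules from the frequent itemsets and their supports, and the script prints each rule with its confidence. Note that with `min_support = 0.5` no pair of items is frequent in this dataset, so no rules are printed; a lower threshold such as 0.3 is needed before any rule appears.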
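To make the two thresholds concrete, here is a minimal brute-force sketch that recomputes support and confidence directly on the same nine transactions. It assumes the usual definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of A -> B is support(A ∪ B) / support(A). The names `TRANSACTIONS`, `count`, and `rule_stats` are illustrative only and are not part of the code above.

```python
from itertools import combinations

# Same nine transactions as load_dataset(), stored as sets (illustrative copy).
TRANSACTIONS = [
    {'milk', 'bread', 'eggs'},
    {'milk', 'bread', 'butter'},
    {'milk', 'bread'},
    {'milk', 'butter'},
    {'bread', 'eggs'},
    {'bread', 'butter'},
    {'eggs', 'butter'},
    {'milk', 'bread', 'eggs', 'butter'},
    {'milk'},
]


def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)


def rule_stats(antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    both = count(set(antecedent) | set(consequent))
    support = both / len(TRANSACTIONS)
    # Dividing raw counts equals support(A ∪ B) / support(A)
    confidence = both / count(antecedent)
    return support, confidence


if __name__ == '__main__':
    items = sorted(set().union(*TRANSACTIONS))
    # Show how many transactions contain each pair of items
    for a, b in combinations(items, 2):
        print(a, b, count({a, b}), 'of', len(TRANSACTIONS))
    print('eggs -> bread:', rule_stats({'eggs'}, {'bread'}))
```

The most common pair, milk and bread, occurs in 4 of the 9 transactions (support about 0.44), while eggs -> bread has support 3/9 and confidence 3/4. Checking a few pairs this way before picking `min_support` and `min_confidence` helps avoid thresholds that filter out every rule.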