Apriori algorithm in Python: example code
Posted: 2023-07-16 13:12:07
Below is a simple Python implementation of the Apriori algorithm:
```python
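def load_dataset():
    """Return a small built-in list of transactions."""
    return [['bread', 'milk', 'cheese'],
            ['bread', 'diapers', 'eggs'],
            ['milk', 'diapers', 'beer', 'cola'],
            ['bread', 'milk', 'diapers', 'beer'],
            ['bread', 'cola', 'diapers']]


def create_candidates(freq_sets, k):
    """Join frequent (k-1)-itemsets that share their first k-2 sorted items."""
    candidates = []
    for i in range(len(freq_sets)):
        for j in range(i + 1, len(freq_sets)):
            # Sort before slicing so the prefix comparison is well defined.
            l1 = sorted(freq_sets[i])[:k - 2]
            l2 = sorted(freq_sets[j])[:k - 2]
            if l1 == l2:
                candidates.append(freq_sets[i] | freq_sets[j])
    return candidates


def scan_data(data_set, candidates, min_support):
    """Count each candidate and keep those whose support meets min_support."""
    counts = {}
    for candidate in candidates:
        for transaction in data_set:
            if candidate.issubset(transaction):
                counts[candidate] = counts.get(candidate, 0) + 1
    num_transactions = float(len(data_set))
    ret_list = []
    support_data = {}
    for key in counts:
        support = counts[key] / num_transactions
        if support >= min_support:
            ret_list.insert(0, key)
        # Support is recorded for every counted candidate, frequent or not.
        support_data[key] = support
    return ret_list, support_data


def apriori(data_set, min_support=0.5):
    """Return frequent itemsets grouped by size, plus a support dictionary."""
    # Seed with every distinct item, not only the items of the first transaction.
    items = sorted({item for transaction in data_set for item in transaction})
    candidates = [frozenset([item]) for item in items]
    freq_sets = []
    support_data = {}
    k = 2
    while candidates:
        ret_list, support_k = scan_data(data_set, candidates, min_support)
        # Accumulate supports across passes instead of overwriting them.
        support_data.update(support_k)
        if not ret_list:
            break  # avoid appending an empty level
        freq_sets.append(ret_list)
        candidates = create_candidates(ret_list, k)
        k += 1
    return freq_sets, support_data


data_set = load_dataset()
freq_sets, support_data = apriori(data_set, min_support=0.4)
print(freq_sets)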
```
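This example uses a small built-in data set, loaded by `load_dataset()`. `create_candidates()` builds the candidate itemsets, `scan_data()` scans the data set and computes each itemset's support (the fraction of transactions that contain it), and `apriori()` runs the whole algorithm and returns the frequent itemsets together with the support data. Finally, the `min_support` argument sets the minimum support threshold. With the value 0.4 used here, an itemset must occur in at least 2 of the 5 transactions to count as frequent, and the resulting frequent itemsets are printed.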
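To make the role of `min_support` concrete, here is a minimal, self-contained sketch that computes support directly from its definition and enumerates the frequent itemsets by brute force. The helper names `support` and `brute_force_frequent` are hypothetical and not part of the code above. Brute force is not Apriori and is only practical on tiny data sets like this one, but it can serve as a cross-check for an Apriori implementation.

```python
from itertools import combinations

# Same five transactions as the built-in data set above.
transactions = [
    ['bread', 'milk', 'cheese'],
    ['bread', 'diapers', 'eggs'],
    ['milk', 'diapers', 'beer', 'cola'],
    ['bread', 'milk', 'diapers', 'beer'],
    ['bread', 'cola', 'diapers'],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


def brute_force_frequent(transactions, min_support):
    """Enumerate itemsets size by size and keep those meeting min_support.

    Exponential in the number of distinct items; for cross-checking only.
    """
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                frequent[frozenset(combo)] = s
                found = True
        if not found:
            # No frequent itemset of this size, so no larger one can be frequent.
            break
    return frequent


frequent = brute_force_frequent(transactions, min_support=0.4)
# Print smallest itemsets first, alphabetically within each size.
for itemset, s in sorted(frequent.items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The early `break` relies on the same property Apriori exploits: every subset of a frequent itemset is itself frequent, so once no itemset of a given size passes the threshold, no larger one can.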