Code for the Apriori algorithm
Time: 2023-06-25 21:05:55
The following is a Python implementation of the Apriori algorithm:
```python
from itertools import combinations


def apriori(transactions, min_support):
    """
    Generate frequent itemsets using the Apriori algorithm.

    Parameters:
        transactions: A list of transactions. Each transaction is an
            iterable of hashable, sortable items.
        min_support: The minimum support threshold, as an absolute
            number of transactions.

    Returns:
        A dictionary mapping each itemset size k to a dictionary of
        {itemset (sorted tuple): support count}.
    """
    # Use sets so duplicates count once and subset tests are fast.
    transactions = [set(t) for t in transactions]

    # Count the occurrences of each item.
    item_counts = {}
    for transaction in transactions:
        for item in transaction:
            item_counts[item] = item_counts.get(item, 0) + 1

    # Keep the frequent items, stored as 1-tuples so that every level
    # uses tuple keys.
    current_itemsets = {
        (item,): count
        for item, count in item_counts.items()
        if count >= min_support
    }

    # Generate the frequent itemsets level by level.
    frequent_itemsets = {}
    k = 1
    while current_itemsets:
        frequent_itemsets[k] = current_itemsets
        k += 1
        # Build candidates only from items that still occur in some
        # frequent (k-1)-itemset.
        items = sorted({i for itemset in current_itemsets for i in itemset})
        candidate_itemsets = {}
        for itemset in combinations(items, k):
            # Prune: every (k-1)-subset of a candidate must be frequent.
            if any(subset not in current_itemsets
                   for subset in combinations(itemset, k - 1)):
                continue
            # Count the transactions that contain the candidate itemset.
            count = sum(1 for transaction in transactions
                        if transaction.issuperset(itemset))
            if count >= min_support:
                candidate_itemsets[itemset] = count
        current_itemsets = candidate_itemsets
    return frequent_itemsets
```
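In the code above, we first count how often each item occurs and keep only the items whose count reaches `min_support` (the threshold is compared against raw counts, not fractions). We then apply the Apriori algorithm level by level: starting from the frequent single items, we increase the itemset size by one at each step until no more frequent itemsets are found. At each step we generate candidate itemsets, count the transactions that contain each candidate, and keep the candidates that meet the threshold. Finally, we return a dictionary containing all frequent itemsets, where each key is an itemset size and each value maps the frequent itemsets of that size to their counts.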
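To make the level-by-level process concrete, here is a minimal worked sketch on a toy dataset. The transactions, the threshold of 3, and the helper name `support_count` are all made up for illustration and are not part of the answer's code; the block is standalone and only mirrors the same steps of counting, filtering, and pruning.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
min_support = 3  # assumed threshold, an absolute transaction count


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if t.issuperset(itemset))


# Level 1: count every single item and keep the frequent ones.
items = sorted({item for t in transactions for item in t})
level1 = {(i,): support_count((i,), transactions) for i in items}
level1 = {k: v for k, v in level1.items() if v >= min_support}

# Level 2: build candidate pairs from the frequent items, then count them.
frequent_items = [k[0] for k in level1]
level2 = {}
for pair in combinations(frequent_items, 2):
    count = support_count(pair, transactions)
    if count >= min_support:
        level2[pair] = count

# Level 3: a candidate survives pruning only if all of its 2-subsets
# are frequent; if nothing survives, the search stops.
level3_candidates = [
    c for c in combinations(frequent_items, 3)
    if all(s in level2 for s in combinations(c, 2))
]

print(level1)
print(level2)
print(level3_candidates)
```

In this toy data all four items are frequent, but only the pair `("bread", "milk")` reaches the threshold. Every 3-item candidate contains at least one infrequent pair, so pruning discards all of them without scanning the transactions again, and the loop would end at this point.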