Write Python code for the Apriori algorithm
Posted: 2023-08-31 18:29:27 Views: 141
### Answer 1:
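Python code for the Apriori algorithm, using the mlxtend library:

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    # Read the dataset; each row is one transaction, padded with NaN
    dataset = pd.read_csv('Market_Basket_Optimisation.csv', header=None)

    # Convert every row into a list of items, skipping the missing cells
    transactions = [
        [str(item) for item in row if pd.notna(item)]
        for row in dataset.values
    ]

    # mlxtend's apriori expects a one-hot encoded boolean DataFrame,
    # not a list of transactions
    encoder = TransactionEncoder()
    onehot = pd.DataFrame(encoder.fit(transactions).transform(transactions),
                          columns=encoder.columns_)

    # Mine frequent itemsets, then derive association rules from them.
    # apriori() has no min_confidence argument, so the confidence
    # threshold is applied to the rules afterwards.
    itemsets = apriori(onehot, min_support=0.05, use_colnames=True)
    rules = association_rules(itemsets, metric='lift', min_threshold=1)
    rules = rules[rules['confidence'] >= 0.2]

    print('Frequent itemsets:', itemsets)
    print('Association rules:', rules)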
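The thresholds used in this answer (support, confidence, lift) can also be computed by hand, which helps when reading the rule table. The following is a minimal sketch on a small made-up dataset; the helper names `support`, `confidence`, and `lift` and the sample transactions are assumptions for illustration and are not part of mlxtend.

```python
# Hypothetical sample data: each set is one transaction
transactions = [
    {"milk", "bread"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's support; 1.0 means independence."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


# Evaluate the hypothetical rule {milk} -> {bread}
antecedent, consequent = {"milk"}, {"bread"}
print("support:", support(antecedent | consequent, transactions))
print("confidence:", confidence(antecedent, consequent, transactions))
print("lift:", lift(antecedent, consequent, transactions))
```

A lift below 1, as in this example, suggests the two items appear together slightly less often than independence would predict.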
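### Answer 2:
Apriori is a widely used association rule mining algorithm for finding itemsets that occur frequently in a dataset.
Below is a simple Python example of the Apriori algorithm: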
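```python
# Import the required library
from itertools import combinations


def getFrequentItemsets(transactions, min_support):
    # Count the support of every individual item
    item_counts = {}
    for transaction in transactions:
        for item in set(transaction):
            item_counts[item] = item_counts.get(item, 0) + 1

    # Keep only the items that meet the minimum support count
    frequent_items = []
    for item, count in item_counts.items():
        if count >= min_support:
            frequent_items.append(frozenset([item]))

    # Grow the itemsets level by level until no frequent itemset is left
    candidate_items = list(frequent_items)
    k = 2
    while candidate_items:
        # Join the previous level's frequent itemsets into candidates
        candidate_items = generateCandidateItems(candidate_items, k)
        # Count the support of each candidate
        item_counts = countItemsets(transactions, candidate_items)
        # Drop the candidates below the minimum support count
        candidate_items = filterItems(item_counts, min_support)
        # Add the surviving candidates to the result
        frequent_items.extend(candidate_items)
        k += 1
    return frequent_items


def generateCandidateItems(frequent_items, k):
    previous = set(frequent_items)
    candidate_items = set()
    for item1 in frequent_items:
        for item2 in frequent_items:
            union = item1 | item2
            if len(union) != k:
                continue
            # Apriori pruning: every (k-1)-subset must itself be frequent
            if all(frozenset(subset) in previous
                   for subset in combinations(union, k - 1)):
                candidate_items.add(union)
    return candidate_items


def countItemsets(transactions, itemsets):
    # Convert once so the subset test does not rescan lists
    transaction_sets = [set(transaction) for transaction in transactions]
    item_counts = {}
    for itemset in itemsets:
        for transaction in transaction_sets:
            if itemset.issubset(transaction):
                item_counts[itemset] = item_counts.get(itemset, 0) + 1
    return item_counts


def filterItems(item_counts, min_support):
    frequent_items = []
    for item, count in item_counts.items():
        if count >= min_support:
            frequent_items.append(item)
    return frequent_items


# Run Apriori on a sample dataset
transactions = [
    ['apple', 'banana', 'watermelon'],
    ['banana', 'orange'],
    ['apple', 'orange'],
    ['apple', 'banana', 'grape'],
    ['banana', 'grape']
]
min_support = 2
frequent_items = getFrequentItemsets(transactions, min_support)

# Print the result in a stable order: by size, then alphabetically
for itemset in sorted(frequent_items, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset))
```
This code implements a simple version of Apriori. `transactions` is a list of transactions, each of which is a list of items, and `min_support` is the minimum support count, that is, an absolute number of transactions rather than a fraction. The function returns every itemset that meets this threshold, including the single-item ones.
For the sample dataset, the output is:
```
['apple']
['banana']
['grape']
['orange']
['apple', 'banana']
['banana', 'grape']
```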
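Frequent itemsets are usually an intermediate result: association rules are derived from them by checking a confidence threshold. The sketch below illustrates that step under stated assumptions. It uses five sample transactions with English item names, the helpers `count` and `generate_rules` are hypothetical, and the two pairs passed in are hard-coded as the pairs that occur in at least two of these transactions.

```python
from itertools import combinations

# Hypothetical sample data: each set is one transaction
transactions = [
    {"apple", "banana", "watermelon"},
    {"banana", "orange"},
    {"apple", "orange"},
    {"apple", "banana", "grape"},
    {"banana", "grape"},
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions)


def generate_rules(frequent_itemsets, min_confidence):
    """Return (antecedent, consequent, confidence) for itemsets of size >= 2."""
    rules = []
    for itemset in frequent_itemsets:
        # Try every non-empty proper subset as the antecedent
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                conf = count(itemset) / count(antecedent)
                if conf >= min_confidence:
                    rules.append((sorted(antecedent), sorted(consequent), conf))
    return rules


# Pairs that appear in at least two of the transactions above
frequent_pairs = [
    frozenset({"apple", "banana"}),
    frozenset({"banana", "grape"}),
]
for antecedent, consequent, conf in generate_rules(frequent_pairs, 0.6):
    print(antecedent, "->", consequent, f"confidence={conf:.2f}")
```

Because each antecedent is a subset of a frequent itemset, it is frequent as well, so `count(antecedent)` is never zero here.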
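### Answer 3:
Apriori is a common frequent itemset mining algorithm, used to discover frequent itemsets and association rules in a dataset. Below is a simple example implementation in Python.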
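```python
# Import the required library
from itertools import combinations


# Generate the size-k candidates from the items that occur in `itemsets`
def generate_candidates(itemsets, k):
    items = sorted({item for itemset in itemsets for item in itemset})
    candidates = list(combinations(items, k))
    if k > 1:
        # Apriori pruning: keep a candidate only if all of its
        # (k-1)-subsets were frequent at the previous level
        previous = set(itemsets)
        candidates = [
            candidate for candidate in candidates
            if all(subset in previous
                   for subset in combinations(candidate, k - 1))
        ]
    return candidates


# Keep the candidates whose support reaches the threshold
def filter_frequent_items(candidates, dataset, min_support):
    transactions = [set(itemset) for itemset in dataset]
    frequent_items = []
    for candidate in candidates:
        count = 0
        for transaction in transactions:
            if set(candidate).issubset(transaction):
                count += 1
        support = count / len(dataset)
        if support >= min_support:
            frequent_items.append(candidate)
    return frequent_items


# The Apriori algorithm
def apriori(dataset, min_support):
    all_frequent = []
    k = 1
    candidates = generate_candidates(dataset, k)
    frequent_items = filter_frequent_items(candidates, dataset, min_support)
    while len(frequent_items) > 0:
        # Collect every level instead of overwriting the previous one
        all_frequent.extend(frequent_items)
        k += 1
        candidates = generate_candidates(frequent_items, k)
        frequent_items = filter_frequent_items(candidates, dataset, min_support)
    return all_frequent


# Test code
dataset = [
    ["A", "B", "C"],
    ["B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["A", "B", "C", "D"],
]
min_support = 0.5
frequent_items = apriori(dataset, min_support)
print("Frequent itemsets:", frequent_items)
```
The code above implements basic Apriori: given a dataset and a minimum support threshold, it returns the frequent itemsets. Here the dataset is a small transaction dataset and the minimum support is a fraction, set to 0.5, so an itemset must appear in at least 4 of the 8 transactions. Running the code prints `[('A',), ('B',), ('C',), ('A', 'B'), ('A', 'C'), ('B', 'C')]`. Each single item appears in 6 transactions and each pair in 4, while `('A', 'B', 'C')` appears in only 2 and `('D',)` in only 1, so neither qualifies.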
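One way to sanity-check a result like this is to compare it against an exhaustive search that tests every possible itemset. The sketch below does that for the same eight transactions; `brute_force_frequent` is a hypothetical helper introduced only for this check, and it is exponential in the number of distinct items, so it is only practical for tiny datasets.

```python
from itertools import combinations

dataset = [
    ["A", "B", "C"],
    ["B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["A", "B", "C", "D"],
]


def brute_force_frequent(dataset, min_support):
    """Map each frequent itemset (as a sorted tuple) to its support."""
    rows = [set(row) for row in dataset]
    items = sorted(set().union(*rows))
    result = {}
    # Enumerate every non-empty combination of items, smallest first
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            support = sum(set(candidate) <= row for row in rows) / len(rows)
            if support >= min_support:
                result[candidate] = support
    return result


supports = brute_force_frequent(dataset, 0.5)
for itemset, support in supports.items():
    print(itemset, support)
```

Apriori avoids this full enumeration by discarding any candidate that has an infrequent subset, which is what makes it usable on larger item sets.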