Apriori association-rule code that reads the dataset from an external file, without using an apriori library
Posted: 2023-06-12 22:06:23
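Below is a simple Python implementation. It uses the Pandas library to read the dataset from an external file, then applies the Apriori algorithm to find frequent itemsets and association rules.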
```python
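import pandas as pd
from itertools import combinations

# Read the dataset from a file: each line is one transaction, with items separated by commas.
# pandas infers the number of columns from the first row, so pass column names wide enough
# for the longest row; shorter rows are padded with NaN.
with open('dataset.csv') as f:
    max_cols = max(len(line.split(',')) for line in f)
df = pd.read_csv('dataset.csv', header=None, names=list(range(max_cols)),
                 dtype=str, skipinitialspace=True)

# Convert the DataFrame into a list of transactions (sets of item names, NaN padding removed)
transactions = []
for i in range(len(df)):
    transactions.append(frozenset(str(item).strip() for item in df.iloc[i, :].dropna()))


# Generate the frequent itemsets level by level (Apriori)
def get_frequent_itemsets(transactions, min_support):
    """Return {itemset (sorted tuple): support} for frequent itemsets of every size."""
    num_transactions = len(transactions)

    # Level 1: count every individual item
    item_counts = {}
    for transaction in transactions:
        for item in transaction:
            item_counts[item] = item_counts.get(item, 0) + 1
    current = {}
    for item, count in item_counts.items():
        support = count / num_transactions
        if support >= min_support:
            current[(item,)] = support

    all_frequent = dict(current)  # keep every level; the rules need the subsets' supports
    k = 2
    while current:
        # Join step: merge two frequent (k-1)-itemsets that share their first k-2 items
        prev = sorted(current)
        candidates = set()
        for a in range(len(prev)):
            for b in range(a + 1, len(prev)):
                if prev[a][:-1] == prev[b][:-1]:
                    candidate = prev[a] + (prev[b][-1],)
                    # Prune step: every (k-1)-subset of a candidate must be frequent
                    if all(sub in current for sub in combinations(candidate, k - 1)):
                        candidates.add(candidate)

        # Count step: one pass over the transactions per level
        candidate_counts = dict.fromkeys(candidates, 0)
        for transaction in transactions:
            for candidate in candidates:
                if transaction.issuperset(candidate):
                    candidate_counts[candidate] += 1

        current = {}
        for itemset, count in candidate_counts.items():
            support = count / num_transactions
            if support >= min_support:
                current[itemset] = support
        all_frequent.update(current)
        k += 1
    return all_frequent


# Generate association rules from the frequent itemsets
def get_association_rules(frequent_itemsets, min_confidence):
    association_rules = []
    for itemset, support in frequent_itemsets.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset of the itemset as the antecedent
        for i in range(1, len(itemset)):
            for antecedent in combinations(itemset, i):
                consequent = tuple(sorted(set(itemset) - set(antecedent)))
                # confidence(A -> C) = support(A ∪ C) / support(A); every subset of a
                # frequent itemset is also frequent, so support(A) is always in the dict
                confidence = support / frequent_itemsets[antecedent]
                if confidence >= min_confidence:
                    association_rules.append((antecedent, consequent, confidence))
    return association_rules


# Find the frequent itemsets (minimum support 0.5)
frequent_itemsets = get_frequent_itemsets(transactions, 0.5)
# Derive the association rules (minimum confidence 0.7)
association_rules = get_association_rules(frequent_itemsets, 0.7)

# Print the results
print("Frequent Itemsets:")
for itemset, support in frequent_itemsets.items():
    print(itemset, support)
print("Association Rules:")
for antecedent, consequent, confidence in association_rules:
    print(antecedent, "->", consequent, confidence)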
```
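For reference, the two quantities the code compares against its thresholds are the standard definitions below, where $T$ is the list of transactions. An itemset $X$ is frequent when $\operatorname{support}(X) \ge$ `min_support`, and a rule $X \Rightarrow Y$ is kept when its confidence is at least `min_confidence`:

$$
\operatorname{support}(X) = \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert},
\qquad
\operatorname{confidence}(X \Rightarrow Y) = \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}
$$

Because every subset of a frequent itemset is itself frequent, Apriori can discard any candidate that has an infrequent subset before counting it.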
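Note that this code is for demonstration only and has not been optimized. For large datasets you will likely need a more efficient algorithm or data structure, for example FP-Growth, which avoids explicit candidate generation.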
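As one illustration of such a data structure, the vertical (tid-list) layout used by algorithms such as Eclat stores, for each item, the set of indices of the transactions that contain it. The support of an itemset is then the size of the intersection of its items' tid-lists divided by the number of transactions, so the transactions need not be rescanned. The sketch below assumes that layout; `build_tidlists`, `support`, and the toy transactions are hypothetical names and data chosen for illustration, not part of the code above.

```python
from itertools import combinations


def build_tidlists(transactions):
    """Hypothetical helper: map each item to the indices of the transactions containing it."""
    tidlists = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def support(itemset, tidlists, num_transactions):
    """Hypothetical helper: |intersection of the items' tid-lists| / |T|."""
    tids = set.intersection(*(tidlists.get(item, set()) for item in itemset))
    return len(tids) / num_transactions


# Hypothetical toy data, for illustration only
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
tidlists = build_tidlists(transactions)
n = len(transactions)

# Support of every 2-itemset, computed from tid-list intersections alone
for pair in combinations(sorted(tidlists), 2):
    print(pair, support(pair, tidlists, n))
# ('bread', 'butter') 0.5
# ('bread', 'milk') 0.5
# ('butter', 'milk') 0.25
```

In a fuller implementation, the tid-set of a frequent (k-1)-itemset would be cached and intersected with one more item's tid-list, rather than recomputed from scratch as in this sketch.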