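Implement the Apriori algorithm in Python, find and print the maximal frequent itemsets, and print the items and support counts of every frequent itemset from L1 to Lk. Apriori uses prior knowledge of frequent itemset properties: it first finds the set of frequent 1-itemsets L1, then uses L1 to find the set of frequent 2-itemsets L2, then uses L2 to find L3, and so on until no frequent k-itemsets can be found. Finding each Lk requires one scan of the database. data = [[1,2,5], [2,4], [2,3], [1,2,4], [1,3], [2,3], [1,3], [1,2,3,5], [1,2,3]] The minimum support count is 2.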
Time: 2024-01-01 15:06:06 Views: 60
Sure. This is a fairly typical data mining algorithm. Below is sample Python code that implements Apriori, which you can use as a reference:
```python
def loadDataSet():
    # Transactions from the question.
    return [[1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
            [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3]]


def createC1(dataSet):
    # Candidate 1-itemsets: every distinct item, wrapped in a frozenset.
    items = sorted({item for transaction in dataSet for item in transaction})
    return [frozenset([item]) for item in items]


def scanD(D, Ck, minSupport):
    # One database scan: count the transactions that contain each candidate.
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                ssCnt[can] = ssCnt.get(can, 0) + 1
    # Keep candidates whose support count reaches the threshold, in sorted order.
    retList = sorted((can for can in ssCnt if ssCnt[can] >= minSupport), key=sorted)
    return retList, ssCnt


def aprioriGen(Lk, k):
    # Join step: merge two (k-1)-itemsets that share their first k-2 items.
    retList = []
    for i in range(len(Lk)):
        for j in range(i + 1, len(Lk)):
            # Sort before slicing so the prefix comparison is well defined.
            if sorted(Lk[i])[:k-2] == sorted(Lk[j])[:k-2]:
                retList.append(Lk[i] | Lk[j])
    return retList


def apriori(dataSet, minSupport=2):
    # minSupport is an absolute support count, not a fraction.
    C1 = createC1(dataSet)
    D = list(map(set, dataSet))
    Lk, supportData = scanD(D, C1, minSupport)
    L = []
    k = 2
    while Lk:
        L.append(Lk)
        Ck = aprioriGen(Lk, k)
        Lk, supK = scanD(D, Ck, minSupport)
        supportData.update(supK)
        k += 1
    return L, supportData


data = loadDataSet()
L, supportData = apriori(data, minSupport=2)
for k, Lk in enumerate(L, start=1):
    print("L%d:" % k)
    for itemset in Lk:
        print(sorted(itemset), ":", supportData[itemset])

# A frequent itemset is maximal if no other frequent itemset strictly contains it.
frequent = [itemset for Lk in L for itemset in Lk]
print("Max frequent itemsets:")
for itemset in frequent:
    if not any(itemset < other for other in frequent):
        print(sorted(itemset), ":", supportData[itemset])
```
The code runs the apriori function on the transactions from the question with a minimum support count of 2 and prints every frequent itemset from L1 to Lk together with its support count. It then prints the maximal frequent itemsets, that is, the frequent itemsets that are not contained in any larger frequent itemset, along with their support counts.
Running the code above produces the following output:
```
L1:
[1] : 6
[2] : 7
[3] : 6
[4] : 2
[5] : 2
L2:
[1, 2] : 4
[1, 3] : 4
[1, 5] : 2
[2, 3] : 4
[2, 4] : 2
[2, 5] : 2
L3:
[1, 2, 3] : 2
[1, 2, 5] : 2
Max frequent itemsets:
[2, 4] : 2
[1, 2, 3] : 2
[1, 2, 5] : 2
```
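The aprioriGen function above performs only the join step. The frequent itemset property mentioned in the question (every subset of a frequent itemset must itself be frequent) also allows a prune step that discards candidates before the database scan. The following is a minimal sketch of that idea, not part of the answer's code: `prune_candidates` is a hypothetical helper name, and the L2 and C3 lists are written out by hand from the question's data at a minimum support count of 2.

```python
from itertools import combinations


def prune_candidates(candidates, prev_frequent):
    """Drop k-item candidates that have an infrequent (k-1)-item subset.

    Hypothetical helper for illustration; not part of the answer's code.
    """
    prev = set(prev_frequent)
    kept = []
    for cand in candidates:
        subsets = (frozenset(s) for s in combinations(cand, len(cand) - 1))
        if all(s in prev for s in subsets):
            kept.append(cand)
    return kept


# Frequent 2-itemsets of the question's data (support count >= 2).
L2 = [frozenset(s) for s in ([1, 2], [1, 3], [1, 5], [2, 3], [2, 4], [2, 5])]

# Candidates the join step builds from L2 (pairs sharing their first item).
C3 = [frozenset(s) for s in ([1, 2, 3], [1, 2, 5], [1, 3, 5],
                             [2, 3, 4], [2, 3, 5], [2, 4, 5])]

pruned = prune_candidates(C3, L2)
print([sorted(c) for c in pruned])  # [[1, 2, 3], [1, 2, 5]]
```

Only two of the six candidates survive, so the scan for L3 would need to count two itemsets instead of six. On a data set this small the saving is negligible, but it grows with the number of items.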