Analyze the following code line by line:

    print('Building new-user table ...')
    new_user = {}
    for user, news in trainSet.items():
        for new in news:
            if new not in new_user:
                new_user[new] = set()
            new_user[new].add(user)
    print('Build new-user table success!')
    new_count = len(new_user)
    print('Total new number = %d' % new_count)
    print('Build user co-rated users matrix ...')
    for new, users in new_user.items():
        for u in users:
            for v in users:
                if u == v:
                    continue
                user_sim_matrix.setdefault(u, {})
                user_sim_matrix[u].setdefault(v, 0)
                user_sim_matrix[u][v] += 1
    print('Build user co-rated users matrix success!')
    print('Calculating user similarity matrix ...')
    for u, related_users in user_sim_matrix.items():
        for v, count in related_users.items():
            user_sim_matrix[u][v] = count / math.sqrt(len(trainSet[u]) * len(trainSet[v]))
    print('Calculate user similarity matrix success!')
Time: 2024-04-28 18:22:11 Views: 13
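This code computes the user-to-user similarity matrix that a user-based collaborative filtering recommender relies on when it suggests news items a user may be interested in. Note that `new` in this code means a news item, so `new_user` is a news-to-users table, not a table of new users.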
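First, the code prints "Building new-user table ..." to signal that it is building the news-to-users inverted table. It then iterates over every user in the training set `trainSet` and over each news item that user has read, and adds the user to the set `new_user[new]`, so each news item maps to the set of users who read it. When the loops finish, it prints a success message followed by the number of distinct news items, `len(new_user)`.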
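The inversion step can be tried on a few hand-made records. The following is a minimal sketch, not the asker's code: the toy `train_set` data and the helper name `build_item_users` are made up for illustration.

```python
# Toy training set (hypothetical data): user -> {news_id: rating}
train_set = {
    "A": {"n1": 5, "n2": 3},
    "B": {"n1": 4, "n3": 2},
    "C": {"n2": 1, "n3": 5},
}


def build_item_users(train_set):
    """Invert a user -> items mapping into an item -> set-of-users mapping."""
    item_users = {}
    for user, items in train_set.items():
        for item in items:
            # setdefault creates the empty set the first time an item is seen
            item_users.setdefault(item, set()).add(user)
    return item_users


item_users = build_item_users(train_set)
print(sorted(item_users["n1"]))  # ['A', 'B']
print(len(item_users))           # 3
```

Using `setdefault` replaces the explicit `if new not in new_user` check in the question's code; the resulting table is the same.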
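Next, the program iterates over each news item in the inverted table together with the set of users who read it, and builds the co-occurrence matrix: for every ordered pair of distinct users `u` and `v` who both read the item, it increments `user_sim_matrix[u][v]` by 1. After the loops, `user_sim_matrix[u][v]` holds the number of news items that `u` and `v` have both read. The fragment assumes that `user_sim_matrix` was already initialized as an empty dictionary and that `math` was imported earlier.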
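To make the counting concrete, here is a small sketch under assumed toy data. It swaps the nested `for u ... for v ...` loops with `itertools.permutations`, which yields the same ordered pairs of distinct users; the name `count_co_rated` is hypothetical.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical inverted table: news item -> users who read it
item_users = {"n1": {"A", "B"}, "n2": {"A", "C"}, "n3": {"A", "B"}}


def count_co_rated(item_users):
    """Count, for every ordered pair of distinct users, the items both have read."""
    co_counts = defaultdict(lambda: defaultdict(int))
    for users in item_users.values():
        # permutations(users, 2) yields (u, v) and (v, u) for each pair with u != v
        for u, v in permutations(users, 2):
            co_counts[u][v] += 1
    return co_counts


co_counts = count_co_rated(item_users)
print(co_counts["A"]["B"])  # 2 (n1 and n3)
print(co_counts["A"]["C"])  # 1 (n2)
```

Because each item contributes one increment per ordered pair, the cost for an item read by m users is proportional to m squared, which is why the inverted table is built first: only users who actually share an item are ever paired.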
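Finally, the program prints "Calculating user similarity matrix ..." and converts each co-occurrence count into a similarity score: the count for `u` and `v` is divided by `math.sqrt(len(trainSet[u]) * len(trainSet[v]))`, the square root of the product of the number of items each user has read. This is the cosine similarity between the two users' sets of read items. When the computation is done, the program prints "Calculate user similarity matrix success!".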
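The normalization can be checked by hand on a small case. The sketch below assumes toy data and a hypothetical helper `normalize`; unlike the question's code it returns a new dictionary instead of overwriting the counts in place.

```python
import math

# Hypothetical data: user -> {news_id: rating}, plus matching co-occurrence counts
train_set = {
    "A": {"n1": 5, "n2": 3, "n3": 4},
    "B": {"n1": 4, "n3": 2},
    "C": {"n2": 1},
}
co_counts = {"A": {"B": 2, "C": 1}, "B": {"A": 2}, "C": {"A": 1}}


def normalize(co_counts, train_set):
    """Turn co-occurrence counts into cosine similarities between users."""
    sim = {}
    for u, related in co_counts.items():
        sim[u] = {}
        for v, count in related.items():
            # shared items divided by the geometric mean of the two history sizes
            sim[u][v] = count / math.sqrt(len(train_set[u]) * len(train_set[v]))
    return sim


sim = normalize(co_counts, train_set)
print(round(sim["A"]["B"], 4))  # 2 / sqrt(3 * 2) = 0.8165
print(round(sim["A"]["C"], 4))  # 1 / sqrt(3 * 1) = 0.5774
```

The score is 1 only when two users have read exactly the same items, and it shrinks as either user's history grows without adding shared items.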
Related questions
Analyze the following code line by line:

    import random
    import numpy as np
    import pandas as pd
    import math
    from operator import itemgetter
    data_path = './ml-latest-small/'
    data = pd.read_csv(data_path+'ratings.csv')
    data.head()
    data.pivot(index='userId', columns='newId', values='rating')
    trainSet, testSet = {}, {}
    trainSet_len, testSet_len = 0, 0
    pivot = 0.75
    for ele in data.itertuples():
        user, new, rating = getattr(ele, 'userId'), getattr(ele, 'newId'), getattr(ele, 'rating')
        if random.random() < pivot:
            trainSet.setdefault(user, {})
            trainSet[user][new] = rating
            trainSet_len += 1
        else:
            testSet.setdefault(user, {})
            testSet[user][new] = rating
            testSet_len += 1
    print('Split trainingSet and testSet success!')
    print('TrainSet = %s' % trainSet_len)
    print('TestSet = %s' % testSet_len)
    user_sim_matrix = {}
    print('Building new-user table ...')
    new_user = {}
    for user, news in trainSet.items():
        for new in news:
            if new not in new_user:
                new_user[new] = set()
            new_user[new].add(user)
    print('Build new-user table success!')
    new_count = len(new_user)
    print('Total new number = %d' % new_count)
    print('Build user co-rated users matrix ...')
    for new, users in new_user.items():
        for u in users:
            for v in users:
                if u == v:
                    continue
                user_sim_matrix.setdefault(u, {})
                user_sim_matrix[u].setdefault(v, 0)
                user_sim_matrix[u][v] += 1
    print('Build user co-rated users matrix success!')
    print('Calculating user similarity matrix ...')
    for u, related_users in user_sim_matrix.items():
        for v, count in related_users.items():
            user_sim_matrix[u][v] = count / math.sqrt(len(trainSet[u]) * len(trainSet[v]))
    print('Calculate user similarity matrix success!')
    k = 20
    n = 10
    aim_user = 3
    rank = {}
    watched_news = trainSet[aim_user]
    for v, wuv in sorted(user_sim_matrix[aim_user].items(), key=lambda x: x[1], reverse=True)[0:k]:
        for new in trainSet[v]:
            if new in watched_news:
                continue
            rank.setdefault(new, 0)
            rank[new] += wuv
    rec_news = sorted(rank.items(), key=itemgetter(1), reverse=True)[:n]
    rec_news
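This code implements a user-based collaborative filtering recommender. It consists of the following steps:
1. Import the required packages: random, numpy, pandas, math, and itemgetter (numpy is imported but not used in the code shown).
2. Read the ratings file, split the ratings into a training set and a test set, and count the number of ratings in each.
3. Build the news-to-users table, which records for each news item the users who rated it.
4. Build the user similarity matrix, which records the similarity between each user and every other user who shares at least one rated item.
5. Use the target user and the similarity matrix to recommend news items the target user may be interested in.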
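The random split in step 2 gives a different result on every run because the question's code never seeds the generator. The following is a minimal sketch of a reproducible variant; the function name `split_ratings`, the `seed` parameter, and the synthetic `rows` are assumptions added for illustration.

```python
import random


def split_ratings(rows, train_fraction=0.75, seed=42):
    """Assign each (user, item, rating) row to train or test at random."""
    rng = random.Random(seed)  # private generator, so the split is reproducible
    train_set, test_set = {}, {}
    for user, item, rating in rows:
        # each row lands in the training set with probability train_fraction
        target = train_set if rng.random() < train_fraction else test_set
        target.setdefault(user, {})[item] = rating
    return train_set, test_set


# Synthetic ratings (hypothetical): 10 users x 20 items, all rated 1.0
rows = [(u, i, 1.0) for u in range(10) for i in range(20)]
train_set, test_set = split_ratings(rows)
n_train = sum(len(items) for items in train_set.values())
n_test = sum(len(items) for items in test_set.values())
print(n_train, n_test)
```

Since every row is drawn independently, the training share is only approximately 75%, and a user with few ratings can end up entirely in one of the two sets.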
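The implementation details are as follows:
1. Read the ratings file and split it into a training set and a test set. For each rating, a random number is drawn: if it is below `pivot` (0.75) the rating goes into `trainSet[user][new]`, otherwise into `testSet[user][new]`, so about 75% of the ratings end up in the training set.
2. Build the news-to-users table. The code iterates over every user in the training set and every news item that user rated, and adds the user to the set `new_user[new]`. The result maps each news item to the set of users who rated it.
3. Build the user similarity matrix. For each news item in the table, the code visits every ordered pair of distinct users who rated it and increments their co-occurrence count in `user_sim_matrix`. It then divides each count by `math.sqrt(len(trainSet[u]) * len(trainSet[v]))` to turn it into a cosine similarity.
4. Recommend news items to the target user. The code takes the `k` users most similar to the target user, iterates over the news items those users rated, skips the items the target user has already rated, and adds the neighbour's similarity `wuv` to the score of every remaining item. The items are then sorted by score and the top `n` are returned as recommendations.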
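The recommendation step above can be sketched as a small function. This is an illustration under assumed toy data, not the asker's code: the name `recommend_user_based` is hypothetical, and `user_sim.get(target, {})` is used so that a user without neighbours yields an empty list rather than a `KeyError`.

```python
from operator import itemgetter


def recommend_user_based(target, train_set, user_sim, k=20, n=10):
    """Score unseen items by summing the similarities of neighbours who rated them."""
    seen = train_set[target]
    scores = {}
    # keep only the k most similar users
    neighbours = sorted(user_sim.get(target, {}).items(),
                        key=itemgetter(1), reverse=True)[:k]
    for v, w_uv in neighbours:
        for item in train_set[v]:
            if item in seen:
                continue  # never recommend what the target already rated
            scores[item] = scores.get(item, 0.0) + w_uv
    return sorted(scores.items(), key=itemgetter(1), reverse=True)[:n]


# Hypothetical data: user -> {news_id: rating} and precomputed similarities
train_set = {
    "A": {"n1": 5},
    "B": {"n1": 4, "n2": 3, "n3": 2},
    "C": {"n1": 2, "n3": 5},
}
user_sim = {"A": {"B": 0.25, "C": 0.5}}
print(recommend_user_based("A", train_set, user_sim))
# [('n3', 0.75), ('n2', 0.25)]
```

Note that this scoring, like the question's code, ignores the rating values of the neighbours: an item gains the same weight whether a similar user rated it high or low.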
Analyze the following code line by line:

    import random
    import numpy as np
    import pandas as pd
    import math
    from operator import itemgetter
    data_path = './ml-latest-small/'
    data = pd.read_csv(data_path+'ratings.csv')
    data.head()
    data.pivot(index='userId', columns='newId', values='rating')
    trainSet, testSet = {}, {}
    trainSet_len, testSet_len = 0, 0
    pivot = 0.75
    for ele in data.itertuples():
        user, new, rating = getattr(ele, 'userId'), getattr(ele, 'newId'), getattr(ele, 'rating')
        if random.random() < pivot:
            trainSet.setdefault(user, {})
            trainSet[user][new] = rating
            trainSet_len += 1
        else:
            testSet.setdefault(user, {})
            testSet[user][new] = rating
            testSet_len += 1
    print('Split trainingSet and testSet success!')
    print('TrainSet = %s' % trainSet_len)
    print('TestSet = %s' % testSet_len)
    new_popular = {}
    for user, news in trainSet.items():
        for new in news:
            if new not in new_popular:
                new_popular[new] = 0
            new_popular[new] += 1
    new_count = len(new_popular)
    print('Total movie number = %d' % new_count)
    print('Build user co-rated news matrix ...')
    new_sim_matrix = {}
    for user, news in trainSet.items():
        for m1 in news:
            for m2 in news:
                if m1 == m2:
                    continue
                new_sim_matrix.setdefault(m1, {})
                new_sim_matrix[m1].setdefault(m2, 0)
                new_sim_matrix[m1][m2] += 1
    print('Build user co-rated movies matrix success!')
    print('Calculating news similarity matrix ...')
    for m1, related_news in new_sim_matrix.items():
        for m2, count in related_news.items():
            if new_popular[m1] == 0 or new_popular[m2] == 0:
                new_sim_matrix[m1][m2] = 0
            else:
                new_sim_matrix[m1][m2] = count / math.sqrt(new_popular[m1] * new_popular[m2])
    print('Calculate news similarity matrix success!')
    k = 20
    n = 10
    aim_user = 20
    rank = {}
    watched_news = trainSet[aim_user]
    for new, rating in watched_news.items():
        for related_new, w in sorted(new_sim_matrix[new].items(), key=itemgetter(1), reverse=True)[:k]:
            if related_new in watched_news:
                continue
            rank.setdefault(related_new, 0)
            rank[related_new] += w * float(rating)
    rec_news = sorted(rank.items(), key=itemgetter(1), reverse=True)[:n]
    rec_news
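This code implements an item-based collaborative filtering recommender. The variables are named `new`/`news`, while the printed messages and the `ml-latest-small` path refer to movies; the explanation below says "item" for both. Here is a line-by-line analysis:
1. `import random import numpy as np import pandas as pd import math from operator import itemgetter`: imports the required libraries. `numpy` is imported but not used in the code shown.
2. `data_path = './ml-latest-small/' data = pd.read_csv(data_path+'ratings.csv') data.head()`: reads the ratings data into a DataFrame. `data.head()` returns the first 5 rows, which are displayed only in an interactive environment such as a notebook.
3. `data.pivot(index='userId', columns='newId', values='rating')`: reshapes the data into a user-by-item rating matrix. The result is not assigned to a variable, so it has no effect on the rest of the code.
4. `trainSet, testSet = {}, {} trainSet_len, testSet_len = 0, 0 pivot = 0.75`: initializes the training set, the test set, and their counters. `pivot = 0.75` is the probability that a rating is placed in the training set.
5. `for ele in data.itertuples():`: iterates over every row of the data.
6. `user, new, rating = getattr(ele, 'userId'), getattr(ele, 'newId'), getattr(ele, 'rating')`: reads the user ID, item ID, and rating from the row.
7. `if random.random() < pivot: trainSet.setdefault(user, {}) trainSet[user][new] = rating trainSet_len += 1 else: testSet.setdefault(user, {}) testSet[user][new] = rating testSet_len += 1`: randomly assigns each rating to the training set (probability 0.75) or the test set, and counts the number of ratings in each.
8. `print('Split trainingSet and testSet success!') print('TrainSet = %s' % trainSet_len) print('TestSet = %s' % testSet_len)`: prints the number of ratings in the training set and in the test set.
9. `new_popular = {} for user, news in trainSet.items(): for new in news: if new not in new_popular: new_popular[new] = 0 new_popular[new] += 1`: computes the popularity of each item, that is, the number of users in the training set who rated it.
10. `new_count = len(new_popular) print('Total movie number = %d' % new_count)`: prints the number of distinct items in the training set.
11. `new_sim_matrix = {} for user, news in trainSet.items(): for m1 in news: for m2 in news: if m1 == m2: continue new_sim_matrix.setdefault(m1, {}) new_sim_matrix[m1].setdefault(m2, 0) new_sim_matrix[m1][m2] += 1`: builds the item co-occurrence matrix. For each user, every ordered pair of distinct items that user rated is incremented by 1, so `new_sim_matrix[m1][m2]` ends up as the number of users who rated both `m1` and `m2`.
12. `print('Build user co-rated movies matrix success!')`: prints a message that the co-occurrence matrix has been built.
13. `for m1, related_news in new_sim_matrix.items(): for m2, count in related_news.items(): if new_popular[m1] == 0 or new_popular[m2] == 0: new_sim_matrix[m1][m2] = 0 else: new_sim_matrix[m1][m2] = count / math.sqrt(new_popular[m1] * new_popular[m2])`: computes the similarity between items as cosine similarity, dividing each co-occurrence count by the square root of the product of the two items' popularities. The zero check is never triggered here, because every item in `new_sim_matrix` comes from the training set and therefore has a popularity of at least 1.
14. `print('Calculate news similarity matrix success!')`: prints a message that the item similarities have been computed.
15. `k = 20 n = 10 aim_user = 20`: defines the parameters: `k` is the number of most similar items considered for each item the user rated, `n` is the number of recommendations, and `aim_user` is the target user ID.
16. `rank ={} watched_news = trainSet[aim_user] for new, rating in watched_news.items(): for related_new, w in sorted(new_sim_matrix[new].items(), key=itemgetter(1), reverse=True)[:k]: if related_new in watched_news: continue rank.setdefault(related_new, 0) rank[related_new] += w * float(rating) rec_news = sorted(rank.items(), key=itemgetter(1), reverse=True)[:n]`: recommends items to the target user. For each item the user rated, the code takes the `k` most similar items, skips those the user has already rated, and adds `w * rating` (similarity times the user's rating) to the candidate's score. The candidates are sorted by score and the top `n` are kept.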
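Steps 9 to 16 can be condensed into two functions and run on toy data. This is a sketch under stated assumptions, not the asker's code: the names `item_similarity` and `recommend_item_based` and the sample ratings are hypothetical, and `item_sim.get(item, {})` is used so that an item with no co-rated neighbours is skipped instead of raising a `KeyError`.

```python
import math
from operator import itemgetter


def item_similarity(train_set):
    """Cosine similarity between items, from co-occurrence counts and popularity."""
    popularity, co_counts = {}, {}
    for items in train_set.values():
        for i in items:
            popularity[i] = popularity.get(i, 0) + 1
            for j in items:
                if i != j:
                    co_counts.setdefault(i, {})
                    co_counts[i][j] = co_counts[i].get(j, 0) + 1
    return {
        i: {j: c / math.sqrt(popularity[i] * popularity[j])
            for j, c in related.items()}
        for i, related in co_counts.items()
    }


def recommend_item_based(target, train_set, item_sim, k=20, n=10):
    """Score unseen items by similarity to rated items, weighted by the rating."""
    seen = train_set[target]
    scores = {}
    for item, rating in seen.items():
        neighbours = sorted(item_sim.get(item, {}).items(),
                            key=itemgetter(1), reverse=True)[:k]
        for related, w in neighbours:
            if related in seen:
                continue
            scores[related] = scores.get(related, 0.0) + w * float(rating)
    return sorted(scores.items(), key=itemgetter(1), reverse=True)[:n]


# Hypothetical ratings: user -> {item_id: rating}
train_set = {
    "A": {"m1": 4.0},
    "B": {"m1": 5.0, "m2": 3.0},
    "C": {"m1": 2.0, "m2": 4.0, "m3": 1.0},
}
item_sim = item_similarity(train_set)
print(recommend_item_based("A", train_set, item_sim))
```

In this toy data `m2` is rated together with `m1` by two users and `m3` by only one, so `m2` is ranked first for user A.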