Write a Python book recommendation system based on the Book-Crossing dataset from GroupLens
Posted: 2024-02-25 11:54:51
Sure. Below is example code for a Python book recommendation system based on the Book-Crossing dataset from GroupLens:
```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
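# Load the dataset. on_bad_lines='skip' (pandas >= 1.3) replaces
# error_bad_lines=False, which was removed in pandas 2.0.
ratings = pd.read_csv('BX-Book-Ratings.csv', delimiter=';', on_bad_lines='skip', encoding='latin-1')
books = pd.read_csv('BX-Books.csv', delimiter=';', on_bad_lines='skip', encoding='latin-1')
# The user table is loaded for completeness; this example does not use it.
users = pd.read_csv('BX-Users.csv', delimiter=';', on_bad_lines='skip', encoding='latin-1')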
# Keep only books that have at least 50 ratings
book_counts = ratings.groupby('ISBN')['Book-Rating'].transform('count')
ratings = ratings[book_counts >= 50]
# Among the remaining ratings, keep only users with at least 50 ratings
user_counts = ratings.groupby('User-ID')['Book-Rating'].transform('count')
ratings = ratings[user_counts >= 50]
# Build the rating matrix (rows: users, columns: books)
ratings_matrix = ratings.pivot_table(index='User-ID', columns='ISBN', values='Book-Rating')
# Fill missing ratings with 0
ratings_matrix.fillna(0, inplace=True)
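# Compute the book-to-book similarity matrix. The rating matrix has one row
# per user, so it is transposed to give one row per book (ISBN).
cosine_sim = pd.DataFrame(
    cosine_similarity(ratings_matrix.T),
    index=ratings_matrix.columns,
    columns=ratings_matrix.columns,
)

# Return the titles of the books most similar to the given title
def get_similar_books(book_title, cosine_sim=cosine_sim, top_n=10):
    # Find the ISBNs of this title that survived the filtering above
    isbns = books.loc[books['Book-Title'] == book_title, 'ISBN']
    isbns = [isbn for isbn in isbns if isbn in cosine_sim.index]
    if not isbns:
        raise ValueError(f'No sufficiently rated edition of {book_title!r} was found')
    isbn = isbns[0]
    # Sort all other books by similarity, highest first
    sim_scores = cosine_sim[isbn].drop(isbn).sort_values(ascending=False)
    top_isbns = sim_scores.head(top_n).index
    # Map the ISBNs back to titles (NaN if an ISBN is missing from BX-Books.csv)
    titles = books.drop_duplicates('ISBN').set_index('ISBN')['Book-Title']
    return titles.reindex(top_isbns)

# Test
book_title = 'The Da Vinci Code'
print('Recommendations for', book_title + ':')
print(get_similar_books(book_title))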
```
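This example recommends books by computing how similar they are to one another. Because the similarity is derived from user ratings alone, not from book attributes such as author or genre, the approach is item-based collaborative filtering rather than content-based recommendation. The code first loads the Book-Crossing dataset and filters out books and users with fewer than 50 ratings. It then builds a rating matrix and computes a cosine-similarity matrix. Finally, it defines a function that takes a book title and returns the 10 most similar books.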
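To make the similarity step concrete, here is a minimal sketch on a tiny hand-made rating matrix, written without any libraries so each step is visible. The names `book_ids`, `toy_matrix`, `cosine`, and `most_similar` are hypothetical and exist only for this illustration; the ratings are invented, not taken from Book-Crossing.

```python
import math

# Hypothetical toy ratings: rows are users, columns are books; 0 = not rated
book_ids = ['book_a', 'book_b', 'book_c']
toy_matrix = [
    [5, 4, 0],  # user 1
    [4, 5, 0],  # user 2
    [0, 0, 5],  # user 3
]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# One rating vector per book, i.e. one column of the matrix
columns = [list(col) for col in zip(*toy_matrix)]

def most_similar(book_id):
    """Return (book, similarity) pairs for all other books, most similar first."""
    target = columns[book_ids.index(book_id)]
    scores = [(other, cosine(target, columns[i]))
              for i, other in enumerate(book_ids) if other != book_id]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

print(most_similar('book_a'))
```

Here `book_a` and `book_b` were rated highly by the same two users, so their similarity is 40/41, while `book_c` shares no raters with `book_a` and scores 0. Comparing columns rather than rows is what makes this a book-to-book similarity.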
Note that this example is only a basic framework. In practice it needs to be adapted and optimized for the specific requirements at hand.
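One adjustment that may be worth considering concerns the rating scale. In Book-Crossing, a `Book-Rating` of 0 records an implicit interaction rather than a score of zero, while explicit ratings run from 1 to 10. The sketch below shows one possible way to keep only explicit ratings before building the matrix. The rows in `toy_ratings` are invented stand-ins with the same column names as `BX-Book-Ratings.csv`, and whether to drop implicit ratings at all is a design choice, not something the original example prescribes.

```python
import pandas as pd

# Hypothetical stand-in for BX-Book-Ratings.csv (invented rows, real column names)
toy_ratings = pd.DataFrame({
    'User-ID': [1, 1, 2, 2, 3],
    'ISBN': ['A', 'B', 'A', 'C', 'B'],
    'Book-Rating': [8, 0, 0, 10, 5],
})

# Keep only explicit ratings (1-10); a 0 marks an implicit interaction
explicit = toy_ratings[toy_ratings['Book-Rating'] > 0]

# After this filter, a 0 in the matrix only ever means "no explicit rating"
explicit_matrix = explicit.pivot_table(
    index='User-ID', columns='ISBN', values='Book-Rating'
).fillna(0)

print(explicit_matrix)
```

If the filter is applied, the 50-rating thresholds would probably need to be lowered, since a large share of the rows in the dataset are implicit.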