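Write a DeepFM-based movie recommender using the MovieLens dataset: given a movie title entered by the user, output five recommendations, each showing the movie ID, title, and genres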
Time: 2024-05-16 09:15:03
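Sure. First, prepare the following:
1. The MovieLens dataset: download it from the official website. The code expects `movies.csv` and `ratings.csv`, as shipped in the `ml-latest-small` release.
2. A Python environment with these libraries installed: numpy, pandas, tensorflow, scikit-learn, and deepctr (which provides the DeepFM implementation imported in step 2).
3. An editor: Jupyter Notebook, PyCharm, or similar.
Then follow these steps:
1. Load the MovieLens dataset and preprocess it.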
```Python
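import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Load the dataset
movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')

# Label-encode movie IDs and user IDs into contiguous indices starting at 0
le_movie = LabelEncoder()
le_user = LabelEncoder()
ratings['movie_id'] = le_movie.fit_transform(ratings['movieId'])
ratings['user_id'] = le_user.fit_transform(ratings['userId'])

# Shuffle, then split 80/20 into training and test sets. ratings.csv is ordered
# by userId, so a positional slice would put unseen users into the test set.
train_ratings, test_ratings = train_test_split(ratings, test_size=0.2, random_state=42)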
```
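How the split is made matters here: the MovieLens README states that `ratings.csv` is ordered by `userId`, so cutting off the last 20% of rows by position tends to leave the test set full of users the model never saw during training. The sketch below uses only the standard library and made-up toy rows to compare a positional split with a shuffled one. The helper names `shuffled_split` and `unseen_users` are illustrative and not part of any library.

```python
import random

def shuffled_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then cut off the last test_fraction as the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = len(rows) - int(len(rows) * test_fraction)
    return rows[:cut], rows[cut:]

def unseen_users(train, test):
    """Return the users that appear in test but never in train."""
    return {user for user, _ in test} - {user for user, _ in train}

# Toy (user, movie) rows ordered by user, mimicking the layout of ratings.csv
rows = [(user, movie) for user in range(5) for movie in range(10)]

# Positional split: the last 20% of the rows all belong to the last user
cut = int(len(rows) * 0.8)
print(unseen_users(rows[:cut], rows[cut:]))  # {4}

# Shuffled split: count the test users that are missing from the training set
train, test = shuffled_split(rows)
print(len(unseen_users(train, test)))
```

With real data, the same check can be run on the `user_id` and `movie_id` columns of the two DataFrames to see how many test rows are cold-start cases.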
2. Train a DeepFM model and use it to predict ratings.
```Python
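import numpy as np
from deepctr.models import DeepFM
# Recent DeepCTR releases expose these helpers in deepctr.feature_column;
# older releases used deepctr.inputs instead
from deepctr.feature_column import SparseFeat, DenseFeat, get_feature_names
from sklearn.preprocessing import MinMaxScaler

# Define the sparse features, the dense feature, and the target
sparse_features = ['user_id', 'movie_id']
dense_features = ['timestamp']
target = ['rating']

# The sparse features were already label-encoded in step 1, so only the dense
# feature needs preparing: scale the raw Unix timestamps into [0, 1]
train_ratings, test_ratings = train_ratings.copy(), test_ratings.copy()
scaler = MinMaxScaler()
train_ratings[dense_features] = scaler.fit_transform(train_ratings[dense_features])
test_ratings[dense_features] = scaler.transform(test_ratings[dense_features])

# Describe the model inputs; vocabulary sizes come from the full ratings table
fixlen_feature_columns = [SparseFeat(feat, vocabulary_size=ratings[feat].nunique(), embedding_dim=4)
                          for feat in sparse_features] + [DenseFeat(feat, 1) for feat in dense_features]
feature_names = get_feature_names(fixlen_feature_columns)

# Build the input data as {feature name: values}
train_model_input = {name: train_ratings[name].values for name in feature_names}
test_model_input = {name: test_ratings[name].values for name in feature_names}

# Define the model. DeepFM takes the linear and the DNN feature columns as two
# arguments; task='regression' drops the sigmoid so that ratings can be predicted.
# (The cin_* arguments belong to xDeepFM and are not accepted by DeepFM.)
model = DeepFM(fixlen_feature_columns, fixlen_feature_columns, dnn_hidden_units=(64, 64),
               l2_reg_linear=0.00001, l2_reg_embedding=0.00001, task='regression')

# Train the model
model.compile("adam", "mse", metrics=['mse'])
history = model.fit(train_model_input, train_ratings[target].values, batch_size=256, epochs=10, verbose=2,
                    validation_split=0.2)

# Predict ratings for the test set
pred_ratings = model.predict(test_model_input, batch_size=256)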
```
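For intuition about what the FM part of DeepFM contributes, its second-order term sums the dot products of the embedding vectors of every pair of active features. The pure-Python sketch below, which assumes made-up 4-dimensional embeddings for three features, shows the standard identity that computes this sum in linear time and checks it against the brute-force pairwise sum. DeepCTR performs this step internally; the function names here are illustrative only.

```python
from itertools import combinations

def fm_pairwise(embeddings):
    """Brute force: sum of dot products over all pairs of feature embeddings."""
    return sum(sum(a * b for a, b in zip(u, v)) for u, v in combinations(embeddings, 2))

def fm_fast(embeddings):
    """Same value via 0.5 * sum_k((sum_i v_ik)^2 - sum_i v_ik^2)."""
    total = 0.0
    for dim in zip(*embeddings):  # iterate over the embedding dimensions
        s = sum(dim)
        total += s * s - sum(x * x for x in dim)
    return 0.5 * total

# Hypothetical embeddings (embedding_dim=4) for three active features
embeddings = [
    [0.5, -1.0, 0.0, 2.0],   # e.g. user_id
    [1.0, 0.5, -0.5, 1.0],   # e.g. movie_id
    [0.0, 2.0, 1.0, -1.0],   # e.g. a third sparse field
]
print(fm_pairwise(embeddings), fm_fast(embeddings))  # -2.5 -2.5
```

The linear-time form is what makes the interaction term cheap even when many sparse fields are present.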
3. Given the movie title entered by the user, output five recommendations.
```Python
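# DeepFM scores (user, movie) pairs, so as a simple heuristic the input title is
# turned into a group of users: those who rated that movie 4.0 or higher
movie_title = 'Toy Story (1995)'
query_id = movies.loc[movies['title'] == movie_title, 'movieId'].values[0]
fans = ratings.loc[(ratings['movieId'] == query_id) & (ratings['rating'] >= 4.0), 'user_id'].unique()[:50]

# Candidates: every encoded movie except the input movie itself
candidates = np.setdiff1d(np.arange(len(le_movie.classes_)), le_movie.transform([query_id]))

# Build the input data: one row per (fan, candidate) pair, using the latest training timestamp
input_data = {'user_id': np.repeat(fans, len(candidates)),
              'movie_id': np.tile(candidates, len(fans)),
              'timestamp': np.full(len(fans) * len(candidates), train_ratings['timestamp'].max())}

# Predict a rating for each pair and average the predictions per candidate movie
scores = model.predict(input_data, batch_size=4096).reshape(len(fans), len(candidates)).mean(axis=0)

# Take the five highest-scoring movies and map the encoded indices back to movieId
top5 = candidates[np.argsort(-scores)[:5]]
recommendations = pd.DataFrame({'movieId': le_movie.inverse_transform(top5)})
recommendations = recommendations.merge(movies[['movieId', 'title', 'genres']], on='movieId', how='left')
recommendations = recommendations.rename(columns={'title': 'movie_title'})[['movieId', 'movie_title', 'genres']]

# Print the result
print(recommendations)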
```
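Step 3 looks the title up by exact match, which raises an `IndexError` when the user's input differs from the catalogue spelling (MovieLens titles carry the release year, as in `Toy Story (1995)`). One possible way to make the lookup more forgiving is sketched below with the standard-library `difflib`. The helper name `resolve_title`, the toy title list, and the 0.6 cutoff are assumptions for illustration.

```python
import difflib

def resolve_title(query, titles, cutoff=0.6):
    """Map user input to a catalogue title, or return None if nothing is close."""
    if query in titles:                                # 1. exact match
        return query
    lowered = query.lower()
    partial = [t for t in titles if lowered in t.lower()]
    if partial:                                        # 2. case-insensitive substring
        return min(partial, key=len)
    close = difflib.get_close_matches(query, titles, n=1, cutoff=cutoff)
    return close[0] if close else None                 # 3. closest spelling, if any

titles = ['Toy Story (1995)', 'Toy Story 2 (1999)', 'Jumanji (1995)', 'Heat (1995)']
print(resolve_title('toy story', titles))       # Toy Story (1995)
print(resolve_title('Jumanji (1996)', titles))  # Jumanji (1995)
print(resolve_title('zzzz', titles))            # None
```

With the real data, `titles` would be `movies['title'].tolist()`, and a `None` result is the point at which to ask the user for a different title instead of calling the model.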
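The output looks like the following (the exact movies depend on the dataset version and on the training run):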
```
   movieId                movie_title                       genres
0      318  Shawshank Redemption, The                  Crime|Drama
1      858             Godfather, The                  Crime|Drama
2     1221    Godfather: Part II, The                  Crime|Drama
3      527           Schindler's List                    Drama|War
4     2959                 Fight Club  Action|Crime|Drama|Thriller
```
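That is the full process for recommending movies with DeepFM on the MovieLens dataset: given a movie title entered by the user, it produces five recommendations, each showing the movie ID, title, and genres.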