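Write a movie recommender based on DeepFM using the MovieLens dataset: take a movie title entered by the user, make recommendations based on that input, and output five recommended movies, showing each movie's ID, title, and genres.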
Posted: 2024-05-11 10:19:14
Below is code that uses a DeepFM model trained on the MovieLens dataset to recommend movies:
```python
import re

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from deepctr.models import DeepFM
from deepctr.feature_column import SparseFeat, DenseFeat, get_feature_names

# Load the dataset
data = pd.read_csv('data/ratings.csv')
movies = pd.read_csv('data/movies.csv')

# Data preprocessing
# Map movie IDs to contiguous integers. Fit on movies.csv (which lists every movie)
# so that movies nobody rated still get a valid index.
movie_encoder = LabelEncoder()
movies['movie_id'] = movie_encoder.fit_transform(movies['movieId'])
data['movie_id'] = movie_encoder.transform(data['movieId'])
# Map user IDs to contiguous integers
user_encoder = LabelEncoder()
data['user_id'] = user_encoder.fit_transform(data['userId'])

# Extract the release year from the title, e.g. "Toy Story (1995)" -> 1995;
# titles without a year get the median year
movies['year'] = pd.to_numeric(movies['title'].str.extract(r'\((\d{4})\)', expand=False),
                               errors='coerce')
movies['year'] = movies['year'].fillna(movies['year'].median())

# Multi-hot encode the genres. Names are sanitized ("Sci-Fi" -> "genre_Sci_Fi",
# "(no genres listed)" -> "genre_no_genres_listed") because Keras input names
# must not contain spaces or parentheses.
genre_dummies = movies['genres'].str.get_dummies(sep='|')
genre_dummies.columns = ['genre_' + re.sub(r'\W+', '_', g).strip('_') for g in genre_dummies.columns]
genres = list(genre_dummies.columns)
movies = pd.concat([movies, genre_dummies], axis=1)

# Scale the dense (movie-level) features to [0, 1]; the 'genres' text column is kept for display
sparse_features = ['user_id', 'movie_id']
dense_features = ['year'] + genres
mms = MinMaxScaler(feature_range=(0, 1))
movies[dense_features] = mms.fit_transform(movies[dense_features])

# Convert ratings into binary labels: 1 if rating >= 3, otherwise 0
data['rating'] = (data['rating'] >= 3).astype(int)

# Attach movie features to every rating, then split by time: the earliest 80% is the training set
data = data.merge(movies[['movie_id'] + dense_features], on='movie_id', how='left')
data = data.sort_values('timestamp')
train_size = int(len(data) * 0.8)
train_data = data.iloc[:train_size]
test_data = data.iloc[train_size:]

# Define the feature columns
fixlen_feature_columns = (
    [SparseFeat('user_id', vocabulary_size=len(user_encoder.classes_), embedding_dim=4),
     SparseFeat('movie_id', vocabulary_size=len(movie_encoder.classes_), embedding_dim=4)]
    + [DenseFeat(feat, 1) for feat in dense_features])
dnn_feature_columns = fixlen_feature_columns
linear_feature_columns = fixlen_feature_columns
feature_names = get_feature_names(linear_feature_columns + dnn_feature_columns)

# DeepCTR models take a dict {feature name: array} as input
def to_model_input(df):
    return {name: df[name].values for name in feature_names}

x_train, y_train = to_model_input(train_data), train_data['rating'].values
x_test, y_test = to_model_input(test_data), test_data['rating'].values

# Define the model
model = DeepFM(linear_feature_columns, dnn_feature_columns, task='binary')
# Train the model
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=['binary_crossentropy'])
model.fit(x_train, y_train, batch_size=256, epochs=10, validation_data=(x_test, y_test), verbose=2)

# Recommend movies
# Grab the learned movie_id embedding matrix shared by the FM and DNN parts. DeepCTR's internal
# layer names vary between versions (assumption), so look for a layer whose name mentions
# movie_id and whose vectors have more than one dimension (the linear part's embedding has dim 1).
movie_emb = None
for layer in model.layers:
    weights = layer.get_weights()
    if 'movie_id' in layer.name and weights and weights[0].ndim == 2 and weights[0].shape[1] > 1:
        movie_emb = weights[0]
        break
assert movie_emb is not None, 'movie_id embedding layer not found'

# Only recommend movies whose embedding was actually updated during training
seen = np.zeros(len(movie_emb), dtype=bool)
seen[train_data['movie_id'].unique()] = True
# L2-normalize so that a dot product equals cosine similarity
unit = movie_emb / np.maximum(np.linalg.norm(movie_emb, axis=1, keepdims=True), 1e-12)
movies_by_idx = movies.set_index('movie_id')

# Read a title from the user (case-insensitive substring match)
query = input('Enter a movie title: ').strip().lower()
matches = movies[movies['title'].str.lower().str.contains(query, regex=False)]
if matches.empty:
    print('No movie matches that title.')
else:
    query_idx = int(matches['movie_id'].iloc[0])
    print(f"Recommendations for: {matches['title'].iloc[0]}")
    scores = unit @ unit[query_idx]  # cosine similarity to the input movie
    scores[query_idx] = -np.inf      # never recommend the input movie itself
    scores[~seen] = -np.inf
    for idx in np.argsort(-scores)[:5]:
        row = movies_by_idx.loc[idx]
        print(f"Movie ID: {row['movieId']}, Title: {row['title']}, Genres: {row['genres']}")
```
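The code first loads the MovieLens dataset and uses LabelEncoder to map user IDs and movie IDs to contiguous integers. It then extracts each movie's release year from its title, adds it to `movies`, and converts the genres into a binary (multi-hot) vector. Next, it converts the ratings into binary labels (1 for a rating of 3 or higher, 0 otherwise) and splits the data into training and test sets. It then defines the feature columns: sparse features (user ID and movie ID) and dense features (year and genres). After defining the model, it trains it with the Adam optimizer and binary cross-entropy loss. Finally, it uses the trained model to recommend movies: each movie is represented by its learned embedding vector, movies are ranked by the similarity of their embeddings to the movie whose title the user entered, and the ID, title, and genres of the top five are printed.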
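To see what the preprocessing steps produce, here is a small sketch on a hand-made three-movie sample in the `movies.csv` / `ratings.csv` column layout. The rows are invented for illustration, not taken from MovieLens, and the multi-hot encoding uses pandas `str.get_dummies` rather than the per-genre loop; the `genre_` prefix and name sanitizing are this sketch's own convention.

```python
import re

import pandas as pd

# Invented sample rows in the MovieLens movies.csv / ratings.csv column layout
movies = pd.DataFrame({
    'movieId': [1, 2, 3],
    'title': ['Toy Story (1995)', 'Blade Runner (1982)', 'Untitled Project'],
    'genres': ['Adventure|Animation|Children', 'Sci-Fi|Thriller', '(no genres listed)'],
})
ratings = pd.DataFrame({'userId': [10, 10, 11], 'movieId': [1, 2, 3], 'rating': [4.5, 2.0, 3.0]})

# Release year: the four digits inside parentheses; titles without one become NaN
movies['year'] = pd.to_numeric(
    movies['title'].str.extract(r'\((\d{4})\)', expand=False), errors='coerce')

# Multi-hot genres, with names sanitized so they are safe to use as feature names
genre_dummies = movies['genres'].str.get_dummies(sep='|')
genre_dummies.columns = ['genre_' + re.sub(r'\W+', '_', c).strip('_') for c in genre_dummies.columns]
movies = pd.concat([movies, genre_dummies], axis=1)

# Binary label: 1 if the rating is at least 3, otherwise 0
ratings['label'] = (ratings['rating'] >= 3).astype(int)

print(movies[['title', 'year'] + list(genre_dummies.columns)])
print(ratings)
```

Each genre becomes its own 0/1 column, so a movie with several genres simply has several ones in its row.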
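DeepFM combines a factorization machine (FM) with a DNN, and both parts share the same feature embeddings (here the 4-dimensional `user_id` and `movie_id` vectors). The FM part scores every pair of features by the dot product of their embeddings. The NumPy sketch below illustrates that second-order term from scratch; it is not DeepCTR's implementation, and `fm_second_order` / `fm_second_order_naive` are hypothetical helper names. It checks the standard O(nk) rewrite against the direct pairwise sum.

```python
import numpy as np


def fm_second_order(v: np.ndarray, x: np.ndarray) -> float:
    """Pairwise-interaction term of a factorization machine.

    v: (n_features, k) latent vectors, x: (n_features,) feature values.
    Uses sum_{i<j} <v_i, v_j> x_i x_j
       = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ].
    """
    xv = v * x[:, None]                    # scale each latent vector by its feature value
    square_of_sum = xv.sum(axis=0) ** 2    # shape (k,)
    sum_of_square = (xv ** 2).sum(axis=0)  # shape (k,)
    return float(0.5 * (square_of_sum - sum_of_square).sum())


def fm_second_order_naive(v: np.ndarray, x: np.ndarray) -> float:
    """Same quantity computed directly over all pairs i < j (O(n^2 k))."""
    total = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            total += (v[i] @ v[j]) * x[i] * x[j]
    return float(total)


rng = np.random.default_rng(0)
v = rng.normal(size=(6, 4))  # 6 features, embedding_dim=4
x = rng.random(6)
print(fm_second_order(v, x), fm_second_order_naive(v, x))
```

For one-hot sparse fields such as `user_id` and `movie_id`, the active feature in each field has x_i = 1, so the term reduces to dot products between the looked-up field embeddings.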
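The final ranking step (movies with similar embeddings are recommended together) can be shown in isolation on a toy matrix. `top_k_similar` is a hypothetical helper, not part of DeepCTR: it L2-normalizes the rows, scores each row by cosine similarity to the query row, excludes the query itself and any masked-out rows, and returns the indices of the top k. In the recommender, the matrix would be the trained `movie_id` embedding weights and `query_idx` the encoded ID of the title the user typed.

```python
from typing import Optional

import numpy as np


def top_k_similar(embeddings: np.ndarray, query_idx: int, k: int = 5,
                  allowed: Optional[np.ndarray] = None) -> np.ndarray:
    """Indices of the k rows most cosine-similar to row query_idx, excluding itself."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)  # L2-normalize each row
    scores = unit @ unit[query_idx]               # cosine similarity to the query row
    scores[query_idx] = -np.inf                   # never return the query itself
    if allowed is not None:
        scores[~allowed] = -np.inf                # e.g. skip rows that were never trained
    return np.argsort(-scores)[:k]


# Toy 2-D "embeddings"
emb = np.array([
    [1.0, 0.0],   # 0: the query
    [0.9, 0.1],   # 1: almost the same direction
    [0.0, 1.0],   # 2: orthogonal
    [-1.0, 0.0],  # 3: opposite direction
    [0.5, 0.5],   # 4: 45 degrees away
])
print(top_k_similar(emb, query_idx=0, k=3))  # → [1 4 2]
```

Cosine similarity compares only the direction of the vectors, not their length. The `allowed` mask matters in practice because movies with no training ratings keep their random initial embeddings.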