How to build multiple models in Python to predict ratings on the TMDB movie chart dataset
Date: 2024-01-14 08:03:28
Below is a Python implementation in four steps: data preprocessing, model training, model evaluation, and model comparison.
1. Data preprocessing
```python
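import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# Load the dataset
data = pd.read_csv('tmdb_movies.csv')
# Feature engineering: choose the numeric predictors and the target
features = ['budget', 'popularity', 'runtime', 'vote_count']
target = 'vote_average'
# Data cleaning: drop rows with missing values only in the columns we use,
# so that sparse, unrelated columns do not discard usable rows
data = data.dropna(subset=features + [target])
X = data[features]
y = data[target]
# Split into training and test sets (70% / 30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Standardize the features; fit the scaler on the training set only
# so that test-set statistics do not leak into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)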
```
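To see why the cleaning and scaling details matter, here is a minimal sketch on synthetic data. The `demo` frame, its values, and the fully empty `homepage` column are assumptions made up for illustration, not the real TMDB file. It compares dropping rows on every column with dropping rows only on the columns the models use, and checks that a scaler fitted on the training rows centers those rows at zero.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the TMDB columns (all values are made up)
rng = np.random.default_rng(42)
n = 200
demo = pd.DataFrame({
    'budget': rng.uniform(1e6, 2e8, n),
    'popularity': rng.uniform(0, 100, n),
    'runtime': rng.uniform(60, 180, n),
    'vote_count': rng.integers(10, 5000, n).astype(float),
    'vote_average': rng.uniform(1, 10, n),
    'homepage': [None] * n,  # hypothetical unrelated column, entirely missing
})
demo.loc[:4, 'runtime'] = np.nan  # five missing values in a column we do use

features = ['budget', 'popularity', 'runtime', 'vote_count']
target = 'vote_average'

# Dropping on every column removes all rows; dropping on the used columns keeps most
rows_all_columns = len(demo.dropna())
clean = demo.dropna(subset=features + [target])
print(rows_all_columns, len(clean))

X_tr, X_te, y_tr, y_te = train_test_split(
    clean[features], clean[target], test_size=0.3, random_state=42
)

# Fit the scaler on the training rows only, then apply it to both splits
scaler = StandardScaler().fit(X_tr)
X_tr_s = scaler.transform(X_tr)
X_te_s = scaler.transform(X_te)
print(X_tr_s.mean(axis=0).round(6))  # training columns are centered near zero
```

The test split is transformed with the training statistics, so its column means are close to zero but not exactly zero, which is expected.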
2. Build the models
```python
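from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
# Decision tree regressor
dt = DecisionTreeRegressor(random_state=0)
dt.fit(X_train, y_train)
# Random forest regressor
rf = RandomForestRegressor(random_state=0)
rf.fit(X_train, y_train)
# Support vector regressor (sensitive to feature scale, hence the standardization step)
svm = SVR()
svm.fit(X_train, y_train)
# Neural network (multilayer perceptron); max_iter is raised from the default 200
# to make a convergence warning less likely
nn = MLPRegressor(max_iter=1000, random_state=0)
nn.fit(X_train, y_train)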
```
3. Model evaluation
```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
# Evaluate every fitted model on the held-out test set
models = {
    'Decision Tree': dt,
    'Random Forest': rf,
    'SVM': svm,
    'Neural Network': nn,
}
for name, model in models.items():
    y_pred = model.predict(X_test)
    # Take the square root explicitly: the `squared=False` argument was
    # deprecated in scikit-learn 1.4 and removed in later releases
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    r2 = r2_score(y_test, y_pred)
    # Print the evaluation result for this model
    print(f'{name} RMSE: {rmse:.3f}, R2: {r2:.3f}')
```
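The two metrics reported above are easier to interpret once computed by hand. The following is a small worked example on four made-up ratings (the arrays are illustrative, not TMDB data). It computes RMSE and R2 directly from their definitions and compares them with the scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Four made-up true ratings and predictions
y_true = np.array([6.0, 7.0, 8.0, 5.0])
y_pred = np.array([6.5, 6.5, 7.0, 6.0])

# RMSE: square root of the mean squared residual (same units as the rating)
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))

# R2: 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum((y_true - y_pred) ** 2)          # 0.25 + 0.25 + 1 + 1 = 2.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 0.25 + 0.25 + 2.25 + 2.25 = 5.0
r2_manual = 1 - ss_res / ss_tot                  # 1 - 2.5 / 5.0 = 0.5

print(f'RMSE: {rmse_manual:.3f}, R2: {r2_manual:.3f}')

# The library functions agree with the manual computation
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
r2_sklearn = r2_score(y_true, y_pred)
```

An R2 of 0.5 here means the predictions remove half of the squared error that a constant prediction at the mean rating would make.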
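4. Compare the results
Based on the evaluation results, select the best-performing model (lower RMSE, higher R2), or combine several models into an ensemble to improve prediction accuracy. Cross-validation gives a more reliable comparison than a single train/test split, because each model is scored on several different held-out folds.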
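As one possible way to carry out this comparison, the sketch below scores several candidates with 5-fold cross-validation and adds a simple averaging ensemble. It runs on synthetic data from `make_regression` as a stand-in for the TMDB features. The names `X_demo`, `y_demo`, `base_models`, and `cv_rmse` are illustrative, `VotingRegressor` is just one fusion technique among several, and the neural network is left out only to keep the example fast.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; replace with the TMDB feature matrix and ratings
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

# Each candidate is a pipeline, so the scaler is re-fitted inside every CV fold
base_models = {
    'dt': make_pipeline(StandardScaler(), DecisionTreeRegressor(random_state=0)),
    'rf': make_pipeline(StandardScaler(), RandomForestRegressor(n_estimators=50, random_state=0)),
    'svm': make_pipeline(StandardScaler(), SVR()),
}

# Simple model fusion: average the predictions of the base models
candidates = dict(base_models)
candidates['voting'] = VotingRegressor(estimators=list(base_models.items()))

# Mean cross-validated RMSE per candidate (the scorer returns negated RMSE)
cv_rmse = {}
for name, model in candidates.items():
    scores = cross_val_score(
        model, X_demo, y_demo, cv=5, scoring='neg_root_mean_squared_error'
    )
    cv_rmse[name] = -scores.mean()

# Report candidates from best (lowest RMSE) to worst
for name, rmse in sorted(cv_rmse.items(), key=lambda item: item[1]):
    print(f'{name}: CV RMSE = {rmse:.3f}')

best_name = min(cv_rmse, key=cv_rmse.get)
print('Best candidate:', best_name)
```

Which candidate wins on the synthetic data says nothing about the TMDB ratings. The point is the procedure: the same folds and the same metric for every model, with preprocessing kept inside the pipeline.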