Building and Training a Random Forest Model
Time: 2023-11-30 18:41:55
The steps for building and training a random forest model are as follows:
1. Import the required libraries and load the dataset
```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
data = pd.read_csv('data.csv')
```
2. Apply basic preprocessing, such as filling missing values and converting data types
```python
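# Convert the string-valued feature to numeric first, while the original labels are intact
# (filling with 0 beforehand would turn missing entries into 0, which the mapping does not know)
data['sex'] = data['sex'].map({'male': 1, 'female': 0})
# Fill the remaining missing values with 0
data.fillna(0, inplace=True)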
```
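Filling every gap with 0 is simple, but for a column such as `age` it introduces values that never occur in practice. As a hedged alternative, the sketch below fills numeric columns with their median and the encoded column with its most frequent value. The `toy` frame and its values are invented for illustration and only stand in for `data.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for data.csv (values invented for illustration).
toy = pd.DataFrame({
    'sex': ['male', 'female', None, 'female'],
    'age': [22.0, np.nan, 35.0, 41.0],
    'fare': [7.25, 71.28, np.nan, 8.05],
})

# Encode the categorical column; missing labels stay NaN for now.
toy['sex'] = toy['sex'].map({'male': 1, 'female': 0})

# Fill numeric columns with their median instead of 0.
toy['age'] = toy['age'].fillna(toy['age'].median())
toy['fare'] = toy['fare'].fillna(toy['fare'].median())

# Fill the encoded column with its most frequent value (the mode).
toy['sex'] = toy['sex'].fillna(toy['sex'].mode()[0])

print(toy)
```

Whether median or mode imputation suits a given column depends on the data, so treat this as one option rather than a rule.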
3. Select the target and the features
```python
# Select the target column and the feature columns
target = data['survived']
features = data[['pclass', 'sex', 'age', 'fare']]
```
4. Split the dataset into a training set and a test set
```python
# Hold out 30% of the rows as the test set; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3, random_state=42)
```
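When the classes are imbalanced, a purely random split can leave the training and test sets with different class ratios. The sketch below shows the `stratify` parameter of `train_test_split`, which keeps the ratio roughly equal in both parts. The names `features_demo` and `target_demo` and their contents are made up for this example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 30% positive labels.
features_demo = pd.DataFrame({'x': range(100)})
target_demo = pd.Series([1] * 30 + [0] * 70)

# stratify=target_demo asks for the same class ratio in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    features_demo, target_demo,
    test_size=0.3, random_state=42, stratify=target_demo,
)

# Compare the share of positive labels in each split.
print('train positive rate:', y_tr.mean())
print('test positive rate:', y_te.mean())
```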
5. Build the random forest model and train it
```python
# Build a forest of 100 trees, each limited to depth 5, and fit it on the training set
rfc = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
rfc.fit(X_train, y_train)
```
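After fitting, a `RandomForestClassifier` exposes `feature_importances_`, which gives a rough view of which columns the trees relied on. The sketch below uses synthetic data (an assumption, not the article's dataset) in which the label is copied from `sex`, so that column should come out on top.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real features (values are random, for illustration only).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'pclass': rng.integers(1, 4, size=200),
    'sex': rng.integers(0, 2, size=200),
    'age': rng.uniform(1, 80, size=200),
    'fare': rng.uniform(5, 100, size=200),
})
# Synthetic label that depends only on 'sex'.
y_demo = X_demo['sex']

model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(X_demo, y_demo)

# Pair each importance score with its column name and sort from high to low.
importances = pd.Series(model.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))
```

These impurity-based scores are only a first indication; they can be biased toward features with many distinct values.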
6. Save the model
```python
# Serialize the trained model to disk
import joblib
joblib.dump(rfc, 'rfc_model.pkl')
```
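A saved model is only useful if it can be read back. As a minimal sketch, the example below round-trips a small forest through `joblib.dump` and `joblib.load` and checks that the restored model predicts the same labels. The training data is random and the temporary directory is used only to keep the example self-contained.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Small synthetic training set (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = (X_demo[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_demo, y_demo)

# Write the model to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'rfc_model.pkl')
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The restored model should give identical predictions.
same_predictions = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print('Predictions match after reload:', same_predictions)
```

Files written by `joblib` are pickle-based, so load them only from sources you trust, and ideally with the same scikit-learn version that wrote them.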
7. Predict on the test set and print metrics such as accuracy, recall, and F1 score
```python
# Predict on the test set and print accuracy, recall, and F1 score
from sklearn.metrics import accuracy_score, recall_score, f1_score
y_pred = rfc.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))
print('Recall:', recall_score(y_test, y_pred))
print('F1-score:', f1_score(y_test, y_pred))
```
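Accuracy, recall, and F1 do not show precision or where the errors fall. As a hedged complement, the sketch below computes `precision_score`, a confusion matrix, and a `classification_report` on a small hand-written pair of label lists (invented for illustration, not produced by the model above).

```python
from sklearn.metrics import classification_report, confusion_matrix, precision_score

# Hypothetical true and predicted labels for eight samples.
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

# Precision: of the samples predicted positive, the share that is truly positive.
precision = precision_score(y_true_demo, y_pred_demo)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
matrix = confusion_matrix(y_true_demo, y_pred_demo)

print('Precision:', precision)
print(matrix)
# Per-class precision, recall, F1, and support in one table.
print(classification_report(y_true_demo, y_pred_demo))
```

With the real model, `y_test` and `y_pred` would take the place of the two demo lists.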