Random Forest Simulation Experiment in Jupyter
Date: 2023-10-14 15:28:47
1. Prepare the dataset
First, we need a dataset. Here we use the iris dataset that ships with sklearn.
```python
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
```
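Before going further, it can help to look at what was loaded. The following is a minimal sketch, assuming a standard scikit-learn installation; it only prints attributes that `load_iris()` provides.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# 150 samples, each described by 4 numeric features
print(X.shape)             # (150, 4)

# Names of the feature columns and of the three target classes
print(iris.feature_names)
print(iris.target_names)
```

The labels in `y` are the integers 0, 1 and 2, which index into `iris.target_names`.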
2. Split the dataset
Next, we split the dataset into a training set and a test set.
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
```
This splits the data into a 70% training set and a 30% test set. Setting random_state=0 makes the split itself reproducible.
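To check the split, the sketch below prints the resulting sizes. It also shows an optional variant using the `stratify` parameter, which is not part of the original walkthrough and is included here only as an assumption about what you might want: it keeps the class proportions the same in both subsets.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as in the text: 30% of 150 samples go to the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)

# Optional variant (not used elsewhere in this article): stratify on y
# so that each class is equally represented in the test set
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
counts = [int((y_test_strat == label).sum()) for label in (0, 1, 2)]
print(counts)  # [15, 15, 15]
```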
3. Create the random forest model
```python
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators=10)
```
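This creates a random forest model made up of 10 decision trees. (If n_estimators is omitted, scikit-learn 0.22 and later default to 100 trees.)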
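Constructing the classifier only stores its hyperparameters; no trees exist until `fit` is called. A small sketch of that, where `random_state=0` is an illustrative addition that the original code does not set:

```python
from sklearn.ensemble import RandomForestClassifier

# random_state=0 is an assumption added here for reproducibility
rfc = RandomForestClassifier(n_estimators=10, random_state=0)

# The hyperparameters are stored on the estimator
params = rfc.get_params()
print(params["n_estimators"])  # 10

# The fitted trees live in `estimators_`, which does not exist before fit()
is_fitted = hasattr(rfc, "estimators_")
print(is_fitted)  # False
```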
4. Train the model
```python
rfc.fit(X_train, y_train)
```
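After training, the fitted forest can be inspected. The sketch below is one way to do that, assuming the same split as above; the seed passed to the classifier is an illustrative assumption. It counts the trained trees and prints the impurity-based feature importances.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

# random_state=0 on the classifier is an assumption, added for repeatability
rfc = RandomForestClassifier(n_estimators=10, random_state=0)
rfc.fit(X_train, y_train)

# One fitted decision tree per estimator
n_trees = len(rfc.estimators_)
print(n_trees)  # 10

# Importances are normalized so that they sum to 1
importances = rfc.feature_importances_
for name, value in zip(iris.feature_names, importances):
    print(f"{name}: {value:.3f}")
```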
5. Predict
```python
y_pred = rfc.predict(X_test)
```
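Besides hard class labels, a random forest can also report class probabilities through `predict_proba`. A minimal sketch, assuming the same data split and an illustrative seed on the classifier:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The seed is an assumption added so repeated runs give the same forest
rfc = RandomForestClassifier(n_estimators=10, random_state=0)
rfc.fit(X_train, y_train)

y_pred = rfc.predict(X_test)

# One row per test sample, one column per class
proba = rfc.predict_proba(X_test)
print(proba.shape)  # (45, 3)

# The predicted label is the class with the highest averaged probability
labels_from_proba = rfc.classes_[proba.argmax(axis=1)]
print((labels_from_proba == y_pred).all())  # True
```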
6. Evaluate the model
```python
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
```
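Accuracy, the fraction of test samples that are classified correctly, is used here to evaluate the model's performance.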
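To make the metric concrete, the sketch below recomputes accuracy by hand and compares it with `accuracy_score`. It also builds a confusion matrix, which goes beyond the original walkthrough and is shown only as one possible extra check. The classifier seed is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The seed is an assumption, added so the numbers repeat between runs
rfc = RandomForestClassifier(n_estimators=10, random_state=0)
rfc.fit(X_train, y_train)
y_pred = rfc.predict(X_test)

# Accuracy is the share of predictions that equal the true label
manual_accuracy = float(np.mean(y_pred == y_test))
library_accuracy = accuracy_score(y_test, y_pred)
print(manual_accuracy, library_accuracy)

# Rows are true classes, columns are predicted classes;
# correct predictions sit on the diagonal
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)
```

The diagonal of the confusion matrix divided by its total gives the same accuracy value.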
Complete code:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load the iris features and class labels
iris = load_iris()
X = iris.data
y = iris.target

# Hold out 30% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Build and train a random forest of 10 trees
rfc = RandomForestClassifier(n_estimators=10)
rfc.fit(X_train, y_train)

# Predict on the test set and report the accuracy
y_pred = rfc.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
```
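Output (the exact value can vary between runs, because the classifier is created without a random_state):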
```
Accuracy: 0.9777777777777777
```
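If a repeatable result is wanted, one option is to seed the classifier as well as the split. The sketch below is an illustration under that assumption: `accuracy_for_seed` is a hypothetical helper, not part of the original code, and the seeds are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def accuracy_for_seed(seed):
    """Hypothetical helper: train a 10-tree forest with the given seed
    and return its accuracy on the fixed test set."""
    rfc = RandomForestClassifier(n_estimators=10, random_state=seed)
    rfc.fit(X_train, y_train)
    return accuracy_score(y_test, rfc.predict(X_test))


# The same seed yields the same forest, and therefore the same accuracy
first_run = accuracy_for_seed(0)
second_run = accuracy_for_seed(0)
print(first_run == second_run)  # True

# Different seeds may or may not change the score on such a small test set
for seed in range(3):
    print(seed, accuracy_for_seed(seed))
```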