A beginner-level machine learning example: complete code for modeling wine quality data with KNN, a decision tree, and a random forest
Sure. Below is a complete Python example that models the wine quality data with KNN, a decision tree, and a random forest.
First, import the required libraries:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
```
Next, load the data and preprocess it:
```python
# Load the dataset
data = pd.read_csv('winequality.csv')
# Split the dataset into features and target
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
# Convert the target into a binary variable (quality >= 6 -> 1, otherwise 0)
for i in range(len(y)):
    if y[i] < 6:
        y[i] = 0
    else:
        y[i] = 1
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Feature scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```
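Before training, it can be worth checking how balanced the two classes are after binarization, since accuracy alone can be misleading on skewed data. A minimal optional sketch, assuming y has already been converted to 0/1 as above:
```python
import numpy as np

# Count how many samples fall into each class (0 = lower quality, 1 = higher quality)
classes, counts = np.unique(y, return_counts=True)
for cls, cnt in zip(classes, counts):
    print(f"class {cls}: {cnt} samples ({cnt / len(y):.1%})")
```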
Then, build a KNN model and evaluate it on the test set:
```python
# KNN classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
# Predict on the test set
y_pred_knn = knn.predict(X_test)
# Report the model's accuracy
acc_knn = accuracy_score(y_test, y_pred_knn)
print("KNN accuracy:", acc_knn)
```
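The choice of n_neighbors=5 is somewhat arbitrary; cross-validation can help pick a better value. A minimal sketch using GridSearchCV (the candidate values of k below are just an illustrative assumption):
```python
from sklearn.model_selection import GridSearchCV

# Try a handful of odd k values and keep the one with the best cross-validated accuracy
param_grid = {'n_neighbors': [3, 5, 7, 9, 11]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)
print("Best k:", grid.best_params_['n_neighbors'])
print("Best CV accuracy:", grid.best_score_)
```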
Next, build and evaluate a decision tree:
```python
# Decision tree classifier
dt = DecisionTreeClassifier(criterion='entropy', random_state=0)
dt.fit(X_train, y_train)
# Predict on the test set
y_pred_dt = dt.predict(X_test)
# Report the model's accuracy
acc_dt = accuracy_score(y_test, y_pred_dt)
print("Decision tree accuracy:", acc_dt)
```
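Accuracy alone hides how errors are distributed across the two classes. As an optional sketch, a confusion matrix and per-class report (here applied to the decision tree predictions from above) give a fuller picture:
```python
from sklearn.metrics import classification_report, confusion_matrix

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred_dt))
# Precision, recall, and F1 score for each class
print(classification_report(y_test, y_pred_dt))
```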
Finally, build and evaluate a random forest:
```python
# Random forest classifier
rf = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
rf.fit(X_train, y_train)
# Predict on the test set
y_pred_rf = rf.predict(X_test)
# Report the model's accuracy
acc_rf = accuracy_score(y_test, y_pred_rf)
print("Random forest accuracy:", acc_rf)
```
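One practical benefit of the random forest is its built-in feature importance estimates. A minimal sketch, assuming the original DataFrame's column names describe the features (with the last column being the quality target):
```python
# Map each importance score back to its column name and print them from highest to lowest
feature_names = data.columns[:-1]
for name, importance in sorted(zip(feature_names, rf.feature_importances_),
                               key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {importance:.3f}")
```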
The complete code is as follows:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Load the dataset
data = pd.read_csv('winequality.csv')
# Split the dataset into features and target
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
# Convert the target into a binary variable (quality >= 6 -> 1, otherwise 0)
for i in range(len(y)):
    if y[i] < 6:
        y[i] = 0
    else:
        y[i] = 1
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Feature scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# KNN classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
# Predict on the test set
y_pred_knn = knn.predict(X_test)
# Report the model's accuracy
acc_knn = accuracy_score(y_test, y_pred_knn)
print("KNN accuracy:", acc_knn)
# Decision tree classifier
dt = DecisionTreeClassifier(criterion='entropy', random_state=0)
dt.fit(X_train, y_train)
# Predict on the test set
y_pred_dt = dt.predict(X_test)
# Report the model's accuracy
acc_dt = accuracy_score(y_test, y_pred_dt)
print("Decision tree accuracy:", acc_dt)
# Random forest classifier
rf = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
rf.fit(X_train, y_train)
# Predict on the test set
y_pred_rf = rf.predict(X_test)
# Report the model's accuracy
acc_rf = accuracy_score(y_test, y_pred_rf)
print("Random forest accuracy:", acc_rf)
```