Implementing all eight steps in code
Date: 2024-12-15 17:16:05
Below is a Python implementation of all eight steps:
**Step 1: Data preprocessing**
```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
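# Load the data
df = pd.read_csv('white_wine.csv')
# Standardize the feature columns only; 'quality' is the class label and must stay discrete
feature_cols = df.columns.drop('quality')
scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df[feature_cols]), columns=feature_cols, index=df.index)
df_scaled['quality'] = df['quality']
df_scaled.to_csv('white2_wine.csv', index=False)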
```
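As a quick sanity check on the standardization step, the sketch below uses a small synthetic table (the column names and values are made up for illustration, not taken from `white_wine.csv`) to confirm that the scaled features end up with mean 0 and unit standard deviation while the `quality` labels are left alone.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine table (illustrative columns and values)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.2, 200),
    "pH": rng.normal(3.2, 0.15, 200),
    "quality": rng.integers(3, 10, 200),
})

# Scale every column except the label
feature_cols = demo.columns.drop("quality")
scaled = demo.copy()
scaled[feature_cols] = StandardScaler().fit_transform(demo[feature_cols])

# StandardScaler uses the population standard deviation, hence ddof=0
col_means = scaled[feature_cols].mean()
col_stds = scaled[feature_cols].std(ddof=0)
labels_unchanged = scaled["quality"].equals(demo["quality"])
print(col_means.round(6).to_dict(), col_stds.round(6).to_dict(), labels_unchanged)
```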
**Step 2: Split into training and test sets**
```python
from sklearn.model_selection import train_test_split
# Split the data
X = df_scaled.drop('quality', axis=1)  # features
y = df_scaled['quality']  # target variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
```
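Wine quality labels tend to be imbalanced, so one optional variation is a stratified split, which keeps the class proportions the same in both subsets. The sketch below is an assumption layered on top of the step above, shown on synthetic labels; `stratify` requires every class to have at least two samples.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 140 / 40 / 20 samples for classes 0 / 1 / 2
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = np.repeat([0, 1, 2], [140, 40, 20])

# stratify=y_demo preserves the 70% / 20% / 10% class mix in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=42, stratify=y_demo
)

test_counts = np.bincount(y_te)
print(test_counts)
```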
**Step 3: Build the classifiers**
```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
classifiers = [
    ('LR', LogisticRegression()),
    ('NB', GaussianNB()),
    ('DT', DecisionTreeClassifier(random_state=42)),
    ('RF', RandomForestClassifier(random_state=42)),
    # KMeans is a clustering algorithm, not a classifier: fit() ignores y_train
    ('KM', KMeans(n_clusters=2, random_state=42))
]
for name, clf in classifiers:
    model = clf.fit(X_train, y_train)
    # ... (save the model or continue with later steps)
```
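The loop above leaves "save the model" open. One common way to do it is with `joblib`, sketched here on a small synthetic dataset; the file name `DT_model.joblib` and the temporary directory are placeholders, not something the original specifies.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Small synthetic problem so the example runs on its own
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = (X_demo[:, 0] > 0).astype(int)

model = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "DT_model.joblib")  # hypothetical file name
    joblib.dump(model, path)      # serialize the fitted model to disk
    restored = joblib.load(path)  # load it back as a new object

# The restored model should reproduce the original predictions
same_predictions = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print(same_predictions)
```

In the loop from step 3, the same `joblib.dump(model, ...)` call could be made once per classifier, using `name` to build the file name.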
**Step 4: PCA dimensionality reduction**
```python
from sklearn.decomposition import PCA
pca = PCA(n_components=3)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
# Save the reduced training data
df_pca = pd.DataFrame(X_train_pca, columns=['PC1', 'PC2', 'PC3'])
df_pca.to_csv('whine3_wine.csv', index=False)
```
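Before modeling on three components, it can help to check how much variance they retain. The sketch below reads `explained_variance_ratio_` from a fitted `PCA`; the data is synthetic, so the printed numbers say nothing about the wine dataset.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data with 11 features, the first few given larger variance
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 11)) * np.linspace(3.0, 0.5, 11)

pca_demo = PCA(n_components=3).fit(X_demo)

# Fraction of total variance captured by each component, and the running total
ratios = pca_demo.explained_variance_ratio_
cumulative = np.cumsum(ratios)
print(ratios.round(3), cumulative.round(3))
```

If the cumulative ratio for three components is low, a larger `n_components` may be worth trying.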
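**Step 5: Model and evaluate after dimensionality reduction**
```python
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
accuracies = {}
for name, clf in classifiers:
    model = clf.fit(X_train_pca, y_train)
    # accuracy_score instead of clf.score: KMeans.score returns negative inertia, not accuracy.
    # KMeans cluster IDs are not quality labels, so its value here is not a meaningful accuracy.
    accuracies[name] = accuracy_score(y_test, model.predict(X_test_pca))
    print(f"{name} test-set accuracy after PCA: {accuracies[name]:.4f}")
# Plot the accuracies
plt.figure(figsize=(10, 6))
plt.bar(list(accuracies.keys()), list(accuracies.values()), label='Accuracy')
plt.xlabel('Classifier')
plt.ylabel('Accuracy')
plt.title('Test-set accuracy of each classifier after PCA')
plt.legend()
plt.show()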
```
**Step 6: Draw random validation data**
```python
random_data = df_scaled.sample(10).reset_index(drop=True)
print(random_data)
```
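**Step 7: Model prediction and visualization**
```python
from sklearn.base import clone
X_random = random_data.drop('quality', axis=1)
for name, clf in classifiers:
    # `classifiers` now holds PCA-fitted models (step 5), so fit a fresh copy on all features
    model = clone(clf).fit(X_train, y_train)
    predictions = model.predict(X_random)
    visualize_predictions(predictions, random_data['quality'], name)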
```
**Step 8: Prediction after dimensionality reduction**
```python
for name, clf in classifiers:
    # The models were fitted on the PCA components in step 5
    predictions_pca = clf.predict(X_test_pca)
    visualize_predictions_pca(predictions_pca, y_test, name)
```
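This assumes that `visualize_predictions` and `visualize_predictions_pca` are user-defined functions for visualizing the predictions. Remember to replace them with your actual implementations.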
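Since the two helpers are left undefined, here is one possible minimal sketch of them, assuming a simple per-sample line plot of predicted versus actual labels is what is wanted. The plot style, labels, and return value are illustrative choices, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt


def visualize_predictions(predictions, actual, name):
    """Plot predicted vs. actual labels per sample (hypothetical helper)."""
    predictions = np.asarray(predictions)
    actual = np.asarray(actual)
    idx = np.arange(len(actual))

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(idx, actual, "o-", label="Actual")
    ax.plot(idx, predictions, "x--", label="Predicted")
    ax.set_xlabel("Sample index")
    ax.set_ylabel("quality")
    ax.set_title(f"{name}: predicted vs. actual")
    ax.legend()
    return fig  # returned so the caller can show or save it


def visualize_predictions_pca(predictions, actual, name):
    """Same plot, with the title marked as coming from the PCA models."""
    return visualize_predictions(predictions, actual, f"{name} (PCA)")


# Example call with made-up labels
fig = visualize_predictions([5, 6, 6], [5, 6, 7], "DT")
fig_pca = visualize_predictions_pca([5, 6, 6], [5, 6, 7], "DT")
```

Calling `plt.show()` after the loops in steps 7 and 8 would display all generated figures at once.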