Code for splitting a CSV file into training, validation, and test sets, plus loss and accuracy curves for the training, validation, and test sets
Date: 2024-12-28 15:06:35
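In Python, you can use `sklearn` to split CSV data into training, validation, and test sets, then train a Keras model on it and plot the loss and accuracy curves. The steps are as follows.
First, import the required libraries: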
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt
```
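Then load the CSV data (replace `your_data.csv` and `target_column` with your own file name and label column):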
```python
data = pd.read_csv('your_data.csv')
X = data.drop('target_column', axis=1)
y = data['target_column']
```
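Next, with the features (X) and the target variable (y) separated as above, split the data into training, validation, and test sets in an 80% / 10% / 10% ratio: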
```python
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=42) # First, keep 80% of the data as the training set
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42) # Then split the remaining 20% evenly into validation and test sets
```
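To make the resulting proportions explicit, here is a minimal sketch that reproduces the same two-stage arithmetic (hold out 20%, then halve it) on row indices using only the standard library. The helper name `split_indices` and its parameters are hypothetical, and its shuffling is not the same as what `train_test_split` does internally, so the selected rows will differ; only the set sizes are meant to be comparable, and only when the row count divides evenly.

```python
import random

def split_indices(n_rows, rest_fraction=0.2, test_fraction_of_rest=0.5, seed=42):
    """Shuffle row indices and cut them into train / validation / test parts.

    Hypothetical helper for illustration; not part of sklearn.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched

    n_rest = round(n_rows * rest_fraction)           # rows held out from training
    n_test = round(n_rest * test_fraction_of_rest)   # share of held-out rows used for testing
    n_val = n_rest - n_test
    n_train = n_rows - n_rest

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 800 100 100
```

The three index lists are disjoint and together cover every row, which is the property the two chained `train_test_split` calls also provide.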
Suppose we build a simple example model with Keras:
```python
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=X_train.shape[1]))
model.add(Dense(1, activation='sigmoid')) # Adjust the activation function to suit your task
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Use the EarlyStopping callback to guard against overfitting
early_stopping = EarlyStopping(monitor='val_loss', patience=5)
# Train the model
history = model.fit(X_train, y_train, epochs=100, validation_data=(X_val, y_val), callbacks=[early_stopping])
```
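The `patience=5` setting above can be illustrated with a simplified sketch of patience-based stopping: training halts once the validation loss has failed to improve for five consecutive epochs. The function `early_stopping_epoch` is a hypothetical stand-in written for this explanation, not Keras code, and it ignores other callback options such as `min_delta`.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified, hypothetical model of patience-based early stopping.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement in validation loss
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # the loss kept improving often enough; no early stop

# Best loss is reached at epoch 3, followed by five epochs without improvement.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.5]
print(early_stopping_epoch(losses, patience=5))  # 8
```

Note that the final value `0.5` is never seen: stopping at epoch 8 means a later improvement is missed, which is the trade-off a small `patience` makes.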
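Finally, we can use `matplotlib` to plot the loss and accuracy curves. Note that `history` records per-epoch metrics for the training and validation sets only; the test set is not passed to `fit`, so it has no per-epoch curve in this code: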
```python
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Loss Curve')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Accuracy Curve')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```
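For readers who want to see what the two plotted quantities measure, here is a minimal standard-library sketch that computes binary cross-entropy and accuracy from a handful of labels and predicted probabilities. The function names, the clipping constant `eps`, and the sample values are assumptions made for illustration; Keras computes these metrics internally with its own implementation.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(1 for t, p in zip(y_true, y_prob) if (p > threshold) == bool(t))
    return correct / len(y_true)

# Made-up labels and sigmoid outputs for four samples.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]

print(binary_accuracy(y_true, y_prob))  # 0.5
print(round(binary_cross_entropy(y_true, y_prob), 4))
```

The first two samples are classified correctly and the last two are not, so accuracy is 0.5, while the loss also reflects how confident each prediction was.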