Splitting Images into Training and Test Sets
Date: 2024-02-03 18:02:59
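Here are two ways to split an image dataset into training and test sets:
1. Use Python's scikit-learn library. It supports stratified sampling, meaning the training and test sets keep the same class proportions as the full dataset. Example code: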
```python
import os
import shutil

from sklearn.model_selection import train_test_split

# Folder containing one subfolder per class
data_dir = 'path/to/data'

# Collect every image path together with its label (the subfolder name)
image_paths = []
labels = []
for label_name in sorted(os.listdir(data_dir)):
    label_dir = os.path.join(data_dir, label_name)
    if not os.path.isdir(label_dir):
        continue  # skip stray files at the top level
    for image_name in sorted(os.listdir(label_dir)):
        image_paths.append(os.path.join(label_dir, image_name))
        labels.append(label_name)

# Stratified 80/20 split; random_state (an added choice) makes it reproducible
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.2, stratify=labels, random_state=42
)

# Copy each subset into its own folder, keeping the per-class layout
train_dir = 'path/to/train'
test_dir = 'path/to/test'
for path, label in zip(train_paths, train_labels):
    label_dir = os.path.join(train_dir, label)
    os.makedirs(label_dir, exist_ok=True)
    shutil.copy(path, label_dir)
for path, label in zip(test_paths, test_labels):
    label_dir = os.path.join(test_dir, label)
    os.makedirs(label_dir, exist_ok=True)
    shutil.copy(path, label_dir)
```
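To make the effect of `stratify=labels` concrete, here is a minimal standard-library sketch of a stratified split. It is not scikit-learn's implementation; the function name `stratified_split` and the toy file names are hypothetical, chosen only for illustration. It groups items by label, shuffles each group with a fixed seed, and sends the same fraction of every group to the test set, so class proportions are preserved.

```python
import random
from collections import Counter, defaultdict


def stratified_split(items, labels, test_size=0.2, seed=0):
    """Split items so each label keeps roughly the same share in both subsets."""
    # Group the items by their label
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Take the same fraction from every class
        n_test = round(len(group) * test_size)
        test += [(item, label) for item in group[:n_test]]
        train += [(item, label) for item in group[n_test:]]
    return train, test


# Toy dataset: 80 cat images and 20 dog images (hypothetical file names)
items = [f"cat_{i}.jpg" for i in range(80)] + [f"dog_{i}.jpg" for i in range(20)]
labels = ["cat"] * 80 + ["dog"] * 20

train, test = stratified_split(items, labels, test_size=0.2)
train_counts = Counter(label for _, label in train)
test_counts = Counter(label for _, label in test)
print(train_counts)
print(test_counts)
```

Both subsets keep the original 4:1 cat-to-dog ratio: 64 cats and 16 dogs in training, 16 cats and 4 dogs in test.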
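2. Use the Keras ImageDataGenerator class, which supports data augmentation and splits the data on the fly while loading. Note that this produces a training subset and a validation subset from one directory instead of copying files into separate folders. Example code: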
```python
from keras.preprocessing.image import ImageDataGenerator

# Folder containing one subfolder per class
data_dir = 'path/to/data'

# Reserve 20% of the images for validation
data_generator = ImageDataGenerator(validation_split=0.2)

# Build the training and validation generators from the same directory
train_generator = data_generator.flow_from_directory(
    data_dir,
    target_size=(224, 224),
    batch_size=32,
    subset='training'
)
validation_generator = data_generator.flow_from_directory(
    data_dir,
    target_size=(224, 224),
    batch_size=32,
    subset='validation'
)

# Train the model. `model` is assumed to be a compiled Keras model defined
# elsewhere. fit_generator is deprecated; fit accepts generators directly.
model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size,
    epochs=10
)
```
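The `steps_per_epoch` and `validation_steps` expressions above use floor division, so when the sample count is not a multiple of the batch size, the final partial batch is not counted in each epoch. The sketch below works through that arithmetic; the helper name `steps_for` and the sample counts are hypothetical, picked only for illustration.

```python
import math


def steps_for(samples, batch_size, keep_partial_batch=False):
    """Number of batches per epoch for a given sample count."""
    if keep_partial_batch:
        # Round up so the last, smaller batch is included
        return math.ceil(samples / batch_size)
    # Round down, as in the snippet above
    return samples // batch_size


# Hypothetical counts: 820 training images, 205 validation images
train_samples, val_samples, batch_size = 820, 205, 32

floor_steps = steps_for(train_samples, batch_size)
ceil_steps = steps_for(train_samples, batch_size, keep_partial_batch=True)
skipped = train_samples - floor_steps * batch_size
print(floor_steps, ceil_steps, skipped)
```

With 820 training images and a batch size of 32, floor division gives 25 steps and leaves 20 images out of each epoch, while rounding up gives 26 steps.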