Reading an image dataset with os and splitting it into training and test sets
Date: 2024-12-16 22:16:09
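In Python, the `os` module can be used to walk through a folder and load image data, and the `train_test_split` function from `sklearn` can then split that data into a training set and a test set. A simple step-by-step example follows: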
1. First, import the required libraries:
```python
import os
from skimage.io import imread
import numpy as np
from sklearn.model_selection import train_test_split
```
2. Set the image folder path and describe its layout:
```python
image_folder = "path/to/images"  # replace with the actual image folder path
train_data_folder = os.path.join(image_folder, 'train')  # assumes a 'train' subdirectory under the image folder
test_data_folder = os.path.join(image_folder, 'test')  # likewise, assumes a 'test' subdirectory
# Collect the image file names from both folders
image_files = [f for f in os.listdir(train_data_folder) if f.endswith((".jpg", ".png"))] + \
              [f for f in os.listdir(test_data_folder) if f.endswith((".jpg", ".png"))]
```
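The file listing in this step can also be wrapped in a small helper. The following is a minimal sketch rather than part of the original answer: `list_image_files` is a hypothetical name, and returning full paths, skipping subdirectories, and matching extensions case-insensitively are all assumptions made for illustration.

```python
import os

def list_image_files(folder, extensions=(".jpg", ".png")):
    """Return sorted full paths of the image files directly inside `folder`."""
    paths = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        # Keep regular files only, and compare the extension case-insensitively
        if os.path.isfile(full_path) and name.lower().endswith(extensions):
            paths.append(full_path)
    return paths
```

Calling `list_image_files(train_data_folder)` would then give paths that can be passed to `imread` directly, without a second `os.path.join`.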
3. Load and preprocess the image data:
```python
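images = []
labels = []
for folder, label in zip([train_data_folder, test_data_folder], ['train', 'test']):
    # List each folder's own files; the combined image_files list mixes names
    # from both folders, so some joined paths would not exist
    for file in sorted(os.listdir(folder)):
        if not file.endswith((".jpg", ".png")):
            continue
        img_path = os.path.join(folder, file)
        img = imread(img_path)
        images.append(img)
        labels.append(label)  # the label is the name of the source subfolder
# Convert the lists to NumPy arrays (assumes all images share the same shape)
X = np.array(images)
y = np.array(labels)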
```
4. Split the data into a training set and a test set:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
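This produces two datasets: `X_train` and `y_train` are used to train the model, while `X_test` and `y_test` are used to evaluate its performance. With `test_size=0.2`, 20% of the samples go to the test set, and `random_state=42` makes the shuffle reproducible.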
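For large datasets it may be preferable to split the list of file paths before loading any pixels. The sketch below illustrates the idea with the standard library only; it is a stand-in for illustration, not how `train_test_split` is implemented. `split_paths` and the placeholder file names are hypothetical, and rounding the test count to the nearest integer is an assumption.

```python
import random

def split_paths(paths, test_size=0.2, seed=42):
    """Shuffle `paths` reproducibly and return (train_paths, test_paths)."""
    shuffled = list(paths)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, same result on every run
    n_test = round(len(shuffled) * test_size)  # test count, rounded to nearest integer
    return shuffled[n_test:], shuffled[:n_test]

file_paths = [f"img_{i:03d}.jpg" for i in range(10)]  # placeholder file names
train_paths, test_paths = split_paths(file_paths)
print(len(train_paths), len(test_paths))
```

Only the paths in each subset then need to be read with `imread`, which keeps the test images out of memory until evaluation.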