Code to randomly split 2,000 images into a test set and a validation set
Time: 2023-07-16 18:14:49
You can use the `train_test_split` function from Python's scikit-learn (`sklearn`) library to randomly split the image dataset. The steps are as follows:
1. Import sklearn and the other libraries needed:
```python
from sklearn.model_selection import train_test_split
import os
import random
```
2. Specify the path of the folder that contains the image dataset:
```python
dataset_path = '/path/to/dataset/folder'
```
3. Get the names of all image files:
```python
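# os.listdir returns every entry, so keep only image files (the extension list is an assumption; adjust it)
image_filenames = [f for f in os.listdir(dataset_path) if f.lower().endswith(('.jpg', '.jpeg', '.png'))]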
```
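4. Randomly shuffle the file names (optional: `train_test_split` already shuffles by default, since `shuffle=True`):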
```python
random.shuffle(image_filenames)
```
5. Split the dataset:
```python
# test_size=0.5: half of the names go to the second returned list
test_filenames, val_filenames = train_test_split(image_filenames, test_size=0.5)
```
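Here, `test_size` is the fraction of the data placed in the second returned list (what sklearn calls the "test" split). Setting it to 0.5 divides the dataset evenly, so the 2,000 images become two sets of 1,000 images each, one for testing and one for validation.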
6. Optionally, write each set's file names to a .txt file so they are easy to read back later.
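To see what step 5 does with 2,000 file names, here is a minimal sketch that uses hypothetical stand-in names (`img_0000.jpg` to `img_1999.jpg`) instead of a real folder. It also passes `random_state`, a real `train_test_split` parameter that the original code does not use, to make the split repeatable:

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the 2,000 names returned by os.listdir
image_filenames = [f'img_{i:04d}.jpg' for i in range(2000)]

# test_size=0.5 -> ceil(0.5 * 2000) = 1000 names go to the second list,
# the remaining 1000 to the first; shuffling happens by default (shuffle=True)
test_a, val_a = train_test_split(image_filenames, test_size=0.5, random_state=42)
test_b, val_b = train_test_split(image_filenames, test_size=0.5, random_state=42)

print(len(test_a), len(val_a))  # 1000 1000
# The same random_state gives the identical split on every run
print(test_a == test_b and val_a == val_b)  # True
# Every image lands in exactly one of the two sets
print(sorted(test_a + val_a) == image_filenames)  # True
```

If you keep a separate `random.shuffle` call before the split, seed it as well (for example with `random.seed(42)`); otherwise the input order, and therefore the split, changes between runs even with a fixed `random_state`.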
The complete code:
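```python
from sklearn.model_selection import train_test_split
import os
import random

dataset_path = '/path/to/dataset/folder'

# Keep only image files (the extension list is an assumption; adjust it to your data)
image_filenames = [f for f in os.listdir(dataset_path)
                   if f.lower().endswith(('.jpg', '.jpeg', '.png'))]

# Optional: train_test_split already shuffles by default (shuffle=True)
random.shuffle(image_filenames)

# Split 50/50 into a test set and a validation set
test_filenames, val_filenames = train_test_split(image_filenames, test_size=0.5)

# Write each set's file names to a text file, one name per line
with open('test.txt', 'w', encoding='utf-8') as f:
    for filename in test_filenames:
        f.write(filename + '\n')
with open('val.txt', 'w', encoding='utf-8') as f:
    for filename in val_filenames:
        f.write(filename + '\n')
```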
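Reading the lists back is the other half of step 6. A minimal sketch, assuming one file name per line as in the code above; `read_filenames` is a hypothetical helper, and the demo writes a few made-up names to a temporary file rather than using the real output:

```python
import os
import tempfile


def read_filenames(path):
    """Hypothetical helper: return the names stored one per line, skipping blank lines."""
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]


# Demo: write made-up names to a temporary file in the same one-per-line format
names = ['cat_001.jpg', 'dog_002.png', 'cat_003.jpg']
path = os.path.join(tempfile.mkdtemp(), 'val.txt')
with open(path, 'w', encoding='utf-8') as f:
    for filename in names:
        f.write(filename + '\n')

loaded = read_filenames(path)
print(loaded == names)  # True

# Rebuild full paths so the images can be opened later
dataset_path = '/path/to/dataset/folder'
image_paths = [os.path.join(dataset_path, name) for name in loaded]
print(image_paths[0])
```

`os.path.join` is used instead of string concatenation so the path separator matches the operating system.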