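Write a Python program that uses existing train.csv, test.csv, and val.csv files, which list image names and classes, to automatically sort images into training, test, and validation sets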
Time: 2024-11-18 09:16:48 Views: 19
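To write a Python script that sorts images into training, test, and validation sets according to `train.csv`, `test.csv`, and `val.csv`, you can follow these steps:
1. **Import the required libraries**[^2]: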
```python
import os
import pandas as pd
```
2. **Read the CSV files**[^1]:
```python
def read_csv_files(csv_files):
    """Load each CSV path into a DataFrame, preserving the input order."""
    return [pd.read_csv(path) for path in csv_files]

train_df, test_df, val_df = read_csv_files(['train.csv', 'test.csv', 'val.csv'])
```
3. **Merge the data and label each split**:
```python
# The three tables can only be merged if they share the same columns
if not (list(train_df.columns) == list(test_df.columns) == list(val_df.columns)):
    raise ValueError("Columns in CSV files are not compatible.")
# Tag every row with the split its CSV file represents
train_df = train_df.assign(split='train')
test_df = test_df.assign(split='test')
val_df = val_df.assign(split='val')
# One combined table, useful for inspecting the whole dataset
all_data = pd.concat([train_df, test_df, val_df], ignore_index=True)
```
4. **Move the images into per-split folders**:
```python
def split_images(image_folder, dataframe, split_column, dest_folders):
    """Move every image listed in dataframe into the folder for its split."""
    for split_name, images in dataframe.groupby(split_column):
        dest = dest_folders[split_name]
        os.makedirs(dest, exist_ok=True)
        for name in images['image_name']:
            # os.rename moves the file, so it no longer exists in image_folder
            os.rename(os.path.join(image_folder, name), os.path.join(dest, name))

split_image_folders = {'train': 'train_images', 'test': 'test_images', 'val': 'val_images'}
for df in (train_df, test_df, val_df):
    split_images('original_images_folder', df, 'split', split_image_folders)
```
5. **Verify the result**:
```python
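for df, folder in zip([train_df, test_df, val_df], split_image_folders.values()):
    # Compare the names listed in the CSV with the files now in the split folder
    moved = {f for f in os.listdir(folder) if os.path.isfile(os.path.join(folder, f))}
    assert set(df['image_name']) == moved, f"Images not properly split for {folder}"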
```
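Step 4 moves the files with `os.rename`, which empties the source folder and can fail when the source and destination are on different file systems. A minimal sketch of a copy-based alternative follows. The helper name `copy_images` is hypothetical, and it assumes the image names are flat file names with no subdirectories:

```python
import os
import shutil

def copy_images(image_folder, image_names, dest_folder):
    """Copy each named image into dest_folder; return the names not found."""
    os.makedirs(dest_folder, exist_ok=True)
    missing = []
    for name in image_names:
        src = os.path.join(image_folder, name)
        if not os.path.isfile(src):
            # Record the gap instead of aborting the whole run
            missing.append(name)
            continue
        # copy2 keeps the original file in place and preserves its metadata
        shutil.copy2(src, os.path.join(dest_folder, name))
    return missing
```

It could be called once per split, for example `copy_images('original_images_folder', train_df['image_name'], 'train_images')`, and the returned list shows which rows in the CSV have no matching file.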
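This script assumes each CSV file has an `image_name` column holding the image file name relative to the source image folder, and it uses a `split` column to record which set each row belongs to. Adjust the code to match your actual data layout.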
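Because the script trusts the CSV files as given, an image listed in more than one file would be moved once and then fail on the second attempt. A small sketch of a pre-check is shown here. The helper name `find_split_overlaps` is hypothetical, and it assumes image names identify images uniquely:

```python
from itertools import combinations

def find_split_overlaps(names_by_split):
    """Return {(split_a, split_b): sorted shared names} for each overlapping pair."""
    overlaps = {}
    for (a, names_a), (b, names_b) in combinations(names_by_split.items(), 2):
        shared = set(names_a) & set(names_b)
        if shared:
            overlaps[(a, b)] = sorted(shared)
    return overlaps
```

With the DataFrames from step 2 this might be used as `find_split_overlaps({'train': train_df['image_name'], 'test': test_df['image_name'], 'val': val_df['image_name']})`; an empty dictionary means no image appears in two sets.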