Python implementation: stratified sampling by class to split data into training, validation, and test sets
Date: 2023-09-22 20:07:59
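Below is an example in Python that uses stratified sampling by class to split a dataset into training, validation, and test sets. It assumes the dataset root contains one subdirectory per class, each holding that class's files: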
```python
import os
import random
import shutil

# Dataset root: one subdirectory per class, e.g. /path/to/dataset/cat/*.jpg
data_dir = '/path/to/dataset'

# Split ratios (must sum to 1)
train_ratio = 0.6
val_ratio = 0.2
test_ratio = 0.2
assert abs(train_ratio + val_ratio + test_ratio - 1.0) < 1e-9, "ratios must sum to 1"

# Fix the random seed so the split is reproducible
random.seed(42)

split_names = ('train', 'val', 'test')

# Class list: every subdirectory of data_dir, excluding the split output
# directories themselves (otherwise a second run would treat them as classes)
class_names = sorted(
    name for name in os.listdir(data_dir)
    if os.path.isdir(os.path.join(data_dir, name)) and name not in split_names
)

# Iterate over each class
for class_name in class_names:
    # Collect the file names (not subdirectories) in this class; sorting first
    # makes the seeded shuffle independent of os.listdir's arbitrary order
    class_dir = os.path.join(data_dir, class_name)
    file_names = sorted(
        f for f in os.listdir(class_dir)
        if os.path.isfile(os.path.join(class_dir, f))
    )
    # Shuffle within the class
    random.shuffle(file_names)
    # Compute split points; int() rounds down, so test receives the remainder
    num_files = len(file_names)
    num_train = int(num_files * train_ratio)
    num_val = int(num_files * val_ratio)
    splits = {
        'train': file_names[:num_train],
        'val': file_names[num_train:num_train + num_val],
        'test': file_names[num_train + num_val:],
    }
    # Create the target directories and move the files into them
    for split_name, files in splits.items():
        dst_dir = os.path.join(data_dir, split_name, class_name)
        os.makedirs(dst_dir, exist_ok=True)
        for file_name in files:
            shutil.move(os.path.join(class_dir, file_name),
                        os.path.join(dst_dir, file_name))
```
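The code above splits the dataset into a training set (60%), a validation set (20%), and a test set (20%), stratified by class: each class is shuffled and split separately, and its files are moved into `train/<class>`, `val/<class>`, and `test/<class>` under the dataset root. Because the split points are rounded down with `int()`, the test set absorbs any remainder, so for small classes the proportions are only approximate. Note that `shutil.move` relocates the original files; use `shutil.copy2` instead if the source directories must stay intact.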
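If you only need the split lists (for example, to feed a data loader) rather than reorganizing files on disk, the same per-class logic can be applied to in-memory data. The sketch below is an illustrative, non-destructive variant, not part of the original script; the function name `stratified_split` and its parameters are hypothetical. It groups samples by label, shuffles each class with a seeded `random.Random`, and takes the first `int(n * train_ratio)` items for training, the next `int(n * val_ratio)` for validation, and the rest for test.

```python
import random
from collections import Counter, defaultdict


def stratified_split(samples, labels, train_ratio=0.6, val_ratio=0.2, seed=None):
    """Hypothetical helper (not a library API): split (sample, label) pairs
    into train/val/test lists, stratified by label.

    Mirrors the per-class logic of the file-moving script, but only returns
    lists; nothing on disk is touched.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if train_ratio < 0 or val_ratio < 0 or train_ratio + val_ratio > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")

    rng = random.Random(seed)  # local RNG so the split is reproducible
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    train, val, test = [], [], []
    for label, items in by_class.items():
        items = items[:]  # copy so the caller's data is not shuffled in place
        rng.shuffle(items)
        n_train = int(len(items) * train_ratio)
        n_val = int(len(items) * val_ratio)
        train += [(s, label) for s in items[:n_train]]
        val += [(s, label) for s in items[n_train:n_train + n_val]]
        test += [(s, label) for s in items[n_train + n_val:]]  # remainder
    return train, val, test


# Usage: 10 "cat" samples and 7 "dog" samples
samples = [f"img_{i}.jpg" for i in range(17)]
labels = ["cat"] * 10 + ["dog"] * 7
train, val, test = stratified_split(samples, labels, seed=42)
print(sorted(Counter(label for _, label in train).items()))  # [('cat', 6), ('dog', 4)]
print(sorted(Counter(label for _, label in val).items()))    # [('cat', 2), ('dog', 1)]
print(sorted(Counter(label for _, label in test).items()))   # [('cat', 2), ('dog', 2)]
```

For the `dog` class, `int(7 * 0.6)` is 4 and `int(7 * 0.2)` is 1, so the remaining 2 samples land in the test set, which illustrates the rounding effect described above. If scikit-learn is already a dependency, applying its `train_test_split` twice with the `stratify` argument is a common alternative.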