Extracting Specified Categories from the COCO Dataset into VOC Format with Python

Updated 2024-08-28 · PDF, 58 KB

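With the pycocotools library, Python can extract the images and annotations that belong to specific categories of the COCO dataset. COCO (Common Objects in Context) is a widely used image recognition dataset that contains a large number of images with detailed category labels. First install pycocotools; pip can install it directly from the GitHub repository by pointing at the PythonAPI subdirectory:

```shell
pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
```

The steps below walk through extracting one category, using person as the example.

1. Import the required libraries:
   - `from pycocotools.coco import COCO`: interface to the COCO dataset.
   - `os`: file and directory operations.
   - `shutil`: copying files and directories.
   - `tqdm`: progress bar, so the run can be monitored.
   - `skimage.io`: reading images.
   - `matplotlib.pyplot`: plotting and visualization.
   - `cv2`: OpenCV, for image processing.
   - `PIL`: Python Imaging Library, for image processing.

2. Define the variables:
   - `savepath`: where the extracted images and annotation files are stored.
   - `img_dir` and `anno_dir`: subdirectories for the images and the annotation files.
   - `dataDir`: root of the original COCO dataset.
   - `datasets_list`: the dataset splits to process, for example `['train2014', 'val2014']`.
   - `classes_names`: names of the categories to extract, for example `['person']`.

3. Create the COCO object and load the annotation file:

```python
coco = COCO(dataDir + 'annotations/instances_train2014.json')  # JSON file of the training or validation split
```

4. Iterate over the images of each split:

```python
for dataset in datasets_list:
    # Load the annotation file that matches this split
    coco = COCO(dataDir + 'annotations/instances_%s.json' % dataset)
    cat_ids = coco.getCatIds(catNms=classes_names)
    id_to_name = {c['id']: c['name'] for c in coco.loadCats(cat_ids)}
    # getImgIds intersects multiple catIds, so query per category and take the union
    img_ids = sorted({i for c in cat_ids for i in coco.getImgIds(catIds=[c])})
    for img_id in tqdm(img_ids):
        img_data = coco.loadImgs(img_id)[0]
        img_path = dataDir + dataset + '/' + img_data['file_name']
        anno_path = anno_dir + img_data['file_name'].replace('.jpg', '.xml')
        # Copy the image to the target directory
        shutil.copy(img_path, img_dir)
        # Create the annotation file if it does not exist yet
        if not os.path.exists(anno_path):
            ann_ids = coco.getAnnIds(imgIds=img_id, catIds=cat_ids)
            with open(anno_path, 'w') as f:
                # Depth is assumed to be 3 (RGB); read the image to get the real value
                f.write(headstr % (img_data['file_name'], img_data['width'], img_data['height'], 3))
                for ann in coco.loadAnns(ann_ids):
                    # COCO boxes are [x, y, width, height]; VOC uses corner coordinates
                    x, y, w, h = ann['bbox']
                    f.write(objstr % (id_to_name[ann['category_id']], int(x), int(y), int(x + w), int(y + h)))
                f.write(tailstr)
```

Here `headstr`, `objstr`, and `tailstr` are the XML templates defined in the script. The header template is assumed to take the file name, width, height, and depth, and the object template a class name followed by the four box coordinates.

The script copies the images of the specified categories into `img_dir` and writes the matching XML annotation files into `anno_dir`. To extract other classes, change the `classes_names` list. COCO also provides other annotation files such as `instances_val2014.json`, so adjust `dataDir` and `datasets_list` as needed. Note that pycocotools has no built-in method that converts a COCO annotation to VOC XML, which is why the script writes the XML itself. For further processing, such as segmentation masks, consult the pycocotools documentation and adapt the script accordingly.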
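The core of the COCO-to-VOC conversion is the bounding box: COCO stores each box as `[x, y, width, height]`, while VOC expects `xmin, ymin, xmax, ymax`. The following is a minimal sketch of that step, assuming that truncating to integers and clamping to the image bounds is acceptable for your use case. The function name `coco_bbox_to_voc` and the sample values are illustrative and not part of pycocotools.

```python
def coco_bbox_to_voc(bbox, img_width, img_height):
    """Convert a COCO [x, y, width, height] box to VOC (xmin, ymin, xmax, ymax).

    Coordinates are truncated to integers and clamped to the image bounds.
    """
    x, y, w, h = bbox
    xmin = max(0, int(x))
    ymin = max(0, int(y))
    xmax = min(img_width, int(x + w))
    ymax = min(img_height, int(y + h))
    return xmin, ymin, xmax, ymax


# Hypothetical annotation values, in the shape returned by coco.loadAnns()
ann = {"bbox": [10.5, 20.0, 100.0, 50.25], "category_id": 1}
print(coco_bbox_to_voc(ann["bbox"], img_width=640, img_height=480))
# prints (10, 20, 110, 70)
```

Clamping matters because COCO boxes are floating point and can extend slightly past the image edge, which some VOC readers reject.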

Extracting specific classes from the COCO and VOC datasets with Python

1. Extracting specific classes from the COCO dataset with Python

Install pycocotools. GitHub repository: https://github.com/philferriere/cocoapi

pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
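Before running the extraction script it can help to confirm that the package is importable. A minimal sketch using only the standard library; the helper name `pycocotools_available` is an illustrative choice, not part of any library.

```python
import importlib.util


def pycocotools_available():
    """Return True if the import system can locate the pycocotools package."""
    return importlib.util.find_spec("pycocotools") is not None


if __name__ == "__main__":
    if pycocotools_available():
        print("pycocotools is installed")
    else:
        print("pycocotools is missing; run the pip command above")
```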

The code for extracting specific classes is as follows:

from pycocotools.coco import COCO
import os
import shutil
from tqdm import tqdm
import skimage.io as io
import matplotlib.pyplot as plt
import cv2
from PIL import Image, ImageDraw

# The path where the COCO-to-VOC results are saved
savepath = "/media/huanglong/Newsmy/COCO/"  # output path for the extracted classes; here it is the same as the dataset path
img_dir = savepath + 'images/'
anno_dir = savepath + 'Annotations/'
# datasets_list = ['train2014', 'val2014']
datasets_list = ['train2014']
classes_names = ['person']  # COCO has 80 classes; list the names of the classes to extract, person in this example
# Store annotations and train2014/val2014/... in this folder
dataDir = '/media/huanglong/Newsmy/COCO/'  # the original COCO dataset

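# Annotation header; placeholders: file name, width, height, depth
headstr = """\
<annotation>
    <filename>%s</filename>
    <source>
        <database>My Database</database>
        <image>flickr</image>
    </source>
    <owner>
        <name>company</name>
    </owner>
    <size>
        <width>%d</width>
        <height>%d</height>
        <depth>%d</depth>
    </size>
"""
# One object entry; placeholders: class name, xmin, ymin, xmax, ymax
objstr = """\
    <object>
        <name>%s</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>%d</xmin>
            <ymin>%d</ymin>
            <xmax>%d</xmax>
            <ymax>%d</ymax>
        </bndbox>
    </object>
"""
tailstr = '''\
</annotation>
'''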
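String templates work, but they do not escape special characters and make it easy to produce malformed XML. As an alternative technique, not the one used in the script above, the same VOC-style file can be built with the standard library's `xml.etree.ElementTree`. This is a sketch under stated assumptions: the function name `build_voc_annotation`, the file name `example.jpg`, and the choice of fields are illustrative, and the depth defaults to 3 on the assumption of RGB images.

```python
import xml.etree.ElementTree as ET


def build_voc_annotation(filename, width, height, objects, depth=3):
    """Build a VOC-style <annotation> tree.

    objects is an iterable of (class_name, xmin, ymin, xmax, ymax) tuples.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "pose").text = "Unspecified"
        ET.SubElement(obj, "truncated").text = "0"
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(box, tag).text = str(value)
    return ET.ElementTree(root)


# Hypothetical image and box values for illustration
tree = build_voc_annotation("example.jpg", 640, 480, [("person", 10, 20, 110, 70)])
print(ET.tostring(tree.getroot(), encoding="unicode"))
```

To write the result to disk, call `tree.write(path, encoding="utf-8")` with the target annotation path.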


weixin_38621565
