Automatic Construction of a Large-Scale Relevant Image Dataset Using Multiple Textual Metadata


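"Automatic Image Dataset Construction with Multiple Textual Metadata": this research paper examines how to automatically build a large image dataset using multiple textual metadata. In computer vision, high-quality image datasets are essential for model training and algorithm development, but manually collecting and annotating large numbers of images is time-consuming and expensive. The paper therefore proposes a new framework that automates the process.

First, the work uses the Google Books Ngram corpus to expand a given query. The corpus is a very large text database that records how often words and phrases occur in books. Searching it yields richer semantic descriptions of the original query, which helps retrieve information more relevant to the query topic.

Next, the paper proposes a filtering mechanism that removes expansions that are visually non-salient or less relevant, so that the selected images stay closely tied to the query topic. The filtered expansions are then used to retrieve images from the Internet. To remove noisy images further, the authors apply clustering and a progressive convolutional neural network (CNN). Clustering groups similar images, while the CNN analyzes image content and excludes images that do not match the query. Combining a traditional method with deep learning in this way removes irrelevant or low-quality images more effectively.

To validate the method, the authors design an experiment to evaluate the framework. Through comparative experiments, they report that automatic image dataset construction with multiple textual metadata improves the relevance and quality of the dataset over traditional methods. The approach is relevant to computer vision tasks such as image recognition, object detection and semantic understanding. Overall, the paper offers a way to build image datasets automatically and more efficiently while maintaining their quality and diversity, which could provide richer and more accurate training data for future computer vision research.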
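
The query-expansion step described above can be illustrated with a minimal sketch. It assumes that n-gram frequencies are already available as an in-memory dictionary; the function name `expand_query`, its parameters, and the toy counts are hypothetical and do not come from the paper or from any Google Books Ngram API.

```python
from collections import Counter


def expand_query(query, ngram_counts, min_count=1, top_k=5):
    """Return the most frequent multi-word n-grams containing the query word.

    `ngram_counts` maps an n-gram string to its occurrence count. This is a
    hypothetical stand-in for a lookup in an n-gram corpus.
    """
    query = query.lower()
    candidates = Counter()
    for ngram, count in ngram_counts.items():
        tokens = ngram.lower().split()
        # Keep n-grams that contain the query as a whole token, consist of
        # more than one word, and are frequent enough.
        if query in tokens and len(tokens) > 1 and count >= min_count:
            candidates[" ".join(tokens)] += count
    return [ngram for ngram, _ in candidates.most_common(top_k)]


# Toy counts invented for illustration only.
toy_counts = {
    "police dog": 120,
    "dog barking": 90,
    "hot dog": 300,
    "dog days": 40,
    "black cat": 75,
    "dog": 1000,
}

expansions = expand_query("dog", toy_counts, min_count=50, top_k=3)
print(expansions)
```

In this toy run the most frequent expansion is "hot dog", which is frequent but not about the animal. Purely frequency-based expansion therefore needs the subsequent filtering step.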


AUTOMATIC IMAGE DATASET CONSTRUCTION WITH MULTIPLE TEXTUAL METADATA

Yazhou Yao, Jian Zhang, Fumin Shen, Xiansheng Hua, Jingsong Xu, Zhenmin Tang

University of Technology Sydney, Australia
Nanjing University of Science and Technology, China
University of Electronic Science and Technology of China
Alibaba Group, Hangzhou, China

{yaoyazhou, fumin.shen, huaxiansheng}@gmail.com, tzm.cs@njust.edu.cn, {jian.zhang, jingsong.xu}@uts.edu.au

ABSTRACT

The goal of this work is to automatically collect a large number of highly relevant images from the Internet for given queries. A novel image dataset construction framework is proposed by employing multiple textual metadata. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpora to obtain a richer semantic description, from which the visually non-salient and less relevant expansions are then filtered out. After retrieving images from the Internet with the filtered expansions, we further filter noisy images by clustering and progressive Convolutional Neural Networks (CNN). To verify the effectiveness of our proposed method, we construct a dataset with 10 categories, which is not only much larger than the manually labeled datasets STL-10 and CIFAR-10 but also has comparable cross-dataset generalization ability.

Index Terms: Automatic Image Dataset Construction, Multiple Textual Metadata, Clustering, Progressively CNN

1. INTRODUCTION

Labelled image datasets have played a critical role in high-level image understanding and have driven progress in feature design. For example, ImageNet has been one of the most important factors in recent advances in developing and deploying visual representation learning models (e.g., deep CNNs). However, constructing ImageNet was both time-consuming and labor-intensive. It is consequently natural to leverage an image search engine (e.g., Google Image) or a social network (e.g., Flickr) to construct the desired image dataset. Generally, the Google Image search engine has relatively higher accuracy than social networks like Flickr. However, directly constructing an image dataset from images retrieved from Google is not practical, mainly because of the download restrictions for each query and the unsatisfactory accuracy of lower-ranked images. To tackle this problem, we propose a novel image dataset construction framework through which a large number of highly relevant images are automatically extracted from the Internet.

Fig. 1: The average precision of the top 1000 images in Google Image, Flickr and our dataset for 10 queries.
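
The quantity plotted in Fig. 1 can be illustrated with a minimal sketch. The source does not define "average precision" further, so the sketch assumes it means the precision of the top-k images for each query, averaged over queries. The function names and the toy relevance labels are hypothetical.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k ranked images labeled relevant (1) rather than noisy (0).

    If fewer than k images are available, the fraction is taken over the
    images that exist.
    """
    top = relevance[:k]
    if not top:
        return 0.0
    return sum(top) / len(top)


def mean_precision_at_k(relevance_by_query, k):
    """Average the per-query precision at k over all queries."""
    scores = [precision_at_k(labels, k) for labels in relevance_by_query.values()]
    return sum(scores) / len(scores)


# Toy relevance labels in ranked order, invented for illustration only.
toy_labels = {
    "dog": [1, 1, 0, 1],
    "cat": [1, 0, 0, 0],
}

mean_score = mean_precision_at_k(toy_labels, k=4)
print(mean_score)
```

For the setting in the caption, k would be 1000 and the dictionary would hold one ranked label list for each of the 10 queries.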

To build a high-quality image dataset from the Internet, we propose to construct the collection for each query in three major steps: query expansion, noisy expansion filtering, and noisy image filtering. Specifically, by searching the Google Books Ngrams Corpora (GBNC), we first expand the given query into a set of semantically rich expansions, from which the noisy query expansions are then removed by exploiting both word-word and visual-visual similarity. After we obtain the candidate images by retrieving these filtered expansions with a search engine, clustering and progressive CNN-based methods are applied, as an important step, to further remove noisy images. To verify the effectiveness of the proposed automatic image dataset construction method, we build an image dataset with 10 categories named AutoImgSet-10. We evaluate its precision by comparing it with the methods of [1, 2, 3]. In addition, we evaluate its cross-dataset generalization ability by comparing it with two manually labeled image datasets, STL-10 and CIFAR-10. Fig. 1 shows the improvement achieved by our method over the initially downloaded images from Google and Flickr.
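
The noisy expansion filtering step can be sketched as follows. This excerpt does not say how the word-word and visual-visual similarities are computed or combined, so the sketch makes several assumptions: each query and expansion is represented by a word feature vector and a visual feature vector, similarity is cosine similarity, and the two scores are mixed with a weight `alpha` and compared against a `threshold`. All names, vectors and parameter values are hypothetical and are not the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_expansions(query_feats, expansion_feats, alpha=0.5, threshold=0.6):
    """Keep expansions whose combined similarity to the query reaches the threshold.

    `query_feats` is a (word_vector, visual_vector) pair, and `expansion_feats`
    maps each expansion to such a pair. Both `alpha` and `threshold` are
    assumed parameters of this sketch.
    """
    q_word, q_visual = query_feats
    kept = []
    for name, (word_vec, visual_vec) in expansion_feats.items():
        word_sim = cosine(q_word, word_vec)
        visual_sim = cosine(q_visual, visual_vec)
        score = alpha * word_sim + (1 - alpha) * visual_sim
        if score >= threshold:
            kept.append(name)
    return kept


# Toy 2-D features invented for illustration only.
query = ([1.0, 0.0], [1.0, 0.0])
expansion_features = {
    "police dog": ([0.9, 0.1], [0.8, 0.2]),
    "hot dog": ([0.6, 0.8], [0.0, 1.0]),
    "dog days": ([0.7, 0.7], [0.1, 0.9]),
}

kept = filter_expansions(query, expansion_features)
print(kept)
```

In the toy data, "hot dog" and "dog days" have a moderate word similarity to the query but a low visual similarity, so the combined score drops them while "police dog" is kept.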

2. RELATED WORK

To our knowledge, there are three principal methods of constructing image datasets: manual annotation, semi-automatic methods and automatic methods. Manual annotation has a high
