To cut the computation from the order of 10 minutes to a second (three orders of magnitude), both the feature complexity and the modeling complexity must be reduced substantially while maintaining a level of accuracy reasonable for practical use in online tasks.
The integration of region segmentation and representative
color and texture extraction from the segments is a suitable
time-reduction strategy; however, sophisticated region segmentation methods themselves often do not run in real time.
Borrowing from the experiences gained in large-scale visual
similarity search, we use a fast image segmentation method
based on wavelets and K-Means clustering [32].
The low complexity of this segmentation method makes
it an attractive option for processing large numbers of
images. Unfortunately, this method is tuned toward recognizing scenes; we therefore expect it to be insufficient for recognizing individual objects, given the great variation with which a given type of object (for example, dogs) can appear in pictures.
Although object names are often assigned by the system, the
selection is mostly based on statistical correlation with
scenes. On the other hand, as pointed out by one reviewer,
different levels of performance may be possible under a
more controlled image set, such as various types of the
same object or images of the same domain. We will explore
this in the future.
After the region-based signatures are extracted from the
pictures, we encounter the essential obstacle: the segmenta-
tion-based signatures are unordered and of arbitrary lengths
across the picture collection, primarily because the number of
regions used to represent a picture often depends on how
complicated the composition of the picture is. No existing
statistical tools can handle the modeling in this scenario. The
key challenge to us, therefore, is to develop new statistical
methods for signature modeling and model matching when
the signatures are in the form of discrete distributions. The
details on these are provided in the following sections.
2.3 Image Signature
To form the signature of an image, two types of features are
extracted: color and texture. To extract the color part of the
signature, the RGB color components of each pixel are
converted to the LUV color components. The 3D color vectors
at all the pixels are clustered by K-Means. The number of
clusters in K-Means is determined dynamically by thresholding the average within-cluster distance. Arranging the
cluster labels of the pixels into an image according to the pixel
positions, we obtain a segmentation of the image. We refer to
the collection of pixels mapped to the same cluster as a region.
For each region, its average color vector and the percentage of
pixels it contains with respect to the whole image are
computed. The color signature is thus formulated as a discrete distribution $\{(v^{(1)}, p^{(1)}), (v^{(2)}, p^{(2)}), \ldots, (v^{(m)}, p^{(m)})\}$, where $v^{(j)}$ is the mean color vector, $p^{(j)}$ is the associated probability, and $m$ is the number of regions.
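To make the adaptive clustering concrete, the sketch below (Python; not the authors' implementation) illustrates one way such a color signature could be computed. The cluster cap, the distance threshold, and the use of scikit-learn's KMeans are our own illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def color_signature(pixels_luv, max_clusters=8, avg_dist_threshold=10.0):
    """Cluster LUV pixel vectors with K-Means, growing the number of
    clusters until the average within-cluster distance falls below a
    threshold, then return the signature {(v_j, p_j)}.

    pixels_luv: (n_pixels, 3) array of LUV color vectors.
    max_clusters and avg_dist_threshold are illustrative values,
    not taken from the paper.
    """
    n = len(pixels_luv)
    for k in range(1, max_clusters + 1):
        km = KMeans(n_clusters=k, n_init=5).fit(pixels_luv)
        # Average distance from each pixel to its assigned centroid.
        dists = np.linalg.norm(
            pixels_luv - km.cluster_centers_[km.labels_], axis=1)
        if dists.mean() < avg_dist_threshold:
            break
    # Each cluster is a region: mean color vector plus pixel fraction.
    counts = np.bincount(km.labels_, minlength=k)
    return [(km.cluster_centers_[j], counts[j] / n) for j in range(k)]
```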
We use wavelet coefficients in high-frequency bands to
form texture features. A Daubechies-4 wavelet transform [9]
is applied to the L component (intensity) of each image. The
transform decomposes an image into four frequency bands:
LL, LH, HL, HH. The LH, HL, and HH band wavelet
coefficients corresponding to the same spatial position in the
image are grouped into one 3D texture feature vector. If an
image contains $n_r \times n_c$ pixels, the total number of texture feature vectors is $\frac{n_r}{2} \cdot \frac{n_c}{2}$ due to the subsampling of the
wavelet transform. When forming the texture features, the
absolute values of the wavelet coefficients are used. K-Means
clustering is applied to the texture feature vectors to extract
the major modes of these vectors. Again, the number of
clusters is decided adaptively by thresholding the average within-cluster distance. As with color, the texture signature
is cast into a discrete distribution.
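A minimal sketch of the grouping of detail coefficients into texture feature vectors, assuming the PyWavelets package (pywt); the adaptive clustering step then proceeds exactly as for color:

```python
import numpy as np
import pywt  # PyWavelets; our assumed wavelet library

def texture_feature_vectors(intensity):
    """One-level Daubechies-4 decomposition of the L (intensity)
    channel. The detail subbands (LH, HL, HH) at each spatial
    position are stacked into one 3D vector; absolute coefficient
    values are used, as in the text. The result has roughly
    (n_r / 2) * (n_c / 2) rows (boundary handling may add a few).
    """
    _, (lh, hl, hh) = pywt.dwt2(intensity, 'db4')
    feats = np.stack([np.abs(lh), np.abs(hl), np.abs(hh)], axis=-1)
    return feats.reshape(-1, 3)
```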
Although we only involve color and texture in the current
image signature, other types of image features such as shape
and salient points can also be formulated into discrete
distributions, that is, bags of weighted vectors. For instance,
bags of SIFT features [17] are used to characterize and
subsequently detect advertisement logos in video frames [1].
As expected, our current image signature is not sensitive to
shape patterns. We choose to use color and texture features
because they are relatively robust for digital photos generated
by Internet users. Shape or salient point features may be more
appealing for recognizing objects. However, these features
are highly prone to corruption when the background is noisy,
object-viewing angle varies, or occlusion occurs, as is usually
the case. Moreover, semantics of an image sometimes cannot
be adequately expressed by a collection of object names.
Deriving image signatures that are robust and strong for
semantic recognition is itself a deep research problem that we
would like to explore in the future.
In general, let us denote images in the database by $\{\beta_1, \beta_2, \ldots, \beta_N\}$. Suppose every image is represented by an array of discrete distributions, $\beta_i = (\beta_{i,1}, \beta_{i,2}, \ldots, \beta_{i,d})$. Denote the space of $\beta_{i,l}$ by $\Omega_l$, $\beta_{i,l} \in \Omega_l$, $l = 1, 2, \ldots, d$. Then, the space of $\beta_i$ is the Cartesian product space
$$\Omega = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_d.$$
The dimension $d$ of $\Omega$, that is, the number of distributions contained in $\beta_i$, is referred to as the superdimension to distinguish it from the dimensions of the vector spaces on which these distributions are defined. For a fixed superdimension $j$, the distributions $\beta_{i,j}$, $i = 1, \ldots, N$, are defined on the same vector space, $\mathbb{R}^{d_j}$, where $d_j$ is the dimension of the $j$th sample space. Denote distribution $\beta_{i,j}$ by
$$\beta_{i,j} = \left\{ \bigl(v_{i,j}^{(1)}, p_{i,j}^{(1)}\bigr), \bigl(v_{i,j}^{(2)}, p_{i,j}^{(2)}\bigr), \ldots, \bigl(v_{i,j}^{(m_{i,j})}, p_{i,j}^{(m_{i,j})}\bigr) \right\}, \qquad (1)$$
where $v_{i,j}^{(k)} \in \mathbb{R}^{d_j}$, $k = 1, \ldots, m_{i,j}$, are vectors on which the distribution $\beta_{i,j}$ takes positive probability $p_{i,j}^{(k)}$. The cardinality of the support set for $\beta_{i,j}$ is $m_{i,j}$, which varies with both the image and the superdimension.
To further clarify the notation, consider the following
example. Suppose images are segmented into regions by
clustering 3D color features and 3D texture features respec-
tively. Suppose a region formed by segmentation with either
type of features is characterized by the corresponding mean
feature vector. For brevity, suppose the regions have equal
weights. Since two sets of regions are obtained for each image,
the superdimensionality is $d = 2$. Let the first superdimension correspond to color-based regions and the second to
texture-based regions. Suppose an image $i$ has four color regions and five texture regions. Then,
$$\beta_{i,1} = \left\{ \bigl(v_{i,1}^{(1)}, \tfrac{1}{4}\bigr), \bigl(v_{i,1}^{(2)}, \tfrac{1}{4}\bigr), \ldots, \bigl(v_{i,1}^{(4)}, \tfrac{1}{4}\bigr) \right\}, \quad v_{i,1}^{(k)} \in \mathbb{R}^3,$$
$$\beta_{i,2} = \left\{ \bigl(v_{i,2}^{(1)}, \tfrac{1}{5}\bigr), \bigl(v_{i,2}^{(2)}, \tfrac{1}{5}\bigr), \ldots, \bigl(v_{i,2}^{(5)}, \tfrac{1}{5}\bigr) \right\}, \quad v_{i,2}^{(k)} \in \mathbb{R}^3.$$
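In code, the signature of this example image could be held as nested lists of (vector, probability) pairs; the numeric values below are placeholders, not real features:

```python
import numpy as np

# beta_{i,1}: four color regions, equal weight 1/4, vectors in R^3.
beta_i1 = [(np.array([40.0,  5.0, -10.0]), 0.25),
           (np.array([75.0,  0.0,  12.0]), 0.25),
           (np.array([20.0, -8.0,   3.0]), 0.25),
           (np.array([90.0,  2.0,  -1.0]), 0.25)]
# beta_{i,2}: five texture regions, equal weight 1/5, vectors in R^3
# (identical placeholder vectors for brevity).
beta_i2 = [(np.array([0.5, 1.2, 0.3]), 0.20) for _ in range(5)]
# The signature beta_i is the array of the two distributions (d = 2);
# the support sizes m_{i,1} = 4 and m_{i,2} = 5 vary across images.
beta_i = (beta_i1, beta_i2)
```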