Spectral Hashing
Yair Weiss^{1,3}
^3 School of Computer Science,
Hebrew University,
91904 Jerusalem, Israel
yweiss@cs.huji.ac.il
Antonio Torralba^1
^1 CSAIL, MIT,
32 Vassar St.,
Cambridge, MA 02139
torralba@csail.mit.edu
Rob Fergus^2
^2 Courant Institute, NYU,
715 Broadway,
New York, NY 10003
fergus@cs.nyu.edu
Abstract
Semantic hashing [1] seeks compact binary codes of data-points so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP-hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on the convergence of graph Laplacian eigenvectors to the Laplace-Beltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel data-point. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the state of the art.
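To make the relaxed problem concrete, the following toy sketch builds a Gaussian affinity graph over a small dataset, takes the lowest non-trivial eigenvectors of its graph Laplacian, and thresholds them at zero to obtain binary codes. The names X, sigma, and n_bits, and the dense brute-force eigensolver, are illustrative assumptions; unlike the out-of-sample extension summarized above, this version codes only the training points.

# A minimal sketch of the spectral relaxation described in the abstract.
# Assumed names (X, sigma, n_bits); not the authors' reference code.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def spectral_codes(X, n_bits=4, sigma=1.0):
    """Binary codes from thresholded graph-Laplacian eigenvectors."""
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma**2))  # affinities
    D = np.diag(W.sum(axis=1))
    L = D - W                                   # unnormalized graph Laplacian
    # eigh returns eigenvalues in ascending order; skip the trivial
    # constant eigenvector (eigenvalue 0) and keep the next n_bits.
    vals, vecs = eigh(L)
    return (vecs[:, 1:n_bits + 1] > 0).astype(np.uint8)

codes = spectral_codes(np.random.randn(100, 8), n_bits=4)
print(codes.shape)  # (100, 4): one 4-bit code per training point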
1 Introduction
With the advent of the Internet, it is now possible to use huge training sets to address
challenging tasks in machine learning. As a motivating example, consider the recent work
of Torralba et al., who collected a dataset of 80 million images from the Internet [2, 3]. They
then used this weakly labeled dataset to perform scene categorization. To categorize a novel
image, they simply searched for similar images in the dataset and used the labels of these
retrieved images to predict the label of the novel image. A similar approach was used in [4]
for scene completion.
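As a toy illustration of this label-transfer scheme (the variable names and the brute-force linear scan below are assumptions for exposition, not the pipeline of [2, 3]), classifying a novel item reduces to a majority vote over its nearest neighbors:

# Hypothetical nearest-neighbor label transfer; brute-force for clarity.
import numpy as np
from collections import Counter

def knn_label(query, data, labels, k=5):
    dists = np.linalg.norm(data - query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of k closest items
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

data = np.array([[0.0], [0.1], [1.0], [1.1]])
labels = np.array([0, 0, 1, 1])
print(knn_label(np.array([0.05]), data, labels, k=3))  # -> 0

The linear scan above is exactly the bottleneck that motivates this paper: with millions of items, both the memory footprint and the per-query cost of such a search become prohibitive.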
Although conceptually simple, actually carrying out such methods requires highly efficient
ways of (1) storing millions of images in memory and (2) quickly finding similar images to
a target image.
Semantic hashing, introduced by Salakhutdinov and Hinton [5], is a clever way of addressing
both of these challenges. In semantic hashing, each item in the database is represented by a
compact binary code. The code is constructed so that similar items will have similar binary
codewords and there is a simple feedforward network that can calculate the binary code for
a novel input. Retrieving similar neighbors is then done simply by retrieving all items with
codes within a small Hamming distance of the code for the query. This kind of retrieval can
be amazingly fast: millions of queries per second on standard computers. The key for this
method to work is to learn a good code for the dataset. We need a code that (1) is easily
computed for a novel input, (2) requires a small number of bits to code the full dataset, and
(3) maps similar items to similar binary codewords.
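As an illustrative sketch only (the code layout and the brute-force scan below are assumptions; practical systems instead enumerate all codes within the radius and probe a hash table), retrieval with binary codes reduces to popcounts of XORed codewords:

# Hamming-ball retrieval over small integer codes; names are hypothetical.
def hamming_neighbors(query_code, db_codes, radius=2):
    """Return database indices whose code is within `radius` bits."""
    return [i for i, c in enumerate(db_codes)
            if bin(query_code ^ c).count("1") <= radius]

db = [0b1010, 0b1011, 0b0101, 0b1110]
print(hamming_neighbors(0b1010, db, radius=1))  # -> [0, 1, 3]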
To simplify the problem, we will assume that the items have already been embedded in
a Euclidean space, say $\mathbb{R}^d$, in which Euclidean distance correlates with the desired simi-
larity. The problem of finding such a Euclidean embedding has been addressed in a large