II. RELATED WORK
3D shape retrieval has been investigated extensively, and many algorithms have been proposed for 3D model preprocessing, feature extraction, shape matching, etc. A thorough and exhaustive review of those algorithms is beyond the scope of this paper; we therefore focus on projection-based methods, which are closely related to our work.
The Light Field Descriptor (LFD) [15], composed of Zernike moments and Fourier descriptors, is one of the most representative projection-based algorithms. Its basic assumption is that if two 3D shapes are similar, they also look similar from all viewpoints. Vranic et al. [27] define a composite shape descriptor, which is generated using depth buffer images, silhouettes, and ray-extents of a polygonal mesh. In [11], a novel descriptor called PANORAMA is proposed. It projects a 3D shape onto the lateral surface of a cylinder, and describes the obtained panoramic view by the 2D Discrete Fourier Transform and the 2D Discrete Wavelet Transform. To ensure rotation invariance as far as possible, both Continuous PCA (CPCA) and Normals PCA (NPCA) [17] are applied to 3D shapes before rendering the projections. Daras et al. [28] propose the Compact Multi-view Descriptor (CMVD), where 18 characteristic views are described by the 2D Polar-Fourier Transform, 2D Zernike Moments, and 2D Krawtchouk Moments.
Meanwhile, some researchers borrow developments in feature learning from natural image analysis to attain discriminative representations of the projections. For example, Furuya et al. [29] introduce the Bag of visual Words (BoW) [14] to 3D shape retrieval, where local descriptors [21] are extracted on depth projections of 3D shapes and encoded into a histogram feature via vector quantization. By putting the visual descriptors from different projections in one bag, Vectors of Locally Aggregated Tensors (VLAT) [16] is investigated to produce an equal-sized feature for each 3D shape. Tabia et al. [30], [31] are the first to explore the usage of covariance matrices of descriptors, instead of the descriptors themselves, in 3D shape analysis. Bai et al. [32] introduce a two-layer coding framework that jointly encodes a pair of views; by doing so, the spatial arrangement of multiple views is captured, which is shown to be rotation-invariant.
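To make the vector-quantization step underlying such BoW pipelines concrete, consider the following minimal Python sketch; it is a generic illustration rather than the implementation of [29], and the function and variable names are illustrative.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each local descriptor (n x d) to its nearest codeword
    (k x d) and accumulate an L1-normalized histogram of length k."""
    # Squared Euclidean distances between all descriptors and codewords.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    assignments = d2.argmin(axis=1)              # hard vector quantization
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)           # guard against empty input
```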
Since deep learning has been proven to be a powerful tool in many computer vision and pattern recognition tasks, there is a growing interest in leveraging this popular paradigm in the 3D shape community. As an extension of PANORAMA [11], Shi et al. [33] pool the responses of each row of the feature map so that the deep panoramic representation remains unchanged when the 3D shape rotates about its principal axis. Multi-view Convolutional Neural Networks (MVCNN) [34] place a view pooling layer in the CNN architecture to aggregate the multiple view representations. Note that some deep-learning-based algorithms do not learn from projections of shapes. For example, Wu et al. [9] perform 3D convolution on the voxel grids of shapes with a Deep Belief Network. They also construct a large-scale 3D shape repository called ModelNet. In [35]–[38], deep learning is applied to mid-level shape descriptors, instead of raw shape data.
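As a rough illustration of such a view pooling step (a generic element-wise max over per-view features, not necessarily the exact MVCNN configuration):

```python
import numpy as np

def view_pool(view_features):
    """Aggregate per-view CNN features of shape (n_views, d) into a
    single shape descriptor by element-wise max pooling across views."""
    return np.max(view_features, axis=0)
```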
Fig. 2. Illustration of projection rendering. θ_az is the polar angle in the xy plane and θ_el is the angle between the camera and the xy plane.
In addition, some works focus on the optimal matching strategy (e.g., clock matching [39], vector extrapolation matching [40], random forests [41], elastic net matching [42]), discriminative view selection (e.g., adaptive views clustering [43]), feature fusion (e.g., 2D/3D Hybrid [44], Hybrid BoW [45], ZFDR [46]), and re-ranking (e.g., Multi-Feature Anchor Manifold Ranking [47], diffusion processes [23]).
In contrast to the above algorithms, which are concerned with retrieval accuracy only, we establish a shape search system that attaches more importance to retrieval efficiency.
III. PROPOSED SEARCH ENGINE
In this section, the details of each component of the proposed
search engine are given.
A. Projection Rendering
Prior to projection rendering, pose normalization is needed for each 3D shape in order to attain invariance to some common geometrical transformations. However, unlike many previous algorithms [11], [17], [44] that require rotation normalization using Principal Component Analysis (PCA) techniques, we only normalize the scale and the translation in our system. Our concerns are two-fold: 1) PCA techniques are not always stable, especially when dealing with specific geometrical characteristics such as symmetries and large planar or bumpy surfaces; 2) the view feature used in our system can tolerate the rotation issue to a certain extent, though it cannot be completely invariant to such changes. In fact, we observe that if enough projections are used (more than 25 in our experiments), one can already achieve reliable retrieval performance.
The projection procedure is as follows. Firstly, as illustrated
in Fig. 2, we place the centroid of each 3D shape at the origin
of a spherical coordinate system, and resize the maximum polar
distance of the points on the surface of the shape to unit length.
Then, we evenly divide [0, 2π] into 8 parts to get the values of θ_az, and divide [0, π] into 8 parts to get the values of θ_el. For each pair (θ_az, θ_el), a virtual camera is set on the unit sphere.
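For concreteness, the following Python sketch implements the normalization and camera placement just described. It assumes the shape is given as an N × 3 vertex array, uses the vertex mean as a simple stand-in for the surface centroid, and interprets θ_el over [0, π] as measured from the +z axis so that the cameras span the sphere from pole to pole (Fig. 2 describes the angle relative to the xy plane, so the exact convention may differ); all function and variable names are illustrative.

```python
import numpy as np

def normalize_pose(vertices):
    """Translate the centroid to the origin and scale the maximum
    polar distance to unit length (no rotation normalization)."""
    centered = vertices - vertices.mean(axis=0)
    return centered / np.linalg.norm(centered, axis=1).max()

def camera_positions(n_az=8, n_el=8):
    """Place virtual cameras on the unit sphere: n_az azimuth values
    dividing [0, 2*pi) and n_el elevation values dividing [0, pi]."""
    positions = []
    # Endpoint handling is an assumption; the paper does not specify it.
    for az in np.linspace(0.0, 2.0 * np.pi, n_az, endpoint=False):
        for el in np.linspace(0.0, np.pi, n_el, endpoint=False):
            # Spherical-to-Cartesian conversion, with el measured from +z.
            positions.append((np.sin(el) * np.cos(az),
                              np.sin(el) * np.sin(az),
                              np.cos(el)))
    return np.asarray(positions)  # (n_az * n_el, 3) camera centers
```

Under this 8 × 8 division, each shape is rendered from 64 viewpoints, consistent with the observation above that more than 25 projections already yield reliable retrieval performance.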