II. RELATED WORK
Early approaches to 3D shape recognition focus on the design of handcrafted shape features. In recent years, inspired by the success of machine learning (particularly deep learning) in computer vision, many learning-based approaches have been proposed to learn adaptive shape descriptors from 3D data. Our work follows this line of research. In this section, we first give a brief review of handcrafted features, followed by a more detailed review of learning-based methods.
A. HANDCRAFTED FEATURES
Classic shape descriptors, such as statistical moments [29], the Fourier descriptor [29], [30] and the eigenvalue descriptor [31], are devoted to the global description of shapes. These methods are sensitive to non-rigid transformations and topological changes. To overcome this weakness, local geometric descriptors were proposed as building blocks for global shape features, e.g. spin images [32], shape context [33] and mesh HOG [34]. Nevertheless, such descriptors are not robust to local geometric deformations or perturbations. Recently, diffusion-based approaches, which enjoy strong robustness to isometric deformations and small perturbations of the surface, have emerged as a promising direction for shape description. These methods model the geometric structure of a shape with a diffusion process, and the shape descriptors are built upon the associated diffusion operators, e.g. the discrete Laplace-Beltrami operator [35] and the heat kernel signature [36], [37].
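To make the diffusion idea concrete, the heat kernel signature [36] describes a surface point $x$ by the amount of heat remaining at $x$ at time $t$ after a unit heat source is placed at $x$ at time zero. In terms of the eigenvalues $\lambda_i$ and eigenfunctions $\phi_i$ of the Laplace-Beltrami operator, it is the diagonal of the heat kernel:
\[
\mathrm{HKS}(x,t) \;=\; k_t(x,x) \;=\; \sum_{i \ge 0} e^{-\lambda_i t}\,\phi_i(x)^2 .
\]
Since isometric deformations preserve the Laplace-Beltrami spectrum, the descriptor is invariant to them by construction.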
B. LEARNING-BASED METHODS
Embracing recent advances of deep learning and neural networks (NNs) in image classification (a task analogous to 3D shape recognition) [16], [17], most learning-based methods for 3D shape recognition are built upon NN architectures. Three formats of 3D data are mainly used in NN-based methods: points, voxels and views. According to the data format of the NN's input, the NN-based methods can be classified into three categories: point-cloud based methods, voxel-based methods, and view-based methods. We focus more on the view-based methods in the literature review, as our method belongs to this type. It is noted that there is a group of learning-based approaches that take mesh surfaces as input by generalizing CNNs to non-Euclidean geometries (e.g. spectral CNNs [38], anisotropic CNNs [39]) or by using handcrafted features of objects as input (e.g. [40]). These approaches are devoted to matching tasks, without published results on standard shape recognition benchmarks. Thus, we omit this group of approaches in our literature review. It is also noted that a few approaches use two or more sources of 3D data for further improvement; e.g. both voxels and views are used in [41]. Last but not least, there are also some approaches built upon other machine learning techniques, e.g. multi-hypergraph learning [42].
1) POINT-CLOUD BASED METHODS
In contrast to image data, which is row-column indexed, a point cloud (except for those computed from depth images) is generally a set of point coordinates with irregular organization and unordered structure, which prevents the direct use of traditional image CNNs in point-cloud based methods. To address this fundamental challenge arising from the raw data, new NN architectures are needed. A pioneering work is PointNet [43], a permutation-invariant deep architecture that learns a spatial-encoding representation for each point and combines them into a global descriptor. In [44], the PointNet architecture is extended to a hierarchical version called PointNet++, which aims at better exploiting local structures of shapes by applying PointNet recursively on a nested grouping of the input point cloud. Another work with the same purpose of exploiting local shape structure is [45], which uses kernel correlation and graph pooling. The grouping scheme in PointNet++ implicitly exploits the spatial distribution of points. For explicit exploitation, the kd-Net [46] builds a kd-tree on the input point set and runs hierarchical feature extraction from the leaves to the root. Due to the non-overlapping partition produced by the kd-tree, the kd-Net lacks the overlapping receptive fields that are useful for recognition. To address this issue, Li et al. [47] proposed to replace the kd-tree with a self-organizing map (SOM) and perform k-NN search from points to SOM nodes, by which the receptive field overlap can be controlled. Instead of directly dealing with point sets in the network, Simonovsky and Komodakis [20] proposed to structure the point cloud as a graph and apply a graph CNN to process the graph-structured data.
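To illustrate the permutation-invariance principle shared by these point-set networks, the following minimal sketch (in PyTorch; layer sizes are hypothetical and far smaller than those in PointNet [43]) lifts each point with a shared MLP and aggregates with a symmetric max-pooling, so that reordering the input points cannot change the output:

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Minimal permutation-invariant point-set encoder (illustrative only;
    layer sizes are hypothetical, not those of PointNet [43])."""

    def __init__(self, feat_dim=128, num_classes=40):
        super().__init__()
        # Shared per-point MLP: applied identically to every point.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, pts):                 # pts: (batch, num_points, 3)
        f = self.point_mlp(pts)             # (batch, num_points, feat_dim)
        g, _ = f.max(dim=1)                 # symmetric max-pool over points
        return self.classifier(g)           # point order is irrelevant

# Permutation check: shuffling the points leaves the output unchanged.
pts = torch.rand(2, 1024, 3)
net = TinyPointNet()
perm = torch.randperm(1024)
assert torch.allclose(net(pts), net(pts[:, perm]), atol=1e-5)
```

The aggregation is the key design choice: any symmetric function (max, sum or mean) over the point dimension yields permutation invariance; max-pooling is the choice made in PointNet.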
2) VOXEL-BASED METHODS
Voxels of 3D objects are a straightforward extension of pixels of 2D images, by which an object shape is represented as a volumetric binary occupancy grid. Unlike point cloud data, voxels are well indexed; thus, image-based CNNs can be easily extended to handle voxelized data. A seminal work can be traced back to 3D-ShapeNet [18], a volumetric convolutional deep belief network which expresses a 3D shape as a probability distribution of binary variables on a voxel grid.
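As a minimal sketch of this representation (with an arbitrary resolution and a normalization of our own choosing, not the preprocessing of [18]), a point cloud can be converted into a binary occupancy grid by scaling it into the unit cube and marking every cell that contains at least one point:

```python
import numpy as np

def voxelize(points, resolution=32):
    """Convert an (N, 3) point cloud into a binary occupancy grid.

    Illustrative sketch only; real pipelines also handle orientation,
    padding and surface sampling density.
    """
    # Normalize the cloud into the unit cube [0, 1]^3.
    mins = points.min(axis=0)
    extent = np.ptp(points, axis=0).max()   # uniform scale keeps aspect ratio
    normalized = (points - mins) / (extent + 1e-9)

    # Map coordinates to integer cell indices and clamp to the grid.
    idx = np.clip((normalized * resolution).astype(int), 0, resolution - 1)

    grid = np.zeros((resolution,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1  # mark occupied cells
    return grid

# Example: 2048 random samples -> a 32^3 binary occupancy grid.
grid = voxelize(np.random.rand(2048, 3))
print(grid.shape, grid.sum())               # (32, 32, 32), occupied-cell count
```

The cubic growth of such grids with resolution is what motivates the shallow architectures and sparsity-exploiting designs discussed next.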
Another early attempt is VoxNet [19], which uses a shallow volumetric CNN with a volumetric probabilistic occupancy grid representation. The VoxNet architecture is combined with an orientation estimation task in [48] for performance improvement. Since volumetric representations can easily become computationally intractable as resolution grows, the above voxel-based NNs have to be shallow. To make use of the power of deep learning, Brock et al. [49] proposed a deep voxel-based CNN architecture which can be trained effectively and efficiently. With the same purpose, Riegler et al. [50] proposed to exploit the sparsity of voxelized data to enable deeper networks without reducing resolution. To analyse the shape distribution of 3D objects, [51] uses a VAE (variational auto-encoder) to reconstruct full 3D shapes from voxelized single views. With the latent variables learned by the VAE,