gations show that several tasks can also be solved without using such features. The
automatic and accurate alignment of captured point clouds, for instance, is an im-
portant task for digitization, reconstruction and interpretation of 3D scenes. Active
sensors such as terrestrial laser scanners are capable of measuring the 3D distance of scene points while simultaneously capturing image information in the form of either co-registered camera images or panoramic reflectance images representing the respective energy of the backscattered laser light. The recorded 3D point clouds typically provide a high point density as well as a high measurement accuracy. Hence,
the registration of two partially overlapping scans can be carried out based on the
3D geometry alone and thus without the need for visual features if the 3D structure
of the scene is distinctive enough.
Considering the example of point cloud registration, standard approaches such as the Iterative Closest Point (ICP) algorithm [18, 117] or Least Squares 3D Surface Matching (LS3D) [53] only exploit spatial 3D information. Whereas the ICP algorithm iteratively minimizes the difference between two point clouds, the LS3D approach minimizes the distance between matched surfaces.
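As an illustration, a minimal point-to-point ICP iteration may be sketched in Python as follows; this is a simplified sketch rather than the implementation of [18, 117], and all function names and parameters are illustrative:

    import numpy as np
    from scipy.spatial import cKDTree

    def best_rigid_transform(src, dst):
        # Least-squares rotation R and translation t mapping src onto dst
        # (Kabsch/Umeyama solution via SVD).
        c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
        U, _, Vt = np.linalg.svd((src - c_src).T @ (dst - c_dst))
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:      # avoid a reflection
            Vt[-1] *= -1
            R = Vt.T @ U.T
        return R, c_dst - R @ c_src

    def icp(source, target, n_iter=50, tol=1e-8):
        # Alternate between closest-point matching and re-estimation of
        # the rigid transform until the mean residual stops improving.
        src = source.copy()
        tree = cKDTree(target)        # fast closest-point queries
        prev_err = np.inf
        for _ in range(n_iter):
            dist, idx = tree.query(src)
            R, t = best_rigid_transform(src, target[idx])
            src = src @ R.T + t
            if abs(prev_err - dist.mean()) < tol:
                break
            prev_err = dist.mean()
        return src

The convergence behavior depends strongly on the initial alignment, as the closest-point assignment is only a local approximation of the true correspondences.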
Other approaches focus on the distribution of the points on 2D scan slices [22] or in 3D [86]. For
environments with regular surfaces, various types of geometric primitives such as
planes [22, 106, 139] or more complex geometric features like spheres, cylinders or
tori [109] have been proposed. In scenes without regular surfaces, the registration
can rely on descriptors representing local surface patches, which may, for instance, be derived from the geometric curvature or normal vectors of the local surface [8].
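Such normal vectors and a curvature-like measure can, for instance, be estimated via a principal component analysis of local point neighborhoods. The following minimal Python sketch (with illustrative names, an arbitrary neighborhood size and the eigenvalue-based surface variation as a curvature proxy) illustrates this:

    import numpy as np
    from scipy.spatial import cKDTree

    def local_surface_properties(points, k=20):
        # Estimate a normal vector and a curvature-like measure per point
        # from the eigenvalues/eigenvectors of the local covariance matrix.
        tree = cKDTree(points)
        _, idx = tree.query(points, k=k)      # k nearest neighbors per point
        normals = np.empty_like(points)
        curvature = np.empty(len(points))
        for i, nbrs in enumerate(idx):
            patch = points[nbrs] - points[nbrs].mean(axis=0)
            evals, evecs = np.linalg.eigh(patch.T @ patch / k)
            normals[i] = evecs[:, 0]          # eigenvector of smallest eigenvalue
            curvature[i] = evals[0] / evals.sum()  # surface variation
        return normals, curvature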
Several investigations, however, have shown that the registration of point clouds
can efficiently be supported by involving visual features derived from 2D imagery.
As both range and intensity information are typically measured on a regular scan
grid resulting from a cylindrical or spherical projection, they can be represented
as images. From these images, distinctive feature points can be extracted and re-
liable feature point correspondences between the images of different scans can
be derived.
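As a sketch of this representation, a scan given as an unordered set of scanner-centered 3D points with per-point intensities can be mapped onto a regular 2D grid via a spherical projection as follows; the grid resolution and names are illustrative, and occlusions as well as multiple points falling into the same cell are ignored for simplicity:

    import numpy as np

    def spherical_projection(points, intensity, width=3600, height=900):
        # Map scanner-centered 3D points (no point at the origin assumed)
        # onto a grid parameterized by azimuth and elevation.
        x, y, z = points.T
        rng = np.linalg.norm(points, axis=1)
        azimuth = np.arctan2(y, x)            # in [-pi, pi]
        elevation = np.arcsin(z / rng)        # in [-pi/2, pi/2]
        col = ((azimuth + np.pi) / (2 * np.pi) * (width - 1)).astype(int)
        row = ((np.pi / 2 - elevation) / np.pi * (height - 1)).astype(int)
        range_img = np.zeros((height, width))
        intensity_img = np.zeros((height, width))
        range_img[row, col] = rng
        intensity_img[row, col] = intensity
        return range_img, intensity_img

Standard 2D interest point detectors can then be applied directly to such images.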
The extraction of such features has been proposed from range images [11, 127], from intensity images [20, 66, 140] and from co-registered camera images [4, 10, 17]. In general, features in the intensity images provide a higher level of distinctiveness than features in the respective range images [122] and presumably convey information not yet represented in the range measurements. Projecting the distinctive 2D points to 3D space according to the respective range information yields sparse point clouds describing physically almost identical 3D
points. The point cloud registration may then exploit the reliable 3D/3D correspon-
dences [122] or 3D/2D correspondences [143, 144] between different scans, which typically involves a RANSAC-based scheme [42].
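For the case of 3D/3D correspondences, such a RANSAC-based scheme can be sketched as follows; this is a simplified illustration, with an arbitrary inlier threshold and iteration count rather than values from the cited works:

    import numpy as np

    def rigid_fit(src, dst):
        # SVD-based least-squares rotation/translation mapping src onto dst.
        cs, cd = src.mean(axis=0), dst.mean(axis=0)
        U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:              # avoid a reflection
            Vt[-1] *= -1
            R = Vt.T @ U.T
        return R, cd - R @ cs

    def ransac_rigid(src, dst, n_iter=1000, thresh=0.05):
        # Estimate a rigid transform from putative 3D/3D correspondences
        # (src[i] <-> dst[i]) while rejecting outlier correspondences.
        rng = np.random.default_rng(0)
        best_inliers = np.zeros(len(src), dtype=bool)
        for _ in range(n_iter):
            sample = rng.choice(len(src), size=3, replace=False)
            R, t = rigid_fit(src[sample], dst[sample])
            residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
            inliers = residuals < thresh
            if inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        # refine the transform on the largest consensus set
        return rigid_fit(src[best_inliers], dst[best_inliers]), best_inliers

The minimal sample size of three reflects that three non-collinear point correspondences suffice to determine a rigid transform in 3D.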
Thus, the reduction to sparse point clouds significantly reduces the computational effort and even tends to improve the accuracy of the registration results, as the number and influence of outliers can be reduced [140, 143, 144]. Furthermore, these approaches can directly be transferred to Time-of-Flight cameras or devices based on the use of structured light (e.g., the Microsoft Kinect). Consequently, most of the current approaches addressing point cloud registration consider both range and intensity information to reach an increased performance, although the alignment can also be carried out without visual features if the scene provides a sufficiently distinctive 3D structure.