Deep Learning Compared: CNNs and HTMs for Object Recognition

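"This thesis examines the application of deep learning to computer vision, comparing the performance and characteristics of convolutional neural networks (CNNs) and hierarchical temporal memories (HTMs) on object recognition tasks. The author, Vincenzo Lomonaco, completed the work under the supervision of Davide Maltoni in the second term of the 2014-15 academic year, as part of the master's degree programme in Computer Science at the School of Science of the University of Bologna, Italy."

Deep learning is a core part of modern computer vision. Loosely inspired by the way neural networks in the human brain work, it learns to extract high-level features from raw input data and uses them to solve complex visual problems such as image classification, object detection and semantic segmentation. The thesis focuses on two deep learning models, CNNs and HTMs, each of which has its own strengths in image recognition.

CNNs are among the most widely used deep learning models in image processing. Through their convolutional and pooling layers they automatically learn local image features and gain a degree of translation invariance, so an object can be recognized largely regardless of where it appears in the image. CNNs have achieved notable results in image classification and object detection, for example with the well-known AlexNet, VGG, ResNet and EfficientNet models.

HTMs, on the other hand, are machine learning models based on the working principles of the cerebral cortex, and they are particularly suited to sequence data and pattern recognition. HTMs emphasize local connectivity in space and time, together with adaptive learning and prediction. Although HTMs perform well in natural language processing and time series prediction, they have been applied to image recognition comparatively rarely, and their potential there remains to be explored.

The comparative part of the thesis may analyze how the two models differ in training efficiency, recognition accuracy, computational requirements and generalization ability. CNNs usually need large amounts of labeled data and computing resources for training, but once trained they typically outperform other methods on large-scale image datasets. HTMs may have an advantage in small-sample and unsupervised learning scenarios, because they place more weight on capturing the intrinsic structure and patterns of the data.

The thesis may also discuss future research directions, such as combining the strengths of CNNs and HTMs into new models that exploit local image features while also capturing temporal information, which would help advance computer vision further. Overall, the thesis studies the application of deep learning to computer vision in depth and compares the strengths and weaknesses of two important models, which makes it of theoretical and practical value for understanding and improving visual recognition systems.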


artificial neural networks will be covered. In chapters 2 and 3, the two algorithms will be explained in great detail. In chapter 4, a new benchmark for image sequences will be introduced, and in chapter 5 the experimental results will be reported. Finally, in chapter 6, conclusions will be drawn and directions for future work suggested.

Chapter 1. Background 4

In 1959, Arthur Samuel defined machine learning as a “Field of study that gives computers the ability to learn without being explicitly programmed” [20].

Tom M. Mitchell provided a widely quoted, more formal definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E” [21].

This definition is notable because it defines machine learning in fundamentally operational rather than cognitive terms, thus following Alan Turing’s proposal in his paper Computing Machinery and Intelligence that the question “Can machines think?” be replaced with the question “Can machines do what we (as thinking entities) can do?” [22].

1.1.1 Categories and tasks

Machine learning tasks are usually classified into three broad categories, depending on the nature of the learning “signal” or “feedback” available to a learning system [17]:

• Supervised learning: the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples, i.e. pairs consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can generalize from the training data to unseen situations in a “reasonable” way.

• Unsupervised learning: closely related to pattern recognition, unsupervised learning is about analyzing data and looking for patterns. It is an extremely powerful tool for identifying structure in data. Unsupervised learning can be a goal in itself or a means towards an end.

• Reinforcement learning: learning by interacting with an environment. An RL agent learns from the consequences of its actions, rather than from being explicitly taught, and it selects its actions on the basis of its past experiences (exploitation) and also by new choices (exploration), which is essentially trial-and-error learning. The reinforcement signal that the RL agent receives is a numerical reward, which encodes the success of an action’s outcome, and the agent seeks to learn to select actions that maximize the accumulated reward over time.
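As an illustration of the supervised setting described in the first item, the following minimal Python sketch infers a function from labeled pairs using a 1-nearest-neighbour rule. The choice of rule, the toy data and the names `training_pairs` and `nearest_neighbor_predict` are assumptions made for this example and are not taken from the thesis.

```python
import math

# Training examples: pairs of (input vector, desired output).
# The desired output plays the role of the supervisory signal.
training_pairs = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]


def nearest_neighbor_predict(pairs, x):
    """Return the label of the training input closest to x (Euclidean distance)."""
    _, label = min(pairs, key=lambda pair: math.dist(pair[0], x))
    return label


# The inferred function is applied to inputs that were not in the training set.
print(nearest_neighbor_predict(training_pairs, (2.0, 1.5)))
print(nearest_neighbor_predict(training_pairs, (8.5, 8.0)))
```

The "inferred function" here is simply a lookup of the closest stored example; more capable learners replace this lookup with a fitted model, but the interface (labeled pairs in, predictions for unseen inputs out) stays the same.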

Between supervised and unsupervised learning lies another category of learning methods. It is called semi-supervised learning and is used in the presence of an incomplete training signal: a training set with some (often many) of the target outputs missing. Transduction is a special case of this principle where the entire set of problem instances is known at learning time, except that part of the targets are missing.

Among other categories of machine learning problems, it is worth pointing out multi-task learning, which learns its own inductive bias based on previous experience. Developmental learning, on the other hand, was developed for robot learning and generates its own sequences (also called a curriculum) of learning situations to cumulatively acquire repertoires of novel skills through autonomous self-exploration and social interaction with human teachers. It also uses guidance mechanisms such as active learning, maturation, motor synergies, and imitation.

Another categorization of machine learning tasks arises when considering the desired output of a machine-learned system [23]:

• In classification, inputs are divided into two or more classes, and the learner must produce a model that assigns unseen input patterns to one or more of these classes (fuzzy classification). This is typically tackled in a supervised way. Spam filtering is an example of classification, where the inputs are email (or other) messages and the classes are “spam” and “not spam”.

• In regression, which is also a supervised problem, the outputs are continuous rather than discrete.

• In clustering, a set of input patterns has to be divided into groups. Unlike in classification, the groups are not known beforehand, making this typically an unsupervised task. Topic modeling is a related problem, where a program is given a list of human language documents and is asked to find out which documents cover similar topics.

• Density estimation finds the distribution of input patterns in some space.

• Dimensionality reduction simplifies inputs by mapping them into a lower-dimensional space.
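To make the clustering item more concrete, here is a minimal sketch of k-means on one-dimensional data in plain Python. The thesis does not name k-means at this point; the algorithm, the deterministic initialization from the first k points, the toy data and the function name `kmeans_1d` are all assumptions chosen for illustration.

```python
def kmeans_1d(points, k, max_iterations=100):
    """Group 1-D points into k clusters; no class labels are given in advance."""
    # Assumption: initialize centroids with the first k points (deterministic).
    centroids = list(points[:k])

    def nearest(p):
        # Index of the centroid closest to point p.
        return min(range(k), key=lambda i: abs(p - centroids[i]))

    for _ in range(max_iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # Converged: the centroids no longer move.
        centroids = updated

    labels = [nearest(p) for p in points]
    return centroids, labels


points = [1.0, 1.5, 0.5, 9.0, 8.5, 9.5]
centroids, labels = kmeans_1d(points, k=2)
print(centroids)  # [1.0, 9.0]
print(labels)     # [0, 0, 0, 1, 1, 1]
```

Only the number of groups is supplied; which points belong together is discovered from the data, which is what distinguishes this from the classification case.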

1.2 Computer Vision

Computer vision is a field that collects methods for acquiring, processing, analyzing, and understanding images and, in general, high-dimensional data from the real world. The aim of the discipline is to process these data to produce numerical or symbolic information in the form of decisions [24]. A fundamental idea that has always stood behind this field has been to duplicate the abilities of human vision by electronically perceiving and understanding an image [25]. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory [26]. Computer vision has also been defined as the enterprise of automating and integrating a wide range of processes and representations for visual perception.

As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as image sequences, views from multiple cameras, or multi-dimensional data from a medical scanner. From a technological point of view, computer vision seeks to apply its theories and models to the construction of computer vision systems.

Sub-fields of computer vision include object recognition, scene understanding, video tracking, event detection, object pose estimation, learning, indexing, motion estimation, and image restoration.

1.2.1 Object recognition

Object recognition is the task within computer vision concerned with finding and identifying objects in images or video sequences. Humans are able to recognize a multitude of objects without much effort, despite the fact that the objects in the images may vary significantly due to different viewpoints, sizes and scales, lighting conditions and poses. Objects can even be recognized when the view is partially obstructed. This task is still a challenge for computer vision systems, and many approaches to it have been implemented over multiple decades. In this dissertation, the task will be addressed in depth.

1.3 Artificial neural networks

In machine learning, artificial neural networks (ANNs) are a family of statistical learning models inspired by biological neural networks (common in the brains of many mammals) [17]. They can be used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks are generally presented as systems of interconnected “neurons” which send messages to each other. Each connection has a numeric weight that can be tuned based on experience, making neural nets adaptive to inputs and capable of learning [27].
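To make the idea of weights tuned by experience concrete, the following minimal sketch trains a single artificial neuron with the classic perceptron rule on the logical AND function. This is an illustration only: the step activation, the learning rate of 1, the zero initialization and all names (`train_perceptron`, `predict`, `and_examples`) are assumptions for this example and are not described in this part of the thesis.

```python
def step(z):
    """Threshold activation: the neuron fires (1) only for positive input."""
    return 1 if z > 0 else 0


def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(examples, epochs=20, learning_rate=1):
    """Adjust the weights after every mistake (perceptron learning rule)."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            # Each weight moves in proportion to the error and to its input.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Experience: input pairs with the desired output of logical AND.
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_examples)
print([predict(weights, bias, x) for x, _ in and_examples])  # [0, 0, 0, 1]
```

A single neuron of this kind can only separate classes with a straight line (or hyperplane); networks of many interconnected neurons are what allow more complex functions to be approximated.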
