1.1 Why Do Machines Need to Learn? 5
case π simply maps the Boolean matrix to a Boolean vector by row scanning, in such a way that there is no information loss when passing from e to x. As will be shown later, however, the preprocessing function π typically returns a pattern representation with information loss with respect to the original environmental representation e ∈ E. Function f maps this representation to the one-hot encoding of the number 2 and, finally, h transforms this code into a representation of the same number that is more suitable for the task at hand:
e  −π→  (0, 0, 0, 1, 1, 0, 0, 0, ..., 0, 0, 0, 0, 0, 0, 1, 1)
   −f→  (0, 0, 1, 0, 0, 0, 0, 0, 0, 0)
   −h→  2.
Overall, the action of χ can be nicely written as χ(e) = 2. In many learning machines, the output encoding function h plays a more important role, which consists of converting real-valued representations y = f(x) ∈ ℝ¹⁰ into the corresponding one-hot representation. For example, in this case one could simply choose h such that h_i(y) = δ(i, arg max_κ y_κ), where δ denotes the Kronecker delta. In doing so, the hot bit is located at the same position as the maximum of y. While this apparently
makes sense, a more careful analysis suggests that such an encoding suffers from a
problem that is pointed out in Exercise 2.
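As a concrete illustration, the chain π, f, h can be sketched in a few lines of Python. The retina size, the stub classifier f, and all names below are illustrative assumptions, not the book's actual machinery; here h realizes h_i(y) = δ(i, arg max_κ y_κ) by placing the hot bit at the position of the maximum of y.

```python
import numpy as np

def pi(e):
    """Preprocessing: flatten the Boolean pixel matrix into a
    pattern vector x by row scanning (no information loss here)."""
    return e.flatten().astype(float)

def f(x):
    """Stand-in for the learned map: returns real-valued scores
    y in R^10, one per digit class (a fixed stub, not a trained model)."""
    y = np.zeros(10)
    y[2] = 0.9          # pretend the machine is most confident about "2"
    y[7] = 0.1
    return y

def h(y):
    """Output encoding h_i(y) = delta(i, argmax_k y_k): the hot bit
    sits at the position of the maximum of y."""
    code = np.zeros_like(y)
    code[np.argmax(y)] = 1.0
    return code

e = np.zeros((4, 4), dtype=bool)     # toy environmental representation
code = h(f(pi(e)))                   # one-hot code produced by h
digit = int(np.argmax(code))         # chi(e): the decoded class label, 2
```

Note that `np.argmax` breaks ties by returning the first maximal index, so this decoding always yields exactly one hot bit even when several components of y are equal.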
Functions π(·) and h(·) adapt the environmental information and the decision
to the internal representation of the agent. As will be seen throughout the book,
depending on the task, E and O can be highly structured, and their internal represen-
tation plays a crucial role in the learning process. The specific role of π(·) is to encode
the environmental information into an appropriate internal representation. Likewise,
function h(·) is expected to return the decision on the environment on the basis of the
internal state of the machine. The core of learning is the appropriate discovery of f(·), so as to satisfy the constraints dictated by the environment.
What, then, are the conditions dictated by the environment?
Learning from examples.
Since the dawn of machine learning, scientists have mostly been following the princi-
ple of learning from examples. Under this framework, an intelligent agent is expected
to acquire concepts by induction on the basis of collections L = {(e_κ, o_κ), κ = 1, ..., ℓ}, where an oracle, typically referred to as the supervisor, pairs inputs e_κ ∈ E with decision values o_κ ∈ O. A first important distinction concerns classification and regression tasks. In the first case, the decision requires the finiteness of O, while in the second case O can be thought of as a continuous set.
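The distinction can be made concrete with two toy supervised collections; the pairs below are made-up placeholders for (e_κ, o_κ), not data drawn from the book.

```python
# Classification: O is a finite set of class labels (here, digit classes).
classification_L = [
    ((0.0, 1.0, 1.0, 0.0), 2),   # (pattern, class) pair given by the supervisor
    ((1.0, 0.0, 0.0, 1.0), 7),
]

# Regression: O is a continuous set (here, real-valued targets).
regression_L = [
    ((0.0, 1.0), 2.35),          # (pattern, real-valued target)
    ((1.0, 0.0), 7.81),
]

# The decision set of a classification task is finite by construction.
finite_O = {o for _, o in classification_L}
```

The same list-of-pairs structure serves both tasks; only the nature of the decision set O changes.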
Classification and regression.
First, let us focus on classification. In the simplest cases, O ⊂ ℕ is a collection of integers that identify the class of e. For example, in the handwritten character recognition problem, restricted to digits, we might have |O| = 10. In this case, we can promptly see the importance of distinguishing the physical, the environmental, and the decision information from their corresponding internal representations in the machine. At the purely physical level, handwritten characters are the outcome of the physical process of light reflection. This process can be captured as soon as we define the retina R as a rectangle of ℝ², and interpret the reflected light by the image function