the network was trained anew for each game, meaning that its visual system and policy are highly
specialized to the game it was trained on. More recent work has shown how these game-specific
networks can share visual features (Rusu et al., 2016) or be used to train a multi-task network
(Parisotto, Ba, & Salakhutdinov, 2016), achieving modest benefits of transfer when learning to
play new games.
Although it is interesting that the DQN learns to play games at human-level performance while
assuming very little prior knowledge, the DQN may be learning to play Frostbite and other games
in a very different way than people do. One way to examine the differences is by considering the
amount of experience required for learning. In V. Mnih et al. (2015), the DQN was compared with a
professional gamer who received approximately two hours of practice on each of the 49 Atari games
(although he or she likely had prior experience with some of the games). The DQN was trained on
200 million frames from each of the games, which equates to approximately 924 hours of game time
(about 38 days), or almost 500 times as much experience as the human received.² Additionally, the
DQN incorporates experience replay, where each of these frames is replayed approximately 8 more
times on average over the course of learning.
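To make the scale of this gap concrete, the frame-to-hours arithmetic and the replay mechanism can be sketched in a few lines of Python. The sketch below is illustrative only: the 60 frames-per-second emulator rate, the buffer capacity, and the batch size are assumptions chosen for the example, not a description of the exact implementation in V. Mnih et al. (2015).

```python
import random
from collections import deque

# Converting DQN training frames to game (experience) time, assuming the
# Atari emulator's native rate of roughly 60 frames per second.
frames = 200_000_000
hours = frames / 60 / 3600  # roughly 926 hours under this assumption
print(f"~{hours:.0f} hours of game time (~{hours / 24:.1f} days)")
print(f"~{hours / 2:.0f}x the professional gamer's ~2 hours per game")

# Minimal sketch of an experience-replay buffer: each transition is stored
# once but can be sampled many times for learning updates, which is how a
# single frame of experience ends up being "replayed" repeatedly.
class ReplayBuffer:
    def __init__(self, capacity=1_000_000):
        self.transitions = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.transitions.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform sampling, as in the original DQN; prioritized variants
        # (e.g., Schaul et al., 2016) instead favor surprising transitions.
        return random.sample(self.transitions, batch_size)
```

Under these assumptions the arithmetic reproduces the roughly 900-plus hours quoted above, and the uniform sampling in the sketch is what the "smarter experience replay" discussed below replaces with prioritized sampling.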
With the full 924 hours of unique experience and additional replay, the DQN achieved less than
10% of human-level performance during a controlled test session (see DQN in Fig. 3). More recent
variants of the DQN have demonstrated superior performance (Schaul et al., 2016; Stadie et al.,
2016; van Hasselt, Guez, & Silver, 2016; Wang et al., 2016), reaching 83% of the professional
gamer’s score by incorporating smarter experience replay (Schaul et al., 2016) and 96% by using
smarter replay and more efficient parameter sharing (Wang et al., 2016) (see DQN+ and DQN++
in Fig. 3).³ But these variants require a lot of experience to reach this level: the learning curve provided
in Schaul et al. (2016) shows that performance is around 46% after 231 hours, 19% after 116 hours, and
below 3.5% after just 2 hours (which is close to random play, approximately 1.5%). The differences
between the human and machine learning curves suggest that they may be learning different kinds
of knowledge, using different learning mechanisms, or both.
The contrast becomes even more dramatic if we look at the very earliest stages of learning. While
both the original DQN and these more recent variants require multiple hours of experience to
perform reliably better than random play, even non-professional humans can grasp the basics
of the game after just a few minutes of play. We speculate that people do this by inferring a
general schema to describe the goals of the game and the object types and their interactions,
using the kinds of intuitive theories, model-building abilities and model-based planning mecha-
nisms we describe below. While novice players may make some mistakes, such as inferring that
fish are harmful rather than helpful, they can learn to play better than chance within a few min-
utes. If humans are able to first watch an expert playing for a few minutes, they can learn even
faster. In informal experiments with two of the authors playing Frostbite on a Javascript emu-
lator (http://www.virtualatari.org/soft.php?soft=Frostbite), after watching videos of expert play
on YouTube for just two minutes, we found that we were able to reach scores comparable to or
² The time required to train the DQN (compute time) is not the same as the game (experience) time. Compute time can be longer.
³ The reported scores use the “human starts” measure of test performance, designed to prevent networks from just memorizing long sequences of successful actions from a single starting point. Both faster learning (Blundell et al., 2016) and higher scores (Wang et al., 2016) have been reported using other metrics, but it is unclear how well the networks are generalizing with these alternative metrics.