The unreasonable effectiveness of deep learning in artificial intelligence

Terrence J. Sejnowski a,b,1

a Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037; and b Division of Biological Sciences, University of California San Diego, La Jolla, CA 92093
Edited by David L. Donoho, Stanford University, Stanford, CA, and approved November 22, 2019 (received for review September 17, 2019)
Deep learning networks have been trained to recognize speech, caption photographs, and translate text between languages at high levels of performance. Although applications of deep learning networks to real-world problems have become ubiquitous, our understanding of why they are so effective is lacking. These empirical results should not be possible according to sample complexity in statistics and nonconvex optimization theory. However, paradoxes in the training and effectiveness of deep learning networks are being investigated, and insights are being found in the geometry of high-dimensional spaces. A mathematical theory of deep learning would illuminate how these networks function, allow us to assess the strengths and weaknesses of different network architectures, and lead to major improvements. Deep learning has provided natural ways for humans to communicate with digital devices and is foundational for building artificial general intelligence. Deep learning was inspired by the architecture of the cerebral cortex, and insights into autonomy and general intelligence may be found in other brain regions that are essential for planning and survival, but major breakthroughs will be needed to achieve these goals.
deep learning | artificial intelligence | neural networks
In 1884, Edwin Abbott wrote Flatland: A Romance of Many Dimensions (1) (Fig. 1). This book was written as a satire on Victorian society, but it has endured because of its exploration of how dimensionality can change our intuitions about space. Flatland was a 2-dimensional (2D) world inhabited by geometrical creatures. The mathematics of 2 dimensions was fully understood by these creatures, with circles being more perfect than triangles. In it, a gentleman square has a dream about a sphere and wakes up to the possibility that his universe might be much larger than he or anyone in Flatland could imagine. He was not able to convince anyone that this was possible, and in the end he was imprisoned.
We can easily imagine adding another spatial dimension when going from a 1-dimensional to a 2D world and from a 2D to a 3-dimensional (3D) world. Lines can intersect themselves in 2 dimensions and sheets can fold back onto themselves in 3 dimensions, but imagining how a 3D object can fold back on itself in a 4-dimensional space is a stretch that was achieved by Charles Howard Hinton in the 19th century (https://en.wikipedia.org/wiki/Charles_Howard_Hinton). What are the properties of spaces having even higher dimensions? What is it like to live in a space with 100 dimensions, or a million dimensions, or a space like our brain that has a million billion dimensions (the number of synapses between neurons)?
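One way to make this failure of intuition concrete is a short numerical sketch (illustrative only, not from the article): in the plane, two random directions are often strongly aligned, but in very high dimensions nearly every pair of random vectors is almost orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_cosine(dim, n_pairs=1000):
    """Average |cos(angle)| between random pairs of dim-dimensional Gaussian vectors."""
    x = rng.standard_normal((n_pairs, dim))
    y = rng.standard_normal((n_pairs, dim))
    cos = np.sum(x * y, axis=1) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    )
    return float(np.abs(cos).mean())

# Random directions in the plane are often strongly aligned, but in
# 10,000 dimensions almost every pair is nearly orthogonal.
print(mean_abs_cosine(2))       # ~0.64 (the exact expectation in 2D is 2/pi)
print(mean_abs_cosine(10_000))  # ~0.008
```

This concentration of measure, where typical angles and distances crowd around a single value as dimension grows, is one of the geometric properties of high-dimensional spaces that later sections draw on.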
The first Neural Information Processing Systems (NeurIPS) Conference and Workshop took place at the Denver Tech Center in 1987 (Fig. 2). The 600 attendees were from a wide range of disciplines, including physics, neuroscience, psychology, statistics, electrical engineering, computer science, computer vision, speech recognition, and robotics, but they all had something in common: They all worked on intractably difficult problems that were not easily solved with traditional methods, and they tended to be outliers in their home disciplines. In retrospect, 33 y later, these misfits were pushing the frontiers of their fields into high-dimensional spaces populated by big datasets, the world we are living in today. As the president of the foundation that organizes the annual NeurIPS conferences, I oversaw the remarkable evolution of a community that created modern machine learning. This conference has grown steadily and in 2019 attracted over 14,000 participants. Many intractable problems eventually became tractable, and today machine learning serves as a foundation for contemporary artificial intelligence (AI).
The early goals of machine learning were more modest than those of AI. Rather than aiming directly at general intelligence, machine learning started by attacking practical problems in perception, language, motor control, prediction, and inference, using learning from data as the primary tool. In contrast, early attempts in AI were characterized by handcrafted, low-dimensional algorithms. However, this approach worked only in well-controlled environments. For example, in blocks world all objects were rectangular solids, identically painted, in an environment with fixed lighting. These algorithms did not scale up to vision in the real world, where objects have complex shapes and a wide range of reflectances, and lighting conditions are uncontrolled. The real world is high-dimensional, and there may not be any low-dimensional model that can be fit to it (2). Similar problems were encountered with early models of natural languages based on symbols and syntax, which ignored the complexities of semantics (3). Practical natural language applications became possible once the complexity of deep learning language models approached the complexity of the real world. Models of natural language with millions of parameters, trained on millions of labeled examples, are now used routinely. Even larger deep learning language networks are in production today, providing services to millions of users online, less than a decade since they were introduced.
Origins of Deep Learning
I have written a book, The Deep Learning Revolution: Artificial Intelligence Meets Human Intelligence (4), which tells the story of how deep learning came about. Deep learning was inspired by the massively parallel architecture found in brains, and its origins can be traced to Frank Rosenblatt's perceptron (5) in the 1950s, which was based on a simplified model of a single neuron introduced by McCulloch and Pitts (6). The perceptron performed pattern recognition and learned to classify labeled examples (Fig. 3). Rosenblatt proved a theorem that if there was a set of parameters that could classify new inputs correctly, and there were
This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, "The Science of Deep Learning," held March 13–14, 2019, at the National Academy of Sciences in Washington, DC. NAS colloquia began in 1991 and have been published in PNAS since 1995. From February 2001 through May 2019 colloquia were supported by a generous gift from The Dame Jillian and Dr. Arthur M. Sackler Foundation for the Arts, Sciences, & Humanities, in memory of Dame Sackler's husband, Arthur M. Sackler. The complete program and video recordings of most presentations are available on the NAS website at www.nasonline.org/science-of-deep-learning.
Author contributions: T.J.S. wrote the paper.
The author declares no competing interest.
This article is a PNAS Direct Submission.
Published under the PNAS license.
1 Email: terry@salk.edu.
www.pnas.org/cgi/doi/10.1073/pnas.1907373117