Information Bottleneck Theory: A Consolidated Exploration of Generalisation in Deep Learning


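The Information Bottleneck Theory of Deep Learning is a master's thesis by Frederico Guth that examines a core question in deep learning. Published at the computer science school of the University of Brasília, it aims to consolidate work on the information bottleneck principle in order to understand why deep neural networks generalise so well across many tasks.

Information bottleneck theory originates in information theory and is concerned with discarding unnecessary noise during information transmission while retaining only the key information. In deep learning, the theory is used to explain the optimisation of model parameters, in particular why complex models such as deep neural networks can still predict unseen data accurately when the training set is limited. The thesis proposes that each layer of a deep network may be viewed as an information bottleneck; through this mechanism, the model learns to ignore irrelevant detail and to focus on the most essential feature representations in the data. The author analyses information flow in deep learning and argues that every layer compresses its input, keeping only the part that matters for the final task. This selective retention of information helps prevent overfitting and improves generalisation to new data.

The thesis also discusses open challenges: despite its notable successes, has deep learning really solved all of these problems, or has it only temporarily mitigated some of them, such as overfitting?

The advisors named for the thesis include Professors Teófilo Emídio de Campos, John Shawe-Taylor, Moacir Antonelli Ponti, and Genaína Nunes Rodrigues, affiliated with the University of Brasília, University College London, and the University of São Paulo, which indicates that the work received guidance and review from experts across disciplines.

The abstract stresses the central role of information processing in deep learning: it is not only a technical advance but also a theoretical framework that helps explain why these models can outperform traditional machine learning methods, especially on large datasets and complex tasks. It also cautions that, despite impressive results, many open questions about deep learning remain. The thesis offers a fresh perspective on the mechanisms behind deep learning models and warns that the pursuit of higher performance should not come at the expense of examining potential problems and theoretical foundations.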

2 introduction

Mathematics and science are both tools for knowledge acquisition. They are also social constructs, as both rely on peer review. They are quite different, however.

Science is empirical, based on facts collected from experience. When physicists around the world measured events that corroborated Newton’s “Law of Universal Gravitation”, they did not prove it correct; they just made his theory more and more plausible. Still, only one experiment was needed to show that Einstein’s Theory of Relativity was even more believable. In contrast, we can and do prove things in mathematics.

In mathematics, knowledge is absolute truth, and the way one builds new knowledge with it, its inference method, is deduction. In science, knowledge is justified belief; there are degrees of plausibility, and its inference method is induction.

Mathematics is a language, a formal one, a tool for precisely communicating certain kinds of thoughts. As with natural languages, there is beauty in it. The mathematician expands the boundaries of expression in this language, and the Greeks were poets. Even though the Babylonians were the first to find mathematical truths, the Greeks invented Mathematics as epistemology.

Understanding the epistemic contrast between mathematics and science will help us understand the past of Artificial Intelligence (AI) and avoid some perils in its future. This contrast will be a recurrent theme in this dissertation.

1.1.2 The importance of theoretical development

In science, we collect facts, but they need interpretation. Science is a narrative of how we understand the world. A description without explanation is not science because it does not provide a plausible meaning for what we observed. This meaning brings with it a view of how the world works, which can be applied to new situations to predict what will happen. This makes it falsifiable.

To illustrate, take the ancient human desire to fly. Since antiquity, there have been stories of men strapping wings to themselves and attempting to fly by jumping from a tower and flapping those wings like birds (see figure 1.2). As long as the issues of lift, stability, and control were poorly understood, most attempts ended in severe injury or even death. It did not matter how much evidence those ludicrously brave men had gathered, or how many hours they had spent watching different animals fly; the meaning they took from what they saw was wrong and their predictions incorrect.

They did not die in vain; science advances when scientists are wrong. Theories must be falsifiable, and scientists cheer for their failure. When a theory fails, there is room for new approaches. Only when we understood the evidence of animal flight from the perspective of aerodynamics did we learn to fly better than any animal before. Science works by a

[version: June 19, 2020 at 18:52 ]


• Even among AI researchers, there is a trend of “mathiness” and speculation disguised as explanations in conference papers [33].

• There is no place for papers that unpretentiously describe surprising phenomena without trying to come up with an explanation, as if a mere inconsistency with the current theoretical framework were unworthy of publication.

While physicists rejoice at finding phenomena that contradict current theories, computer scientists are baffled by them. In the natural sciences, unexplained phenomena lead to theoretical development. In AI, they bring “winters”.

1.2 problem

1.2.1 Learning Theory has failed deep

Artificial Intelligence has been through several “winters”, periods of stagnating progress and lack of funding. In 1957, Herbert Simon famously predicted that within ten years, a computer would be a chess champion [45, section 1.3]. In the end, it took around 40 years. Computer scientists lacked an understanding of the exponential nature of the problems they were trying to solve: Computational Complexity Theory had yet to be invented.

[Margin note: Herbert Simon (1916-2001) received the Turing Award in 1975 and the Nobel Prize in Economics in 1978.]

Machine Learning Theory (computational and statistical) tries to avoid a similar trap by analysing and classifying learning problems according to the number of samples required to learn them (and also the number of steps). An honest assessment concludes that it is now failing its mission. On the one hand, it led to the development of useful machine learning algorithms like SVMs; on the other, it also predicted that generalisation requires simpler models in terms of parameters, delaying the development of Deep Learning for years. In total disregard of the theory, deep learning models have shown spectacular generalisation power with hundreds of millions of parameters.
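As an illustration of this kind of sample-complexity analysis, consider the textbook bound for a finite hypothesis class in the realisable PAC setting. This is a standard result given here as a sketch; it is not taken from this dissertation, and the symbols $m$ (number of samples), $\mathcal{H}$ (hypothesis class), $\epsilon$ (error tolerance), and $\delta$ (failure probability) are notation assumed for the example:

```latex
% Assumed notation: m samples suffice for any hypothesis consistent with the
% training data to have error at most \epsilon with probability at least 1 - \delta.
m \;\ge\; \frac{1}{\epsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right)
```

Because the bound grows with $\ln|\mathcal{H}|$, richer model classes appear to need more samples, which is the intuition behind the classical preference for simpler models.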

The curse of dimensionality is quite meaningless in practice[...][24].

– Jeremy Howard, fast.ai creator

In the last decade, we have witnessed a myriad of astonishing successes in Deep Learning. Despite those many successes in research and industry applications, we may again be climbing a peak of inflated expectations. If in the past the false solution was to throw computational power at problems, today we try throwing data. Such behaviour has triggered a winner-takes-all competition among a handful of large corporations over who owns more data (our data), raising concerns about privacy and concentration of power.

Yet we know for a fact that learning from far fewer samples is possible: humans show a much better generalisation ability than our current state-of-the-art artificial intelligence. To achieve such generalisation power, we may need to understand better how learning happens in deep learning. Rethinking generalisation [67] will reshape the very foundations of machine learning theory.

1.2.2 Problem statement

The practice of modern machine learning has outpaced its theoretical development. In particular, deep learning models present generalisation capabilities unpredicted by current machine learning theory. There is as yet no established new general theory of learning that handles this problem.

In 2015, Naftali Tishby and Noga Zaslavsky published a theory of learning based on the information-theoretical concept of the bottleneck principle [60]. This theory is general and can explain several deep learning phenomena inconsistent with current Machine Learning Theory. The reasons it is not yet hors concours are threefold:

• There has been some valid criticism of the experimental setting of the article mentioned above, which independent developments from Achille and Soatto address.

• Understanding this new theory demands prior knowledge of Information Theory, which today's deep learning practitioners are not used to.

• Efforts on this new theory are scattered, and the knowledge still needs to be consolidated.
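For readers unfamiliar with it, the bottleneck principle referred to above is commonly stated as the optimisation problem below. This is the standard information bottleneck formulation, given as a reference sketch and not quoted from this excerpt; the notation ($X$ for the input, $Y$ for the target, $T$ for the compressed representation, $\beta$ for the trade-off parameter, $I(\cdot\,;\cdot)$ for mutual information) is assumed here:

```latex
% Assumed notation: T is a stochastic encoding of X, so that Y - X - T forms a Markov chain.
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y)
```

Minimising $I(X;T)$ compresses the input, while the $\beta\, I(T;Y)$ term rewards keeping the information that is relevant to the target.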

1.3 objective

This document aims to investigate the scattered efforts to use the information bottleneck principle to explain the generalisation capabilities of deep neural networks and to consolidate them into a comprehensive digest of this new general deep learning theory.

1.4 outline

• Chapter 2 - Artificial Intelligence: The chapter defines what artificial intelligence is, presents the epistemological differences of intelligent agents in history, and discusses their consequences for the theory of machine learning.
