ICML 2018 Notes
Stockholm, Sweden
David Abel∗
david_abel@brown.edu
July 2018
∗ http://david-abel.github.io/
Contents
1 Conference Highlights 3
2 Tuesday July 10th 3
2.1 Tutorial: Toward Theoretical Understanding of Deep Learning . . . . . . . . . . . . 4
2.1.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.2 Overparameterization and Generalization Theory . . . . . . . . . . . . . . . . 6
2.1.3 The Role of Depth in Deep Learning . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.4 Theory of Generative Models and Adversarial Nets . . . . . . . . . . . . . . . 9
2.1.5 Deep Learning Free . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Tutorial: Optimization Perspectives on Learning to Control . . . . . . . . . . . . . . 11
2.2.1 Introduction: RL, Optimization, and Control . . . . . . . . . . . . . . . . . . 11
2.2.2 Different Approaches to Learning to Control . . . . . . . . . . . . . . . . . . 14
2.2.3 Learning Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.4 Model-Based RL To The Rescue . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 Wednesday July 11th 20
3.1 Best Paper 1: Obfuscated Gradients Give a False Sense of Security [7] . . . . . . . . 20
3.2 Reinforcement Learning 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.1 Problem Dependent RL Bounds to Identify Bandit Structure in MDPs [46] . 22
3.2.2 Learning with Abandonment [38] . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.3 Lipschitz Continuity in Model-Based RL [6] . . . . . . . . . . . . . . . . . . . 24
3.2.4 Implicit Quantile Networks for Distributional RL [13] . . . . . . . . . . . . . 24
3.2.5 More Robust Doubly Robust Off-policy Evaluation [17] . . . . . . . . . . . . 25
3.3 Reinforcement Learning 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3.1 Coordinating Exploration in Concurrent RL [15] . . . . . . . . . . . . . . . . 26
3.3.2 Gated Path Planning Networks [29] . . . . . . . . . . . . . . . . . . . . . . . 26
3.4 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4.1 PredRNN++: Towards a Resolution of the Deep-in-Time Dilemma [42] . . . 27
3.4.2 Hierarchical Long-term Video Prediction without Supervision [43] . . . . . . 27
3.4.3 Evolving Convolutional Autoencoders for Image Restoration [40] . . . . . . . 28
3.4.4 Model-Level Dual Learning [45] . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5 Reinforcement Learning 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5.1 Machine Theory of Mind [34] . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5.2 Been There Done That: Meta-Learning with Episodic Recall [36] . . . . . . . 30
3.5.3 Transfer in Deep RL using Successor Features in GPI [9] . . . . . . . . . . . . 31
3.5.4 Continual Reinforcement Learning with Complex Synapses [26] . . . . . . . . 31
4 Thursday July 12th 32
4.1 Intelligence by the Kilowatthour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.1 Free Energy, Energy, and Entropy . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.2 Energy Efficient Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Best Paper 2: Delayed Impact of Fair Machine Learning . . . . . . . . . . . . . . . . 34
4.3 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.1 Decoupling Gradient Like Learning Rules from Representations . . . . . . . . 36
4.3.2 PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos [33] 36
5 Friday July 13th 36
5.1 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.1.1 Hierarchical Imitation and Reinforcement Learning [28] . . . . . . . . . . . . 37
5.1.2 Using Reward Machines for High-Level Task Specification [23] . . . . . . . . 38
5.1.3 Policy Optimization with Demonstrations [25] . . . . . . . . . . . . . . . . . . 38
5.2 Language to Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2.1 Grounding Verbs to Perception . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2.2 Grounding Language to Plans . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.3 Building Machines that Learn and Think Like People . . . . . . . . . . . . . . . . . 41
6 Saturday July 14th: Lifelong RL Workshop 42
6.1 Multitask RL for Zero-shot Generalization with Subtask Dependencies . . . . . . . . 42
6.2 Unsupervised Meta-Learning for Reinforcement Learning . . . . . . . . . . . . . . . 43
7 Sunday July 15th: Workshops 44
7.1 Workshop: Exploration in RL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
7.1.1 Meta-RL of Structured Exploration Strategies . . . . . . . . . . . . . . . . . . 44
7.1.2 Count-Based Exploration with the Successor Representation . . . . . . . . . 45
7.1.3 Is Q-Learning Provably Efficient? . . . . . . . . . . . . . . . . . . . . . . . . 45
7.1.4 Upper Confidence Bounds Action-Values . . . . . . . . . . . . . . . . . . . . . 46
7.2 Workshop: AI for Wildlife Conservation . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.2.1 Data Innovation in Wildlife Conservation . . . . . . . . . . . . . . . . . . . . 47
7.2.2 Computing Robust Strategies for Managing Invasive Paths . . . . . . . . . . 49
7.2.3 Detecting and Tracking Communal Bird Roosts in Weather Data . . . . . . . 50
7.2.4 Recognition for Camera Traps . . . . . . . . . . . . . . . . . . . . . . . . . . 50
7.2.5 Crowdsourcing Mountain Images for Water Conservation . . . . . . . . . . . 51
7.2.6 Detecting Wildlife in Drone Imagery . . . . . . . . . . . . . . . . . . . . . . . 51
This document contains notes I took during the events I managed to make it to at ICML in Stock-
holm, Sweden. Please feel free to distribute it and shoot me an email at david_abel@brown.edu if
you find any typos or other items that need correcting.
1 Conference Highlights
Some folks jokingly called it ICRL this year — the RL sessions were in the biggest room and appar-
ently had the most papers. It’s pretty wild. A few of my friends in RL were reminiscing over the
times when there were a dozen or so RL folks at a given big ML conference. My primary research
area is in RL, so I tend to track the RL talks most closely (but I do care deeply about the broader
community, too). All that being said, these notes are heavily biased toward the RL sessions. Also,
I was spending quite a bit more time prepping for my talks/poster sessions, so I missed a bit more
than usual.
Some takeaways:
• I’d like to see more explanatory papers in RL – that is, instead of focusing on introducing
new algorithms that perform better on our benchmarks, we could reflect back on the techniques
we’ve introduced and do a deep analysis (either theoretical or experimental) to uncover what,
precisely, these methods do.
• I’m going to spend some time thinking about what it would look like to make foundational
progress in RL without MDPs at the core of the result (there’s some nice work out there
already [30]).
• Lots of tools are now sophisticated and robust enough to make a huge impact. If you’re into
the long-haul, Utopia-style vision of AI’s future, now is a good time to start thinking deeply
about how to help the world with the tools we’ve been developing. As a start, take a look at
the AI for Wildlife Conservation workshop (and the comp sust community¹).
• Sanjeev Arora’s Deep Learning Theory tutorial and Ben Recht’s Optimization tutorial were
both excellent – I’d suggest taking a look at each if you get time. The main ideas for me were
(Sanjeev) we might want to think about doing unsupervised learning with more connection
to downstream tasks, and (Ben) RL and Control theory have loads in common, and the
communities should talk more.
2 Tuesday July 10th
It begins! Tuesday starts with Tutorials (I missed the morning session due to jet lag). I’ll be
attending the Theory of Deep Learning tutorial and the Optimization for Learning to Control
tutorial.
¹ http://www.compsust.net/
2.1 Tutorial: Toward Theoretical Understanding of Deep Learning
Sanjeev Arora is speaking.²
² Video will be available here: https://icml.cc/Conferences/2018/Schedule?type=Tutorial.
Some Terminology:
• θ: parameters of the deep net
• (x_1, y_1), . . . , (x_i, y_i): i.i.d. training samples from distribution D
• ℓ(θ, x, y): loss function
• Objective: arg min_θ E_i[ℓ(θ, x_i, y_i)]
• Gradient Descent (a minimal sketch follows this list):
θ_{t+1} ← θ_t − η ∇_θ E_i[ℓ(θ_t, x_i, y_i)]    (1)
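To ground update (1), here is a minimal numpy sketch of gradient descent on a toy least-squares problem. The quadratic loss, the synthetic data, the step size, and the iteration count are all illustrative choices of mine, not anything prescribed in the tutorial.

```python
import numpy as np

# Toy instance of arg min_theta E_i[l(theta, x_i, y_i)] with the assumed loss
# l(theta, x, y) = 0.5 * (theta . x - y)^2.
rng = np.random.default_rng(0)
d, n = 5, 200
theta_true = rng.normal(size=d)
X = rng.normal(size=(n, d))                      # the x_i
Y = X @ theta_true + 0.1 * rng.normal(size=n)    # the y_i (plus a little noise)

def empirical_grad(theta):
    """Gradient of E_i[l(theta, x_i, y_i)] for the squared loss above."""
    return X.T @ (X @ theta - Y) / n

theta = np.zeros(d)
eta = 0.1                                        # step size eta from update (1)
for t in range(500):
    theta = theta - eta * empirical_grad(theta)  # theta_{t+1} <- theta_t - eta * grad

print(np.linalg.norm(theta - theta_true))        # small: GD recovers theta_true here
```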
Point: Optimization concepts already shape deep learning.
Goal of Theory: Theorems that sort through competing intuitions, lead to new insights and
concepts. A mathematical basis for new ideas.
Talk Overview:
1. Optimization: When/how can it find decent solutions. Highly nonconvex.
2. Overparameterization/Generalization: When the # parameters >> # training samples.
Does it help? Why do such nets still generalize?
3. The Role of Depth
4. Unsupervised Learning/GANs
5. Simpler methods to Replace Deep Learning
2.1.1 Optimization
Point: Optimization concepts have already helped shape deep learning.
Hurdle: Most optimization problems are non-convex. So, we don’t expect to have polynomial
time algorithms.
Possible Goals of Optimization:
• Find a critical point: ∇ = 0.
• Find a local optimum: ∇² is positive semi-definite.
• Find the global optimum, θ∗.
Assumptions about initialization:
• Prove convergence from all starting points θ_0.
• Prove random initial points will converge.
• Prove convergence from special initial points.
Note: if optimization is in R^d, then you want run time poly(d, 1/ε), where ε is the accuracy. The
naive upper bound is exponential: exp(d/ε).
Curse of Dimensionality: In R^d, ∃ exp(d) directions whose pairwise angle is > 60°. Thus,
∃ exp(d/ε) special directions s.t. all directions have angle at most ε with one of these (an “ε-cover”).
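As a quick, informal illustration of this remark (my own toy check, not part of the tutorial): random unit vectors in R^d are already nearly orthogonal for large d, so pairwise angles well above 60° are easy to come by. The dimensions and sample counts below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_pairwise_angle_deg(d, num_vectors=200):
    """Smallest pairwise angle (in degrees) among random unit vectors in R^d."""
    V = rng.normal(size=(num_vectors, d))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    cosines = np.abs(V @ V.T)                 # |cos| of every pairwise angle
    np.fill_diagonal(cosines, 0.0)            # ignore each vector's angle with itself
    return np.degrees(np.arccos(np.clip(cosines.max(), -1.0, 1.0)))

for d in (3, 30, 300):
    print(d, round(min_pairwise_angle_deg(d), 1))
# In R^3 some pair is nearly aligned; as d grows, even the closest pair of directions
# is far past 60 degrees, consistent with exp(d) such directions existing.
```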
Black box for analysis of Deep Learning. Why: we don’t really know the landscape, just the loss func-
tion. We have basically no mathematical characterization of (x, y), since y is usually a complicated
function of x (think about classifying objects in images: x is an image, y is “dog”).
Instead, we can get: θ → f → f(θ), ∇f(θ). Using just this black-box analysis, we can’t get global
optima.
Gradient Descent:
• ∇ ≠ 0: so, there is a descent direction.
• But, if ∇² is large, it allows ∇ to fluctuate a lot!
• So, to ensure descent, we must take small steps determined by smoothness:
∇²f(θ) ≤ βI    (2)
Claim 2.1. If η = 1/(2β), then we can achieve |∇f| < ε in a number of steps proportional to β/ε².
Proof.
f(θ_t) − f(θ_{t+1}) ≥ ∇f(θ_t)(θ_t − θ_{t+1}) − (1/2) β |θ_t − θ_{t+1}|²
                  = η |∇_t|² − (1/2) β η² |∇_t|²
                  = (1/(2β)) |∇_t|²
But, the solution here is just a critical point, which is a bit too weak. One idea to improve: avoid
saddle points, as in Perturbed SGD introduced by Ge et al. [18].
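Roughly, the perturbation trick adds a random kick whenever the gradient is near zero, so the iterate can roll off a saddle. Below is a toy numpy sketch of that idea; it is my own simplified rendering (fixed noise radius, a hand-picked saddle function), not the precise algorithm or guarantees of Ge et al. [18].

```python
import numpy as np

rng = np.random.default_rng(0)

def f(theta):
    # x^2 - y^2 has a saddle at the origin: zero gradient, but a descent direction in y.
    return theta[0] ** 2 - theta[1] ** 2

def grad(theta):
    return np.array([2.0 * theta[0], -2.0 * theta[1]])

def perturbed_gd(theta, eta=0.05, noise_radius=0.1, grad_tol=1e-3, steps=50):
    """Plain gradient descent, plus an isotropic random kick near critical points."""
    for _ in range(steps):
        g = grad(theta)
        if np.linalg.norm(g) < grad_tol:            # stuck at/near a critical point
            theta = theta + noise_radius * rng.normal(size=theta.shape)
        else:
            theta = theta - eta * g
    return theta

print(perturbed_gd(np.array([0.0, 0.0])))
# Plain GD would stay at (0, 0) forever; the perturbed version drifts away
# along the y direction, where f keeps decreasing.
```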
What about 2nd order optimization? Like the Newton Method. So, we instead consider:
θ → f → f(θ), ∇f(θ), ∇²f(θ),    (3)
which lets us make stronger guarantees about solutions at the expense of additional computation.
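To make the oracle picture in (3) concrete, here is a small sketch (my own framing, not from the talk) of a second-order black box: the caller only ever sees f(θ), ∇f(θ), and ∇²f(θ), never the formula for f, which is exactly what a Newton-style step needs. The quadratic example function is arbitrary.

```python
import numpy as np

class SecondOrderOracle:
    """Black-box access to f: query(theta) returns f(theta), grad f(theta), hess f(theta)."""

    def __init__(self, f, grad, hess):
        self._f, self._grad, self._hess = f, grad, hess

    def query(self, theta):
        # The algorithm/analysis never sees how f is built -- only these three values.
        return self._f(theta), self._grad(theta), self._hess(theta)

# Toy example: f(theta) = 0.5 * theta^T A theta, so one Newton step reaches the minimum.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
oracle = SecondOrderOracle(
    f=lambda th: 0.5 * th @ A @ th,
    grad=lambda th: A @ th,
    hess=lambda th: A,
)

theta = np.array([1.0, -2.0])
value, g, H = oracle.query(theta)
theta_next = theta - np.linalg.solve(H, g)   # Newton step using only oracle outputs
print(theta_next)                            # ~[0, 0], the minimizer of this quadratic
```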
Non-black-box analyses. Lots of ML problems are subclasses of depth-two neural networks:
• Make assumptions about net’s structure, data distribution, etc.
• May use different algorithms from SGD.