Exploring Deep Reinforcement Learning: A Future Decision-Making Technology That Could Change the World


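"Morales M. Grokking Deep Reinforcement Learning (MEAP Version 11) 2020.pdf"

Grokking Deep Reinforcement Learning is a Manning Early Access Program (MEAP) edition released by Manning Publications in 2020. It aims to help readers understand deep reinforcement learning in depth and encourages them to become active participants in the field. Deep reinforcement learning is an artificial intelligence technique with potentially transformative power that could change the world as we know it. By taking humans out of the decision loop, we can let computers apply their unmatched persistence and work ethic and avoid the problems that human fatigue, bias, and imperfect judgment introduce.

Deep reinforcement learning (DRL) is a branch of machine learning that combines the representational power of deep learning models with reinforcement learning's mechanism of learning through interaction with an environment. In DRL, an agent learns an optimal policy by interacting with its environment so as to maximize long-term reward. The process resembles the way animals learn behavior through trial and error, which is why it is called "reinforcement" learning. Deep learning, in turn, lets the agent learn abstract features from high-dimensional input data, which is essential for complex settings such as games, robot control, and autonomous driving.

The author points out that although deep reinforcement learning has achieved notable results, for example in Go, Atari games, and continuous control tasks, many challenges and open problems remain. These include low learning efficiency, poor generalization, insufficient adaptation to changes in the environment, and a tendency to overfit. These problems leave researchers a wide space to explore and many opportunities for innovation and improvement.

The potential applications of deep reinforcement learning are broad and cover almost any domain that requires ongoing decision-making. In healthcare, intelligent systems could analyze medical records and symptoms to propose treatment plans; in education, personalized learning paths could improve teaching outcomes; in finance, automated trading systems could make investment decisions quickly; in defense, autonomous drones could carry out dangerous missions; in robotics, capabilities such as autonomous navigation and object grasping could greatly increase efficiency. In practice, any scenario that involves a repeated decision process may benefit from progress in deep reinforcement learning.

To make full use of deep reinforcement learning, readers need to master the relevant theory, such as Markov decision processes (MDPs), Q-learning, and policy gradient algorithms. They also need to be familiar with deep learning frameworks such as TensorFlow and PyTorch and to have some programming ability. Understanding core issues such as how to design a suitable reward function, how to balance exploration and exploitation, and how to handle delayed rewards is equally important. Grokking Deep Reinforcement Learning aims to guide readers into this promising field so that, through study and practice, they can contribute to the development of deep reinforcement learning and help create a more efficient and fairer world driven by machine intelligence.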


What is deep reinforcement learning?

Deep reinforcement learning agents learn from sequential feedback

The action taken by the agent may have delayed consequences. The reward may be sparse and only manifest after several time steps. Thus the agent must be able to learn from sequential feedback. Sequential feedback gives rise to a problem referred to as the temporal credit assignment problem. The temporal credit assignment problem is the challenge of determining which state and/or action is responsible for a reward. When there is a temporal component to a problem, and actions have delayed consequences, it becomes challenging to assign credit for rewards.

In chapter 3, we'll study the ins and outs of sequential feedback in isolation. That is, your programs will learn from feedback that is simultaneously sequential, supervised (as opposed to evaluative), and exhaustive (as opposed to sampled).

Figure: The difficulty of the temporal credit assignment problem (a dialogue between the agent and the environment, read top to bottom along the time axis)

(1) Environment: You are in state 0.
(2) Agent: OK. I'll take action A.
(3) Environment: You got +23.
(4) Environment: You are in state 3.
(5) Agent: Nice! Action A again, please.
(6) Environment: No problem, -100.
(7) Environment: You are in state 3.
(8) Agent: Ouch! Get me out of here!
(9) Agent: Action B?!
(10) Environment: Sure, -100.
(11) Environment: You are in state 3.
(12) Agent: Was it taking action A in state 0 to be blamed for the -100? Sure, choosing action A in state 0 gave me a good immediate reward, but maybe that is what sent me to state 3, which is terrible. Should I have chosen action B in state 0? Oh, man... Temporal credit assignment is hard...

...
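One common way to make the agent's worry in step (12) concrete is to look at discounted returns instead of immediate rewards. The sketch below is an illustration and not code from the book: the function name `discounted_returns` and the discount factor `gamma = 0.9` are assumptions chosen for the example, and only the three rewards (+23, -100, -100) come from the dialogue above.

```python
def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t, working backward.

    `gamma` is a hypothetical discount factor; the excerpt does not specify one.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Rewards taken from the dialogue: action A in state 0 gives +23,
# then two actions taken in state 3 each give -100.
rewards = [23.0, -100.0, -100.0]
returns = discounted_returns(rewards, gamma=0.9)

for t, (r, g) in enumerate(zip(rewards, returns)):
    print(f"step {t}: immediate reward {r:+.1f}, discounted return {g:+.1f}")
```

Under these assumptions, the first step has a positive immediate reward but a negative return (23 + 0.9 * (-100 + 0.9 * -100) = -148), which is the tension the agent describes: the action that looked good at first may share the blame for what followed. A single trajectory cannot settle whether action B in state 0 would have been better; that would require experience from the alternative choice.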

©Manning Publications Co. To comment go to liveBook

https://forums.manning.com/forums/grokking-deep-reinforcement-learning

The past, present, and future of deep reinforcement learning

History is not necessary to gain skills, but it can help you understand the context around a topic, which in turn can help you gain motivation and, therefore, skills. The history of AI and DRL should help you set expectations about the future of this powerful technology. At times I feel the hype surrounding AI is actually productive; people get interested. But right after that, when it's time to put in the work, hype no longer helps and actually becomes a problem. So, while I'd like to be excited about AI, I also need to set some realistic expectations.

Recent history of artificial intelligence and deep reinforcement learning

The beginnings of DRL can be traced back many years, since humans have been intrigued by the possibility of intelligent creatures other than ourselves since antiquity. But a good starting point is Alan Turing's work in the 1930s, 1940s, and 1950s, which paved the way for modern computer science and AI by laying down critical theoretical foundations that later scientists leveraged.

The best known of these is the Turing Test, which proposes a standard for measuring machine intelligence: if a human interrogator is unable to distinguish a machine from another human in a chat Q&A session, then the computer is said to count as intelligent.

Though rudimentary, the Turing Test allowed generations to wonder about the possibilities of creating smart machines by setting a goal that researchers could pursue.

The formal beginnings of AI as an academic discipline can be attributed to John McCarthy, an influential AI researcher who made several notable contributions to the field. To name a few, McCarthy is credited with coining the term "artificial intelligence" in 1955, leading the first AI conference in 1956, inventing the Lisp programming language in 1958, co-founding the MIT AI Lab in 1959, and contributing important papers to the development of AI as a field over several decades.

Artificial intelligence winters

All the early work and progress in AI created a great deal of excitement, but there were also significant setbacks. Prominent AI researchers suggested we would be able to create human-like machine intelligence within years, but this never happened. Things got worse when a well-known researcher named James Lighthill compiled a report criticizing the state of academic research in AI. All of these developments contributed to a long period of reduced funding and interest in AI research known as the first AI winter.
