Deep Learning for Computer Vision: A Practical Guide with Python

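"Deep learning is a key technique in computer vision; this is a hands-on book that implements deep learning in Python." In the first edition of Deep Learning for Computer Vision with Python: Starter Bundle, author Dr. Adrian Rosebrock explains in detail how deep learning applies to computer vision and provides extensive practical guidance. The book is the product of the author's time and effort, and readers who have not purchased it are encouraged to support the author by buying it through the official link.

The book aims to teach deep learning to readers at different levels. For beginners, it introduces the basic concepts and how to get started; for practitioners who already have some deep learning experience, its advanced topics and practical lessons help deepen their understanding. The book is organized into three parts:

1. **Volume #1: Starter Bundle** - Aimed mainly at newcomers to deep learning. It covers the fundamentals and introductory practice needed to build an understanding of deep learning, including the basic principles of neural networks, how convolutional neural networks (CNNs) work, and how to build and train models with Python and related libraries (such as TensorFlow and Keras).
2. **Volume #2: Practitioner Bundle** - The more advanced part, aimed at readers who already have a deep learning foundation. It discusses more complex topics such as model optimization, data preprocessing, hyperparameter tuning, model deployment, and how to deal with overfitting and underfitting. It may also touch on real-time prediction, handling large-scale datasets, and transfer learning with pretrained models.
3. **Detailed chapters of each volume** - No specific chapter contents are given here, but deep learning books typically cover applications of convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory networks (LSTMs), and how to use these models for tasks such as image classification, object detection, and semantic segmentation. They also discuss loss functions, optimization algorithms, regularization strategies, and model evaluation methods.

By reading this book, readers can learn the theory of deep learning and also acquire hands-on skills, so that they can apply deep learning to real problems in computer vision. Both beginners who are curious about deep learning and professionals who want to improve their skills can benefit from it.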

1. Introduction

“The secret of getting ahead is to get started.” – Mark Twain

Welcome to Deep Learning for Computer Vision with Python. This book is your guide to
mastering deep learning applied to practical, real-world computer vision problems utilizing the
Python programming language and the Keras + mxnet libraries. Inside this book, you’ll learn how
to apply deep learning to take on projects such as image classification, object detection, training
networks on large-scale datasets, and much more.

Deep Learning for Computer Vision with Python strives to be the perfect balance between
theory taught in a classroom/textbook and the actual hands-on knowledge you’ll need to be
successful in the real world.

To accomplish this goal, you’ll learn in a practical, applied manner by training networks on
your own custom datasets and even competing in challenging state-of-the-art image classification
challenges and competitions. By the time you finish this book, you’ll be well equipped to apply
deep learning to your own projects. And with enough practice, I have no doubt that you’ll be able
to leverage your newly gained knowledge to find a job in the deep learning space, become a deep
learning for computer vision consultant/contractor, or even start your own computer vision-based
company that leverages deep learning.

So grab your highlighter. Find a comfortable spot. And let me help you on your journey to
deep learning mastery. Remember the most important step is the first one – to simply get started.

1.1 I Studied Deep Learning the Wrong Way... This Is the Right Way

I want to start this book by sharing a personal story with you:

Toward the end of my graduate school career (2013-2014), I started wrapping my head around
this whole "deep learning" thing due to a timing quirk. I was in a very unique situation. My
dissertation was (essentially) wrapped up. Each of my Ph.D. committee members had signed off on
it. However, due to university/department regulations, I still had an extra semester that I needed to
"hang around" for before I could officially defend my dissertation and graduate. This gap essentially
left me with an entire semester (≈ 4 months) to kill – it was an excellent time to start studying
deep learning.

My first stop, as is true for most academics, was to read through all the recent publications
on deep learning. Due to my machine learning background, it didn’t take long to grasp the actual
theoretical foundations of deep learning.

However, I’m of the opinion that until you take your theoretical knowledge and implement
it, you haven’t actually learned anything yet. Transforming theory into implementation is a very
different process, as any computer scientist who has taken a data structures class before will
tell you: reading about red-black trees and then actually implementing them from scratch requires
two different skill sets.

And that’s exactly what my problem was.

After reading these deep learning publications, I was left scratching my head; I couldn’t take
what I learned from the papers and implement the actual algorithms, let alone reproduce the results.

Frustrated with my failed attempts at implementation, I spent hours searching on Google,
hunting for deep learning tutorials, only to come up empty-handed. Back then, there weren’t many
deep learning tutorials to be found.

Finally, I resorted to playing around with libraries and tools such as Caffe, Theano, and Torch,
blindly following poorly written blog posts (with mixed results, to say the least).

I wanted to get started, but nothing had actually clicked yet – the deep learning lightbulb in my
head was stuck in the “off” position.

To be totally honest with you, it was a painful, emotionally trying semester. I could clearly see
the value of deep learning for computer vision, but I had nothing to show for my effort, except for a
stack of deep learning papers on my desk that I understood but struggled to implement.

During the last month of the semester, I finally found my way to deep learning success through
hundreds of trial-and-error experiments, countless late nights, and a lot of perseverance. In the long
run, those four months made a massive impact on my life, my research path, and how I understand
deep learning today...

...but I would not advise you to take the same path I did.

If you take anything from my personal experience, it should be this:

1. You don’t need a decade of theory to get started in deep learning.

2. You don’t need pages and pages of equations.

3. And you certainly don’t need a degree in computer science (although it can be helpful).

When I got started studying deep learning, I made the critical mistake of taking a deep dive into
the publications without ever resurfacing to try and implement what I studied. Don’t get me wrong
– theory is important. But if you don’t (or can’t) take your newly minted theoretical knowledge and
apply it to build actual real-world applications, you’ll struggle to find your space in the deep
learning world.

Deep learning, and most other higher-level, specialized computer science subjects, are recognizing
that theoretical knowledge is not enough – we need to be practitioners in our respective fields
as well. In fact, the concept of becoming a deep learning practitioner was my exact motivation
in writing Deep Learning for Computer Vision with Python.

While there are:

1. Textbooks that will teach you the theoretical underpinnings of machine learning, neural
networks, and deep learning

2. And countless “cookbook”-style resources that will “show you in code”, but never relate the
code back to true theoretical knowledge...

...none of these books or resources will serve as the bridge between the two.

On one side of the bridge you have your textbooks, deeply rooted in theory and abstraction.
And on the other side, you have “show me in code” books that simply present examples to you,
perhaps explaining the code, but never relating the code back to the underlying theory.

There is a fundamental disconnect between these two styles of learning, a gap that I want
to help fill so you can learn in a better, more efficient way.

I thought back to my graduate school days, to my feelings of frustration and irritation, to the
days when I even considered giving up. I channeled these feelings as I sat down to write this
book.

The book you’re reading now is the book I wish I had when I first started studying deep
learning.

Inside the remainder of Deep Learning for Computer Vision with Python, you’ll find super
practical walkthroughs, hands-on tutorials (with lots of code), and a no-nonsense teaching
style that is guaranteed to cut through all the cruft and help you master deep learning for computer
vision.

Rest assured, you’re in good hands – this is the exact book that you’ve been looking for, and
I’m incredibly excited to be joining you on your deep learning for visual recognition journey.

1.2 Who This Book Is For

This book is for developers, researchers, and students who want to become proficient in deep
learning for computer vision and visual recognition.

1.2.1 Just Getting Started in Deep Learning?

Don’t worry. You won’t get bogged down by tons of theory and complex equations. We’ll start
off with the basics of machine learning and neural networks. You’ll learn in a fun, practical way
with lots of code. I’ll also provide you with references to seminal papers in the machine learning
literature that you can use to extend your knowledge once you feel like you have a solid foundation
to stand on.

The most important step you can take right now is to simply get started. Let me take care of the
teaching – regardless of your skill level, trust me, you will not get left behind. By the time you
finish the first few chapters of this book, you’ll be a neural network ninja and be able to graduate to
the more advanced content.

1.2.2 Already a Seasoned Deep Learning Practitioner?

This book isn’t just for beginners – there’s advanced content in here too. For each chapter in this
book, I provide a set of academic references you can use to further your knowledge. Many chapters
inside Deep Learning for Computer Vision with Python actually explain these academic concepts in
a manner that is easily understood and digested.

Best of all, the solutions and tactics I provide inside this book can be directly applied to
your current job and research. The time you’ll save by reading through Deep Learning for
Computer Vision with Python will more than pay for itself once you apply your knowledge to your
projects/research.

1.3 Book Organization

Since this book covers a huge amount of content, I’ve broken down the book into three volumes
called “bundles”. Each bundle sequentially builds on top of the other and includes all
chapters from the lower volumes. You can find a quick breakdown of the bundles below.

1.3.1 Volume #1: Starter Bundle

The Starter Bundle is a great fit if you’re taking your first steps toward deep learning for image
classification mastery.

You’ll learn the basics of:

1. Machine learning

2. Neural Networks

3. Convolutional Neural Networks

4. How to work with your own custom datasets

1.3.2 Volume #2: Practitioner Bundle

The Practitioner Bundle builds on the Starter Bundle and is perfect if you want to study deep
learning in-depth, understand advanced techniques, and discover common best practices and rules
of thumb.

1.3.3 Volume #3: ImageNet Bundle

The ImageNet Bundle is the complete deep learning for computer vision experience. In this volume
of the book, I demonstrate how to train large-scale neural networks on the massive ImageNet dataset
as well as tackle real-world case studies, including age + gender prediction, vehicle make + model
identification, facial expression recognition, and much more.

1.3.4 Need to Upgrade Your Bundle?

If you would ever like to upgrade your bundle, all you have to do is send me a message and we can
get the upgrade taken care of ASAP:

http://www.pyimagesearch.com/contact/

1.4 Tools of the Trade: Python, Keras, and Mxnet

We’ll be utilizing the Python programming language for all examples in this book. Python is an
extremely easy language to learn. It has intuitive syntax. It’s super powerful. And it’s the best
way to work with deep learning algorithms.

The primary deep learning library we’ll be using is Keras [1]. The Keras library is maintained
by the brilliant François Chollet, a deep learning researcher and engineer at Google. I have been
using Keras for years and can say that it’s hands-down my favorite deep learning package. As a
minimal, modular network library that can use either Theano or TensorFlow as a backend, you just
can’t beat Keras.

The second deep learning library we’ll be using is mxnet [2] (ImageNet Bundle only), a
lightweight, portable, and flexible deep learning library. The mxnet package provides bindings
to the Python programming language and specializes in distributed, multi-machine learning – the
ability to parallelize training across GPUs/devices/nodes is critical when training deep neural
network architectures on massive datasets (such as ImageNet).
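To make the idea of parallelizing training across devices a bit more concrete, here is a minimal sketch of synchronous data parallelism in plain Python. It does not use the mxnet API at all: the "workers" are simulated in a single process, the model is a hypothetical one-weight regressor `y = w * x`, and the helper names (`gradient`, `split`, `parallel_step`) are my own for illustration. The point is only that each worker computes a gradient on its own shard of the batch and the averaged result matches what a single device would compute on the whole batch.

```python
# Toy synchronous data parallelism in plain Python. This is NOT the mxnet API;
# every name here is a hypothetical helper used only for illustration.


def gradient(w, shard):
    """Gradient of the mean squared error for the 1-D model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def split(data, num_workers):
    """Split the batch into equal contiguous shards, one per simulated worker.

    Assumes len(data) is divisible by num_workers.
    """
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def parallel_step(w, data, num_workers, lr):
    """One update: every worker computes a shard gradient, then the gradients are averaged."""
    shards = split(data, num_workers)
    grads = [gradient(w, shard) for shard in shards]  # would run concurrently on real devices
    return w - lr * sum(grads) / len(grads)


# Eight samples drawn from y = 3x, split across four simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

# With equal-sized shards, the averaged gradient equals the full-batch gradient,
# so one parallel step and one single-worker step land on the same weight.
w_parallel = parallel_step(0.0, data, num_workers=4, lr=0.01)
w_single = parallel_step(0.0, data, num_workers=1, lr=0.01)

# A few more steps to confirm the weight approaches the true value of 3.
w = 0.0
for _ in range(50):
    w = parallel_step(w, data, num_workers=4, lr=0.01)
```

On real hardware the per-shard gradients would be computed at the same time on separate GPUs or machines, and the averaging step is where the communication cost shows up.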

Finally, we’ll also be using a few computer vision, image processing, and machine learning
libraries such as OpenCV, scikit-image, scikit-learn, etc.

Python, Keras, and mxnet are well-built tools that when combined together create a powerful deep
learning development environment that you can use to master deep learning for visual recognition.

1.4.1 What About TensorFlow?

TensorFlow [3] and Theano [4] are libraries for defining abstract, general-purpose computation
graphs. While they are used for deep learning, they are not deep learning frameworks and are in
fact used for a great many other applications than deep learning.
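If "general-purpose computation graph" sounds abstract, the following toy sketch may help. It is plain Python and deliberately not the TensorFlow or Theano API: the `Node` class and the `placeholder` and `run` helpers are hypothetical names I made up for illustration. It shows the two-phase pattern such libraries are built around: first you describe a computation as a graph of operations, then you run that graph with concrete inputs. Nothing about it is specific to neural networks.

```python
# Toy computation graph in plain Python. This is NOT the TensorFlow or Theano
# API; Node, placeholder, and run are hypothetical names for illustration only.
import operator


class Node:
    """A graph node: either a named input (placeholder) or an operation on other nodes."""

    def __init__(self, op=None, inputs=(), name=None):
        self.op = op          # callable for operation nodes, None for placeholders
        self.inputs = inputs  # upstream nodes this node depends on
        self.name = name      # lookup key for placeholders

    def __add__(self, other):
        return Node(operator.add, (self, other))

    def __mul__(self, other):
        return Node(operator.mul, (self, other))


def placeholder(name):
    """Declare an input whose value is supplied only when the graph is run."""
    return Node(name=name)


def run(node, feed):
    """Evaluate `node` by recursively evaluating its inputs with values from `feed`."""
    if node.op is None:
        return feed[node.name]
    return node.op(*(run(parent, feed) for parent in node.inputs))


# Step 1: define the graph. Nothing is computed yet.
x = placeholder("x")
y = placeholder("y")
z = (x + y) * y

# Step 2: run the same graph with different inputs.
result_a = run(z, {"x": 2, "y": 3})      # (2 + 3) * 3
result_b = run(z, {"x": 1.5, "y": 2.0})  # (1.5 + 2.0) * 2.0
```

A real graph library adds a great deal on top of this (automatic differentiation, device placement, optimization of the graph itself), but the define-then-run split is the part that makes it useful well beyond deep learning.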

Keras, on the other hand, is a deep learning framework that provides a well-designed API

to facilitate building deep neural networks with ease. Under the hood, Keras uses either the
