Deep Learning in Practice with Python: An Introductory Guide to Computer Vision


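Deep Learning for Computer Vision with Python Practitioner is a practical textbook by Dr. Adrian Rosebrock, aimed at Python developers who want an in-depth understanding of how deep learning is applied to computer vision. This is version 1.1.0 of the first edition; the copyright is held by Adrian Rosebrock and the book is published at PyImageSearch.com. Its core content centers on the practice of deep learning techniques in computer vision, with an emphasis on implementing them in Python. The author stresses the key role of data augmentation in training deep learning models: it helps improve generalization and guards against overfitting, and the book demonstrates its effect with worked examples. Chapter 2, "Data Augmentation", covers:

1. **What is data augmentation?** Data augmentation is a strategy that applies a series of transformations to the original data (such as rotation, flipping, scaling, and cropping) to generate varied training samples, so that the model adapts better to variations in its input.
2. **Visualizing data augmentation.** Visualizations let readers see directly how augmentation changes an image, which is essential for understanding the principle behind it.
3. **Comparing training with and without data augmentation.** Model training results are compared with and without augmentation. On the Flowers-17 dataset, for example, the comparison shows that data augmentation significantly improves performance on the flower classification task.
4. **Case study.** Flowers-17 is processed both without and with data augmentation, and the training results and validation accuracy of the two setups are compared.
5. **Preprocessing techniques.** Beyond data augmentation, the book introduces aspect-aware preprocessing, a preprocessing method that takes the image's aspect ratio into account so that images of different sizes are handled effectively.

The book also encourages readers to buy a legitimate copy to support the author and so foster more high-quality IT educational resources. Through hands-on work and examples, it aims to help readers master the application of deep learning to computer vision, whether they are beginners or developers with some experience. After reading it, you should be able to build your own deep learning projects and solve practical computer vision problems.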

Chapter 2. Data Augmentation

Figure 2.1: Left: A sample of 250 data points that follow a normal distribution exactly. Right: Adding a small amount of random “jitter” to the distribution. This type of data augmentation can increase the generalizability of our networks.

Consider Figure 2.1 (left), which shows a normal distribution with zero mean and unit variance. Training a machine learning model on this data may result in us modeling the distribution exactly; however, in real-world applications, data rarely follows such a neat distribution.

Instead, to increase the generalizability of our classifier, we may first randomly jitter points along the distribution by adding some values drawn from a random distribution (right). Our plot still follows an approximately normal distribution, but it’s not a perfect distribution as on the left. A model trained on this data is more likely to generalize to example data points not included in the training set.
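The jitter idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used to produce Figure 2.1: the x-range, the seed, the jitter scale of 0.02, and all variable names are assumptions chosen for the example.

```python
import numpy as np

# Assumed values for illustration; the text does not specify them.
rng = np.random.default_rng(42)
num_points = 250
jitter_scale = 0.02

# 250 points that follow the standard normal curve exactly
x = np.linspace(-4.0, 4.0, num_points)
y = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

# Add a small random perturbation to every point
jitter = rng.normal(loc=0.0, scale=jitter_scale, size=num_points)
y_jittered = y + jitter

print("largest shift:", np.abs(y_jittered - y).max())
```

The jittered points still trace an approximately bell-shaped curve, but no longer lie exactly on it, which is the situation shown on the right of the figure.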

In the context of computer vision, data augmentation lends itself naturally. For example, we can obtain additional training data from the original images by applying simple geometric transforms such as random:

1. Translations
2. Rotations
3. Changes in scale
4. Shearing
5. Horizontal (and in some cases, vertical) flips

Applying a (small) amount of these transformations to an input image will change its appearance slightly, but it does not change the class label, thereby making data augmentation a very natural, easy method to apply to deep learning for computer vision tasks. More advanced techniques for data augmentation applied to computer vision include random perturbation of colors in a given color space [6] and nonlinear geometric distortions [7].
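As a rough sketch of two of the transforms listed above, the snippet below flips and translates a tiny array with plain NumPy indexing. The 4 × 4 "image", the label, and the zero fill for the vacated column are assumptions made for the example; real pipelines would normally use a library such as Keras for this.

```python
import numpy as np

# A tiny 4x4 single-channel "image" with an assumed class label
image = np.arange(16, dtype="float32").reshape(4, 4)
label = "dog"

# Horizontal flip: reverse the column order
flipped = image[:, ::-1]

# Translate one pixel to the right, filling the vacated column with zeros
shifted = np.zeros_like(image)
shifted[:, 1:] = image[:, :-1]

# Each transformed copy is paired with the same, unchanged label
augmented = [(image, label), (flipped, label), (shifted, label)]
print(len(augmented), "training samples from one original image")
```

The pixel values change in each copy while the label does not, which is the property that makes these transforms usable as augmentation.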

2.2 Visualizing Data Augmentation

The best way to understand data augmentation applied to computer vision tasks is to simply visualize a given input being augmented and distorted. To accomplish this visualization, let’s build a simple Python script that uses the built-in power of Keras to perform data augmentation. Create a new file, name it augmentation_demo.py, and insert the following code:


1 # import the necessary packages
2 from keras.preprocessing.image import ImageDataGenerator
3 from keras.preprocessing.image import img_to_array
4 from keras.preprocessing.image import load_img
5 import numpy as np
6 import argparse

Lines 2-6 import our required Python packages. Take note of Line 2, where we import the ImageDataGenerator class from Keras. This class will be used for data augmentation and includes all relevant methods to help us transform our input image.

Next, we parse our command line arguments:

8 # construct the argument parser and parse the arguments
9 ap = argparse.ArgumentParser()
10 ap.add_argument("-i", "--image", required=True,
11     help="path to the input image")
12 ap.add_argument("-o", "--output", required=True,
13     help="path to output directory to store augmentation examples")
14 ap.add_argument("-p", "--prefix", type=str, default="image",
15     help="output filename prefix")
16 args = vars(ap.parse_args())

Our script requires three command line arguments, each detailed below:

• --image: The path to the input image that we want to augment and visualize.
• --output: After applying data augmentation to a given image, we would like to store the result on disk so we can inspect it; this switch controls the output directory.
• --prefix: A string that will be prepended to the output image filename.
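To see what the parsed arguments look like without touching a real command line, the same parser can be fed a list of strings. This is a small sketch using only the standard library; the file and directory names are placeholders.

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to the input image")
ap.add_argument("-o", "--output", required=True,
    help="path to output directory to store augmentation examples")
ap.add_argument("-p", "--prefix", type=str, default="image",
    help="output filename prefix")

# Passing a list to parse_args() simulates the command line
args = vars(ap.parse_args(["--image", "jemma.png", "--output", "output"]))
print(args)
```

Because --prefix is omitted here, it falls back to its default value of "image".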

Now that our command line arguments are parsed, let’s load our input image, convert it to a Keras-compatible array, and add an extra dimension to the image, just as we would do if we were preparing our image for classification:

18 # load the input image, convert it to a NumPy array, and then
19 # reshape it to have an extra dimension
20 print("[INFO] loading example image...")
21 image = load_img(args["image"])
22 image = img_to_array(image)
23 image = np.expand_dims(image, axis=0)
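The effect of the extra dimension is easiest to see on the array shapes. The sketch below uses a zero-filled array as a stand-in for the loaded image, assuming a 640 × 480 RGB input and the default channels-last ordering; neither the size nor the stand-in comes from the book.

```python
import numpy as np

# Stand-in for img_to_array(load_img(...)): height x width x channels
image = np.zeros((480, 640, 3), dtype="float32")
print(image.shape)   # (480, 640, 3)

# Add a leading "batch" axis so the array represents a batch of one image
batch = np.expand_dims(image, axis=0)
print(batch.shape)   # (1, 480, 640, 3)
```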

We are now ready to initialize our ImageDataGenerator:

25 # construct the image generator for data augmentation then
26 # initialize the total number of images generated thus far
27 aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
28     height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
29     horizontal_flip=True, fill_mode="nearest")
30 total = 0

The ImageDataGenerator class has a number of parameters, too many to enumerate in this book. For a full review of the parameters, please refer to the official Keras documentation (http://pyimg.co/j8ad8).


Instead, we’ll be focusing on the augmentation parameters you will most likely use in your own applications. The rotation_range parameter controls the degree range of the random rotations. Here we’ll allow our input image to be randomly rotated ±30 degrees. Both the width_shift_range and height_shift_range are used for horizontal and vertical shifts, respectively. The parameter value is a fraction of the given dimension, in this case, 10%.

The shear_range controls the angle, in the counterclockwise direction and measured in radians, by which our image is allowed to be sheared. (Newer Keras releases document this value in degrees, so check the documentation for the version you have installed.) We then have the zoom_range, a floating point value that allows the image to be “zoomed in” or “zoomed out” according to the following uniform distribution of values: [1 - zoom_range, 1 + zoom_range].
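To make these ranges concrete, the sketch below draws one random value per parameter from the intervals described in the text. It only illustrates the ranges and is not Keras’s internal implementation; the function name sample_params, the 640 × 480 image size, and the use of uniform sampling for rotation and shifts are assumptions.

```python
import numpy as np

def sample_params(rng, width, height, rotation_range=30,
                  width_shift_range=0.1, height_shift_range=0.1,
                  zoom_range=0.2):
    """Draw one hypothetical set of augmentation parameters."""
    return {
        # rotation angle in degrees, within +/- rotation_range
        "theta": rng.uniform(-rotation_range, rotation_range),
        # shifts in pixels: a fraction of the corresponding dimension
        "tx": rng.uniform(-width_shift_range, width_shift_range) * width,
        "ty": rng.uniform(-height_shift_range, height_shift_range) * height,
        # zoom factor drawn from [1 - zoom_range, 1 + zoom_range]
        "zoom": rng.uniform(1 - zoom_range, 1 + zoom_range),
    }

rng = np.random.default_rng(0)
params = sample_params(rng, width=640, height=480)
print(params)
```

With these settings a 640-pixel-wide image can shift by at most 64 pixels horizontally, and the zoom factor always lies between 0.8 and 1.2.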

Finally, the horizontal_flip boolean controls whether or not a given input is allowed to be flipped horizontally during the training process. For most computer vision applications a horizontal flip of an image does not change the resulting class label, but there are applications where a horizontal (or vertical) flip does change the semantic meaning of the image. Take care when applying this type of data augmentation as our goal is to slightly modify the input image, thereby generating a new training sample, without changing the class label itself. For a more detailed review of image transformations, please refer to Module #1 in PyImageSearch Gurus [8] as well as Szeliski [9].

Once ImageDataGenerator is initialized, we can actually generate new training examples:

32 # construct the actual Python generator
33 print("[INFO] generating images...")
34 imageGen = aug.flow(image, batch_size=1, save_to_dir=args["output"],
35     save_prefix=args["prefix"], save_format="jpg")

37 # loop over examples from our image data augmentation generator
38 for image in imageGen:
39     # increment our counter
40     total += 1

42     # if we have reached 10 examples, break from the loop
43     if total == 10:
44         break

Lines 34 and 35 initialize a Python generator used to construct our augmented images. We’ll pass in our input image, a batch_size of 1 (since we are only augmenting one image), along with a few additional parameters to specify the output directory, the prefix for each filename, and the image file format. Line 38 then starts looping over each image in the imageGen generator. Internally, imageGen is automatically generating a new training sample each time one is requested via the loop. We then increment the total number of data augmentation examples written to disk and stop the script from executing once we’ve reached ten examples.
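The explicit break matters because a generator of this kind keeps producing samples for as long as it is asked. The sketch below mimics that behavior with a plain Python generator; endless_flips is a hypothetical stand-in for aug.flow() that only applies random horizontal flips, and the 4 × 4 input is an assumed toy size.

```python
import numpy as np

def endless_flips(batch, rng):
    """Hypothetical stand-in for aug.flow(): yields randomly flipped copies forever."""
    while True:
        if rng.random() < 0.5:
            yield batch[:, :, ::-1, :]   # flip along the width axis
        else:
            yield batch

rng = np.random.default_rng(0)
batch = np.zeros((1, 4, 4, 3), dtype="float32")

total = 0
for augmented in endless_flips(batch, rng):
    # increment our counter
    total += 1

    # without this check the loop would never terminate
    if total == 10:
        break

print(total)
```

Each pass through the loop requests exactly one new sample, so the counter doubles as the number of augmented images produced.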

To visualize data augmentation in action, we’ll be using Figure 2.2 (left), an image of Jemma, my family beagle. To generate new training example images of Jemma, just execute the following command:

$ python augmentation_demo.py --image jemma.png --output output

After the script executes you should see ten images in the output directory:

$ ls output/
image_0_1227.jpg image_0_2358.jpg image_0_4205.jpg image_0_4770.jpg
image_0_1933.jpg image_0_2914.jpg image_0_4657.jpg image_0_6934.jpg
image_0_9197.jpg image_0_953.jpg

Figure 2.2: Left: The input image we are going to apply data augmentation to. Right: A montage of data augmentation examples. Notice how each image has been randomly rotated, sheared, zoomed, and horizontally flipped.

I have constructed a montage of each of these images so you can visualize them in Figure 2.2 (right). Notice how each image has been randomly rotated, sheared, zoomed, and horizontally flipped. In each case the image retains the original class label: dog; however, each image has been modified slightly, thereby giving our neural network new patterns to learn from when training. Since the input images will constantly be changing (while the class labels remain the same), it’s common to see our training accuracy decrease when compared to training without data augmentation.

However, as we’ll find out later in this chapter, data augmentation can help dramatically reduce overfitting, all the while ensuring that our model generalizes better to new input samples. Furthermore, when working with datasets where we have too few examples to apply deep learning, we can utilize data augmentation to generate additional training data, thereby reducing the amount of hand-labeled data required to train a deep learning network.

2.3 Comparing Training With and Without Data Augmentation

In the first part of this section, we’ll discuss the Flowers-17 dataset, a very small dataset (in terms of deep learning for computer vision tasks), and how data augmentation can help us artificially increase the size of this dataset by generating additional training samples. From there we’ll perform two experiments:

1. Train MiniVGGNet on Flowers-17 without data augmentation.
2. Train MiniVGGNet on Flowers-17 with data augmentation.

As we’ll find out, applying data augmentation dramatically reduces overfitting and allows MiniVGGNet to obtain substantially higher classification accuracy.

2.3.1 The Flowers-17 Dataset

The Flowers-17 dataset [10] is a fine-grained classification challenge where our task is to recognize 17 distinct species of flowers. The image dataset is quite small, having only 80 images per class for a total of 1,360 images. A general rule of thumb when applying deep learning to computer vision tasks is to have 1,000-5,000 examples per class, so we are certainly at a huge deficit here.

We call Flowers-17 a fine-grained classification task because all categories are very similar (i.e., species of flower). In fact, we can think of each of these categories as subcategories. The categories are certainly different, but share a significant amount of common structure (e.g., petals, stamen, pistil). Fine-grained classification tasks tend to be the most challenging for deep learning practitioners, as they require our machine learning models to learn extremely discriminating features to distinguish between classes that are very similar. This fine-grained classification task becomes even more problematic given our limited training data.

Figure 2.3: A sample of five (out of the seventeen total) classes in the Flowers-17 dataset where each class represents a specific flower species.

2.3.2 Aspect-aware Preprocessing

Up until this point, we have only preprocessed images by resizing them to a fixed size, ignoring the aspect ratio. In some situations, especially for basic benchmark datasets, doing so is acceptable.

However, for more challenging datasets we should still seek to resize to a fixed size, but maintain the aspect ratio. To visualize this action, consider Figure 2.4.

On the left, we have an input image that we need to resize to a fixed width and height. Ignoring the aspect ratio, we resize the image to 256 × 256 pixels (middle), effectively squishing and distorting the image such that it meets our desired dimensions. A better approach would be to take into account the aspect ratio of the image (right) where we first resize along the shorter dimension such that the width is 256 pixels and then crop the image along the height, such that the height is 256 pixels.

While we have effectively discarded part of the image during the crop, we have also maintained the original aspect ratio of the image. Maintaining a consistent aspect ratio allows our Convolutional Neural Network to learn more discriminative, consistent features. This is a common technique that we’ll be applying when working with more advanced datasets throughout the rest of the Practitioner Bundle and ImageNet Bundle.
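The resize-then-crop idea can be sketched with NumPy alone. This is not the book’s AspectAwarePreprocessor, which is not shown in this excerpt: the function names, the nearest-neighbor resize, and the choice of a center crop are all assumptions made to keep the example self-contained.

```python
import numpy as np

def resize_nearest(image, new_h, new_w):
    """Nearest-neighbor resize using plain NumPy indexing (illustrative only)."""
    h, w = image.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return image[rows][:, cols]

def aspect_aware_resize(image, width, height):
    """Resize so the image covers the target box, then center-crop the excess."""
    h, w = image.shape[:2]
    # one scale factor for both axes keeps the aspect ratio intact
    scale = max(width / w, height / h)
    new_w = max(width, int(round(w * scale)))
    new_h = max(height, int(round(h * scale)))
    resized = resize_nearest(image, new_h, new_w)
    # crop the longer dimension down to the target size
    top = (new_h - height) // 2
    left = (new_w - width) // 2
    return resized[top:top + height, left:left + width]

# A 400x300 (width x height) stand-in image
image = np.zeros((300, 400, 3), dtype="uint8")
output = aspect_aware_resize(image, 256, 256)
print(output.shape)   # (256, 256, 3)
```

Here the shorter side (the height of 300) is scaled to 256, the width becomes 341, and the surplus columns are cropped away, so nothing is squished.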

To see how aspect-aware preprocessing is implemented, let’s update our pyimagesearch project structure to include an AspectAwarePreprocessor:

--- pyimagesearch
|    |--- __init__.py
|    |--- callbacks
|    |--- nn
|    |--- preprocessing
|    |    |--- __init__.py
|    |    |--- aspectawarepreprocessor.py
