Deep Learning for Computer Vision: Object Detection and Data Augmentation in Practice


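"This book focuses on object detection, in particular the application of deep learning to computer vision, and stresses the importance of data augmentation during training. In the Python Practitioner Bundle, author Adrian Rosebrock covers these concepts in detail, with the aim of helping readers understand and improve the performance of object detection models."

Deep learning plays a central role in object detection. Neural network models such as R-CNN, Fast R-CNN, Faster R-CNN, YOLO (You Only Look Once), and Mask R-CNN identify and localize specific objects in an image. These models typically contain convolutional layers, pooling layers, and fully connected layers for classification and localization, which learn features and progressively refine their understanding of the target.

Data augmentation is a key part of training deep learning models, especially for object detection. It applies transformations such as rotation, scaling, flipping, and cropping to the original training data in order to improve the model's ability to generalize. This helps prevent overfitting, where a model learns details specific to the training data and then handles unseen data poorly. The book notes that data augmentation not only improves model performance but also helps the network adapt to changes in scene and viewpoint.

Section 2.1, What Is Data Augmentation?, explains the basic principle and purpose of data augmentation and how it enlarges the training set by creating synthetic new samples. Section 2.2, Visualizing Data Augmentation, shows the effect of augmentation in practice, so readers can see how the transformations change an image and influence training. Section 2.3, Comparing Training With and Without Data Augmentation, runs a comparison on the Flowers-17 dataset, training once without and once with data augmentation and analyzing the difference.

Aspect-aware Preprocessing is a preprocessing method that preserves an image's original aspect ratio when it is resized, so that shape distortion does not affect the model. The book also reports the Flowers-17 results obtained without data augmentation and the improvement obtained with it. Working through this material shows readers why data augmentation matters and how to apply it effectively to improve model performance, whether they are beginners or experienced developers.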


14 Chapter 2. Data Augmentation

Figure 2.1: Left: A sample of 250 data points that follow a normal distribution exactly. Right: Adding a small amount of random “jitter” to the distribution. This type of data augmentation can increase the generalizability of our networks.

Consider Figure 2.1 (left), which shows a normal distribution with zero mean and unit variance. Training a machine learning model on this data may result in us modeling the distribution exactly; however, in real-world applications, data rarely follows such a neat distribution.

Instead, to increase the generalizability of our classifier, we may first randomly jitter points along the distribution by adding some values drawn from a random distribution (right). Our plot still follows an approximately normal distribution, but it’s not a perfect distribution as on the left. A model trained on this data is more likely to generalize to example data points not included in the training set.
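The jitter idea can be sketched in a few lines of standard-library Python. This is only an illustration: the noise scale of 0.1, the fixed seed, and the helper name `jitter` are assumptions made for this example and do not come from the text.

```python
import random
import statistics


def jitter(points, scale=0.1, rng=None):
    """Return a copy of `points` with small Gaussian noise added to each value."""
    rng = rng or random.Random()
    return [p + rng.gauss(0.0, scale) for p in points]


rng = random.Random(42)  # fixed seed so the run is repeatable

# 250 samples drawn from a zero-mean, unit-variance normal distribution
points = [rng.gauss(0.0, 1.0) for _ in range(250)]

# the "augmented" copy: same overall shape, slightly perturbed values
jittered = jitter(points, scale=0.1, rng=rng)

print(f"original: mean={statistics.mean(points):.3f}, "
      f"stdev={statistics.stdev(points):.3f}")
print(f"jittered: mean={statistics.mean(jittered):.3f}, "
      f"stdev={statistics.stdev(jittered):.3f}")
```

The two printed summaries should be close but not identical, which is the point: the jittered data still looks approximately normal while no longer matching the original sample exactly.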

In the context of computer vision, data augmentation lends itself naturally. For example, we can obtain additional training data from the original images by applying simple geometric transforms such as random:

1. Translations

2. Rotations

3. Changes in scale

4. Shearing

5. Horizontal (and in some cases, vertical) flips

Applying a (small) amount of these transformations to an input image will change its appearance slightly, but it does not change the class label, which makes data augmentation a very natural, easy method to apply to deep learning for computer vision tasks. More advanced techniques for data augmentation applied to computer vision include random perturbation of colors in a given color space [6] and nonlinear geometric distortions [7].
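To make the "appearance changes, label does not" point concrete, here is a toy sketch of two of the transforms listed above (a horizontal flip and a translation) applied to a tiny grayscale image stored as nested lists. The helper names, the 2 × 3 image, and the label are all hypothetical; real pipelines would operate on image arrays instead.

```python
def horizontal_flip(image):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in image]


def translate_right(image, shift, fill=0):
    """Shift every row `shift` pixels to the right, padding with `fill`."""
    width = len(image[0])
    shift = max(0, min(shift, width))  # clamp to a valid range
    return [[fill] * shift + row[:width - shift] for row in image]


# a tiny 2x3 "image" and its (hypothetical) class label
image = [
    [1, 2, 3],
    [4, 5, 6],
]
label = "dog"

# each augmented copy keeps the original label
augmented = [
    (horizontal_flip(image), label),
    (translate_right(image, 1), label),
]

for pixels, lbl in augmented:
    print(lbl, pixels)
```

Both augmented samples differ from the input pixel by pixel, yet each is paired with the same label as the original.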

2.2 Visualizing Data Augmentation

The best way to understand data augmentation applied to computer vision tasks is to simply visualize a given input being augmented and distorted. To accomplish this visualization, let’s build a simple Python script that uses the built-in power of Keras to perform data augmentation. Create a new file, name it augmentation_demo.py, and insert the following code:


1 # import the necessary packages
2 from keras.preprocessing.image import ImageDataGenerator
3 from keras.preprocessing.image import img_to_array
4 from keras.preprocessing.image import load_img
5 import numpy as np
6 import argparse

Lines 2-6 import our required Python packages. Take note of Line 2, where we import the ImageDataGenerator class from Keras. This class will be used for data augmentation and includes all relevant methods to help us transform our input image.

Next, we parse our command line arguments:

8 # construct the argument parser and parse the arguments
9 ap = argparse.ArgumentParser()
10 ap.add_argument("-i", "--image", required=True,
11     help="path to the input image")
12 ap.add_argument("-o", "--output", required=True,
13     help="path to output directory to store augmentation examples")
14 ap.add_argument("-p", "--prefix", type=str, default="image",
15     help="output filename prefix")
16 args = vars(ap.parse_args())

Our script accepts three command line arguments (the first two are required), each detailed below:

• --image: This is the path to the input image that we want to apply data augmentation to and visualize the results.

• --output: After applying data augmentation to a given image, we would like to store the result on disk so we can inspect it. This switch controls the output directory.

• --prefix: A string that will be prepended to the output image filename (defaults to "image").

Now that our command line arguments are parsed, let’s load our input image, convert it to a Keras-compatible array, and add an extra dimension to the image, just as we would do if we were preparing our image for classification:

18 # load the input image, convert it to a NumPy array, and then
19 # reshape it to have an extra dimension
20 print("[INFO] loading example image...")
21 image = load_img(args["image"])
22 image = img_to_array(image)
23 image = np.expand_dims(image, axis=0)

We are now ready to initialize our ImageDataGenerator:

25 # construct the image generator for data augmentation then
26 # initialize the total number of images generated thus far
27 aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
28     height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
29     horizontal_flip=True, fill_mode="nearest")
30 total = 0

The ImageDataGenerator class has a number of parameters, too many to enumerate in this book. For a full review of the parameters, please refer to the official Keras documentation (http://pyimg.co/j8ad8).


Instead, we’ll be focusing on the augmentation parameters you will most likely use in your own applications. The rotation_range parameter controls the degree range of the random rotations. Here we’ll allow our input image to be randomly rotated ±30 degrees. Both the width_shift_range and height_shift_range are used for horizontal and vertical shifts, respectively. The parameter value is a fraction of the given dimension, in this case, 10%.

The shear_range controls the angle, in the counterclockwise direction, by which our image is allowed to be sheared (described here in radians; newer Keras documentation describes the value in degrees, so check the version you have installed). We then have the zoom_range, a floating point value that allows the image to be “zoomed in” or “zoomed out” according to the following uniform distribution of values: [1 - zoom_range, 1 + zoom_range].

Finally, the horizontal_flip boolean controls whether or not a given input is allowed to be flipped horizontally during the training process. For most computer vision applications a horizontal flip of an image does not change the resulting class label, but there are applications where a horizontal (or vertical) flip does change the semantic meaning of the image. Take care when applying this type of data augmentation, as our goal is to slightly modify the input image, thereby generating a new training sample, without changing the class label itself. For a more detailed review of image transformations, please refer to Module #1 in PyImageSearch Gurus ([8], PyImageSearch Gurus) as well as Szeliski [9].
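As a rough illustration of how the ranges described above turn into concrete per-image values, the sketch below draws one random setting for each parameter. It is not Keras's internal implementation; the function name `sample_augmentation`, the dictionary keys, and the 50% flip probability are assumptions for this example, and shearing is left out because its unit depends on the Keras version.

```python
import random


def sample_augmentation(width, height, rotation_range=30,
                        width_shift_range=0.1, height_shift_range=0.1,
                        zoom_range=0.2, horizontal_flip=True, rng=None):
    """Draw one random set of augmentation settings (illustrative only)."""
    rng = rng or random.Random()
    return {
        # rotation in degrees, uniform in [-rotation_range, +rotation_range]
        "rotation_deg": rng.uniform(-rotation_range, rotation_range),
        # shifts are fractions of the image dimension, converted to pixels
        "shift_x_px": rng.uniform(-width_shift_range, width_shift_range) * width,
        "shift_y_px": rng.uniform(-height_shift_range, height_shift_range) * height,
        # zoom factor, uniform in [1 - zoom_range, 1 + zoom_range]
        "zoom": rng.uniform(1 - zoom_range, 1 + zoom_range),
        # assumed: flip half of the time when flipping is enabled
        "flip": horizontal_flip and rng.random() < 0.5,
    }


rng = random.Random(0)
for _ in range(3):
    print(sample_augmentation(width=256, height=256, rng=rng))
```

With a 256-pixel-wide image and a shift range of 0.1, the horizontal shift can never exceed 25.6 pixels in either direction, which matches the "fraction of the given dimension" description above.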

Once ImageDataGenerator is initialized, we can actually generate new training examples:

32 # construct the actual Python generator
33 print("[INFO] generating images...")
34 imageGen = aug.flow(image, batch_size=1, save_to_dir=args["output"],
35     save_prefix=args["prefix"], save_format="jpg")

37 # loop over examples from our image data augmentation generator
38 for image in imageGen:
39     # increment our counter
40     total += 1

42     # if we have reached 10 examples, break from the loop
43     if total == 10:
44         break

Lines 34 and 35 initialize a Python generator used to construct our augmented images. We’ll pass in our input image, a batch_size of 1 (since we are only augmenting one image), along with a few additional parameters to specify the output directory, the prefix for each output filename, and the image file format. Line 38 then starts looping over each image in the imageGen generator. Internally, imageGen is automatically generating a new training sample each time one is requested via the loop. We then increment the total number of data augmentation examples written to disk and stop the script from executing once we’ve reached ten examples.
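The counter-and-break pattern is needed because the generator keeps yielding new samples for as long as it is asked. The sketch below shows the same idea with a stand-in generator and `itertools.islice` from the standard library, as one alternative way to cap the number of samples. The generator `endless_augmentations` and its string output are hypothetical placeholders, not part of Keras.

```python
import itertools


def endless_augmentations(sample):
    """Stand-in for an endless augmentation generator (hypothetical)."""
    count = 0
    while True:
        # a real generator would yield a transformed image here
        yield f"{sample}_aug_{count}"
        count += 1


# take exactly ten items, then stop pulling from the generator
first_ten = list(itertools.islice(endless_augmentations("jemma"), 10))

print(len(first_ten))
print(first_ten[0], first_ten[-1])
```

Either approach works; the explicit counter in the script above makes the stopping condition visible, while `islice` keeps the loop body free of bookkeeping.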

To visualize data augmentation in action, we’ll be using Figure 2.2 (left), an image of Jemma, my family beagle. To generate new training example images of Jemma, just execute the following command:

$ python augmentation_demo.py --image jemma.png --output output

After the script executes you should see ten images in the output directory:

$ ls output/
image_0_1227.jpg image_0_2358.jpg image_0_4205.jpg image_0_4770.jpg
image_0_1933.jpg image_0_2914.jpg image_0_4657.jpg image_0_6934.jpg
image_0_9197.jpg image_0_953.jpg

Figure 2.2: Left: The input image we are going to apply data augmentation to. Right: A montage of data augmentation examples. Notice how each image has been randomly rotated, sheared, zoomed, and horizontally flipped.

I have constructed a montage of each of these images so you can visualize them in Figure 2.2 (right). Notice how each image has been randomly rotated, sheared, zoomed, and horizontally flipped. In each case the image retains the original class label: dog; however, each image has been modified slightly, thereby giving our neural network new patterns to learn from when training. Since the input images will constantly be changing (while the class labels remain the same), it’s common to see our training accuracy decrease when compared to training without data augmentation.

However, as we’ll find out later in this chapter, data augmentation can help dramatically reduce overfitting, all the while ensuring that our model generalizes better to new input samples. Furthermore, when working with datasets where we have too few examples to apply deep learning, we can utilize data augmentation to generate additional training data, thereby reducing the amount of hand-labeled data required to train a deep learning network.

2.3 Comparing Training With and Without Data Augmentation

In the first part of this section, we’ll discuss the Flowers-17 dataset, a very small dataset (in terms of deep learning for computer vision tasks), and how data augmentation can help us artificially increase the size of this dataset by generating additional training samples. From there we’ll perform two experiments:

1. Train MiniVGGNet on Flowers-17 without data augmentation.

2. Train MiniVGGNet on Flowers-17 with data augmentation.

As we’ll find out, applying data augmentation dramatically reduces overfitting and allows MiniVGGNet to obtain substantially higher classification accuracy.

2.3.1 The Flowers-17 Dataset

The Flowers-17 dataset [10] is a fine-grained classification challenge where our task is to recognize 17 distinct species of flowers. The image dataset is quite small, having only 80 images per class for a total of 1,360 images. A general rule of thumb when applying deep learning to computer vision tasks is to have 1,000-5,000 examples per class, so we are certainly at a huge deficit here.

We call Flowers-17 a fine-grained classification task because all categories are very similar (i.e., species of flower). In fact, we can think of each of these categories as subcategories. The categories are certainly different, but share a significant amount of common structure (e.g., petals, stamen, pistil, etc.). Fine-grained classification tasks tend to be the most challenging for deep learning practitioners because our machine learning models need to learn extremely discriminating features to distinguish between classes that are very similar. This fine-grained classification task becomes even more problematic given our limited training data.

Figure 2.3: A sample of five (out of the seventeen total) classes in the Flowers-17 dataset where each class represents a specific flower species.

2.3.2 Aspect-aware Preprocessing

Up until this point, we have only preprocessed images by resizing them to a fixed size, ignoring the aspect ratio. In some situations, especially for basic benchmark datasets, doing so is acceptable.

However, for more challenging datasets we should still seek to resize to a fixed size, but maintain the aspect ratio. To visualize this action, consider Figure 2.4.

On the left, we have an input image that we need to resize to a fixed width and height. Ignoring the aspect ratio, we resize the image to 256 × 256 pixels (middle), effectively squishing and distorting the image such that it meets our desired dimensions. A better approach would be to take into account the aspect ratio of the image (right) where we first resize along the shorter dimension such that the width is 256 pixels and then crop the image along the height, such that the height is 256 pixels.

While we have effectively discarded part of the image during the crop, we have also maintained the original aspect ratio of the image. Maintaining a consistent aspect ratio allows our Convolutional Neural Network to learn more discriminative, consistent features. This is a common technique that we’ll be applying when working with more advanced datasets throughout the rest of the Practitioner Bundle and ImageNet Bundle.
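The arithmetic behind "resize along the shorter dimension, then crop the longer one" can be sketched without any imaging library. The helper below only computes the intermediate size and crop offsets; it is not the AspectAwarePreprocessor class itself, and both the function name `aspect_aware_plan` and the choice of a center crop are assumptions made for illustration.

```python
def aspect_aware_plan(width, height, target_w, target_h):
    """Plan an aspect-preserving resize followed by a center crop.

    Returns ((resized_w, resized_h), (crop_x, crop_y)), where the crop
    offsets mark the top-left corner of the target-sized window.
    """
    # scale so that BOTH dimensions are at least as large as the target;
    # for a square target this resizes the shorter side to the target size
    scale = max(target_w / width, target_h / height)
    resized_w = max(target_w, round(width * scale))
    resized_h = max(target_h, round(height * scale))

    # assumed: take the crop from the center of the resized image
    crop_x = (resized_w - target_w) // 2
    crop_y = (resized_h - target_h) // 2
    return (resized_w, resized_h), (crop_x, crop_y)


# landscape input: height is the shorter side, so width gets cropped
print(aspect_aware_plan(512, 384, 256, 256))

# portrait input: width is the shorter side, so height gets cropped
print(aspect_aware_plan(384, 512, 256, 256))
```

For the 512 × 384 input, the image is first scaled to 341 × 256 (same 4:3 shape as the original) and then 42 pixels are trimmed from the left edge to start the 256-pixel-wide window, so nothing is squished.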

To see how aspect-aware preprocessing is implemented, let’s update our pyimagesearch project structure to include an AspectAwarePreprocessor:

--- pyimagesearch
| |--- __init__.py
| |--- callbacks
| |--- nn
| |--- preprocessing
| | |--- __init__.py
| | |--- aspectawarepreprocessor.py
