A Scale-Invariant Framework For Image
Classification With Deep Learning
Yalong Jiang¹, Zheru Chi¹,²
1. Department of Electronic and Information Engineering, the Hong Kong Polytechnic University, Hong Kong SAR, China
2. PolyU Shenzhen Research Institute, Shenzhen, China
yalong.jiang@connect.polyu.hk, chi.zheru@polyu.edu.hk
Abstract—In this paper, we propose a scale-invariant
framework based on Convolutional Neural Networks (CNNs).
The network exhibits robustness to scale and resolution
variations in data. Previous efforts to achieve scale invariance relied either on integrating several variant-specific CNNs or on data augmentation. However, these methods did not solve the fundamental problem that CNNs develop different feature representations for the variants of the same image. The topology proposed in this paper develops a uniform representation for all variants of the same image. This uniformity is achieved by concatenating scale-variant and scale-invariant features to enlarge the feature space, so that input images with diverse variations but from the same class can be distinguished from images of different classes. Higher-order decision boundaries lead to the success of the framework. Experimental results on a challenging dataset substantiate that our framework performs better than traditional frameworks with the same number of free parameters. The proposed framework also achieves higher training efficiency.
Keywords—convolutional neural networks; robustness to scale
variations; scale invariance; higher-order decision boundaries
I. INTRODUCTION
The advantage of Convolutional Neural Networks (CNNs) [1] over traditional machine learning techniques lies in their ability to approximate any function [2]. Trained with stochastic gradient descent, CNNs develop effective representations of the input data, and these representations can meet the challenges of a wide range of tasks [3], [4].
Despite the strong expressiveness of CNNs, their pure reliance on local patterns still hampers performance, especially when input images suffer from variations such as scaling [5], deformations, and translations. These variations can cause misclassifications in critical tasks such as art attribution [6]-[10], because task-relevant clues, such as textures, change as scale varies.
Current state-of-the-art algorithms for dealing with such variations are mostly based on model averaging: as in [1][8][9][15], several different CNNs form an ensemble, with each CNN associated with one scale (a minimal sketch of this scheme is given after the list below). Although these algorithms are effective to some extent, they have the following limitations:
• Model averaging cannot improve the flexibility of CNNs; it still relies on local patterns, which are scale-variant.
• The CNNs in model averaging are independent. For each specific input, only one CNN performs well, while the others may harm the overall performance, because model averaging cannot optimally integrate scale-invariant features with scale-variant features.
• Traditional frameworks only perform well when test images are of the same scale as the training images; they cannot generalize well beyond the training data.
• Training cost is high because several CNNs need to be trained separately.
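For concreteness, the following is a minimal sketch of the scale-specific model-averaging scheme discussed above, not a reproduction of the implementations in [1][8][9][15]; the toy architecture, the chosen scales, and the names SmallCNN and ensemble_predict are illustrative assumptions.

# Minimal sketch (illustrative only) of scale-specific model averaging:
# one CNN per input scale, trained separately, with softmax outputs averaged.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """A toy single-scale classifier standing in for one ensemble member."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def ensemble_predict(models, scales, image):
    """Resize the image to each CNN's training scale and average the softmax outputs."""
    probs = []
    for model, s in zip(models, scales):
        resized = F.interpolate(image, size=(s, s), mode='bilinear', align_corners=False)
        probs.append(F.softmax(model(resized), dim=1))
    # Each member votes independently; no sharing of scale-invariant features.
    return torch.stack(probs).mean(dim=0)

# Usage: three CNNs, each tied to one scale (the limitation discussed above).
models = [SmallCNN() for _ in range(3)]
scales = [64, 128, 256]
prediction = ensemble_predict(models, scales, torch.randn(1, 3, 256, 256))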
The cause of the above limitations is that CNNs in traditional frameworks cannot develop complete feature representations covering both scale-invariant and scale-variant features. For example, the CNNs [18] tuned to smaller images tend to rely on scale-invariant features describing contours, while the CNNs tuned to larger images tend to rely on scale-variant features describing textures. Neither type of CNN develops both kinds of features.
This problem cannot be solved by scale jittering: when a CNN that has been trained and tested on a single scale is then trained and tested on two scales [19], its performance drops. This corresponds to the problem in statistical learning that bias increases when a method is not flexible enough to model the complex variations in the data. Only by increasing the model's flexibility can we reduce the bias and achieve robustness to variations.
We propose to solve the problem by concatenating features describing local textures with features describing contours. This increases the model's flexibility, and a robust feature description covering both global and local properties can be generated. The major difference between our framework and current popular algorithms (such as model averaging [15]) is that the proposed framework focuses on both the scale-variant and the scale-invariant features of the input images, so its feature representation is robust to scale variations. In comparison, each CNN in model averaging [15] is over-fitted to one scale variant and cannot handle images of other scales.
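A minimal sketch of this two-branch concatenation idea follows, written under our own simplifying assumptions; the branch architecture, the feature dimension, and the names conv_branch and TwoBranchNet are placeholders rather than the exact design detailed later in this paper.

# Minimal sketch (illustrative only) of two-branch feature concatenation:
# one branch sees the scale-invariant (contour) part of an image, the other
# sees the scale-variant (texture) part; their features are concatenated.
import torch
import torch.nn as nn

def conv_branch(out_dim):
    """A placeholder feature extractor used for both branches."""
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, out_dim, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class TwoBranchNet(nn.Module):
    def __init__(self, num_classes=10, feat_dim=32):
        super().__init__()
        self.invariant_branch = conv_branch(feat_dim)  # e.g. contours
        self.variant_branch = conv_branch(feat_dim)    # e.g. local textures
        # Concatenation enlarges the feature space seen by the classifier.
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, invariant_part, variant_part):
        f_inv = self.invariant_branch(invariant_part)
        f_var = self.variant_branch(variant_part)
        return self.classifier(torch.cat([f_inv, f_var], dim=1))

# Usage with dummy decomposed inputs (the real split comes from the
# preprocessing described in Section II).
net = TwoBranchNet()
logits = net(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128))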
Our framework first decomposes input images into scale-invariant and scale-variant parts using the preprocessing algorithm described in Section II, and then feeds each part to one branch of the framework. Each branch in our