Compact Bilinear Pooling

Yang Gao¹, Oscar Beijbom¹, Ning Zhang²∗, Trevor Darrell¹†
¹EECS, UC Berkeley   ²Snapchat Inc.
{yg, obeijbom, trevor}@eecs.berkeley.edu, {ning.zhang}@snapchat.com
Abstract
Bilinear models have been shown to achieve impressive performance on a wide range of visual tasks, such as semantic segmentation, fine-grained recognition and face recognition. However, bilinear features are high-dimensional, typically on the order of hundreds of thousands to a few million, which makes them impractical for subsequent analysis. We propose two compact bilinear representations with the same discriminative power as the full bilinear representation but with only a few thousand dimensions. Our compact representations allow back-propagation of classification errors, enabling end-to-end optimization of the visual recognition system. The compact bilinear representations are derived through a novel kernelized analysis of bilinear pooling, which provides insights into the discriminative power of bilinear pooling and a platform for further research in compact pooling methods. Experiments illustrate the utility of the proposed representations for image classification and few-shot learning across several datasets.
1. Introduction
Encoding and pooling of visual features is an integral
part of semantic image analysis methods. Before the in-
fluential 2012 paper of Krizhevsky et al. [17] rediscovering
the models pioneered by [19] and related efforts, such meth-
ods typically involved a series of independent steps: feature
extraction, encoding, pooling and classification; each thor-
oughly investigated in numerous publications as the bag of
visual words (BoVW) framework. Notable contributions in-
clude HOG [9] and SIFT [24] descriptors, Fisher encoding [26], bilinear pooling [3] and spatial pyramids [18],
each significantly improving the recognition accuracy.
Recent results have shown that end-to-end back-propagation of gradients in a convolutional neural network
∗This work was done while Ning Zhang was at Berkeley.
†Prof. Darrell was supported in part by DARPA; AFRL; DoD MURI award N000141110688; NSF awards IIS-1212798, IIS-1427425, and IIS-1536003; and the Berkeley Vision and Learning Center.
Figure 1: We propose a compact bilinear pooling method
for image classification. Our pooling method is learned
through end-to-end back-propagation and enables a low-
dimensional but highly discriminative image representation.
The top pipeline shows the Tensor Sketch projection applied to the activation at a single spatial location, with ∗ denoting circular convolution. The bottom pipeline shows how a global compact descriptor is obtained by sum pooling.
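The pipeline in Figure 1 can be sketched in NumPy as follows. This is an illustrative implementation of the Tensor Sketch idea (two Count Sketches combined by circular convolution via the FFT, then sum-pooled over spatial locations); the channel count c, sketch dimension d, feature-map size, and random seed are placeholder values, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
c, d = 512, 8192  # input channels and sketch dimension (illustrative)

# Fixed random hash indices and signs, one pair per Count Sketch.
h = [rng.integers(0, d, size=c) for _ in range(2)]
s = [rng.choice([-1.0, 1.0], size=c) for _ in range(2)]

def count_sketch(x, h_k, s_k):
    """Project x (length c) to a d-dimensional Count Sketch."""
    y = np.zeros(d)
    np.add.at(y, h_k, s_k * x)  # scatter-add signed entries into d bins
    return y

def tensor_sketch(x):
    """Approximate the flattened outer product x ⊗ x in d dimensions.
    Circular convolution of the two sketches is computed via the FFT."""
    p1 = np.fft.fft(count_sketch(x, h[0], s[0]))
    p2 = np.fft.fft(count_sketch(x, h[1], s[1]))
    return np.real(np.fft.ifft(p1 * p2))

# Sum-pool the per-location sketches to get a global compact descriptor.
X = rng.standard_normal((14 * 14, c))  # stand-in for a conv feature map
phi = np.sum([tensor_sketch(loc) for loc in X], axis=0)

# In expectation the sketch preserves the bilinear (polynomial) kernel:
# <TS(x), TS(y)> ≈ <x, y>^2, so linear classifiers on phi behave like
# classifiers on the full (c*c)-dimensional bilinear feature.
x = rng.standard_normal(c)
```

Note that d is a few thousand here, versus c² (here 262,144) for the full bilinear feature, which is the compression the abstract refers to.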
(CNN) enables joint optimization of the whole pipeline, re-
sulting in significantly higher recognition accuracy. While
the distinction of the steps is less clear in a CNN than in a
BoVW pipeline, one can view the first several convolutional
layers as a feature extractor and the later fully connected
layers as a pooling and encoding mechanism. This has been
explored recently in methods combining the feature extrac-
tion architecture of the CNN paradigm, with the pooling &
encoding steps from the BoVW paradigm [23, 8]. Notably,
Lin et al. recently replaced the fully connected layers with
bilinear pooling achieving remarkable improvements for
fine-grained visual recognition [23]. However, their final
representation is very high-dimensional; in their paper the encoded feature dimension, d, is more than 250,000. Such a representation is impractical for several reasons: (1) if used
with a standard one-vs-rest linear classifier for k classes,
the number of model parameters becomes kd, which for
e.g. k = 1000 means > 250 million model parameters, (2)
for retrieval or deployment scenarios which require features
to be stored in a database, the storage becomes expensive;
storing a million samples requires 2TB of storage at dou-
arXiv:1511.06062v2 [cs.CV] 12 Apr 2016