Fixed-point Factorized Networks
Peisong Wang 1,2 and Jian Cheng 1,2,3∗
1 Institute of Automation, Chinese Academy of Sciences
2 University of Chinese Academy of Sciences
3 Center for Excellence in Brain Science and Intelligence Technology, CAS
{peisong.wang, jcheng}@nlpr.ia.ac.cn
∗ The corresponding author.
Abstract
In recent years, Deep Neural Network (DNN) based methods have achieved remarkable performance in a wide range of tasks and have been among the most powerful and widely used techniques in computer vision. However, DNN-based methods are both computation-intensive and resource-consuming, which hinders the application of these methods on embedded systems such as smart phones. To alleviate this problem, we introduce a novel Fixed-point Factorized Network (FFN) framework for pretrained models to reduce the computational complexity as well as the storage requirement of networks. The resulting networks have weights of only -1, 0 and 1, which significantly eliminates the most resource-consuming multiply-accumulate operations (MACs). Extensive experiments on the large-scale ImageNet classification task show that the proposed FFN requires only one-thousandth of the multiply operations with comparable accuracy.
1. Introduction
Deep neural networks (DNNs) have recently been setting new state-of-the-art performance in many fields, including computer vision, speech recognition, and natural language processing. Convolutional neural networks (CNNs), in particular, have outperformed traditional machine learning algorithms on computer vision tasks such as image recognition, object detection, semantic segmentation, and gesture and action recognition. These breakthroughs are partially due to the added computational complexity and storage footprint, which make these models very hard to train as well as to deploy. For example, AlexNet [20] involves 61M floating-point parameters and 725M high-precision multiply-accumulate operations (MACs). Current DNNs are usually trained offline by utilizing specialized hardware such as NVIDIA GPUs and CPU
clusters. However, such an amount of computation may be unaffordable for portable devices such as mobile phones, tablets and wearable devices, which usually have limited computing resources. Moreover, the huge storage requirement and large number of memory accesses may hinder efficient hardware implementations of neural networks, such as on FPGAs and neural-network-oriented chips.
To speed up test-phase computation of deep models, many matrix and tensor factorization based methods have recently been investigated by the community [5, 15, 32, 21, 18, 30]. However, these methods commonly utilize full-precision weights, which are hardware-unfriendly, especially for embedded systems. Moreover, their low compression ratios hinder the application of these methods on mobile devices.
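For context, the sketch below illustrates the generic low-rank idea behind such factorization methods: a weight matrix is approximated by the product of two thinner factors so that one large matrix-vector product is replaced by two smaller ones. The truncated-SVD recipe and the rank k used here are illustrative assumptions, not the scheme of any particular cited work.

import numpy as np

def lowrank_factorize(W, k):
    # Approximate an m x n weight matrix W by two thinner factors
    # U_k (m x k) and V_k (k x n) via truncated SVD. Replacing W with
    # U_k @ V_k reduces multiplies per input from m*n to (m + n)*k.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_k = U[:, :k] * s[:k]   # absorb singular values into the left factor
    V_k = Vt[:k, :]
    return U_k, V_k

# Example: a 1024 x 512 fully connected layer truncated to rank 64.
W = np.random.randn(1024, 512).astype(np.float32)
U_k, V_k = lowrank_factorize(W, k=64)
x = np.random.randn(512).astype(np.float32)
y_approx = U_k @ (V_k @ x)   # two small products instead of one large one

Note that both factors remain full precision, which is why such methods alone do not remove the multiply operations.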
Fixed-point quantization can partially alleviate the two problems mentioned above. There have been many studies on reducing the storage and the computational complexity of DNNs by quantizing the parameters of these models. Some of these works [3, 6, 8, 22, 24] quantize the pretrained weights using several bits (usually 3∼12 bits) with a minimal loss of performance. However, these kinds of quantized networks still require large numbers of multiply-accumulate operations. Others [23, 1, 4, 2, 17, 12, 25] focus on training networks from scratch with binary (+1 and -1) or ternary (+1, 0 and -1) weights. These methods do not rely on pretrained models and may reduce computation at the training stage as well as the testing stage. On the other hand, they cannot make use of pretrained models very efficiently due to the dramatic information loss caused by the binary or ternary quantization of weights.
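As a point of reference, the sketch below shows a simple threshold-based ternary quantization of a pretrained weight matrix, in the spirit of the ternary-weight methods cited above; the threshold rule, the shared scale alpha, and the factor 0.7 are illustrative assumptions, not the approach proposed in this paper.

import numpy as np

def ternarize(W, t=0.7):
    # Map a pretrained weight matrix to alpha * T with T in {-1, 0, +1}.
    # Small-magnitude weights become 0; the rest keep their sign and share
    # one full-precision scale alpha (a common heuristic, assumed here).
    delta = t * np.mean(np.abs(W))            # magnitude threshold
    mask = np.abs(W) > delta                  # positions kept as nonzero
    T = (np.sign(W) * mask).astype(np.int8)   # ternary codes
    alpha = np.abs(W[mask]).mean() if mask.any() else 0.0
    return alpha, T

alpha, T = ternarize(np.random.randn(256, 256).astype(np.float32))
# Inference then computes alpha * (T @ x), so the inner products need
# only additions and subtractions instead of full multiplications.

The direct rounding of pretrained weights in this manner is precisely where the large information loss arises, which motivates the factorized approach introduced next.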
In this paper, we propose a unified framework called Fixed-point Factorized Network (FFN) to simultaneously accelerate and compress DNN models with only minor performance degradation. Specifically, we propose to first directly factorize the weight matrix using a fixed-point (+1, 0 and -1) representation and then recover the (pseudo) full-precision submatrices. We also propose an effective and practical technique called weight balancing, which makes