Quantizing Deep Neural Networks: Efficiency and Accuracy Whitepaper Overview


Updated 2024-07-16. PDF, 858 KB.


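This whitepaper on quantized neural networks examines how deep convolutional neural networks (DCNNs) can be converted into efficient models that use integer weights and activations at inference time. The author, Raghuraman Krishnamoorthi, surveys the main quantization methods and strategies.

Section 2 (Quantizer Design) covers:

1. **Uniform Affine Quantizer**: maps a continuous range of values onto evenly spaced integer levels, with the mapping defined by a scale and a zero-point.
2. **Uniform Symmetric Quantizer**: similar to the affine quantizer, but centered on zero, so positive and negative values are treated alike.
3. **Stochastic Quantizer**: introduces a random component into the quantization step.
4. **Modeling simulated quantization in the backward pass**: so that training accounts for the effect of quantization, the quantizer has to be approximated when gradients are computed.
5. **Determining Quantizer parameters**: choosing the step size, minimum, and maximum of the quantizer, which affects the accuracy of the quantized model.
6. **Granularity of quantization**: whether a single quantizer is shared by an entire tensor (per-layer) or adapted to each convolutional kernel (per-channel), which affects both accuracy and implementation complexity.

Section 3 (Quantized Inference: Performance and Accuracy) analyzes how quantized models behave at inference time. It covers:

- **Post-Training Quantization**: converts floating-point weights and activations to integers after training is complete, either quantizing the weights only or quantizing both weights and activations.
- **Quantization-Aware Training**: a more thorough approach in which operation transformations, batch-normalization handling, and related techniques let the model adapt to quantization during training, which improves accuracy.
- **Operation Transformations for Quantization**: adjusts operations such as convolutions and activation functions so that they suit low-precision computation.
- **Batch Normalization**: normalization layers need special handling because they rely on floating-point arithmetic.
- **Experimental results**: shows the trade-off between accuracy and speed for the different methods.

Section 4 (Training Best Practices) gives training recommendations for keeping the accuracy of the quantized model within an acceptable range. Sections 5 and 6 discuss model architecture and run-time measurements, including recommended model structures and performance measured in actual execution. Sections 7 and 8 summarize the paper's main conclusions and directions for future work, together with a discussion of the impact of batch normalization on quantization.

Overall, the whitepaper is a comprehensive guide to using quantization to make neural networks run more efficiently on-device, and a useful reference for developing and deploying efficient, low-power deep learning models.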


Figure 1: Simulated Quantizer (top), showing the quantization of output values. Approximation for purposes of derivative calculation (bottom).
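
The two panels of Figure 1 can be illustrated with a short NumPy sketch. It is a minimal illustration rather than the paper's implementation: the function names `fake_quantize` and `fake_quantize_grad` are hypothetical, the backward rule shown is the commonly used straight-through approximation (gradient passed unchanged inside the clamping range, zero outside), and the forward pass omits any adjustment that would make 0.0 exactly representable.

```python
import numpy as np

def fake_quantize(x, x_min, x_max, num_bits=8):
    """Simulated quantizer: quantize, then dequantize. The output stays float."""
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels           # step size between levels
    x_clamped = np.clip(x, x_min, x_max)       # saturate out-of-range values
    q = np.round((x_clamped - x_min) / scale)  # integer level index in [0, levels]
    return q * scale + x_min                   # map the level back to a float

def fake_quantize_grad(x, upstream_grad, x_min, x_max):
    """Assumed straight-through approximation of the quantizer's derivative:
    1 inside [x_min, x_max], 0 outside (where the forward pass saturates)."""
    inside = (x >= x_min) & (x <= x_max)
    return upstream_grad * inside

x = np.array([-2.0, -0.3, 0.0, 0.4, 3.0])
y = fake_quantize(x, -1.0, 1.0)
grad = fake_quantize_grad(x, np.ones_like(x), -1.0, 1.0)
```

The rounding step has zero derivative almost everywhere, which is why training needs some approximation of this kind for gradients to reach the weights.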

2.5 Determining Quantizer parameters

The quantizer parameters can be determined using several criteria. For example, TensorRT [11] minimizes the KL divergence between the original and quantized distributions to determine the step size. In this work, we adopt simpler methods. For weights, we use the actual minimum and maximum values to determine the quantizer parameters. For activations, we use the moving average of the minimum and maximum values across batches to determine the quantizer parameters. For post-training quantization approaches, one can improve the accuracy of quantized models by careful selection of quantizer parameters.
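
As a rough sketch of the two simpler methods described above, the following Python derives a scale and zero-point from a min/max range and tracks a moving average of the range across batches. The names `affine_params` and `MovingAverageRange`, the momentum value, and the choice to widen the range so that it contains zero are assumptions made for illustration; they are not taken from the text.

```python
import numpy as np

def affine_params(x_min, x_max, num_bits=8):
    """Scale and zero-point for an unsigned affine quantizer over [x_min, x_max]."""
    # Assumption: widen the range to include 0.0 so zero maps to an exact level.
    x_min = min(float(x_min), 0.0)
    x_max = max(float(x_max), 0.0)
    qmax = 2 ** num_bits - 1
    scale = (x_max - x_min) / qmax
    if scale == 0.0:          # degenerate all-zero tensor
        scale = 1.0
    zero_point = int(round(-x_min / scale))
    return scale, zero_point

class MovingAverageRange:
    """Exponential moving average of per-batch min and max (for activations)."""
    def __init__(self, momentum=0.9):   # momentum is a hypothetical default
        self.momentum = momentum
        self.min = None
        self.max = None

    def update(self, batch):
        b_min, b_max = float(batch.min()), float(batch.max())
        if self.min is None:             # first batch initialises the range
            self.min, self.max = b_min, b_max
        else:
            m = self.momentum
            self.min = m * self.min + (1 - m) * b_min
            self.max = m * self.max + (1 - m) * b_max
        return self.min, self.max

# Weights: use the actual min and max of the tensor.
weights = np.array([-1.0, 0.5, 3.0])
w_scale, w_zero_point = affine_params(weights.min(), weights.max())

# Activations: smooth the range over batches before deriving parameters.
tracker = MovingAverageRange(momentum=0.5)
tracker.update(np.array([0.0, 2.0]))
a_min, a_max = tracker.update(np.array([-1.0, 4.0]))
a_scale, a_zero_point = affine_params(a_min, a_max)
```

Averaging the range across batches keeps a single outlier batch from stretching the activation range, at the cost of occasionally clipping extreme values.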

2.6 Granularity of quantization

We can specify a single quantizer (defined by the scale and zero-point) for an entire tensor, referred to as per-layer quantization. Improved accuracy can be obtained by adapting the quantizer parameters to each kernel within the tensor [17]. For example, the weight tensor is 4-dimensional and is a collection of 3-dimensional convolutional kernels, each responsible for producing one output feature map. Per-channel quantization has a different scale and offset for each convolutional kernel. We do not consider per-channel quantization for activations, as this would complicate the inner product computations at the core of conv and matmul operations. Both per-layer and per-channel quantization allow for efficient dot product and convolution implementation, as the quantizer parameters are fixed per kernel in both cases.
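
The difference between the two granularities can be sketched in NumPy as follows. This is an illustration under stated assumptions: the weight layout `(out_channels, in_channels, kh, kw)`, the function names, and the synthetic weights are hypothetical and chosen only to make the effect visible.

```python
import numpy as np

def quantize_weights(w, per_channel, num_bits=8):
    """Affine-quantize a 4-D weight tensor, assumed (out_ch, in_ch, kh, kw)."""
    qmax = 2 ** num_bits - 1
    # Per-channel: one (scale, zero_point) per output kernel.
    # Per-layer: a single pair for the whole tensor.
    axes = (1, 2, 3) if per_channel else None
    w_min = np.minimum(w.min(axis=axes, keepdims=True), 0.0)
    w_max = np.maximum(w.max(axis=axes, keepdims=True), 0.0)
    scale = (w_max - w_min) / qmax
    scale = np.where(scale == 0.0, 1.0, scale)   # guard all-zero kernels
    zero_point = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero_point, 0, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float64) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 3, 3))
w[0] *= 10.0   # one kernel with a much wider range than the others

q_l, s_l, z_l = quantize_weights(w, per_channel=False)
q_c, s_c, z_c = quantize_weights(w, per_channel=True)

err_layer = np.abs(dequantize(q_l, s_l, z_l) - w).mean()
err_channel = np.abs(dequantize(q_c, s_c, z_c) - w).mean()
```

With per-layer parameters the wide kernel dictates the step size for every kernel, so the narrow kernels are represented coarsely; per-channel parameters give each kernel a step size matched to its own range.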

3 Quantized Inference: Performance and Accuracy

Quantizing a model can provide multiple benefits as discussed in section 1. We discuss multiple approaches for model quantization and show the performance impact for each of these approaches.

3.1 Post Training Quantization

In many cases, it is desirable to reduce the model size by compressing weights and/or to quantize both weights and activations for faster inference, without retraining the model. Post-training quantization techniques are simpler to use and allow for quantization with limited data. In this section, we study different quantization schemes for weight-only quantization and for quantization of both weights and activations. We show that per-channel quantization with asymmetric ranges produces accuracies close to floating point across a wide range of networks.
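
To give a sense of why an asymmetric range can matter, the sketch below compares an asymmetric (affine) quantizer with a symmetric one on a tensor whose values are all non-negative. It is a toy comparison on synthetic data, not a reproduction of the paper's experiments; the function names and the test tensor are assumptions.

```python
import numpy as np

def quantize_asymmetric(x, num_bits=8):
    """Quantize-dequantize with an affine quantizer over [min(x), max(x)]."""
    qmax = 2 ** num_bits - 1
    x_min = min(float(x.min()), 0.0)   # keep 0.0 inside the range
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / qmax
    if scale == 0.0:
        scale = 1.0
    zero_point = round(-x_min / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale

def quantize_symmetric(x, num_bits=8):
    """Quantize-dequantize with a zero-centered quantizer over [-max|x|, max|x|]."""
    qmax = 2 ** (num_bits - 1) - 1     # 127 for 8 bits
    scale = float(np.abs(x).max()) / qmax
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

# A one-sided tensor: the symmetric quantizer spends half its levels on
# negative values that never occur.
x = np.linspace(0.0, 6.0, 1001)
err_asym = np.abs(quantize_asymmetric(x) - x).mean()
err_sym = np.abs(quantize_symmetric(x) - x).mean()
```

On this input the asymmetric quantizer uses all 256 levels over the occupied range, while the symmetric one effectively uses about half of them, so its step size is roughly twice as large.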

3.1.1 Weight only quantization

A simple approach is to only reduce the precision of the weights of the network to 8 bits from float. Since only the weights are quantized, this can be done without requiring any validation data (see Figure 2). A simple command line tool can convert the weights from float to 8-bit precision. This setup is useful if one only wants to reduce the model


