decreasing the precision of the network on a per-layer
basis. Weights and activations are quantized to the
lowest bit-width possible without information loss according to our heuristic, and a certain degree of sparsity is induced, while the bit-width is kept high enough for further learning steps to succeed. This results in a network with advantages in terms of model size and in training and inference time. By training AlexNet and ResNet20 on the CIFAR10/100 datasets, we demonstrate, on the basis of an analytical model of the computational cost, that AdaPT is competitive in accuracy with a float32 baseline and yields a non-trivial reduction of computational cost (speedup). Compared to MuPPET, AdaPT also has certain intrinsic methodological advantages. After AdaPT training,
the model is fully quantized and sparsified to a cer-
tain degree s.t., unlike the case with MuPPET, which
outputs a float32 model, AdaPT carries over its ad-
vantages to the inference phase as well.
2 Background
2.1 Quantization
Numerical representation describes how numbers are
stored in memory (illustrated by fig. 1) and how
arithmetic operations on those numbers are con-
ducted. Commonly available on consumer hardware
are floating-point and integer representations, while
fixed-point or block-floating-point representations are
used in high-performance ASICs or FPGAs. The nu-
merical precision used by a given numerical represen-
tation refers to the number of bits allocated for the
representation of a single number, e.g. a real num-
ber stored in float32 refers to floating-point repre-
sentation in 32-bit precision. With these definitions
of numerical representation and precision in mind,
most generally speaking, quantization is the concept
of running a computation or parts of a computation
at reduced numerical precision or a different numeri-
cal representation with the intent of reducing compu-
tational costs and memory consumption. Quantized
execution of a computation, however, can lead to the introduction of an error, either through the machine epsilon $\epsilon_{\text{mach}}$ of the quantized representation being too large (underflow) to accurately depict the resulting real values, or through the representable range being too small to store the result (overflow).
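As a concrete illustration of these two error sources (a minimal sketch of our own using NumPy, not tied to any particular quantization scheme discussed later), casting float32 values to float16 exhibits both effects:

```python
import numpy as np

# Machine epsilon of the quantized representation: float16 cannot resolve
# an increment of 1e-4 on top of 1.0, so the addition is lost
# ("underflow" in the sense used above).
x = np.float32(1.0) + np.float32(1e-4)
print(np.float16(x) == np.float16(1.0))  # True: increment is below eps_mach of float16

# Representable range: float16 only covers magnitudes up to ~65504,
# so a larger float32 value cannot be stored (overflow).
y = np.float32(1e5)
print(np.float16(y))                     # inf
```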
Floating-Point Quantization The value v of a
floating point number is given by $v = \frac{s}{b^{p-1}} \times b^{e}$,
where s is the significand (mantissa), p is the pre-
cision (number of digits in s), b is the base and
e is the exponent [45]. Hence quantization using
floating-point representation can be achieved by re-
ducing the number of bits available for mantissa and
exponent, e.g. switching from a float32 to a float16
representation, and is offered out of the box by com-
mon machine learning frameworks for post-training
quantization [46, 47].
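To make the formula concrete, the following minimal Python sketch (the helper fp_value is hypothetical, introduced here purely for illustration) evaluates $v = \frac{s}{b^{p-1}} \times b^{e}$ for a base-10 example and shows how reducing the precision $p$ coarsens the representable values:

```python
def fp_value(s: int, p: int, b: int, e: int) -> float:
    """Value of a floating-point number, v = s / b**(p - 1) * b**e,
    where s is the integer significand with p digits in base b."""
    return s / b ** (p - 1) * b ** e

# Base-10 example: significand 12345 (p = 5 digits) and exponent -3
# encode the real value 1.2345e-3.
print(fp_value(s=12345, p=5, b=10, e=-3))  # ~0.0012345

# Reducing the precision to p = 3 keeps only 3 significand digits,
# so the same value can only be represented more coarsely.
print(fp_value(s=123, p=3, b=10, e=-3))    # ~0.00123
```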
Integer Quantization Integer representation is
available for post-training quantization and QAT
(int8, int16 due to availability on consumer hard-
ware) in common machine learning frameworks [46,
47]. Quantized training, however, is not supported, since integer-quantized activations are not meaningfully differentiable, making standard backpropagation inapplicable [33]. Special cases of integer
quantization are 1-bit and 2-bit quantization, which
are often referred to as binary and ternary quantiza-
tion in the literature.
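For illustration, the following NumPy sketch implements uniform affine int8 quantization with a scale and zero-point; this is a generic textbook scheme and not necessarily the exact implementation used by the frameworks cited above:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Uniform affine int8 quantization: map real values to 8-bit integers
    via a scale and a zero-point (details differ between frameworks)."""
    qmin, qmax = -128, 127
    scale = float(x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - float(x.min()) / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Approximate reconstruction of the original real values."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(8).astype(np.float32)
q, scale, zp = quantize_int8(x)
print(np.abs(dequantize(q, scale, zp) - x).max())  # worst-case quantization error
```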
Block-Floating-Point Quantization Block-
floating-point represents each number as a pair
of a $WL$-bit (word length) signed integer $x$ and a scale factor $s$ s.t. the value $v$ is represented as $v = x \times b^{-s}$ with base $b = 2$ or $b = 10$. The scaling
factor s is shared across multiple variables (blocks),
hence the name block-floating point, and is typically
determined s.t. the modulus of the largest element is $\in [\frac{1}{b}, 1]$ [48]. Block-floating-point arithmetic is used
in cases where variables cannot be expressed with
sufficient accuracy on native fixed-point hardware.
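The following NumPy sketch illustrates this representation; the helper names (bfp_quantize, bfp_dequantize) and the exact normalization choice are our assumptions for illustration, not taken from [48]:

```python
import numpy as np

def bfp_quantize(block: np.ndarray, wl: int = 8, b: int = 2):
    """Block-floating-point sketch: each element is stored as a WL-bit
    signed integer x, with one scale factor s shared by the whole block,
    so that every value is reconstructed as v = x * b**(-s)."""
    max_int = 2 ** (wl - 1) - 1
    max_abs = float(np.abs(block).max())
    # Shared scale chosen so the largest element nearly fills the WL-bit
    # integer range -- the integer analogue of normalizing the block so the
    # largest modulus falls into [1/b, 1].
    s = int(np.floor(np.log(max_int / max_abs) / np.log(b))) if max_abs > 0 else 0
    x = np.clip(np.round(block * float(b) ** s), -max_int - 1, max_int).astype(np.int32)
    return x, s

def bfp_dequantize(x: np.ndarray, s: int, b: int = 2) -> np.ndarray:
    return x.astype(np.float32) * float(b) ** (-s)

block = np.array([0.7, -3.2, 1.5, 0.01], dtype=np.float32)
x, s = bfp_quantize(block)
print(x, s)                  # integer mantissas and the shared scale
print(bfp_dequantize(x, s))  # approximate reconstruction of the block
```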
Fixed-Point Quantization Fixed-point numbers
have a fixed number of decimal digits assigned and
hence every computation must be framed s.t. the