SynergyFlow: An Elastic Architecture for Accelerating Large-Scale Deep Learning Batch Processing

Research paper


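"SynergyFlow is an elastic accelerator architecture designed specifically to support batch processing of large-scale deep neural networks. The architecture aims to resolve the performance degradation that arises from differing compute and memory-bandwidth demands when processing diverse neural network models containing convolutional and fully connected layers."

Deep neural networks (DNNs) have achieved notable success in many application domains, but their compute-intensive and memory-intensive nature places heavy demands on hardware resources. To improve performance and energy efficiency, accelerator solutions have become a focus of research. However, existing accelerators can hit performance bottlenecks when processing a whole network model, because compute and memory-bandwidth demands differ between convolutional and fully connected layers and between different NN models.

The innovation of the SynergyFlow elastic accelerator architecture is its built-in support for layer-level and model-level parallelism. This design lets the architecture adjust resource allocation dynamically according to layer type and model, so that it can adapt to different compute and memory demands. In this way, SynergyFlow can balance compute resources against memory bandwidth and sustain high performance throughout the execution of a whole network model. Specifically, SynergyFlow may include the following key components and mechanisms:

1. **Elastic resource scheduling**: the system can adjust the number of compute units and the memory bandwidth dynamically according to the characteristics of the layer being processed, such as the filter size of a convolutional layer or the number of nodes in a fully connected layer.
2. **Parallel processing units**: several types of processing units are provided, optimized separately for convolutional and fully connected layers, to achieve the best compute efficiency.
3. **Efficient memory management**: caching strategies and data-reuse mechanisms reduce memory access latency and raise overall throughput.
4. **Batch-processing optimization**: since deep learning training usually proceeds in batches, SynergyFlow may also include mechanisms that optimize the batch pipeline to maximize accelerator utilization.
5. **Flexible interconnect**: an internal high-speed interconnect lets data move quickly between processing units to support parallel computation.
6. **Adaptive control logic**: the control system can monitor and adjust resource allocation in real time to respond to changing workload demands.

With these features, SynergyFlow not only improves the ability to process large DNN models but also reduces performance fluctuation, keeping execution stable and efficient across workloads. The work contributes to the development of deep learning hardware accelerators and offers a new design approach for large-scale neural network training and inference in future data center and edge computing environments.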


8:6 J. Li et al.

Fig. 4. Complexity analysis on VGG11 (the largest value of each item is set as the baseline).

Fig. 5. Timing graph of NN accelerator architectures.

that CONV layers can do several orders of magnitude more computing operations on each memory access than FC layers. The architectural implications can be reached from two perspectives:

1) Memory-centric perspective: If CONV and FC layers are fed with the same memory bandwidth, then to saturate the memory bandwidth, the computing capacity (e.g., the number of PEs in the computing engine) will be determined by CONV because its CTC ratio is much larger than that of FC. In such a case, efficiency loss is inevitable because the memory-intensive FC does not have enough computations to saturate the computing capacity designed for the computing-intensive CONV.

2) Computing-centric perspective: The CONV layers and FC layers are served with the same computing capacity; then, to make full use of the capacity, the memory bandwidth will be determined by the memory-intensive FC layers, as indicated by their much lower CTC ratio. Under the same computation volume, FC layers need a much larger data volume than CONV layers. In such a case, efficiency loss is also inevitable since CONV layers do not have enough memory accesses to saturate the memory bandwidth designed for FC layers.
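The two perspectives can be illustrated numerically with a minimal sketch. It is not taken from the paper: the layer shapes (a 3x3 CONV layer with 256 input and output channels on a 56x56 feature map, and a 4096x4096 FC layer), the helper names, and the roofline-style bound `min(peak, bandwidth * CTC)` are all assumptions chosen for illustration. The sketch also assumes the CTC ratio means operations per data word moved, with every word transferred exactly once.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    ops: int    # arithmetic operations (each multiply and each add counted once)
    words: int  # data words moved: weights + input activations + output activations

    @property
    def ctc(self) -> float:
        """Operations per data word moved (assumed meaning of the CTC ratio)."""
        return self.ops / self.words


def conv_layer(name, k, c_in, c_out, h_out, w_out):
    # Assumes stride 1 with "same" padding, so input and output maps share a size.
    macs = k * k * c_in * c_out * h_out * w_out
    words = k * k * c_in * c_out + c_in * h_out * w_out + c_out * h_out * w_out
    return Layer(name, 2 * macs, words)


def fc_layer(name, n_in, n_out):
    macs = n_in * n_out
    words = n_in * n_out + n_in + n_out
    return Layer(name, 2 * macs, words)


def utilization(layer, peak_ops, bandwidth):
    """Return (compute utilization, bandwidth utilization) under a roofline-style bound."""
    attained = min(peak_ops, bandwidth * layer.ctc)
    return attained / peak_ops, (attained / layer.ctc) / bandwidth


# Hypothetical layer shapes, chosen only for illustration.
conv = conv_layer("CONV", 3, 256, 256, 56, 56)
fc = fc_layer("FC", 4096, 4096)

# Memory-centric: fix the bandwidth, size the compute so CONV saturates it.
bandwidth = 1.0  # words per cycle, arbitrary unit
peak_mem_centric = bandwidth * conv.ctc
fc_compute_util, fc_bw_util = utilization(fc, peak_mem_centric, bandwidth)

# Computing-centric: fix the compute, size the bandwidth so FC saturates it.
peak = 1024.0  # operations per cycle, arbitrary unit
bw_comp_centric = peak / fc.ctc
conv_compute_util, conv_bw_util = utilization(conv, peak, bw_comp_centric)

print(f"CTC ratio: CONV = {conv.ctc:.1f}, FC = {fc.ctc:.2f}")
print(f"memory-centric:    FC uses {fc_compute_util:.2%} of the compute capacity")
print(f"computing-centric: CONV uses {conv_bw_util:.2%} of the memory bandwidth")
```

Under these assumptions the idle fraction is the same in both cases, because each utilization reduces to the ratio of the FC CTC ratio to the CONV CTC ratio. Whichever resource is sized for one layer type is left mostly unused by the other.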

In the following discussion, we will use the computing-centric perspective to uncover the limitations of conventional designs.

The eciency loss cannot be avoided under the conventional architectures largely due to the

intrinsically monolithic nature; i.e., the core for CONV and FC is the same one and cannot be

spatially separated. The principle of “one t to all” is challenged in this scenario. Figure 5(a) further

demonstrates the limitation of monolithic architecture where a single core processes CONV and

FC layers sequentially. The timing diagram consists of the states of two components, core and

MEM, indicating the utilization of computation resource and memory bandwidth, respectively.

The “idle” state indicates that the core or MEM is idle. The busy state of the core or MEM can be

ACM Transactions on Design Automation of Electronic Systems, Vol. 24, No. 1, Article 8. Pub. date: December 2018.
