Exploring Computational Networks and the CNTK Toolkit: A Detailed Look at a Unified Framework for Learning Models


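"An Introduction to Computational Networks and the Computational Network Toolkit" (MSR-TR-2014-112, draft v1.0, dated January 21, 2016) is a technical report written jointly by a group of AI and machine learning researchers, including Amit Agarwal, Eldar Akchurin, and others. It presents a unified framework, the computational network (CN), for describing a wide range of machine learning models, such as deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM) networks, logistic regression, and maximum entropy models.

A computational network is a directed graph in which each leaf node represents an input value or a parameter, and each non-leaf node represents a computation step; together these steps make up the core logic of the learning algorithm. With this structure, the authors aim to give an intuitive view of how different kinds of models process information and learn through a sequence of computations, which also makes it easier to compare and optimize models.

The report covers the theoretical basis of CNs, including network design principles, training methods, optimization strategies, and performance tuning in practical applications. The authors may also discuss the background and design philosophy of the Computational Network Toolkit (CNTK), an open-source toolkit built by Microsoft to support these models and to simplify model development, deployment, and experimentation. Readers can expect practical guidance on model selection, hyperparameter tuning, parallel computation, data-flow graph representation, and applying CNs to real problems such as image recognition and natural language processing. The report is a useful reference on computational networks for researchers, engineers, and readers interested in deep learning.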
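As a rough illustration of the graph view described above, the following Python sketch builds a tiny logistic-regression-style network from leaf nodes (an input and two parameters) and non-leaf nodes (computation steps). The `Node` class, the `evaluate` function, and the scalar values are all assumptions made for this example; they are not CNTK's API.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical CN node (names are illustrative, not CNTK's API)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    op: Optional[Callable[..., float]] = None   # set only on non-leaf nodes
    value: Optional[float] = None               # set only on leaf nodes

    @property
    def is_leaf(self) -> bool:
        return not self.children


def evaluate(node: Node) -> float:
    """Naive recursive evaluation: a leaf returns its value, a non-leaf applies its op."""
    if node.is_leaf:
        return node.value
    return node.op(*(evaluate(c) for c in node.children))


# Leaf nodes: one input and two parameters (scalars, for brevity).
x = Node("x", value=2.0)
w = Node("w", value=0.5)
b = Node("b", value=-1.0)

# Non-leaf nodes: each one is a single computation step.
t = Node("t", [w, x], op=lambda a, c: a * c)                  # w * x
z = Node("z", [t, b], op=lambda a, c: a + c)                  # w * x + b
p = Node("p", [z], op=lambda a: 1.0 / (1.0 + math.exp(-a)))   # sigmoid

print(evaluate(p))
```

This naive recursion re-evaluates a shared child once per parent, which is acceptable for a toy graph but wasteful in general; the excerpt's Algorithm 2.2 computes a fixed evaluation order instead.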

CHAPTER 1. INTRODUCTION

Figure 1.5: Training error rate as a function of epoch number for the TIMIT speech example.

On a 2.2GHz Xeon processor the training takes 167 minutes without a GPU, and 61 minutes with a low-end Quadro 4000 GPU.

CHAPTER 2. COMPUTATIONAL NETWORK

2.2 Forward Computation

When the model parameters (i.e., the weight nodes in Figure 2.1) are known, we can compute the value of any node given new input values. Unlike in the DNN case, where the computation order can be trivially determined as layer-by-layer computation from the bottom up, in a CN each network structure comes with its own computation order. When the CN is a directed acyclic graph (DAG), the computation order can be determined with a depth-first traversal over the DAG. Note that a DAG has no directed cycles (i.e., no recurrent loops). However, there might be loops if we ignore edge directions, for which Figure 2.3 is an example. This is because the same computation node may be a child of several other nodes. Algorithm 2.2 determines the computation order of a DAG and takes care of this condition. Once the order is decided, it remains the same for all subsequent runs, regardless of the computational environment. In other words, this algorithm only needs to be executed once per output node, after which the computation order can be cached. Following the order determined by Algorithm 2.2, the forward computation of the CN is carried out synchronously: the computation of the next node starts only after the computation of the previous node has finished. This is suitable for environments where a single computing device, such as one GPGPU or one CPU host, is used, or where the CN itself is inherently sequential, e.g., when the CN represents a DNN.

Algorithm 2.2 Synchronous forward computation of a CN. The computation order is determined by a depth-first traversal over the DAG.

procedure DECIDEFORWARDCOMPUTATIONORDER(root, visited, order)
    ▷ Enumerate nodes in the DAG in the depth-first order.
    ▷ visited is initialized as an empty set. order is initialized as an empty queue.
    if root ∉ visited then    ▷ the same node may be a child of several nodes
        visited ← visited ∪ {root}
        for each c ∈ root.children do    ▷ apply to children recursively
            call DECIDEFORWARDCOMPUTATIONORDER(c, visited, order)
        end for
        order ← order + root    ▷ add root to the end of order
    end if
end procedure

The forward computation can also be carried out asynchronously, in which case the order of the computation is determined dynamically. This can be helpful when the CN has many parallel branches and there is more than one computing device to compute these branches in parallel. Algorithm 2.3 shows an algorithm that car-
