A Practical Guide to Building Deep Neural Networks


"这篇文章是关于在构建深度神经网络时的一些实用建议，主要集中在感知、控制和认知领域。作者MaĴ Hand和Daniel R在他们的机器学习实验室积累了大量的训练小时数，并在此过程中学到了很多经验教训。文章基于TensorFlow的实践经验，分享了一些可能对初学者有帮助的技巧和注意事项。" 深度神经网络（Deep Neural Networks，DNN）已经成为现代人工智能领域的核心，尤其在图像识别（CNN）、深度学习（DL）等任务中表现出色。然而，实际构建和训练DNN的过程中，会遇到许多挑战。以下是作者MaĴ Hand和Daniel R基于他们的经验给出的一些实用建议： 1. 数据预处理：数据是深度学习的基础，良好的数据预处理能显著提升模型性能。这包括数据清洗、标准化、归一化以及可能的数据增强技术，如旋转、翻转、裁剪等。 2. 模型架构选择：根据任务需求选择合适的网络结构，如卷积神经网络（CNN）适用于图像处理，循环神经网络（RNN）适用于序列数据。对于复杂任务，可以考虑使用预训练模型如VGG或ResNet作为基础架构。 3. 权重初始化：合理的权重初始化有助于网络更快地收敛。例如，Xavier初始化或He初始化可以平衡输入层和隐藏层之间的方差。 4. 激活函数：选择合适的激活函数至关重要，ReLU常用于隐藏层以避免梯度消失，而softmax用于多分类任务的输出层。 5. 损失函数：根据任务类型选择合适的损失函数，如交叉熵用于分类，均方误差用于回归。 6. 学习率调度：初始学习率的选择很重要，通常开始时设置较高，然后逐渐减小以稳定训练。可以使用学习率衰减策略，如指数衰减或余弦退火。 7. 正则化与dropout：正则化如L1或L2可以帮助防止过拟合，dropout则通过随机关闭部分神经元在训练期间引入不确定性。 8. 批量归一化（Batch Normalization）：批量归一化可以加速训练并提高模型的泛化能力。 9. 梯度检查：定期进行梯度检查以确保计算的梯度正确无误，防止因编程错误导致的无效训练。 10. 集成学习：结合多个模型的预测，可以提高整体的准确性和鲁棒性。 11. 实验记录与版本控制：使用版本控制系统如Git来跟踪代码和模型的迭代，保持实验的可重复性。 12. 超参数调优：使用网格搜索、随机搜索或贝叶斯优化等方法找到最优的超参数组合。 13. 利用GPU加速：深度学习计算密集，利用GPU可以大大提高训练速度。虽然这些技巧在许多情况下适用，但并非所有情况都通用，具体应用时需根据任务和数据特性进行调整。重要的是，理解每个技巧背后的原理，并在实践中不断试验和学习，才能更好地驾驭深度神经网络。

2018/7/2 Practical Advice for Building Deep Neural Networks – Perception, Control, Cognition

https://pcc.cs.byu.edu/2017/10/02/practical-advice-for-building-deep-neural-networks/

Perception, Control, Cognition

Combining deep neural networks with Bayesian models to advance the potential of AI

Practical Advice for Building Deep Neural Networks

Posted on October 2, 2017 (updated October 10, 2017) by Matt H and Daniel R

In our machine learning lab, we’ve accumulated tens of thousands of training hours across numerous high-powered machines. The computers weren’t the only ones to learn a lot in the process, though: we ourselves have made a lot of mistakes and fixed a lot of bugs.

Here we present some practical tips for training deep neural networks based on our experiences (rooted mainly in TensorFlow). Some of the suggestions may seem obvious to you, but they weren’t to one of us at some point. Other suggestions may not apply or might even be bad advice for your particular task: use discretion!

We acknowledge these are all well-known methods. We, too, stand on the shoulders of giants here! Our objective with this article is simply to summarize them at a high level for use in practice.

General Tips

Use the ADAM optimizer. It works really well. Prefer it to more traditional optimizers such as vanilla gradient descent. TensorFlow note: If saving and restoring weights, remember to set up the Saver after setting up the AdamOptimizer, because ADAM has state (namely per-weight learning rates) that needs to be restored as well.
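To see why the optimizer state matters when checkpointing, here is a minimal pure-Python sketch of the standard Adam update on a single weight. It is not TensorFlow code and not the authors' implementation: the `Adam` class, the toy loss (w - 3)^2, and the step counts are assumptions made for illustration. In the standard formulation the state consists of two running moment estimates and a step counter, which together determine the effective per-weight step size.

```python
import math


class Adam:
    """Scalar Adam update, written out so the optimizer state is visible."""

    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running mean of gradients
        self.v = 0.0  # running mean of squared gradients
        self.t = 0    # step counter used for bias correction

    def step(self, w, g):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g * g
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return w - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def loss_grad(w):
    """Gradient of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)


def train(w, opt, steps):
    for _ in range(steps):
        w = opt.step(w, loss_grad(w))
    return w


# Train for 10 steps, then take a checkpoint of weight AND optimizer state.
opt = Adam()
w_ckpt = train(0.0, opt, 10)
opt_state = (opt.m, opt.v, opt.t)

# Reference run: training simply continues without interruption.
w_reference = train(w_ckpt, opt, 10)

# Resume with the weight and the saved optimizer state.
resumed = Adam()
resumed.m, resumed.v, resumed.t = opt_state
w_full_restore = train(w_ckpt, resumed, 10)

# Resume with the weight only; the optimizer starts from scratch.
w_weights_only = train(w_ckpt, Adam(), 10)

if __name__ == "__main__":
    print("reference     ", w_reference)
    print("full restore  ", w_full_restore)
    print("weights only  ", w_weights_only)
```

Restoring the state reproduces the uninterrupted run, while restoring only the weight sends training down a different trajectory. Creating the Saver after the optimizer is how the tip above ensures that the optimizer's variables are included in the checkpoint.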

ReLU is the best nonlinearity (activation function). Kind of like how Sublime is the best text editor. But really, ReLUs are fast, simple, and, amazingly, they work, without diminishing gradients along the way. While sigmoid is a common textbook activation function, it does not propagate gradients well through DNNs.
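The difference in gradient propagation can be made concrete with the chain rule. The sketch below multiplies the local slopes of 20 stacked activations, under simplifying assumptions that are not from the article: scalar layers, unit weights, and a positive input of 1.0. The helper `chain_gradient` is a hypothetical name.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never larger than 0.25 (reached at x = 0)


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


def chain_gradient(act, act_grad, x, depth):
    """d(output)/d(input) for `depth` stacked activations with unit weights."""
    grad = 1.0
    for _ in range(depth):
        grad *= act_grad(x)  # chain rule: multiply by the local slope
        x = act(x)           # forward the signal to the next layer
    return grad


sigmoid_chain = chain_gradient(sigmoid, sigmoid_grad, 1.0, 20)
relu_chain = chain_gradient(relu, relu_grad, 1.0, 20)

if __name__ == "__main__":
    print("sigmoid:", sigmoid_chain)
    print("relu:   ", relu_chain)
```

Because each sigmoid slope is at most 0.25, the product shrinks geometrically with depth, whereas the ReLU slope is exactly 1 for positive inputs. In a real network the weight matrices also enter the product, and a ReLU unit passes no gradient at all when its input is negative, so this is only the simplest case.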

Do NOT use an activation function at your output layer. This should be obvious, but it is an easy mistake to make if you build each layer with a shared function: be sure to turn off the activation function at the output.
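The shared-function mistake can be sketched as follows. The helper names `dense` and `network`, the tiny hand-picked weights, and the regression-style setting are assumptions for illustration, not the authors' code.

```python
def relu(x):
    return max(0.0, x)


def dense(inputs, weights, biases, activation=relu):
    """One fully connected layer; pass activation=None for a linear layer."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(z) if activation is not None else z)
    return outputs


def network(x, params, output_activation=None):
    """Hidden layers use the shared default; the output layer is set explicitly."""
    *hidden, last = params
    for weights, biases in hidden:
        x = dense(x, weights, biases)
    weights, biases = last
    return dense(x, weights, biases, activation=output_activation)


# Hypothetical parameters: one hidden layer with two units, one output unit.
params = [
    ([[1.0], [-1.0]], [0.0, 0.0]),  # hidden layer
    ([[-2.0, 0.5]], [0.0]),         # output layer
]

linear_output = network([3.0], params)
clipped_output = network([3.0], params, output_activation=relu)

if __name__ == "__main__":
    print(linear_output)   # [-6.0]
    print(clipped_output)  # [0.0]
```

With the activation left on, the output can never be negative, so a target of -6 is unreachable no matter how long the network trains. Making the output activation an explicit argument that defaults to `None` is one way to avoid inheriting the hidden-layer default by accident.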

DO add a bias in every layer. This is ML 101: a bias essentially translates a plane into a best-fitting position. In y=mx+b, b is the bias, allowing the line to move up or down into the “best fit” position.
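The y=mx+b picture can be checked with an ordinary least-squares fit. The sketch below fits the same points with and without a bias term. The data (generated from y = 2x + 5) and the function names are assumptions chosen for illustration.

```python
def fit_without_bias(xs, ys):
    """Least-squares slope for y = m*x, a line forced through the origin."""
    m = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return m, 0.0


def fit_with_bias(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return m, mean_y - m * mean_x


def mse(xs, ys, m, b):
    """Mean squared error of the line y = m*x + b on the data."""
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 5.0 for x in xs]  # points on the line y = 2x + 5

m0, b0 = fit_without_bias(xs, ys)
m1, b1 = fit_with_bias(xs, ys)

if __name__ == "__main__":
    print("no bias:  ", m0, b0, mse(xs, ys, m0, b0))
    print("with bias:", m1, b1, mse(xs, ys, m1, b1))
```

With the bias the fit recovers m = 2 and b = 5 with zero error. Without it the line must pass through the origin, so it compensates with a steeper slope and still leaves a large residual.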
