Improving Deep Learning Performance: Initialization Strategies and a Python Implementation

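Updated 2024-08-05

This article introduces weight initialization, one of the key factors in improving the performance of deep neural networks, and explains the principles, advantages, and drawbacks of zero initialization, random initialization, He initialization, and Xavier initialization.

In deep learning, weight initialization strongly affects how well a network trains. Different initialization strategies change both the training speed and the final model performance.

Zero initialization sets every weight to zero. It is simple, but every neuron in a layer then computes the same output during forward propagation and receives the same gradient during backpropagation. The symmetry is never broken, the neurons stay identical, and the network's capacity to learn is sharply reduced.

Random initialization breaks this symmetry by assigning random values to the weights, which lets the network learn effectively. As the network gets deeper, however, poorly scaled random weights can cause vanishing or exploding gradients. Vanishing gradients are mainly associated with activation functions such as sigmoid, whose derivatives approach zero in the saturated regions, so the gradients that reach the early layers become very small and those layers' weights barely update. Exploding gradients, conversely, arise when the initial weights are too large: the signal grows from layer to layer and the weight updates become too drastic.

He initialization and Xavier initialization were introduced to address these problems. He initialization, proposed by He et al., is designed specifically for the ReLU activation function. The weights are drawn at random and multiplied by $\sqrt{2/n_{\text{in}}}$, where $n_{\text{in}}$ is the number of inputs to the layer. This helps keep the activations of every layer in a reasonable range, which reduces the risk of exploding gradients while still letting enough gradient flow through the deep layers.

Xavier initialization, also known as Glorot initialization, was proposed by Glorot and Bengio. It is similar to He initialization but uses the average of the input and output dimensions: the weights are drawn at random and divided by $\sqrt{(n_{\text{in}} + n_{\text{out}})/2}$, which gives them a variance of $2/(n_{\text{in}} + n_{\text{out}})$. The method aims to keep the signal magnitude stable in both forward and backward propagation, and it suits activation functions such as tanh and sigmoid.

In practice, the choice of initialization method usually depends on the network structure and the activation function. For networks that use ReLU, He initialization (also called Kaiming initialization) usually performs better, while for sigmoid or tanh, Xavier initialization may be more appropriate. Other methods, such as orthogonal initialization, pursue the same goal of making training more efficient and effective.

Weight initialization is a part of deep learning that cannot be ignored. A sound initialization strategy mitigates vanishing and exploding gradients and makes the network easier to train, which can also improve the resulting model. In practice, the method should be chosen to fit the specific task and network architecture.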

In[2]:

2 - Zero initialization

There are two types of parameters to initialize in a neural network:

the weight matrices $(W^{[1]}, W^{[2]}, W^{[3]}, \ldots, W^{[L-1]}, W^{[L]})$

the bias vectors $(b^{[1]}, b^{[2]}, b^{[3]}, \ldots, b^{[L-1]}, b^{[L]})$

Exercise: Implement the following function to initialize all parameters to zeros. You'll see later that this does not work well since it fails to "break symmetry", but let's try it anyway and see what happens. Use np.zeros((..,..)) with the correct shapes.

Note: this shows what happens when the parameters W and b are all initialized to 0 (zero matrices and zero vectors).
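One possible way to approach the exercise is sketched below. It is an illustration, not the notebook's reference solution. It assumes that the parameters are stored in a dictionary with keys "W1", "b1", and so on, and that layers_dims lists the layer sizes starting with the input size; the short forward step at the end is added only to show the symmetry problem.

```python
import numpy as np


def initialize_parameters_zeros(layers_dims):
    """Sketch: set every weight matrix and bias vector to zeros.

    layers_dims -- list of layer sizes, input layer first (assumed convention).
    """
    parameters = {}
    for l in range(1, len(layers_dims)):
        # W^[l] has one row per unit in layer l and one column per unit in layer l-1.
        parameters["W" + str(l)] = np.zeros((layers_dims[l], layers_dims[l - 1]))
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters


params = initialize_parameters_zeros([2, 10, 5, 1])

# First hidden layer on a few random inputs: every unit outputs the same value (0),
# so no unit can be told apart from its neighbours.
X = np.random.default_rng(0).standard_normal((2, 4))
a1 = np.maximum(0.0, params["W1"] @ X + params["b1"])
print(a1)
```

Because every hidden unit produces the same output, every unit also receives the same gradient, which is the failure to break symmetry that the exercise points to.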

import matplotlib.pyplot as plt

# The initialize_parameters_* functions, forward_propagation, compute_loss,
# backward_propagation and update_parameters are assumed to be defined
# elsewhere in the notebook; they are not shown in this excerpt.


def model(X, Y, learning_rate=0.01, num_iterations=15000, print_cost=True, initialization="he"):
    """
    Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (2, number of examples)
    Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples)
    learning_rate -- learning rate for gradient descent
    num_iterations -- number of iterations to run gradient descent
    print_cost -- if True, print the cost every 1000 iterations
    initialization -- flag to choose which initialization to use ("zeros", "random" or "he")

    Returns:
    parameters -- parameters learnt by the model
    """
    grads = {}
    costs = []  # to keep track of the loss
    m = X.shape[1]  # number of examples
    layers_dims = [X.shape[0], 10, 5, 1]

    # Initialize parameters dictionary.
    if initialization == "zeros":
        parameters = initialize_parameters_zeros(layers_dims)
    elif initialization == "random":
        parameters = initialize_parameters_random(layers_dims)
    elif initialization == "he":
        parameters = initialize_parameters_he(layers_dims)
    else:
        raise ValueError("unknown initialization: " + str(initialization))

    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.
        a3, cache = forward_propagation(X, parameters)

        # Loss
        cost = compute_loss(a3, Y)

        # Backward propagation.
        grads = backward_propagation(X, Y, cache)

        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print and record the loss every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration {}: {}".format(i, cost))
            costs.append(cost)

    # plot the loss
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (per thousands)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
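The model function calls initialize_parameters_random and initialize_parameters_he, which are not shown in this excerpt. The sketch below is one guess at how such helpers could be written, using the same dictionary layout as the "zeros" case. The scale and seed parameters are hypothetical additions for illustration; the scale used by the original notebook is not given here.

```python
import numpy as np


def initialize_parameters_random(layers_dims, scale=1.0, seed=0):
    """Sketch: standard-normal weights times a fixed scale, zero biases.

    scale and seed are hypothetical parameters added for this example.
    """
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layers_dims)):
        shape = (layers_dims[l], layers_dims[l - 1])
        parameters["W" + str(l)] = rng.standard_normal(shape) * scale
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters


def initialize_parameters_he(layers_dims, seed=0):
    """Sketch: He initialization, weights scaled by sqrt(2 / n_in), zero biases."""
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layers_dims)):
        n_in = layers_dims[l - 1]  # number of inputs feeding layer l
        shape = (layers_dims[l], n_in)
        parameters["W" + str(l)] = rng.standard_normal(shape) * np.sqrt(2.0 / n_in)
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters


layers_dims = [2, 10, 5, 1]
for name, init in [("random", initialize_parameters_random), ("he", initialize_parameters_he)]:
    shapes = {key: value.shape for key, value in init(layers_dims).items()}
    print(name, shapes)
```

Both helpers accept layers_dims as their only required argument, so they match the way model calls them. Only the weights are random; the biases can stay at zero because the random weights already break the symmetry.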


Uploaded by: 有只风车子
