The total number of epochs for training was 200. The learning rate for each architecture was kept constant at 0.0001, and the RMSprop [31] (root mean square propagation) algorithm was used for gradient descent optimization. Although we experimented with different batch sizes (8/16/24/32), a batch size of 16 was used for both training and validating the deep convolutional architecture, because it gave the best result. Leaky ReLU (alpha = 0.01), which allows small negative values to propagate (scaled by alpha), was applied in convolutional layers 1 and 2. This provided nonlinearity at the output of the convolutional layers. As all our architectures were shallow, both dropout [32] and L2 regularization [33] were applied before the classification layer to prevent overfitting.
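The following is a minimal Keras-style sketch of this training configuration. The layer widths, kernel sizes, input shape, number of classes, dropout rate, and L2 strength are illustrative assumptions not specified in the excerpt; only the optimizer, learning rate, batch size, epoch count, Leaky ReLU alpha, and placement of dropout/L2 follow the text.

```python
# Sketch of the described setup: RMSprop with lr = 0.0001, batch size 16, 200 epochs,
# Leaky ReLU (alpha = 0.01) in the first two convolutional layers, and dropout + L2
# regularization before the classification layer. Shapes/sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_model(input_shape=(128, 128, 1), num_classes=2):  # hypothetical shape/classes
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), input_shape=input_shape),
        layers.LeakyReLU(alpha=0.01),   # nonlinearity for convolutional layer 1
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3)),
        layers.LeakyReLU(alpha=0.01),   # nonlinearity for convolutional layer 2
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),            # dropout before the classification layer (rate assumed)
        layers.Dense(num_classes, activation='softmax',
                     kernel_regularizer=regularizers.l2(0.01)),  # L2 on classifier (strength assumed)
    ])
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Training with the stated batch size and epoch count:
# model = build_model()
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           batch_size=16, epochs=200)
```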
This passage mainly describes the training details of a deep convolutional neural network. The total number of training epochs was 200, the RMSprop algorithm was used as the gradient descent optimizer, and the learning rate was set to 0.0001. Among the batch sizes tested, 16 was chosen for both training and validation because it gave the best result. The Leaky ReLU activation function was applied in the first two convolutional layers, allowing negative values to propagate and providing nonlinearity. To prevent overfitting, dropout and L2 regularization were applied in the model.