Softmax Explained: From Concept to Loss Function


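"This blog post by 红色石头 (Red Stone) explains why the Softmax function matters in machine learning and deep learning, particularly for multi-class classification. The author also warns about numerical overflow in practical use and gives a fix, and introduces the Softmax loss function."

In machine learning, the Softmax function is central to multi-class classification. It maps an arbitrary real-valued vector to a probability distribution: the outputs sum to 1, and each output is the predicted probability of the corresponding class. It is defined as:

\[ S_i = \text{Softmax}(V)_i = \frac{e^{V_i}}{\sum_{j=1}^{C} e^{V_j}} \]

Here \( V_i \) is the i-th value output by the preceding stage of the classifier, C is the number of classes, and \( S_i \) is the probability of class i after the Softmax transformation. The transformation makes the model output directly readable as the relative likelihood of each class. For example, in a 4-class problem with raw outputs [3, 2, 5, 1], Softmax yields approximately [0.1125, 0.0414, 0.8310, 0.0152], so the model assigns the highest probability to the third class.

Softmax can overflow when the inputs are large, because the exponential grows very quickly. The usual remedy is to shift the raw vector V by subtracting its maximum element before exponentiating:

\[ D = \max_j V_j, \qquad S_i = \frac{e^{V_i - D}}{\sum_{j=1}^{C} e^{V_j - D}} \]

The shift does not change the result, because the common factor \( e^{-D} \) cancels between numerator and denominator. After the shift every exponent is at most 0, so no term can overflow and the computation is numerically stable.

For the loss, Softmax is usually combined with cross-entropy, giving the Softmax cross-entropy loss. In multi-class tasks this loss measures the difference between the predicted probability distribution and the true label. For a linear classifier, the scores \( s \) are the product of the weight matrix \( W \) and the input \( x \), that is \( s = Wx \). The scores are passed through Softmax to obtain probabilities, and the cross-entropy is then computed against the one-hot encoded true label. With a one-hot label whose true class is \( y \), the cross-entropy reduces to \( L = -\log S_y \).

Softmax plays a key role in deep learning and machine learning models: it gives the outputs a probabilistic interpretation and simplifies the loss computation during training. Understanding and using it correctly is essential for building accurate classifiers.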
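The shift-by-maximum trick and the cross-entropy loss described above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the original post: the function names `softmax` and `cross_entropy` and the test vector `big` are assumptions chosen for the example.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over the last axis (illustrative helper)."""
    v = np.asarray(v, dtype=np.float64)
    # Subtract the maximum so every exponent is <= 0 and exp() cannot overflow.
    shifted = v - np.max(v, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def cross_entropy(probs, label):
    """Cross-entropy against a one-hot label given as a class index."""
    return -np.log(probs[label])

# The 4-class example from the text.
scores = np.array([3.0, 2.0, 5.0, 1.0])
probs = softmax(scores)
print(np.round(probs, 4))        # four probabilities
print(probs.sum())               # sums to 1 up to rounding
print(cross_entropy(probs, 2))   # loss if the true class is index 2

# Large scores: np.exp(1000.0) alone would overflow to inf,
# but the shifted version stays finite.
big = np.array([1000.0, 1001.0, 1002.0])
print(softmax(big))
```

Because the shift cancels between numerator and denominator, `softmax(big)` equals `softmax(big - 1000)`, which is a quick way to convince yourself the trick does not alter the probabilities.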

2018/9/29, "三分钟带你对 Softmax 划重点" (Three Minutes on the Key Points of Softmax), 红色石头's column, CSDN blog

https://blog.csdn.net/red_stone1/article/details/80687921

    # ... (tail of softmax_loss_naive; the function starts on the previous page)
            loss += -scores_shift[right_class] + np.log(np.sum(np.exp(scores_shift)))
            for j in range(num_classes):
                softmax_output = np.exp(scores_shift[j]) / np.sum(np.exp(scores_shift))
                if j == y[i]:
                    # True class: gradient of the score is (S_j - 1).
                    dW[:, j] += (-1 + softmax_output) * X[i, :]
                else:
                    # Other classes: gradient of the score is S_j.
                    dW[:, j] += softmax_output * X[i, :]

        # Average over the samples and add L2 regularization.
        loss /= num_train
        loss += 0.5 * reg * np.sum(W * W)
        dW /= num_train
        dW += reg * W

        return loss, dW

Using matrix operations, the function that computes the loss and its gradient with respect to the weights W is defined as follows:

    import numpy as np

    def softmax_loss_vectorized(W, X, y, reg):
        """
        Softmax loss function, vectorized version.

        Inputs and outputs are the same as softmax_loss_naive.
        """
        # Initialize the loss and gradient to zero.
        loss = 0.0
        dW = np.zeros_like(W)

        num_train = X.shape[0]
        num_classes = W.shape[1]

        # Scores for all samples at once, shifted row-wise for numerical stability.
        scores = X.dot(W)
        scores_shift = scores - np.max(scores, axis=1).reshape(-1, 1)
        softmax_output = np.exp(scores_shift) / np.sum(np.exp(scores_shift), axis=1).reshape(-1, 1)

        # Cross-entropy of the true classes, averaged, plus L2 regularization.
        loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))
        loss /= num_train
        loss += 0.5 * reg * np.sum(W * W)

        # Gradient w.r.t. the scores is (S - one_hot(y)); backpropagate through X.dot(W).
        dS = softmax_output.copy()
        dS[range(num_train), list(y)] += -1
        dW = (X.T).dot(dS)
        dW = dW / num_train + reg * W

        return loss, dW

Experiments show that matrix operations run much faster than nested loops, especially when there are many training samples. We compared the two gradient implementations on about 5000 samples from the CIFAR-10 dataset:

    import time

    tic = time.time()
    loss_naive, grad_naive = softmax_loss_naive(W, X_train, y_train, 0.000005)
    toc = time.time()
    print('naive loss: %e computed in %fs' % (loss_naive, toc - tic))

    tic = time.time()
    loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_train, y_train, 0.000005)
    toc = time.time()
    print('vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

    grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
    print('Loss difference: %f' % np.abs(loss_naive - loss_vectorized))
    print('Gradient difference: %f' % grad_difference)

The results are:

    naive loss: 2.362135e+00 computed in 14.680000s
    vectorized loss: 2.362135e+00 computed in 0.242000s
    Loss difference: 0.000000
    Gradient difference: 0.000000

The two versions produce the same loss and gradient, while the vectorized version is about 60 times faster (0.242 s versus 14.68 s).
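The comparison above depends on `W`, `X_train`, and `y_train` being defined elsewhere, and the beginning of `softmax_loss_naive` is not shown in this section. The sketch below is one way to reproduce the equivalence check in a self-contained script. It rests on several assumptions: the loop-based function is a reconstruction consistent with the visible tail, not the author's exact code; small random data stands in for CIFAR-10; and `numerical_gradient` is a hypothetical helper added here to verify the analytic gradient \( \frac{1}{N} X^\top (S - Y) + \text{reg} \cdot W \) by central differences.

```python
import numpy as np

def softmax_loss_naive(W, X, y, reg):
    """Loop-based softmax loss (reconstruction, assumed from the visible tail)."""
    loss = 0.0
    dW = np.zeros_like(W)
    num_train, num_classes = X.shape[0], W.shape[1]
    for i in range(num_train):
        scores = X[i].dot(W)
        scores_shift = scores - np.max(scores)          # stability shift
        probs = np.exp(scores_shift) / np.sum(np.exp(scores_shift))
        loss += -np.log(probs[y[i]])
        for j in range(num_classes):
            indicator = 1.0 if j == y[i] else 0.0       # one-hot label entry
            dW[:, j] += (probs[j] - indicator) * X[i]
    loss = loss / num_train + 0.5 * reg * np.sum(W * W)
    dW = dW / num_train + reg * W
    return loss, dW

def softmax_loss_vectorized(W, X, y, reg):
    """Vectorized softmax loss with the same inputs and outputs."""
    num_train = X.shape[0]
    rows = np.arange(num_train)
    scores = X.dot(W)
    scores_shift = scores - np.max(scores, axis=1, keepdims=True)
    exp = np.exp(scores_shift)
    probs = exp / np.sum(exp, axis=1, keepdims=True)
    loss = -np.sum(np.log(probs[rows, y])) / num_train
    loss += 0.5 * reg * np.sum(W * W)
    dS = probs.copy()
    dS[rows, y] -= 1.0                                  # S - one_hot(y)
    dW = X.T.dot(dS) / num_train + reg * W
    return loss, dW

def numerical_gradient(f, W, h=1e-5):
    """Central-difference gradient of scalar f at W (hypothetical helper)."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        f_plus = f(W)
        W[idx] = old - h
        f_minus = f(W)
        W[idx] = old                                    # restore the entry
        grad[idx] = (f_plus - f_minus) / (2.0 * h)
    return grad

# Synthetic stand-in for the dataset (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
num_train, dim, num_classes = 50, 20, 4
X = rng.standard_normal((num_train, dim))
y = rng.integers(0, num_classes, size=num_train)
W = 0.01 * rng.standard_normal((dim, num_classes))
reg = 5e-6

loss_naive, grad_naive = softmax_loss_naive(W, X, y, reg)
loss_vec, grad_vec = softmax_loss_vectorized(W, X, y, reg)
grad_num = numerical_gradient(
    lambda w: softmax_loss_vectorized(w, X, y, reg)[0], W)

print('Loss difference: %e' % abs(loss_naive - loss_vec))
print('Gradient difference: %e' % np.linalg.norm(grad_naive - grad_vec, ord='fro'))
print('Max deviation from numerical gradient: %e' % np.max(np.abs(grad_vec - grad_num)))
```

All three printed values should be close to zero. The numerical check is slow, since it evaluates the loss twice per weight, so it is only practical on small problems like this one.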
