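Setting Delta: Note that the discussion above brushed over the hyperparameter $\Delta$ and how to set it.
What value should it take, and does it have to be cross-validated? It turns out that $\Delta = 1.0$ is a
safe setting in the vast majority of cases. The hyperparameters $\Delta$ and $\lambda$ look like two
different hyperparameters, but in fact they control the same tradeoff: the tradeoff between the data loss
and the regularization loss in the loss function. The key to understanding this is that the magnitude of
the weights $W$ has a direct effect on the class scores (and hence also on their differences): as we shrink
the values inside $W$, the differences between the scores become smaller, and as we scale the weights up,
the differences become larger. The exact value of the margin between the scores (e.g. $\Delta = 1$ or
$\Delta = 100$) is therefore in some sense meaningless, because the weights can shrink or stretch the
differences arbitrarily. Hence the only real tradeoff is how large we allow the weights to grow (controlled
through the regularization strength $\lambda$).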
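The scaling argument can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper name `multiclass_svm_loss`, the toy weights, and the scale factor `alpha` are all assumptions chosen for illustration. It shows that multiplying the weights by a positive factor multiplies every score difference by the same factor, so a loss computed with margin `alpha * delta` on the scaled weights is just `alpha` times the loss computed with margin `delta` on the original weights.

```python
import numpy as np


def multiclass_svm_loss(scores, y, delta):
    """Multiclass SVM data loss for one example, given its class scores (hypothetical helper)."""
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0  # the correct class does not contribute to the loss
    return margins.sum()


# Toy weights and input, made up for this illustration (3 classes, 3 features).
W = np.array([[0.2, -0.5, 0.1],
              [1.5, 1.3, 2.1],
              [0.0, 0.25, 0.2]])
x = np.array([1.0, -2.0, 0.5])
y = 0          # index of the correct class
alpha = 10.0   # positive factor applied to the weights

scores = W.dot(x)
scaled_scores = (alpha * W).dot(x)

# Score differences stretch by exactly alpha when the weights are scaled.
diff = scores - scores[y]
scaled_diff = scaled_scores - scaled_scores[y]

# Margin delta with W versus margin alpha * delta with alpha * W.
loss_small = multiclass_svm_loss(scores, y, delta=1.0)
loss_large = multiclass_svm_loss(scaled_scores, y, delta=alpha * 1.0)

print(diff, scaled_diff)
print(loss_small, loss_large)
```

Because the two settings differ only by an overall factor on the data loss, the choice of margin can be absorbed into the scale of the weights, which is what the regularization strength constrains.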
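Relation to Binary Support Vector Machine: You may be coming to this class with some previous experience
with binary Support Vector Machines, where the loss for the i-th example can be written as:
$$L_i = C \max(0, 1 - y_i w^T x_i) + R(W)$$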
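As a rough illustration of the binary case, the sketch below computes the standard hinge-style data loss of a binary SVM for one example. It is an assumption-laden example rather than code from the notes: the function name `binary_svm_data_loss` is hypothetical, the label `y` is assumed to be encoded as -1 or +1, `C` is assumed to be a positive weighting hyperparameter, and the regularization term is left out.

```python
import numpy as np


def binary_svm_data_loss(w, x, y, C=1.0):
    """Hinge data loss of a binary SVM for one example (hypothetical helper).

    Assumptions: w and x are 1-D arrays of equal length, y is -1 or +1,
    and C > 0 weights the data loss. Regularization is not included.
    """
    score = float(np.dot(w, x))
    return C * max(0.0, 1.0 - y * score)


w = np.array([1.0, 0.0])

# Correct side of the boundary but inside the margin: small positive loss.
loss_inside_margin = binary_svm_data_loss(w, np.array([0.5, 3.0]), y=1)
# Same point with the opposite label: larger loss.
loss_wrong_side = binary_svm_data_loss(w, np.array([0.5, 3.0]), y=-1)
# Correct side and beyond the margin: zero loss.
loss_beyond_margin = binary_svm_data_loss(w, np.array([2.0, 0.0]), y=1)
```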
import numpy as np


def L_i(x, y, W):
    """
    Unvectorized version. Compute the multiclass SVM loss for a single example (x, y).
    - x is a column vector representing an image (e.g. 3073 x 1 in CIFAR-10)
      with an appended bias dimension in the 3073rd position (i.e. bias trick)
    - y is an integer giving the index of the correct class (e.g. between 0 and 9 in CIFAR-10)
    - W is the weight matrix (e.g. 10 x 3073 in CIFAR-10)
    """
    delta = 1.0  # see the notes on setting delta in this section
    scores = W.dot(x)  # scores has size 10 x 1: one score per class
    correct_class_score = scores[y]
    D = W.shape[0]  # number of classes, e.g. 10
    loss_i = 0.0
    for j in range(D):  # iterate over all classes
        if j == y:
            # skip the true class so that only incorrect classes contribute
            continue
        # accumulate the loss for the i-th example
        loss_i += max(0, scores[j] - correct_class_score + delta)
    return loss_i


def L_i_vectorized(x, y, W):
    """
    A faster half-vectorized implementation. Half-vectorized refers to the
    fact that for a single example the implementation contains no for loops,
    but there is still one loop over the examples (outside this function).
    """
    delta = 1.0
    scores = W.dot(x)
    # compute the margins for all classes in one vector operation
    margins = np.maximum(0, scores - scores[y] + delta)
    # at the y-th position, scores[y] - scores[y] cancels and leaves delta.
    # We want to ignore that position and sum the margins of the incorrect
    # classes only.
    margins[y] = 0
    loss_i = np.sum(margins)
    return loss_i
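The half-vectorized function above still needs an outer loop over the examples. One possible way to remove that loop as well is sketched below. This is an illustrative sketch, not the notes' own implementation: the name `svm_loss_batch` is hypothetical, the data matrix `X` is assumed to hold one example per column (shape D x N), `y` is assumed to be an integer array of N correct class indices, and the function returns the mean data loss without any regularization term.

```python
import numpy as np


def svm_loss_batch(X, y, W, delta=1.0):
    """Mean multiclass SVM data loss over a batch (hypothetical helper).

    Assumed shapes: X is (D, N) with one example per column, y is (N,) with
    integer class indices, W is (C, D). Regularization is not included.
    """
    num_examples = X.shape[1]
    cols = np.arange(num_examples)
    scores = W.dot(X)                   # (C, N): one column of scores per example
    correct_scores = scores[y, cols]    # (N,): score of the true class per example
    # Broadcasting subtracts each example's true-class score from its column.
    margins = np.maximum(0, scores - correct_scores + delta)
    margins[y, cols] = 0                # the true class contributes no loss
    return margins.sum() / num_examples


# Small random problem, made up for this illustration.
rng = np.random.default_rng(0)
W_demo = rng.standard_normal((3, 5))    # 3 classes, 5 features
X_demo = rng.standard_normal((5, 4))    # 4 examples stored as columns
y_demo = np.array([0, 2, 1, 1])
batch_loss = svm_loss_batch(X_demo, y_demo, W_demo)
```

The integer-array indexing `scores[y, cols]` picks one entry per column, which is what replaces the per-example lookup `scores[y]` of the single-example version.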