Compute the loss and gradients for a two layer fully connected neural network.

Inputs:
- X: Input data of shape (N, D). Each X[i] is a training sample.
- y: Vector of training labels. y[i] is the label for X[i], and each y[i] is an integer in the range 0 <= y[i] < C. This parameter is optional; if it is not passed then we only return scores, and if it is passed then we instead return the loss and gradients.
- reg: Regularization strength.

Returns:
If y is None, return a matrix scores of shape (N, C) where scores[i, c] is the score for class c on input X[i]. If y is not None, instead return a tuple of:
- loss: Loss (data loss and regularization loss) for this batch of training samples.
- grads: Dictionary mapping parameter names to gradients of those parameters with respect to the loss function; has the same keys as self.params.
To compute the loss and gradients for a two-layer fully connected neural network, we perform forward and backward propagation.
Forward propagation:
1. Compute the hidden layer by multiplying the input data X with the weight matrix W1, adding the bias term b1, and applying the ReLU activation function to the result.
2. Compute the scores for each class by multiplying the hidden layer output with the weight matrix W2 and adding the bias term b2.
The loss function for a multi-class classification problem is usually the softmax cross-entropy loss.
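To make the loss concrete, here is a minimal sketch of the softmax cross-entropy computation on a toy score matrix (the numbers and shapes are made up purely for illustration):

```python
import numpy as np

# Toy scores for N=2 samples and C=3 classes (illustrative values only).
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
y = np.array([0, 1])  # correct class for each sample

# Softmax: shift each row by its max for numerical stability, then normalize.
shifted = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Cross-entropy data loss: average negative log-probability of the correct class.
data_loss = -np.log(probs[np.arange(len(y)), y]).mean()
print(probs)
print(data_loss)
```

Shifting each row by its maximum before exponentiating does not change the probabilities but avoids overflow; the full implementation below does the same.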
Backward propagation:
1. Compute the gradient of the loss with respect to the scores of the second layer.
2. Compute the gradient of the loss with respect to the parameters of the second layer (W2 and b2).
3. Compute the gradient of the loss with respect to the output of the first layer.
4. Compute the gradient of the loss with respect to the first layer's pre-activation values, taking the ReLU activation function into account (entries where the ReLU output was zero receive zero gradient).
5. Compute the gradient of the loss with respect to the parameters of the first layer (W1 and b1).
Finally, we add the regularization term to the loss and compute the gradients with respect to the regularization term as well.
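The gradient used in step 1 comes from differentiating the softmax cross-entropy loss with respect to the scores. For sample i, class j, softmax probabilities p, and correct label y_i:

```latex
\frac{\partial L_i}{\partial s_{i,j}} = p_{i,j} - \mathbf{1}[j = y_i],
\qquad
p_{i,j} = \frac{e^{s_{i,j}}}{\sum_{k} e^{s_{i,k}}}
```

This is exactly what the code implements by copying probs into dscores, subtracting 1 at the correct-class positions, and dividing by N to average over the batch.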
Here's the code:
```python
import numpy as np

def two_layer_fc(X, params, y=None, reg=0.0):
    W1, b1 = params['W1'], params['b1']
    W2, b2 = params['W2'], params['b2']
    N, D = X.shape

    # Forward pass
    hidden_layer = np.maximum(0, np.dot(X, W1) + b1)  # ReLU activation
    scores = np.dot(hidden_layer, W2) + b2

    # If y is not given, only return the class scores
    if y is None:
        return scores

    # Compute the loss (data loss and regularization loss)
    grads = {}
    shifted_scores = scores - np.max(scores, axis=1, keepdims=True)  # numerical stability
    exp_scores = np.exp(shifted_scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    correct_logprobs = -np.log(probs[range(N), y])
    data_loss = np.sum(correct_logprobs) / N
    reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
    loss = data_loss + reg_loss

    # Backward pass: gradient of the loss w.r.t. the second-layer scores
    dscores = probs.copy()
    dscores[range(N), y] -= 1
    dscores /= N

    # Gradients for the second-layer parameters (biases assumed 1-D, as in self.params)
    dW2 = np.dot(hidden_layer.T, dscores)
    db2 = np.sum(dscores, axis=0)

    # Backpropagate into the hidden layer and through the ReLU
    dhidden = np.dot(dscores, W2.T)
    dhidden[hidden_layer <= 0] = 0

    # Gradients for the first-layer parameters
    dW1 = np.dot(X.T, dhidden)
    db1 = np.sum(dhidden, axis=0)

    # Add the regularization gradient contribution
    dW2 += reg * W2
    dW1 += reg * W1

    # Store gradients in a dictionary with the same keys as params
    grads['W1'], grads['b1'] = dW1, db1
    grads['W2'], grads['b2'] = dW2, db2

    return loss, grads
```
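As a quick sanity check, here is a minimal usage sketch on random data (the sizes, seed, and regularization strength are made up for illustration), comparing one analytic gradient entry against a centered-difference estimate:

```python
import numpy as np

np.random.seed(0)
N, D, H, C = 5, 4, 10, 3  # illustrative sizes
params = {
    'W1': 1e-2 * np.random.randn(D, H), 'b1': np.zeros(H),
    'W2': 1e-2 * np.random.randn(H, C), 'b2': np.zeros(C),
}
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)

loss, grads = two_layer_fc(X, params, y=y, reg=0.1)
print('loss:', loss)

# Centered-difference check on a single entry of W1.
h = 1e-5
old = params['W1'][0, 0]
params['W1'][0, 0] = old + h
loss_plus, _ = two_layer_fc(X, params, y=y, reg=0.1)
params['W1'][0, 0] = old - h
loss_minus, _ = two_layer_fc(X, params, y=y, reg=0.1)
params['W1'][0, 0] = old  # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * h)
print('analytic:', grads['W1'][0, 0], 'numeric:', numeric)
```

The analytic and numeric values should agree to several decimal places; a large discrepancy usually points to a bug in the backward pass.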