Complete the code for Differentially Private Stochastic Gradient Descent.

a. Fill in the code for per-example clipping and adding Gaussian noise.

b. Implement the privacy budget composition. Calculate the privacy budget of the training process, which means calculating $\epsilon$ based on the variance of the Gaussian noise $\sigma^2$ and the given $\delta = 10^{-5}$ in different epochs. You can use basic composition to complete the code. If you correctly apply the Moments Accountant method, you will receive bonus points.

```python
import numpy as np
from scipy import optimize
from scipy.stats import norm
import math

"""
Optionally you could use the moments accountant to implement the epsilon calculation.
"""

def get_epsilon(epoch, delta, sigma, sensitivity, batch_size, training_nums):
    """
    Compute epsilon with basic composition from the given epoch, delta, sigma,
    sensitivity, batch_size and the size of the training set.
    """
    return epsilon
```
Here is the completed code for Differentially Private Stochastic Gradient Descent, including per-example clipping, Gaussian noise addition, and privacy budget composition:
```python
import numpy as np
import math

def sigmoid(z):
    """Logistic function (used below but missing from the original snippet)."""
    return 1.0 / (1.0 + np.exp(-z))

def per_example_clipping(grads, clip_factor):
    """
    Clip each per-example gradient to an L2 norm of at most clip_factor.
    grads has shape (batch_size, d), one row per example.
    """
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    return grads * np.minimum(1.0, clip_factor / np.maximum(norms, 1e-12))

def add_gaussian_noise(grad, sigma):
    """
    Add zero-mean Gaussian noise with standard deviation sigma to the gradient.
    """
    return grad + np.random.normal(0, sigma, grad.shape)

def get_epsilon(epoch, delta, sigma, sensitivity, batch_size, training_nums):
    """
    Compute epsilon with basic composition from the given epoch, delta, sigma,
    sensitivity, batch_size and the size of the training set.
    """
    steps = math.ceil(training_nums / batch_size) * epoch
    # Gaussian mechanism: sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon,
    # so the per-step epsilon is:
    eps_step = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / sigma
    # Basic composition: the per-step epsilons add up over all steps.
    return eps_step * steps

def dp_sgd(X, y, epochs, batch_size, clip_factor, sigma, delta, lr=0.1):
    n, d = X.shape
    w = np.zeros(d)
    for epoch in range(epochs):
        for i in range(0, n, batch_size):
            X_batch = X[i:i+batch_size]
            y_batch = y[i:i+batch_size]
            # Per-example gradients of the logistic loss, shape (batch, d).
            per_grads = X_batch * (sigmoid(X_batch.dot(w)) - y_batch).reshape(-1, 1)
            clipped = per_example_clipping(per_grads, clip_factor)
            noisy_grad = add_gaussian_noise(np.mean(clipped, axis=0), sigma)
            w -= lr * noisy_grad
        # The averaged, clipped gradient has sensitivity clip_factor / batch_size.
        epsilon = get_epsilon(epoch + 1, delta, sigma,
                              clip_factor / batch_size, batch_size, n)
        print("Epoch {}: Epsilon = {}".format(epoch + 1, epsilon))
    return w
```
The `per_example_clipping` function clips each example's gradient to an L2 norm of at most `clip_factor`. The `add_gaussian_noise` function adds zero-mean Gaussian noise with standard deviation `sigma`. The `get_epsilon` function computes epsilon with basic composition from the given epoch, delta, sigma, sensitivity, batch size, and training-set size.
The `dp_sgd` function performs Differentially Private Stochastic Gradient Descent. For each epoch, it loops over the training set in batches, computes the per-example gradients of the logistic loss, clips each one, averages them, adds Gaussian noise to the average, and takes a gradient step. At the end of each epoch it computes the cumulative privacy budget with `get_epsilon` and prints it.
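A minimal usage sketch on synthetic data (all values here are illustrative hyperparameters, not part of the assignment):

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X @ rng.normal(size=5) > 0).astype(float)  # synthetic binary labels
w = dp_sgd(X, y, epochs=3, batch_size=100, clip_factor=1.0,
           sigma=0.1, delta=1e-5)
```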
Note that the `get_epsilon` function uses basic composition to compute the privacy budget. It calculates the total number of steps from the number of epochs and the batch size, computes the per-step epsilon of the Gaussian mechanism as $\epsilon_{\text{step}} = \text{sensitivity} \cdot \sqrt{2\ln(1.25/\delta)}\,/\,\sigma$, and multiplies it by the number of steps.
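For concreteness, a worked calculation with the illustrative numbers from the sketch above (`clip_factor = 1.0` and `batch_size = 100` give a sensitivity of 0.01 for the averaged gradient):

```python
# eps_step = 0.01 * sqrt(2 * ln(1.25 / 1e-5)) / 0.1 ≈ 0.48, and
# 3 epochs of 10 batches = 30 steps, so basic composition gives ≈ 14.5.
print(get_epsilon(epoch=3, delta=1e-5, sigma=0.1,
                  sensitivity=0.01, batch_size=100, training_nums=1000))
```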
It is worth noting that basic composition may not provide the tightest bound on privacy, and using the Moments Accountant method may provide a tighter bound.
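As a sketch of that tighter accounting, the helper below (the name `get_epsilon_rdp` is my own) composes the Rényi-DP guarantee of the Gaussian mechanism over all steps and converts the result to $(\epsilon, \delta)$-DP, in the spirit of the Moments Accountant. It deliberately ignores privacy amplification by subsampling, which the full Moments Accountant also exploits, so it still gives a valid upper bound:

```python
def get_epsilon_rdp(epoch, delta, sigma, sensitivity, batch_size, training_nums):
    """
    RDP-based accounting (Moments-Accountant style), without
    subsampling amplification.
    """
    steps = math.ceil(training_nums / batch_size) * epoch
    best = float("inf")
    for alpha in range(2, 256):
        # RDP of the Gaussian mechanism at order alpha, composed over all steps.
        rdp = steps * alpha * sensitivity ** 2 / (2 * sigma ** 2)
        # Standard conversion: (alpha, rdp)-RDP implies
        # (rdp + log(1/delta) / (alpha - 1), delta)-DP.
        best = min(best, rdp + math.log(1.0 / delta) / (alpha - 1))
    return best
```

With the numbers from the worked example above, this yields roughly $\epsilon \approx 2.8$ instead of the $\approx 14.5$ from basic composition.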