Implementing the Adam Algorithm from Scratch in Code
Adam (Adaptive Moment Estimation) is a widely used optimizer that combines momentum (an exponential moving average of the gradient, the first moment) with RMSProp-style adaptive learning rates (an exponential moving average of the squared gradient, the second moment). The basic steps for implementing Adam with Python and PyTorch by subclassing `torch.optim.Optimizer` are shown below.
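For reference, these are the standard Adam update rules (Kingma & Ba, 2015) that the code below implements, where $g_t$ is the gradient at step $t$:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$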
```python
import math

import torch
from torch.optim import Optimizer


class CustomAdam(Optimizer):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super(CustomAdam, self).__init__(params, defaults)

    def step(self, closure=None):
        """Performs a single optimization step."""
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data  # current gradient

                state = self.state[p]
                # On the first iteration, initialize the step counter and the
                # first/second moment estimates m and v.
                if len(state) == 0:
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(p.data)     # m
                    state['exp_avg_sq'] = torch.zeros_like(p.data)  # v

                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                beta1, beta2 = group['betas']
                state['step'] += 1  # update step count

                # Optional L2 weight decay: add weight_decay * p to the gradient.
                if group['weight_decay'] != 0:
                    grad = grad.add(p.data, alpha=group['weight_decay'])

                # Exponential moving averages of the gradient and of its square.
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

                # Bias corrections for the first and second moments.
                bias_correction1 = 1 - beta1 ** state['step']
                bias_correction2 = 1 - beta2 ** state['step']

                denom = exp_avg_sq.sqrt().add_(group['eps'])
                step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1

                # Parameter update: p <- p - step_size * m / (sqrt(v) + eps)
                p.data.addcdiv_(exp_avg, denom, value=-step_size)

        return loss


# Usage example (assuming `model` is an existing nn.Module):
optimizer = CustomAdam(model.parameters(), lr=0.001)
```
This custom Adam class implements the basic Adam update rule: `exp_avg` and `exp_avg_sq` hold the exponential moving averages of the gradient and of its square (the first and second moments, m and v in the original paper), which are bias-corrected before the parameter update. Note that the `math` module must be imported (done at the top of the snippet) because `math.sqrt` is used when computing the step size; the state keys `exp_avg` and `exp_avg_sq` follow the naming used by `torch.optim.Adam`, and you can rename them as long as you do so consistently.
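As a quick sanity check, here is a minimal sketch (the toy linear-regression data and hyperparameters are made up for illustration) that trains a small model with `CustomAdam`; the loss should behave much like it would with the built-in `torch.optim.Adam`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data: y = 3x + noise (illustrative only).
x = torch.randn(256, 1)
y = 3 * x + 0.1 * torch.randn(256, 1)

model = nn.Linear(1, 1)
optimizer = CustomAdam(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for step in range(300):
    optimizer.zero_grad()  # inherited from torch.optim.Optimizer
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(loss.item())  # should end up close to the ~0.01 noise floor
```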