Single-Agent Reinforcement Learning Algorithm Code
Date: 2024-11-19 12:14:43
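Single-Agent Reinforcement Learning (SARL) is a machine learning technique in which an agent is trained to maximize its cumulative reward by trying different actions in an environment. Q-learning is used as the example here: a simple but effective model-free algorithm.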
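Written out, the update performed by `update_q_value` in the code below is the standard tabular Q-learning rule. Here $\alpha$ is `learning_rate`, $\gamma$ is `discount_factor`, $(s, a, r, s')$ is one observed transition, and the $\max$ term is taken as 0 when the episode has ended (`done`):

$$
Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') \right]
$$

The rule only needs sampled transitions, never the environment's transition probabilities, which is what makes Q-learning model-free.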
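```python
# Import the required library
import numpy as np


class QLearningAgent:
    def __init__(self, state_space, action_space, learning_rate=0.5,
                 discount_factor=0.9, epsilon=0.1):
        self.state_space = state_space          # number of discrete states
        self.action_space = action_space        # number of discrete actions
        self.q_table = np.zeros((state_space, action_space))
        self.learning_rate = learning_rate      # step size (alpha)
        self.discount_factor = discount_factor  # discount factor (gamma)
        self.epsilon = epsilon                  # exploration rate for epsilon-greedy

    def choose_action(self, state):
        # epsilon-greedy selection: explore with probability epsilon, otherwise exploit
        if np.random.uniform(0, 1) < self.epsilon:
            return int(np.random.choice(self.action_space))
        q_values = self.q_table[state]
        # Break ties randomly so an all-zero row does not always pick action 0
        best_actions = np.flatnonzero(q_values == q_values.max())
        return int(np.random.choice(best_actions))

    def update_q_value(self, state, action, reward, next_state, done):
        # Update the Q-table; a terminal transition has no future value
        max_future_q = np.max(self.q_table[next_state]) if not done else 0.0
        current_q = self.q_table[state, action]
        new_q = (1 - self.learning_rate) * current_q + self.learning_rate * (reward + self.discount_factor * max_future_q)
        self.q_table[state, action] = new_q


class GridEnv:
    # Hypothetical toy environment for illustration: a 2x5 grid (10 states, 4 actions)
    # with the goal in the bottom-right cell. It follows the classic Gym-style API:
    # reset() -> state, step(action) -> (next_state, reward, done, info).
    def __init__(self, rows=2, cols=5, max_steps=50):
        self.rows, self.cols, self.max_steps = rows, cols, max_steps

    def reset(self):
        self.pos, self.steps = 0, 0
        return self.pos

    def step(self, action):
        row, col = divmod(self.pos, self.cols)
        d_row, d_col = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]  # up, down, left, right
        row = min(max(row + d_row, 0), self.rows - 1)
        col = min(max(col + d_col, 0), self.cols - 1)
        self.pos, self.steps = row * self.cols + col, self.steps + 1
        reached_goal = self.pos == self.rows * self.cols - 1
        done = reached_goal or self.steps >= self.max_steps
        return self.pos, (1.0 if reached_goal else 0.0), done, {}


# Usage example
env = GridEnv()
agent = QLearningAgent(state_space=10, action_space=4)
num_episodes = 500  # total number of training episodes (illustrative value)
for episode in range(num_episodes):
    # Run one episode from the start state until it terminates
    state = env.reset()
    while True:
        action = agent.choose_action(state)
        next_state, reward, done, _ = env.step(action)
        agent.update_q_value(state, action, reward, next_state, done)
        state = next_state
        if done:
            break
```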
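In this example, `epsilon` is the exploration-exploitation parameter (the probability of taking a random action instead of the current best one), and `num_episodes` is the total number of training episodes. Through repeated trial and error, the agent updates its Q-value estimate for each state-action pair, so that its future choices become better decisions.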
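To make the update concrete, here is a small, hand-checkable sketch. The helper `q_learning_update`, the 3-state/2-action table and the transition values are hypothetical and chosen only for illustration; the rule is the one `update_q_value` applies, written in the equivalent incremental form Q ← Q + α(target - Q).

```python
import numpy as np


def q_learning_update(q_table, state, action, reward, next_state, done,
                      learning_rate=0.5, discount_factor=0.9):
    """Apply one tabular Q-learning update in place and return the new Q(s, a)."""
    # Terminal transitions contribute no future value
    max_future_q = 0.0 if done else np.max(q_table[next_state])
    target = reward + discount_factor * max_future_q
    # Incremental form: equivalent to (1 - alpha) * Q + alpha * target
    q_table[state, action] += learning_rate * (target - q_table[state, action])
    return q_table[state, action]


# Hypothetical 3-state, 2-action table; state 2 is terminal
q = np.zeros((3, 2))

# Transition 1: state 1 --action 1--> terminal state 2, reward 1
v1 = q_learning_update(q, state=1, action=1, reward=1.0, next_state=2, done=True)
# v1 == 0.5, since 0 + 0.5 * (1.0 - 0)

# Transition 2: state 0 --action 1--> state 1, reward 0
v2 = q_learning_update(q, state=0, action=1, reward=0.0, next_state=1, done=False)
# v2 == 0.225, since 0 + 0.5 * (0.9 * 0.5 - 0)

# Greedy policy read off the table: best action per state
policy = np.argmax(q, axis=1)  # array([1, 1, 0])
```

The second update shows the reward earned at the terminal transition propagating one step back, discounted by γ = 0.9 and scaled by α = 0.5; repeating such updates over many episodes is how the greedy policy improves.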