Reinforcement learning Python code
Date: 2024-07-17 07:01:18
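Reinforcement learning is a machine learning approach in which an agent learns a good policy by trial and error, guided by the rewards it receives from its environment. Python is one of the languages most commonly used for it, and many libraries support the work, such as TensorFlow and PyTorch (for building models) and OpenAI Gym (for environments).
Below is a simple Python example of Q-learning (a table-based reinforcement learning algorithm) applied to a maze problem: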
```python
import numpy as np


class QLearningAgent:
    def __init__(self, state_size, action_size, learning_rate=0.1,
                 discount_factor=0.9, epsilon=0.95):
        self.action_size = action_size
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        self.epsilon = epsilon  # exploration rate
        # One row per state and one column per action
        self.q_table = np.zeros((state_size, action_size))

    def act(self, state):
        # Choose an action with the epsilon-greedy policy
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.action_size)
        return int(np.argmax(self.q_table[state]))

    def learn(self, state, action, reward, next_state):
        # Q(s, a) <- (1 - lr) * Q(s, a) + lr * (r + gamma * max_a' Q(s', a'))
        max_future_q = np.max(self.q_table[next_state])
        current_q = self.q_table[state, action]
        self.q_table[state, action] = (1 - self.learning_rate) * current_q + \
            self.learning_rate * (reward + self.discount_factor * max_future_q)


# Usage
# state_size=16 is an assumption (for example a 4x4 maze); action_size=4 covers the four directions
agent = QLearningAgent(state_size=16, action_size=4)
for _ in range(1000):  # number of training episodes
    # Take actions, receive rewards, and update the state
    state = 0  # assume the agent starts at some cell of the maze
    for _ in range(100):  # steps per episode
        action = agent.act(state)
        # User-defined function (not shown here) that returns the reward and the new state
        reward, next_state = get_reward_and_next_state(state, action)
        agent.learn(state, action, reward, next_state)
        state = next_state
    agent.epsilon *= 0.99  # decay the exploration rate over time
# After training, np.argmax(agent.q_table[state]) gives the action to take in each state
```
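The example leaves `get_reward_and_next_state` for the reader to define, so it may help to see one possible version together with the final policy lookup. The following is only a minimal sketch under stated assumptions: the maze is a hypothetical 4x4 open grid with no inner walls, the start is cell 0, the goal is cell 15, and the reward values (+1 at the goal, -0.01 per step) are illustrative choices. `SIZE`, `GOAL`, `MOVES`, `train`, and `greedy_path` are names introduced here and are not part of the original example. The sketch inlines the same tabular update rather than reusing the class, so that it runs on its own.

```python
import numpy as np

SIZE = 4                                    # hypothetical 4x4 open grid (assumption)
GOAL = SIZE * SIZE - 1                      # bottom-right cell is the goal (assumption)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def get_reward_and_next_state(state, action):
    """Hypothetical maze dynamics: a move that would leave the grid keeps the agent in place."""
    row, col = divmod(state, SIZE)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), SIZE - 1)
    col = min(max(col + d_col, 0), SIZE - 1)
    next_state = row * SIZE + col
    reward = 1.0 if next_state == GOAL else -0.01  # assumed reward scheme
    return reward, next_state


def train(episodes=1000, max_steps=100, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning with a decaying epsilon-greedy policy."""
    rng = np.random.default_rng(seed)
    q_table = np.zeros((SIZE * SIZE, len(MOVES)))
    epsilon = 0.95
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = int(rng.integers(len(MOVES)))   # explore
            else:
                action = int(np.argmax(q_table[state]))  # exploit
            reward, next_state = get_reward_and_next_state(state, action)
            target = reward + gamma * np.max(q_table[next_state])
            q_table[state, action] += alpha * (target - q_table[state, action])
            state = next_state
            if state == GOAL:  # the episode ends once the goal is reached
                break
        epsilon *= 0.99
    return q_table


def greedy_path(q_table, start=0, max_steps=SIZE * SIZE):
    """Follow the greedy action in each state, stopping at the goal or after max_steps moves."""
    path = [start]
    while path[-1] != GOAL and len(path) <= max_steps:
        action = int(np.argmax(q_table[path[-1]]))
        _, next_state = get_reward_and_next_state(path[-1], action)
        path.append(next_state)
    return path


q_table = train()
policy = np.argmax(q_table, axis=1)  # greedy action index for every state
path = greedy_path(q_table)
print(policy.reshape(SIZE, SIZE))    # action indices follow the order of MOVES
print(path)                          # visited cells from the start toward the goal
```

Ending each episode at the goal keeps the goal row of the table at zero, so the update for the final move uses only the immediate reward. The small step penalty is one way to discourage the agent from standing still against a wall; other reward schemes would work as well.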