Reinforcement Learning Maze Navigation: Python Code
Date: 2024-06-02 15:06:07
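Reinforcement learning is a machine learning approach in which an agent learns an action policy by trial and error. The agent interacts with an environment, receives a reward signal after each action, and adjusts its behavior to maximize cumulative reward.
Below is a Python example that uses reinforcement learning (tabular Q-learning) to walk through a maze: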
```python
import numpy as np

# Maze map: entering a cell marked 1 is penalized (reward -1)
maze = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0]
])

ACTIONS = ['up', 'down', 'left', 'right']
GOAL = (5, 5)

# Q-table: one value per (row, column, action)
Q = np.zeros((6, 6, len(ACTIONS)))

# Hyperparameters
alpha = 0.8      # learning rate
gamma = 0.9      # discount factor
epsilon = 0.2    # exploration probability
episodes = 10000


def get_state_reward(state, action):
    """Return (next_state, reward) for taking action index `action` in `state`."""
    row, col = state
    name = ACTIONS[action]
    # Moves are clipped at the grid border
    if name == 'up':
        next_state = (max(row - 1, 0), col)
    elif name == 'down':
        next_state = (min(row + 1, 5), col)
    elif name == 'left':
        next_state = (row, max(col - 1, 0))
    else:
        next_state = (row, min(col + 1, 5))
    if maze[next_state] == 1:
        reward = -1
    elif next_state == GOAL:
        reward = 10
    else:
        reward = -0.01
    return next_state, reward


# Train with Q-learning
for _ in range(episodes):
    state = (0, 0)
    while state != GOAL:
        # Epsilon-greedy action selection
        if np.random.uniform() < epsilon:
            action = np.random.randint(len(ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward = get_state_reward(state, action)
        Q[state][action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state][action])
        state = next_state

# Print the final Q-table
print(Q)

# Walk the maze greedily using the final Q-table
state = (0, 0)
steps = 0
while state != GOAL and steps < 100:  # step cap guards against a non-converged Q-table
    print(state)
    action = int(np.argmax(Q[state]))
    state, _ = get_state_reward(state, action)
    steps += 1
print(state)
```
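The update line inside the training loop above follows the standard tabular Q-learning rule. Written out, with $s$ the current state, $a$ the chosen action, $r$ the reward, and $s'$ the next state (symbols introduced here for illustration; the code uses `state`, `action`, `reward`, `next_state`):

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

As a worked instance with the parameters from the code ($\alpha = 0.8$, $\gamma = 0.9$): the first time the agent steps into the goal $(5, 5)$ from a neighboring cell, all entries are still zero and $r = 10$, so the entry for that move becomes

$$
0 + 0.8 \times (10 + 0.9 \times 0 - 0) = 8 .
$$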
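In this example, we define a 6x6 maze and train an agent on it with the Q-learning algorithm to obtain a policy. The printed Q-table shows the learned value of each action in each state. Finally, we follow the greedy policy from the Q-table to walk from the start (0, 0) to the goal (5, 5).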
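A raw Q-table is hard to read, so it can help to print the greedy action of every cell instead. The following is a minimal sketch, not part of the original answer. It assumes the Q-table has shape (rows, columns, 4) and that the action order is up, down, left, right. The helper name `render_greedy_policy` and the small demo table are hypothetical.

```python
import numpy as np

# Assumed action order: up, down, left, right.
# It must match the order used when the Q-table was filled.
ARROWS = ['^', 'v', '<', '>']


def render_greedy_policy(q_table, goal):
    """Return one string per grid row showing the greedy action in each cell."""
    rows, cols, _ = q_table.shape
    lines = []
    for r in range(rows):
        symbols = []
        for c in range(cols):
            if (r, c) == goal:
                symbols.append('G')  # mark the goal cell
            else:
                symbols.append(ARROWS[int(np.argmax(q_table[r, c]))])
        lines.append(' '.join(symbols))
    return lines


# Tiny hand-filled 2x2 table (hypothetical values, not learned)
demo_q = np.zeros((2, 2, 4))
demo_q[0, 0, 3] = 1.0  # (0, 0): prefer right
demo_q[0, 1, 1] = 1.0  # (0, 1): prefer down
demo_q[1, 0, 3] = 1.0  # (1, 0): prefer right

for line in render_greedy_policy(demo_q, goal=(1, 1)):
    print(line)
```

Calling `render_greedy_policy(Q, goal=(5, 5))` on a trained table of that shape would give a 6-row overview of the learned policy. Cells whose values are all equal show `^`, because `np.argmax` returns the first index on ties.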