
Write a moving-obstacle grid environment that runs under gym

Date: 2024-04-09 17:28:17 Views: 123

Below is example code for a grid environment that runs under gym. It covers the basic grid world (agent, start, and target); it does not yet move any obstacles.

```python
import gym
from gym import spaces
import numpy as np


class GridWorldEnv(gym.Env):
    def __init__(self, grid_size=5):
        super().__init__()
        self.grid_size = grid_size
        self.start_position = (0, 0)
        self.target_position = (grid_size - 1, grid_size - 1)
        self.current_position = self.start_position
        self.action_space = spaces.Discrete(4)  # four actions: up, down, left, right
        self.observation_space = spaces.Tuple((
            spaces.Discrete(grid_size),
            spaces.Discrete(grid_size),
        ))

    def reset(self):
        self.current_position = self.start_position
        return self.current_position

    def step(self, action):
        x, y = self.current_position
        if action == 0:  # up
            y += 1
        elif action == 1:  # down
            y -= 1
        elif action == 2:  # left
            x -= 1
        elif action == 3:  # right
            x += 1

        # Move only if the new position is inside the grid; otherwise stay put
        if 0 <= x < self.grid_size and 0 <= y < self.grid_size:
            self.current_position = (x, y)

        done = self.current_position == self.target_position
        reward = 1.0 if done else 0.0
        # Classic gym API: (observation, reward, done, info)
        return self.current_position, reward, done, {}

    def render(self):
        # The array is indexed as grid[x, y], so x selects the printed row
        # and y the printed column.
        grid = np.zeros((self.grid_size, self.grid_size))
        grid[self.start_position] = 0.5
        grid[self.target_position] = 0.5
        grid[self.current_position] = 1.0
        for row in grid:
            print(' '.join(['{:1.1f}'.format(cell) for cell in row]))


# Usage example
env = GridWorldEnv(grid_size=5)
observation = env.reset()
for _ in range(10):
    env.render()
    action = env.action_space.sample()
    observation, reward, done, _ = env.step(action)
    if done:
        break
```

Running the code above prints a 5x5 grid at each step, with the start and target cells marked 0.5 and the current position marked 1.0. Each iteration samples a random action and updates the environment state; the loop stops after 10 steps or as soon as the target is reached.
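The question asks for moving obstacles, which the example above does not include. One possible way to add them is sketched below. To keep the sketch dependency-free it is a plain class with the same `reset`/`step` interface rather than a `gym.Env` subclass. The class name `MovingObstacleGrid`, the `num_obstacles` and `seed` parameters, the -1.0 collision penalty, and the rule that each obstacle takes one random step per turn are all assumptions made for illustration, not part of the original answer.

```python
import random

# Same action encoding as the example above: 0 = up, 1 = down, 2 = left, 3 = right
MOVES = {0: (0, 1), 1: (0, -1), 2: (-1, 0), 3: (1, 0)}


class MovingObstacleGrid:
    """Hypothetical grid world whose obstacles take one random step per turn."""

    def __init__(self, grid_size=5, num_obstacles=3, seed=None):
        self.grid_size = grid_size
        self.num_obstacles = num_obstacles
        self.start_position = (0, 0)
        self.target_position = (grid_size - 1, grid_size - 1)
        self.rng = random.Random(seed)
        self.reset()

    def _in_bounds(self, pos):
        x, y = pos
        return 0 <= x < self.grid_size and 0 <= y < self.grid_size

    def _observation(self):
        # Assumed observation: agent position plus all obstacle positions
        return self.current_position, tuple(self.obstacles)

    def reset(self):
        self.current_position = self.start_position
        # Place obstacles on distinct cells, never on the start or the target
        free = [
            (x, y)
            for x in range(self.grid_size)
            for y in range(self.grid_size)
            if (x, y) not in (self.start_position, self.target_position)
        ]
        self.obstacles = self.rng.sample(free, self.num_obstacles)
        return self._observation()

    def _move_obstacles(self):
        # Each obstacle tries one random step and stays put if that step would
        # leave the grid, land on the target, or land on another obstacle
        for i, (ox, oy) in enumerate(self.obstacles):
            dx, dy = self.rng.choice(list(MOVES.values()))
            new = (ox + dx, oy + dy)
            if (
                self._in_bounds(new)
                and new != self.target_position
                and new not in self.obstacles
            ):
                self.obstacles[i] = new

    def step(self, action):
        dx, dy = MOVES[action]
        x, y = self.current_position
        new = (x + dx, y + dy)
        if self._in_bounds(new):
            self.current_position = new

        # The agent walked into an obstacle
        if self.current_position in self.obstacles:
            return self._observation(), -1.0, True, {"collision": True}

        self._move_obstacles()

        # An obstacle moved onto the agent
        if self.current_position in self.obstacles:
            return self._observation(), -1.0, True, {"collision": True}

        done = self.current_position == self.target_position
        reward = 1.0 if done else 0.0
        return self._observation(), reward, done, {"collision": False}
```

To turn this into a gym environment, the class would subclass `gym.Env` and declare `action_space` and `observation_space` as in the example above. The observation space would then also need to describe the obstacle positions, for instance with a larger `spaces.Tuple` or a `spaces.MultiDiscrete`.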
