Why does this code report a syntax error: int miceAndCheese(vector<int>& reward1, vector<int>& reward2, int k) { vector<pair<int, int>>v; for(int i = 0; i < reward1.size(); i++) { v.push_back(make_pair(reward1[i], reward2[i])); } sort(v.begin(), v.end(), greater<int>()); int ans = 0; for(int i = 0; i < v.size(); i++) cout << v[i].first << " " << v[i].second << endl; return ans; } };
The syntax error comes from the stray `};` at the very end, most likely left behind when the method was copied out of its enclosing class; it should be deleted. Note that the code has a second problem as well: `greater<int>()` cannot compare `pair<int, int>` elements, so the `sort` call will not compile either; the comparator should be `greater<pair<int, int>>()` (see the related question below). The corrected form of the code is:
```
int miceAndCheese(vector<int>& reward1, vector<int>& reward2, int k) {
    vector<pair<int, int>> v;
    for (int i = 0; i < reward1.size(); i++) {
        v.push_back(make_pair(reward1[i], reward2[i]));
    }
    // greater<int>() cannot compare pairs; greater<pair<int, int>>()
    // sorts the pairs in descending lexicographic order
    sort(v.begin(), v.end(), greater<pair<int, int>>());
    int ans = 0;
    for (int i = 0; i < v.size(); i++)
        cout << v[i].first << " " << v[i].second << endl;
    return ans;
}
```
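As an aside, with C++14 or later the element type can be left for the compiler to deduce by using the transparent comparator `greater<>()`, so the pair type does not have to be spelled out. This is a stylistic alternative, not part of the required fix:
```
sort(v.begin(), v.end(), greater<>()); // deduces pair<int, int> from the elements
```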
Related questions
For this code, how do I sort v in descending order by the first element of each pair: vector<pair<int, int>>v; for(int i = 0; i < reward1.size(); i++) { v.push_back(make_pair(reward1[i], reward2[i])); }
You can sort a `vector<pair<int, int>>` variable `v` with the `sort()` function by supplying a comparison rule. Passing `greater<pair<int, int>>()` sorts the pairs in descending lexicographic order, which puts them in descending order of their first element. A concrete implementation:
```
vector<pair<int, int>> v;
for (int i = 0; i < reward1.size(); i++) {
    v.push_back(make_pair(reward1[i], reward2[i]));
}
sort(v.begin(), v.end(), greater<pair<int, int>>()); // descending order
```
After this, `v` is sorted in descending order by the first element of each pair. One caveat: because the comparison is lexicographic, pairs with equal first elements are additionally ordered by their second element in descending order.
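If the sort should key on the first element only, leaving ties in an unspecified order, a custom comparator makes that explicit; a minimal sketch:
```
sort(v.begin(), v.end(), [](const pair<int, int>& a, const pair<int, int>& b) {
    return a.first > b.first; // descending by first element only
});
```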
1. Install the gym package
First, install the gym package using pip:
```
pip install gym
```
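Note that the code on this page uses the classic Gym API, in which `reset()` returns only the observation and `step()` returns four values. Gym 0.26 (and its maintained fork, Gymnasium) changed this interface, so for the snippets below to run as written it may be necessary to pin an older release, for example:
```
pip install "gym<0.26"
```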
2. Run a test environment
We use CartPole-v1, a classic control problem provided by OpenAI Gym, as the test environment. A pole is balanced on a cart; the goal is to keep the pole upright while keeping the cart on the track.
Running the following code creates a CartPole-v1 environment:
```python
import gym
env = gym.make('CartPole-v1')
obs = env.reset()
print('Observation space:', env.observation_space)
print('Action space:', env.action_space)
```
The output is:
```
Observation space: Box(4,)
Action space: Discrete(2)
```
3. Write a rule-based control policy
A simple rule is to drive the cart in the direction the pole is leaning, so that the cart moves back under the pole: when the pole tilts left (`obs[2] < 0`), push the cart left (action 0); when it tilts right, push it right (action 1). The code is as follows:
```python
def rule_based_policy(obs):
    # obs[2] is the pole angle: negative means leaning left, positive leaning right
    if obs[2] < 0:
        action = 0  # push the cart left, back under the pole
    else:
        action = 1  # push the cart right
    return action
```
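A common refinement, offered here as a sketch rather than as part of the original answer, is to also react to the pole's angular velocity (`obs[3]`), which anticipates where the pole is heading instead of only where it is; the weight 0.5 is an arbitrary choice:
```python
def rule_based_policy_v2(obs):
    # Combine pole angle and angular velocity; the 0.5 weight is ad hoc.
    return 0 if obs[2] + 0.5 * obs[3] < 0 else 1
```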
4. Average the cumulative reward over 10 episodes
We can test our control policy with the following code, which reports the average cumulative reward over 10 episodes:
```python
total_reward = 0
num_episodes = 10
for i in range(num_episodes):
    obs = env.reset()
    done = False
    episode_reward = 0
    while not done:
        action = rule_based_policy(obs)
        obs, reward, done, info = env.step(action)
        episode_reward += reward
    total_reward += episode_reward
    print('Episode {}: Reward {}'.format(i, episode_reward))
avg_reward = total_reward / num_episodes
print('Average reward:', avg_reward)
```
The output is:
```
Episode 0: Reward 34.0
Episode 1: Reward 45.0
Episode 2: Reward 28.0
Episode 3: Reward 33.0
Episode 4: Reward 23.0
Episode 5: Reward 25.0
Episode 6: Reward 25.0
Episode 7: Reward 29.0
Episode 8: Reward 22.0
Episode 9: Reward 24.0
Average reward: 29.8
```
5. Compare with a random policy
As a baseline, we can write a random policy for comparison:
```python
import random

def random_policy(obs):
    return random.randint(0, 1)  # choose action 0 or 1 uniformly at random
```
Then the same test loop can be used to measure the random policy's average cumulative reward over 10 episodes:
```python
total_reward = 0
num_episodes = 10
for i in range(num_episodes):
    obs = env.reset()
    done = False
    episode_reward = 0
    while not done:
        action = random_policy(obs)
        obs, reward, done, info = env.step(action)
        episode_reward += reward
    total_reward += episode_reward
    print('Episode {}: Reward {}'.format(i, episode_reward))
avg_reward = total_reward / num_episodes
print('Average reward:', avg_reward)
```
The output is:
```
Episode 0: Reward 16.0
Episode 1: Reward 14.0
Episode 2: Reward 22.0
Episode 3: Reward 11.0
Episode 4: Reward 17.0
Episode 5: Reward 16.0
Episode 6: Reward 14.0
Episode 7: Reward 12.0
Episode 8: Reward 11.0
Episode 9: Reward 19.0
Average reward: 15.2
```
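As a side note, the two evaluation loops above are identical except for the policy they call, so they can be factored into one helper that takes the policy as an argument; a sketch, using the same classic Gym API as the rest of this page:
```python
def evaluate(env, policy, num_episodes=10):
    # Run `policy` for `num_episodes` episodes and return the mean episode reward.
    total_reward = 0.0
    for _ in range(num_episodes):
        obs = env.reset()
        done = False
        while not done:
            obs, reward, done, info = env.step(policy(obs))
            total_reward += reward
    return total_reward / num_episodes
```
With this in place, the two experiments reduce to `evaluate(env, rule_based_policy)` and `evaluate(env, random_policy)`.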
As we can see, the rule-based control policy achieves a somewhat higher average cumulative reward than the random policy, but both are still far below the best achievable on this problem: CartPole-v1 episodes are capped at 500 steps, so a well-tuned controller can reach an average cumulative reward of 500.