Writing DQN code with TensorFlow 2
Below is example code implementing DQN (Deep Q-Network) with TensorFlow 2.
First, we import the required libraries:
```python
import random
import numpy as np
import tensorflow as tf
import gym  # used later to create the CartPole environment
from collections import deque
```
Next, we define the DQN agent. Its network pairs a convolutional feature extractor (CNN) with fully connected (Dense) layers: the CNN extracts features from image observations, and the Dense layers map those features to Q-values. For environments with low-dimensional vector states, such as CartPole used later, the convolutional layers are skipped and only Dense layers are used.
```python
class DQNAgent:
    def __init__(self, state_shape, action_size, learning_rate, gamma, epsilon, epsilon_decay):
        self.state_shape = state_shape      # shape of the input state
        self.action_size = action_size      # size of the action space
        self.learning_rate = learning_rate  # learning rate
        self.gamma = gamma                  # discount factor
        self.epsilon = epsilon              # exploration rate
        self.epsilon_decay = epsilon_decay  # exploration-rate decay
        self.memory = deque(maxlen=2000)    # experience replay buffer
        self.model = self.build_model()     # build the Q-network

    def build_model(self):
        model = tf.keras.Sequential()
        if len(self.state_shape) == 3:
            # Image observations: convolutional feature extractor
            model.add(tf.keras.layers.Conv2D(32, (8, 8), strides=(4, 4), activation='relu',
                                             input_shape=self.state_shape))
            model.add(tf.keras.layers.Conv2D(64, (4, 4), strides=(2, 2), activation='relu'))
            model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
            model.add(tf.keras.layers.Flatten())
        else:
            # Low-dimensional vector observations (e.g. CartPole): Dense layers only
            model.add(tf.keras.layers.Dense(64, activation='relu', input_shape=self.state_shape))
        model.add(tf.keras.layers.Dense(512, activation='relu'))
        model.add(tf.keras.layers.Dense(self.action_size))  # one Q-value per action
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=self.learning_rate),
                      loss='mse')
        return model
```
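The final Dense layer has no activation, so the network outputs one raw Q-value per action. As a minimal shape check (a sketch, not part of the training code; it assumes a CartPole-like 4-dimensional state and two actions):

```python
# Hypothetical shape check only
dummy_agent = DQNAgent(state_shape=(4,), action_size=2, learning_rate=0.001,
                       gamma=0.95, epsilon=1.0, epsilon_decay=0.995)
dummy_state = np.zeros((1, 4), dtype=np.float32)  # batch of one state
print(dummy_agent.model.predict(dummy_state, verbose=0).shape)  # (1, 2): one Q-value per action
```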
Next, we define the experience-replay method, which stores experience tuples in the buffer.
```python
    def remember(self, state, action, reward, next_state, done):
        # store one transition in the replay buffer
        self.memory.append((state, action, reward, next_state, done))
```
Then we define the action-selection method, which follows an ε-greedy strategy: with probability ε (the exploration rate) we pick a random action; otherwise we take the action with the highest predicted Q-value.
```python
    def act(self, state):
        # epsilon-greedy exploration
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        q_values = self.model.predict(state, verbose=0)
        return np.argmax(q_values[0])
```
Next, we define the training method. It samples a minibatch of transitions from the replay buffer, predicts Q-values with the model, and computes the target Q-value as reward + γ · max_a' Q(next_state, a') (or just the reward for terminal transitions). The model is then updated by gradient descent on the mean-squared error between the predicted and target Q-values.
```python
    def train(self, batch_size):
        if len(self.memory) < batch_size:
            return
        minibatch = random.sample(self.memory, batch_size)
        for state, action, reward, next_state, done in minibatch:
            target = reward
            if not done:
                # Bellman target: r + gamma * max_a' Q(s', a')
                target = reward + self.gamma * np.amax(self.model.predict(next_state, verbose=0)[0])
            q_values = self.model.predict(state, verbose=0)
            q_values[0][action] = target
            self.model.fit(state, q_values, epochs=1, verbose=0)
        # decay the exploration rate after each training step
        if self.epsilon > 0.01:
            self.epsilon *= self.epsilon_decay
```
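The loop above calls predict and fit once per transition, which is slow. As a hedged optimization sketch (not part of the original answer), the same update can be done in one batched pass; the logic is unchanged, only the tensor shapes differ:

```python
    def train_batched(self, batch_size):
        # Hypothetical batched variant of train(); assumes each stored state
        # already has a leading batch dimension of 1, as in the loop below.
        if len(self.memory) < batch_size:
            return
        minibatch = random.sample(self.memory, batch_size)
        states = np.vstack([m[0] for m in minibatch])
        actions = np.array([m[1] for m in minibatch])
        rewards = np.array([m[2] for m in minibatch])
        next_states = np.vstack([m[3] for m in minibatch])
        dones = np.array([m[4] for m in minibatch], dtype=np.float32)
        # Bellman targets for the whole batch at once
        next_q = self.model.predict(next_states, verbose=0)
        targets = rewards + self.gamma * np.amax(next_q, axis=1) * (1.0 - dones)
        q_values = self.model.predict(states, verbose=0)
        q_values[np.arange(batch_size), actions] = targets
        self.model.fit(states, q_values, epochs=1, verbose=0)
        if self.epsilon > 0.01:
            self.epsilon *= self.epsilon_decay
```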
Finally, we define methods to save and load the model.
```python
    def save(self, name):
        self.model.save(name)

    def load(self, name):
        self.model = tf.keras.models.load_model(name)
```
Now we can create a DQNAgent and start training.
```python
env = gym.make('CartPole-v0')  # assumes the classic Gym API (step returns 4 values)
state_shape = env.observation_space.shape  # (4,) for CartPole
action_size = env.action_space.n
agent = DQNAgent(state_shape, action_size, learning_rate=0.001, gamma=0.95,
                 epsilon=1.0, epsilon_decay=0.995)

for episode in range(1000):
    state = env.reset()
    state = np.reshape(state, (1,) + state_shape)  # add a batch dimension
    for time in range(500):
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        next_state = np.reshape(next_state, (1,) + state_shape)
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        if done:
            print("episode: {}/{}, score: {}, epsilon: {:.2}".format(episode, 1000, time, agent.epsilon))
            break
        agent.train(batch_size=32)
    if episode % 10 == 0:
        agent.save("cartpole-dqn.h5")
```
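Note that newer versions of Gym/Gymnasium changed the environment API: reset() returns (observation, info) and step() returns five values. If you are on a recent version, the loop body needs a small adjustment; a hedged sketch:

```python
# Sketch for Gymnasium / gym >= 0.26 (not needed for the classic API used above)
state, _ = env.reset()
state = np.reshape(state, (1,) + state_shape)
action = agent.act(state)
next_state, reward, terminated, truncated, _ = env.step(action)
done = terminated or truncated
```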
After training, we can test the model's performance with the following code:
```python
agent.load("cartpole-dqn.h5")
agent.epsilon = 0.0  # act greedily during evaluation
state = env.reset()
state = np.reshape(state, (1,) + state_shape)
for time in range(500):
    env.render()
    action = agent.act(state)
    next_state, reward, done, _ = env.step(action)
    next_state = np.reshape(next_state, (1,) + state_shape)
    state = next_state
    if done:
        break
env.close()
```