joint action
Joint action refers to the process in which two or more individuals cooperate to accomplish a task or goal. In human society, many activities require joint action, such as team sports, musical performance, and collaborative learning. In artificial intelligence, researchers likewise study how to get multiple agents to act jointly on a task, for example in multi-agent reinforcement learning and multi-agent collaboration. In these settings, each agent must cooperate with the others based on its own knowledge and capabilities to reach a shared goal. Joint action requires the agents to coordinate and communicate with one another so that the goal can be achieved without central control.
Related questions
Translate the following passage:

Agent $c_i$. In this paper, we regard each charging station $c_i \in C$ as an individual agent. Each agent makes timely recommendation decisions for a sequence of charging requests $Q$ that keeps arriving throughout the day, with multiple long-term optimization goals.

Observation $o_t^i$. Given a charging request $q_t$, we define the observation $o_t^i$ of agent $c_i$ as a combination of the index of $c_i$, the real-world time $T_t$, the number of currently available charging spots of $c_i$ (supply), the number of charging requests around $c_i$ in the near future (future demand), the charging power of $c_i$, the estimated time of arrival (ETA) from location $l_t$ to $c_i$, and the CP of $c_i$ at the next ETA. We further define $s_t = \{o_t^1, o_t^2, \ldots, o_t^N\}$ as the state of all agents at step $t$.

Action $a_t^i$. Given an observation $o_t^i$, an intuitive design for the action of agent $c_i$ is a binary decision, i.e., whether or not to recommend $q_t$ to itself for charging. However, because each $q_t$ can choose only one station for charging, the agents' actions are tied together and hard to coordinate. Inspired by the bidding mechanism, we instead let each agent $c_i$ offer a scalar value to "bid" for $q_t$ as its action $a_t^i$. Defining $u_t = \{a_t^1, a_t^2, \ldots, a_t^N\}$ as the joint action, $q_t$ is recommended to the agent with the highest bid, i.e., $rc_t = c_i$ with $i = \arg\max(u_t)$.
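The bidding mechanism in this excerpt reduces coordination to an argmax over scalar bids. Below is a minimal sketch of just that dispatch step, assuming each agent has already produced its bid $a_t^i$ from its own observation; the class name, the `winner` method, and the sample values are hypothetical, not from the paper:

```java
/** Dispatches a charging request q_t to the station with the highest bid. */
public final class BidDispatcher {

    /** Returns the index i = argmax(u_t) of the winning agent in the joint action u_t. */
    public static int winner(double[] bids) {
        int best = 0;
        for (int i = 1; i < bids.length; i++) {
            if (bids[i] > bids[best]) {
                best = i; // ties broken in favor of the lower index
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // u_t = {a_t^1, ..., a_t^N}: one scalar bid per charging-station agent.
        double[] jointAction = {0.12, 0.87, 0.45};
        System.out.println("Recommend q_t to station c_" + (winner(jointAction) + 1));
    }
}
```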
`!=` is a comparison operator in Java that tests whether two values are not equal, while `java.lang.Long` is a Java class that represents a long integer value.
To compare two objects for equality, use the `equals()` method rather than a comparison operator. For example, to test whether two `java.lang.Long` objects `a` and `b` are equal:
```java
if (a.equals(b)) {
    // a and b are equal
} else {
    // a and b are not equal
}
```
To compare two primitive `long` values, the `==` and `!=` operators are fine. For example:
```java
long a = 123;
long b = 456;
if (a != b) {
    // a and b are not equal
} else {
    // a and b are equal
}
```
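A short sketch of why `==` is unreliable for `Long` objects: autoboxing goes through `Long.valueOf`, which caches the values -128 to 127, so `==` compares object references and only happens to give the "right" answer inside that range. The class and variable names here are illustrative:

```java
public class LongEqualityDemo {
    public static void main(String[] args) {
        Long small1 = 127L, small2 = 127L; // autoboxing reuses the cached object for -128..127
        Long big1 = 128L, big2 = 128L;     // outside the cache: two distinct objects

        System.out.println(small1 == small2);  // true  (same cached reference)
        System.out.println(big1 == big2);      // false (different references)
        System.out.println(big1.equals(big2)); // true  (compares the numeric values)
    }
}
```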
```
Input : parameters V, {γ_i, c_i}_{i=1}^N, K, training interval δ_T, M_t update interval δ_M
Output: control actions {x_t, y_t}_{t=1}^K
 1: Initialize the DNN with random parameters θ_1 and an empty replay memory; M_1 ← 2N;
 2: Set the initial data queues Q_i(1) = 0 and energy queues Y_i(1) = 0, for i = 1, ..., N;
 3: for t = 1, 2, ..., K do
 4:     Observe the input ξ_t = {h_t, Q_i(t), Y_i(t)}_{i=1}^N and update M_t using (8) if mod(t, δ_M) = 0;
 5:     Generate a relaxed offloading action x̂_t = Π_{θ_t}(ξ_t) with the DNN;
 6:     Quantize x̂_t into M_t binary actions {x_t^i | i = 1, ..., M_t} using the NOP method;
 7:     Compute G(x_t^i, ξ_t) by optimizing the resource allocation y_t^i in (P2) for each x_t^i;
 8:     Select the best solution x_t = arg max_{x_t^i} G(x_t^i, ξ_t) and execute the joint action (x_t, y_t);
 9:     Update the replay memory by adding (ξ_t, x_t);
10:     if mod(t, δ_T) = 0 then
11:         Uniformly sample a batch {(ξ_τ, x_τ) | τ ∈ S_t} from the memory;
12:         Train the DNN on {(ξ_τ, x_τ) | τ ∈ S_t} and update θ_t with the Adam algorithm;
13:     end
14:     t ← t + 1;
15:     Update {Q_i(t), Y_i(t)}_{i=1}^N based on x_{t-1}, y_{t-1} and the data arrival observations {A_i(t-1)}_{i=1}^N using (5) and (7);
16: end
```
How would DRL without the optimization step be reflected here?
In DRL without optimization, the control actions are generated directly by the deep neural network (DNN) rather than refined by an optimization algorithm. At each time step the DNN maps the current state and past experience to control actions, and the chosen action is executed as-is, with no embedded optimization problem being solved. Because no optimization step is involved, plain DRL is more vulnerable to local optima during training and may fail to reach the optimal solution. DRL methods combined with an optimization algorithm, as in the listing above, therefore usually solve such control problems better.
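A minimal sketch of the contrast, assuming a trained DNN has produced a relaxed action x̂_t in [0, 1]^N. The method names and the toy objective standing in for G(x, ξ_t), which in the algorithm comes from solving (P2), are all hypothetical: plain DRL just rounds the DNN output, while the method above scores several quantized candidates and keeps the argmax.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

public class OffloadingDecision {

    /** DRL without optimization: execute the DNN output directly, rounded to binary. */
    static boolean[] plainDrlAction(double[] relaxed) {
        boolean[] x = new boolean[relaxed.length];
        for (int i = 0; i < relaxed.length; i++) {
            x[i] = relaxed[i] >= 0.5;
        }
        return x;
    }

    /** DRL + optimization: score each quantized candidate with G and keep the argmax. */
    static boolean[] bestOfCandidates(boolean[][] candidates, ToDoubleFunction<boolean[]> g) {
        boolean[] best = candidates[0];
        double bestScore = g.applyAsDouble(best);
        for (int m = 1; m < candidates.length; m++) {
            double score = g.applyAsDouble(candidates[m]); // in the paper: solve (P2) for y_t^i
            if (score > bestScore) {
                bestScore = score;
                best = candidates[m];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] relaxed = {0.9, 0.2, 0.6};        // hypothetical DNN output x̂_t
        boolean[][] candidates = {                 // quantized candidates (e.g., from NOP)
            {true, false, true}, {true, false, false}, {true, true, true}};
        // Toy stand-in for G(x, ξ_t); the real value comes from optimizing (P2).
        ToDoubleFunction<boolean[]> g = x -> (x[0] ? 2 : 0) + (x[1] ? -1 : 0) + (x[2] ? 1 : 0);
        System.out.println(Arrays.toString(plainDrlAction(relaxed)));
        System.out.println(Arrays.toString(bestOfCandidates(candidates, g)));
    }
}
```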