Input:  parameters V, {γ_i, c_i}_{i=1}^N, K, training interval δ_T, M_t update interval δ_M
Output: control actions {x_t, y_t}_{t=1}^K

 1: Initialize the DNN with random parameters θ_1 and an empty replay memory; M_1 ← 2N;
 2: Initialize empty data queues Q_i(1) = 0 and energy queues Y_i(1) = 0, for i = 1, ..., N;
 3: for t = 1, 2, ..., K do
 4:     Observe the input ξ_t = {h_t, Q_i(t), Y_i(t)}_{i=1}^N and update M_t using (8) if mod(t, δ_M) = 0;
 5:     Generate a relaxed offloading action x̂_t = Π_{θ_t}(ξ_t) with the DNN;
 6:     Quantize x̂_t into M_t binary actions {x_t^i | i = 1, ..., M_t} using the NOP method;
 7:     Compute G(x_t^i, ξ_t) by optimizing the resource allocation y_t^i in (P2) for each x_t^i;
 8:     Select the best solution x_t = arg max_{x_t^i} G(x_t^i, ξ_t) and execute the joint action (x_t, y_t);
 9:     Update the replay memory by adding (ξ_t, x_t);
10:     if mod(t, δ_T) = 0 then
11:         Uniformly sample a batch of data {(ξ_τ, x_τ) | τ ∈ S_t} from the memory;
12:         Train the DNN on {(ξ_τ, x_τ) | τ ∈ S_t} and update θ_t using the Adam algorithm;
13:     end if
14:     t ← t + 1;
15:     Update {Q_i(t), Y_i(t)}_{i=1}^N based on (x_{t-1}, y_{t-1}) and the observed data arrivals {A_i^{t-1}}_{i=1}^N using (5) and (7);
16: end for

Question: given this algorithm, how would DRL without optimization be reflected, i.e., what does the method look like if the optimization step is removed?
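For reference, the online loop above (steps 3-16) can be sketched in Python. Everything here is an illustrative assumption, not the paper's implementation: `dnn` stands in for the policy Π_θ, `quantize` is a simple top-k thresholding stand-in for the NOP quantizer, `resource_alloc` stands in for solving the resource-allocation subproblem (P2) and returning G, and the M_t update and periodic DNN training are left as stubs.

```python
import numpy as np

def quantize(x_hat, m):
    """Threshold the relaxed action at m cut points to get m binary
    candidates (a simple stand-in for the NOP quantization method)."""
    order = np.argsort(-x_hat)            # entries sorted by value, descending
    candidates = []
    for k in range(m):
        x = np.zeros_like(x_hat, dtype=int)
        x[order[:k + 1]] = 1              # top-(k+1) entries offload
        candidates.append(x)
    return candidates

def run_loop(dnn, resource_alloc, observe, K, n_users, delta_T=10):
    """Sketch of steps 3-16: generate, quantize, evaluate, select, store."""
    memory = []                           # replay memory of (xi, x) pairs
    actions = []
    for t in range(1, K + 1):
        xi = observe(t)                   # step 4: observe xi_t
        x_hat = dnn(xi)                   # step 5: relaxed action in [0, 1]^N
        m = n_users                       # M_t fixed here; step 4's update omitted
        cands = quantize(x_hat, m)        # step 6: M_t binary candidates
        scores = [resource_alloc(x, xi) for x in cands]  # step 7: G(x, xi)
        x_best = cands[int(np.argmax(scores))]           # step 8: best candidate
        memory.append((xi, x_best))       # step 9: update replay memory
        if t % delta_T == 0:
            pass  # steps 10-13: sample a batch from `memory` and train the DNN
        actions.append(x_best)
    return actions
```

The key structural point is step 7/8: the DNN output is never executed directly; each quantized candidate is scored by an optimization subproblem first.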
Posted: 2023-06-16 15:06:13 · Views: 204
In DRL without optimization, the control policy is produced directly by the deep neural network (DNN) rather than being refined by an optimization algorithm. At each time step, the DNN generates a set of control actions from the current state and past experience, and the best of these is selected and executed. In the algorithm above, this would mean dropping step 7 (solving (P2) for each candidate) and executing the quantized DNN output directly. Because no optimization algorithm is involved, DRL without optimization is prone to getting stuck in local optima during training and may fail to reach the optimal solution. DRL methods combined with an optimization step therefore typically solve such control problems better.
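The contrast can be made concrete with a hypothetical sketch. Both `dnn` and `evaluate` are placeholder callables (the latter standing in for the resource-allocation subproblem (P2)); the names are illustrative assumptions, not from the source.

```python
import numpy as np

def act_without_optimization(dnn, xi):
    """Pure DRL: threshold the DNN output and execute it as-is."""
    return (dnn(xi) > 0.5).astype(int)

def act_with_optimization(dnn, evaluate, xi, m):
    """Optimization-assisted DRL: generate m binary candidates from the
    relaxed DNN output, score each with the optimizer, keep the best."""
    x_hat = dnn(xi)
    order = np.argsort(-x_hat)            # entries sorted by value, descending
    cands = []
    for k in range(m):
        x = np.zeros_like(x_hat, dtype=int)
        x[order[:k + 1]] = 1
        cands.append(x)
    return max(cands, key=lambda x: evaluate(x, xi))
```

With an imperfect DNN, the first variant inherits every error in the network's output, while the second can still recover a good action as long as one of the m candidates scores well.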