Algorithm 1: The online LyDROO algorithm for solving (P1).
input: Parameters $V$, $\{\gamma_i, c_i\}_{i=1}^N$, $K$, training interval $\delta_T$, $M_t$ update interval $\delta_M$;
output: Control actions $\{x^t, y^t\}_{t=1}^K$;
1  Initialize the DNN with random parameters $\theta_1$ and empty replay memory, $M_1 \leftarrow 2N$;
2  Empty initial data queue $Q_i(1) = 0$ and energy queue $Y_i(1) = 0$, for $i = 1, \cdots, N$;
3  for $t = 1, 2, \ldots, K$ do
4      Observe the input $\xi^t = \{h^t, Q_i(t), Y_i(t)\}_{i=1}^N$ and update $M_t$ using (8) if $\mathrm{mod}(t, \delta_M) = 0$;
5      Generate a relaxed offloading action $\hat{x}^t = \Pi_{\theta_t}(\xi^t)$ with the DNN;
6      Quantize $\hat{x}^t$ into $M_t$ binary actions $\{x_i^t \mid i = 1, \cdots, M_t\}$ using the NOP method;
7      Compute $G(x_i^t, \xi^t)$ by optimizing resource allocation $y_i^t$ in (P2) for each $x_i^t$;
8      Select the best solution $x^t = \arg\max_{\{x_i^t\}} G(x_i^t, \xi^t)$ and execute the joint action $(x^t, y^t)$;
9      Update the replay memory by adding $(\xi^t, x^t)$;
10     if $\mathrm{mod}(t, \delta_T) = 0$ then
11         Uniformly sample a batch of data set $\{(\xi^\tau, x^\tau) \mid \tau \in S_t\}$ from the memory;
12         Train the DNN with $\{(\xi^\tau, x^\tau) \mid \tau \in S_t\}$ and update $\theta_t$ using the Adam algorithm;
13     end
14     $t \leftarrow t + 1$;
15     Update $\{Q_i(t), Y_i(t)\}_{i=1}^N$ based on $(x^{t-1}, y^{t-1})$ and data arrival observation $\{A_i^{t-1}\}_{i=1}^N$ using (5) and (7).
16 end

With the above actor-critic-update loop, the DNN consistently learns from the best and most recent state-action pairs, leading to a better policy $\pi_{\theta_t}$ that gradually approximates the optimal mapping to solve (P3). We summarize the pseudo-code of LyDROO in Algorithm 1, where the major computational complexity is in line 7, which computes $G(x_i^t, \xi^t)$ by solving the optimal resource allocation problems. This in fact indicates that the proposed LyDROO algorithm can be extended to solve (P1) when considering a general non-decreasing concave utility $U(r_i^t)$ in the objective, because the per-frame resource allocation problem to compute $G(x_i^t, \xi^t)$ is a convex problem that can be efficiently solved, where the detailed analysis is omitted. In the next subsection, we propose a low-complexity algorithm to obtain $G(x_i^t, \xi^t)$.

B. Low-complexity Algorithm for Optimal Resource Allocation

Given the value of $x^t$ in (P2), we denote the index set of users with $x_i^t = 1$ as $\mathcal{M}_1^t$, and the complementary user set as $\mathcal{M}_0^t$. For simplicity of exposition, we drop the superscript $t$ and express the optimal resource allocation problem that computes $G(x^t, \xi^t)$ as follows:

(P4): $\underset{\tau,\, f,\, e_O,\, r_O}{\text{maximize}} \ \sum_{j \in \mathcal{M}_0} \{ a_j f_j / \phi - Y_j(t) \kappa f_j^3 \} + \sum_{i \in \mathcal{M}_1} \{ a_i r_{i,O} - Y_i(t) e_{i,O} \}$    (28a)

What model does this establish?
Time: 2023-06-16 15:05:49 Views: 193
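This passage does not establish a specific model. Instead, it presents the pseudocode of the LyDROO algorithm and the approach to solving problems (P1) and (P4). LyDROO is an online learning algorithm that uses a deep neural network to approximate the optimal policy and evaluates the objective by solving an optimal resource allocation problem. Specifically, the algorithm has two main parts: in the first, the DNN generates a relaxed offloading decision, which is then quantized with the NOP method; in the second, the optimal resource allocation problem is solved for each quantized decision and the best decision is selected. The complexity of the whole algorithm lies mainly in line 7, the step that solves the optimal resource allocation problem, but that problem is a convex optimization problem and can be solved efficiently.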
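As a rough illustration of the two parts described above (lines 5 to 8 of Algorithm 1), here is a minimal Python sketch. It rests on several assumptions that are not in the excerpt: all function names are hypothetical; `quantize` is a simplified threshold-based stand-in and not the NOP method itself; `best_local_frequency` handles only the local-computing term $a_j f_j/\phi - Y_j(t)\kappa f_j^3$ of (28a); and `f_max` is an assumed upper bound on the CPU frequency, since the constraints of (P4) are not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple

Action = Tuple[int, ...]


def quantize(x_hat: Sequence[float], m: int) -> List[Action]:
    """Turn a relaxed action in [0, 1]^N into at most m distinct binary actions.

    Simplified stand-in for the NOP quantizer (assumption): the first action
    thresholds at 0.5; further actions use the entries closest to 0.5 as
    thresholds, which preserves the ordering of the entries of x_hat.
    """
    actions: List[Action] = [tuple(1 if v > 0.5 else 0 for v in x_hat)]
    # Indices sorted by distance to 0.5; each entry supplies one more threshold.
    order = sorted(range(len(x_hat)), key=lambda i: abs(x_hat[i] - 0.5))
    for idx in order:
        if len(actions) >= m:
            break
        thr = x_hat[idx]
        if thr <= 0.5:
            cand = tuple(1 if v >= thr else 0 for v in x_hat)
        else:
            cand = tuple(1 if v > thr else 0 for v in x_hat)
        if cand not in actions:
            actions.append(cand)
    return actions


def best_local_frequency(a: float, y: float, phi: float,
                         kappa: float, f_max: float) -> float:
    """Maximize a*f/phi - y*kappa*f**3 over 0 <= f <= f_max.

    Assumes a >= 0, phi > 0, kappa > 0. For y > 0 the objective is concave
    on f >= 0, and setting the derivative a/phi - 3*y*kappa*f**2 to zero
    gives f = sqrt(a / (3*phi*kappa*y)), which is then capped at f_max.
    """
    if y <= 0:
        return f_max  # without an energy penalty the objective grows with f
    return min(f_max, math.sqrt(a / (3.0 * phi * kappa * y)))


def select_best(actions: Sequence[Action],
                score: Callable[[Action], float]) -> Action:
    """Line 8 of Algorithm 1: keep the candidate with the largest score G."""
    return max(actions, key=score)
```

In use, `score` would wrap whatever solver computes $G(x_i^t, \xi^t)$ for a candidate action; since each evaluation is a convex problem, the cost of this step grows with the number of candidates $M_t$ rather than with the $2^N$ possible binary actions.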