In order to obtain the Deep Learning network for the LR task, the architecture was first trained to classify the T-stage, with MCC = 0.863 on the training set (one-shot) and MCC = 0.279 on the test set, despite having used all the main techniques to avoid overfitting (e.g., use of Dropout Layers and Data Augmentation): here the small sample size of the dataset plays a critical role. Explain.

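Time: 2024-04-22 10:27:10 Views: 144
This passage says that, to obtain the deep learning network for the LR task, a network was first trained to classify the T-stage. It reached an MCC (Matthews correlation coefficient) of 0.863 on the training set (one-shot) but only 0.279 on the test set. Although all the main techniques for preventing overfitting were used (for example, Dropout layers and data augmentation), the test-set performance is poor because the dataset contains very few samples.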
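The gap between the training and test MCC is what signals overfitting here. As a minimal sketch (not the authors' code), the multiclass MCC can be computed from two label lists as shown below. The function name `mcc` and the toy labels in the usage note are illustrative assumptions, not values from the paper.

```python
from collections import Counter
from math import sqrt


def mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from two label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified samples
    t_counts = Counter(y_true)                        # how often each class truly occurs
    p_counts = Counter(y_pred)                        # how often each class is predicted
    labels = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_tt = s * s - sum(v * v for v in t_counts.values())
    cov_pp = s * s - sum(v * v for v in p_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    # By convention, return 0.0 when one side contains a single class only.
    return cov_tp / denom if denom else 0.0
```

Calling `mcc` once on training predictions and once on held-out predictions gives the two numbers to compare; a large drop on the held-out set, as in the passage, indicates that the model does not generalize.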
Related questions

Traditional network security situation prediction methods depend on the accuracy of historical situation value. Moreover, there are differences in correlation and importance among various network security factors. In order to solve these problems, a combined prediction model based on the temporal convolution attention network (TCAN) and bi-directional gate recurrent unit (BiGRU) network optimized by singular spectrum analysis (SSA) and improved quantum particle swarm optimization algorithm (IQPSO) was proposed. This model was first decomposed and reconstructed into a series of subsequences through the SSA of network security situation data. Next, a prediction model of TCAN-BiGRU was established for each subsequence, respectively. The TCN with relatively simple structure was used in the TCAN to extract features from the data. Besides, the improved channel attention mechanism (CAM) was used to extract important feature information from TCN. Afterwards, the before-after status of the learning situation value of the BiGRU neural network was used to extract more feature information from sequences for prediction. Meanwhile, an improved IQPSO was proposed to optimize the hyper-parameter of the BiGRU neural network. Finally, the prediction results of subsequence were superimposed to obtain the final predicted value. In the experiment, on the one hand, the IQPSO was compared with other optimization algorithms; and the results showed that the IQPSO has better optimization performance; on the other hand, the comparison with traditional prediction methods was performed through the simulation experiment and the established prediction model; and the results showed that the combined prediction model established has higher prediction accuracy.

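Answer: The combined prediction model is based on the temporal convolution attention network (TCAN) and the bi-directional gated recurrent unit (BiGRU) network, together with singular spectrum analysis (SSA) and the improved quantum particle swarm optimization algorithm (IQPSO). It addresses two problems: traditional network security situation prediction methods depend on the accuracy of historical situation values, and the various network security factors differ in correlation and importance. In the experiments, IQPSO was compared with other optimization algorithms and showed better optimization performance. In addition, the established model was compared with traditional prediction methods through simulation experiments, and the results showed that the combined prediction model has higher prediction accuracy.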

Check the following sentences for grammar and spelling problems. Traditional network security situation prediction methods depend on the accuracy of historical situation value. Moreover, there are differences in correlation and importance among various network security factors. In order to solve these problems, a combined prediction model based on the temporal convolution attention network (TCAN) and bi-directional gate recurrent unit (BiGRU) network optimized by singular spectrum analysis (SSA) and improved quantum particle swarm optimization algorithm (IQPSO) was proposed. This model was first decomposed and reconstructed into a series of subsequences through the SSA of network security situation data. Next, a prediction model of TCAN-BiGRU was established for each subsequence, respectively. The TCN with relatively simple structure was used in the TCAN to extract features from the data. Besides, the improved channel attention mechanism (CAM) was used to extract important feature information from TCN. Afterwards, the before-after status of the learning situation value of the BiGRU neural network was used to extract more feature information from sequences for prediction. Meanwhile, an improved IQPSO was proposed to optimize the hyper-parameter of the BiGRU neural network. Finally, the prediction results of subsequence were superimposed to obtain the final predicted value. In the experiment, on the one hand, the IQPSO was compared with other optimization algorithms; and the results showed that the IQPSO has better optimization performance; on the other hand, the comparison with traditional prediction methods was performed through the simulation experiment and the established prediction model; and the results showed that the combined prediction model established has higher prediction accuracy.

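As we can see, the combined prediction model proposed in the paper is based on the temporal convolution attention network (TCAN) and the bi-directional gated recurrent unit (BiGRU) network, and is optimized with singular spectrum analysis (SSA) and the improved quantum particle swarm optimization algorithm (IQPSO). The experiments show that the model achieves higher prediction accuracy.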

Related recommendations

A. Encoding Network of PFSPNet

The encoding network is divided into three parts. In Part I, an RNN is adopted to model the processing time p_ij of job i on all machines, which is converted into a fixed-dimensional vector p_i. In Part II, the number of machines m is integrated into the vector p_i through the fully connected layer, and the fixed-dimensional vector p̃_i is output. In Part III, p̃_i is fed into the convolution layer to improve the expressive ability of the network, and the final output η p = [η p1, η p2, ..., η pn] is obtained. Fig. 2 illustrates the encoding network.

In Part I, the modelling process for p_ij is described as follows, where W_B, h_ij and h_0 are k-dimensional vectors, h_0, U, W, b and W_B are the network parameters, and f() is the mapping from the RNN input to the hidden-layer output. The main steps of Part I are as follows.
Step 1: Input p_ij to the embedding layer and obtain the output y_ij = W_B p_ij;
Step 2: Input y_i1 and h_0 to the RNN and obtain the hidden-layer output h_i1 = f(y_i1, h_0; U, W, b). Let p_1 = h_1m;
Step 3: Input y_ij and h_i,j−1, j = 2, 3, ..., m, into the RNN in turn, and obtain the hidden-layer output h_ij = f(y_ij, h_i,j−1; U, W, b), j = 2, 3, ..., m. Let p_i = h_im.

In Part II, the number of machines m and the vector p_i are integrated by the fully connected layer. The details are as follows. W_B and h̃_i are d-dimensional vectors; W_B, W and b̃ are network parameters; and g() denotes the mapping from the input to the output of the fully connected layer.
Step 1: Input the number of machines m to the embedding layer, and the output m = W_B m is obtained;
Step 2: Input m and p_i to the fully connected layer and obtain the output h̃_i = g([m, p_i]; W, b̃);
Step 3: Let p̃_i = ReLU(h̃_i).

In Part III, p̃_i, i = 1, 2, ..., n, are input into a one-dimensional convolution layer. The final output vectors η pi, i = 1, 2, ..., n, are obtained after the output of the convolutional layer goes through the ReLU layer.

First, analyze this process carefully line by line. Second, how can PyTorch be used to fully implement all the functions and steps of this process with an EncoderNetwork class?
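One possible reading of the three parts can be sketched as a PyTorch module. This is only a sketch under stated assumptions, not the paper's implementation: the sizes `k`, `d`, `conv_channels` and `kernel_size` are hypothetical defaults, the convolution is assumed to run along the job axis, and the two embedding layers are modelled as bias-free linear maps of a scalar.

```python
import torch
import torch.nn as nn


class EncoderNetwork(nn.Module):
    """Sketch of the three-part encoder; all sizes are assumed, not from the paper."""

    def __init__(self, k=16, d=32, conv_channels=32, kernel_size=3):
        super().__init__()
        # Part I: embed each scalar processing time, then run an RNN over machines.
        self.time_embed = nn.Linear(1, k, bias=False)      # y_ij = W_B p_ij
        self.rnn = nn.RNN(input_size=k, hidden_size=k, batch_first=True)
        self.h0 = nn.Parameter(torch.zeros(1, 1, k))       # learnable initial state h_0
        # Part II: embed the machine count m and fuse it with p_i.
        self.m_embed = nn.Linear(1, k, bias=False)         # embedding size k is an assumption
        self.fc = nn.Linear(2 * k, d)
        # Part III: 1-D convolution over the sequence of jobs (assumed axis).
        self.conv = nn.Conv1d(d, conv_channels, kernel_size, padding=kernel_size // 2)

    def forward(self, p):
        # p: float tensor of shape (n_jobs, m_machines) for one problem instance.
        n, m = p.shape
        y = self.time_embed(p.unsqueeze(-1))               # (n, m, k)
        h0 = self.h0.expand(1, n, -1).contiguous()         # same h_0 for every job
        _, h_last = self.rnn(y, h0)                        # h_last: (1, n, k)
        p_i = h_last.squeeze(0)                            # p_i = h_im, shape (n, k)
        m_in = torch.tensor([[float(m)]], dtype=p.dtype, device=p.device)
        m_vec = self.m_embed(m_in).expand(n, -1)           # (n, k)
        p_tilde = torch.relu(self.fc(torch.cat([m_vec, p_i], dim=-1)))  # (n, d)
        # Conv1d expects (batch, channels, length); jobs form the length axis.
        eta = torch.relu(self.conv(p_tilde.t().unsqueeze(0)))           # (1, C, n)
        return eta.squeeze(0).t()                          # one output vector per job
```

For example, `EncoderNetwork()(torch.rand(5, 4))` encodes a toy instance with 5 jobs and 4 machines and returns one 32-dimensional vector per job. Batching over several instances and the exact layer sizes would need to follow the paper's Fig. 2, which is not reproduced here.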

3.4 Pair Interaction Feature The interaction pattern between two individuals is encoded by a spatial descriptor with view-invariant relative pose encoding. Given the 3D locations of two individual detections zi, zj and two pose features pi, pj, we represent the pairwise relationship using view normalization, pose co-occurrence encoding, semantic compression and a spatial histogram (see Fig. 5 for illustration). The view normalization is performed by rotating the two people in 3D space by θ with respect to their midpoint, making their connecting line perpendicular to the camera view point. In this step, the pose features are also shifted accordingly (e.g. if θ = 45°, shift 1 dimension with a cycle). Then, the co-occurrence feature is obtained by building a 2-dimensional matrix in which each element (r, c) corresponds to min(pi(r), pj(c)). Although the feature is view invariant, there are still elements in the matrix that deliver the same semantic concepts (e.g. left-left and right-right). To reduce such unnecessary variance and obtain a compact representation, we perform another transformation by multiplying a semantic compression matrix Sc to the vector form of the co-occurrence feature. The matrix Sc is learned offline by enumerating all possible configurations of view points and grouping the pairs that are equivalent when rotated by 180 degrees. Finally, we obtain the pair interaction descriptor by building a spatial histogram based on the 3D distance between the two (bin centers at 0.2, 0.6, 2.0 and 6.5 m). Here, we use linear interpolation similarly to the contextual feature in Sec. 3.3. Given the interaction descriptor for each pair, we represent the interaction feature φxx(xi, xj) using the confidence value from an SVM classifier trained on a dictionary of interaction labels Y. What does this mean?
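Three of the steps above (the cyclic pose shift, the min-based co-occurrence matrix, and the linearly interpolated distance histogram) can be illustrated with a short sketch. The function names, the shift direction, and the clamping of distances outside the outermost bin centers are assumptions made for illustration. The learned semantic compression matrix Sc and the SVM are left out because the passage does not specify them.

```python
BIN_CENTERS = (0.2, 0.6, 2.0, 6.5)  # metres, as listed in the passage


def shift_pose(pose, theta_deg, bin_width_deg=45.0):
    """Cyclically shift a pose histogram by one bin per 45 degrees of rotation."""
    n = len(pose)
    k = round(theta_deg / bin_width_deg) % n
    return [pose[(i - k) % n] for i in range(n)]   # shift direction is an assumption


def cooccurrence(pose_i, pose_j):
    """Matrix whose element (r, c) is min(pose_i[r], pose_j[c])."""
    return [[min(a, b) for b in pose_j] for a in pose_i]


def distance_weights(dist, centers=BIN_CENTERS):
    """Split one vote between the two nearest bin centers by linear interpolation."""
    w = [0.0] * len(centers)
    if dist <= centers[0]:
        w[0] = 1.0                                  # clamp below the first center
    elif dist >= centers[-1]:
        w[-1] = 1.0                                 # clamp above the last center
    else:
        for b in range(len(centers) - 1):
            lo, hi = centers[b], centers[b + 1]
            if lo <= dist <= hi:
                frac = (dist - lo) / (hi - lo)
                w[b], w[b + 1] = 1.0 - frac, frac
                break
    return w
```

In this reading, the descriptor for a pair would be the co-occurrence matrix (after compression by Sc) weighted into the distance bins given by `distance_weights`; two people standing 1.3 m apart, for instance, would contribute roughly equally to the 0.6 m and 2.0 m bins.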

Algorithm 1: The online LyDROO algorithm for solving (P1).
input: Parameters $V$, $\{\gamma_i, c_i\}_{i=1}^{N}$, $K$, training interval $\delta_T$, $M_t$ update interval $\delta_M$;
output: Control actions $\{x^t, y^t\}_{t=1}^{K}$;
1 Initialize the DNN with random parameters $\theta_1$ and empty replay memory, $M_1 \leftarrow 2N$;
2 Empty initial data queue $Q_i(1) = 0$ and energy queue $Y_i(1) = 0$, for $i = 1, \cdots, N$;
3 for $t = 1, 2, \ldots, K$ do
4   Observe the input $\xi^t = \{h^t, Q_i(t), Y_i(t)\}_{i=1}^{N}$ and update $M_t$ using (8) if $\mathrm{mod}(t, \delta_M) = 0$;
5   Generate a relaxed offloading action $\hat{x}^t = \Pi_{\theta_t}(\xi^t)$ with the DNN;
6   Quantize $\hat{x}^t$ into $M_t$ binary actions $\{x_i^t \mid i = 1, \cdots, M_t\}$ using the NOP method;
7   Compute $G(x_i^t, \xi^t)$ by optimizing resource allocation $y_i^t$ in (P2) for each $x_i^t$;
8   Select the best solution $x^t = \arg\max_{\{x_i^t\}} G(x_i^t, \xi^t)$ and execute the joint action $(x^t, y^t)$;
9   Update the replay memory by adding $(\xi^t, x^t)$;
10   if $\mathrm{mod}(t, \delta_T) = 0$ then
11     Uniformly sample a batch of data set $\{(\xi^\tau, x^\tau) \mid \tau \in S_t\}$ from the memory;
12     Train the DNN with $\{(\xi^\tau, x^\tau) \mid \tau \in S_t\}$ and update $\theta_t$ using the Adam algorithm;
13   end
14   $t \leftarrow t + 1$;
15   Update $\{Q_i(t), Y_i(t)\}_{i=1}^{N}$ based on $(x^{t-1}, y^{t-1})$ and data arrival observation $\{A_i^{t-1}\}_{i=1}^{N}$ using (5) and (7).
16 end

With the above actor-critic-update loop, the DNN consistently learns from the best and most recent state-action pairs, leading to a better policy $\pi_{\theta_t}$ that gradually approximates the optimal mapping to solve (P3). We summarize the pseudo-code of LyDROO in Algorithm 1, where the major computational complexity is in line 7 that computes $G(x_i^t, \xi^t)$ by solving the optimal resource allocation problems. This in fact indicates that the proposed LyDROO algorithm can be extended to solve (P1) when considering a general non-decreasing concave utility $U(r_i^t)$ in the objective, because the per-frame resource allocation problem to compute $G(x_i^t, \xi^t)$ is a convex problem that can be efficiently solved, where the detailed analysis is omitted. In the next subsection, we propose a low-complexity algorithm to obtain $G(x_i^t, \xi^t)$.

B. Low-complexity Algorithm for Optimal Resource Allocation

Given the value of $x^t$ in (P2), we denote the index set of users with $x_i^t = 1$ as $\mathcal{M}_1^t$, and the complementary user set as $\mathcal{M}_0^t$. For simplicity of exposition, we drop the superscript $t$ and express the optimal resource allocation problem that computes $G(x^t, \xi^t)$ as the following (P4):

$$\underset{\tau, f, e_O, r_O}{\text{maximize}} \quad \sum_{j \in \mathcal{M}_0} \left\{ a_j f_j / \phi - Y_j(t) \kappa f_j^3 \right\} + \sum_{i \in \mathcal{M}_1} \left\{ a_i r_{i,O} - Y_i(t) e_{i,O} \right\} \qquad \text{(28a)}$$

What model is established?
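Lines 6 to 8 of Algorithm 1 (quantize the relaxed action, score every candidate, keep the best) can be sketched as follows. The passage does not define the NOP method, so this sketch substitutes a plain order-preserving thresholding rule as a stand-in, and the `score` callable is a hypothetical placeholder for $G(x_i^t, \xi^t)$, which in the paper requires solving the resource allocation problem.

```python
def order_preserving_candidates(x_hat, m):
    """Turn a relaxed action in [0, 1]^N into up to m binary candidates.

    Stand-in for the NOP method: the first candidate thresholds at 0.5, the
    others threshold at the entries of x_hat closest to 0.5.
    """
    candidates = [[1 if v > 0.5 else 0 for v in x_hat]]
    order = sorted(range(len(x_hat)), key=lambda i: abs(x_hat[i] - 0.5))
    for idx in order[: max(0, m - 1)]:
        thr = x_hat[idx]
        # Entries above the threshold offload; a tie is resolved so that the
        # threshold entry itself flips relative to the 0.5-rounded action.
        cand = [1 if (v > thr or (v == thr and thr <= 0.5)) else 0 for v in x_hat]
        candidates.append(cand)
    return candidates


def select_best(x_hat, m, score):
    """Score every candidate with the placeholder objective and keep the best one."""
    return max(order_preserving_candidates(x_hat, m), key=score)
```

For instance, `select_best([0.2, 0.4, 0.7, 0.9], 3, score=sum)` evaluates three candidates and returns the one with the largest toy score. In the algorithm itself the selected action would then be stored in the replay memory together with the observed state and used to train the DNN every $\delta_T$ steps.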

Please revise the paper:Accurate determination of bathymetric data in the shallow water zone over time and space is of increasing significance for navigation safety, monitoring of sea-level uplift, coastal areas management, and marine transportation. Satellite-derived bathymetry (SDB) is widely accepted as an effective alternative to conventional acoustics measurements over coastal areas with high spatial and temporal resolution combined with extensive repetitive coverage. Numerous empirical SDB approaches in previous works are unsuitable for precision bathymetry mapping in various scenarios, owing to the assumption of homogeneous bottom over the whole region, as well as the limitations of constructing global mapping relationships between water depth and blue-green reflectance takes no account of various confounding factors of radiance attenuation such as turbidity. To address the assumption failure of uniform bottom conditions and imperfect consideration of influence factors on the performance of the SDB model, this work proposes a bottom-type adaptive-based SDB approach (BA-SDB) to obtain accurate depth estimation over different sediments. The bottom type can be adaptively segmented by clustering based on bottom reflectance. For each sediment category, a PSO-LightGBM algorithm for depth derivation considering multiple influencing factors is driven to adaptively select the optimal influence factors and model parameters simultaneously. Water turbidity features beyond the traditional impact factors are incorporated in these regression models. Compared with log-ratio, multi-band and classical machine learning methods, the new approach produced the most accurate results with RMSE value is 0.85 m, in terms of different sediments and water depths combined with in-situ observations of airborne laser bathymetry and multi-beam echo sounder.
