IRGAN: A Game-Theoretic Method for Unifying Generative and Discriminative Information Retrieval
IRGAN (full title: "IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models") is a seminal paper that deeply integrates generative and discriminative models of information retrieval. Traditional IR research falls into two schools: generative retrieval, which focuses on predicting relevant documents given a query, and discriminative retrieval, which focuses on predicting the relevance of a given query-document pair. IRGAN unifies these two schools of thinking through a game-theoretic minimax framework.

In this model, the generative retrieval model plays the role of "attacker": exploiting signals from both labelled and unlabelled data, it continually optimises itself to approximate the true underlying relevance distribution over documents for a query more accurately. This requires the generator not merely to fit surface features of documents but to capture query intent, thereby raising the quality of retrieval results. The discriminative model, in turn, plays the role of "guide": it learns to tell genuinely relevant documents apart from the generator's samples, and its judgements are fed back to the generator as training signals, continually challenging the generator to improve.

IRGAN's key innovation is bringing adversarial training into IR: through iterative rounds of the game, the generative and discriminative models compete with and strengthen each other. This training strategy not only improves performance but also makes the two models complementary, enhancing the robustness and generalisation of the retrieval system. The work is of substantial theoretical and practical value for understanding and improving modern retrieval systems, especially over large-scale text collections and complex query scenarios.
document identifier space) not their features, because our work here intends to select relevant documents from a given document pool. Note that it is feasible to generate new documents (features, such as the value of BM25) by IRGAN, but to stay focused, we leave it for future investigation.
Specifically, while keeping the discriminator $f_\phi(q, d)$ fixed after its maximisation in Eq. (1), we learn the generative model via performing its minimisation:
$$\begin{aligned}
\theta^* &= \arg\min_\theta \sum_{n=1}^{N} \Big( \mathbb{E}_{d \sim p_{\mathrm{true}}(d|q_n,r)} \big[ \log \sigma(f_\phi(d, q_n)) \big] + \mathbb{E}_{d \sim p_\theta(d|q_n,r)} \big[ \log\big(1 - \sigma(f_\phi(d, q_n))\big) \big] \Big) \\
&= \arg\max_\theta \sum_{n=1}^{N} \underbrace{\mathbb{E}_{d \sim p_\theta(d|q_n,r)} \big[ \log\big(1 + \exp(f_\phi(d, q_n))\big) \big]}_{\text{denoted as } J^G(q_n)} , \qquad (4)
\end{aligned}$$
where for each query $q_n$ we denote the objective function of the generator as $J^G(q_n)$¹. The second equality holds because the first expectation does not depend on $\theta$ and $\log(1 - \sigma(x)) = -\log(1 + \exp(x))$, so minimising the $\theta$-dependent term is equivalent to the stated maximisation.
As the sampling of $d$ is discrete, it cannot be directly optimised by gradient descent as in the original GAN formulation. A common approach is to use policy-gradient-based reinforcement learning (REINFORCE) [42, 44]. Its gradient is derived as follows:
$$\begin{aligned}
\nabla_\theta J^G(q_n) &= \nabla_\theta \, \mathbb{E}_{d \sim p_\theta(d|q_n,r)} \big[ \log(1 + \exp(f_\phi(d, q_n))) \big] \\
&= \sum_{i=1}^{M} \nabla_\theta \, p_\theta(d_i|q_n,r) \log(1 + \exp(f_\phi(d_i, q_n))) \\
&= \sum_{i=1}^{M} p_\theta(d_i|q_n,r) \, \nabla_\theta \log p_\theta(d_i|q_n,r) \log(1 + \exp(f_\phi(d_i, q_n))) \\
&= \mathbb{E}_{d \sim p_\theta(d|q_n,r)} \big[ \nabla_\theta \log p_\theta(d|q_n,r) \log(1 + \exp(f_\phi(d, q_n))) \big] \\
&\simeq \frac{1}{K} \sum_{k=1}^{K} \nabla_\theta \log p_\theta(d_k|q_n,r) \log(1 + \exp(f_\phi(d_k, q_n))) , \qquad (5)
\end{aligned}$$
where we perform a sampling approximation in the last step, in which $d_k$ is the $k$-th document sampled from the current version of the generator $p_\theta(d|q_n,r)$. In reinforcement learning terminology, the term $\log(1+\exp(f_\phi(d,q_n)))$ acts as the reward for the policy $p_\theta(d|q_n,r)$ taking an action $d$ in the environment $q_n$ [38].
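To make the sampled estimator in Eq. (5) concrete, here is a minimal numpy sketch, assuming the generator is parameterised as a softmax over scores of the $M$ candidate documents; the function and variable names are ours for illustration, not the authors' code:

```python
import numpy as np

def reinforce_gradient(gen_logits, disc_scores, K=5, rng=None):
    """Monte-Carlo estimate of the Eq. (5) gradient for one query q_n.

    gen_logits : (M,) generator scores; policy p_theta(d|q_n, r) = softmax(gen_logits)
    disc_scores: (M,) discriminator scores f_phi(d_i, q_n)
    Returns dJ^G(q_n)/d gen_logits under the softmax assumption.
    """
    gen_logits = np.asarray(gen_logits, dtype=float)
    disc_scores = np.asarray(disc_scores, dtype=float)
    if rng is None:
        rng = np.random.default_rng()

    # Softmax policy p_theta(d_i | q_n, r) over the candidate pool.
    p = np.exp(gen_logits - gen_logits.max())
    p /= p.sum()

    # Draw K documents from the current generator policy.
    sampled = rng.choice(len(p), size=K, p=p)

    # Reward from Eq. (5): log(1 + exp(f_phi(d_k, q_n))), computed stably.
    # (Footnote 1's alternative reward log(sigma(f)) could be used instead.)
    rewards = np.logaddexp(0.0, disc_scores[sampled])

    # For a softmax policy, grad_logits log p_theta(d_k) = one_hot(d_k) - p.
    grad = np.zeros_like(gen_logits)
    for k, r in zip(sampled, rewards):
        grad[k] += r
        grad -= p * r
    return grad / K
```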
In order to reduce variance during REINFORCE learning, we also replace the reward term $\log(1+\exp(f_\phi(d,q_n)))$ by its advantage function:

$$\log(1 + \exp(f_\phi(d, q_n))) - \mathbb{E}_{d \sim p_\theta(d|q_n,r)} \big[ \log(1 + \exp(f_\phi(d, q_n))) \big] ,$$

where the term $\mathbb{E}_{d \sim p_\theta(d|q_n,r)}[\log(1+\exp(f_\phi(d,q_n)))]$ acts as the baseline function in policy gradient [38].
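Under the same softmax assumption as the sketch above, the advantage substitution is a small change: compute the baseline as the expected reward under the current policy and subtract it from each sampled reward. The helper below (our names) computes the expectation exactly, which is cheap when the candidate pool is enumerable; a Monte-Carlo mean over the $K$ samples is the usual stand-in otherwise.

```python
import numpy as np

def advantage_rewards(p, disc_scores, sampled):
    """Advantages replacing the raw rewards in the Eq. (5) estimator.

    p          : (M,) current generator distribution p_theta(d | q_n, r)
    disc_scores: (M,) discriminator scores f_phi(d_i, q_n)
    sampled    : indices of the K documents sampled from p
    """
    reward_all = np.logaddexp(0.0, np.asarray(disc_scores, dtype=float))
    # Baseline b = E_{d ~ p_theta}[log(1 + exp(f_phi(d, q_n)))].
    baseline = float(np.asarray(p, dtype=float) @ reward_all)
    return reward_all[sampled] - baseline
```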
The overall logic of our proposed IRGAN solution is summarised in Algorithm 1. Before the adversarial training, the generator and discriminator can be initialised by their conventional models. Then, during the adversarial training stage, the generator and discriminator are trained alternately via Eqs. (5) and (3).
¹ Following [13], $\mathbb{E}_{d \sim p_\theta(d|q_n,r)}[\log(\sigma(f_\phi(d,q_n)))]$ is normally used instead for maximisation, which keeps the same fixed point but provides stronger gradients for the generative model.
Algorithm 1 Minimax Game for IR (a.k.a. IRGAN)
Input: generator $p_\theta(d|q,r)$; discriminator $f_\phi(q,d)$; training dataset $S = \{x\}$
1: Initialise $p_\theta(d|q,r)$, $f_\phi(q,d)$ with random weights $\theta$, $\phi$.
2: Pre-train $p_\theta(d|q,r)$, $f_\phi(q,d)$ using $S$
3: repeat
4:   for g-steps do
5:     $p_\theta(d|q,r)$ generates $K$ documents for each query $q$
6:     Update generator parameters via policy gradient Eq. (5)
7:   end for
8:   for d-steps do
9:     Use current $p_\theta(d|q,r)$ to generate negative examples and combine with given positive examples $S$
10:    Train discriminator $f_\phi(q,d)$ by Eq. (3)
11:  end for
12: until IRGAN converges
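The control flow of Algorithm 1 can be summarised by a short driver loop. The sketch below is ours, not the reference implementation: the generator and discriminator are assumed to be objects exposing hypothetical pretrain, sample, reward, reinforce_update, and train_on methods.

```python
def train_irgan(gen, disc, queries, positives, epochs=30,
                g_steps=1, d_steps=1, K=5):
    """Alternating minimax training in the shape of Algorithm 1.

    gen and disc are assumed to wrap p_theta(d|q, r) and f_phi(q, d);
    every method called on them is a hypothetical interface.
    """
    gen.pretrain(queries, positives)     # line 2: conventional pre-training
    disc.pretrain(queries, positives)
    for _ in range(epochs):              # stand-in for "until IRGAN converges"
        for _ in range(g_steps):         # g-steps
            for q in queries:
                docs = gen.sample(q, K)            # K docs from p_theta(d|q, r)
                rewards = disc.reward(q, docs)     # log(1 + exp(f_phi(d, q)))
                gen.reinforce_update(q, docs, rewards)  # policy gradient, Eq. (5)
        for _ in range(d_steps):         # d-steps
            for q in queries:
                negatives = gen.sample(q, K)       # generated negative examples
                disc.train_on(q, positives[q], negatives)  # Eq. (3)
    return gen, disc
```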
2.2 Extension to Pairwise Case
In many IR problems, it is common that the labelled training data available for learning to rank are not a set of relevant documents but a set of ordered document pairs for each query, as it is often easier to capture users' relative preference judgements on a pair of documents than their absolute relevance judgements on individual documents (e.g., from a search engine's click-through log) [19]. Furthermore, if we use graded relevance scales (indicating a varying degree of match between each document and the corresponding query) rather than binary relevance, the training data could also be represented naturally as ordered document pairs.
Here we show that our proposed IRGAN framework would also work in such a pairwise setting for learning to rank. For each query $q_n$, we have a set of labelled document pairs $R_n = \{\langle d_i, d_j \rangle \mid d_i \succ d_j\}$, where $d_i \succ d_j$ means that $d_i$ is more relevant to $q_n$ than $d_j$. As in Section 2.1, we let $p_\theta(d|q,r)$ and $f_\phi(q,d)$ denote the generative retrieval model and the discriminative retrieval model, respectively.
The generator $G$ would try to generate document pairs that are similar to those in $R_n$, i.e., with the correct ranking. The discriminator $D$ would try to distinguish such generated document pairs from the real document pairs. The probability that a document pair $\langle d_u, d_v \rangle$ is correctly ranked can be estimated by the discriminative retrieval model through a sigmoid function:
$$D(\langle d_u, d_v \rangle \,|\, q) = \sigma\big(f_\phi(d_u, q) - f_\phi(d_v, q)\big) = \frac{\exp\big(f_\phi(d_u, q) - f_\phi(d_v, q)\big)}{1 + \exp\big(f_\phi(d_u, q) - f_\phi(d_v, q)\big)} = \frac{1}{1 + \exp(-z)} , \qquad (6)$$
where $z = f_\phi(d_u, q) - f_\phi(d_v, q)$. Note that $-\log D(\langle d_u, d_v \rangle | q) = \log(1 + \exp(-z))$ is exactly the pairwise ranking loss function used by the learning-to-rank algorithm RankNet [3]. In addition to the logistic function $\log(1 + \exp(-z))$, it is possible to make use of other pairwise ranking loss functions [7], such as the hinge function $(1-z)_+$ (as used in Ranking SVM [16]) and the exponential function $\exp(-z)$ (as used in RankBoost [11]), to define the probability $D(\langle d_u, d_v \rangle | q)$.
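The three loss choices above differ only in how the score difference $z$ is penalised: a larger $z$ (the correct ordering) always costs less. A small sketch with our own helper names, assuming numpy arrays of discriminator scores:

```python
import numpy as np

def pair_prob(f_du, f_dv):
    """Eq. (6): D(<d_u, d_v> | q) = sigma(z), z = f(d_u, q) - f(d_v, q)."""
    z = np.asarray(f_du, dtype=float) - np.asarray(f_dv, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

def pairwise_loss(f_du, f_dv, kind="logistic"):
    """Pairwise ranking losses on the score difference z."""
    z = np.asarray(f_du, dtype=float) - np.asarray(f_dv, dtype=float)
    if kind == "logistic":      # RankNet: -log D = log(1 + exp(-z))
        return np.logaddexp(0.0, -z)
    if kind == "hinge":         # Ranking SVM: (1 - z)_+
        return np.maximum(0.0, 1.0 - z)
    if kind == "exponential":   # RankBoost: exp(-z)
        return np.exp(-z)
    raise ValueError(f"unknown loss kind: {kind}")
```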
If we use the standard cross entropy cost for this binary classifier as before, we have the following minimax game:
$$J^{G^*, D^*} = \min_\theta \max_\phi \sum_{n=1}^{N} \Big( \mathbb{E}_{o \sim p_{\mathrm{true}}(o|q_n)} \big[ \log D(o|q_n) \big] + \mathbb{E}_{o' \sim p_\theta(o'|q_n)} \big[ \log\big(1 - D(o'|q_n)\big) \big] \Big) , \qquad (7)$$
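For illustration, one query's term in Eq. (7) can be estimated by Monte-Carlo sampling: real pairs drawn from $R_n$ and generated pairs drawn from $p_\theta$. The helper below is a hedged sketch with assumed inputs, not the authors' code:

```python
import numpy as np

def pairwise_game_value(real_pairs, fake_pairs, f):
    """Monte-Carlo estimate of one query's term in Eq. (7).

    real_pairs : (d_u, d_v) tuples drawn from p_true(o | q_n), i.e. R_n
    fake_pairs : (d_u, d_v) tuples drawn from the generator p_theta(o' | q_n)
    f          : callable giving the discriminator score f_phi(d, q_n)
    """
    def D(du, dv):                                    # Eq. (6)
        return 1.0 / (1.0 + np.exp(-(f(du) - f(dv))))

    real = np.mean([np.log(D(du, dv)) for du, dv in real_pairs])
    fake = np.mean([np.log(1.0 - D(du, dv)) for du, dv in fake_pairs])
    return real + fake  # discriminator ascends this; generator descends it
```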