IRGAN: A Game-Theoretic Method for Unifying Generative and Discriminative Information Retrieval
IRGAN (full title: "IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models") is a seminal paper that deeply integrates generative and discriminative models of information retrieval. Traditional IR research falls into two schools: generative retrieval, which focuses on predicting relevant documents given a query, and discriminative retrieval, which focuses on predicting the relevance of a given query-document pair. IRGAN unifies these two schools of thinking through a game-theoretic minimax framework.

In this model, the generative retrieval model plays the role of "attacker": exploiting signals from both labelled and unlabelled data, it continually optimises itself to approximate the true underlying relevance distribution over documents for a query more accurately. This requires the generator not merely to fit surface features of documents but to capture query intent, thereby raising the quality of retrieval results. The discriminative model, in turn, plays the role of "guide": it learns to tell genuinely relevant documents apart from the generator's samples, and its judgements are fed back to the generator as training signals, continually challenging the generator to improve.

IRGAN's key innovation is bringing adversarial training into IR: through iterative rounds of the game, the generative and discriminative models compete with and strengthen each other. This training strategy not only improves performance but also makes the two models complementary, enhancing the robustness and generalisation of the retrieval system. The work is of substantial theoretical and practical value for understanding and improving modern retrieval systems, especially over large-scale text collections and complex query scenarios.
document identifier space) not their features, because our work here intends to select relevant documents from a given document pool. Note that it is feasible to generate new documents (features, such as the value of BM25) by IRGAN, but to stay focused, we leave it for future investigation.
Specifically, while keeping the discriminator $f_\phi(q, d)$ fixed after its maximisation in Eq. (1), we learn the generative model via performing its minimisation:
$$\begin{aligned}
\theta^* &= \arg\min_\theta \sum_{n=1}^{N} \Big( \mathbb{E}_{d \sim p_{\mathrm{true}}(d|q_n,r)} \big[ \log \sigma(f_\phi(d, q_n)) \big] + \mathbb{E}_{d \sim p_\theta(d|q_n,r)} \big[ \log\big(1 - \sigma(f_\phi(d, q_n))\big) \big] \Big) \\
&= \arg\max_\theta \sum_{n=1}^{N} \underbrace{\mathbb{E}_{d \sim p_\theta(d|q_n,r)} \big[ \log\big(1 + \exp(f_\phi(d, q_n))\big) \big]}_{\text{denoted as } J^G(q_n)} , \qquad (4)
\end{aligned}$$
where for each query $q_n$ we denote the objective function of the generator as $J^G(q_n)$¹. The second equality holds because the first expectation does not depend on $\theta$ and $\log(1 - \sigma(x)) = -\log(1 + \exp(x))$, so minimising the $\theta$-dependent term is equivalent to the stated maximisation.
As the sampling of $d$ is discrete, it cannot be directly optimised by gradient descent as in the original GAN formulation. A common approach is to use policy-gradient-based reinforcement learning (REINFORCE) [42, 44]. Its gradient is derived as follows:
$$\begin{aligned}
\nabla_\theta J^G(q_n) &= \nabla_\theta \, \mathbb{E}_{d \sim p_\theta(d|q_n,r)} \big[ \log(1 + \exp(f_\phi(d, q_n))) \big] \\
&= \sum_{i=1}^{M} \nabla_\theta \, p_\theta(d_i|q_n,r) \log(1 + \exp(f_\phi(d_i, q_n))) \\
&= \sum_{i=1}^{M} p_\theta(d_i|q_n,r) \, \nabla_\theta \log p_\theta(d_i|q_n,r) \log(1 + \exp(f_\phi(d_i, q_n))) \\
&= \mathbb{E}_{d \sim p_\theta(d|q_n,r)} \big[ \nabla_\theta \log p_\theta(d|q_n,r) \log(1 + \exp(f_\phi(d, q_n))) \big] \\
&\simeq \frac{1}{K} \sum_{k=1}^{K} \nabla_\theta \log p_\theta(d_k|q_n,r) \log(1 + \exp(f_\phi(d_k, q_n))) , \qquad (5)
\end{aligned}$$
where we perform a sampling approximation in the last step, in which $d_k$ is the $k$-th document sampled from the current version of the generator $p_\theta(d|q_n,r)$. In reinforcement learning terminology, the term $\log(1+\exp(f_\phi(d,q_n)))$ acts as the reward for the policy $p_\theta(d|q_n,r)$ taking an action $d$ in the environment $q_n$ [38].
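To make the sampled estimator in Eq. (5) concrete, here is a minimal numpy sketch, assuming the generator is parameterised as a softmax over scores of the $M$ candidate documents; the function and variable names are ours for illustration, not the authors' code:

```python
import numpy as np

def reinforce_gradient(gen_logits, disc_scores, K=5, rng=None):
    """Monte-Carlo estimate of the Eq. (5) gradient for one query q_n.

    gen_logits : (M,) generator scores; policy p_theta(d|q_n, r) = softmax(gen_logits)
    disc_scores: (M,) discriminator scores f_phi(d_i, q_n)
    Returns dJ^G(q_n)/d gen_logits under the softmax assumption.
    """
    gen_logits = np.asarray(gen_logits, dtype=float)
    disc_scores = np.asarray(disc_scores, dtype=float)
    if rng is None:
        rng = np.random.default_rng()

    # Softmax policy p_theta(d_i | q_n, r) over the candidate pool.
    p = np.exp(gen_logits - gen_logits.max())
    p /= p.sum()

    # Draw K documents from the current generator policy.
    sampled = rng.choice(len(p), size=K, p=p)

    # Reward from Eq. (5): log(1 + exp(f_phi(d_k, q_n))), computed stably.
    # (Footnote 1's alternative reward log(sigma(f)) could be used instead.)
    rewards = np.logaddexp(0.0, disc_scores[sampled])

    # For a softmax policy, grad_logits log p_theta(d_k) = one_hot(d_k) - p.
    grad = np.zeros_like(gen_logits)
    for k, r in zip(sampled, rewards):
        grad[k] += r
        grad -= p * r
    return grad / K
```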
In order to reduce variance during REINFORCE learning, we also replace the reward term $\log(1+\exp(f_\phi(d,q_n)))$ by its advantage function:

$$\log(1 + \exp(f_\phi(d, q_n))) - \mathbb{E}_{d \sim p_\theta(d|q_n,r)} \big[ \log(1 + \exp(f_\phi(d, q_n))) \big] ,$$

where the term $\mathbb{E}_{d \sim p_\theta(d|q_n,r)}[\log(1+\exp(f_\phi(d,q_n)))]$ acts as the baseline function in policy gradient [38].
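Under the same softmax assumption as the sketch above, the advantage substitution is a small change: compute the baseline as the expected reward under the current policy and subtract it from each sampled reward. The helper below (our names) computes the expectation exactly, which is cheap when the candidate pool is enumerable; a Monte-Carlo mean over the $K$ samples is the usual stand-in otherwise.

```python
import numpy as np

def advantage_rewards(p, disc_scores, sampled):
    """Advantages replacing the raw rewards in the Eq. (5) estimator.

    p          : (M,) current generator distribution p_theta(d | q_n, r)
    disc_scores: (M,) discriminator scores f_phi(d_i, q_n)
    sampled    : indices of the K documents sampled from p
    """
    reward_all = np.logaddexp(0.0, np.asarray(disc_scores, dtype=float))
    # Baseline b = E_{d ~ p_theta}[log(1 + exp(f_phi(d, q_n)))].
    baseline = float(np.asarray(p, dtype=float) @ reward_all)
    return reward_all[sampled] - baseline
```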
The overall logic of our proposed IRGAN solution is summarised in Algorithm 1. Before the adversarial training, the generator and discriminator can be initialised by their conventional models. Then, during the adversarial training stage, the generator and discriminator are trained alternately via Eqs. (5) and (3).
¹ Following [13], $\mathbb{E}_{d \sim p_\theta(d|q_n,r)}[\log(\sigma(f_\phi(d,q_n)))]$ is normally used instead for maximisation, which keeps the same fixed point but provides stronger gradients for the generative model.
Algorithm 1 Minimax Game for IR (a.k.a. IRGAN)
Input: generator $p_\theta(d|q,r)$; discriminator $f_\phi(q,d)$; training dataset $S = \{x\}$
1: Initialise $p_\theta(d|q,r)$, $f_\phi(q,d)$ with random weights $\theta$, $\phi$.
2: Pre-train $p_\theta(d|q,r)$, $f_\phi(q,d)$ using $S$
3: repeat
4:   for g-steps do
5:     $p_\theta(d|q,r)$ generates $K$ documents for each query $q$
6:     Update generator parameters via policy gradient Eq. (5)
7:   end for
8:   for d-steps do
9:     Use current $p_\theta(d|q,r)$ to generate negative examples and combine with given positive examples $S$
10:    Train discriminator $f_\phi(q,d)$ by Eq. (3)
11:  end for
12: until IRGAN converges
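The control flow of Algorithm 1 can be summarised by a short driver loop. The sketch below is ours, not the reference implementation: the generator and discriminator are assumed to be objects exposing hypothetical pretrain, sample, reward, reinforce_update, and train_on methods.

```python
def train_irgan(gen, disc, queries, positives, epochs=30,
                g_steps=1, d_steps=1, K=5):
    """Alternating minimax training in the shape of Algorithm 1.

    gen and disc are assumed to wrap p_theta(d|q, r) and f_phi(q, d);
    every method called on them is a hypothetical interface.
    """
    gen.pretrain(queries, positives)     # line 2: conventional pre-training
    disc.pretrain(queries, positives)
    for _ in range(epochs):              # stand-in for "until IRGAN converges"
        for _ in range(g_steps):         # g-steps
            for q in queries:
                docs = gen.sample(q, K)            # K docs from p_theta(d|q, r)
                rewards = disc.reward(q, docs)     # log(1 + exp(f_phi(d, q)))
                gen.reinforce_update(q, docs, rewards)  # policy gradient, Eq. (5)
        for _ in range(d_steps):         # d-steps
            for q in queries:
                negatives = gen.sample(q, K)       # generated negative examples
                disc.train_on(q, positives[q], negatives)  # Eq. (3)
    return gen, disc
```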
2.2 Extension to Pairwise Case
In many IR problems, it is common that the labelled training data available for learning to rank are not a set of relevant documents but a set of ordered document pairs for each query, as it is often easier to capture users' relative preference judgements on a pair of documents than their absolute relevance judgements on individual documents (e.g., from a search engine's click-through log) [19]. Furthermore, if we use graded relevance scales (indicating a varying degree of match between each document and the corresponding query) rather than binary relevance, the training data could also be represented naturally as ordered document pairs.
Here we show that our proposed IRGAN framework would also work in such a pairwise setting for learning to rank. For each query $q_n$, we have a set of labelled document pairs $R_n = \{\langle d_i, d_j \rangle \mid d_i \succ d_j\}$, where $d_i \succ d_j$ means that $d_i$ is more relevant to $q_n$ than $d_j$. As in Section 2.1, we let $p_\theta(d|q,r)$ and $f_\phi(q,d)$ denote the generative retrieval model and the discriminative retrieval model, respectively.
The generator $G$ would try to generate document pairs that are similar to those in $R_n$, i.e., with the correct ranking. The discriminator $D$ would try to distinguish such generated document pairs from the real document pairs. The probability that a document pair $\langle d_u, d_v \rangle$ is correctly ranked can be estimated by the discriminative retrieval model through a sigmoid function:
$$D(\langle d_u, d_v \rangle \,|\, q) = \sigma\big(f_\phi(d_u, q) - f_\phi(d_v, q)\big) = \frac{\exp\big(f_\phi(d_u, q) - f_\phi(d_v, q)\big)}{1 + \exp\big(f_\phi(d_u, q) - f_\phi(d_v, q)\big)} = \frac{1}{1 + \exp(-z)} , \qquad (6)$$
where $z = f_\phi(d_u, q) - f_\phi(d_v, q)$. Note that $-\log D(\langle d_u, d_v \rangle | q) = \log(1 + \exp(-z))$ is exactly the pairwise ranking loss function used by the learning-to-rank algorithm RankNet [3]. In addition to the logistic function $\log(1 + \exp(-z))$, it is possible to make use of other pairwise ranking loss functions [7], such as the hinge function $(1-z)_+$ (as used in Ranking SVM [16]) and the exponential function $\exp(-z)$ (as used in RankBoost [11]), to define the probability $D(\langle d_u, d_v \rangle | q)$.
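The three loss choices above differ only in how the score difference $z$ is penalised: a larger $z$ (the correct ordering) always costs less. A small sketch with our own helper names, assuming numpy arrays of discriminator scores:

```python
import numpy as np

def pair_prob(f_du, f_dv):
    """Eq. (6): D(<d_u, d_v> | q) = sigma(z), z = f(d_u, q) - f(d_v, q)."""
    z = np.asarray(f_du, dtype=float) - np.asarray(f_dv, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

def pairwise_loss(f_du, f_dv, kind="logistic"):
    """Pairwise ranking losses on the score difference z."""
    z = np.asarray(f_du, dtype=float) - np.asarray(f_dv, dtype=float)
    if kind == "logistic":      # RankNet: -log D = log(1 + exp(-z))
        return np.logaddexp(0.0, -z)
    if kind == "hinge":         # Ranking SVM: (1 - z)_+
        return np.maximum(0.0, 1.0 - z)
    if kind == "exponential":   # RankBoost: exp(-z)
        return np.exp(-z)
    raise ValueError(f"unknown loss kind: {kind}")
```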
If we use the standard cross entropy cost for this binary classifier as before, we have the following minimax game:
$$J^{G^*, D^*} = \min_\theta \max_\phi \sum_{n=1}^{N} \Big( \mathbb{E}_{o \sim p_{\mathrm{true}}(o|q_n)} \big[ \log D(o|q_n) \big] + \mathbb{E}_{o' \sim p_\theta(o'|q_n)} \big[ \log\big(1 - D(o'|q_n)\big) \big] \Big) , \qquad (7)$$
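For illustration, one query's term in Eq. (7) can be estimated by Monte-Carlo sampling: real pairs drawn from $R_n$ and generated pairs drawn from $p_\theta$. The helper below is a hedged sketch with assumed inputs, not the authors' code:

```python
import numpy as np

def pairwise_game_value(real_pairs, fake_pairs, f):
    """Monte-Carlo estimate of one query's term in Eq. (7).

    real_pairs : (d_u, d_v) tuples drawn from p_true(o | q_n), i.e. R_n
    fake_pairs : (d_u, d_v) tuples drawn from the generator p_theta(o' | q_n)
    f          : callable giving the discriminator score f_phi(d, q_n)
    """
    def D(du, dv):                                    # Eq. (6)
        return 1.0 / (1.0 + np.exp(-(f(du) - f(dv))))

    real = np.mean([np.log(D(du, dv)) for du, dv in real_pairs])
    fake = np.mean([np.log(1.0 - D(du, dv)) for du, dv in fake_pairs])
    return real + fake  # discriminator ascends this; generator descends it
```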