Location-Aware Moving Top-k Pub/Sub System: An Empirical Study of Lamps in Geolocation Applications


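This document, titled "2-丁天琛-Lamps_Location-Aware_Moving_Top-k_PubSub1", covers the design of a novel information delivery system, Lamps (Location-Aware Moving Top-k Pub/Sub). With the explosive growth of geo-tagged social media content (such as tweets), demand for applications such as location-based recommendation and targeted search has grown as well. These applications often need to support mobile users who subscribe to the top-k spatio-textual items (such as advertisements) relevant to their current location.

The key to Lamps is its subscription service for mobile users, which lets them receive, in real time and while moving, the top-k most relevant or popular content for their location. This involves real-time location tracking, data processing, and efficient information filtering. The core challenge is to quickly locate and push the most attractive content from large-scale spatio-temporal data while accounting for user movement and changing interests. The authors, Shunya Nishio and Daichi Amagata (both IEEE members) and Takahiro Hara (IEEE senior member), propose a novel solution, which may involve the following key technical points:

1. **Real-time location awareness**: The system needs to track the user's movement trajectory in real time, possibly via GPS or another positioning technology, so that information is delivered promptly.
2. **Moving subscription model**: Users can dynamically adjust the subscription range according to their speed and direction of movement to obtain more precise results.
3. **Interest model**: The user's historical behavior and preferences are combined into a personalized interest model, so that the pushed top-k content matches the user's current needs.
4. **Efficient data processing and indexing**: Efficient algorithms and data structures are designed to process and store large volumes of geo-tagged data for fast retrieval and ranking.
5. **Mobile and edge computing**: The computing power of mobile devices is used, or data is processed close to the user, to reduce network latency and bandwidth consumption.
6. **Dynamic updates and content distribution**: When the user's location changes or new content is published, the system can quickly update and recompute the results to keep the information fresh.
7. **Privacy protection**: While providing personalized service, the system keeps users' location information reasonably protected and compliant with applicable laws and regulations.
8. **Scalability and fault tolerance**: The design should scale to larger user populations and more complex data streams, and should handle network failures or service interruptions gracefully.

The work is published in IEEE Transactions on Knowledge and Data Engineering. It has been accepted for publication in a future issue but has not yet undergone final editing. The article's DOI is 10.1109/TKDE.2020.2979176. The work is relevant to research on real-time mobile recommender systems.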

1041-4347 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TKDE.2020.2979176, IEEE Transactions on Knowledge and Data Engineering

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 14, NO. 8, AUGUST 2015 3

Given an object $o$ and a subscription $s$, the score of $o$ for $s$, $\mathrm{score}(s, o)$, is calculated as follows [10], [13]:

$$\mathrm{score}(s, o) = s.\alpha \cdot \mathrm{dist}(s.p, o.p) + (1 - s.\alpha) \cdot \mathrm{text}(s.t, o.t) \tag{1}$$

where $\mathrm{dist}(s.p, o.p)$ is the spatial proximity and $\mathrm{text}(s.t, o.t)$ is the textual similarity between $s$ and $o$.

Given an object set $O$, the answer of $s$ is a subset of $O$, $A$, such that (1) $|A| = k$, and (2) $\forall o^* \in A, \forall o' \in O \setminus A$, $\mathrm{score}(s, o^*) \le \mathrm{score}(s, o')$. In the remainder of this paper, we abbreviate $s.k$ and $s.\alpha$ as $k$ and $\alpha$, respectively, if there is no ambiguity.

To compute spatial proximity, we utilize the Euclidean distance $\mathrm{dist}(s.p, o.p) = \frac{\mathrm{Edist}(s.p, o.p)}{\mathrm{Maxdist}}$, as with [3], [4], [5], [10], [14], [15], [16]. $\mathrm{Edist}(s.p, o.p)$ is the Euclidean distance between $s.p$ and $o.p$, and $\mathrm{Maxdist}$ is the maximum distance in the spatial area $R$ where subscriptions and objects exist.

We see that $\mathrm{dist}(\cdot, \cdot) \in [0, 1]$. For textual similarity, there are three widely used set-based similarity functions, namely Jaccard, Dice, and Cosine similarities [18]. Lamps can employ any similarity function that uniquely determines the textual similarity for a given subscription and object. In this paper, we use Jaccard similarity as the default function, because it is commonly used in set similarity search [19], i.e., $\mathrm{text}(s.t, o.t) = 1 - \frac{|s.t \cap o.t|}{|s.t \cup o.t|}$. Note that in Section 7.3 we verify the performance of Lamps when Dice and Cosine similarities are employed. Then, $\mathrm{score}(s, o)$ is within $[0, 1]$.
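As a minimal sketch of Equation (1) with the default Jaccard-based textual term, the following Python functions compute the score for points given as coordinate tuples and keywords given as sets. The function names, the argument layout, and the handling of two empty keyword sets are illustrative assumptions, not part of the paper.

```python
import math

def normalized_dist(p, q, max_dist):
    """Euclidean distance between points p and q, scaled into [0, 1] by max_dist."""
    return math.dist(p, q) / max_dist

def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for keyword sets a and b (0 means identical sets)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty keyword sets are treated as identical
    return 1.0 - len(a & b) / len(union)

def score(s_p, s_t, alpha, o_p, o_t, max_dist):
    """Equation (1): a weighted sum of two distances, so a lower score is better."""
    spatial = normalized_dist(s_p, o_p, max_dist)
    textual = jaccard_distance(s_t, o_t)
    return alpha * spatial + (1 - alpha) * textual
```

Because both terms lie in [0, 1] and the weights sum to 1, the result also lies in [0, 1], and a smaller value means a better match.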

Problem statement. We consider a set $S$ of MkST subscriptions and a dynamic set $O$ of spatio-textual objects. Subscribers can move randomly at any time (i.e., movements of users) [4]. Publishers can insert their newly generated objects into $O$ (i.e., object generation) and remove objects that they have generated from $O$ (i.e., object expiration). We aim to continuously monitor the top-k results for all subscriptions against object generation, object expiration, and movements of users.
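To make the three event types concrete, here is a brute-force baseline sketch that recomputes the top-k from scratch whenever it is queried. It is not the Lamps algorithm; it only illustrates the problem interface. The class name, the method names, and the `score_fn(position, obj)` callback are hypothetical, and the keyword-overlap requirement on top-k results is omitted for brevity.

```python
import heapq

class BruteForceMonitor:
    """Recomputes the top-k on every query (a naive baseline, not Lamps)."""

    def __init__(self, k, score_fn):
        self.k = k
        self.score_fn = score_fn  # score_fn(position, obj) -> float, lower is better
        self.objects = {}         # object id -> object

    def insert(self, oid, obj):
        """Object generation: a publisher adds a new object."""
        self.objects[oid] = obj

    def expire(self, oid):
        """Object expiration: a publisher removes one of its objects."""
        self.objects.pop(oid, None)

    def top_k(self, position):
        """Top-k ids at the subscriber's current position (ties broken by id)."""
        ranked = heapq.nsmallest(
            self.k,
            self.objects.items(),
            key=lambda item: (self.score_fn(position, item[1]), item[0]),
        )
        return [oid for oid, _ in ranked]
```

A movement of the user corresponds to calling `top_k` with the new position. Every call scans all objects, which is the cost that safe-region techniques aim to avoid.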

Example 2. Fig. 2 shows a running example used throughout this paper. In this example, there are ten registered subscriptions $\{s_1, \cdots, s_{10}\}$ and eight objects have been generated $\{o_1, \cdots, o_8\}$. Moreover, two objects o and o are newly generated, and we assume s.k = 2. Specifically, o and o have high spatial and textual similarities to s. Thus, the top-k results of s are o and

2.2 Safe region technique

Lamps employs the concept of the safe region [12] to monitor top-k results. Therefore, we first define the safe region and a related concept, the dominant region.

1. As with [6], [13], to guarantee that the top-k results are textually relevant, an object must contain at least one keyword in common with a subscription to become one of its top-k results.

2. The works in [11], [17] utilized the road network distance to compute spatial proximity. Because the computation cost of measuring road network distance is higher than that of measuring Euclidean distance, we utilize the Euclidean distance to monitor the top-k results of more subscriptions. However, Lamps can also employ the road network distance. In Section 6, we discuss the modifications needed when the road network distance is used.

[Figure 2 omitted. Recoverable labels: "Keywords" (appearing twice); legend entries "subscription", "object", and "Newly generated".]

Fig. 2: Running example. Eight objects $\{o_1, \cdots, o_8\}$ have been generated and two objects o and o are newly generated.

Definition 3 (Safe region). Given $O$, $s = (p, t, k, \alpha)$, and $A$ of $s$, the safe region of $s$, $R$, is:

$$R = \{p' \mid \forall o^* \in A, \forall o' \in O \setminus A, \mathrm{score}(s', o^*) \le \mathrm{score}(s', o')\}$$

where $s'$ is $s$ after moving to a new location $p'$, i.e., $s' = (p', t, k, \alpha)$. Note that the safe region of $s$ is a region where the top-k result of $s$ does not change.

Definition 4 (Dominant region). Given $s = (p, t, k, \alpha)$ and two objects $o^*$ and $o'$, the dominant region of $o^*$ to $o'$, $D_{o^*, o'}$, is:

$$D_{o^*, o'} = \{p' \mid \mathrm{score}(s', o^*) \le \mathrm{score}(s', o')\}$$

where $s'$ is $s$ after moving to a new location $p'$, i.e., $s' = (p', t, k, \alpha)$. We see that the dominant region of $o^*$ to $o'$ is a region where $\mathrm{score}(s, o^*) \le \mathrm{score}(s, o')$. The following lemma is stated and proved in [10].

Lemma 1. Given $O$, $s$, and $A$, the safe region of $s$, $R$, is $R = \bigcap_{o^* \in A} \left( \bigcap_{o' \in O \setminus A} D_{o^*, o'} \right)$.
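The lemma suggests a direct, if expensive, membership test: a candidate location lies in the safe region exactly when it lies in every dominant region. The sketch below assumes each object is a dictionary with a position `"p"` and a precomputed textual term `"text"` (valid because the textual term does not depend on the subscriber's location). All names are illustrative assumptions.

```python
import math

def score(p, alpha, obj, max_dist):
    """Equation (1) with the textual term precomputed in obj["text"]."""
    return alpha * math.dist(p, obj["p"]) / max_dist + (1 - alpha) * obj["text"]

def in_dominant_region(p_new, alpha, o_star, o_prime, max_dist):
    """Definition 4: at p_new, o_star scores no worse than o_prime."""
    return score(p_new, alpha, o_star, max_dist) <= score(p_new, alpha, o_prime, max_dist)

def in_safe_region(p_new, alpha, answer, others, max_dist):
    """Lemma 1: p_new must lie in every dominant region of o* in A over o' in O \\ A."""
    return all(
        in_dominant_region(p_new, alpha, o_star, o_prime, max_dist)
        for o_star in answer
        for o_prime in others
    )
```

This check costs |A| · |O \ A| score evaluations per candidate location, which motivates cheaper approximations such as the local safe region.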

Local safe region. To efficiently compute a safe region, [10] proposed a local safe region (LSR), which is a subset of the safe region. Notice that we can monitor the exact top-k results by re-evaluating them only when users exit their LSR. [10] models each object $o$ as a circle $C_o$ with center $o.p$ and radius $r_o = \frac{1 - \alpha}{\alpha} \cdot \mathrm{text}(s.t, o.t)$. Then, $\mathrm{score}(s, o) = \alpha(\mathrm{dist}(s.p, o.p) + r_o)$. The LSR is computed in the following three steps.

Step 1. For each object $o^* \in A$, we compute an ellipse $E_{o^*}$ and the intersection of these ellipses, $E = \bigcap_{o^* \in A} E_{o^*}$. Note that

$$E_{o^*} = \{p' \mid \mathrm{dist}(s', o^*.p) + \mathrm{dist}(s.p, s') \le \gamma - r_{o^*}\}, \tag{2}$$

where $\gamma = \max\{\mathrm{dist}(s.p, o^*.p) + r_{o^*} \mid o^* \in A\} + \Delta$ and $\Delta$ is a parameter of an approximate ratio. Obviously, $E_{o^*}$ is an ellipse with $s.p$ and $o^*.p$ as its two foci.

Example 3. In Fig. 3, because s.k = 2 and the top-k results of s are o and o, E is the intersection of E and E
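Step 1 can be sketched as a point-in-ellipse test. The code below assumes the object radius is (1 − α)/α · text(s.t, o.t), that each object is a dictionary with a position `"p"` and a precomputed textual term `"text"`, that α > 0, and that Δ ≥ 0. The function names and data layout are illustrative, not taken from the paper.

```python
import math

def radius(alpha, text_term):
    """Object radius (1 - alpha) / alpha * text(s.t, o.t); requires alpha > 0."""
    return (1 - alpha) / alpha * text_term

def gamma(s_p, alpha, answer, max_dist, delta):
    """Largest dist(s.p, o*.p) + radius over the answer A, plus the slack delta."""
    return max(
        math.dist(s_p, o["p"]) / max_dist + radius(alpha, o["text"])
        for o in answer
    ) + delta

def in_ellipse(p_new, s_p, alpha, o_star, g, max_dist):
    """Equation (2): p_new lies in the ellipse with foci s.p and o*.p."""
    lhs = (math.dist(p_new, o_star["p"]) + math.dist(s_p, p_new)) / max_dist
    return lhs <= g - radius(alpha, o_star["text"])

def in_step1_region(p_new, s_p, alpha, answer, max_dist, delta):
    """E is the intersection of the ellipses of all objects in the answer A."""
    g = gamma(s_p, alpha, answer, max_dist, delta)
    return all(in_ellipse(p_new, s_p, alpha, o, g, max_dist) for o in answer)
```

With Δ ≥ 0, the subscriber's current location s.p always satisfies Equation (2) for every answer object, since γ is at least dist(s.p, o*.p) plus that object's radius. A larger Δ widens every ellipse.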

Step 2. Let $C$ be a circle centered at $s.p$ with radius $\gamma$. When $s' \in E$, each object $o$ does not become the top-k result of $s'$ if $C_o$ is not inside $C$. If $C_o$ is not inside $C$, $\gamma \le \mathrm{dist}(s.p, o.p) + r_o$. Then, $\mathrm{score}(s', o^*) < \mathrm{score}(s', o)$ for
