Fighting OSN Spam with Templates: Anti-Spam Strategies for Online Social Networks

Research paper


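"Beating the Artificial Chaos: Fighting OSN Spam Using Its Own Templates" is a research paper on spam in online social networks (OSNs) and on how template identification can be used against it. As the number of Internet users has grown, OSNs such as Facebook and Twitter have become extremely popular, but they have also become a breeding ground for spam. Because this spam appears to come from friends or acquaintances, it degrades the user experience and can harm users with weak security awareness. Earlier anti-spam approaches defend from different angles, but because spam is so diverse, no single method can detect most OSN spam on its own.

Through empirical analysis of large-scale OSN spam collections, the authors find that a large share of the spam (76.4% in January 2015, for example) is generated from an underlying template. This finding suggests a new way to fight OSN spam, and the paper builds on it to propose an OSN spam filtering system named Tangram. Tangram inspects the stream of user-generated messages in real time. It extracts the templates of spam detected by existing methods and uses those templates to identify and filter newly appearing, similar spam. The advantage of this approach is that it can adapt to changing spam patterns and improve detection efficiency.

Tangram's workflow may include the following key steps:

1. **Template extraction**: when spam is detected, the system analyzes its textual structure and extracts the template used to generate it.
2. **Template library construction**: extracted templates are stored in a template library as a reference for later identification.
3. **Real-time monitoring**: new user messages are scanned in real time and matched against the templates in the library.
4. **Dynamic updating**: over time, the system keeps learning and updating the template library to keep up with new spam strategies.
5. **False positive and false negative control**: to reduce false positives (legitimate messages flagged as spam) and false negatives (spam that goes undetected), the system may use machine learning to tune the template matching rules and adjust them based on user feedback.

Tangram offers a novel and practical approach to the OSN spam problem: it turns a property of the spam itself against it, improving both the accuracy and the efficiency of filtering. The work matters for the user experience and the overall security of online social networks.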


3858 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 24, NO. 6, DECEMBER 2016

TABLE I

RETROFITTED SAMPLE SPAM FROM A TEMPLATE-BASED CAMPAIGN

an eye-catching action, and a URL. Each component has one or more choices of textual content. The number of unique spam messages that this template can potentially generate, therefore, increases quickly with the number of components.
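The growth described above is multiplicative: if each component is chosen independently, the number of distinct messages is the product of the per-component choice counts. The following is a minimal sketch of that arithmetic; the component names and strings are invented for illustration and are not taken from the paper's Table I.

```python
from math import prod

# Hypothetical template: each component lists its possible textual choices.
# Component names and contents are illustrative assumptions only.
template = {
    "greeting": ["Hey", "Hi", "OMG"],
    "action": ["check this out", "you have to see this", "look at this"],
    "url": ["http://example.com/a", "http://example.com/b"],
}


def count_unique_messages(components):
    """Return the product of the number of choices in each component."""
    return prod(len(choices) for choices in components.values())


print(count_unique_messages(template))  # 3 * 3 * 2 = 18

# Adding one more component with 4 choices multiplies the total by 4.
extended = dict(template, hashtag=["#fun", "#lol", "#wow", "#news"])
print(count_unique_messages(extended))  # 18 * 4 = 72
```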

We formally model the true spam template as a macro sequence $(m_1, \ldots, m_k)$. We define two types of macros: dictionary macros and noise macros. At the time of spam generation, a dictionary macro picks the textual content from a pre-defined list of choices. It is possible for a dictionary macro to have only one choice. In this case, the macro reduces to an invariant substring that all generated messages will contain. In comparison, we abstract any macro that does not convey any semantic meaning, but purely increases the message diversity or increases the chance of exposing the spam to more users, as a noise macro. The concatenation of the instantiation of macros constitutes a spam message.
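As a rough illustration of this model, the sketch below instantiates a template by concatenating the output of each macro in order. The helper names (`dictionary_macro`, `noise_macro`, `generate`) and the sample strings are hypothetical, and modelling noise as random lowercase filler is an assumption made only to keep the example short.

```python
import random
import string


def dictionary_macro(choices):
    """Return a macro that picks one entry from a pre-defined list."""
    return lambda rng: rng.choice(choices)


def noise_macro(max_len=6):
    """Return a macro that emits meaningless filler (assumed: random letters)."""
    def instantiate(rng):
        length = rng.randint(0, max_len)
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return instantiate


def generate(template, rng):
    """Concatenate the instantiation of every macro, in sequence order."""
    return "".join(macro(rng) for macro in template)


# Hypothetical template (m_1, ..., m_5); contents are invented for illustration.
template = [
    dictionary_macro(["Hey ", "Hi "]),
    dictionary_macro(["check this out: "]),  # single choice: an invariant substring
    dictionary_macro(["http://example.com/a", "http://example.com/b"]),
    dictionary_macro([" #"]),
    noise_macro(),
]

rng = random.Random(0)  # seeded so repeated runs give the same messages
for _ in range(3):
    print(generate(template, rng))
```

Every generated message contains the single-choice macro's text, which is the invariant-substring case described above; replacing that macro with a multi-choice one yields a template with no invariant substring.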

We assume that a template shall contain at least one dictionary macro, while it may or may not contain any noise macro. However, we do not assume the existence of any invariant substring. Written in human language, a spam message is not restricted to any particular expression to present a semantic meaning. We have also observed spam templates without any invariant substring in our data. Template generation work [10], [11] that relies on invariant substrings.

Paraphrase Category: Consists of spam tweets that share the same semantic meaning but cannot be uniformly divided into semantically equivalent segments. Meanwhile, the tweets do not share regular wording. We denote them as “paraphrase” spam.

No-Content Category: Does not contain any semantically meaningful sentence. Tweets in this category contain only one URL, followed by a long list of popular keywords and hashtags. Obviously, the spammers rely on the keywords and hashtags to increase the chance to expose the URLs to users when they browse tweets by topics.

Other Spam: Consists of the remaining spam that we have not systematically categorized.

C. Template-Based Spam Keeps Dominating

We further categorize the spam generated in January, 2012 and October, 2014 in the same way. Table II provides the popularity of four spam categories in June/July, 2011, January, 2012, October, 2014, and January, 2015. Template-based spam remains the most popular category in 2012, 2014 and 2015, with its percentage increasing to 68.3%, 78.1%, and 76.4%, respectively. The no-content category almost vanishes. Its percentage dramatically drops to 0.3%, 0.2%, and 0.3%, respectively. It is possible that the no-content category exhibits strong patterns and can be easily blocked. The increasingly popular template-based spam indicates that our detection method, with its focus on spam template generation, is effective at combating modern OSN spam.

TABLE II

THE POPULARITY OF FOUR SPAM CATEGORIES IN JUNE/JULY, 2011, JANUARY, 2012, OCTOBER, 2014, AND JANUARY, 2015, RESPECTIVELY

Fig. 1. Tangram framework: The template generation and matching overview.

III. TANGRAM: TEMPLATE-BASED SPAM DETECTION SYSTEM

In this section, we present Tangram, an accurate and fast template-based spam detection system. We first formulate the notions of template, template matching and template generation. Next, we detail the online Tangram system.

A. System Design Overview

Tangram builds template-based spam detection on top of existing detection methods toward higher accuracy and speed. It generates the underlying templates of spam detected by various existing methods. It then uses the templates to match and detect spam accurately and quickly. Figure 1 depicts the Tangram workflow. It takes a stream of raw messages as input, and classifies them online as either spam or legitimate. After the classification, spam is filtered, while legitimate messages pass through. Two components can classify messages: the template matching module and the auxiliary spam filter. The template matching module, along with the template generation technique, is our major contribution. The auxiliary spam filter, on the other hand, supplies training spam messages. It can be any deployed spam filter, e.g., a blacklist spam filter.
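The workflow above can be sketched roughly as follows. This is an illustrative outline under several assumptions, not the authors' implementation: the class name `TangramSketch`, the domain blacklist, and `toy_generator` are all hypothetical, and the generator is a hard-coded stand-in because the template generation algorithm is not described in this section. Full-message matching with `fullmatch` is also an assumption.

```python
import re


class TangramSketch:
    """Toy pipeline: template matching first, auxiliary filter as fallback."""

    def __init__(self, auxiliary_filter, generate_templates):
        self.auxiliary_filter = auxiliary_filter      # callable: message -> bool
        self.generate_templates = generate_templates  # callable: spam list -> regex strings
        self.templates = []      # compiled template regexes
        self.training_spam = []  # spam supplied by the auxiliary filter

    def classify(self, message):
        # Component 1: the template matching module.
        if any(t.fullmatch(message) for t in self.templates):
            return "spam"
        # Component 2: the auxiliary filter, which also supplies training spam.
        if self.auxiliary_filter(message):
            self.training_spam.append(message)
            return "spam"
        return "legitimate"

    def refresh_templates(self):
        # Regenerate templates from the spam collected so far.
        patterns = self.generate_templates(self.training_spam)
        self.templates = [re.compile(p) for p in patterns]


BLACKLIST = {"bad.example"}  # hypothetical blacklisted domain


def blacklist_filter(message):
    return any(domain in message for domain in BLACKLIST)


def toy_generator(spam):
    # Stand-in only: a real generator would infer the template from `spam`.
    return [r"(Hey|Hi) check this out: http://\S+"] if spam else []


system = TangramSketch(blacklist_filter, toy_generator)
print(system.classify("Hey check this out: http://bad.example/x"))  # spam (auxiliary filter)
system.refresh_templates()
print(system.classify("Hi check this out: http://new.example/y"))   # spam (template match)
print(system.classify("Lunch at noon?"))                            # legitimate
```

The second message uses a URL the blacklist has never seen, yet it is still caught because it instantiates the template derived from the first one.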

Template Matching and Template Generation: We define a template to be a sequence of macros of two types, dictionary and noise (Section II-B). We represent a dictionary macro as a set of values separated by “|” and a noise macro as “.*”. Thus, templates produced by Tangram are naturally encoded as regular expressions, specifically concatenations of “|” clauses and “.*”s. Template matching matches a given message against the corresponding regular expression. A successful template match implies the tested message instantiates the template, and should be flagged as spam. We define template generation
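The encoding of templates as regular expressions can be illustrated with a short sketch. The function `template_to_regex`, the `NOISE` sentinel, and the sample template are assumptions made for this example; escaping the dictionary values and matching the whole message with `fullmatch` are likewise design choices of the sketch rather than details confirmed by the text.

```python
import re

NOISE = object()  # sentinel standing for a noise macro


def template_to_regex(macros):
    """Encode a macro sequence as a concatenation of "|" clauses and ".*"s."""
    parts = []
    for macro in macros:
        if macro is NOISE:
            parts.append(".*")
        else:
            # Dictionary macro: escape each value, then join with "|".
            alternatives = "|".join(re.escape(value) for value in macro)
            parts.append("(?:" + alternatives + ")")
    return re.compile("".join(parts), re.DOTALL)


# Hypothetical template: three dictionary macros followed by one noise macro.
template = [
    ["Hey", "Hi"],
    [" check this out: "],
    ["http://example.com/a", "http://example.com/b"],
    NOISE,
]

pattern = template_to_regex(template)
print(pattern.pattern)

# A message that instantiates the template is flagged as spam.
print(bool(pattern.fullmatch("Hi check this out: http://example.com/b #fun #lol")))  # True
print(bool(pattern.fullmatch("Lunch at noon?")))                                     # False
```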
