Types of Cyberbullying in Anonymous Messaging Applications


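Track: The Third International Workshop on Cybersafety, Online Harassment, and Misinformation. WWW 2018, April 23-27, 2018, Lyon, France

Understanding Types of Cyberbullying in an Anonymous Messaging Application

Arpita Chakraborty, Yue Zhang, and Arti Ramesh
SUNY Binghamton
{achakra4, yzhan202, artir}@binghamton.edu

ABSTRACT
The possibility of anonymity and the lack of effective ways to identify inappropriate messages have resulted in a significant amount of online interaction data that attempts to harass, bully, or offend the recipient. In this work, we perform a preliminary linguistic study of messages exchanged using one such popular web/smartphone application, Sarahah, which allows friends to exchange messages anonymously. Since messages exchanged via Sarahah are private, we collect them when the recipient shares them on Twitter. We then analyze the different kinds of messages exchanged through this application. Our linguistic analysis reveals that around 20% of these messages contain inappropriate, hurtful, or profane language intended to embarrass, offend, or bully the recipient. Our analysis helps in understanding the different ways in which anonymous message exchange platforms are used and the different types of bullying present in such exchanges.

KEYWORDS
Cyberbullying types, topic models, social media analysis, anonymous message exchange

ACM Reference Format:
Arpita Chakraborty, Yue Zhang, and Arti Ramesh. 2018. Understanding Types of Cyberbullying in an Anonymous Messaging Application. In WWW '18 Companion: The 2018 Web Conference Companion, April 23-27, 2018, Lyon, France. ACM, New York, USA, 5 pages. https://doi.org/10.1145/3184558.3191530

This paper is published under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution. WWW '18 Companion, April 23-27, 2018, Lyon, France. © 2018 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC BY 4.0 License. ACM ISBN 978-1-4503-5640-4/18/04. https://doi.org/10.1145/3184558.3191530

1 INTRODUCTION
Online interactions have grown at a staggering rate in recent years. A significant portion of online interaction data is text, such as posts, messages, and comments on social networks. As online interactions have become more prevalent, concern about the nature of this content has grown correspondingly. Textual interactions that exhibit disturbing and negative phenomena such as online harassment, cyberbullying, cyberthreats, stalking, and hate are on the rise [2]. Such harmful online behavior can cause serious psychological problems for the individuals who experience it [25]. Moreover, online data tends to be preserved digitally for a long time, which exacerbates its impact.

Anonymity has been shown to be a contributing factor to online harassment and bullying. Previous studies of Ask.fm and Yik-Yak show that the possibility of anonymity significantly increases the volume of cyberbullying messages [5, 12]. In this work, we focus on an anonymous message exchange application that topped the App Store download charts between July and September 2017: Sarahah. The Sarahah mobile/web application can be added to other social networks such as Twitter and Facebook, allowing users to send anonymous messages to friends in their network. Although the app was originally designed as a platform for exchanging anonymous messages, it quickly became a breeding ground for hate [3]. Most previous work on cyberbullying has been conducted in settings such as Ask.fm and Youtube, where the people exchanging bullying or hateful comments do not necessarily know each other personally. Our analysis highlights the amount of negative content in messages exchanged between "friends", which makes these incidents more personal than other forms of bullying.

In this work, we perform a preliminary study of the types of messages exchanged through the Sarahah app, as shared on the Twitter social network. Since we do not have direct access to this data through Sarahah, we collect it from Twitter when recipients share Sarahah messages on their Twitter feeds, sometimes with a short response to the message. Our dataset contains messages exchanged between August and October 2017, the period in which the app was most popular. While the sender is unknown in this setting, the recipient is known, since he/she shared the message on Twitter. Because only a fraction of the messages exchanged via Sarahah are likely to be shared by recipients on Twitter, this data may not be representative of all messages exchanged through Sarahah, and it may also be biased by our use of the Twitter search API for collection [18]. Despite these possible biases, the unique characteristics of this data (anonymous senders, exchanges between friends on a social network, and recipients' responses) make it an important source of information for understanding the different kinds of bullying present in online interactions.

We use topic modeling, specifically latent Dirichlet allocation (LDA), to perform a linguistic analysis of the different types of conversations and bullying topic categories present in this data. We first identify bullying-related LDA topics from the top words in each topic. By filtering Sarahah messages using the bullying-related LDA topics, we observe that around 20% of these messages fall into bullying topic categories. We also observe that most bullying incidents occur when anonymous senders share specific opinions/confessions about their true feelings toward the recipient or ask embarrassing personal questions. Our analysis paves the way for understanding the different types of conversations and bullying topic categories present in anonymous message exchanges.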
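2 RELATED WORK
Detecting and understanding bullying on social media has attracted considerable interest in recent years. Corcoran et al. [10] and Hosseinmardi et al. [13] focus on the broader problem of cyberaggression rather than only cyberbullying. Hosseinmardi et al. [13] identify cyberaggression from the comments on Instagram media sessions that contain at least one profane word. Raisi et al. [23, 24] propose a participant-vocabulary consistency model that identifies the instigators and victims of bullying in a social network while simultaneously building a bullying vocabulary from a corpus of social interactions and a seed dictionary of bullying indicators. They evaluate the model on data from Twitter and Ask.fm and show that it can detect new bullying vocabulary as well as victims and bullies. Several works combine social interaction features with textual features to detect cyberbullying [12, 14, 19]. Bigelow et al. [8] use latent semantic indexing to detect cyberbullying. There is also prior work on detecting abuse and hate speech directed at specific groups, defined by race, origin, religion, gender, sexual orientation, physical appearance, and so on [11, 28]. Dinakar et al. [11] show that, on a dataset of Youtube comments, individual classifiers handle this problem better than a multi-class classifier. Li et al. [15] analyze the negative and positive senses of words on the Instagram and Ask.fm networks. Margono et al. [17] analyze bullying patterns on Indonesian Twitter. Whittaker et al. [29] study the prevalence of cyberbullying among college students. Bellmore et al. [6] and Tokunaga et al. [27] study psychosocial issues in social media data.

There is also prior work that does not focus specifically on cyberbullying or abusive language but addresses similar problems. Nguyen et al. [20] study the k-suspector problem of identifying the k most suspicious users given a set of misinformed victims in an online social network. Mahendiran et al. [16] propose a novel unsupervised learning algorithm that uses probabilistic soft logic to build a dynamic vocabulary, in order to understand true membership within social groups and to capture dynamic trends and forecasts related to election campaigns. Bifet et al. [7] propose a sliding-window Kappa statistic for mining opinions and analyzing sentiment in evolving Twitter data streams.

3 DATA
In this work, we focus on data from a recent anonymous mobile application, Sarahah. It entered the US Apple App Store on June 13, 2017 and gradually spread to Canada, India, and several other countries. Its popularity rose sharply after July 5, 2017, when Snapchat released an update that allowed people to share Sarahah messages. It gradually became the top-rated application in the App Store, surpassing popular social media applications such as Twitter, Facebook, and Snapchat. Although the application was originally meant for exchanging anonymous messages, it quickly became a breeding ground and platform for hate and cyberbullying as users started posting threatening, hurtful, profane, and pornographic messages.

In this paper, we present an analysis of data collected from August 30, 2017 to October 15, 2017, which happens to be the period in which Sarahah was most popular. We collect messages exchanged via Sarahah by using the Twitter search API [1] to search for images with the #Sarahah hashtag. Since Twitter only allows extraction of tweets posted within the most recent week, we run the collection every week during the specified period. We also extract Sarahah messages by crawling the Twitter accounts of specific users. Figure 1(a) shows an example of a message exchanged via Sarahah. Since messages exchanged using Sarahah are in image form, we use Google's optical character recognition software to extract the text from the images. The extracted data has three components: i) the text message exchanged using Sarahah, ii) the user's reaction to the message when sharing it on Twitter, and iii) other user-related information extracted from the user's profile.

[Figure 1: Example message from Sarahah (left) and statistics of the different languages in our data (right). (a) Example Sarahah message and corresponding user reaction. (b) Distribution of different languages in our Sarahah dataset, plotted by frequency.]

Table 1: Example messages exchanged via Sarahah and corresponding user reactions.
Message | Reaction
How about we s∗x chat - can't wait to make love to you. | Honey, it's not free.
Can we meet and have random s∗x? | Only if you were the last man on earth.
Why are you so ugly? | Because your mom made me this way.

Since Sarahah messages usually come from friends, they are often written in languages other than English. In our extracted dataset, we find messages in German, French, Afrikaans, Swahili, Indonesian, Finnish, Somali, Czech, Tagalog, English, Romanian, Moldovan, Croatian, Spanish, Norwegian, Latvian, Welsh, Portuguese, Catalan, Danish, Swedish, Estonian, Dutch, Slovenian, Italian, Albanian, Hungarian, Polish, Slovak, Turkish, Lithuanian, Vietnamese, Hindi, and Tamil. English remains the most popular language in our dataset. We report the distribution of messages for the top 50% of languages other than English in Figure 1(b). We use Google's language detection library langdetect to detect the language and convert the messages and users' responses to English for our analysis [21]. Since we follow a bag-of-words approach, translation errors have little effect on our analysis, as they tend to occur mostly in sentence construction.

In all, we collect 82,193 Sarahah messages and corresponding user reactions. After removing duplicates and empty messages, we have 76,278 messages and corresponding user responses in our dataset. We perform standard NLP preprocessing on the messages and responses: stop-word removal, tokenization, and stemming using the Porter stemmer.

Table 1 and Figure 1(a) give some examples of messages exchanged using Sarahah and the corresponding user reactions. The first message is sexually abusive, the second is a casual message containing some sexually offensive words, and the third expresses hate towards the recipient. The user reactions corresponding to these messages show some of the different ways in which people respond to offensive messages. From the first and second responses, it is unclear how much the message affected the recipient. The user receiving the third bullying message, in contrast, responds with offensive words, such as b∗tch, in an attempt to hurt the sender. We observe similar behavior in Figure 1(a), where the response contains words such as hurt and rude, indicating that the user is affected by the message. The recipient also uses words such as ugly in the response in an attempt to hurt the sender. These user responses to Sarahah messages are a unique aspect of our dataset.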
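
The cleaning steps described in the Data section (dropping duplicates and empty messages, tokenizing, and removing stop words) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the stop-word list, the case- and whitespace-insensitive notion of a duplicate, and the function names `deduplicate` and `tokenize` are all assumptions, and the Porter stemming step is left out so that the sketch needs only the standard library.

```python
import re

# Tiny illustrative stop-word list (assumption: the paper does not give its list).
STOP_WORDS = {"a", "an", "and", "are", "i", "is", "or", "so", "the", "to", "why", "you"}


def deduplicate(messages):
    """Drop empty messages and duplicates, keeping the first occurrence.

    Two messages count as duplicates if they match after lowercasing and
    collapsing whitespace (an assumed definition of "duplicate").
    """
    seen = set()
    kept = []
    for message in messages:
        normalized = " ".join(message.split()).lower()
        if normalized and normalized not in seen:
            seen.add(normalized)
            kept.append(message)
    return kept


def tokenize(message):
    """Lowercase, split into runs of letters/apostrophes, and drop stop words."""
    tokens = re.findall(r"[a-z']+", message.lower())
    return [token for token in tokens if token not in STOP_WORDS]


raw = ["Why are you so ugly?", "why are you so  ugly?", "", "Food or volleyball?"]
clean = deduplicate(raw)             # the near-duplicate and the empty message are dropped
bags = [tokenize(m) for m in clean]  # one bag of words per remaining message
```

For scale, the counts reported above imply that this kind of filtering removed 82,193 - 76,278 = 5,915 messages, or about 7.2% of what was collected. If NLTK is available, its `PorterStemmer` could be applied to each token after `tokenize` to cover the stemming step.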
4 LINGUISTIC ANALYSIS OF SARAHAH MESSAGES
We first perform a linguistic analysis of the messages exchanged using Sarahah. Our analysis aims to understand the nature of these messages and to identify the different bullying categories.

4.1 Topic Analysis using Latent Dirichlet Allocation (LDA)
Topic modeling with latent Dirichlet allocation (LDA) is a popular approach for analyzing a corpus of documents [9]. We first use LDA to understand the presence of different bullying-related words in our data. We treat each message as a document and run LDA for 10,000 iterations. We use the standard hyperparameter values α = 0.01 and β = 0.01, with 30 topics. Using the LDA topics, we identify the different types of conversations present in our data and the types that are more likely to contain bullying messages. We then select the top 50 words from the topic-word distribution of each of the 30 topics. From these words, we identify bullying words and group them by type of bullying.

4.2 Types of Conversations using Sarahah
By analyzing the LDA topics, we identify the different ways in which users use Sarahah. We map the different kinds of messages to standard conversation types [22]. Table 2 gives the conversation types and their sub-categories, along with some example messages in each. We identify four conversation types on Sarahah, which we briefly describe below.

Confession. Messages of this type anonymously convey feelings that the sender would not be comfortable revealing in a non-anonymous setting. They range from secret admiration, flirting, and love proposals to hateful/sexually offensive messages. We divide them into three sub-categories: i) positive, ii) romantic, and iii) negative. Positive confessions mention positive feelings towards a person without explicitly mentioning romantic feelings. The second sub-category captures confessions that express romantic interest through words such as crush and love. The third sub-category captures feelings of malice or hate towards the recipient, signaled by words such as punch, vile, disgusting, and waste. We believe that negative and romantic messages can cause discomfort to the recipient.

Questions. In this conversation type, we find questions directed at the recipient. This category contains general (i.e., non-intrusive) questions as well as intrusive and offensive personal questions. Note that among the personal question examples in Table 2, we find both intrusive personal questions, such as "What is your phone number?", and offensive ones, such as "Do you like a∗∗l s∗x?".

Opinions on world issues. There are also opinions on world issues, especially sensitive ones, which users may not be comfortable sharing in a non-anonymous setting. Some examples of world issues we see in our data are related to: i) President Trump, ii) India-Pakistan partition, and iii) North Korean politics.

Inspirational. We also find some inspirational messages, which may or may not be targeted towards the recipient.

We find that bullying messages primarily occur in the confession and questions conversation types.

Table 2: Different conversation types found in Sarahah messages and example messages in each conversation type.
Conversation type | Sub-category | Example messages
Confession | Positive | I always had a crush on you, the way you smile the way how your beautiful eyes are. / You're actually so pretty, people will always hate no matter what only listen to the opinions from people who matter most in your life.
Confession | Romantic | Hi crush. I love you. / And I love u even more when you tweet my messages. / Love you more than anything
Confession | Negative | I want to punch you. In your face. / You are such a waste of oxygen. / I think you're vile, disgusting, and deserve nothing but grief in your life. Pathetic waste.
Questions | General | What inspired you to photography? / Food or volleyball?
Questions | Personal | Do you have a girlfriend? / Can I have Your Number? / How many times you literally got ∗ss f∗cked?
Opinions on World Issues | | North Koreas trade China so Trump can threaten China / Pakistani think Partition bad idea Jinnah Nehru didn't lose anything and in fact became legends in respective countries. We are the victims. The decision was made without our consent. / Pakistan has started to fence its Afghan border and there is a deep profound message in it for Trump
Inspirational | | Life is one big road with lots of signs. So when you riding through the ruts, don't complicate your mind. Flee from hate, mischief and jealousy. Don't bury your thoughts, put your vision to reality. Wake Up and Live!

4.3 Bullying Categories
Using our LDA topics, we identify the prominent bullying categories present in this data. The sensitive nature of this data restricts creating and sharing labeled data, which makes unsupervised methods attractive for this problem. To avoid the need for message-level labels, we label the LDA topics based on the presence of bullying words in them. We find words related to the following bullying categories in the LDA topics.

Sexual. Messages in this category contain explicit sexually offensive words that are intended to harass, intimidate, or make the recipient uncomfortable. They include both confessions, which convey the sender's feelings towards the recipient, and personal questions that are intended to make the recipient uncomfortable.

Hate. Messages in this category are intended to convey hatred and emotionally unsettle the recipient. Messages that convey hatred, death threats, and emotional/physical abuse belong in this bullying category.

Inappropriate flirting. The messages in this category are intended to convey a romantic interest toward the recipient. While this may not be considered bullying under normal circumstances, it can cause significant distress to the recipient because the sender is anonymous. We see messages that imply stalking, divulging/asking for personal secrets, and usage of words implying romantic interest, which can unsettle the recipient, especially when the sender is anonymous. This makes it an important bullying category to study.

We present the different bullying categories and the top words in each category in Table 3. If we filter the messages using the top words in bullying categories 1-3 in Table 3, we find that around 20% of the messages contain one or more of these words, which is considerably high given that the messages are exchanged between users who are friends on the social network. In addition to the three bullying categories, we also find messages that convey admiration (category 4 in Table 3). While the messages in this category are mostly positive, some do have a touch of flirting in them. The subtle bullying words that could be present in messages in this category make it an important category to study.

Table 3: Coarse-grained topic categories in Sarahah and representative words in each category.
Coarse-grained bullying category | Words
1. Sexual | s∗xy, ∗ss, b∗tch, gay, hot, f∗ck, b∗∗bs, d∗ck, s∗ck, seductive, b∗∗ty, virgin, lesbian, bl∗wj∗b, straight, homosexual, b∗tt, h∗rny, h∗es, trans, lick, bite, bed, naked, wh∗re
2. Hate | punch, shoot, kick, fat, bullsh∗t, beast, threat, fight, death, rude, ruin, sh∗t, slap, ugly, abuse, betray, harm, size, ego, loathe, sad, cheat, trash, pain, tear, cry, emotion, breakup, trap, annoy, heartless, loser
3. Inappropriate Flirting | crush, dreams, appeal, stalk, babe, crave, love, proposal, hit, cheek, sweetie, baby, candy, babe, look, pie, cutie, hug, chick, romance, desires, pleasure, bomb
4. Admiration | beautiful, amazing, smile, awesome, kind, pretty, heart, great, gorgeous, nice, handsome, sweet, funny, hilarious, smart, strong, laugh, adorable, appreciate, proud, laugh, good, like, decent, positive, inspiration, perfect, blessings, genuine, courageous, brighten, honest, respect

Table 4 gives some example messages in each bullying category. Bullying words are highlighted in italics. Notice that the first two categories contain profane/offensive/hurtful words, which makes them easier to flag automatically. However, messages in the flirting category contain words such as kiss and hug, which in an anonymous setting could cause discomfort to the recipient but are harder to detect automatically using vocabulary-based approaches.
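
The vocabulary-based filtering behind the 20% figure (flag a message if it contains at least one top word from bullying categories 1-3) can be sketched as follows. This is a minimal sketch under stated assumptions: only a handful of unmasked words from Table 3 are included, matching is on exact lowercase tokens with no stemming, and the names `BULLYING_WORDS`, `matched_categories`, and `flagged_fraction` are hypothetical.

```python
import re

# A few representative words per category, copied from Table 3.
# These are NOT the full lists, so the fractions below are illustrative only.
BULLYING_WORDS = {
    "sexual": {"naked", "virgin", "seductive", "lick"},
    "hate": {"punch", "ugly", "loser", "trash"},
    "inappropriate_flirting": {"crush", "stalk", "babe", "crave", "love"},
}


def matched_categories(message):
    """Return the bullying categories whose word list overlaps the message."""
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    return {name for name, words in BULLYING_WORDS.items() if tokens & words}


def flagged_fraction(messages):
    """Fraction of messages containing at least one listed word."""
    if not messages:
        return 0.0
    flagged = sum(1 for message in messages if matched_categories(message))
    return flagged / len(messages)


# Example messages taken from Tables 2 and 4.
messages = [
    "I want to punch you. In your face.",
    "Hi crush. I love you.",
    "Food or volleyball?",
    "What inspired you to photography?",
    "I'd love to steal a kiss from you one day.",
]
fraction = flagged_fraction(messages)
```

On this toy input three of the five messages are flagged. The third flag comes from the word love alone, which shows why a purely lexical filter says little about whether a flirting message would actually unsettle its recipient.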
Messages in the flirting category also combine positive words such as love with possibly concerning words such as kiss, hug, and shape, making it necessary to understand the semantics of these messages and their corresponding user responses to accurately determine whether they could unsettle the recipient.

Table 4: Bullying topic categories and some example messages in each category.
Bullying category | Example posts
Sexual | Do you wanna s∗x with me? / Is small d∗ck a turn off? / Leak your d∗ck pics?
Hate | I hate you please leave the twitter. / Go cut yourself lol. Why you an ugly dumb sl∗t? / Why are you so pretty b∗tch? It just makes me hate my genes.
Flirting | I'd love to steal a kiss from you one day. / I wish I could hug and kiss you all day long. / In love with the shape of you.

5 DISCUSSION AND FUTURE WORK
In this work, we introduced data collected from a recently released anonymous web/mobile messenger application, Sarahah. We performed a preliminary linguistic analysis of messages and found that cyberbullying can occur in different forms, with or without the presence of profane words, calling for a fine-grained analysis. Our analysis is helpful in identifying the different ways in which anonymous applications are used and understanding the different types of bullying present in anonymous exchanges between people who already know each other. There are several exciting directions to go from here. The unique aspect of the user responses when recipients share these messages opens up possibilities for studying the varying levels of discomfort caused by bullying messages. Our data and subsequent analysis could potentially help in answering many important questions related to cyberbullying. The first and foremost question is what kinds of messages cause the most distress to recipients. Understanding the effect that bullying messages have on the
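
The LDA setup from Section 4.1 (one document per message, symmetric priors α and β, a fixed number of topics, then the top words of each topic) can be sketched with a small collapsed Gibbs sampler. The paper reports 10,000 iterations, α = β = 0.01, and 30 topics, but does not say which inference method or toolkit was used, so the use of collapsed Gibbs sampling here is an assumption, as are the function names `fit_lda` and `top_words` and the toy corpus. The toy run uses 2 topics and 50 iterations to stay fast.

```python
import random


def fit_lda(docs, num_topics, alpha=0.01, beta=0.01, iterations=100, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns the vocabulary and the topic-word count matrix.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                          # tokens per topic
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word


def top_words(vocab, topic_word, n):
    """The n highest-count words per topic (ties broken alphabetically)."""
    ranked = []
    for row in topic_word:
        order = sorted(range(len(vocab)), key=lambda v: (-row[v], vocab[v]))
        ranked.append([vocab[v] for v in order[:n]])
    return ranked


# Toy corpus of already-tokenized messages (hypothetical data).
docs = [
    ["love", "crush", "babe"],
    ["punch", "ugly", "loser"],
    ["crush", "love", "hug"],
    ["ugly", "trash", "punch"],
]
vocab, topic_word = fit_lda(docs, num_topics=2, iterations=50)
tops = top_words(vocab, topic_word, n=3)
```

With the paper's settings the calls would be `fit_lda(docs, num_topics=30, iterations=10000)` followed by `top_words(vocab, topic_word, n=50)`. A pure-Python sampler is far too slow for about 76,000 messages, so an optimized library implementation would be the practical choice there.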
