
editors only in the English Wikipedia. However, only a small
minority, specifically 127,000 editors, are active¹. Due to the
diverse demographics and interests of editors, to maintain the
quality of the provided information, Wikipedia has a set of editing
guidelines and policies.
One of the core policies is the Neutral Point of View (NPOV)².
It requires that for controversial topics, Wikipedia editors should
proportionally represent all points of view. The core guidelines in
NPOV are to: (i) avoid stating opinions as facts, (ii) avoid stating
seriously contested assertions as facts, (iii) avoid stating facts as
opinions, (iv) prefer nonjudgemental language, and (v) indicate the
relative prominence of opposing views.
Currently, there are approximately 40,000 Wikipedia pages that
are flagged with NPOV (or similar quality flaws) quality issues.
These represent explicit cases³ marked by Wikipedia editors, where
specific Wikipedia pages or statements (sentences in Wikipedia
articles) are deemed to be in violation of the NPOV policy.
Recasens et al. [17] analyze these cases that go against the specific
points from the NPOV guidelines. They find common linguistic
cues, such as framing bias, where subjective words or phrases linked
to a particular point of view are used (point (iv)), and
epistemological bias, which focuses on the believability of a
statement, thus violating points (i) and (ii). Similarly, Martin [11]
shows cases of bias that violate all guidelines of NPOV, in an
experimental study carried out on his personal Wikipedia page⁴.
Ensuring that Wikipedia pages follow the core principles of
Wikipedia is a hard task. Firstly, because editors provide and
maintain Wikipedia pages on a voluntary basis, editor efforts
are not always in line with the demand by the general viewership
of Wikipedia [21], and as such they cannot be redirected to pages
that have quality issues. Furthermore, there are documented cases
where Wikipedia admins are responsible for policy violations and
for pushing forward specific points of view on Wikipedia pages
[2, 5], thus going directly against the NPOV policy.
In this work, we address quality issues that deal with language
bias in Wikipedia statements that violate points (i)–(iv). We
classify statements as being biased or unbiased. A statement
in our case corresponds to a sentence in Wikipedia. We address one
of the main deficiencies of related work [17], which focuses on
detecting bias words. In our work, we show that, similar to [13],
words that introduce bias or violate NPOV are dependent on the
context in which they appear and, furthermore, on the topic at hand.
¹ https://en.wikipedia.org/wiki/Wikipedia:Wikipedians#Number_of_editors
² https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view
³ This number may well be much higher, given the cases that are not
spotted by the Wikipedia editors.
⁴ https://en.wikipedia.org/wiki/Brian_Martin_(social_scientist)
Track: Wiki Workshop
WWW 2018, April 23-27, 2018, Lyon, France
Detecting Biased Statements in Wikipedia
Christoph Hube and Besnik Fetahu
L3S Research Center, Leibniz University of Hannover, Hannover, Germany
{hube,fetahu}@L3S.de
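Abstract
Quality in Wikipedia is enforced through a set of editing policies
and guidelines recommended for Wikipedia editors. Neutral point of
view (NPOV) is one of the main principles in Wikipedia; it ensures
that for controversial information all possible points of view are
represented proportionally. Furthermore, the language used in
Wikipedia should be neutral and not opinionated. However, due to the
large number of Wikipedia articles and an operating principle based
on the voluntary work of Wikipedia editors, quality assurance and
Wikipedia guidelines cannot always be enforced. Currently, there are
more than 40,000 articles that are flagged with NPOV or similar
quality issues. Furthermore, these represent only the portion of
articles for which such quality issues are explicitly flagged by
Wikipedia editors; the real number may be higher, considering that
only a small percentage of articles are categorized by Wikipedia as
being of good quality or featured. In this work, we focus on language
bias at the sentence level in Wikipedia. Language bias is a hard
problem, as it represents a subjective task and the linguistic cues
can usually be determined only through their context. We propose a
supervised classification approach that relies on an automatically
created lexicon of bias words and on other syntactic and semantic
characteristics of biased statements. We experimentally evaluate our
approach on a dataset consisting of biased and unbiased statements,
and show that we are able to detect biased statements with an
accuracy of 74%. Furthermore, we show that competing approaches that
determine bias words are not suitable for detecting biased
statements; our approach outperforms them with a relative improvement
of over 20%.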
Keywords
Language Bias; Wikipedia Quality; NPOV
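ACM Reference Format: Christoph Hube and Besnik Fetahu. 2018.
Detecting Biased Statements in Wikipedia. In WWW '18 Companion: The
2018 Web Conference Companion, April 23-27, 2018, Lyon, France. ACM,
New York, NY, USA, 8 pages. https://doi.org/10.1145/3184558.3191640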
1 Introduction
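This paper is published under the Creative Commons Attribution 4.0
International (CC BY 4.0) license. Authors reserve their rights to
disseminate the work on their personal and corporate Web sites with
the appropriate attribution. WWW '18 Companion, April 23-27, 2018,
Lyon, France. © 2018 IW3C2 (International World Wide Web Conference
Committee), published under Creative Commons CC BY 4.0 License.
ACM ISBN 978-1-4503-5640-4/18/04.
https://doi.org/10.1145/3184558.3191640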