Matching User Profiles Across Social Networks: An Empirical Study Based on Face Recognition

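Face-Recognition-Based Matching of Social Media User Identities: An Experimental Study

This paper (arXiv:1905.06081v1 [cs.CV], 15 May 2019) examines how, given the wide use of social networks, combining data from different platforms can improve the understanding of user behaviour, particularly for recommender systems, risk assessment, and sociological research. The authors, Timur Sokhin, Nikolay Butakov, and Denis Nasonov of ITMO University in Russia, propose matching user profiles across social media on the basis of publicly available photos of users' faces.

The core concern is that, although platforms differ in content types, modes of communication, and language style, a deeper understanding of human behaviour requires combining their data. The traditional practice is to analyse each network on its own, which may miss how an individual behaves across several dimensions. The authors therefore develop a profile-matching strategy that is stable and robust to changes in content and style, relying mainly on computer vision techniques: face detection, face embedding, and clustering.

The paper's keywords point to its focus areas: face recognition applied to user identity verification, matching of user profiles, data integration across social media platforms, and the role of computer vision in that process. Through an experimental study the authors aim to assess the effectiveness of the method, offering practitioners a possible way to reflect users' real behaviour patterns more accurately when handling multi-platform data.

The paper has theoretical and practical value for understanding how modern techniques, face recognition in particular, can strengthen the integration and analysis of cross-platform user data, and it offers a new perspective and tooling for personalised recommendation, risk assessment, and social research.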

User profiles matching in social networks

Data Collecting. Our approach consists of several stages. First, we collect data from two social media platforms using a crawling framework (profiles, photos from albums and posts) [1]. To validate our results, we collect a set of profiles from VKontakte that have an explicit link to their secondary profile on Instagram, which is the only possible way to build the labelled dataset.

Face Detection and Embedding. We process photos using two algorithms:

1. face detection: we apply MTCNN (Multi-task Cascaded Convolutional Networks) [11], which achieved efficiency superior to its closest competitors and is not affected by scaling of the faces;

2. face embedding: the FaceNet neural network is applied to construct embeddings of the extracted faces [7].

We apply MTCNN pre-trained on the WIDER FACE dataset and FaceNet pre-trained on VGGFace2. This data is then filtered.

Filtering. The extracted face embeddings are further filtered by their parameters according to several heuristics:

1. filtering by number of pixels (hereinafter, we use the term quality of the image);

2. filtering by anchors (removal of children's faces).

FaceNet has limitations on the minimum required quality of images, so we filter face images by the number of pixels in each face. Accurate control of the above parameters improves the precision and recall of matching, partly because of the behaviour of the method selected for embedding construction. In the experimental study in Sect. 4 we found that the quality of facial images affects the final matching efficiency: it improves the F1-score by 4%.
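The pixel-count heuristic can be illustrated with a minimal sketch. The `DetectedFace` type, the `filter_by_quality` helper, and the 80 x 80 threshold are hypothetical names and values chosen for illustration; the text above does not state the threshold the authors used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DetectedFace:
    """Hypothetical record: face bounding-box size in pixels plus its embedding."""
    width: int
    height: int
    embedding: List[float]

    @property
    def pixel_count(self) -> int:
        # "Quality" in the sense used above: number of pixels covered by the face.
        return self.width * self.height


def filter_by_quality(faces: List[DetectedFace], min_pixels: int) -> List[DetectedFace]:
    """Keep only faces whose bounding box covers at least min_pixels pixels."""
    return [face for face in faces if face.pixel_count >= min_pixels]


faces = [
    DetectedFace(40, 40, [0.1, 0.2]),    # 1,600 pixels: too small
    DetectedFace(120, 140, [0.3, 0.4]),  # 16,800 pixels: kept
]
# Assumed threshold, not taken from the paper.
kept = filter_by_quality(faces, min_pixels=80 * 80)
```

In practice the threshold would be tuned on the labelled dataset, since the text reports that image quality changes the final F1-score.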

The other heuristic is probably related to a limitation of the VGGFace2 dataset on which FaceNet was trained. VGGFace2 contains young and mature faces but does not contain the faces of babies and small children. As a result, embeddings of children's faces have a very small margin between each other. That is why we remove such faces from the user's collection of photos to avoid mismatching of profiles. Figure 1 shows that the distribution of distances between embeddings of children's faces is biased relative to the distribution of distances between embeddings of random people's faces.
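The comparison of distance distributions can be sketched as follows. This is an illustration only: Euclidean distance is an assumption (the text does not name the metric), and the two-dimensional vectors are made-up toy values, not real FaceNet embeddings or data from the paper.

```python
import math
from itertools import combinations
from statistics import mean
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Distances between every unordered pair of embeddings."""
    return [euclidean(a, b) for a, b in combinations(embeddings, 2)]


# Toy vectors (assumed): tightly packed "children" vs. spread-out "random people".
children = [[0.50, 0.50], [0.52, 0.49], [0.51, 0.53]]
random_people = [[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]]

child_mean = mean(pairwise_distances(children))
random_mean = mean(pairwise_distances(random_people))
```

With embeddings packed this closely, faces of different children can fall within the same matching distance, which is the mismatching risk the paragraph describes.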

Additional filtering of the data is accomplished using so-called anchors. An anchor is a vector that represents some region of the space of embedded faces. In our study, we use an anchor to represent the faces of children. We create it in the following way. A set of children's faces was collected semi-automatically: we find the accounts of kindergartens and photographers using tags and specific usernames, for instance tags under photos containing words such as "children" or "kindergarten". Then we build the anchor as the element-wise mean of all vectors of children's faces. All face embeddings that are close to this anchor are removed from the dataset.
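The anchor construction and removal step can be sketched as below. The element-wise mean follows the description above; the Euclidean metric, the `threshold` value, the helper names, and the toy vectors are assumptions made for illustration, since the text does not specify how "close" is measured.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def build_anchor(embeddings: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty set of embeddings."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def remove_near_anchor(embeddings: Sequence[Vector], anchor: Vector,
                       threshold: float) -> List[Vector]:
    """Drop every embedding whose distance to the anchor is within the threshold."""
    return [e for e in embeddings if euclidean(e, anchor) > threshold]


# Toy stand-ins for the collected children's face embeddings.
child_faces = [[0.50, 0.50], [0.54, 0.46], [0.46, 0.54]]
anchor = build_anchor(child_faces)

# One face near the anchor and one far from it.
user_faces = [[0.51, 0.50], [0.0, 1.0]]
kept = remove_near_anchor(user_faces, anchor, threshold=0.2)  # assumed threshold
```

A single mean vector is a coarse summary of the children's region; it works here only because, as noted above, those embeddings lie very close to one another.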

Code repository used - https://github.com/davidsandberg/facenet
